MakingGamesForAtari2600 Ebook Dec2018
ATARI 2600
An 8bitworkshop Book
by Steven Hugg
Sold to
[email protected]
Making Games for the Atari 2600
Copyright ©2016, 2018 by Steven Hugg
Disclaimer
Although every precaution has been taken in the preparation of this
book, the publisher and author assume no responsibility for errors
or omissions, nor is any liability assumed for damages resulting
from the use of the information contained herein. No warranties
of any kind are expressed or implied.
Making Games for the Atari 2600 is an independent publication
and has not been authorized, sponsored, or otherwise approved by
Atari, Inc.
Trademarks
Brands and product names mentioned in this work are trademarks
or service marks of their respective companies. Use of a term in
this work should not be regarded as affecting the validity of any
trademark or service mark.
Inquiries
Please refer all inquiries to [email protected].
Contents
Preface
1 Introduction to 6502
1.1 Bits, Bytes, and Binary
1.2 Hexadecimal Notation
1.3 Signed vs. Unsigned Bytes
1.4 The CPU and the Bus
1.5 Writing Loops
1.6 Condition Flags and Branching
1.7 Addition and Subtraction
1.8 The Stack
1.9 Logical Operations
1.10 Shift Operations
8 Color Sprites
10 Player/Missile Graphics
14 Indirect Addressing
14.1 Pointers
14.2 Indirect Indexed Addressing
14.3 Indexed Indirect Addressing
18 Scoreboard
19 Collisions
28 Multisprites
28.1 Variables
28.2 Position
28.3 Display
28.4 The Main Loop
28.5 Sort
28.6 Improvements
37 Paddles
42 Troubleshooting
Bibliography
Index
Preface
Like many American kids in 1979, I woke up to find that Santa had
left a brand new Atari VCS1 under the tree (thanks, Mom and Dad,
for paying Santa’s invoice!). This was a pretty big deal for a six-
year-old who could tell you the location and manufacturer of every
standup arcade cabinet within a five mile radius. Having an “arcade
in your home” wasn’t just a thing you saw on Silver Spoons, it was
now a real thing.
The sights and sounds that jumped off of our little Panasonic color
TV probably deserve a gigantic run-on sentence worthy of Dylan
Thomas, as my brother and I bounced tiny pixellated missiles off
of walls in Combat, combed through the perplexing game modes
of Space Invaders, battled angry duck-like dragons in Adventure,
and became Superman as we put flickering bad guys in a flickering
jail. These cartridges were opaque obelisks packaged in boxes with
fantastically unattainable illustrations, available at K-Mart for $30
or so.
You could tell these species of video games were related to arcade
games, though they had a unique look-and-feel of their own. We
also had an Apple ][ by this time, so I tried to fit all of these creatures
into a digital taxonomy. Atari games had colors and fast motion,
but not as much as arcade games, and they never were as complex
as Apple ][ games. What made them tick? Why were Activision
games so much more detailed? Would the missile still blow up your
spaceship if you turned the TV off? (Turns out the answer is yes.)
1 It wasn’t sold as “Atari 2600” until 1982. We’ll use “VCS” in this book, which
stands for Video Computer System.
An Atari 2600 four-switch "wood veneer" version, dating from
1980-1982 (photo by Evan Amos)
Soon afterwards, I would start dissecting the Apple ][, and never
really got my mitts on the viscera inside those VCS cartridges. It
wasn’t until the Internet came around that I’d discover the TIA chip,
scanlines, and emulators like Stella. I’d also read about the people
who wrote the games, often uncredited, who pushed the envelopes
of both game design and technology while working solo against
impossible deadlines.
It’s now been 37 years since that Christmas morning, and thanks to
the Web, very fast modern CPUs, and lots of open-source sharing,
you can program Atari VCS games in your browser. It’s probably
the most effort you can expend for the fewest number of pixels, but
it’s also really rewarding.
If the modern software ecosystem is a crowded and bureaucratic
megalopolis, programming the VCS is like tinkering in a tiny cabin
in the woods with 10-foot snow drifts outside. At least the stove is
working, and there’s plenty of wood. Enjoy.
1
Introduction to 6502
All digital computers operate on bits and bytes and, on the VCS,
you’ll be manipulating them directly. Let’s review a few things
about them.
A bit is a binary value – it can be either zero (0) or one (1). A byte is
a sequence of eight bits.
We can create a written representation of a byte in binary notation,
which just lists the bits from left to right, for example: %00011011.
We can then shorten the byte notation by removing the leading
zeros, giving us %11011. The % denotes a binary number, and we’ll
use this notation throughout the book.
The eight bits in a byte are not just independent ones and zeros; they
can also express numbers. We assign values to each bit and then
add them up. The least-significant bit, the rightmost (our index
starts at zero, i.e. bit 0), has a value of 1. For each position to the
left, the value doubles until we reach the most-significant bit, the
leftmost (bit 7), with a value of 128. Here are the
values for an entire byte:
Bit # 7 6 5 4 3 2 1 0
Value 128 64 32 16 8 4 2 1
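To see these values add up, take %00011011 from earlier: bits 4, 3, 1, and 0 are set, so the value is 16 + 8 + 2 + 1 = 27. As a minimal sketch (assuming DASM syntax, where % marks binary and $ marks hexadecimal), all three lines below load the same byte into the A register:

```asm
        lda #%00011011  ; binary: 16 + 8 + 2 + 1
        lda #$1b        ; the same value in hexadecimal
        lda #27         ; the same value in decimal
```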
One more thing about bytes: We’ve described how they can be
interpreted as any value from 0 through 255, or an unsigned value.
We can also interpret them as negative or signed quantities.
This requires a trick known as two’s complement arithmetic. If the
high bit is 1 (in other words, if the unsigned value is 128 or greater),
we treat the value as negative, as if we had subtracted 256 from it:
Note that there’s nothing in the byte identifying it as signed – it’s all
in how you interpret it, as we’ll see later.
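A few sample bytes illustrate both readings. This sketch uses DASM's .byte directive only as a convenient way to list values; the signed reading is the unsigned value minus 256 whenever the high bit is set:

```asm
        .byte $00       ; unsigned   0, signed    0
        .byte $7f       ; unsigned 127, signed  127
        .byte $80       ; unsigned 128, signed -128 (128 - 256)
        .byte $ff       ; unsigned 255, signed   -1 (255 - 256)
```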
Now that we know what bits and bytes are, let’s see how the CPU
manipulates them.
Fetch     Read Memory[PC++]   Result: $88
Decode    "decrement register", "Y register", "no operand"
Execute   Y = Y - 1
During each clock cycle, the CPU can read from or write to the bus.
The bus is a set of “lanes” where each lane can hold a single bit at
a time. The 6502 is an 8-bit processor, so the data bus is eight bits
(one byte) wide.
Devices like memory and graphics chips are attached to the bus,
and receive read and write signals. The CPU doesn’t know which
devices are connected to the bus – all it knows is that it either
receives eight bits back from a read, or sends eight bits out into
the world during a write.
[Figure: the 6502 CPU and its two buses, the data bus and the address bus]
Besides the 8-bit data bus, the 6502 has a 16-bit address bus. The
address bus describes “where” and the data bus describes “what.”
Let’s look at what happens when the CPU executes this example
instruction, LDA (LoaD A):
lda $1234
The CPU will set the pins on the address bus to the binary encoding
for $1234, set the read/write pin to “read,” and wait for a response
on the data bus. Devices on the bus look at the address $1234 and
determine whether the message is for them – by design, only one
device should respond. The CPU then reads the value from the data
bus and puts it in the A register.
Let’s say we are executing the STA instruction (STore A):
sta $1234
The CPU will set the address bus to $1234 and the data bus to
whatever is in the A register, then set the read/write pin to “write.”
Again, the bus devices look at the address bus and the write signal
and decide if they should listen or ignore it. Let’s say a memory
chip responds – the memory chip would read the 8-bit value off the
data bus and store it in the memory cell corresponding to address
$1234. The CPU does not get a response from a write; it just assumes
everything worked out fine.
You’ll note that both of these instructions operate on the A register.
The 6502 has three general-purpose registers: A, X, and Y. These
are all 8-bit variables that you can manipulate at will. You’ll often
have to use the registers as temporary storage, for instance: Load a
constant value into A, then store A to a given address.
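For instance, the load-then-store pattern might look like this sketch. The constant $2A and the address $80 are arbitrary values picked for illustration:

```asm
        lda #$2a        ; load the constant $2A into A
        sta $80         ; store A at address $80
```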
You’ll notice that the CPU instructions have a three-letter format.
This is called a mnemonic, and it’s part of the human-readable
language used to program the CPU, called assembly language. The CPU
doesn’t understand this directly; it only understands a compact code
called machine code. A program called an assembler takes the
human-readable assembly code and produces machine code.
Let’s take another example instruction:
lda $1234
The machine code for this instruction is three bytes, $ad, $34, and
$12. $ad is the opcode which identifies the instruction and addressing
mode. $34 and $12 are part of the operand, which in this case is
a 16-bit number spanning two bytes. You’ll note that the $34 is
first and the $12 is second – this is because the 6502 is a
little-endian processor, expecting the least-significant parts of
multibyte quantities first.
[Figure: cycle-by-cycle CPU, address bus, and data bus activity while fetching and decoding LDA $1234 starting at address $F000]
        lda #0          ; A <- 0
        ldy #$7F        ; Y <- 127
Loop    sta $100,y      ; store A in [$100+y]
        dey             ; decrement Y, set flags
        bne Loop        ; repeat until Y == 0

        lda #0
        ldy #$80        ; Y <- 128
Loop    dey             ; set flags
        sta $100,y      ; does not modify flags
        bne Loop        ; repeat while Y != 0
Since STA does not modify any flags, we can DEY first (which does
modify flags) and then exit the loop when Y==0 rather than Y<0.
There will be lots of opportunities to tweak loops like this for
optimal performance, and VCS programming often demands it.
We could also count upwards from zero using the CPY (ComPare Y)
instruction:
        lda #0
        tay             ; Y <- 0
Loop    sta $100,y
        iny
        cpy #$80        ; set flags as if (Y - 128)
        bne Loop        ; branch until Y == 128
We’ve covered the Z (Zero) flag already, but there are others. Here’s
the list of condition flags you’ll be using most often:
Flag  Name           Description
Z     Zero           Set when the result is zero.
N     Negative/Sign  Set when the result is negative (high bit set).
C     Carry          Set when an arithmetic operation wraps and carries the high bit.
V     Overflow       Set when an arithmetic operation overflows; i.e. if the sign of
                     the result changes due to overflow.
Table 1.3: Condition Flags
A lot of instructions just set the Zero and Negative flags, which
makes it easy to test for zero values or to test the high bit. The
Carry flag is set by compare, add, subtract, and shift operations.
The Overflow bit is less commonly used than the Carry bit, but
it’s worth explaining the difference between wrapping and overflow.
When we say a value wraps, we mean that an operation exceeds the
boundaries of its byte and the result is truncated. So if you add $01
to $FF, you’ll wrap around to $00.
Overflow is set when the result of an addition or subtraction changes
its sign – for example, $40 + $40 = $80 which overflows because $80
is a negative number in two’s complement representation. If you
are using unsigned numbers, you can generally ignore this flag.
Mnem.  Description     Flag Test        Condition
BNE    Not Equal       Zero clear       A != B
BEQ    Equal           Zero set         A == B
BCC    Carry Clear     Carry clear      A < B (unsigned)
BCS    Carry Set       Carry set        A ≥ B (unsigned)
BMI    Minus           Negative set     A < B (signed)
BPL    Plus            Negative clear   A ≥ B (signed)
BVC    Overflow Clear  Overflow clear   no signed overflow
BVS    Overflow Set    Overflow set     signed overflow
JMP    Jump            -                always taken
The JMP instruction doesn’t test any flags but just moves the PC
directly to the target. The branch instructions can only modify the
PC by -128 to +127 bytes, so for longer distances you’ll need JMP.
It’s good to memorize the BCC (less than) and BCS (greater than or
equal) instructions, since these are used often after a compare. Also
note that BPL and BMI play the same roles for signed quantities, so we
could use them to stop when a value goes negative, like this:
        lda #0          ; A <- 0
        ldy #$7F        ; Y <- 127
Loop    sta $100,y      ; store A in [$100+y]
        dey             ; decrement Y, set flags
        bpl Loop        ; repeat until signed(Y) < 0
Note that this technique would not work if we started with Y = $81
or higher, because the first DEY would result in a negative number,
exiting the loop on the first iteration!
1.7 Addition and Subtraction
There’s no INC or DEC for the A register, but you can add a memory
location or constant to the A register, or subtract one from it. ADC
adds, and SBC subtracts. An example of addition:
Note the CLC (Clear Carry Flag) instruction. The ADC instruction adds
the Carry flag to the result (0 or 1) so usually it must be cleared
before addition. For subtraction, it must be set first using SEC (Set
Carry Flag):
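A minimal sketch of both cases follows. The operand values 10, 5, and 3 are arbitrary and chosen only for illustration:

```asm
        clc             ; clear Carry so ADC adds exactly 5
        lda #10         ; A <- 10
        adc #5          ; A <- 10 + 5 + Carry = 15

        sec             ; set Carry so SBC subtracts exactly 3
        lda #10         ; A <- 10
        sbc #3          ; A <- 10 - 3 - (1 - Carry) = 7
```

Forgetting the CLC or SEC is a common source of off-by-one results, because whatever an earlier instruction left in the Carry flag is folded into the answer.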
The “logical” instructions combine the bits of the A register and the
operand, performing a bit (logic) operation on each bit.
AND  A&B  Set bit if A and B are set.
ORA  A|B  Set bit if A or B (or both) are set.
EOR  A^B  Set bit if either A or B are set, but not both (exclusive-or).
BIT  A&B  Same as AND, but just set flags and throw away the result.
For example, let’s combine $55 and $f0 with the AND operation:
lda #$55
and #$f0
For AND, if a bit was set in both the A register and the operand, it’ll
be set in A after the instruction executes:
$55 01010101
AND $f0 11110000
---------------------
$50 01010000
The AND operation is useful for limiting the range of a value. For
example, AND #$1F is the same as (A mod 32), and the result will have
a range of 0..31.
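As a quick sketch of this masking trick, with the starting value $E7 picked arbitrarily:

```asm
        lda #$e7        ; A = %11100111 (231)
        and #$1f        ; A = %00000111 ($07), since 231 mod 32 = 7
```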
What if we did an ORA instead?
$55 01010101
ORA $f0 11110000
---------------------
$f5 11110101
ORA sets bits if they are set in either A or the operand, i.e. unless they
are clear in both.
What about an EOR?
$55 01010101
EOR $f0 11110000
---------------------
$a5 10100101
EOR (exclusive-or) is like an OR, except that bits that are set in both A
and the operand are cleared. Note that if we do the same EOR twice,
we get the original value back.
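A short sketch of that round trip, reusing the values from the example above:

```asm
        lda #$55        ; A = %01010101
        eor #$f0        ; A = %10100101 ($A5), high four bits flipped
        eor #$f0        ; A = %01010101 ($55) again
```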
1.10 Shift Operations
ASL Shift Left Shift left 1 bit (multiply by 2), bit 7 → Carry
LSR Shift Right Shift right 1 bit (divide by 2), bit 0 → Carry
ROL Rotate Left Same as ASL except Carry → bit 0
ROR Rotate Right Same as LSR except Carry → bit 7
Table 1.6: Shift and rotate instructions
There is also the family of “shift” operations that move bits left and
right by one position within a byte. The bit that is shifted off the
edge of the byte (i.e. the high bit for shift left, and the low bit for
shift right) gets put into the Carry flag.
The “rotate” operations are similar, but they also shift the previous
Carry flag into the other end of the byte. So for rotate left, the
Carry flag is copied into the rightmost (low) bit. For rotate right,
it’s copied into the leftmost (high) bit.
Example of ASL (shift left):
lda #$83
asl ; shift left
$83 10000011
ASL -> $06 00000110 C
[Carry] <- [7] [6] [5] [4] [3] [2] [1] [0] <- 0         ASL (Shift Left)
0 -> [7] [6] [5] [4] [3] [2] [1] [0] -> [Carry]         LSR (Shift Right)
[Carry] <- [7] [6] [5] [4] [3] [2] [1] [0] <- [Carry]   ROL (Rotate Left)
[Carry] -> [7] [6] [5] [4] [3] [2] [1] [0] -> [Carry]   ROR (Rotate Right)
Figure 1.4: Shift and rotate bit flow
Another example, this time of ROR (rotate right):
lda #$03
sec ; set carry flag
ror ; rotate right
ror ; rotate right
ror ; rotate right
Note that we SEC to set the carry first. Here’s the result:
$03 00000011 C
ROR -> $81 10000001 C
ROR -> $C0 11000000 C
ROR -> $E0 11100000
Note that if you ROL or ROR nine times in succession, you’d have the
original byte.
Now that you have a working knowledge of the 6502, we’ll use an
online tool to program it in the next chapter.
2
The 8bitworkshop IDE
In this chapter, we’ll discuss the tools we’ll use to develop and test
our game code. These tools comprise our interactive development
environment, or IDE.
To start the IDE, visit http://8bitworkshop.com/ in a web browser that
supports JavaScript (for best results, use a recent version of Google
Chrome, Mozilla Firefox, or Apple Safari).
The IDE includes an emulator which simulates the game console
hardware. The emulator we use is called Javatari by Paulo Augusto
Peccin[2]. It runs in a web browser, and attempts to simulate
the 6502 and all of the VCS hardware cycle-by-cycle as if it were
connected to a TV monitor.
The other tool is an assembler. The one we use is called DASM[3] and
also runs in the web browser, along with a web-based text editor.
Each time you make a change to the code, the IDE immediately
assembles it and then sends the final ROM image to the VCS
emulator, allowing you to see code changes instantly.
The last tool is a simple debugger that allows you to step through 6502
instructions, view memory, and start and stop the program.
Figure 2.3: IDE Pulldown
The IDE is packaged with several example 6502 assembly files, each
roughly corresponding to a chapter in this book. At the top left
of the screen, you can access a pulldown menu that allows you to
select a file to load. You can edit these files as much as you want – all
changes are persisted and they’ll be there if you close the browser
tab and come back. To reset and fully clear your changes, select
Revert To Original in the menu.
The buttons at the top of the screen perform several debugging
functions:
$80: 46 A8 34 8D E6 F2 01 FF 40 00 00 00 01 10 FF FF
$90: FF FF FF FF FF FF FF FF FF EF FF FF FF FF FF FF
$A0: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
$B0: FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 0F 7F F2
You can click on the Settings icon in the lower-right of the emulator
window to display keyboard shortcuts. You usually have to click
on the emulator before using them. There are a few that are
particularly useful during development (Note: On Macs, Ctrl might
be Option/Alt):
Ctrl-G: Displays the number of scanlines drawn in the current
frame. This can vary between frames, and as we’ll see in upcoming
chapters, it’s important to make this a stable value (around 262).
Ctrl-D: Toggles between debug modes, which display different
colors for various game objects.
Ctrl-C: Enable/disable collisions.
3
VCS Memory Map
One vital aspect of the VCS that we must cover is where things are
located in address space. “Where” means at which addresses. Due
to its 16-bit address bus, there are 65,536 (2^16) possible addresses
that the 6502 can access. Most of those addresses are unused on the
VCS.
There are three components connected to the VCS bus:
• TIA (Television Interface Adapter) - The main video and
sound chip.
• PIA (Peripheral Interface Adapter) - RAM, timers, and con-
troller input.
• ROM (Read Only Memory) - The 6502 program code included
on the game cartridge.
Each component is responsible for handling read and write com-
mands for a range of addresses. We organize these addresses into
a memory map so that we can easily remember which addresses
correspond to which component. Figure 3.1 provides an overview
of this address space breakdown; a more detailed list is in Appendix
A: VCS Memory Map.
3.1 Equates
To use it, you just include the vcs.h file, which should be done in
pretty much every VCS program you write. For example:
include "vcs.h"
org $f000
lda #$ff ; pale yellow
sta COLUBK ; change background color
But this would load the value at address $5, not the number 5! So
we add a “#” to let the assembler know that StartLives should be
treated as a constant:
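The difference can be sketched as follows, assuming StartLives has been defined as an equate with the value 5 (the definition shown here is an assumption made for illustration):

```asm
StartLives  equ 5           ; symbolic name for the number 5

        lda StartLives      ; loads the byte stored at address $05
        lda #StartLives     ; loads the constant 5 into A
```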
3.2 Segments
seg.u Variables
org $80
The assembler will reserve one byte at $80 for the DataByte variable,
one word (two bytes) at $81 for the DataWord variable, and 20 bytes
for the DataArray variable starting at $83.
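A declaration matching that layout might look like the sketch below. It uses DASM's ds (define space) directive, which is one of several ways to reserve bytes, so treat the exact directives as an assumption:

```asm
        seg.u Variables
        org $80

DataByte    ds 1            ; one byte at $80
DataWord    ds 2            ; one word (two bytes) at $81
DataArray   ds 20           ; 20 bytes starting at $83
```

Because each label simply takes the next free address, inserting a new variable in the middle shifts the later ones automatically.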
This is often more convenient and foolproof than separate EQU
instructions, since the assembler ensures that variables do not
overlap in memory. Sometimes, though, you want multiple labels
to reference the same memory location – you can use EQU for that:
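As a small sketch, assuming a DataWord variable is already declared in the uninitialized segment, a second name for the same location could be defined like this (Temp is a hypothetical label):

```asm
Temp    equ DataWord        ; Temp and DataWord now name the same address
```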
Our “uninitialized” segment just reserves space; it doesn’t let you
generate code. When we’re ready to write code, we’ll declare an
initialized segment called Code (the name isn’t important). Then use
the ORG directive to set the code’s origin. This tells the assembler that
our code will start at a certain address. Our generated machine code
is not relocatable, which means that it must be loaded at a certain
address to work properly. For the VCS, that address starts at $f000:
seg Code
org $f000 ; start code at $f000
Note that because the VCS only has 13 address pins, and only
recognizes 8,192 ($2000) unique addresses, you could actually
declare the origin as $1000, $3000, $5000, etc. This is trivia at this
point, but we’ll revisit it later when we learn how to use multiple
memory banks.
4
Writing Your First Assembly Code
We now know enough to write our first VCS ROM in 6502 assembler.
It won’t do very much at first – we’ll just draw some
lines on the screen, but it’ll introduce some key concepts we’ll use
throughout the rest of the book.
Our first line declares to the assembler that we are writing code for
a 6502. This line is actually optional in the Web IDE because we
add it automatically, but we’ll include it here for completeness:
processor 6502
Next, we’ll include some header files. There are a few standard
files commonly used in VCS programming: vcs.h provides names
for all of the hardware addresses you’ll need and macro.h defines
a few macros – templates of commonly-used functions that can be
included as needed.
include "vcs.h"
include "macro.h"
seg Code
org $f000 ; start code at $f000
(Oddly, most Atari 2600 cartridges from back in the day have a SEI
instruction at the beginning to disable interrupts even though the
interrupt pin is not even exposed on the 6507 chip in the console.
Maybe it’s a fear of spurious voltages on the pin, or maybe just a
superstition...who knows. Anyway, it’s just one byte.)
Next, we want to make sure the memory and the hardware are reset
to a known state, since in the “real world” (i.e. non-emulator), it
will be more or less in a random state. The easy way to do this is to
set the entire zero page region ($00-$FF) to zero, which includes the
entire TIA register space and the entire RAM space:
(Note: We could have left out the LDX #$ff since previous instruc-
tions have already set X to that value.)
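The initialization described above could be sketched as follows. This is a reconstruction under assumptions rather than a definitive listing: the label ZeroZP, the CLD instruction, and the stack pointer setup are illustrative choices, and the final store is added so that address $00 is cleared too.

```asm
Start   sei             ; disable interrupts
        cld             ; clear decimal (BCD) mode
        ldx #$ff
        txs             ; stack pointer <- $FF

        lda #0          ; A <- 0
        ldx #$ff        ; X <- $FF (redundant here, kept for clarity)
ZeroZP  sta $00,x       ; clear address $00+X
        dex             ; move to the next lower address, set flags
        bne ZeroZP      ; loop while X != 0
        sta $00,x       ; X is now 0, so clear address $00 as well
```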
The TIA chip doesn’t mind having all of its registers set to zero,
and will respond by generating an utterly black screen. VCS
programming is mainly about setting various TIA registers at the
appropriate time. For instance, we’ll tell it to make the background
color red:
Normally, we’d do a lot more here, but since this is our first
program, we’ll make it short. We’ll tell the CPU to return to the
start (literally the label Start) and do everything all over again.
jmp Start
Finally, we’ll use the ORG directive again so we can do two things:
Fill out the ROM to exactly 4K (4096 or $1000) bytes,
and tell the 6502 where our program will start. When the 6502
is reset, it reads a 16-bit address from location $FFFC/$FFFD and sets
the program counter (PC) to it. The .WORD directive will emit that
two-byte address verbatim:
        org $fffc
        .word Start     ; reset vector at $fffc
        .word Start     ; interrupt vector at $fffe (unused in VCS)
The second Start vector is used for interrupts, which the VCS
doesn’t use, but we include it anyway so that our ROM is exactly
the right size.
What do we see when we load this program? We should see
alternating thick black and red horizontal lines. This is because
we spent some time setting all of the TIA registers to zero, which
made the output black, then we set the background color to red,
then repeated the process forever. We never instructed the TIA to
tell the TV where the frame begins! We’ll correct this in the next
chapter.
5
Painting on the CRT
[Figure: layout of a 262-scanline frame, showing the horizontal blank region, the 192-line visible frame, and 30 lines of overscan]
Enough talk, let’s make some rainbows! We start with the same
preamble as last time:
include "vcs.h"
include "macro.h"
org $f000
We’re also going to use one of our predefined macros to save some
typing. Macros are code “templates” that are expanded on-demand.
You can define your own, but some are predefined in the above
macro.h file.
Start
        CLEAN_START     ; macro to safely clear memory and TIA
Macros are expanded inside of your code and take up additional
ROM space. They may also modify registers or flags – so be aware
of this when using them.
Now we’re ready to start outputting a frame! Because we’ll visit this
routine repeatedly, we’ll also give it a label. The first thing we do is
enable the VBLANK and VSYNC bits in the TIA:
NextFrame
        lda #2          ; same as binary #%00000010
        sta VBLANK      ; turn on VBLANK
        sta VSYNC       ; turn on VSYNC
Now that we’re emitting a VSYNC signal, we need to hold it for three
scanlines. We strobe the WSYNC register (i.e., write to it) to make
the TIA halt the CPU until the next scanline begins. If we do this
three times, the TIA will have generated our three lines of VSYNC
signal and we can then turn off the VSYNC bit:
WSYNC doesn’t care which value is stored – it triggers the CPU to wait
as soon as it receives any write command. A register that triggers
an action like this is commonly called a strobe register.
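The three-line hold could be sketched like this, assuming vcs.h has been included so that WSYNC and VSYNC are defined:

```asm
        sta WSYNC       ; wait out the 1st scanline of VSYNC
        sta WSYNC       ; 2nd scanline
        sta WSYNC       ; 3rd scanline
        lda #0
        sta VSYNC       ; turn off VSYNC
```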
We’ll now let the TIA output the recommended 37 lines of VBLANK.
The TIA’s VBLANK bit is still set, so we’ll just loop 37 times, hitting
WSYNC each time. We use the X register to count down the number of
scanlines:
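A minimal sketch of that countdown, again assuming vcs.h is included (the label LVBlank is chosen here for illustration):

```asm
        ldx #37         ; count 37 scanlines
LVBlank sta WSYNC       ; wait for next scanline
        dex             ; decrement X
        bne LVBlank     ; loop while X != 0
```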
At this point, we’re now ready to start drawing to the screen. Let’s
first disable VBLANK which releases the TIA to generate some color:
lda #0
sta VBLANK ; turn off VBLANK
Next, we’ll draw our 192 visible scanlines. We’ll use X again to
count. We’ll also load Y with the BGColor variable before the loop,
incrementing it each loop iteration. This will paint a different color
for each scanline, creating a venetian blind rainbow effect:
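Such a loop might look like the sketch below. It assumes BGColor is a one-byte variable declared in RAM and that vcs.h provides COLUBK and WSYNC; the label LVScan is illustrative.

```asm
        ldx #192        ; count 192 visible scanlines
        ldy BGColor     ; Y <- starting background color
LVScan  sty COLUBK      ; set the background color for this line
        sta WSYNC       ; wait for next scanline
        iny             ; next color
        dex             ; decrement X
        bne LVScan      ; loop while X != 0
```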
        lda #2
        sta VBLANK      ; turn on VBLANK again
        ldx #30         ; count 30 scanlines
LVOver
        sta WSYNC       ; wait for next scanline
        dex             ; decrement X
        bne LVOver      ; loop while X != 0
For the next frame, we’ll decrement the BGColor variable so that the
colors animate down the screen.
dec BGColor
jmp NextFrame
We finish with the standard epilogue, as described in the previous
chapter:
org $fffc
.word Start
.word Start
6
Playfield Graphics
Like everything else the TIA draws, the playfield has to be programmed
line-by-line. If nothing changes, the TIA will just repeat what was on
the previous line. You can change the colors and the playfield bits at
the beginning of each line if you like (or during, but let’s not
discuss that now!).
NORMAL PLAYFIELD MODE
Bit #  4 5 6 7 | 7 6 5 4 3 2 1 0 | 0 1 2 3 4 5 6 7 | 4 5 6 7 | 7 6 5 4 3 2 1 0 | 0 1 2 3 4 5 6 7
Pixel  0         4                 12                20        24                32           39

Bit #  4 5 6 7 | 7 6 5 4 3 2 1 0 | 0 1 2 3 4 5 6 7 | 7 6 5 4 3 2 1 0 | 0 1 2 3 4 5 6 7 | 7 6 5 4
Pixel  0         4                 12                20                28                36   39
Here, we’re just incrementing a counter and loading the value into
each of the playfield registers. The result will look kind of like an
arch (see Figure 6.2).
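A loop of the kind described might look like this sketch. It assumes vcs.h defines PF0, PF1, PF2, and WSYNC, keeps the counter in the A register, and uses the illustrative label ScanLoop:

```asm
        ldx #192        ; 192 visible scanlines
        lda #0          ; A holds the counter
ScanLoop
        sta WSYNC       ; wait for the start of the next scanline
        sta PF0         ; copy the counter into all three
        sta PF1         ;   playfield registers
        sta PF2
        clc
        adc #1          ; increment the counter
        dex
        bne ScanLoop    ; loop while X != 0
```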
Note that we set all of the TIA registers immediately after the WSYNC
strobe. We only have a limited number of cycles before the beam
moves out of the HBLANK region and onto the visible part of the
screen. This loop is relatively simple, but for some displays, it will
be very challenging to set all of the registers we need to before the
scanline begins drawing. Sometimes you might hear such video
display code called a kernel, denoting a small but well-optimized
routine that is timing-sensitive.
Figure 6.2: Example Symmetric Playfield
Coordinates
It’s common on graphic displays to call the horizontal position
the X coordinate, and the vertical position the Y coordinate.
The X coordinate almost always goes left-to-right. The Y
coordinate usually goes from top-to-bottom, but there will be
times when it’ll be more efficient to make Y go from bottom-to-
top. Sometimes we’ll even use both coordinate systems!
Figure 6.3: XY Coordinate Systems
7
Players and Sprites
Now that we know how to draw the playfield, which usually serves
as the background, let’s draw some more detailed objects in the
foreground.
The VCS predates Pac-Man and Space Invaders, and so it was
designed with two particular 70s-era arcade games in mind: Pong
and Tank. These were both games with very simple monochrome
graphics and few moving objects. In Pong, a square ball is bounced
between two rectangular paddles. In Tank, two rotating tanks fire
at each other in a blocky playfield. The terminology for the VCS’s
moveable objects – players, missiles, and ball – seems directly inspired
by those games.
This chapter covers the moveable objects called players. The TIA
supports two player objects, each eight pixels wide and one pixel
high. They can be positioned anywhere horizontally on the scanline
and the TIA remembers their position.
Hex               Bits Used
Addr  Name        76543210   Description
06    COLUP0      xxxxxxx.   Color-Luminance Player/Missile 0
07    COLUP1      xxxxxxx.   Color-Luminance Player/Missile 1
10    RESP0       strobe     Reset Player 0
11    RESP1       strobe     Reset Player 1
1B    GRP0        xxxxxxxx   Graphics Bitmap Player 0
1C    GRP1        xxxxxxxx   Graphics Bitmap Player 1
Table 7.1: Player Registers
Note that we never said “sprites,” since the term had not yet been
invented! But you can draw sprites with the player objects by
changing registers on successive scanlines, stacking up horizontal
8-pixel slices. Going forward, we’re going to use player objects when
discussing the TIA hardware, and sprites when discussing CPU
routines that program the player registers on multiple scanlines.
The basic recipe for putting a player object on the screen:
1. Wait for the start of a scanline (do a WSYNC).
2. Set the player’s bitmap register for the current scanline.
3. Set the player’s color register (optional).
Like just about everything else in the TIA, the values you set persist
across scanlines unless you change them. So if you don’t need the
player’s color to vary line-by-line, you can set it before the frame
starts. You must also set the player’s bitmap to zero after the sprite
has finished drawing, or you’ll get a big smear of pixels going down
the screen.
Here’s a simple example of a sprite routine that pulls 16 bytes of
data from an array named SpriteData in decreasing order:
In this routine, the sprite begins drawing on the next scanline from
wherever the TIA is currently drawing. It then goes through 16
scanlines before exiting the loop. At the start of each scanline, it
sets the GRP0 register, which changes the bitmap for the player 0
object.
Note that all of our instructions take place in the HBLANK period at the
beginning of each scanline. This guarantees that the player registers
will be set by the time the TIA gets to the visible part of the scanline,
no matter where the player is positioned horizontally.
        ldx #5
        sta WSYNC       ; wait for scanline start
.loop
        dex
        bne .loop       ; loop 5 times, 5 CPU cycles each
        sta RESP0       ; fix player 0 horizontal position
Let’s count both CPU cycles and the TIA clock as we execute each
instruction:
Instruction   Cycles   CPU   TIA   X Coord.
sta WSYNC     -        0     0     -
dex           2        2     6     -
bne .loop     3        5     15    -
dex           2        7     21    -
bne .loop     3        10    30    -
dex           2        12    36    -
bne .loop     3        15    45    -
dex           2        17    51    -
bne .loop     3        20    60    -
dex           2        22    66    -
bne .loop     2        24    72    4
sta RESP0     3        27    81    13
Table 7.2: Example Timing of Horizontal Positioning Loop
Between the STA WSYNC and the end of the STA RESP0, we’ve used up
27 CPU cycles, or 81 TIA color clocks. So on the next scanline, the
horizontal coordinate of the player will be (81−68) = 13 pixels from
the left.
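To double-check this arithmetic, here is a small Python sketch of the same bookkeeping. It is only a model, not VCS code: the function names are ours, and it assumes the branch never crosses a page boundary (which would cost an extra cycle).

```python
HBLANK_CLOCKS = 68      # color clocks before the visible part of a scanline
CLOCKS_PER_CYCLE = 3    # the TIA runs three color clocks per CPU cycle

def loop_cycles(iterations):
    """CPU cycles used by a DEX/BNE loop (BNE costs 3 when taken, 2 on exit)."""
    return iterations * 5 - 1

def visible_pixel(cpu_cycles):
    """Visible X coordinate reached after cpu_cycles, or None while in HBLANK."""
    clocks = cpu_cycles * CLOCKS_PER_CYCLE
    return clocks - HBLANK_CLOCKS if clocks >= HBLANK_CLOCKS else None

cycles = loop_cycles(5) + 3             # five iterations plus the 3-cycle STA RESP0
print(cycles, visible_pixel(cycles))    # 27 13
```

Changing the iteration count by one moves the result by 15 pixels, which is the coarse step size discussed next.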
You might notice that the DEX/BNE loop takes 5 CPU cycles per
iteration, which means 15 pixels will pass between each iteration.
This means we can only position objects in 15-pixel increments
using this method. This would lead to very jerky motion! The TIA
designers accounted for this, and we’ll learn how to do better in the
next chapter.
8
Color Sprites
Now that we know that player objects can be used to create sprites
on the VCS, and how to position them, let’s draw some sprites!
Remember that we have to program the TIA on a line-by-line basis.
It’s the same for sprites. There are many ways to go about it
depending on how detailed you want the sprite to be and how much
CPU time you have in a given scanline.
Typically, games include a lookup table containing the bytes that
define the on bits for each horizontal slice of the sprite. These go
directly into the GRP0 register for each successive scanline.
Often, there will also be a color table containing the colors for each
scanline. Sometimes this isn’t used – especially in older games
where it was common to have monochrome objects where the colors
were set at the beginning of the game or before each frame.
When these tables map one table entry to one scanline, they are
called “single-height” sprites. When a single table entry is used for
two successive scanlines, they are deemed “double-height” sprites.
Sometimes the bitmap table is single-height and the color table is
double-height. You’re writing the code, so it’s your call!
The height of a sprite is only limited by ROM memory; it can take
up an entire vertical screen column. It can be hard-coded or pulled
from a lookup table.
Let’s look at one example routine. We’re going to hard-code the
height of the sprite as a constant:
SpriteHeight equ 17
Our sprite is really 16 scanlines high, but we’re going to add one
line for padding. The padding is a zero entry, and serves to clear
the TIA register when we’ve finished drawing the sprite.
We also define a variable YPos, which holds the Y coordinate of the
sprite:
        lda #5
        sta YPos        ; YPos = 5
This places the sprite’s feet five scanlines from the bottom.
Our routine begins right after the 37-line VBLANK period:
LVScan
        txa             ; transfer X to A
        sec             ; make sure carry is set
        sbc YPos        ; subtract sprite Y coordinate
The SBC instruction subtracts its operand from the A register and
puts the result back in A. Note that we set the Carry flag with SEC
first – if it was addition, we would have cleared it with CLC.
Lookup
Index
16 __@@@___
15 @@@@@@@_
14 _@@@@@__
13 _@_@_@__
12 _@@@@@__
11 _@@@@@__
10 _@___@__
9 __@@@___
8 __@@@___
7 _@@@@@__
6 @_@@@_@_
5 @_@@@_@_
4 __@@@___
3 __@_@___
2 __@_@___
1 _@@_@@__
0 ________ YPos
If the local coordinate is within the sprite’s vertical bounds, we keep
it as the index into the lookup table. Otherwise we set it to zero,
which loads the blank padding entry:
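In Python terms, the bounds check amounts to something like the sketch below. It assumes the comparison is an unsigned one against SpriteHeight, so that a negative result (which wraps around to a large 8-bit value) also falls outside the sprite. The function name sprite_index is ours.

```python
SPRITE_HEIGHT = 17   # 16 bitmap lines plus one zero padding entry

def sprite_index(scanline, ypos, height=SPRITE_HEIGHT):
    """Lookup-table index for a scanline; 0 selects the blank padding entry."""
    local = (scanline - ypos) & 0xFF     # 8-bit subtraction wraps, like SBC
    return local if local < height else 0

# With YPos = 5, scanlines 6 through 21 map to table entries 1 through 16.
print([sprite_index(line, 5) for line in (3, 5, 6, 21, 22)])   # [0, 0, 1, 16, 0]
```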
Now that we have our sprite index in the A register, we have to load
the sprite bitmap data from the lookup table. The A register can’t
index, so we transfer it into Y to perform the lookup:
InSprite
        tay
        lda Frame0,y    ; load bitmap data
Next, we store it to the TIA register GRP0, which defines the pixels
for player 0. We do a STA WSYNC first so that this happens in the initial
HBLANK period of the scanline:
We can also look up a color entry for each line and set the player’s
COLUP0 register, which gives us a multicolored sprite:
After this, we just decrement X and repeat the loop until we have
completed the 192 scanlines:
        dex             ; decrement X
        bne LVScan      ; repeat next scanline until finished
The bitmap and color tables are included in the program ROM,
usually with .BYTE or HEX directives. See Figure 8.2 for an example of
a 9-line (including 0 padding) sprite with bitmap and color tables.
These are defined in bottom-to-top order because that’s the way the
subtraction works out in the routine.
You can create your own sprites with a nifty web-based tool,
kirkjerk’s PlayerPal[5].
9
Sprite Fine Positioning
First, we wait for the scanline to start and strobe HMCLR, which resets
any previous fine offsets that were pending:
DivideLoop
        sbc #15         ; subtract 15
        bcs DivideLoop  ; branch while Carry still set
For each loop iteration, the SBC takes two CPU cycles, and the
BCS (Branch if Carry Set) takes three CPU cycles (two on the final
iteration). The TIA runs three times faster than the CPU, so it
moves (2 + 3) ∗ 3 = 15 color clocks (pixels) per loop iteration. We
also subtract this number from the A register during each iteration.
As soon as A goes below zero, the loop ends and we’re left with a
remainder in the A register. We use this value to calculate a fine
offset that will correct the player position:
That tricky calculation with EOR and ASL converts the remainder into
a value appropriate for the horizontal motion register:
         LEFT                                      RIGHT
Pixels   -7 -6 -5 -4 -3 -2 -1 +0 +1 +2 +3 +4 +5 +6 +7 +8
HMxx     70 60 50 40 30 20 10 00 F0 E0 D0 C0 B0 A0 90 80
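Here is a Python model of the divide loop and the conversion, useful for checking the table by hand. One assumption is labeled in the code: we take the conversion to be an EOR #7 followed by four ASLs, which is the common form of this routine. The function names are ours.

```python
def divide_loop(x):
    """Model SEC, then SBC #15 / BCS until the subtraction borrows."""
    a = x
    iterations = 0
    while True:
        a -= 15
        iterations += 1
        if a < 0:                   # borrow clears Carry, so BCS falls through
            return iterations, a & 0xFF

def fine_offset(a):
    """Assumed conversion: EOR #7, then four ASLs (result kept to 8 bits)."""
    return ((a ^ 7) << 4) & 0xFF

def pixels_right(hm):
    """Decode a motion value per the table: positive means right."""
    nibble = hm >> 4
    signed = nibble - 16 if nibble >= 8 else nibble
    return -signed

for x in (0, 6, 14, 15):
    iterations, a = divide_loop(x)
    print(x, iterations, hex(fine_offset(a)), pixels_right(fine_offset(a)))
```

Under these assumptions the 15 possible remainders map onto the 15 offsets from 6 pixels left to 8 pixels right, so every pixel between two coarse positions can be reached.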
Now let’s fix the coarse position of the player, which as you
remember is solely based on timing. If you rearrange any of the
previous instructions, position 0 may not line up exactly on the left
side. (We’ll show different versions of this loop in future chapters
where this doesn’t apply.)
At this point, we’ve set the coarse position of the player, and we’ve
set the fine offset in the HMP0 register. But the fine offset isn’t applied
until you do another WSYNC and then strobe the HMOVE register:
        sta WSYNC
        sta HMOVE
The HMOVE strobe must be right after the WSYNC or funny things
happen.
There’s no requirement that HMP0 be changed before RESP0, but we
do it this way because the timing works out right. You can certainly
rewrite this routine to suit different purposes – and we will, later
on.
The HMOVE strobe applies fine offsets to all objects, so we could
set the position of several moveable objects and then only do the
STA WSYNC and STA HMOVE at the end of this process. It’s common to
do horizontal positioning in the off-screen VBLANK period since it
usually takes the CPU’s full attention for at least one scanline per
object.
There are other ways to perform horizontal positioning besides the
divide-by-15 trick – one early technique used by Raiders of the Lost
Ark (and copied by several 3rd party carts) is to use a lookup table
that stores the loop delay and the HMOVE register value in the same
byte. But this uses a lot more ROM space.
Hex             Bits Used
Addr   Name     76543210    Description
20     HMP0     xxxx....    Horizontal Motion Player 0
21     HMP1     xxxx....    Horizontal Motion Player 1
22     HMM0     xxxx....    Horizontal Motion Missile 0
23     HMM1     xxxx....    Horizontal Motion Missile 1
24     HMBL     xxxx....    Horizontal Motion Ball
2A     HMOVE    strobe      Apply Horizontal Motion (fine offsets)
2B     HMCLR    strobe      Clear Horizontal Motion Registers
Table 9.1: Horizontal Motion Registers
10
Player/Missile Graphics
Besides the two 8x1 sprites (players), the TIA has two missiles and
one ball, which are just variable-length dots or dashes. They are
similar to the player objects, except instead of an arbitrary 8-pixel
bitmap, they are single dots that can be stretched to 1, 2, 4, or 8
pixels wide (using the NUSIZ0/NUSIZ1 registers, as we’ll see in Chapter
17).
These objects share the colors of other objects. Missile 0 shares
player 0’s color, and missile 1 shares player 1’s color. The ball shares
the same colors as the playfield.
You set the horizontal position exactly the same way you set the
player objects, using the RESM0/RESM1 and RESBL registers. But instead
of setting a bitmap register, you just turn them on and off with the
ENAM0/ENAM1 and ENABL registers.
Missiles have one additional special ability – you can lock their
horizontal position to that of their corresponding player by setting
bit 1 (#$02) of RESMP0/RESMP1. For example, you could set and then
clear this bit whenever the fire button is pressed, so that a missile
originates from the player’s position.
Hex             Bits Used
Addr   Name     76543210    Description
06     COLUP0   xxxxxxx.    Color-Luminance Player/Missile 0
07     COLUP1   xxxxxxx.    Color-Luminance Player/Missile 1
08     COLUPF   xxxxxxx.    Color-Luminance Playfield/Ball
12     RESM0    strobe      Reset Missile 0
13     RESM1    strobe      Reset Missile 1
14     RESBL    strobe      Reset Ball
1D     ENAM0    ......x.    Enable Missile 0
1E     ENAM1    ......x.    Enable Missile 1
1F     ENABL    ......x.    Enable Ball
22     HMM0     xxxx....    Horizontal Motion Missile 0
23     HMM1     xxxx....    Horizontal Motion Missile 1
24     HMBL     xxxx....    Horizontal Motion Ball
28     RESMP0   ......x.    Reset Missile 0 to Player 0
29     RESMP1   ......x.    Reset Missile 1 to Player 1
Table 10.1: Registers for Missiles and Ball
11
The SetHorizPos Subroutine
Note that the label .DivideLoop begins with a “.” – this is called a
local label. Local labels are only accessible within the subroutine.
This ensures that label names don’t collide across subroutines.
To use the subroutine:
1. Load the X register with the index from 0-4 of the object you
wish to set (see comment section above).
2. Load the A register with the desired horizontal position.
3. Call the subroutine with JSR SetHorizPos.
4. Repeat steps 1-3 for other objects that need positioning.
5. To apply the fine offsets, do a STA WSYNC followed by STA HMOVE.
For example, to set the X-coordinate of player object 0 to 70, we’d
do the following:
        lda #70
        ldx #0
        jsr SetHorizPos
        sta WSYNC
        sta HMOVE
Also don’t forget that HMOVE updates the position of all moveable
objects, so you might need to strobe HMCLR or zero out any unwanted
HMxx registers individually.
12
The PIA Timer
WaitForTimerDone
        lda INTIM               ; load timer value
        bne WaitForTimerDone    ; wait until == 0

        include "xmacro.h"

        TIMER_WAIT
        lda #0
        sta WSYNC               ; wait for end of scanline
        sta VBLANK              ; turn off VBLANK
        processor 6502
        include "vcs.h"
        include "macro.h"
        include "xmacro.h"

        org $f000
Start
        CLEAN_START
NextFrame
; 1 + 3 + 37 + 192 + 29 = 262 scanlines
        VERTICAL_SYNC           ; 1 VBLANK + 3 lines VSYNC
        TIMER_SETUP 37          ; 37 VBLANK
        TIMER_WAIT
        TIMER_SETUP 192         ; 192 visible scanlines
        TIMER_WAIT
        TIMER_SETUP 29          ; 29 VBLANK
        TIMER_WAIT
        jmp NextFrame

        org $fffc
        .word Start             ; reset vector
        .word Start             ; BRK/IRQ vector
13
Joysticks and Switches
This saves you one instruction and keeps you from having to modify
the A register.
Many games only check the reset switch and, in fact, there’s a
shortcut you can use to do this. Just put the following in your
main loop (assuming you’re within 128 bytes of the Start routine,
otherwise it’s too far for a branch and you’ll need a JMP):
        lsr SWCHB       ; shift bit 0 -> Carry
        bcc Start       ; Carry clear?
This reads the SWCHB byte and shifts it right, which moves bit 0 (the
Game Reset bit) into the Carry bit. It’ll also perform a write, but
it’ll be ignored. We then branch back to the Start label if the Carry
bit is clear (which means the switch is depressed).
13.2 Joysticks
Joystick switches work the same way as the console switches. The
directions are read from SWCHA and are mapped to bits as follows
(0 = moved, 1 = not moved).
Bit # Bitmask Direction Player
7 80 right 0
6 40 left 0
5 20 down 0
4 10 up 0
3 08 right 1
2 04 left 1
1 02 down 1
0 01 up 1
Table 13.3: SWCHA Switches
        ldx XPos0
        bit SWCHA
        bvs .SkipMoveLeft       ; check bit 6 set
        dex
.SkipMoveLeft
        bit SWCHA
        bmi .SkipMoveRight      ; check bit 7 set
        inx
.SkipMoveRight
        stx XPos0
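If the active-low bits are hard to keep straight, this Python sketch decodes an SWCHA value using the masks from Table 13.3. The dictionary and function names are ours, purely for illustration.

```python
# Bitmasks from Table 13.3, keyed by player number.
DIRECTION_BITS = {
    0: {"right": 0x80, "left": 0x40, "down": 0x20, "up": 0x10},
    1: {"right": 0x08, "left": 0x04, "down": 0x02, "up": 0x01},
}

def directions(swcha, player):
    """Return the set of directions held by a player (a 0 bit means moved)."""
    return {name for name, mask in DIRECTION_BITS[player].items()
            if (swcha & mask) == 0}

print(directions(0xBF, 0))   # {'left'}  (bit 6 is clear)
print(directions(0xFF, 1))   # set()     (nothing moved)
```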
Buttons are mapped to bit 7 of INPT4 (player 0) and INPT5 (player 1),
so you can check both of them with the BIT instruction:
        bit INPT4
        bmi .SkipButton0
        jsr Player0Button
.SkipButton0
        bit INPT5
        bmi .SkipButton1
        jsr Player1Button
.SkipButton1
There are other controllers on the VCS, like paddles and a 12-button
keypad, but the joystick is by far the most popular, and it’s pretty
easy to support. We’ll cover other controllers later.
14
Indirect Addressing
        lda SpriteData,y
This loads a value from the SpriteData address directly. But what
if we wanted to choose between multiple sprites, or animate the
sprite? We’d need a way to switch between different lookup tables.
14.1 Pointers
We can then load them with the address of our lookup table like
this:
        lda #<SpriteData
        sta SpritePtr   ; store lo byte
        lda #>SpriteData
        sta SpritePtr+1 ; store hi byte
The #< and #> syntax tells the assembler to extract the low and
high byte of the SpriteData address, respectively. (Remember our
discussion of little-endian byte order: the low, or least significant,
byte comes first, followed by the high byte.)
        ldy #5
        lda (SpritePtr),y       ; load value at SpritePtr+Y
        sta Value
14.3 Indexed Indirect Addressing
The other indirect mode is called indexed indirect, where the addition
takes place before the lookup:
        ldx #4
        lda (SpritePtr,x)
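The difference between the two modes is easy to mix up, so here is a Python model of both address calculations. Memory is just a 64K byte array; the addresses chosen for SpritePtr and the tables are hypothetical, and the function names are ours.

```python
def read_word(memory, zp):
    """Read a little-endian 16-bit pointer from zero page (wraps at $FF)."""
    return memory[zp & 0xFF] | (memory[(zp + 1) & 0xFF] << 8)

def indirect_indexed(memory, zp, y):
    """lda (zp),y : fetch the pointer first, then add Y to it."""
    return memory[(read_word(memory, zp) + y) & 0xFFFF]

def indexed_indirect(memory, zp, x):
    """lda (zp,x) : add X to the zero-page address first, then fetch the pointer."""
    return memory[read_word(memory, zp + x)]

memory = bytearray(0x10000)
SPRITE_PTR = 0x80                       # hypothetical zero-page address
memory[SPRITE_PTR] = 0x23               # lo byte of $F123
memory[SPRITE_PTR + 1] = 0xF1           # hi byte of $F123
memory[0xF123 + 5] = 0x3C               # sixth byte of the table at $F123
memory[SPRITE_PTR + 4] = 0x00           # a second pointer, 4 bytes further on,
memory[SPRITE_PTR + 5] = 0xF2           # holding $F200
memory[0xF200] = 0x7E

print(hex(indirect_indexed(memory, SPRITE_PTR, 5)))   # 0x3c
print(hex(indexed_indirect(memory, SPRITE_PTR, 4)))   # 0x7e
```

In the first mode the index walks through one table. In the second, the index selects which pointer to follow.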
15
A Complex Scene, Part I
Now that we’ve learned about the playfield, player objects, and how
to draw sprites, we can create a complex scene with a background
and foreground. For this demo, the playfield will take up the entire
screen, and we’ll have a single sprite overlapping it.
This is a bit more difficult, because there are a lot of registers that
need to potentially change during each scanline. To change the
playfield, we need to set three different registers, and to draw the
sprite, we’ll need to set a bitmap register and a color register. We’ll
also need to look up all of this data in tables. This will make our
kernel much more complex than previous examples.
Figure 15.1: Two sprites over playfield
We’ll achieve this using a two-line kernel, which means that each
iteration of the main loop draws two scanlines instead of one. This
gives us a little more wiggle room to move around loads and stores
so that they happen at the right time. The tradeoff is that the
playfield and sprites must align to even scanline numbers, but that’s
not a huge problem for many games.
To avoid confusing a single TIA scanline with a pair of scanlines,
we’ll call pairs of scanlines 2xlines.
There are two parts to our loop. One sets up the playfield registers,
and the other loads the sprite data. We don’t want to store all
192 ∗ 3 bytes of the playfield in memory, so we’ll use a compressed
storage format. We’ll store the playfield in segments. Each segment
is defined by a 2xline height and the three playfield registers. We’ll
only set the playfield registers when a new segment begins. In
summary:
1. For the next playfield segment, fetch its height (in 2xlines) and
playfield values.
2. WSYNC and store values to the playfield registers.
3. For each 2xline in this playfield segment, look up and set
sprite bitmap and color data.
4. Go back to step 1 until we see a height 0 playfield segment.
The data defining the playfield looks something like this:
Each playfield segment takes four bytes, starting with the height
and then the PF0/PF1/PF2 register values. The ALIGN directive ensures
that it starts on a page boundary (i.e. the low byte is $00) because
the 6502 adds an extra CPU cycle when an indexed lookup crosses
a page boundary, and this could mess with our timing.
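To see how much the segment format saves, here is a Python sketch that expands such a table back into one entry per scanline. The sample data is hypothetical (a bordered room, not the book’s playfield), and the function name is ours.

```python
def expand_playfield(data):
    """Expand 4-byte segments (height in 2xlines, PF0, PF1, PF2) to scanlines."""
    lines = []
    i = 0
    while data[i] != 0:                     # a height of 0 ends the table
        height, pf0, pf1, pf2 = data[i:i + 4]
        lines.extend([(pf0, pf1, pf2)] * (height * 2))
        i += 4
    return lines

# Hypothetical playfield: solid bars top and bottom, a thin wall in between.
PLAYFIELD = bytes([
    4,  0xF0, 0xFF, 0xFF,
    88, 0x10, 0x00, 0x00,
    4,  0xF0, 0xFF, 0xFF,
    0,
])

lines = expand_playfield(PLAYFIELD)
print(len(lines), len(PLAYFIELD), 192 * 3)   # 192 13 576
```

In this example 13 bytes of table stand in for the 576 bytes an uncompressed playfield would need.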
We’ll use the (pointer),y addressing mode here, as discussed in
Chapter 14. This allows us to switch between different playfields
– useful if we want the player to walk between rooms, for example.
We’ll also use this mode to look up sprite data so that we can switch
between sprite graphics.
We’ll first load the PFPtr pointer with the address of our playfield
table:
        lda #<PlayfieldData
        sta PFPtr       ; store lo byte
        lda #>PlayfieldData
        sta PFPtr+1     ; store hi byte

NewPFSegment
        ldy PFIndex     ; load index into PF array
        lda (PFPtr),y   ; load length of next segment
        beq NoMoreSegs  ; == 0, we’re done
        sta PFCount     ; save for later
        iny
        lda (PFPtr),y   ; load PF0
        tax             ; PF0 -> X
        iny
        lda (PFPtr),y   ; load PF1
        sta Temp        ; PF1 -> Temp
        iny
        lda (PFPtr),y   ; load PF2
        iny
        sty PFIndex
        tay             ; PF2 -> Y
Note that we’ve also loaded the memory location Bit2p0 and stored
that in GRP0, the player 0 bitmap register. That’s because we just
did a WSYNC, and the sprite data can potentially change on every
individual scanline. We’ll compute that value later in the part of
the loop that loads the sprite data.
Now we move on to the sprite loop, loading X with the number of
2xlines in the current playfield segment. We then see if our current
scanline intersects the sprite:
Here we’ve PLAed the original offset and multiplied it by 2 with ASL.
This is because our bitmap table contains a unique value for each
scanline, and we have to read both of them.
Now, we WSYNC and quickly set all the values for our first upcoming
scanline:
        sta WSYNC
        sta GRP0        ; 1st line of sprite -> GRP0
        lda Colp0
        sta COLUP0      ; Colp0 -> COLUP0
        dex
        beq NewPFSegment        ; fetch another playfield segment
Otherwise, we do another WSYNC and set the bitmap value for the
second line in the 2xline, then go back for the next pair:
        sta WSYNC
        lda Bit2p0
        sta GRP0        ; 2nd line of sprite -> GRP0
        jmp KernelLoop  ; repeat sprite-drawing loop
As you can see from the timing diagram in Figure 15.2, all of
our register stores take place in the HBLANK period, and we use
the visible scanline period to look up values for future scanlines.
This guarantees no visual artifacts no matter where the sprite is
positioned horizontally.
Figure 15.2: Timing diagram of the kernel loop (look up playfield
values; set PF0, PF1, PF2, GRP0; look up sprite values; set GRP0,
COLUP0; set GRP0)
Our kernel loop isn’t 100% optimized, and there are certainly a
couple of cycles to save here and there. But you can see the tradeoff
in VCS programming – visual complexity vs. code complexity. Our
kernel only draws the playfield and a single multicolor sprite. If
we wanted to add a second sprite, missiles, or a ball, we’d have to
do even more code gymnastics. The code to look up the playfield
registers takes almost an entire scanline, and we don’t have time to
do much else when this happens.
We could also make further tradeoffs, like have two monochrome
sprites instead of a single color sprite, or lower-resolution sprites,
or a simpler playfield. It all comes down to how much you can get
done in 76 cycles per scanline. This example shows you that you
can get a lot done if you spread your logic across multiple scanlines
and choose tradeoffs that have minimal visual impact.
TIP: To review and modify this code in the VCS emulator, visit
8bitworkshop.com and select the Playfield + Sprite I example.
16
A Complex Scene, Part II
KernelLoop
; Phase 0: Fetch PF0 byte
        jsr DrawSprites
        ldy PFOfs       ; no more playfield?
        beq NoMoreLines ; exit loop
        dey
        lda (PFPtr),y   ; load value for PF0
        sty PFOfs
        sta tmpPF0
; Phase 1: Fetch PF1 byte
        jsr DrawSprites
        ldy PFOfs
        dey
        lda (PFPtr),y   ; load value for PF1
        sty PFOfs
        sta tmpPF1
; Phase 2: Fetch PF2 byte
        jsr DrawSprites
        ldy PFOfs
        dey
        lda (PFPtr),y   ; load value for PF2
        sty PFOfs
        sta tmpPF2
; Phase 3: Write PF0/PF1/PF2 registers
        jsr DrawSprites
        lda tmpPF0
        sta PF0         ; store PF0
        lda tmpPF1
        sta PF1         ; store PF1
        lda tmpPF2
        sta PF2         ; store PF2
; Go to next scanline
        jmp KernelLoop
Our DrawSprites routine from the last chapter looks similar, except
this time, we draw two sprites instead of one. Because timing is
tight, we rely on one clever design feature of the TIA chip. If we
set the VDELP0 flag (see Table 16.1), we can write to the GRP0 register
for player 0, but the change won’t take effect until we write to GRP1.
(We’ll explain this more in the next chapter.) This saves us from
having to store and load a temporary value.
DrawSprites
; Fetch sprite 0 values
        lda #SpriteHeight       ; height in 2xlines
        sec
        isb yp0                 ; INC yp0, then SBC yp0
        bcs DoDraw0             ; inside bounds?
        lda #0                  ; no, load the padding offset (0)
DoDraw0
        tay                     ; -> Y
        lda (ColorPtr0),y       ; color for both lines
        sta colp0               ; -> colp0
        lda (SpritePtr0),y      ; bitmap for first line
        sta GRP0                ; -> [GRP0] (delayed due to VDELP0)
; Fetch sprite 1 values
        lda #SpriteHeight       ; height in 2xlines
        sec
        isb yp1                 ; INC yp1, then SBC yp1
        bcs DoDraw1             ; inside bounds?
        lda #0                  ; no, load the padding offset (0)
DoDraw1
        tay                     ; -> Y
        lda (ColorPtr1),y       ; color for both lines
        tax
        lda (SpritePtr1),y      ; bitmap for first line
        tay
; WSYNC and store sprite values
        lda colp0
        sta WSYNC
        sty GRP1                ; GRP0 is also updated due to VDELP0
        stx COLUP1
        sta COLUP0              ; store player colors
; Return to caller
        rts

Hex             Bits Used
Addr   Name     76543210    Description
25     VDELP0   .......x    Vertical Delay Player 0
26     VDELP1   .......x    Vertical Delay Player 1
27     VDELBL   .......x    Vertical Delay Ball
Table 16.1: Vertical Delay Registers
The critical timing happens after the WSYNC, where we write the
registers for player objects. Thanks to the VDELP0 flag, we only
have to perform three writes here. But in Phase 3, we write to the
PF0/PF1/PF2 registers right after this subroutine returns. We’ve got
just barely enough time left in the HBLANK period to write the first
playfield register before the TIA starts drawing it. So we can’t do
much more after the WSYNC without affecting things downstream.
Before the WSYNC, we’ve actually got a lot of cycles left over (about 40
or so, depending on how many table lookups cross page boundaries)
to do other things, like drawing missiles. But since our timing
before the WSYNC isn’t precise, we might update a register after
the TIA has already drawn the object. This might be acceptable,
depending on the game, because the missile will likely appear on
the next line.
17
NUSIZ and Other Delights
Let’s say you have a little person that runs left and right. You’d like
the sprite to face left when running left, and right when running
right. Instead of having separate sprites for left/right, you can use
the REFP0 and REFP1 reflection bits.
To display a mirror image of a sprite, set bit 3 (#$08) of the reflection
register for the desired player object (REFP0 for player 0, REFP1 for
player 1). Clear bit 3 to restore the original image.
You can set both player and missile values for a NUSIZ register by
combining bits. For example, setting NUSIZ0 to the hex value $25 sets
missile 0 to 4 pixels wide (2) and player 0 to double-size (5).
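As a quick sketch of how the two fields combine, the Python below packs a missile width and a player mode into one byte. It assumes, consistent with the $25 example, that the missile code sits in bits 4-5 and the player mode in bits 0-2. The names are ours.

```python
MISSILE_WIDTH_CODES = {1: 0, 2: 1, 4: 2, 8: 3}   # pixels wide -> 2-bit code

def nusiz(missile_width, player_mode):
    """Combine a missile width (bits 4-5) and a player mode (bits 0-2)."""
    return (MISSILE_WIDTH_CODES[missile_width] << 4) | (player_mode & 0x07)

print(hex(nusiz(4, 5)))   # 0x25
```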
17.3 VDELP: Vertical Delay
We’ve seen that sprite kernels can be very tight timing-wise. The
TIA designers realized this and added a feature called vertical delay
which especially helps with two-line kernels. We covered it briefly
in Chapter 16, but we’ll go into more detail here.
Internally, the TIA keeps two GRP registers for each player. For
player 0, we’ll call them GRP0(a) and GRP0(b). Every time there’s a
write to GRP1, GRP0(a) gets transferred into GRP0(b). The VDELP0 reg-
ister selects whether the TIA uses GRP0(a) or GRP0(b) for outputting
pixels. Similarly, there’s a pair of GRP1(a) and GRP1(b) registers that
are shifted whenever GRP0 is written.
In two-line kernels, the sprite registers are updated every two lines.
This effectively halves your vertical sprite resolution, as well as your
vertical positioning resolution. But the VDELP registers can give you
full vertical positioning, if you alternate GRP0 and GRP1 writes on
alternate scanlines. In this case, setting VDELP effectively delays the
player’s output by one scanline, so you can consider it a fine vertical
adjustment of +1 scanline.
The VDELP registers also help when it’s inconvenient or impossible to
set a GRP register in the HBLANK period. For example, let’s say we’ve
set the VDELP0 bit. You can then set GRP0 in the visible portion of
the scanline, and then when you set GRP1 in the HBLANK period, it’ll
trigger the output for GRP0 simultaneously.
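The two-register behavior described above can be modeled in a few lines of Python. This is only a sketch of the text’s description, not a cycle-accurate TIA emulation; the class and method names are ours.

```python
class PlayerGraphics:
    """Model of the (a)/(b) register pairs behind GRP0 and GRP1."""

    def __init__(self):
        self.grp0_a = self.grp0_b = 0
        self.grp1_a = self.grp1_b = 0
        self.vdelp0 = self.vdelp1 = False

    def write_grp0(self, value):
        self.grp0_a = value
        self.grp1_b = self.grp1_a   # a GRP0 write shifts GRP1(a) into GRP1(b)

    def write_grp1(self, value):
        self.grp1_a = value
        self.grp0_b = self.grp0_a   # a GRP1 write shifts GRP0(a) into GRP0(b)

    def output0(self):
        return self.grp0_b if self.vdelp0 else self.grp0_a

    def output1(self):
        return self.grp1_b if self.vdelp1 else self.grp1_a

tia = PlayerGraphics()
tia.vdelp0 = True
tia.write_grp0(0x3C)            # written early, e.g. mid-scanline
print(hex(tia.output0()))       # 0x0: not visible yet
tia.write_grp1(0x18)            # written in HBLANK
print(hex(tia.output0()), hex(tia.output1()))   # 0x3c 0x18
```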
The ball object also has a vertical delay bit (VDELBL) which works
the same way for the ball enable (ENABL) bit. The missile registers,
however, don’t have any vertical delay feature.
When objects overlap, the TIA assigns each object a priority and
displays the object with the highest priority. The CTRLPF register
allows you to change these priorities so that you can have sprites
that appear to overlap the playfield background, or vice-versa.
The normal priority assignments are as follows:
Priority Objects
1 Player 0, missile 0
2 Player 1, missile 1
3 Ball, playfield
Table 17.3: Normal CTRLPF Priority Assignments
When bit 1 (#$2) of the CTRLPF register is set, the playfield is put
into score mode. This makes the playfield take two distinct colors:
it assumes player 0’s color in COLUP0 for the left side, and player 1’s
color in COLUP1 for the right side. We’ll use this feature in the next
chapter to draw a scoreboard at the top of the screen.
18
Scoreboard
Displaying letters and numbers on the VCS requires the same do-it-
yourself attitude as everything else. Early Atari games didn’t have
much except a simple numeric scoreboard, but soon cartridges like
Warren Robinett’s BASIC Programming pushed the 2600 to its limits
by drawing full lines of text. In this chapter, we’ll take a look at
some common techniques for drawing text.
This makes it easy for code to combine two digits into an eight-
pixel-wide bitmap using the following procedure:
• Look up the most-significant digit’s bitmap
• AND #$F0 to keep its left half (the left digit)
• Look up the least-significant digit’s bitmap
• AND #$0F to keep its right half (the right digit)
• ORA the two values to combine the two bitmaps
Here we use a convenient feature of the 6502 called BCD mode.
This allows numbers to be manipulated in a more human-readable
format. Each four-bit half of the byte contains a decimal digit,
so that the value expressed in hexadecimal reads the same as the
decimal representation. For example, $00 to $09 are the same, but
10 is stored as $10, 11 is $11, etc. all the way up to $99.
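If BCD is new to you, this Python sketch shows the encoding and what a decimal-mode addition of two valid BCD bytes produces. It models only the results, not the 6502’s flags in detail; the function names are ours.

```python
def to_bcd(n):
    """Encode 0..99 as a BCD byte: tens in the high nibble, ones in the low."""
    if not 0 <= n <= 99:
        raise ValueError("BCD byte holds 0..99")
    return ((n // 10) << 4) | (n % 10)

def from_bcd(b):
    """Decode a BCD byte back into an integer."""
    return (b >> 4) * 10 + (b & 0x0F)

def bcd_add(a, b):
    """Add two BCD bytes; returns (BCD result, carry out)."""
    total = from_bcd(a) + from_bcd(b)
    return to_bcd(total % 100), total >= 100

print(hex(to_bcd(10)), hex(to_bcd(99)))   # 0x10 0x99
print(bcd_add(0x19, 0x01))                # (32, False), i.e. $20
```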
The following routine takes a BCD-encoded number and looks up
bitmap data for each digit separately, combining them into a 5-byte
array in memory:
GetBCDBitmap subroutine
; First fetch the bytes for the 1st digit
        pha             ; save original BCD number
        and #$0F        ; keep the least significant digit
        sta Temp
        asl
        asl
        adc Temp        ; multiply by 5
        tay             ; -> Y
        lda #5
        sta Temp        ; count down from 5
.loop1
        lda DigitsBitmap,y
        and #$0F        ; keep the right half of the bitmap
        sta FontBuf,x   ; store right-hand digit
        iny
        inx
        dec Temp
        bne .loop1
; Now do the 2nd digit
        pla             ; restore original BCD number
        lsr
        lsr
        lsr
        lsr             ; shift right by 4 (in BCD, divide by 10)
        sta Temp
        asl
        asl
        adc Temp        ; multiply by 5
        tay             ; -> Y
        dex
        dex
        dex
        dex
        dex             ; subtract 5 from X (reset to original)
        lda #5
        sta Temp        ; count down from 5
.loop2
        lda DigitsBitmap,y
        and #$F0        ; keep the left half of the bitmap
        ora FontBuf,x   ; combine left and right digits
        sta FontBuf,x   ; store combined digits
        iny
        inx
        dec Temp
        bne .loop2
        rts
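Stripped of the register juggling, the routine boils down to a mask-and-combine per scanline. The Python sketch below shows that step on a tiny hypothetical font in which each byte holds the same glyph in both nibbles; the font data and names are ours, not the book’s DigitsBitmap table.

```python
# Hypothetical 5-line font; each byte repeats the glyph in both nibbles.
FONT = {
    1: [0x22, 0x22, 0x22, 0x22, 0x22],
    2: [0xEE, 0x22, 0xEE, 0x88, 0xEE],
}

def bcd_bitmap(bcd):
    """Combine the glyphs for both digits of a BCD byte into five bytes."""
    left = FONT[bcd >> 4]        # most-significant digit
    right = FONT[bcd & 0x0F]     # least-significant digit
    return [(l & 0xF0) | (r & 0x0F) for l, r in zip(left, right)]

print([hex(b) for b in bcd_bitmap(0x12)])
# ['0x2e', '0x22', '0x2e', '0x28', '0x2e']
```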
The previous routine does most of the heavy lifting, so our kernel
is relatively simple. We draw the digits by loading the bitmaps of
the scoreboard from memory, then writing the playfield registers
twice – once for the left side, followed by a delay of several cycles,
followed by a write for the right side:
Note the SLEEP 28 before the second STA PF1 write to the playfield
registers. We need to wait until the TIA has finished drawing the
left side of the playfield before we reset the playfield registers. Our
SLEEP gives us 28 extra cycles, so if we add up all of the cycle times of
the instructions before the STA WSYNC, we’ll write to PF1 on CPU cycle
48, which corresponds to TIA clock 48*3 = 144, which corresponds
to visible pixel (144−68) = 76, right before the center of the display.
[Figure: scoreboard scanline timing, showing the HBLANK period and the
two "Set PF1" writes, one for each half of the display]
19
Collisions
Games need to know when objects collide. In the VCS, the TIA tells
you when the pixels of any two objects overlap. This is one thing on
the VCS that’s actually easy!
The TIA has 15 different collision flags that can detect a pixel-
perfect collision between any of the moveable objects – players,
missiles, ball, and playfield. You can check these flags at any time
– at the end of the frame is pretty common. When you’re done
checking (or before drawing the next frame) you clear them all at
once by writing to CXCLR.
To see if two objects collided, just look up the register and bit index
in Table 19.1. For example, to see if player 0 and player 1 collided,
we’d look at the second row from the bottom, which has register
CXPPMM and bit 7. The CX registers conveniently have all of their flags
in bit 6 or bit 7, so we can use the BIT instruction:
Register  Bit #  P0  P1  M0  M1  PF  BL
CXM0P     7          X   X
CXM0P     6      X       X
CXM1P     7      X           X
CXM1P     6          X       X
CXP0FB    7      X               X
CXP0FB    6      X                   X
CXP1FB    7          X           X
CXP1FB    6          X               X
CXM0FB    7              X       X
CXM0FB    6              X           X
CXM1FB    7                  X   X
CXM1FB    6                  X       X
CXBLPF    7                      X   X
CXPPMM    7      X   X
CXPPMM    6              X   X
Table 19.1: Collision Registers
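The same lookup can be written as a small Python table, which is handy when planning which flags a game needs. The pairings follow the standard TIA register layout; the dictionary and function names are ours, and the register values passed in are stand-ins for what a real read would return.

```python
# (object, object) -> (register, bit), for all 15 collision flags.
COLLISIONS = {
    frozenset(("M0", "P1")): ("CXM0P", 7),
    frozenset(("M0", "P0")): ("CXM0P", 6),
    frozenset(("M1", "P0")): ("CXM1P", 7),
    frozenset(("M1", "P1")): ("CXM1P", 6),
    frozenset(("P0", "PF")): ("CXP0FB", 7),
    frozenset(("P0", "BL")): ("CXP0FB", 6),
    frozenset(("P1", "PF")): ("CXP1FB", 7),
    frozenset(("P1", "BL")): ("CXP1FB", 6),
    frozenset(("M0", "PF")): ("CXM0FB", 7),
    frozenset(("M0", "BL")): ("CXM0FB", 6),
    frozenset(("M1", "PF")): ("CXM1FB", 7),
    frozenset(("M1", "BL")): ("CXM1FB", 6),
    frozenset(("BL", "PF")): ("CXBLPF", 7),
    frozenset(("P0", "P1")): ("CXPPMM", 7),
    frozenset(("M0", "M1")): ("CXPPMM", 6),
}

def collision_flag(a, b):
    """Return (register name, bit number) for a pair of objects."""
    return COLLISIONS[frozenset((a, b))]

def collided(registers, a, b):
    """Test the flag in a dict of register values keyed by register name."""
    name, bit = collision_flag(a, b)
    return bool(registers.get(name, 0) & (1 << bit))

print(collision_flag("P0", "P1"))                # ('CXPPMM', 7)
print(collided({"CXP0FB": 0x80}, "P0", "PF"))    # True
```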
With the collision flags, we can easily test the player for collisions
with the playfield and other objects. If we want the player to stop
when they hit a playfield wall, we can just restore the previous
position, like this:
Then, to draw the bitmap, we just iterate through the arrays
(high-to-low is easier), copying each of the six array values to the
PF0/PF1/PF2 registers. We introduce a little pause between the left
and right sides:
ScanLoop
        sta WSYNC
        lda Data0,y
        sta PF0         ; store 1st playfield byte
        lda Data1,y
        sta PF1         ; store 2nd byte
        lda Data2,y
        sta PF2         ; store 3rd byte
        nop
        nop
        nop             ; 6-cycle pause
        lda Data3,y
        sta PF0         ; store 4th byte
        lda Data4,y
        sta PF1         ; store 5th byte
        lda Data5,y
        sta PF2         ; store 6th byte
        dey
        bne ScanLoop    ; repeat until all scanlines drawn
For a full-screen bitmap, we’ll need 192 bytes per array, for a total
of 192 ∗ 6 = 1152 bytes. Enough to render a blocky yet recognizable
image of Ada Lovelace:
Figure 21.1: Brick game
have six bytes for each row of bricks – three playfield registers for
the left side, and a different three for the right. If we have six rows
of bricks, we have 6 ∗ 6 = 36 bytes.
Pixels    Register      Bit # (left to right)    Array index (rows 0-5)
0-3       PF0 (left)    4 5 6 7                  0-5
4-11      PF1 (left)    7 6 5 4 3 2 1 0          6-11
12-19     PF2 (left)    0 1 2 3 4 5 6 7          12-17
20-23     PF0 (right)   4 5 6 7                  18-23
24-31     PF1 (right)   7 6 5 4 3 2 1 0          24-29
32-39     PF2 (right)   0 1 2 3 4 5 6 7          30-35
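The odd bit ordering is easier to trust once you have packed a row by hand. Here is a small Python sketch, assuming the layout above (PF0 uses bits 4-7 left to right, PF1 runs from bit 7 down to 0, PF2 from bit 0 up to 7, and the right half repeats the left half's ordering). The function name encode_row is hypothetical; it is a tool-side helper, not something that runs on the VCS.

```python
def encode_row(pixels):
    """Pack 40 pixels (0/1, left to right) into [PF0, PF1, PF2, PF0, PF1, PF2]."""
    assert len(pixels) == 40
    out = []
    for half in (pixels[:20], pixels[20:]):
        pf0 = sum(bit << (4 + i) for i, bit in enumerate(half[0:4]))    # bits 4..7
        pf1 = sum(bit << (7 - i) for i, bit in enumerate(half[4:12]))   # bits 7..0
        pf2 = sum(bit << i for i, bit in enumerate(half[12:20]))        # bits 0..7
        out += [pf0, pf1, pf2]
    return out
```

A script like this could generate the six bytes for each brick row (or each scanline of a bitmap) ahead of time, so the kernel only has to load and store.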
We’ve also got to track the X and Y positions of both the player and
the ball.
Our game will have several different kernels, each of which draws a
different area of the screen. Because we draw the ball in all of them,
we define a macro so that we don’t have to duplicate code:
; Enable ball if it is on this scanline (in X register)
; Modifies A.
; Takes 13 cycles if ball is present, 12 if absent.
MAC DRAW_BALL
lda #%00000000
cpx YBall
bne .noball
lda #%00000010 ; for ENABL, bit 1 is the enable bit
.noball
sta ENABL ; enable ball
ENDM
The outer loop adds enough CPU cycles that we don’t have time to
draw the ball, so we skip drawing it when we transition to a new
row. (We could have also drawn the ball but skipped the STA WSYNC,
rearranging things so that the timing worked out either way.)
This is visually not very noticeable, but since we’re checking collision
registers we have to be aware of a VCS maxim: If you don’t draw it,
it doesn’t collide! Since the ball only disappears one out of every 16
scanlines, it’s not a huge deal.
The inner loop looks pretty much like it did in Chapter 20, where
we learned how to draw an asynchronous playfield:
sta WSYNC
stx COLUPF ; change colors for bricks
; Load the first byte of bricks
; Bricks are stored in six contiguous arrays (row-major)
lda Bricks+NBL*0,y
sta PF0 ; store first playfield byte
; Store the next two bytes
lda Bricks+NBL*1,y
sta PF1
lda Bricks+NBL*2,y
sta PF2
inx ; good place for INX b/c of timing
nop ; yet more timing
lda Bricks+NBL*3,y
sta PF0
lda Bricks+NBL*4,y
sta PF1
lda Bricks+NBL*5,y
sta PF2
dec Temp
beq ScanLoop3b ; all lines in current brick row done?
bne ScanLoop3a ; branch always taken
Each row of bricks is 16 lines, and since VCS color hues change
every 16 values, we get a nice and inexpensive (in terms of CPU
cycles) rainbow effect just by storing the current scanline into the
COLUPF register.
[Diagram: kernel timing per scanline. Each row of bricks begins with a “Next Row” step, followed by scanlines labeled “Set Left Side PF”, “Set Right Side PF”, “Loop”, and “Draw Ball”.]
The player can never travel into the brick field, so our next kernel
just draws the player’s sprite and ball. It’s conceptually similar to
the routine in Chapter 8 except we also include the DRAW_BALL macro.
After we draw the frame and enter the overscan period, we check
for collisions:
lda #%01000000
bit CXP0FB ; collision between player 0 and ball?
bne PlayerCollision
lda #%10000000
bit CXBLPF ; collision between playfield and ball?
bne PlayfieldCollision
beq NoCollision
First we’ll talk about what happens when the ball is in contact with
the player. When you press the joystick button, we set the Captured
flag which allows you to grab the ball:
PlayerCollision
lda INPT4 ;read button input
bmi ButtonNotPressed ;skip if button not pressed
inc Captured ; set capture flag
bne NoCollision
(There’s a “bug” here because the Captured variable rolls over after
255 frames – but in VCS programming, we like to call them
“features” and just get on with it.)
Now we see if the ball bounced off of the top of the player’s head,
or off of their shoes: We calculate YPlyr + SpriteHeight/2 − YBall
to see if the ball is within the top half or bottom half of the player’s
sprite, and set YBallVel (the ball’s Y velocity) to -1 or 1 accordingly:
ldx #1
lda YPlyr
clc
adc #SpriteHeight/2
sec
sbc YBall
bmi StoreVel ; bottom half, bounce down (+1)
ldx #$ff ; top half, bounce up (-1)
bne StoreVel
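To see what the sign test is doing, here is the same decision as a short Python sketch. It mirrors the listing rather than adding anything new: the subtraction is done in 8 bits, and BMI simply looks at bit 7 of the result. The function name and the sample coordinates are made up for illustration.

```python
def bounce_velocity(y_plyr, y_ball, sprite_height):
    """Mirror the listing: returns 0x01 (+1) or 0xFF (-1 in two's complement)."""
    diff = (y_plyr + sprite_height // 2 - y_ball) & 0xFF  # 8-bit result of ADC/SBC
    negative = diff & 0x80                                 # this is what BMI tests
    return 0x01 if negative else 0xFF
```

Note that the test is only meaningful while the two positions are within 127 lines of each other, since bit 7 is being used as a sign bit.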
For the collision between playfield and ball, we need to check to see
which brick we broke. This is pretty complex, so we separated it
out into a subroutine. First, we correct the Y coordinate so that zero
starts at the top of the brickfield:
Then, we divide by the brick height (16). We could have also done
four LSRs, but this works for any BrickHeight value:
ldy #$ff
DivideRowLoop
iny
sbc #BrickHeight
bcs DivideRowLoop ; loop until < 0
cpy #NBrickRows
bcs NoBrickFound ; outside Y bounds of bricks
clc
adc #BrickHeight ; undo subtraction to get remainder
pha ; save the remainder to return as result
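Division by repeated subtraction is easy to check on paper. The Python sketch below follows the loop step by step, assuming the carry flag is set on entry (so the first SBC subtracts exactly BrickHeight); the function name divide_row is hypothetical.

```python
def divide_row(y, brick_height=16):
    """Return (row, remainder), following the DivideRowLoop listing."""
    a, row = y, -1                  # LDY #$FF
    while True:
        row += 1                    # INY
        a -= brick_height           # SBC #BrickHeight (carry assumed set on entry)
        if a < 0:                   # BCS falls through once the subtraction borrows
            break
    return row, a + brick_height    # ADC #BrickHeight undoes the last subtraction
```

The loop runs once per row, so the cost grows with the Y coordinate; with only a handful of brick rows that is cheap enough.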
The subroutine depends on two tables that define the bitmask and
byte offset for each of the 40 bricks in a row (the same layout
described earlier):
We call the subroutine like so, using the remainder value returned
in A to see if we hit the top or bottom of the brick, and thus decide
which direction to bounce:
PlayfieldCollision
lda YBall
ldx XBall
jsr BreakBrick
bmi CollisionNoBrick ; return -1 = no brick found
; Did we hit the top or the bottom of a brick?
; If top, bounce up, otherwise down.
ldx #$ff ; ball velocity = up
cmp #BrickHeight/2 ; top half of brick?
bcc BounceBallUp ; yofs < brickheight/2
ldx #1 ; ball velocity = down
BounceBallUp
stx YBallVel
The addition (ADC) sets the carry flag when it wraps, so the higher
the XBallVel value, the more often this happens, and the more often
we INC the ball position.
There’s a similar routine for moving the ball left, with one important
difference. In this case, the velocity is negative, and if
you remember two’s complement from Chapter 1, the byte value
will be 128 or greater – so a signed value of -1 is the byte 255,
for example. The slower the leftward velocity (the closer the byte is
to 255), the more often the XBallErr addition wraps. So we want to
move the ball only when the carry from the ADC is clear (no wrap),
not when it’s set:
adc XBallErr
sta XBallErr ; XBallErr += XBallVel
bcs DoneMovement ; did wrap around? no move
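The carry trick is easier to believe after counting moves over a few hundred frames. This Python sketch models the error accumulator for both directions; it assumes the carry is clear before each ADC and that XBallErr starts at zero, and the function name count_moves is made up for illustration.

```python
def count_moves(vel, frames=256):
    """Count how many frames the ball moves for an 8-bit velocity byte."""
    err, moves = 0, 0
    for _ in range(frames):
        total = err + vel            # ADC with carry clear (assumed CLC before it)
        carry = total > 0xFF
        err = total & 0xFF
        if vel < 0x80:               # rightward: move on wrap (carry set)
            moves += carry
        else:                        # leftward: move when there is no wrap
            moves += not carry
    return moves
```

With these assumptions, velocities of +64 and -64 (bytes $40 and $C0) both move the ball on one frame out of four, just in opposite directions.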
22
A Big (48 pixel) Sprite
We’ve seen that the VCS graphics are pretty limited. During each
scanline, we can draw 20 unique playfield pixels, two 8-bit sprites,
and up to three ball/missile objects. We’ve seen in Chapter 21
how to draw 40 unique playfield pixels with some complicated
gymnastics.
We’ve also seen that we can get six sprites on a scanline by using the
NUSIZ registers, which draw up to three duplicate sprites per player
object at configurable intervals. This is used in Combat for the three-
airplanes-in-formation game mode, for example. But this still only
gives us two unique sprites per scanline, and four clones.
We’ll use a technique similar to the Asynchronous Playfields trick –
reprogramming the TIA registers on-the-fly, writing to each register
multiple times during the scanline. If we time our writes carefully,
we’ll be able to draw six unique sprites per scanline, for example to
draw a six-digit scoreboard, or one large 48-pixel sprite.
Figure 22.1: 48-pixel sprite
The first step is to set the NUSIZ registers for “three copies, close” to
display three copies of each 8-pixel sprite, 8 pixels apart from one
another. Our goal is to set each player’s horizontal position so that
they overlap like so:
Player 0 00000000 22222222 44444444
Player 1 11111111 33333333 55555555
The next step is to enable the VDELPx registers for both players. As
described in Chapter 17, the VDELPx bit enables a buffer for the GRP
register, so that when you set the player’s bitmap register it does not
take effect until you set the other player’s bitmap register. This will
be essential for our 48-pixel kernel, because it means we can pre-
stage two GRP register values in the TIA chip, flipping them during
the very tight set of instructions that sets the player registers.
Before the frame starts, we must also position the two player objects.
They must be at an exact horizontal location, and player 1 must be
exactly 8 pixels to the right of player 0 so that they meet with no
overlaps. This does the trick:
sta WSYNC
SLEEP 20 ; skip 60 pixels
sta RESP0 ; position player 0 @ 69
sta RESP1 ; ...and player 1 @ 78
lda #$10
sta HMP1 ; player 1 goes 1 pixel to the left
sta WSYNC
sta HMOVE ; apply HMOVE
sta HMCLR
TIMER_SETUP 192
SLEEP 40 ; start near end of scanline
We are going to be lazy and use the TIMER_SETUP macro to make sure
we output 192 scanlines, even though our sprite is much smaller.
That macro also does a WSYNC, so we’ll SLEEP 40 so that we start the
loop near the end of the scanline.
Now the loop. We start by loading the first two sprite bytes into GRP0
and GRP1:
BigLoop
ldy loopcount ; counts backwards
lda Data0,y ; load B0 (1st sprite byte)
sta GRP0 ; B0 -> [GRP0]
lda Data1,y ; load B1 -> A
sta GRP1 ; B1 -> [GRP1], B0 -> GRP0
Because we’ve set the VDELP0 and VDELP1 bits, the first sprite byte (B0)
goes into the GRP0 buffer, not the real GRP0 register. This is indicated
by the [GRP0] notation.
The next sprite byte (B1) goes into [GRP1]. Since there is something
in [GRP0], this triggers [GRP0] to store into the real register GRP0.
Next we load the third byte, B2, and store it to GRP0, where it lands in
the [GRP0] buffer. Since we just stored B1 to [GRP1], this store also
moves B1 into the real GRP1.
Now we have to get ready for the time-critical step, the “one weird
trick.” We load B4, B3, and B5 into the X, A, and Y registers, with the
help of a temporary location:
Everything’s all set for the Grand Finale. We alternately store to the
GRP0/GRP1 registers four times with an amazing flourish!
Note that we really need the buffered registers that the VDELPx flags
give us, because we only have three registers and simply don’t have
time to load a fourth register from memory in this sequence!
Note also that the final STA GRP0 is only there to ensure that [GRP1]
gets moved into the real GRP1 register; the value being stored is
irrelevant.
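The buffer shuffle is easier to follow with a toy model. The Python sketch below assumes both VDELPx bits are set and models only the rule described above: a store to one player lands in that player's buffer and releases the other player's buffer into its real register. The class name DelayedPlayers is made up, and real TIA timing (when each register is actually scanned out) is not modeled.

```python
class DelayedPlayers:
    """Toy model of GRP0/GRP1 with both VDELP bits set."""

    def __init__(self):
        self.buf = [0, 0]    # [GRP0], [GRP1]: values waiting in the buffers
        self.real = [0, 0]   # GRP0, GRP1: values actually being displayed

    def store(self, player, value):
        other = 1 - player
        self.buf[player] = value             # the write lands in this player's buffer
        self.real[other] = self.buf[other]   # ...and releases the other player's buffer


tia = DelayedPlayers()
shown = []
for player, value in [(0, "B0"), (1, "B1"), (0, "B2"), (1, "B3"),
                      (0, "B4"), (1, "B5"), (0, "??")]:
    tia.store(player, value)
    shown.append(tuple(tia.real))
```

Stepping through shown, the real registers hold (B0, B1), then (B2, B1), (B2, B3), (B4, B3), and finally (B4, B5), which is why the seventh store matters even though its value is never displayed.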
Now we decrement our counter and go back for another pass.
There are plenty of other ways to write this loop. It’s common to use
the LDA (ptr),y addressing mode so that you can configure each 8-
pixel column to point to a different bitmap – good for doing 6-digit
scoreboards, “lives left” displays, etc. The crux of the biscuit is that
4-instruction store firing at the right time.
23
Tiny Text
String0 dc "HELLO[WORLD?"
lda #<String0
sta StrPtr
lda #>String0
sta StrPtr+1
jsr BuildText
BuildText uses a special trick involving the stack pointer (S) which
we’ll explain soon. First we’ll save S since we’ll be modifying it later:
BuildText subroutine
tsx
stx TempSP
We’ve got two variables that keep track of our progress, WriteOfs
and StrLen. WriteOfs points to the end of the column of bytes being
written to, and StrLen contains the current character being read:
.CharLoop
; Get first character
lda (StrPtr),y ; load next character
sec
sbc #LoChar ; subtract 32 (1st char is Space)
sta Temp
asl
asl
adc Temp ; multiply by 5
tax ; first character offset -> X
; Get second character
iny
lda (StrPtr),y ; load next character
sec
sbc #LoChar ; subtract 32 (1st char is Space)
sta Temp
asl
asl
adc Temp ; multiply by 5
iny
sty StrLen ; StrLen += 2
tay ; second character offset -> Y
lda FontTableLo+4,y
ora FontTableHi+4,x
pha
lda FontTableLo+3,y
ora FontTableHi+3,x
pha
lda FontTableLo+2,y
ora FontTableHi+2,x
pha
lda FontTableLo+1,y
ora FontTableHi+1,x
pha
lda FontTableLo+0,y
ora FontTableHi+0,x
pha
That’s all there is to it! Now we add five to WriteOfs to target the
next column of bytes, and repeat until we run out of characters:
lda WriteOfs
clc
adc #5
sta WriteOfs
.NoIncOfs
ldy StrLen
cpy #12
bne .CharLoop
ldx TempSP
txs
rts
If you wanted to save 256 bytes of space, you could use just one font
table and do something like this:
lda FontTableLo+4,x
asl
asl
asl
asl
ora FontTableLo+4,y
pha
...
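The arithmetic behind both variants is small enough to model in a few lines of Python. This is only a sketch: the function names are hypothetical, lo_char stands in for the LoChar constant, and it assumes each glyph row is 4 pixels wide and stored in the low nibble of a table byte (as the single-table variant requires).

```python
def char_offset(ch, lo_char, glyph_height=5):
    """Offset of a character's first row in a font table with 5 rows per glyph."""
    return (ord(ch) - lo_char) * glyph_height


def pack_row(left_nibble, right_nibble):
    """Combine two 4-pixel rows into one byte: left glyph high, right glyph low."""
    return ((left_nibble << 4) & 0xF0) | (right_nibble & 0x0F)
```

The two-table version trades 256 bytes of ROM for the four ASL instructions: FontTableHi already holds each row pre-shifted into the high nibble. Either way, the offset has to fit in an 8-bit index register, which limits a 5-row font to 51 glyphs.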
lda #4
sta LoopCount
BigLoop
ldy LoopCount ; counts backwards
lda FontBuf+0,y ; load B0 (1st sprite byte)
sta GRP0 ; B0 -> [GRP0]
lda FontBuf+5,y ; load B1 -> A
sta GRP1 ; B1 -> [GRP1], B0 -> GRP0
sta WSYNC ; sync to next scanline
lda FontBuf+10,y ; load B2 -> A
sta GRP0 ; B2 -> [GRP0], B1 -> GRP1
lda FontBuf+25,y ; load B5 -> A
sta Temp ; B5 -> temp
ldx FontBuf+20,y ; load B4 -> X
lda FontBuf+15,y ; load B3 -> A
ldy Temp ; load B5 -> Y
sta GRP1 ; B3 -> [GRP1]; B2 -> GRP0
stx GRP0 ; B4 -> [GRP0]; B3 -> GRP1
sty GRP1 ; B5 -> [GRP1]; B4 -> GRP0
sta GRP0 ; ?? -> [GRP0]; B5 -> GRP1
dec LoopCount ; go to next line
bpl BigLoop ; repeat until < 0
24
Six-Digit Scoreboard
Digit0 word
Digit1 word
Digit2 word
Digit3 word
Digit4 word
Digit5 word
We will also use a lookup table for the bitmaps of digits 0-9. Timing
is critical, so we need to use ALIGN to make sure it doesn’t cross a page
boundary (we can also save an addition if we know the low byte is
zero):
align $100 ; make sure data doesn’t cross page boundary
FontTable
hex 003c6666766e663c007e181818381818
hex 007e60300c06663c003c66061c06663c
hex 0006067f661e0e06003c6606067c607e
hex 003c66667c60663c00181818180c667e
hex 003c66663c66663c003c66063e66663c
Now we need to set up the six Digit pointers. Each byte of the score
contains two BCD digits, so we’ll need to extract the high nibble
and low nibble separately, then multiply each by 8 to arrive at the
offset for each digit’s pointer:
GetDigitPtrs subroutine
ldx #0 ; leftmost bitmap
ldy #2 ; start from most-significant BCD value
.Loop
lda BCDScore,y ; get BCD value
and #$f0 ; isolate high nibble (* 16)
lsr ; shift right 1 bit (* 8)
sta Digit0,x ; store pointer lo byte
lda #>FontTable
sta Digit0+1,x ; store pointer hi byte
inx
inx ; next bitmap pointer
lda BCDScore,y ; get BCD value (again)
and #$f ; isolate low nibble
asl
asl
asl ; * 8
sta Digit0,x ; store pointer lo byte
lda #>FontTable
sta Digit0+1,x ; store pointer hi byte
inx
inx ; next bitmap pointer
dey ; next BCD value
bpl .Loop ; repeat until < 0
rts
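Here is the same nibble arithmetic as a Python sketch, handy for checking which font offsets a given score produces. It assumes BCDScore is three bytes with the most significant digits at index 2, as in the listing, and that each digit bitmap is 8 bytes; the function name digit_offsets is made up.

```python
def digit_offsets(bcd_score):
    """Return the six pointer low bytes, leftmost digit first."""
    offsets = []
    for y in (2, 1, 0):                        # DEY / BPL: index 2 down to 0
        value = bcd_score[y]
        offsets.append((value & 0xF0) >> 1)    # high digit * 16, then LSR -> * 8
        offsets.append((value & 0x0F) << 3)    # low digit, three ASLs -> * 8
    return offsets
```

For a score of 123456 stored as $56, $34, $12, this yields 8, 16, 24, 32, 40, 48. Because the table is page-aligned, these offsets can be used directly as the pointer low bytes.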
The kernel loop is similar to previous 48-pixel kernels, except it
uses the (aa),y indirect addressing mode:
We also need a subroutine that adds to the score. This routine
adds three BCD-encoded bytes to the BCDScore variable, doing the
appropriate thing with the carry bits:
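As a reference for what the result should be, here is a Python sketch of three-byte BCD addition. It is a model of the arithmetic, not the book's routine: it assumes the least significant byte is stored first (matching GetDigitPtrs, which treats index 2 as most significant), and the function name add_score is hypothetical. On the 6502 the per-digit corrections come for free from ADC in decimal mode.

```python
def add_score(bcd_score, amount):
    """Add two 3-byte BCD values, least-significant byte first."""
    result, carry = [], 0
    for a, b in zip(bcd_score, amount):
        lo = (a & 0x0F) + (b & 0x0F) + carry
        half_carry = 1 if lo > 9 else 0
        if half_carry:
            lo -= 10
        hi = (a >> 4) + (b >> 4) + half_carry
        carry = 1 if hi > 9 else 0      # carried into the next byte
        if carry:
            hi -= 10
        result.append((hi << 4) | lo)
    return result
```

A carry out of the third byte is simply dropped here, so the score wraps from 999999 back to 000000.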
25
A Big Moveable Sprite
c9 c9 cmp #$c9
c9 c9 cmp #$c9
c9 c9 cmp #$c9
; repeat many times ...
c5 ea cmp $ea
c9 c9 cmp #$c9 2
c9 c9 cmp #$c9 2
c5 ea cmp $ea 3
We decode two CMP #$c9 instructions, which take two cycles each,
and one CMP $ea, which takes three cycles, for a total of seven cycles.
Since they are all CMP instructions, there are no side effects besides
modifying flags.
Now, what if we started five bytes from the end? Since our
instructions take up two bytes each, we’d be essentially starting in
the middle of an instruction! But the CPU doesn’t see the boundaries
of our assembler instructions – it will happily execute whatever it
sees. This is what it sees:
c9 c9 cmp #$c9 2
c9 c5 cmp #$c5 2
ea nop 2
Note that the CMP #$c9 is the same, since we started in the sea of
$c9 bytes. But the last two instructions we decode are different.
Our last instruction is a NOP, which came from the $ea in the CMP $ea
instruction. The NOP only takes two cycles, so we’ve wasted a total
of six cycles, one less than the previous run.
Let’s see what would happen if we started four bytes from the end:
c9 c9 cmp #$c9 2
c5 ea cmp $ea 3
We’re back in alignment with our original assembler code, and this
time we take up five cycles – one less than previously.
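You can confirm the pattern for every entry point with a tiny decoder. The Python sketch below assumes a slide made of 36 $c9 bytes followed by $c9, $c5, $ea, and knows only the three opcodes involved; SLIDE and slide_cycles are names invented for this example.

```python
# (size in bytes, cycles) for the three opcodes that appear in the slide
OPCODES = {0xC9: (2, 2),   # CMP #imm
           0xC5: (2, 3),   # CMP zp
           0xEA: (1, 2)}   # NOP

SLIDE = [0xC9] * 36 + [0xC9, 0xC5, 0xEA]


def slide_cycles(bytes_from_end):
    """Cycles spent when execution enters the slide this many bytes before its end."""
    pc = len(SLIDE) - bytes_from_end
    cycles = 0
    while pc < len(SLIDE):
        size, cost = OPCODES[SLIDE[pc]]
        pc += size
        cycles += cost
    return cycles
```

Under these assumptions every extra byte of head start costs exactly one extra cycle, whether the entry point is aligned with the assembler's instructions or not.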
Using the clockslide is pretty simple: you just compute a pointer
to somewhere inside the array depending on how many cycles you
want to waste, then do an indirect jump to the pointer:
lda #<ClockslideEnd
sec
sbc DelayCycles
sta DelayPtr
lda #>ClockslideEnd
sbc #0
sta DelayPtr+1
jmp (DelayPtr)
REPEAT 36
.byte $c9
REPEND
.byte $c9,$c5
ClockslideEnd
nop
Now we can get to the business of drawing the sprite. The kernel in
Figure 25.1 is “WSYNC-free,” which means we’ll have to make sure
it takes exactly 76 cycles. This ensures that the TIA clock starts at
the same position at the start of every loop iteration, which ensures
that our register writes happen at the exact same moment for every
scanline. We’ll use the clockslide immediately before entering the
kernel loop. Note that we also use the indirect (aa),y addressing
mode as shown in the code that follows.
.KernelLoop
nop
nop
nop
ldy LineCount
lda (Data0),Y
sta GRP0
lda (Data1),Y
sta GRP1
lda (Data2),Y
sta GRP0
lda (Data5),Y
sta Temp
lda (Data4),Y
tax
lda (Data3),Y
ldy Temp
sta GRP1
stx GRP0
sty GRP1
sta GRP0
dec LineCount
bpl .KernelLoop
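It is worth tallying the loop to see that it really is 76 cycles. The Python sketch below adds up standard 6502 cycle counts for the listing above, assuming every operand is in zero page (TIA registers and variables), no indirect load crosses a page boundary, and the final branch is taken without crossing a page. The table names are made up for this example.

```python
# Cycle costs assumed: zero-page operands, no page crossings, branch taken.
CYCLES = {"nop": 2, "tax": 2, "ldy zp": 3, "sta zp": 3, "stx zp": 3,
          "sty zp": 3, "lda (zp),y": 5, "dec zp": 5, "bpl taken": 3}

KERNEL = (["nop"] * 3 + ["ldy zp"]
          + ["lda (zp),y", "sta zp"] * 3          # B0, B1, B2 -> GRP0/GRP1/GRP0
          + ["lda (zp),y", "sta zp"]              # B5 -> Temp
          + ["lda (zp),y", "tax"]                 # B4 -> X
          + ["lda (zp),y", "ldy zp"]              # B3 -> A, B5 -> Y
          + ["sta zp", "stx zp", "sty zp", "sta zp"]
          + ["dec zp", "bpl taken"])

total = sum(CYCLES[op] for op in KERNEL)
```

If any of the bitmap tables straddles a page boundary, the affected LDA (zp),Y takes an extra cycle and the whole kernel drifts, which is one more reason to align sprite data.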
26
Sprite Formations
Figure 26.1: Sprite retriggering example game
the RESPx register is strobed multiple times on a given scanline, the
first (leftmost) copy of the object will be hidden, and the TIA will
draw the other copy. You can keep strobing the register to output
multiple copies on the same scanline.
The Grid Kernel that draws a single row of evenly-spaced sprites
looks like this:
KernelLoop
lda EnemyFrame0,y ; load bitmap
sta WSYNC
ldx EnemyColorFrame0,y ; load color
sta GRP0
sta GRP1
stx COLUP0
stx COLUP1
ldx #0 ; so we can do the STA RESPn,x variant
KernelStores
sta RESP0,x
sta RESP1,x
sta RESP0,x
sta RESP1,x
sta RESP0,x
sta RESP1,x
sta RESP0,x
sta RESP1,x
dey ; also acts as 2-cycle delay
stx.w GRP0 ; clear player 0 bitmap (4-cycle version)
sta RESP0 ; reset player 0 position
stx GRP1 ; clear player 1 bitmap
sta RESP1 ; reset player 1 position
bpl KernelLoop ; repeat until Y < 0
rts
DrawFormation
ldx CurRow
lda EnemyRows0,x
ldx #1 ; start at KernelStores+1
ShiftLoop
ldy #RESP0
ror
bcs NoClearEnemy0
ldy #$30 ; no-op
NoClearEnemy0
sty ZPWrites,x
inx
inx
ldy #RESP1
ror
bcs NoClearEnemy1
ldy #$30 ; no-op
NoClearEnemy1
sty ZPWrites,x
inx
inx
cpx #16 ; 8*2 bytes
bcc ShiftLoop
ldy EnemyColorFrame0 ; get height -> Y
jsr ZPRoutine ; draw sprites
rts
Note that our Grid Kernel routine does not draw missiles because
we just don’t have the time. Rather, we set the missile registers
before we draw (we actually use the ball register for the player’s
missile so that it gets its own color). This creates a long stripe
whenever the missile is present. If we wanted to draw the missile
correctly, we’d probably have to give up the line-by-line sprite color
and do something like this:
There are also a couple of other modifications we can make to this
kernel:
• We could add more STA RESPn,x instructions to create more
sprites. We could draw up to 11 sprites this way. However,
this requires some tricky logic if you want to remove sprites,
because we’ll have to remove the instructions that reset the
sprite at the end of the scanline.
• We could replace the STA WSYNC with a couple of NOPs and
make the loop take exactly 76 cycles. Then we could use the
Clockslide technique from Chapter 25 to have some control
over horizontal position.
This technique is useful for drawing static displays, like a map or
grid of tiles. If you need horizontal movement, it’s probably a lot
easier to just use the NUSIZ registers with two player objects and limit
yourself to six sprites per scanline!
27
Advanced Timer Tricks
TIMER_SETUP 192
And from there, our various routines read the PIA timer (the INTIM
register) to figure out which scanline we’re on. It’s not perfect,
because our preferred PIA timer resolution is 64 cycles, and a
scanline is every 76 cycles. But it’s good enough for our purposes,
and a lot more convenient.
For instance, here we wait until it’s time to start drawing a row of
sprites:
WaitForRow
jsr DrawMissiles ; set missile registers
ldx CurRow ; get current row of sprites
lda EnemyYpos0,x ; get row Y position
cmp INTIM ; compare to timer
bcc WaitForRow ; loop while timer > Y position
Note that there’s no STA WSYNC here! We just keep calling DrawMissiles
(which also uses the timer to see if missiles intersect the current
scanline) until the timer counts down below a given value.
We’ve got to be careful not to let the timer go below zero, though,
because at that point it goes negative and our code might miss it.
As described in the Stella Programming Guide[4] , this is a feature,
not a bug, as it allows programmers to determine how long ago the
timer expired:
There are also cases where the timer changes very close to the end
of a scanline, and our next WSYNC might miss it. One solution is to
always have a constant number of cycles between the point where
your timer loop exits and the next WSYNC. You’ll then at least miss
lines predictably.
For example, the example program for Chapter 26 (Formation
Flying at 8bitworkshop.com) has a DrawMissiles routine which is
used to draw 8-pixel high missiles using the PIA timer value:
DrawMissiles
lda INTIM ; load timer value
pha ; save timer value
sec
sbc MissileY0 ; subtract missile 0’s Y from timer value
cmp #8 ; within 8 lines of missile?
lda #3 ; bit 1 now set
adc #0 ; if carry set, bit 1 cleared
sta ENABL ; enable/disable ball
pla ; restore original timer value
sec
sbc MissileY1 ; subtract missile 1’s Y from timer value
cmp #8 ; within 8 lines of missile?
lda #3 ; bit 1 now set
adc #0 ; if carry set, bit 1 cleared
sta ENAM1 ; enable/disable missile
rts
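The LDA #3 / ADC #0 pair is a branch-free way of turning the carry flag into an enable bit. Here is the same logic as a Python sketch (the function names are hypothetical): CMP #8 sets the carry when the 8-bit difference is 8 or more, LDA leaves the carry alone, and adding the carry to 3 turns %011 into %100, which clears bit 1.

```python
def enable_bits(timer, missile_y, height=8):
    """Model of SEC/SBC/CMP #8/LDA #3/ADC #0: the byte stored to ENABL or ENAM1."""
    diff = (timer - missile_y) & 0xFF     # SEC + SBC, 8-bit wraparound
    carry = 1 if diff >= height else 0    # CMP #8 sets carry when diff >= 8
    return 3 + carry                      # 3 = %011 (bit 1 set), 4 = %100 (bit 1 clear)


def is_enabled(timer, missile_y):
    """True when bit 1 (the enable bit) is set in the stored value."""
    return bool(enable_bits(timer, missile_y) & 0b10)
```

Because the subtraction wraps, a timer value just below the missile's Y gives a large unsigned difference, so the object is correctly disabled on both sides of its 8-value window.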
Since scanlines take 76 CPU cycles and the closest PIA timer period
is 64 cycles, we don’t have an easy mapping between timer values
and scanlines. This diagram shows the problem:
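[Diagram: PIA timer ticks (O) plotted by CPU cycle within the scanline (+0 to +76) for scanlines 0 through 16. Most scanlines contain one tick; scanlines 5, 10, and 16 contain two. The figure also marks a span labeled “13 cycles”.]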
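The mismatch between the two periods can be worked out in a few lines of Python. This sketch assumes the first timer tick lines up with the start of scanline 0; on real hardware the phase depends on when the timer was written, so which scanlines get two ticks will differ, but the ratio will not.

```python
from collections import Counter

TIMER_PERIOD = 64   # CPU cycles per TIM64T tick
LINE_PERIOD = 76    # CPU cycles per scanline

# 19 ticks * 64 cycles = 16 scanlines * 76 cycles = 1216 cycles, then the pattern repeats.
ticks_per_line = Counter((tick * TIMER_PERIOD) // LINE_PERIOD for tick in range(19))
double_lines = sorted(line for line, n in ticks_per_line.items() if n == 2)
```

So in every group of 16 scanlines, three of them see the timer change twice. Put another way, 3 of every 19 timer values cannot be mapped to a scanline of their own.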
There are 13 cycles between the second CPY read and the STA WSYNC,
so that’s our effective resolution. In this case, we need to ensure that
we don’t enter this critical region in the 13 cycles before the end of
a scanline, or we could potentially wrap to the next scanline.
Here’s the lookup table:
align $100
Timer2Scanline
.byte 215, 0,214,213,212,211,210, 0,209,208,207,206,205,204, 0,203
.byte 202,201,200,199, 0,198,197,196,195,194, 0,193,192,191,190,189
.byte 188, 0,187,186,185,184,183, 0,182,181,180,179,178, 0,177,176
.byte 175,174,173,172, 0,171,170,169,168,167, 0,166,165,164,163,162
.byte 0,161,160,159,158,157,156, 0,155,154,153,152,151, 0,150,149
.byte 148,147,146, 0,145,144,143,142,141,140, 0,139,138,137,136,135
.byte 0,134,133,132,131,130, 0,129,128,127,126,125,124, 0,123,122
.byte 121,120,119, 0,118,117,116,115,114, 0,113,112,111,110,109,108
.byte 0,107,106,105,104,103, 0,102,101,100, 99, 98, 0, 97, 96, 95
.byte 94, 93, 92, 0, 91, 90, 89, 88, 87, 0, 86, 85, 84, 83, 82, 0
.byte 81, 80, 79, 78, 77, 76, 0, 75, 74, 73, 72, 71, 0, 70, 69, 68
.byte 67, 66, 0, 65, 64, 63, 62, 61, 60, 0, 59, 58, 57, 56, 55, 0
.byte 54, 53, 52, 51, 50, 0, 49, 48, 47, 46, 45, 44, 0, 43, 42, 41
.byte 40, 39, 0, 38, 37, 36, 35, 34, 0, 33, 32, 31, 30, 29, 28, 0
.byte 27, 26, 25, 24, 23, 0, 22, 21, 20, 19, 18, 0, 17, 16, 15, 14
.byte 13, 12, 0, 11, 10, 9, 8, 7, 0, 6, 5, 4, 3, 2, 0, 1
Note the align $100 which we use to avoid crossing page boundaries
and upsetting the timing.
The routine assumes the timer starts at 255, so before using the
routine we should set up the timer like this:
lda #$ff
sta WSYNC
sta TIM64T
TIMER_SETUP 216
lda #50
jsr WaitForScanline
lda #0
jsr WaitForScanline
lda Timer2Scanline,y ; fetch exact scanline
For many games, we’d like to display more than two sprites.
Unfortunately, the VCS hardware is really limited to two distinct
sprites per scanline, unless you get fancy with the NUSIZ register and
other TIA tricks. But if we reprogram the TIA between sprites, we
can get more on the screen – even though we’re still limited to two
sprites on a given scanline.
There are a lot of different ways to tackle this on the VCS, but
we’re going to try for a generalized approach that allows us to
position sprites at any X-Y coordinate, each with its own bitmap and
color table. This is tricky because we can only do so much on each
scanline.
Our approach is to separate the problem into three phases:
1. Sort vertically
2. Position horizontally
3. Display sprite (then repeat steps 2 and 3)
In the Sort phase, we sort all sprites by Y coordinate. We do one sort
pass per frame, so it may take several frames for the sort to stabilize.
In the Position phase, we look at the sprites in Y-sorted order,
looking several lines ahead to see if a sprite is coming up. We then
allocate it to one of the TIA’s two player objects and set its position
using the SetHorizPos method. We can set one or both of the player
objects this way, one at a time.
Figure 28.1: Multiple sprites example
[Diagram labels: Place sprite #0; Set horizontal position player 1; Place sprite #1; Set horizontal position player 0; Draw sprites]
We then loop through the scanlines, fetching pixels and colors for
one or both objects (up to four lookup tables) and setting registers
at the appropriate time. We don’t have time to do much else, so we
don’t look for any new objects to schedule until we’re done with this
loop.
This scheme can only display up to two objects on a given scanline,
so if the system tries to schedule a third, it will be ignored. Also,
the positioning routine takes a few scanlines to complete, so if the
top of a sprite is too close to the bottom of another sprite, the latter
may not be displayed.
To mitigate this, we increment a priority counter when a sprite entry
is missed. In the sort phase, we move those sprites ahead of lower
priority sprites in the sort order. This makes overlapping sprites
flicker instead of randomly disappear. If all goes well, each sprite
will get an equal share of screen time.
28.1 Variables
The XPos0 and YPos0 arrays track the coordinates of each sprite
(the "0" suffix reminds us that this is the address of the first array
element).
The Sorted0 array keeps a list of sprites sorted by vertical position,
top-first. Each entry is the index of the sprite (0-7).
Priority0 is an array that tracks sprites that are missed – we’ll discuss
this later.
As we go down the screen, CurIndex keeps track of which sprite to
look at next (i.e., which entry of the Sorted0 array):
The other variables are used by our sprite kernel, and they keep
pointers to bitmap and color tables for each sprite, as well as the
positions and heights of the next sprites to draw:
28.2 Position
In the Position step, we try to assign the next sprite in the sort order
to one of the two player objects. FindAnotherSprite is the subroutine
that does this:
FindAnotherSprite
GET_APPROX_SCANLINE
clc
adc #MinYDist
sta Scanline
ldx CurIndex
cpx #NSprites
bcs .OutOfSprites ; no more sprites to check
ldy Sorted0,x ; get sprite index # in Y-sorted order
lda YPos0,y ; get Y position of sprite
cmp Scanline ; SpriteY - Scanline
bmi .MissedSprite ; passed it? (or > 127 lines away)
Now that a sprite is starting soon, we need to schedule it to one or
the other of the player objects. First, we check player 1:
lda XPos0,y
ldx SIndx1 ; player 1 available?
bne .Plyr1NotReady ; no, try player 0
Whichever player object we pick, the first step is to set the sprite’s
horizontal offset using the SetHorizPos subroutine – this could use
up to two scanlines:
Then, we set various variables for the player 1 sprite, including Y
position, pointers to bitmap and color maps, and set the height of
the sprite:
lda YPos0,y
sta SIndx1
; Get index into SpriteDataMap (index * 4)
ldx MultBy4,y
; Copy addresses of pixel/color maps to player 1
lda SpriteDataMap,x
sta PData1
lda SpriteDataMap+1,x
sta PData1+1
lda SpriteDataMap+2,x
sta PColr1
lda SpriteDataMap+3,x
sta PColr1+1
; Get the sprite height as the first byte of the color map
ldy #0
lda (PColr1),y
sta SSize1
jmp .SetupDone
28.3 Display
After we set up the sprites, we now enter the display phase. First
we use the WaitForScanline subroutine as described in Chapter 27 to
wait for an exact scanline. We pass it zero, which makes it wait for
the next scanline that can be measured, and returns its value in A:
DrawSprites subroutine
lda #0 ; 0 = wait for next
jsr WaitForScanline
lda Timer2Scanline,y ; lookup scanline #
sta Scanline ; save it
Next, we calculate how many scanlines need to be drawn for each
sprite, starting at the current scanline.
lda SIndx0
beq .Empty0 ; sprite 0 is inactive?
sec
sbc Scanline
clc
adc SSize0
sta SIndx0 ; SIndx0 += SSize0 - Scanline
.Empty0
lda SIndx1
beq .Empty1 ; sprite 1 is inactive?
sec
sbc Scanline
clc
adc SSize1
sta SIndx1 ; SIndx1 += SSize1 - Scanline
.Empty1
Now that we have the scanline counts for each player, we take the
maximum value, and that’s the total number of lines to draw (if it’s
zero, that means there weren’t any sprites to draw):
cmp SIndx0
bpl .Cmp1 ; sindx0 < sindx1?
lda SIndx0
.Cmp1
tax ; X = # of lines left to draw
beq .NoSprites ; X = 0? we’re done
sta WSYNC ; next scanline
.DrawNextScanline
; Make sure player 0 index is within bounds
ldy SIndx0
cpy SSize0
bcs .Inactive0 ; index >= size? (or index < 0)
; Lookup pixels for player 0
lda (PData0),y
; Do WSYNC and then quickly store pixels for player 0
sta WSYNC
sta GRP0
; Lookup/store colors for player 0
lda (PColr0),y
sta COLUP0
.Inactive0
sta WSYNC
lda #0
sta GRP0
sta COLUP0
beq .DrawSprite1 ; always taken due to lda #0
.DrawSprite1
; Make sure player 1 index is within bounds
ldy SIndx1
cpy SSize1
bcs .Inactive1 ; index >= size? (or index < 0)
; Lookup/store pixels and colors for player 1
; Note that we are already 30-40 pixels into the scanline
; by this point...
lda (PData1),y
sta GRP1
lda (PColr1),y
sta COLUP1
.Inactive1
dey
sty SIndx1
dec SIndx0
Repeat until we’ve drawn all the scanlines for this job:
dex
bne .DrawNextScanline
At the end, we free up both player slots by zeroing them out, as well
as cleaning up the player registers:
stx SIndx0
stx SIndx1
stx SSize0
stx SSize1
sta WSYNC
stx GRP0
stx GRP1
.NoSprites
rts
The main kernel loop relies on the timer functions in Chapter 12,
so the first thing we do is set up the timer:
NextFindSprite
jsr FindAnotherSprite
jsr FindAnotherSprite
We defer the WSYNC and HMOVE so we can do them both at once (if two
sprites were scheduled) which saves us a scanline:
jsr DrawSprites
We strobe HMCLR to erase any previous fine offsets, and then we check
INTIM (the timer) to see if we’re far enough down the screen to finish
the loop. If so, we call WaitForScanline so that we end the loop on a
known scanline.
28.5 Sort
The sort routine is called during the VBLANK period. It’s a bubble sort
algorithm, which is pretty simple to implement. The idea is that
you compare successive pairs of entries, and swap them until they
are in order. We just sort the Sorted0 array, not all four object arrays.
The pseudocode looks like this:
We run this for each pair of sprite indices – e.g. if there are 8 sprites,
we run it for indices 0 through 6 (which swap with 1 through 7).
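As a rough illustration of a single pass, here is a Python sketch. It covers only the Y comparison and swap of index entries; the priority handling described earlier is left out, and the function name sort_pass is made up. Only the index list is reordered, the position array is never touched.

```python
def sort_pass(sorted0, ypos0):
    """One bubble-sort pass over the index list; returns True if anything moved."""
    swapped = False
    for i in range(len(sorted0) - 1):          # indices 0..6 for 8 sprites
        a, b = sorted0[i], sorted0[i + 1]
        if ypos0[a] > ypos0[b]:                # out of order: lower sprite listed first
            sorted0[i], sorted0[i + 1] = b, a
            swapped = True
    return swapped
```

Running one pass per frame means a badly shuffled list needs up to seven frames to settle with 8 sprites, but sprites that drift slowly are usually fixed by a single swap.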
The 6502 code is not very complicated, just a bunch of indexed
lookups, comparisons, loads, and stores:
28.6 Improvements
29
Random Number Generation
lda Random
asl
eor Random
asl
eor Random
asl
asl
eor Random
asl
rol Random
lda Random
asl
eor Random
asl
eor Random
asl
asl
rol
eor Random
lsr
ror Random
There is another type of LFSR called a Galois LFSR which is even
more compact:
lda Random
lsr
bcc .NoEor
eor #$d4 ; #%11010100
.NoEor:
sta Random
We used $D4 in our example, but other constants that give you the
full range of 255 unique numbers include: $8E, $95, $96, $A6, $AF,
$B1, $B2, $B4, $B8, $C3, $C6, $D4, $E1, $E7, $F3, and $FA.
The inverse is just as simple: we shift left instead of right, and
use a different constant (the original constant rotated left
1 bit):
lda Random
asl
bcc .NoEor
eor #$a9 ; #%10101001
.NoEor:
sta Random
Since half of the time a Galois LFSR just performs a simple shift,
you may have to iterate them at least twice to get plausible-looking
random values. Because the period of a maximal LFSR is odd, you
can iterate twice and still get the full range of values.
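These claims are easy to check off the console. The following Python sketch models the two 8-bit routines above and counts how long each sequence takes to repeat; the function names are ours, and the model only covers the accumulator and carry, not the rest of the CPU:

```python
def next_random(x, mask=0xD4):
    """LSR, then EOR with the mask if a 1 bit was shifted out."""
    carry = x & 1
    x >>= 1
    return x ^ mask if carry else x


def prev_random(x, mask=0xA9):
    """ASL, then EOR with the rotated mask if a 1 bit was shifted out."""
    carry = x >> 7
    x = (x << 1) & 0xFF
    return x ^ mask if carry else x


def period(step, seed=1):
    """Count the steps until the sequence returns to the seed."""
    x, n = step(seed), 1
    while x != seed:
        x, n = step(x), n + 1
    return n
```

Calling `period(next_random)` walks all 255 nonzero values, and stepping twice per call (`lambda x: next_random(next_random(x))`) has the same period, because 255 is odd.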
You can also extend a Galois LFSR to 16 bits. It’s pretty much the
same as the 8-bit version, except we use the 16-bit constant $d400:
lsr Random+1
ror Random
bcc .NoEor
lda Random+1
eor #$d4
sta Random+1
.NoEor:
rts
And the reverse, which uses the constant $a801 (the previous
constant rotated left by 1 bit):
asl Random
rol Random+1
bcc .NoEor
lda Random
eor #$01
sta Random
lda Random+1
eor #$a8
sta Random+1
.NoEor:
rts
These “magic numbers” give us the full range of values (255 for the
8-bit version and 65,535 for the 16-bit version) but other constants
will result in shorter periods. LFSRs are also used in the TIA chip to
generate sound, and we’ll see that they are configurable in a similar
way.
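To see that the 16-bit reverse routine really undoes the forward one, we can model both on a pair of bytes, one operation per assembly instruction. This is only a sketch; `lo` and `hi` stand for Random and Random+1:

```python
def next_random16(lo, hi):
    carry = hi & 1                    # lsr Random+1
    hi >>= 1
    new_carry = lo & 1                # ror Random
    lo = (lo >> 1) | (carry << 7)
    if new_carry:                     # bcc .NoEor
        hi ^= 0xD4
    return lo, hi


def prev_random16(lo, hi):
    carry = lo >> 7                   # asl Random
    lo = (lo << 1) & 0xFF
    new_carry = hi >> 7               # rol Random+1
    hi = ((hi << 1) & 0xFF) | carry
    if new_carry:                     # bcc .NoEor
        lo ^= 0x01
        hi ^= 0xA8
    return lo, hi
```

Feeding any state through `next_random16` and then `prev_random16` gives back the original pair, which is what lets us walk the sequence in both directions.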
Another cheap source of pseudo-randomness is to just read bytes
from the code in the ROM directly, say starting at $F000. This is
sometimes used to provide noisy backgrounds, since the pattern of
bytes is usually random enough to trick the eye. It’s not great for
procedural generation, though, because many values will be over-
or under-represented.
30
Procedural Generation
TIP: To access an example with the sample code described
in this chapter, access the Procedural Generation example
available at 8bitworkshop.com. Not only can you modify the
code and see your changes in real-time, but you can also use
arrow keys to navigate rooms!
We’ll use a Galois LFSR (as described in the last chapter) to modify
the value forward and backward:
NextRandom SUBROUTINE
lsr
bcc .NoEor
eor #$d4
.NoEor:
rts
PrevRandom SUBROUTINE
asl
bcc .NoEor
eor #$a9
.NoEor:
rts
We’ll start at room number 1. When we move down off the bottom
of the screen, we’ll go to the next room number by iterating the
LFSR two times. When we move up, we’ll go to the previous room
by reverse-iterating two times. Going left and right will teleport
seven rooms back or seven rooms forward.
These routines will handle changing rooms, using the Y register to
count moving by multiple rooms:
MoveNextRoom
lda RoomType
jsr NextRandom
dey
sta RoomType
bne MoveNextRoom
rts
MovePrevRoom
lda RoomType
jsr PrevRandom
dey
sta RoomType
bne MovePrevRoom
rts
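Here is a small Python model of these two routines, assuming Y is loaded with the number of LFSR steps before the call (two for a move up or down, per the description above). The `move_rooms` helper is our own name:

```python
def next_random(x):
    return (x >> 1) ^ 0xD4 if x & 1 else x >> 1


def prev_random(x):
    return ((x << 1) & 0xFF) ^ 0xA9 if x & 0x80 else (x << 1) & 0xFF


def move_rooms(room, count, step):
    """Apply step once per DEY/BNE iteration, starting with Y = count."""
    while True:
        room = step(room)
        count = (count - 1) & 0xFF    # dey
        if count == 0:                # bne falls through
            return room
```

Because the reverse routine is an exact inverse, walking down and then back up always lands in the room we started from. Note that, like the assembly, a count of 0 wraps around and runs 256 steps.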
We’ll use the bits of the room number to define where walls will be
in the room. The plan is to divide the playfield into three sections:
• Top (3x2 playfield bytes)
• Middle (3x3 playfield bytes)
• Bottom (3x2 playfield bytes)
We’ll store various “wall components” in tables, and use the bits of
the room number to index into these tables. The first two bits are
used to choose between four different top sections:
BuildRoom
lda RoomType
and #3
jsr MulBy3ToX
lda PFRoomTop0+0,x
sta PFData+0
lda PFRoomTop0+1,x
sta PFData+1
lda PFRoomTop0+2,x
sta PFData+2
lda PFRoomTop1+0,x
sta PFData+3
lda PFRoomTop1+1,x
sta PFData+4
lda PFRoomTop1+2,x
sta PFData+5
Then the next two bits are used for the middle:
lda RoomType
ror
ror
and #3
jsr MulBy3ToX
lda PFRoomMid0+0,x
sta PFData+6
lda PFRoomMid0+1,x
sta PFData+7
lda PFRoomMid0+2,x
sta PFData+8
lda PFRoomMid1+0,x
sta PFData+9
lda PFRoomMid1+1,x
sta PFData+10
lda PFRoomMid1+2,x
sta PFData+11
lda PFRoomMid2+0,x
sta PFData+12
lda PFRoomMid2+1,x
sta PFData+13
lda PFRoomMid2+2,x
sta PFData+14
The bottom section is the reflection of the top section of the next
room, so that the openings match up. (The left and right rooms will
always have compatible openings.) We can easily call NextRandom
to fetch the next room’s value:
lda RoomType
jsr NextRandom
pha
and #3
jsr MulBy3ToX
lda PFRoomTop1+0,x
sta PFData+15
lda PFRoomTop1+1,x
sta PFData+16
lda PFRoomTop1+2,x
sta PFData+17
lda PFRoomTop0+0,x
sta PFData+18
lda PFRoomTop0+1,x
sta PFData+19
lda PFRoomTop0+2,x
sta PFData+20
We also set the room colors, using this room’s number, and since
we’ve already got it, the next room’s number:
lda RoomType
and #$f0
sta COLUBK ; background color
pla ; next random value, stored
ora #$08
sta COLUPF ; foreground color
rts
MulBy3ToX
sta Temp
asl ; X*2
clc
adc Temp ; (X*2)+X
tax ; -> X
rts
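The bit-slicing in BuildRoom can be summarized in a few lines of Python. This sketch follows the listing as printed (a single NextRandom call for the room below); the function names are ours, and the returned offsets are the values MulBy3ToX would leave in X:

```python
def next_random(x):
    return (x >> 1) ^ 0xD4 if x & 1 else x >> 1


def room_sections(room_type):
    """Table offsets (already multiplied by 3) for top, middle, bottom."""
    top = (room_type & 3) * 3
    middle = ((room_type >> 2) & 3) * 3
    bottom = (next_random(room_type) & 3) * 3   # top section of the next room
    return top, middle, bottom


def room_colors(room_type):
    """(COLUBK, COLUPF) values derived from this room and the next one."""
    background = room_type & 0xF0
    foreground = next_random(room_type) | 0x08
    return background, foreground
```

Since every value comes from the room number alone, revisiting a room always rebuilds the same walls and colors without storing any map data.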
Then we’ll display the playfield using the two-line kernel as de-
scribed in Chapter 16, but we’ll use another table that maps a
section of the screen into the 21 bytes that define the room. We’ll
call this routine every time we need a new playfield byte:
FetchPlayfield
dec PFOfs
ldx PFOfs
ldy PFOffsets,x ; get index into PFData array
lda PFData,y ; load playfield byte
rts
That’s pretty much it, though we can add a collision routine that
makes the player stop when hitting walls. It does this by detecting
a collision between player and playfield, and if one is detected it
sets the player to its previous position – so the player appears to
“wiggle” between frames:
31
Drawing Lines
One thing the VCS wasn’t really designed for (among many, many
other things) is drawing arbitrary lines. But nothing’s stopped us
yet, so we’re going to draw some. There are times when a line
comes in handy, like the vine in Pitfall!, or the proton beams in
Ghostbusters.
We’re going to define a line by four components: its starting and
ending Y coordinate, its starting X coordinate, and a slope:
Figure 31.1: A line drawn with player objects
the entire lower byte to be 1/256th of its integer value. It’s like
putting an invisible decimal point between each 8 bits of the 16-
bit value, i.e. between the two bytes.
HiByte LoByte
-------- -------- N = integer part
NNNNNNNN.xxxxxxxx x = fractional part
Since the slope is fractional, we’ll need to track the fractional part
of the line’s X position:
lda X1 ; starting X
ldx #2 ; missile 0
jsr SetHorizPos
sta WSYNC
sta HMOVE ; apply fine offsets
From now on, we’re going to use the HMOVE registers to move the
missile, so we don’t need to actually track its X coordinate in
memory. We’ll look up the HMOVE values in a table that ranges from -7
to +8 pixels (as seen in Figure 9.1):
HMoveTable
hex 7060504030201000f0e0d0c0b0a09080
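The table does not have to be typed by hand. As a sketch (the helper name is ours), each entry is the negated movement placed in the upper nibble, which this Python snippet uses to regenerate the bytes above:

```python
def hmove_value(dx):
    """HMxx register value that moves an object dx pixels (-7..+8, + is right)."""
    assert -7 <= dx <= 8
    return ((-dx) & 0x0F) << 4


# Index 0 is -7 pixels, index 7 is no movement, index 15 is +8 pixels.
HMOVE_TABLE = bytes(hmove_value(dx) for dx in range(-7, 9))
```

Looking up `HMOVE_TABLE[dx + 7]` then mirrors what the kernel does with the X register.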
We also have a table that defines the width of the missile for each
X movement. If the X coordinate moves by just -1, 0, or +1 in a
scanline, we keep the missile just 1 pixel wide. If it moves more
than that, we progressively expand it to make the line look solid.
DotWidths
hex 40403030201000000010203030404040
We track the Y position in the Y register starting at 0. The first step
in the loop is to see if we’re within the upper and lower Y values:
ldy #0
sty XFrac ; reset X fractional part
ScanLoop
cpy Y1
bcc NoLine ; out of bounds (< Y1)?
cpy Y2
bcs NoLine ; out of bounds (>= Y2)?
The NoLine branch just does a WSYNC, hides the missile, and goes to
the next scanline:
NoLine
sta WSYNC
lda #0
sta ENAM0 ; hide missile
jmp NextScan
lda XFrac
clc
adc Slope ; this sets carry flag
sta XFrac
The Carry flag will be set if the X fractional part exceeds 255. Now
we add the high byte of the slope to the Carry flag, plus 7 so that we
can index our two tables:
lda Slope+1
adc #7 ; 7 + carry flag
tax ; -> X
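The two additions above amount to one step of 8.8 fixed-point arithmetic. A minimal Python sketch of a single scanline, with `slope` passed as a 16-bit value (the function name is ours):

```python
def step_line(xfrac, slope):
    """Add the slope's low byte to the fraction, then build the table index.

    Returns (new_xfrac, index); index 7 means "no movement".
    """
    slope_lo, slope_hi = slope & 0xFF, (slope >> 8) & 0xFF
    total = xfrac + slope_lo
    carry = total >> 8                      # 1 if the fraction overflowed
    index = (slope_hi + 7 + carry) & 0xFF   # lda Slope+1 / adc #7
    return total & 0xFF, index
```

With a slope of $0180 (1.5 pixels per scanline) the index alternates between 8 and 9, so the missile moves 1 pixel, then 2, and averages out to 1.5.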
Now we can index our two tables. One looks up the HMOVE value
so we can move the missile, and the other looks up the width of the
missile so we have a solid-looking line if we move more than 1 pixel:
Now we WSYNC and apply the register values, and also enable the
missile in case it is hidden. Note that we HMOVE before we set the HMM0
register – this is because we want to delay the line’s movement until
the next scanline, so that the missile’s width fills up the gap:
NextScan
sta WSYNC
sta HMOVE ; apply moves on previous scanline
ldx #2
stx ENAM0 ; enable missile
ldx Temp
stx NUSIZ0 ; set missile width
sta HMM0 ; set HMM0 for next scanline
iny
cpy #192
bcc ScanLoop ; any more scanlines?
beq DoneLine ; branch always taken
lda XFrac
clc
adc Slope
sta XFrac ; add slope to X fractional part
lda #0
sta HMCLR ; clear HMOVE registers
bcc .noMove
lda HMoveDir ; HMOVE direction, either #$10 or #$f0
.noMove:
sta HMM0 ; store HMOVE register
32
The Sound and Music
Figure: Channel 0 registers (AUDC0, AUDF0, AUDV0) and the audio output
In our main frame loop, we call the music subroutine during the
VBLANK period, once for each channel/track:
TIMER_SETUP 37
ldx #0
jsr MusicFrame
ldx #1
jsr MusicFrame
TIMER_WAIT
This should take no more than a few scanlines to complete in
the worst case. The MusicFrame routine first decrements the note’s
duration, checking to see if it is finished playing:
MusicFrame
dec chan0dur,x ; decrement note duration
bpl SkipLoadNote ; only load if duration < 0
If a note is currently playing, we grab its pitch and set the
appropriate TIA register. We also calculate the volume as
(duration_remaining/2) and stuff that into the TIA volume register:
PlayNote
lda chan0note,x
sta AUDF0,x
lda chan0dur,x
clc
ror
sta AUDV0,x
rts
TryAgain
ldy pat0idx,x ; load index into pattern table
lda Patterns,y ; load pattern code
beq NextPattern ; end of pattern?
If the byte loaded is zero, our pattern has ended and we have to go
to NextPattern which loads the next pattern in the track. Otherwise,
we continue:
Note that instead of 5 RORs we could also do 4 ROLs, since the rotate
instructions shift through the carry bit and wrap around to the
other side.
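The equivalence is easy to check if we treat A and the carry flag as a single 9-bit ring. A small Python sketch (the helper names are ours):

```python
def ror(a, c):
    """Rotate right through carry: returns (new A, new carry)."""
    return (a >> 1) | (c << 7), a & 1


def rol(a, c):
    """Rotate left through carry: returns (new A, new carry)."""
    return ((a << 1) & 0xFF) | c, a >> 7


def repeat(op, n, a, c):
    """Apply a rotate instruction n times."""
    for _ in range(n):
        a, c = op(a, c)
    return a, c
```

Five right rotations and four left rotations add up to a full trip around nine positions, so both leave A and the carry in the same state. Either way, the top three bits of the original byte end up in the bottom three bits of A, ready for the AND.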
This decodes the note’s duration, by shifting right five bits with ROR
and isolating the first three bits with AND. We use BEQ to check if
the duration is zero, since this indicates a special case with a TONE
command. Otherwise, we continue:
tay ; Y = duration
lda DurFrames,y ; look up in duration table
sta chan0dur,x ; save note duration
pla ; pop saved value into A
and #$1f ; extract first 5 bits
sta chan0note,x ; store as note value
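Putting the pieces together, each pattern byte packs a duration index in its top three bits and a note in its low five. The following Python sketch decodes one byte the way the surrounding listings do; the tuple labels are our own invention:

```python
def decode_pattern_byte(byte):
    """Classify one pattern byte as a note, a tone change, or end of pattern."""
    if byte == 0:
        return ("end_of_pattern",)
    duration_index = byte >> 5          # top 3 bits
    if duration_index == 0:             # special case: TONE command
        tone = byte & 0x0F
        return ("tone", tone) if tone else ("end_of_pattern",)
    return ("note", duration_index, byte & 0x1F)
```

The duration index is not a frame count by itself; the player looks it up in the DurFrames table.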
NoteTone
pla
and #$f
beq NextPattern
sta AUDC0,x
jmp TryAgain
NextPattern
ldy trk0idx,x
lda Track0,y
beq ResetTrack
sta pat0idx,x
inc trk0idx,x
And if the next pattern offset was also zero, we’d reset both tracks
back to the beginning:
ResetTrack
lda #0
sta trk0idx
lda #Track1-Track0
sta trk1idx
Improvements:
• The NoteTone subroutine just tapers off the volume linearly. We
could have a different “envelope” that tapers upwards then
downwards, or any other shape driven by a table.
33
Pseudo-3D: Sunsets and Starry Nights
One easy way to convince the player that they’re gazing into the
distance is to show them a pretty sunset. The team at Activision
specialized in sunsets, starting with the title Barnstorming. A
sunset-colored stripe even became part of their logo.
It turns out the VCS is pretty good at sunsets, since it has 128
different colors and can draw horizontal lines pretty well.
Figure 33.1: Sunset with clouds and mountains
33.2 Mountains
Now, after the sky is done, we’ll draw a short seven-line segment of
mountains. This will give an even more interesting look. We’ll again
use the playfield for the mountains, and make them a flat color.
We’ll keep incrementing the sky color on every scanline to make
the sunset look like it’s peeking out from behind the mountains.
We’ll also have the mountains change colors during the game-day,
using a lookup table with only 16 entries (as opposed to 64 for the
sky and clouds). This is similar to the previous loop, except we do
one scanline at a time. We also have a 16-entry table for the ground
color, which we’ll set after we draw the mountains.
lda TimeOfDay+1
lsr
lsr ; divide time-of-day by 4
and #$f ; keep in range 0-15
tax ; -> X
lda MountainColors,x ; load mountain color
sta COLUPF ; set foreground
lda GroundColors,x ; load ground color
pha ; save it for later
ldx #0
stx PF0
stx PF1 ; to avoid artifacts, we have to
stx PF2 ; clear previous clouds
.MtnLoop
lda SunsetColors,y ; get sunset color
sta WSYNC ; start scanline
sta COLUBK ; set background color
lda MtnPFData0,x ; load mountains -> playfield
sta PF0
lda MtnPFData1,x
sta PF1
lda MtnPFData2,x
sta PF2
iny ; next sky color
tya
and #$3f ; keep sky color in range 0-63
tay ; sky color -> Y
inx
cpx #7 ; only 7 scanlines for the mountains
bne .MtnLoop
pla ; restore ground color
sta COLUBK ; set background
Since drawing the sky is fun, let’s try to do a night sky with stars.
This takes advantage of a TIA “feature” involving the ball object.
Whenever you reset the ball position with RESBL, the TIA draws it
immediately at the current TIA color clock. You can draw as many
balls as you want on a given scanline, and if you strobe RESBL in a
loop across multiple scanlines, you can make a pattern of dots.
For stars, we’re going to wait a “random” time between each ball.
We don’t need a lot of variation, but just enough to make sure
the stars don’t line up in any discernable pattern. So we generate
pseudorandom numbers by reading the code in ROM, test/shift a
few of their bits, and use branch instructions (which take three
cycles if the branch is taken, two if not) to add between zero and
four additional cycles between each star. This gives us a nice star
density but also adds enough spacing that the stars look randomly
distributed.
Figure 33.2: Pseudo-3d road with stars
To figure out when to stop drawing stars, we’ll read the timer
register and stop when it goes below a predetermined value.
DrawNight subroutine
lda #6
sta ENABL ; enable ball
sta COLUPF ; set ball color
ldy #0
.MoreStars
sta RESBL ; strobe the ball to display a star
adc Start,y ; "randomize" the A register
bmi .Delay1 ; +1 cycle if bit 7 set
.Delay1
ror ; shift lo bit into carry
bcs .Delay2 ; +1 cycle if bit 0 set
.Delay2
ror ; shift lo bit into carry
bcs .Delay3 ; +1 cycle if bit 1 set
.Delay3
ror ; shift lo bit into carry
bcs .Delay4 ; +1 cycle if bit 2 set
.Delay4
iny ; next "random" number
ldx INTIM ; load timer
cpx #$89 ; timer says we’re done?
bcs .MoreStars ; nope, make more stars
lda #0
sta ENABL ; disable ball
rts
You can tweak the various delay branches until you get a star
pattern that looks good to you.
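When tweaking, it helps to know how many extra cycles a given value of A produces. This Python sketch counts the taken branches in the listing above; it ignores any page-crossing penalties, and the function name is ours:

```python
def extra_cycles(a):
    """Extra cycles added by the four delay branches for a given A value.

    BMI tests bit 7; the three ROR/BCS pairs test bits 0, 1, and 2.
    """
    return sum((a >> bit) & 1 for bit in (7, 0, 1, 2))
```

Only four bits of each "random" byte matter, so the spacing between stars varies by zero to four cycles on top of the loop's fixed cost.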
So far so good; we’ve got a nice little gradated sky with clouds and
mountains, and a solid-colored ground. In the next chapter we’ll
learn how to build a curving road disappearing to the horizon.
TIP: To see this code and the code from Chapter 34: Driving
Down the Road in action and directly manipulate it in real-time,
check out the Pseudo 3D example on 8bitworkshop.com.
34
Pseudo-3D: Driving Down the Road
You may have seen games on the VCS like Activision’s Enduro and
Atari’s Pole Position that are from the perspective of a camera above
and behind a car. The car is driving on a track that disappears into
the horizon. It’s not sorcery, just some clever manipulation of TIA
graphics objects.
We already used the playfield to draw clouds and mountains, but
we’re going to now leave that alone and use the two missiles and
ball objects. With these, we’ll draw the two shoulders of the road,
and also the dashed center line.
XPos = 72;
TPos = TrackFrac;
for (int i=31; i>=0; i--) {
XPos += XVel;
RoadX[i] = XPos>>8;
XVel += TrackData[TPos>>8];
TPos += TrackLookahead;
TrackLookahead++;
}
.CurveLoop
; Modify X position
; XPos += XVel (16 bit add)
lda XPos
clc
adc XVel
sta XPos
lda XPos+1
adc XVel+1
sta XPos+1
sta RoadX0,x ; store in RoadX0 array
; Modify X velocity (slope)
; XVel += TrackData[TPos]
ldy TPos+1 ; get track data offset
lda TrackData,y ; load track curve data
clc ; clear carry for ADC
bmi .CurveLeft ; track slope negative?
adc XVel
sta XVel
lda XVel+1
adc #0 ; carry +1
jmp .NoCurveLeft
.CurveLeft
adc XVel
sta XVel
lda XVel+1
sbc #0 ; carry -1
nop ; make the branch timings the same
.NoCurveLeft
sta XVel+1
; Advance TPos (TrackData index)
; TPos += TrackLookahead
lda TPos
clc
adc TrackLookahead
sta TPos
lda TPos+1
adc #0
sta TPos+1
; Go to next segment
inc TrackLookahead ; see further along track
dex
bpl .CurveLoop
lda TrackFrac
asl
asl ; TrackFrac * 4
sta ZOfs ; for animated stripe
ldx #0
.RoadLoop
lda RoadColors,x ; color of sides and center line
sta COLUP0
sta COLUP1
sta COLUPF
lda RoadX0+1,x ; get next X coordinate
sec
sbc RoadX0,x ; subtract this X coordinate
clc
adc #7 ; add 7
tay ; -> Y
lda HMoveTable-2,y ; left side biased -2
sta HMM0 ; -> missile 0 fine offset
lda HMoveTable,y ; center line
sta HMBL ; -> ball fine offset
lda HMoveTable+2,y ; right side biased +2
sta HMM1 ; -> missile 1 fine offset
sta WSYNC
sta HMOVE ; apply fine offsets
sta WSYNC
lda ZOfs
sec
sbc INTIM
sta ZOfs ; ZOfs -= timer
rol
rol
rol ; shift left by 3
sta ENABL ; enable ball (bit 1)
sta WSYNC
lda RoadWidths,x ; lookup register for missile size
sta NUSIZ0
sta NUSIZ1
sta WSYNC
inx
cpx #NumRoadSegments-1
bne .RoadLoop
PrevGenCur = GenCur
GenCur += GenDelta
if ((GenCur >= GenTarget && GenTarget >= 0) ||
(GenCur < GenTarget && GenTarget < 0)) {
GenTarget = random number from -31..32
if (GenTarget - GenCur >= 0)
GenDelta = random number from 1..15
else
GenDelta = random number from -15..-1
GenCur = PrevGenCur
}
GenTrack subroutine
; Shift the existing track data one byte up
; (a[i] = a[i+1])
ldx #0
.ShiftTrackLoop
lda TrackData+1,x
sta TrackData,x
inx
cpx #TrackLen-1
bne .ShiftTrackLoop
; Modify our current track value and
; see if it intersects the target value
lda GenCur
clc
adc GenDelta
cmp GenTarget
beq .ChangeTarget ; target == cur?
bit GenTarget ; see if target >=0 or <0
bmi .TargetNeg
bcs .ChangeTarget ; target>=0 && cur>=target
bcc .NoChangeTarget
.TargetNeg
bcs .NoChangeTarget ; target<0 && cur<target
; Generate a new target value and increment value,
; and make sure the increment value is positive if
; the target is above the current value, and negative
; otherwise
.ChangeTarget
jsr NextRandom ; get a random value
and #$3f ; range 0..63
sec
sbc #$1f ; range -31..32
sta GenTarget ; -> target
cmp GenCur
bmi .TargetBelow ; current > target?
jsr NextRandom ; get a random value
and #$f ; mask to 0..15
jmp .TargetAbove
.TargetBelow
jsr NextRandom
ora #$f0 ; mask to -16..-1
.TargetAbove
ora #1 ; to avoid 0 values
sta GenDelta ; -> delta
lda GenCur
.NoChangeTarget
; Store the value in GenCur, and also
; at the end of the TrackData array
sta GenCur
sta TrackData+TrackLen-1
rts
35
Bank Switching
35.1 Trampolines
The trampoline would look like this:
BankSwitch
pha ; push hi byte
tya ; Y -> A
pha ; push lo byte
bit $1FF8,x ; do the bank switch
rts ; return to target
BankResetStart
ldy #<(Start-1)
lda #>(Start-1)
ldx #0
; ... execution continues with trampoline
Then we just ensure that each bank contains an identical RESET vec-
tor in $FFFC/FFFD that points to this routine, and that the BankResetStart/B
code is present at the same location in all banks.
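The reason we push the target address minus one is that RTS adds one to whatever it pulls off the stack. A tiny Python model of the trampoline makes this concrete; the function names are ours, and the hotspot address follows the `bit $1FF8,x` instruction in the listing:

```python
def rts_target(stack):
    """RTS pulls the low byte, then the high byte, and adds one."""
    lo = stack.pop()
    hi = stack.pop()
    return (((hi << 8) | lo) + 1) & 0xFFFF


def bank_switch(target, bank):
    """Model of the trampoline: returns (hotspot address read, new PC)."""
    pushed = (target - 1) & 0xFFFF
    stack = [pushed >> 8, pushed & 0xFF]   # pha hi byte, then pha lo byte
    hotspot = 0x1FF8 + bank                # bit $1FF8,x
    return hotspot, rts_target(stack)
```

By the time RTS executes, the read of the hotspot has already swapped banks, so the CPU fetches its next instruction from the target address in the new bank.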
In a real program, you’d probably use a macro to make this stuff
foolproof. The example in subsection 35.4 demonstrates this. You
can also check out the Bankswitching example available in the
8bitworkshop emulator.
NOTE: When bankswitching, we always use a read instruction (BIT
and CMP work well, or even the undocumented nop aaaa instruction)
because write instructions may cause bus conflicts.
You might see statements in bankswitched code that look like this:
;;; BANK 0
org $1000
rorg $F000
;;; BANK 1
org $2000
rorg $F000
We’ve seen ORG before, but not RORG. ORG means origin and RORG means
relocatable origin. ORG affects where the code is physically placed in
the ROM image, but RORG is where the code thinks it’s placed. In most
VCS bankswitching methods, the ORGs will be evenly spaced and the
RORGs will be identical.
35.4 Bankswitching Example
processor 6502
include "vcs.h"
include "macro.h"
include "xmacro.h"
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
seg.u Variables
org $80
Temp .byte
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
seg Code
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; BANK 0
org $1000
rorg $F000
;----The following code is the same on both banks----
Start
; Ensure that bank 0 is selected
lda #>(Reset_0-1)
ldy #<(Reset_0-1)
ldx #0
BankSwitch
BANK_SWITCH_TRAMPOLINE
;----End of bank-identical code----
Reset_0
CLEAN_START
lda #$30
sta COLUBK ; make the screen red
bit INPT4 ; test button
bmi Reset_0 ; button not pressed, repeat
; Switch to Bank 1 routine
lda #>(Main_1-1)
ldy #<(Main_1-1)
ldx #1
jmp BankSwitch
; Bank 0 epilogue
org $1FFA
rorg $FFFA
.word Start ; NMI
.word Start ; RESET
.word Start ; BRK
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; BANK 1
org $2000
rorg $F000
;----The following code is the same on both banks----
Start
; Ensure that bank 0 is selected
lda #>(Reset_0-1)
ldy #<(Reset_0-1)
ldx #0
BankSwitch
BANK_SWITCH_TRAMPOLINE
;----End of bank-identical code----
Main_1
inc Temp
lda Temp
sta COLUBK ; make rainbows
bit INPT4 ; test button
bpl Main_1 ; button is pressed, repeat
BANK_SWITCH 0,Reset_0
; Bank 1 epilogue
org $2FFA
rorg $FFFA
.word Start ; NMI
.word Start ; RESET
.word Start ; BRK
36
Wavetable Audio
The VCS can produce a variety of shrill and flatulent noises which
work surprisingly well for sound effects, but unless you’re a fan
of “flat twos” and the Phrygian mode, you may not be completely
satisfied with its musical abilities.
Activision’s Pitfall 2 was the zenith of VCS technical achievements
in 1984. It featured a custom “Display Processor Chip” (DPC)
inside the cartridge, to which the CPU would offload several CPU-
intensive functions like sprite calculation. Another thing it would
do is generate three-voice music and pipe it through a single VCS
sound channel.
It turns out we can do the same thing the DPC chip does with music,
except we’ll use most of our CPU time doing it.
lda Cycle0Lo
clc
adc Delta0Lo
sta Cycle0Lo
lda Cycle0Hi
adc Delta0Hi
and #$1F
sta Cycle0Hi ; Cycle = (Cycle+Delta) & 0x1f00
tay ; hi byte -> Y
lda Wavetable,y ; lookup sample in wavetable
sta AUDV0 ; store in audio volume register
We can mix two voices for each TIA audio channel for a total of four
simultaneous voices. This takes the CPU about 140 cycles, which
is almost two scanlines, so when generating four-voice wavetable
audio, we don’t have time left over to do much video, unfortunately.
It also gives us an upper frequency of 7860 Hz.
Our wavetable is 32 bytes long. This table defines a simple triangle-
shaped wave, but it could be a sine wave or any other form:
Wavetable
hex 00010203 04050607 08090a0b 0c0d0e0f
hex 0f0e0d0c 0b0a0908 07060504 03020100
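The per-voice update is a classic phase accumulator. Here is a Python sketch of one voice using the table above; `sample_rate` is a parameter because the real rate depends on how often your kernel runs the update, and the function names are ours:

```python
WAVETABLE = bytes.fromhex(
    "000102030405060708090a0b0c0d0e0f"
    "0f0e0d0c0b0a09080706050403020100"
)


def next_sample(cycle, delta):
    """Advance the phase by delta; return (new_cycle, sample for AUDV0)."""
    cycle = (cycle + delta) & 0x1FFF      # 5 index bits + 8 fraction bits
    return cycle, WAVETABLE[cycle >> 8]


def frequency(delta, sample_rate):
    """Output frequency when next_sample runs sample_rate times per second."""
    return delta * sample_rate / 0x2000
```

A delta of $100 steps through exactly one table entry per update, so the wave repeats every 32 updates; smaller deltas linger on each entry and lower the pitch.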
We can use a precomputed table of delta values for each note in the
chromatic scale:
align $100
NoteDeltas
word 9, 9, 10, 10, 11, 11, 12, 13, 14, 14, 15, 16
word 17, 18, 19, 20, 22, 23, 24, 26, 27, 29, 30, 32
word 34, 36, 38, 41, 43, 46, 48, 51, 54, 57, 61, 65
word 68, 72, 77, 81, 86, 91, 97, 102, 108, 115, 122, 129
word 137, 145, 153, 163, 172, 182, 193, 205, 217, 230, 244, 258
word 273, 290, 307, 325, 344, 365, 387, 410, 434, 460, 487, 516
word 547, 579, 614, 650, 689, 730, 773, 819, 868, 920, 974, 1032
word 1093, 1159, 1227, 1300, 1378, 1460, 1546, 1638, 1736, 1839, 1948, 2064
word 2187, 2317, 2455, 2601, 2755, 2919, 3093, 3277, 3472, 3678, 3897, 4128
word 4374, 4634, 4910, 5202, 5511, 5839, 6186, 6554, 6943, 7356, 7793, 8257
word 8748, 9268, 9819, 10403, 11022, 11677, 12371, 13107
ldy #48
lda NoteDeltas,y
sta Delta0Lo
lda NoteDeltas+1,y
sta Delta0Hi
All we’ve done here is play a droning infinite chord, but you could
extend this code to make a music player by loading different notes
at appropriate intervals, as we do in the Wavetable Sound example
in the 8bitworkshop emulator. Displaying graphics at the same time
would be tricky to say the least, but dedicated Atari homebrew
authors have done it (look for a cartridge called Stella’s Stocking
online).
37
Paddles
The paddles are potentiometers (knobs) that travel 330 degrees and
have a single button. The VCS supports up to four of them. There’s
really no equivalent device on most standard game controllers or
keyboards, but here’s how to read them anyway.
First, reading the switches on each paddle is just like reading the
joysticks/switches, as we did in Chapter 19. There are 4 bits in the
SWCHA register, one for each paddle:
Just like the joysticks, the bit is 0 if the paddle button is pressed, 1
otherwise.
Reading the potentiometer (knob) value is a little more compli-
cated. We would like a single number that reads 0 when the paddle
is turned all the way counter-clockwise, and at its maximum (say,
255 or $FF) when turned all the way clockwise. But that’s not how it
works on the VCS.
The paddles are connected to a capacitor, which charges at different
rates depending on the position of the potentiometer. You read the
paddle position by measuring the time it takes for the potentiometer
to discharge. Oddly enough, the capacitor is controlled by the
VBLANK register:
VERTICAL_SYNC
lda #$82
sta VBLANK ; turn off video; dump paddles to ground
TIMER_SETUP 37
TIMER_WAIT
lda #0
sta VBLANK ; turn on video; remove ground dump
TIMER_SETUP 192
.Loop
lda INTIM ; get timer value
beq .Exit
bit INPT0 ; paddle discharged?
bpl .Discharged ; yes, store value
.byte $2c ; skip next insn (BIT opcode)
.Discharged
sta Paddle1 ; store paddle value
; ... draw video ...
jmp .Loop
.Exit
Note that this loop doesn’t draw anything, it just checks the paddle
position continuously until the bottom of the frame. If you wanted
to add graphics, you’d have to add STA WSYNCs and other stuff, but
keep the paddle-checking code (you could make it a macro, too).
If the paddle is turned all the way right, it will discharge almost
immediately. If centered, it will take about 190 scanlines to
discharge. If turned all the way left, it will take about 380 scanlines
– way more than the number of scanlines in a frame!
This means you not only have to check paddles during the 192
scanlines of your visible frame, but all through the overscan,
VSYNC and VBLANK periods too – and measuring a single paddle
Hex Bits Used
Addr Name 76543210 Description
38 INPT0 x....... Dumped Input Port 0
39 INPT1 x....... Dumped Input Port 1
3A INPT2 x....... Dumped Input Port 2
3B INPT3 x....... Dumped Input Port 3
Table 37.2: Paddle Registers
38
Illegal Opcodes
Translating the 8-bit opcode for a 6502 instruction into actions that
the CPU can perform is called decoding, and it requires a fair bit
of silicon to pull off. Not every opcode is a valid instruction –
some instructions are considered “illegal,” and the chip designers
decided to save silicon by not explicitly preventing them from
executing.
Most of these illegal instructions don’t crash the CPU, but result
in odd combinations of other instructions – if this was David
Letterman, they might be on the segment “Stupid CPU Tricks!”
However, some of them can be useful in some situations. It’s
unlikely that early VCS developers took advantage of them, because
they were probably worried about compatibility with future hard-
ware. But if you are willing to break the official rules, you can save
a few cycles where there’s no other good option. (Note that some
recent Atari clones, like the Flashback 2, are reported not to support
these instructions, so caveat emptor.)
A lot of these instructions combine two different 6502 operations,
for instance:
SAX - Performs a bitwise AND with A and X, then stores the result. No
flags are affected.
LAX - LDA then TAX. All addressing modes are available except for
immediate.
ANC - AND with immediate mode operand then copy bit 7
(Negative/Sign) to Carry.
ASR - AND with immediate mode operand then LSR.
ARR - AND then ROR. Sets Carry and Overflow bits strangely.
SBX - X = (A AND X)-#operand. Sets Negative, Zero, Carry.
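If the one-line descriptions are hard to picture, here they are as small Python functions. This is a sketch that follows the summaries above; it models only the register results and the carry flag, and leaves out ARR because of its unusual flag behavior:

```python
def sax(a, x):
    """Value written to memory; no flags are affected."""
    return a & x


def lax(value):
    """Load the same value into both registers: returns (A, X)."""
    return value, value


def anc(a, operand):
    """AND, then copy bit 7 into carry: returns (A, carry)."""
    a &= operand
    return a, a >> 7


def asr(a, operand):
    """AND, then LSR: returns (A, carry)."""
    a &= operand
    return a >> 1, a & 1


def sbx(a, x, operand):
    """X = (A AND X) - operand, carry set when no borrow: returns (X, carry)."""
    diff = (a & x) - operand
    return diff & 0xFF, int(diff >= 0)
```

Each of these replaces two or three official instructions, which is where the cycle savings come from.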
lda #SpriteHeight
dcp LinesLeft
bcs SkipDraw
39
Precise Pitch via Duty Cycling
We learned how to play music on the VCS in Chapter 32. Since each
of the two sound channels can generate 96 possible frequencies (32
for each of the 3 base clocks), many notes were out of tune. If we’re
willing to spend a little more CPU time, we can get more accurate
pitch.
We accomplish this by duty cycling (or modulating) the frequency
divisor for each channel. In other words, we switch between two
adjacent frequency values, potentially lingering on one value longer
than the other. If this is done rapidly, the human ear perceives the
average frequency.
For example, if we emit a 400 Hz tone 75% of the time, and a 440
Hz tone 25% of the time, the average frequency perceived will be
(400 ∗ 0.75 + 440 ∗ 0.25) = 410 Hz.
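The same weighted average works for any duty cycle. In this Python sketch we assume, purely for illustration, that the duty cycle is an 8-bit mask in which each 1 bit selects the second tone for one time slot; the function name and the mask layout are our assumptions:

```python
def perceived_frequency(f_a, f_b, mask, bits=8):
    """Time-weighted average of two tones selected by a duty-cycle bitmask."""
    ones = bin(mask & ((1 << bits) - 1)).count("1")
    return (f_a * (bits - ones) + f_b * ones) / bits
```

With two of the eight bits set, the second tone plays 25% of the time, which reproduces the 410 Hz figure from the example above.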
Our new music player will have three different lookup tables to
look up the AUDC, AUDF, and duty cycle for each note. The duty
cycle is given in a bitmask:
MAC DUTYCYCLE
TIMER_SETUP 34
jsr DutyCycle ; cycle notes
jsr DrawBitmap ; draw ~33 lines of bitmap
ENDM
The PULSE macro is expanded twice per frame, and does the same
thing as DUTYCYCLE except it also decrements the volumes for each
channel and the note timer, fetching the next note if necessary:
MAC PULSE
TIMER_SETUP 35
jsr DutyCycle ; cycle notes
jsr Pulse ; decrement duration timer
jsr DrawBitmap ; draw ~34 lines of bitmap
ENDM
NextFrame
VERTICAL_SYNC ; 4 scanlines
lda #$d0
sta BitmapY ; reset to top of bitmap
PULSE ; 34 scanlines
DUTYCYCLE ; 32 scanlines
DUTYCYCLE ; 32 scanlines
DUTYCYCLE ; 32 scanlines
PULSE ; 34 scanlines
DUTYCYCLE ; 32 scanlines
DUTYCYCLE ; 32 scanlines
DUTYCYCLE ; 32 scanlines
jmp NextFrame
All-in-all, this loop generates 264 scanlines while cycling the audio
frequency 8 times per channel, and decrementing volume and/or
loading new notes 2 times per channel.
Since our music player has its own lookup table, our song file
doesn’t need AUDC and AUDF values anymore, just the notes and their
durations. Each byte of the song is either a note or a delay.
If a byte’s high bit is clear, it is a note (with range 0-63). Notes are
played immediately, alternating between the two channels.
If the high bit is set, the lower 7 bits are used as a delay (in 1/120 sec
frames). No notes will be fetched until the delay counter expires.
The byte $FF indicates the end of the song, and the music player will
start over at the beginning.
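The song format is simple enough to decode in a few lines. This Python sketch walks a song and yields the events described above; the event tuples are our own invention, and a real player would restart at the end marker instead of stopping:

```python
def decode_song(data):
    """Yield ('note', channel, value) and ('delay', frames) events."""
    channel = 0
    for byte in data:
        if byte == 0xFF:              # end of song
            return
        if byte & 0x80:               # high bit set: delay in the low 7 bits
            yield ("delay", byte & 0x7F)
        else:                         # high bit clear: a note
            yield ("note", channel, byte)
            channel ^= 1              # alternate between the two channels
```

Note that the end marker has to be tested first, since $FF also has its high bit set and would otherwise look like a delay.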
You can create song files from MIDI files using the midi2song.py
script, located at http://8bitworkshop.com/tools/midi2song.py. For
example:
40
Timing Analysis
41
Making Games
Now that you have the design figured out, it’s just a simple matter of
programming! Just use your fingers to type the keys in the correct
order!
Joking aside, it’s a good idea to start with a template like the
skeleton NTSC frame example in Figure 12.1 (the IDE creates one
automatically). It’s also convenient to create subroutines where
possible, like this:
NextFrame
VERTICAL_SYNC
TIMER_SETUP 37
jsr FrameSetup
TIMER_WAIT
TIMER_SETUP 192
jsr DrawFrame
TIMER_WAIT
TIMER_SETUP 29
jsr FrameEnd
TIMER_WAIT
jmp NextFrame
Sometimes you don’t have the extra 12 CPU cycles to spare for a
JSR/RTS pair, so it’s okay to inline the code, too.
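To get a feel for when those 12 cycles matter, here is a rough Python calculation (a sketch for your desk, not game code). It uses 76 CPU cycles per scanline and 6 cycles each for JSR and RTS; the function name call_overhead is made up for this example, and VERTICAL_SYNC is counted as 4 scanlines as in the earlier listing.

```python
CYCLES_PER_SCANLINE = 76   # CPU cycles in one scanline
JSR_CYCLES = 6
RTS_CYCLES = 6

# Sections of the template frame and their scanline counts.
SECTIONS = [("VERTICAL_SYNC", 4), ("FrameSetup", 37),
            ("DrawFrame", 192), ("FrameEnd", 29)]

def call_overhead(lines):
    """Fraction of a section's cycle budget used by one JSR/RTS pair."""
    return (JSR_CYCLES + RTS_CYCLES) / (lines * CYCLES_PER_SCANLINE)

total_lines = sum(lines for _, lines in SECTIONS)
print("scanlines per frame:", total_lines)

for name, lines in SECTIONS[1:]:
    print(f"{name}: {lines * CYCLES_PER_SCANLINE} cycles, "
          f"call overhead {call_overhead(lines):.2%}")

# A subroutine called once per scanline inside a kernel is another story.
print(f"per-scanline call: {call_overhead(1):.1%} of the line")
```

One call per section costs almost nothing, but a call made on every scanline eats a noticeable share of the 76 cycles, which is where inlining pays off.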
Most VCS games showed the same basic display whether or not a
game was active. If you need a title screen or other wholly separate
display kernels, you could duplicate the main loop entirely, or
maybe just use the JMP (xx) instruction to switch between different
kernels:
        TIMER_SETUP 192
        jmp (CurrentKernel)
BackFromKernel          ; make sure kernel jumps back here
        TIMER_WAIT
As usual, it’s a tradeoff between shaving off a few CPU cycles, a few
bytes of ROM, or making the code more readable.
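If the indirect jump feels mysterious, it may help to model it. The Python sketch below imitates how JMP (xx) reads a two-byte, low-byte-first pointer from memory. The addresses chosen for CurrentKernel and the two kernels are hypothetical, as are the function names. The model also reproduces a known quirk of the NMOS 6502: if the pointer's low byte sits at the last byte of a page, the high byte is fetched from the start of the same page, so keep pointers away from $xxFF.

```python
CURRENT_KERNEL = 0x80    # hypothetical zero-page pointer location
TITLE_KERNEL = 0xF100    # hypothetical kernel entry points in ROM
GAME_KERNEL = 0xF240

def write_pointer(mem, addr, target):
    """Store a 16-bit address low byte first, as two STA instructions would."""
    mem[addr] = target & 0xFF
    mem[addr + 1] = target >> 8

def jmp_indirect(mem, addr):
    """Return the address that JMP (addr) would jump to."""
    lo = mem[addr]
    # NMOS 6502 quirk: the high-byte fetch wraps within the same page.
    hi_addr = (addr & 0xFF00) | ((addr + 1) & 0x00FF)
    return (mem[hi_addr] << 8) | lo

mem = bytearray(0x10000)          # flat 64K model of the address space
write_pointer(mem, CURRENT_KERNEL, TITLE_KERNEL)
print(hex(jmp_indirect(mem, CURRENT_KERNEL)))

# Switching kernels is just a matter of storing a new pointer.
write_pointer(mem, CURRENT_KERNEL, GAME_KERNEL)
print(hex(jmp_indirect(mem, CURRENT_KERNEL)))
```

Each kernel then ends with a plain JMP back to the label after the indirect jump, since there is no return address on the stack.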
Now that you’ve put hours of work into designing, developing,
playtesting, and tweaking your game, it’s time to share it with the
world!
The easiest way is to click the "Share" button in the IDE, which
generates a shareable web URL. Anyone opening this link will see
the source code and be able to play the emulated game in the
browser. You can also download the ROM file from the IDE and
distribute it for use in other emulators like Stella.
If you want to play your game on actual hardware, there are several
options. First, you have to get a console. For ultimate authenticity,
you can pick up a vintage Atari VCS/2600 online. You’ll need to
either visit a thrift store to pick up a CRT television (recommended)
or find a monitor that has a composite input.
You could also find a used Atari Flashback 2, which is a modern
reinvention of the VCS that accepts external cartridges and outputs
HDMI.
Now you have to get your game’s ROM into the console. The easiest
way is probably the Harmony Cartridge[8]. Just put your ROM on
an SD or microSD card, slide it into the cartridge, and then pop the
cartridge into your game console. From there you can select from
available ROMs using an onscreen menu.
You could also DIY your own cartridge by programming a fast
microcontroller to respond to memory requests in the same way a
ROM chip would, and then building a breadboard with a cartridge
slot connector. This is outside the scope of this book, but plenty of
resources are available online.
Once you are satisfied with your game, you can pay for a service like
AtariAge to manufacture your very own cartridge complete with a
custom label.
42
Troubleshooting
When programming for the Atari, there are times when nothing
seems to work no matter what you try! In this chapter, we’ll list
symptoms you may encounter that indicate common problems, and
include tips for solving them.
Screen “flips” continuously
You are not drawing the right number of scanlines. Make sure your
code draws exactly 262 scanlines by counting WSYNCs and by using
the timer routines in Chapter 12. You can see the current number of
scanlines by clicking the emulator window and typing Ctrl-G (Alt-G
on Mac).
Screen “flips” periodically
If the screen flips only every once in a while, you might have code
that misses scanlines. Make sure you don’t spend more than 75
cycles before a STA WSYNC. If you are using the timer routines, your
code may still be running when the timer expires, so that it reaches
the TIMER_WAIT macro too late.
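One way to catch a missed scanline before it happens is to add up the cycles on paper, or in a few lines of Python. The sketch below takes a list of instructions with their cycle counts and reports how many cycles are left in the line. It counts the 3 cycles of STA WSYNC itself as part of the 76-cycle budget, which is an assumption on the cautious side of the rule of thumb above. The labels SpritePtr and ColorPtr are hypothetical.

```python
CYCLES_PER_SCANLINE = 76
STA_WSYNC_CYCLES = 3     # STA to a zero-page address

def cycles_left(instructions):
    """Cycles remaining in the scanline after the code and its STA WSYNC.

    instructions is a list of (text, cycles) pairs. A negative result
    means the code spills into the next scanline.
    """
    used = sum(cycles for _, cycles in instructions) + STA_WSYNC_CYCLES
    return CYCLES_PER_SCANLINE - used

line = [
    ("lda (SpritePtr),y", 5),   # +1 if the access crosses a page
    ("sta GRP0", 3),
    ("lda (ColorPtr),y", 5),    # +1 if the access crosses a page
    ("sta COLUP0", 3),
    ("dey", 2),
]
print("cycles left:", cycles_left(line))
```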
Sprites or objects wiggle by one scanline
You may have forgotten to clear (CLC) the carry flag before an ADC,
or set (SEC) the carry flag before an SBC. This leaves the carry flag in
whatever state the previous instructions left it, so the addition or
subtraction is sometimes off by one.
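To see where the extra one comes from, here is a small Python model of ADC and SBC in binary (non-decimal) mode. It is only a sketch of the arithmetic: the function names are made up, and the other flags are ignored. Each function returns the 8-bit result and the new carry.

```python
def adc(a, operand, carry):
    """A + operand + carry, as ADC computes it in binary mode."""
    total = a + operand + carry
    return total & 0xFF, int(total > 0xFF)

def sbc(a, operand, carry):
    """A - operand - (1 - carry), as SBC computes it in binary mode."""
    diff = a - operand - (1 - carry)
    return diff & 0xFF, int(diff >= 0)

# With the carry prepared correctly (CLC before ADC, SEC before SBC):
print(adc(100, 8, carry=0))
print(sbc(100, 8, carry=1))

# With a stale carry left over from an earlier instruction:
print(adc(100, 8, carry=1))
print(sbc(100, 8, carry=0))
```

When this result feeds a sprite's Y position, the stale carry shows up as a one-scanline wiggle.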
Garbled sprite that changes with horizontal position
You may be writing a register too late. See if you can rearrange
things so that registers are written during the HBLANK period, the
first 22 CPU cycles of each scanline.
Sprites are smeared or move quickly horizontally across the screen
You may be forgetting to reset the HMxx registers (STA HMCLR being the
easiest way), so the sprites move every time HMOVE is strobed.
Setting GRPx registers has no effect
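You may have set a VDEL register without realizing it. With VDELP0
set, a write to GRP0 does not show up until GRP1 is written (and vice
versa with VDELP1), so check that you are alternating writes to GRP0
and GRP1.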
Timing problems
If timing isn’t as consistent as you expect it to be, you may have
indexed memory accesses that cross a page boundary, or taken
branches that cross a page boundary. Each of these adds one cycle.
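These penalties are easy to check by hand once you know the addresses from the assembler listing. The following Python sketch shows the rule; the function names are made up for this example. Two addresses are on the same page when their high bytes match. For a branch, the comparison is between the address of the instruction after the branch and the branch target.

```python
def crosses_page(addr_a, addr_b):
    """True if the two addresses have different high bytes."""
    return (addr_a & 0xFF00) != (addr_b & 0xFF00)

def indexed_read_cycles(base, index, base_cycles=4):
    """Cycles for an absolute,X or absolute,Y read such as LDA Table,X."""
    return base_cycles + int(crosses_page(base, base + index))

def branch_cycles(branch_addr, target, taken=True):
    """Cycles for a relative branch located at branch_addr."""
    if not taken:
        return 2
    next_instruction = branch_addr + 2   # branches are 2 bytes long
    return 3 + int(crosses_page(next_instruction, target))

print(indexed_read_cycles(0xF0F0, 0x0F))   # stays on page $F0
print(indexed_read_cycles(0xF0F0, 0x10))   # reaches into page $F1
print(branch_cycles(0xF0FC, 0xF0F0))       # loop within one page
print(branch_cycles(0xF0FE, 0xF0F0))       # next instruction is at $F100
```

This applies to indexed reads; indexed stores such as STA Table,X take the same number of cycles whether or not a page is crossed. Aligning tables and tight loops so they do not straddle a page keeps the counts constant.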
Branches don’t seem to work properly
Make sure you consult the table in Chapter 1 and that you don’t
have any intervening instructions that modify flags.
$ vs % vs #
Remember that $ is for hexadecimal numbers, and % is for binary
numbers. Anything else is treated as a decimal number.
Also remember that unless your operand is prefixed with #, the
instruction loads from memory.
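The prefixes can be summed up in a few lines of Python. This is a deliberately simplified sketch, not how the assembler actually parses operands: it handles only plain numbers, ignoring labels, expressions, and any other number syntax the assembler accepts. The function name parse_operand is made up.

```python
def parse_operand(text):
    """Return (mode, value) for a simple numeric operand."""
    mode = "memory"
    if text.startswith("#"):
        # '#' means: use the number itself, not the memory it points to.
        mode = "immediate"
        text = text[1:]
    if text.startswith("$"):
        value = int(text[1:], 16)    # hexadecimal
    elif text.startswith("%"):
        value = int(text[1:], 2)     # binary
    else:
        value = int(text, 10)        # decimal
    return mode, value

for operand in ["#$d0", "$d0", "#%11010000", "#208", "208"]:
    print(operand, parse_operand(operand))
```

All five operands have the value 208, but only the ones starting with # load that value directly; the others load whatever is stored at address 208 ($d0).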
Errors in include files
When using macros, errors might be flagged at the line of the
include declaration instead of where the macro is invoked.
Appendix B: VCS Colors
Hex  +0/+1             +2/+3             +4/+5             +6/+7
00   black             dim gray          dim gray          gray
10   teal              midnight blue     sea green         steel blue
20   navy              midnight blue     steel blue        steel blue
30   navy              midnight blue     steel blue        steel blue
40   dark blue         midnight blue     dark slate blue   slate blue
50   indigo            dark orchid       dark orchid       slate blue
60   purple            brown             indian red        indian red
70   maroon            brown             sienna            indian red
80   dark red          firebrick         sienna            indian red
90   maroon            saddle brown      sienna            indian red
a0   maroon            saddle brown      sienna            dark khaki
b0   dark green        dark olive green  dark olive green  dark khaki
c0   dark green        forest green      dark olive green  cadet blue
d0   dark green        forest green      sea green         cadet blue
e0   dark green        dark slate gray   dark slate gray   cadet blue
f0   navy              midnight blue     dark slate blue   steel blue

Hex  +8/+9             +10/+11           +12/+13           +14/+15
00   dark gray         silver            light silver      white smoke
10   med. turquoise    med. turquoise    turquoise         aquamarine
20   steel blue        steel blue        med. turquoise    sky blue
30   slate blue        cornflower blue   cornflower blue   light sky blue
40   slate blue        med. purple       sky blue          light sky blue
50   med. orchid       med. orchid       light steel blue  lavender
60   pale violet-red   pale violet-red   violet            light pink
70   indian red        pale violet-red   dark salmon       light pink
80   indian red        dark salmon       dark salmon       light pink
90   indian red        burlywood         dark salmon       navajo white
a0   dark khaki        tan               burlywood         navajo white
b0   dark khaki        dark khaki        pale goldenrod    pale green
c0   dark sea green    dark sea green    light green       pale green
d0   dark sea green    med. aquamarine   light green       aquamarine
e0   cadet blue        med. aquamarine   med. aquamarine   pale turquoise
f0   steel blue        med. aquamarine   sky blue          light sky blue
Appendix C: 6502 Opcodes
Appendix D: 6502 Instruction Flags
N = Negative (Sign)
Z = Zero
C = Carry
V = Overflow
D = Decimal
Summary of Illegal 6502 Instructions
Mnemonic  Flags Affected  Expression
ANC       NZC             A &= #opr
ASR       NZC             A = (A & #opr) >> 1
ARR       NZCV            A = (A & #opr) >> 1
DCP       NZC             (A - opr--)
ISC       NZCV            A -= opr++
LAS       NZ              A=X=S = opr & S
LAX       NZ              A=X = opr
RLA       NZC             A = (rol A) & opr
RRA       NZCV            A = (ror A) + opr
SBX       NZC             X = (A & X) - #opr
SLO       NZC             A = (A <<= 1) | opr
SRE       NZC             A = (A >>= 1) ^ opr
N = Negative (Sign)
Z = Zero
C = Carry
V = Overflow
Appendix E: Header Files
xmacro.h
;-------------------------------------------------------
; Usage: TIMER_SETUP lines
; where lines is the number of scanlines to skip (> 2).
; The timer will be set so that it expires before this number
; of scanlines. A WSYNC will be done first.
        MAC TIMER_SETUP
.lines  SET {1}
.cycles SET ((.lines * 76) - 13)
; special case for when we have two timer events in a line
; and our 2nd event straddles the WSYNC boundary
        if (.cycles % 64) < 12
          lda #(.cycles / 64) - 1
          sta WSYNC
        else
          lda #(.cycles / 64)
          sta WSYNC
        endif
        sta TIM64T
        ENDM
;-------------------------------------------------------
; Use with TIMER_SETUP to wait for timer to complete.
; Performs a WSYNC afterwards.
        MAC TIMER_WAIT
.waittimer
        lda INTIM
        bne .waittimer
        sta WSYNC
        ENDM
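If you are curious what value TIMER_SETUP actually loads into TIM64T, the assembly-time arithmetic can be mirrored in Python. This sketch follows the macro's expressions term by term, using integer division as the assembler does; the function name timer_setup_value is made up for this example.

```python
def timer_setup_value(lines):
    """Value that TIMER_SETUP loads into TIM64T for a scanline count."""
    cycles = lines * 76 - 13          # same expression as .cycles
    if cycles % 64 < 12:
        # Special case from the macro: back off by one timer tick.
        return cycles // 64 - 1
    return cycles // 64

for lines in (37, 192, 29, 34):
    print(lines, "scanlines ->", timer_setup_value(lines))
```

TIM64T counts down once every 64 CPU cycles, so the result must fit in one byte; 192 scanlines is comfortably within range.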