How To Code v7
Based on AmigaGuide Release 7/July/93. This HTML Release 18/Jan/03 by Jolyon Ralph.
Contents: Introduction, General Guidelines, Assembler, 680x0 issues, Action Replay, AGA Programming Information, Blitter, CDTV programming, Copper Programming, Vector Coding, Interrupts, Debugging, Input, Kickstart, Miscellaneous, Optimising, Reading C, Startup and Exit Problems, Tracker Problems, Video Standards, Books, startup.asm - Copper Startup code
Introduction to HowToCode
This file has grown somewhat from the file uploaded over Christmas 1992. I've been very busy over the last two months, so I'm sorry that I haven't been able to update this sooner. It started as an angry protest after several new demos I downloaded refused to work on my 3000, and has ended up as a sort of general how-to-code type article, with particular emphasis on the Amiga 1200. Now, as many of you may know, Commodore have not released hardware information on the AGA chipset, indeed they have said they will not (as the registers will change in the future). Demo coders may not be too concerned about what is coming in a year or two, but IF YOU ARE WRITING COMMERCIAL SOFTWARE you must be. Chris Green, from Commodore US, asked me to mention the following: "I'd like it if you acknowledged early in your text that it IS possible to do quite exciting demos without poking any hardware registers, and that this can be as interesting as direct hardware access.
amiga.physik.unizh.ch has two AGA demos with source code by me, AABoing and TMapdemo. These probably seem pretty lame by normal demo standards as I didn't have time to do any nifty artwork or sound, and each only does one thing, but they do show the POTENTIAL for OS friendly demos."
I have seen these demos and they are very neat. Currently you cannot do serious copper tricks with the OS (or can you Chris? I'd love to see some examples if you can...), for example smooth graduated background copperlists or all that fun messing with bitplane pointers and modulos. But for a lot of things the Kickstart 3.0 graphics.library is very capable. If you are in desperate need of some hardware trick that the OS can't handle, let Chris know about it, you never know what might make it into the next OS version!
Chris mentions QBlit and QBSBlit, interrupt driven blitter access. These are things that will make games in particular far easier to write under the OS now.
Chris also says "Note that if I did a 256 color lores screen using this document, it would run fifty times slower than one created using the OS, as you haven't figured out enhanced fetch modes yet. A Hires 256 color screen wouldn't even work."
There are some new additions to the AGA chapter that discuss some of this problem, but if you want maximum performance from an AGA system, use the OS. Remember that on the A1200 chipram has wait-states, while the 32-bit ROM doesn't. So use the ROM routines, some of them run faster than anything you could possibly write (on an A1200 with just 2Mb ram).
The only drawback is again documentation. To learn how to code V39 OS programs you need the V39 includes and autodocs, which I'm not allowed to include here. Perhaps, in a later release, I'll give some highlights of V39 programming... Get Chris Green's example code, it's a good place to start. Register as a developer with your local Commodore office to get the autodocs and includes, it's relatively inexpensive (£85 per year in the UK).
You can now buy the includes/autodocs and Rom Kernal manuals in AmigaGuide from Commodore US on a CD-ROM (developers only), the CATS CD Edition 2. It's excellent! Most demos I've seen use similar startup code to that I was using back in 1988. Hey guys, wake up! The Amiga has changed quite a bit since then.
Read the Manuals Self-Modifying code Use Relocatable code All addresses are 32bit Packers/Crunchers Avoid unnecessary hardware access Opening libraries properly Nothing is fixed - almost! Version Numbers System private structures/functions
Self-Modifying Code
Don't use self-modifying code. Processors with cache ram cannot handle self-modifying code at all well. They grab a large number of instructions from ram in one go, and execute them from cache ram. If these instructions alter themselves, the changes are not made to the copy in cache ram, so the code can crash. The larger the cache the more likely this is to happen, even when you think you will be safe, so the best strategy is to either
a) Disable caches (and suffer a large speed-loss penalty), or
b) Avoid using self-modifying code.
Keep your code in multiple sections. Several small sections are better than one large section, as they will more easily fit in and run on a system with fragmented memory. Lots of calls across sections are slower than with a single section, so keep all your relevant code together. Keep code in a public memory section:
section mycode,code
Never use code_f,data_f or bss_f as these will fail on a chipram only machine. And one final thing, I think many demo coders have realised this now, but $C00000 memory does not exist on any production machines now, so stop using it!!!
32Bit Addresses
Always treat *ALL* addresses as 32-bit values. "Oh look" says clever programmer. "If I access $dcdff180 I can access the colour0 hardware register, but it confuses people hacking my code!".
Oh no you can't. On a machine with a 32-bit address bus (any accelerated Amiga) this doesn't work. And all us hackers know this trick now anyway :-) Always pad out 24-bit addresses (eg $123456) with ZEROs in the high byte ($00123456). Do not use the upper byte for data, for storing your IQ, for scrolly messages or for anything else. Similarly, on non ECS machines the bottom 512k of memory was paged four times on the address bus, eg:
    move.l  #$12345678,$0
    move.l  $80000,d0       ; d0 = $12345678
    move.l  $100000,d1      ; d1 = $12345678
    move.l  $180000,d2      ; d2 = $12345678
This does not work on ECS and upwards!!!! You will get meaningless results if you try this, so PLEASE do not do it!
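Going back to the high-byte trick for a moment, here is a rough C model (host-side C, not Amiga code) of why it only ever worked on a 24-bit address bus. The function names are made up for this sketch, and it only models address decoding, nothing else.

```c
#include <stdint.h>

/* Hypothetical model: a plain 68000 only drives 24 address lines,
   so the top byte of a 32-bit address is simply ignored. */
uint32_t decode_24bit_bus(uint32_t addr)
{
    return addr & 0x00FFFFFFu;
}

/* A CPU with a full 32-bit address bus uses every bit of the address. */
uint32_t decode_32bit_bus(uint32_t addr)
{
    return addr;
}

/* Returns 1 if the address reaches the same location on both kinds
   of machine, i.e. the high byte is already zero. */
int address_is_portable(uint32_t addr)
{
    return decode_24bit_bus(addr) == decode_32bit_bus(addr);
}
```

With $dcdff180 the 24-bit model lands on $00dff180 (colour0), while the 32-bit model goes somewhere else entirely, which is the whole problem.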
Using Packers/Crunchers
Don't ever use Tetrapack or Bytekiller based packers. They are crap. Many more demos fall over due to being packed with crap packers than anything else. If you are spreading your demo by electronic means (which most people do now, the days of the SAE Demodisks are long gone!) then assemble your code, and use LHARC to archive it, you will get better compression with LHARC than with most runtime packers. If you *have* to pack your demos, then use Powerpacker 4+, Turbo Imploder or Titanics Cruncher, which I've had no problems with myself. (found in the documentation to IMPLODER 4.0)
>** 68040 Cache Coherency **
>
>With the advent of the 68040 processor, programs that diddle with code which is subsequently executed will be prone to some problems. I don't mean the usual self-modifying code causing the code cached in the data cache to no longer be as the algorithm expects. This is something the Imploder never had a problem with, indeed the Imploder has always worked fine with anything up to and including an 68030.
>
>The reason the 68040 is different is that it has a "copyback" mode. In this mode (which WILL be used by people because it increases speed dramatically) writes get cached and aren't guaranteed to be written out to main memory immediately. Thus 4 subsequent byte writes will require only one longword main memory write access. Now you might have heard that the 68040 does bus-snooping. The odd thing is that it doesn't snoop the internal cache buses!
>
>Thus if you stuff some code into memory and try to execute it, chances are some of it will still be in the data cache. The code cache won't know about this and won't be notified when it caches from main memory those locations which do not yet contain code still to be written out from the data caches. This problem is amplified by the absolutely huge size of the caches.
>
>So programs that move code, like the explosion algorithms, need to do a cache flush after being done. As of version 4.0, the appended decompression algorithms as well as the explode.library flush the cache, but only under OS 2.0. The reason for this is that only OS 2.0 has calls for cache-flushing.
>
>This is yet another reason not to distribute imploded programs; they might just cross the path of a proud '40 owner still running under 1.3.
I doubt it! Only a complete *IDIOT* would run an '040 under KS1.3. They *deserve* to have their software crash!!
>It will be interesting to see how many other applications will run into trouble once the '40 comes into common use among Amiga owners. The problem explained above is something that could not have been easily anticipated by developers. It is known that the startup code shipped with certain compilers does copy bits of code, so it might very well be a large problem.
You can use the following exec.library functions to solve the problem: CacheClearU() and CacheControl(). Both functions are available with Kickstart 2.0 and above.
I strongly advise against trying to 'protect' code by encrypting parts of it, it's very easy for your code to fail on >68000 if you do. What's the point anyway? Lamers will still use Action Replay to get at your code. I never learnt anything by disassembling anyone's demo. It's usually far more difficult to try and understand someone else's (uncommented) code than to write your own code from scratch.
exec.library/CacheClearU
CacheClearU - User callable simple cache clearing (V37)

    CacheClearU()
    -636

    void CacheClearU(void);

Flush out the contents of any CPU instruction and data caches. If dirty data cache lines are present, push them to memory first.
Caches must be cleared after *any* operation that could cause invalid or stale data. The most common cases are DMA and modifying instructions using the processor.
Some examples of when the cache needs clearing:
    Self modifying code
    Building Jump tables
    Run-time code patches
    Relocating code for use at different addresses.
    Loading code from disk
exec.library/CacheControl
CacheControl - Instruction & data cache control

    oldBits = CacheControl(cacheBits,cacheMask)
    D0        -648         D0        D1

    ULONG CacheControl(ULONG,ULONG);

This function provides global control of any instruction or data caches that may be connected to the system. All settings are global -- per task control is not provided.
The action taken by this function will depend on the type of CPU installed. This function may be patched to support external caches, or different cache architectures. In all cases the function will attempt to best emulate the provided settings. Use of this function may save state specific to the caches involved.
The list of supported settings is provided in the exec/execbase.i include file. The bits currently defined map directly to the Motorola 68030 CPU CACR register. Alternate cache solutions may patch into the Exec cache functions. Where possible, bits will be interpreted to have the same meaning on the installed cache.

    IN:  cacheBits - new values for the bits specified in cacheMask.
         cacheMask - a mask with ones for all bits to be changed.
    OUT: oldBits
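The bits/mask calling convention above can be a little confusing at first, so here is a small host-side C model of it. This is only a sketch of the documented IN/OUT behaviour; model_cache_control and model_cache_bits are made-up names, the bit values in the usage note are arbitrary, and the real function's effect depends on the CPU installed.

```c
#include <stdint.h>

/* Hypothetical stand-in for the system's current cache settings. */
static uint32_t model_cache_bits = 0;

/* Model of the documented convention: only the bits that are set in
   cacheMask are changed (to the matching bits of cacheBits), and the
   previous settings are returned. */
uint32_t model_cache_control(uint32_t cacheBits, uint32_t cacheMask)
{
    uint32_t oldBits = model_cache_bits;

    model_cache_bits = (oldBits & ~cacheMask) | (cacheBits & cacheMask);
    return oldBits;
}
```

One consequence of this convention is that calling with a mask of zero changes nothing and just hands back the current settings, and that passing the returned oldBits back in with the same mask restores whatever you changed.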
Oh yes, graphics.library is always going to be second down the chain from Execbase? No way! (Note by Michel: I'm sorry... I'll never do it again :) ) If you want to access gfxbase (or any other library base) OPEN the library. Do not wander down the library chain, either by guesswork or by manually checking for "graphics.library" in the library base name. OpenLibrary() will do this for you. Here is the only official way to open a library.
    MOVEA.L 4,a6
    LEA.L   gfxname(PC),a1
    MOVE.L  #39,d0                  ; version required (here V39)
    JSR     _LVOOpenLibrary(a6)     ; resolved by linking with amiga.lib
                                    ; or by include "exec/exec_lib.i"
    TST.L   d0
    BEQ.S   OpenFailed
    ; use the base value in d0 as the a6 for calling graphics functions
    ; remember d0/d1/a0/a1 are scratch registers for system calls

gfxname     DC.B    'graphics.library',0
Don't use OldOpenLibrary! Always open libraries with a version, at least V33. V33 is equal to Kickstart 1.2. And DON'T forget to check the result returned in d0 (and nothing else). OldOpenLibrary saves no cycles. All it does is
    moveq.l #0,d0
    JMP     _LVOOpenLibrary(a6)
Version Numbers
Put version numbers in your code. This allows the CLI version command to determine easily the version of both your source and executable files. Some directory utilities allow version number checking too (so you can't accidentally copy an older version of your source over a newer one, for example). Of course, if you pack your files the version numbers get hidden. Leaving version numbers unpacked was going to be added to PowerPacker, but I don't know if this is done yet. A version number string is in the format
$VER: howtocode6 7.0 (13.06.92)
^     ^          ^
|     |          Version number (date is optional)
|     File Name
Identifier
The Version command searches for $VER and prints the string it finds following it. For example, adding the line to the beginning of your source file
; $VER: MyFunDemo.s 4.0 (21.01.93)
dc.b
This can be very useful for those stupid demo compilations where everything gets renamed to 1, 2, 3, etc... [Ed: Hi Russ!] Just do version 1 to get the full filename (and real date)
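The search described above (look for $VER, print what follows) can be sketched in a few lines of C. This is not the source of the real Version command, just a minimal model of the rule as stated in the text; find_version and version_tag are names invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* An embedded version string in the format described above.
   The name and date are just the example values from the text. */
const char version_tag[] = "$VER: MyFunDemo 4.0 (21.01.93)";

/* Search a block of bytes for "$VER:" and return a pointer to the text
   that follows it (leading spaces skipped), or NULL if no tag is found.
   The text after the tag is assumed to be NUL-terminated in the buffer. */
const char *find_version(const char *buf, size_t len)
{
    static const char tag[] = "$VER:";
    const size_t taglen = sizeof(tag) - 1;
    size_t i;

    for (i = 0; i + taglen <= len; i++) {
        if (memcmp(buf + i, tag, taglen) == 0) {
            const char *p = buf + i + taglen;
            const char *end = buf + len;

            while (p < end && *p == ' ')
                p++;
            return p;
        }
    }
    return NULL;
}
```

Because the tag is found by scanning the raw bytes of the file, it no longer shows up once a packer has compressed it, which is why packed files lose their version numbers.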
l+    sets linkable code on (as I mix C and Assembler in my current projects)
o+    enables optimise mode.
ow+   enables optimiser warnings (they act as errors with SLINK, so I edit my source when I get an optimiser warning)
ow1-  disables warnings on short backwards branch optimising
ow2-  disables warnings on address register indirect with displacement zero to address register indirect optimising, again I don't want to edit my code if I have (for example)
          move.l  vs_vscreen1b(a0),a1     ; vs_vscreen1b = 0
ow6-  disables warnings if short branches forwards can be made
d+    debug information on
CHKIMM - Check Immediate values. This will report an error if any immediate addresses are used (the most common mistake in assembler is to leave the # off a value). Address 4 (EXECBASE) is allowed, and other fixed addresses (eg CUSTOM - $dff000) are allowed as long as you add a .L to the end.
          add.l   123,d0                  ; This now gives an error!
          LEA     (CUSTOM).L,a0           ; This doesn't.
If you find that your Argasm executables fail then check you haven't got any BSR's across sections! Argasm seems to allow this, but of course the code doesn't work. Jez 'Polygon' San from Argonaut software who published ArgAsm says it's not a bug, but a feature of the linker... Yeah right Jez... But Argasm is *fast*, and it produces non-working code *faster* than any other assembler :-) Argonaut have abandoned ArgAsm so the last version (1.09d) is the last. There will be no more, and it doesn't support 68020+ instructions, so I've stopped using it now.
to your s:user-startup. Copy the CygnusEd Activator (on Cygnus Ed V2.1x distribution disk) as C:ed, yes, that's right. Right over the top of the abysmal Commodore editor!! You lose 200Kb of fastram doing this, but believe me, it's worth it. Whenever you need to use Cygnus Ed, either type ed filename and it loads in a flash, or just press Right-ALT/Right-Shift/Return to open a new CED session. The CygnusEd Activator is public domain, and is in the utils directory of this archive. 3. Install the commands on the keys you want to use (under the special menu). I currently have mine set up:
F1  - devpac.ced - This calls Devpac to assemble the current file. Output is file ram:test, errors to ram:errors
F2  - argasm.ced - Same, but for Argasm
F3  - errors.ced - Open and close the error window. The error window is not editable.
F10 - ram:Test   - Execute the code!
Other keys are free for C, TeX or whatever else you want to use...
The 68040 FPU bit is set when a working 68040 FPU is in the system. If this bit is set and both the 68881 and 68882 bits are not set, then the 68040 math emulation code has not been loaded and only 68040 FPU instructions are available. This bit is valid *ONLY* if the 68040 bit is set. Don't forget to check which ROM version you're running.
DO NOT assume that the system has a >68000 if the word is non-zero! 68881 chips are available on add-on boards without any faster processor. And don't assume that a 68000 processor means a 7MHz 68000. It may well be a 14MHz processor. So, you can use this to determine whether specific processor functions are available (more on 68020 commands in a later issue), but *NOT* to determine values for timing loops. Who knows, Motorola may release a 100MHz 68020 next year :-) [Editor's Note: They didn't! :-) ]
There is *NO* easy way to check for a Memory Management Unit. The MMU is present in a broken form in many 68EC030 chips.
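The rules above can be written down as a few small tests on the attention flags word. The following is a host-side C sketch, not code from the includes: the AFF_ masks are copied in by hand as I understand them from exec/execbase.h (check them against your own includes), and the three function names are made up for the example.

```c
#include <stdint.h>

/* Bit masks for the ExecBase AttnFlags word, copied here so the sketch
   compiles anywhere. Verify them against your own exec/execbase.h. */
#define AFF_68010 (1u << 0)
#define AFF_68020 (1u << 1)
#define AFF_68030 (1u << 2)
#define AFF_68040 (1u << 3)
#define AFF_68881 (1u << 4)
#define AFF_68882 (1u << 5)
#define AFF_FPU40 (1u << 6)

/* 1 if 68020 instructions may be used. Note this tests the CPU bit,
   NOT whether the whole word is non-zero. */
int has_68020(uint16_t attn)
{
    return (attn & AFF_68020) != 0;
}

/* 1 if some kind of FPU is reported. The FPU40 bit is only trusted
   when the 68040 bit is also set. */
int has_fpu(uint16_t attn)
{
    if (attn & (AFF_68881 | AFF_68882))
        return 1;
    return (attn & AFF_68040) && (attn & AFF_FPU40);
}

/* 1 if a 68040 FPU is present but the math emulation code has not
   been loaded (FPU40 set, 68881 and 68882 both clear). */
int fpu40_without_emulation(uint16_t attn)
{
    if (!(attn & AFF_68040) || !(attn & AFF_FPU40))
        return 0;
    return (attn & (AFF_68881 | AFF_68882)) == 0;
}
```

A 68000 with a 68881 on an add-on board gives a non-zero word but still fails has_68020(), which is exactly the trap described above.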
will be slower than:
    move.l  d0,(a0)+    ; store x coordinate
    add.l   d2,d0       ; x+=deltax
    move.l  d1,(a0)+    ; store y coordinate
    add.l   d3,d1       ; y+=deltay
The 68020 adds a number of enhancements to the 68000 architecture, including new addressing modes and instructions. Some of these are unconditional speedups, while others only sometimes help:
Addressing modes
Scaled Indexing. The 68000 addressing mode (disp,An,Dn) can have a scale factor of 2, 4, or 8 applied to the data register on the 68020. This is totally free in terms of instruction length and execution time. An example is:
    68000
    -----
    add.w   d0,d0
    move.w  (0,a1,d0.w),d1

    68020
    -----
    move.w  (0,a1,d0.w*2),d1
16 bit offsets on An+Rn modes. The 68000 only supported 8 bit displacements when using the sum of an address register and another register as a memory address. The 68020 supports 16 bit displacements. This costs one extra cycle when the instruction is not in cache, but is free if the instruction is in cache. 32 bit displacements can also be used, but they cost 4 additional clock cycles.
Data registers can be used as addresses. (d0) is 3 cycles slower than (a0), and it only takes 2 cycles to move a data register to an address register, but this can help in situations where there is not a free address register.
Memory indirect addressing. These instructions can help in some circumstances when there is no free register to load a pointer into. Otherwise, they lose.
New instructions
Extended precision divide and multiply instructions. The 68020 can perform 32x32->32, 32x32->64 multiplication and 32/32 and 64/32 division. These are significantly faster than the multi-precision operations which are required on the 68000.
EXTB. Sign extend byte to longword. Faster than the equivalent EXT.W EXT.L sequence on the 68000.
Compare immediate and TST work in program-counter relative mode on the 68020.
Bit field instructions. BFINS inserts a bitfield, and is faster than 2 MOVEs plus an AND and an OR. This instruction can be used nicely in fill routines or text plotting. BFEXTU/BFEXTS can extract and optionally sign-extend a bitfield on an arbitrary boundary. BFFFO can find the highest order bit set in a field. BFSET, BFCHG, and BFCLR can set, complement, or clear up to 32 bits at arbitrary boundaries.
On the 020, all shift instructions execute in the same amount of time, regardless of how many bits are shifted. Note that ASL and ASR are slower than LSL and LSR. The break-even point on ADD Dn,Dn versus LSL is at two shifts.
Many tradeoffs on the 020 are different than the 68000.
The 020 has PACK and UNPK which can be useful.
The 68020-40 (bd.w,an) addressmode can be optimised to x(an). Saves 1 word and some cycles.
|------------------------|--------------------|
| Addressmode            | optimising         |
|------------------------|--------------------|
| move.l (1000.w,an),dn  | move.l 1000(an),dn |
|------------------------|--------------------|

The 68020-40 (bd.w,pc) addressmode can be optimised to bd.w(pc). Saves 1 word and some cycles.

|------------------------|--------------------|
| Addressmode            | optimising         |
|------------------------|--------------------|
| move.l (1000.w,pc),dn  | move.l 1000(pc),dn |
|------------------------|--------------------|

The 68020-40 (bd.w) addressmode can be optimised to bd.w. Saves 1 word and some cycles.

|------------------------|--------------------|
| Addressmode            | optimising         |
|------------------------|--------------------|
| move.l (bd.w),dn       | move.l bd.w,dn     |
|------------------------|--------------------|

The 68020-40 (bd.l) addressmode can be optimised to bd.l. Saves 1 word and some cycles.

|------------------------|--------------------|
| Addressmode            | optimising         |
|------------------------|--------------------|
| move.l (bd.l),dn       | move.l bd.l,dn     |
|------------------------|--------------------|

The 68020-40 addressmode (an) can be optimised to the 68000 addressmode (an). (an) can be interpreted as a sub type of the address mode (bd.w,an,xn) and this is a 68020-40 addressmode. But (an) is a well known 68000 addressmode, so you should turn optimising ALWAYS on.

|------------------------|--------------------|
| Addressmode            | optimising         |
|------------------------|--------------------|
| move.l (an),dn         | move.l (an),dn     |
|------------------------|--------------------|

The 68020-40 addressmode (pc) can be optimised to the 68000 addressmode (pc). (pc) can be interpreted as a sub type of the address mode (bd.w,pc,xn) and this is a 68020-40 addressmode. But (pc) is a well known 68000 addressmode, so you should turn optimising ALWAYS on.
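Going back to the bit field instructions listed under New instructions: the numbering they use trips people up, because the offset is counted from the MOST significant bit. Here is a host-side C model of BFEXTU and BFEXTS working on a 32-bit data register. It is a sketch of my reading of the instruction (offset 0 is bit 31, the field wraps around inside a register), with made-up function names, so check it against the Motorola manual before relying on it.

```c
#include <stdint.h>

/* Rotate left by n bits (n taken modulo 32) without undefined shifts. */
static uint32_t rol32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* Model of BFEXTU Dn{offset:width}: extract 'width' bits starting
   'offset' bits down from the most significant bit, zero-extended.
   A width of 0 or more than 31 is treated as a full 32-bit field. */
uint32_t bfextu_model(uint32_t reg, unsigned offset, unsigned width)
{
    uint32_t rotated = rol32(reg, offset);  /* field now starts at bit 31 */

    if (width == 0 || width >= 32u)
        return rotated;
    return rotated >> (32u - width);
}

/* Model of BFEXTS: same extraction, but sign-extended from the
   top bit of the field. */
int32_t bfexts_model(uint32_t reg, unsigned offset, unsigned width)
{
    uint32_t v = bfextu_model(reg, offset, width);
    uint32_t sign;

    if (width == 0 || width >= 32u)
        return (int32_t)v;
    sign = 1u << (width - 1u);
    return (int32_t)((v ^ sign) - sign);
}
```

On a 68000 the same extraction needs a shift (or rotate) plus a mask, which is the kind of sequence these instructions replace.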
To get into sysop mode on Action Replay 3 type the same as Action Replay 2. After this you get a message "Try a new one". Then type in
NEW
The Action Replay 2, How it and the Amiga works, and why
For all the hackers amongst you lot, especially those with one of Datel's excellent Action Replay Mk.][s (unlike the first one, which was useless), here is a little technical info about it, and how to protect against it. The Cartridge Internals Pressing the button What happens, and why.. Two ways to get into the cart without pressing the button How does it know what's going on with the custom chip registers? How to protect against Action Replay Mk. ][
Normally, this Rom & Ram is totally invisible & undetectable. It is switched out of the address space, and there is *nothing* you can do to switch it in via software (there is an exception to this, see later).
Two ways to get into the cart without pressing the button
(Or: 'Futility- an object lesson'!)
1) The first is the way the cart boots up on a reset (displaying that little piccy). The way this works is indecently sneaky! Instead of (as I had expected) either intercepting the reset wholesale, or appearing to the Kickstart reset code as an autoboot cartridge (which can be done by making your Rom appear at $F00000 on reset with $1111 as the first word - see $fc00d2 in the Rom), the AR2 goes for a MUCH sneakier route! It detects processor accesses to location $00000008, and when one happens, it effectively 'presses' its own button (i.e. does the level 7 business). Now, it just so happens that the Kickstart Rom does a MOVE to location $8 very early on, when it sets up the exception vectors, so voila! There it goes into the cart, returning later when it feels like it!
At this point, I feel I should deeply disappoint those of you with a more technical bent (oo-er!)... You may be thinking that if you use the processor to generate a reset (with the RESET instruction), and then access location $8, then the cart will reveal its presence, and thereby you can protect against it! No way jose! The designer of the cart was clever enough to put in a circuit that can actually tell the difference between you plonking your fingers on the 3-keys of doom, and the processor doing a RESET.. Not easy to do! (It actually times the reset pulse: the processor RESET instruction makes a really short one, whereas the keyboard reset lasts for about a second, and the cart only responds to a location 8 access after a genuine reset.) Smart eh?
2) The second way is a bit simpler. When you set a breakpoint with the cart, or set the exceptions with SETEXCEPT, what the cart does is to put a set of TST.B $BFE001 instructions at location $100, with the vectors in question pointing there. What then happens is that when you get an exception, the TST.B $BFE001 is executed, the cart detects it happening, and does the Level 7 thang. Now. This TST.B can't just be anywhere. It has to be from $100 to $120ish. "Ah HA!", you think, "I'll just have a few 'TST.B $BFE001's executed at $100 before my game loads!!" ....Oh no you won't! This whole thing is only enabled after the user does a SETEXCEPT or sets up a breakpoint! Normally the above has NO EFFECT AT ALL! Who's a clever little cart designer then???
How does it know what's going on with the custom chip registers?
I knew you would ask me that one! Ok, get ready for some interesting (?) technical info. As you may well know, all the custom chip registers at $DFF000-$DFF200 are EITHER read-only OR write-only, never, ever, both. This is for a good design reason (take my word for it). Some of the more interesting registers have separate read registers (e.g. DMACONR) so you can tell what the register itself contains. Most don't... Oh dear! It is therefore simply NOT POSSIBLE to tell what was last written to, say, COLOUR00 ($DFF180), without either; A) Asking the person looking at the screen to describe the border colour, or B) Extra hardware. Not entirely surprisingly, the AR2 goes for the latter solution.
First, a quickie lesson in how the Amiga's custom chips communicate internally... The custom chip 'registers' are not really actual memory at all. What that area ($DFF000-$DFF200) actually is is a 'window' into the internal custom-chip computer system. This system consists of a 'bus' (i.e. a channel for data consisting of several physical connections grouped together, each one carrying one bit of the data that is being passed) that is connected to all the custom chips inside the Amiga. (See the pins labelled 'RGA1-8', which contain the number $000-$200 of the custom register, and 'D0-D15', which contain the 16 bits of data being transferred, that are on all of the main custom chips.) Using this, all the various registers (which in physical reality are scattered about in various of the custom chips, according to their usage, i.e. DIWSTRT is in Denise, which generates the display, and BLTSIZE is in Fat Agnus, which contains the blitter amongst other things) can be read or written.
These registers do not exist solely for the purpose of being read/written by the 68000. Certainly most of them have no purpose other than to be set up by the main processor in order to tell the appropriate custom chip what to do, but others are really of no use to the programmer whatsoever. For example, the xxxxxxDAT registers (more on them later). Apart from the 68000 using this bus to communicate with the custom chip registers, the chips themselves use it all the time too! For example, how the screen display is generated... Fat Agnus does all the DMA handling; i.e. it is the chip that, when a bit of data (screen, disk, audio, etc) has to be transferred to/from chip memory, actually does the donkey work of reading/writing the value to the appropriate main memory location. (Main memory is a separate system to this internal custom chip world, and Agnus is the interface.) For screen DMA, the data actually is destined for the Denise chip, which is a separate lump of silicon, so how does it get from Agnus (that has just fetched it from Ram) to Denise? Via this internal bus!
This is where those registers that no-one seems to know about come in. I mean the xxxxDAT registers! BPL1DAT (for example) is not really meant to be accessed by the user at all! It is there for internal usage.. i.e. Agnus reads a word of the screen memory from the outside world, and writes it into BPL1DAT, which is a physical register inside the Denise chip. Now, this operation is functionally the same as you doing a MOVE.W into BPL1DAT, but it is done by Agnus, with no help from the processor. This is DMA. If you read from VHPOSR, then the read request from the 68000 would be passed to Agnus, which would then consult the internal bus, and then deliver the value back. Ok, basically the processor is just a spectator on this internal register bus. (Am I being too patronising? Sorry.)
Right, so now you know that the custom chip registers are not really part of main memory at all (that's why the Copper can only work within this small world of registers and not with all of Ram), and you know/already knew that there are many registers that you cannot read at all. Back on the actual subject in hand.. i.e. how the AR2 knows what these internal registers contain, when it is simply not possible to read them....!
Answer: It doesn't. What it actually does, is use an idea stolen from Romantic Robot (a company that made the first decent freezer cart for the Amstrad CPC, which also had write-only registers) which is to make a bit of sneaky hardware that effectively sits there and watches the 68000 like a hawk. Whenever the 68000 does a write operation to a chip register, it makes an internal copy of the value being written! What this amounts to is $200 bytes of totally invisible (invisible, that is, until you whack the button, when it appears - to be read - in another area of memory) Ram that contains all the values most recently written by the processor to any custom chip register. Handy eh? So the action replay has all those values you wrote to COLOUR00, DSKSYNC, etc, etc copied down in its own bit of ram!
Now, here's where the major catch comes... You may have noticed two things.... A) Only the value last written is kept, and B) Only VALUES WRITTEN BY THE PROCESSOR can be kept. The first point is not really that important, but the second one IS!
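To make the two limitations concrete, here is a tiny host-side C model of that shadow Ram. It is purely illustrative: the function names are invented, the by_cpu flag stands in for whatever the real hardware uses to spot a processor bus cycle, and treating Copper or DMA writes as the "not by the processor" case is my assumption.

```c
#include <stdint.h>

#define CUSTOM_BASE 0xDFF000u
#define CUSTOM_SIZE 0x200u      /* $DFF000-$DFF200, as in the text */

/* Hypothetical shadow Ram: one word per custom register. */
static uint16_t shadow[CUSTOM_SIZE / 2];

/* Called for every write on the bus. Only writes made by the CPU
   (by_cpu != 0) and aimed at the custom register window are copied. */
void bus_write(uint32_t addr, uint16_t value, int by_cpu)
{
    if (!by_cpu)
        return;                 /* the cart never sees these writes */
    if (addr < CUSTOM_BASE || addr >= CUSTOM_BASE + CUSTOM_SIZE)
        return;
    shadow[(addr - CUSTOM_BASE) / 2] = value;   /* old value is lost */
}

/* What the cart would report for a write-only register. */
uint16_t shadow_read(uint32_t addr)
{
    return shadow[((addr - CUSTOM_BASE) & (CUSTOM_SIZE - 1u)) / 2];
}
```

If something other than the processor changes COLOUR00 after the CPU last wrote it, the shadow copy is simply wrong, and the cart has no way of knowing.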
B) Point the Supervisor stack to an ODD address, and run your program in user mode, with NO INTERRUPTS! When you get an interrupt, the processor always enters supervisor mode, switches over to the supervisor stack, and pushes on the address to return to after the interrupt and the current Status Register value. If the address that it tries to push these to is odd...? Kapooof. Not just an address error, but the address error itself also tries to push words onto the odd-stack, and you get a double-exception.. i.e. total 68000 lock-up. It will not recover until you do a hardware reset. This is absolutely the best way to fuck ANY cart up. Press the button when this has been done and the entire computer crashes totally. But... Can you write a game without using interrupts? (The 'Say kids.. what time is it?' approach) C) Use the CIA Time-of-day alarm. This is semi-complex.. Each of the 2 CIAs have 'Time of Day' clocks. These are clocks that run on the conventional hour/minute/second scale, and are driven by the system clock. They have to be set to the right time after every reset, so are almost never used for their intended purpose. Thing is, the clocks also have an alarm facility, whereby you can get an interrupt from the CIA when the current time=the preset alarm time. There are 3 registers (hours/minutes/seconds) that if read, contain the current time, if written in mode 1 (mode is set by a register bit), will change the current time, and if written in mode 2, will change the alarm time. This alarm time cannot ever be read. So.. what you do is... Set your alarm time for, say, 00:00:10, then set the current time to 00:00:00, and enable the interrupt. Start your game going. When you get the alarm interrupt, set the current time back to 00:00:00, and in another 10 seconds you will get another interrupt, and so on. 
If, however, you notice that the time has ever gone past 00:00:10 without you getting an interrupt, or that the alarm occurs at the wrong time, then you know that someone has tampered with the program and didn't set the right alarm time! I know it sounds complicated, but if you use a weird alarm time, then no-one will ever know what the correct value to set it to is, and so the 'freezer' can never produce a copy of the game that will unfreeze and work.
Both these approaches use features of the Amiga/68000 that cannot be got around. The first one can be fixed by having loads of internal connections into the Amiga, but no-one will want to do that. Unfortunately, at time of writing (31st Jan'91), I can't think of another way to stop the cart getting into the monitor. Like I said, it's quite well designed! If you can find a bug in the software to exploit, all well and good, but remember bugs can be fixed!
P.P.P.P.S. Thanx to Bob & Jim for the help. Brought to you by GREMLIN of MAYHEM (finished 5:50am 31st Jan 1991)
This will not work unless the V39 SetPatch command has been executed. If you *must* use trackloader demos then execute the graphics.library function SetChipRev(chipset) This is a V39 function (No Kickstart 3.0? Then you haven't got AGA!). You can set the chipset you require with the following parameters:
    Normal  =  $00
    ECS     =  $03        (Only on ECS or higher)
    AGA     =  $0f        (Only on AGA chipset machines)
    Best    =  $ffffffff  (This gives best possible on machine)
This is called in the system by SetPatch. The code in howtocode4 also had major problems when being run on non ECS machines (without Super Denise or Lisa), as the register was undefined under the original (A) chipset, and would return garbage, sometimes triggering a false AGA-present response.
Bitplanes:
For 0 to 7 bitplanes, set the bitplane count in BPLCON0 as before. For 8 bitplanes you should set bit 4 (BPU3) of BPLCON0; bits 12 to 14 (BPU0 to BPU2) should be zero. Using 64-colour mode (NOT extra halfbrite) requires setting KILLEHB (bit 9) in BPLCON2. Super Hires can be enabled by bit 6 (SHRES) of BPLCON0.
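The bitplane-count rule above can be checked with a small sketch. This is Python used purely to illustrate the bit arithmetic, not Amiga code, and the function name is made up for this example:

```python
def bplcon0_plane_bits(planes):
    """Return the BPLCON0 bits that select `planes` bitplanes (0 to 8)."""
    if not 0 <= planes <= 8:
        raise ValueError("AGA supports 0 to 8 bitplanes")
    if planes == 8:
        # 8 planes: BPU3 (bit 4) set, BPU0-BPU2 (bits 12-14) clear
        return 1 << 4
    # 0 to 7 planes: the count goes in BPU0-BPU2 (bits 12-14), BPU3 clear
    return planes << 12
```

The result still has to be ORed with whatever other BPLCON0 bits (HAM, SHRES, colour enable and so on) your display needs.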
Colour Registers
There are now 256 24-bit colour registers, all accessed through the original 32 12-bit colour registers. If you suspect this sounds like it could be messy, then you're right, it is! AGA works with 8 different palettes of 32 colours each, re-using the colour registers from COLOR00 to COLOR31.
You can choose the palette you want to access via bits 13 to 15 of register BPLCON3.
    BANK2  | BANK1  | BANK0  |
    bit 15 | bit 14 | bit 13 | Selected palette
    -------+--------+--------+------------------------------
       0   |   0    |   0    | Palette 0 (color 0 to 31)
       0   |   0    |   1    | Palette 1 (color 32 to 63)
       0   |   1    |   0    | Palette 2 (color 64 to 95)
       0   |   1    |   1    | Palette 3 (color 96 to 127)
       1   |   0    |   0    | Palette 4 (color 128 to 159)
       1   |   0    |   1    | Palette 5 (color 160 to 191)
       1   |   1    |   0    | Palette 6 (color 192 to 223)
       1   |   1    |   1    | Palette 7 (color 224 to 255)
To move a 24-bit colour value into a colour register requires two writes to the register:

    First clear bit 9 (LOCT) of BPLCON3
    Move high nibbles of each colour component to colour registers
    Then set bit 9 (LOCT) of BPLCON3
    Move low nibbles of each colour component to colour registers

For example, to change colour zero to the colour $123456:
        lea     (CUSTOM.L),a0
        move.w  #$0135,COLOR00(a0)      ; high nibbles of $12,$34,$56
        move.w  #$0200,BPLCON3(a0)      ; set LOCT
        move.w  #$0246,COLOR00(a0)      ; low nibbles of $12,$34,$56
        move.w  #$0000,BPLCON3(a0)      ; clear LOCT again
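To see where $0135 and $0246 come from, here is a sketch of the nibble split and bank selection for any of the 256 colours. It is Python used only to show the arithmetic. The function name is hypothetical, the register offsets are relative to $dff000, and the sketch assumes no other BPLCON3 bits are in use:

```python
BPLCON3 = 0x106      # register offsets from $dff000
COLOR00 = 0x180
LOCT = 1 << 9        # selects the low-nibble half of the colour registers

def aga_colour_writes(index, rgb24):
    """List the (register, value) writes that load one 24-bit colour."""
    bank = index >> 5                       # palette 0-7, BPLCON3 bits 13-15
    reg = COLOR00 + 2 * (index & 31)        # COLOR00..COLOR31 within the bank
    r, g, b = (rgb24 >> 16) & 0xFF, (rgb24 >> 8) & 0xFF, rgb24 & 0xFF
    high = ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)
    low = ((r & 15) << 8) | ((g & 15) << 4) | (b & 15)
    select = bank << 13
    return [
        (BPLCON3, select),          # LOCT clear: high nibbles
        (reg, high),
        (BPLCON3, select | LOCT),   # LOCT set: low nibbles
        (reg, low),
    ]
```

For colour 0 and $123456 this gives the same COLOR00 values as the assembler above. The assembler relies on LOCT already being clear and restores BPLCON3 to zero at the end, which this sketch leaves to the caller.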
Sprites
To change the resolution of the sprite, just use bit 7 and 6 of register BPLCON3
    bit 7 | bit 6 | Resolution
    ------+-------+------------------
      0   |   0   | ECS Defaults
      0   |   1   | Always lowres
      1   |   0   | Always hires
      1   |   1   | Always superhires
    ---------------------------------
For 32-bit and 64-bit wide sprites use bits 3 and 2 of register FMODE ($dff1fc). The sprite format (in particular the control words) varies for each width.

    bit 3 | bit 2 | Wide        | Control Words
    ------+-------+-------------+-----------------------------------
      0   |   0   | 16 pixels   | 2 words (normal)
      1   |   0   | 32 pixels   | 2 longwords
      0   |   1   | 32 pixels   | 2 longwords
      1   |   1   | 64 pixels   | 2 double long words (4 longwords)
    ----------------------------------------------------------------
Wider sprites are not available under all conditions. It is possible to choose the colour palette of the sprite. This is done with bits 4 to 7 (even sprites) and 0 to 3 (odd sprites) of register $010C (BPLCON4). The four bits select the first of the 16 colours the sprite uses:

    bit 7 | bit 6 | bit 5 | bit 4 | Even sprites
    bit 3 | bit 2 | bit 1 | bit 0 | Odd sprites
    ------+-------+-------+-------+-------------------------------
      0   |   0   |   0   |   0   | $0180/palette 0 (color 0)
      0   |   0   |   0   |   1   | $01A0/palette 0 (color 16)
      0   |   0   |   1   |   0   | $0180/palette 1 (color 32)
      0   |   0   |   1   |   1   | $01A0/palette 1 (color 48)
      0   |   1   |   0   |   0   | $0180/palette 2 (color 64)
      0   |   1   |   0   |   1   | $01A0/palette 2 (color 80)
      0   |   1   |   1   |   0   | $0180/palette 3 (color 96)
      0   |   1   |   1   |   1   | $01A0/palette 3 (color 112)
      1   |   0   |   0   |   0   | $0180/palette 4 (color 128)
      1   |   0   |   0   |   1   | $01A0/palette 4 (color 144)
      1   |   0   |   1   |   0   | $0180/palette 5 (color 160)
      1   |   0   |   1   |   1   | $01A0/palette 5 (color 176)
      1   |   1   |   0   |   0   | $0180/palette 6 (color 192)
      1   |   1   |   0   |   1   | $01A0/palette 6 (color 208)
      1   |   1   |   1   |   0   | $0180/palette 7 (color 224)
      1   |   1   |   1   |   1   | $01A0/palette 7 (color 240)
    ---------------------------------------------------------------
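Each row of the sprite palette table follows from one rule: the 4-bit selector is the top nibble of the 8-bit colour number. A small sketch of that rule, in Python for illustration only and with a hypothetical function name:

```python
def sprite_colour_base(select):
    """Map a 4-bit sprite palette selector to (first colour, palette, register)."""
    colour = (select & 0x0F) * 16          # sprites use 16 colours from here
    palette = colour >> 5                  # which bank of 32 colours
    register = 0x180 + 2 * (colour & 31)   # $0180 (COLOR00) or $01A0 (COLOR16)
    return colour, palette, register
```

So selector %0011 means colours 48 to 63, which live at $01A0 onwards in palette 1.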
Alignment Restrictions
Bitplanes, sprites and copperlists must be, under certain circumstances, 64-bit aligned under AGA. Again, to benefit from maximum bandwidth, bitplanes should also only be multiples of 64 bits wide, so if you want an extra area on the side of your screen for smooth blitter scrolling it must be *8 bytes* wide, not two as normal.

This also raises another problem. You can no longer use AllocMem() to allocate bitplane/sprite memory directly. Either use AllocMem(sizeofplanes+8) and calculate how many bytes you have to skip at the front to give 64-bit alignment (remember this assumes either you allocate each bitplane individually or make sure the bitplane size is also an exact multiple of 64 bits), or you can use the new V39 function AllocBitMap().
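The "how many bytes to skip" calculation is a round-up to the next multiple of 8. A minimal sketch, in Python only to show the arithmetic (the function names are made up for this example):

```python
def align_up(address, boundary=8):
    """Round `address` up to the next multiple of `boundary` (a power of two)."""
    return (address + boundary - 1) & ~(boundary - 1)

def skip_bytes(address, boundary=8):
    """Bytes to skip at the front of an allocation to reach alignment."""
    return align_up(address, boundary) - address
```

In assembler the same thing is an add of 7 followed by an AND with -8. Keep the original pointer and size as well, since FreeMem() needs exactly what AllocMem() returned.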
    Screen mode                  Colours   Fetchmode
    --------------------------------------------------
    Lores                        64        1
                                 128       1
                                 256       1
                                 HAM-8     1
    Hires                        32        2
                                 64        2
                                 128       2
                                 256       2
                                 HAM-8     2
    SUPER-HIRES (1280x256)       2         1
                                 4         1
                                 8         2
                                 16        2
                                 32        4
                                 64        4
                                 128       4
                                 256       4
                                 HAM-8     4
    PRODUCTIVITY (640x480,etc)   2         1
                                 4         1
                                 8         2
                                 16        2
                                 32        4
                                 64        4
                                 128       4
                                 256       4
                                 HAM-8     4
This table only shows the minimum required fetchmode for each screen. You should always try and set the fetchmode as high as possible (if you are 64-bit aligned and wide, then $11, if 32-bit aligned and wide $01, etc...) Bits 2 and 3 do the same for sprite width, as has been mentioned elsewhere... Remember... To take advantage of the increased fetchmodes (which give you more processor time to play with!) your bitmaps must be on 64-bit boundaries and be multiples of 64-bits wide (8 bytes)
Monitor Problems
Unfortunately the A1200/AGA chipset does not have the deinterlacer circuitry present in the Amiga 3000, but instead has new 'deinterlaced' modes. These let the A1200 run Workbench (and almost all OS legal software) flicker free at high resolution on a multiscan or Super VGA monitor. Unlike the Amiga 3000 hardware it produces these flicker free modes by generating a custom copperlist, so any programs that generate their own copperlists will continue to run at the old flickery 15kHz frequency unless they add their own deinterlace code.

This is a big problem for many A1200 owners as there are very few multiscan monitors that support 15kHz displays now. Most multiscan monitors will not display screens at less than 27kHz. People with an A1200/4000 and this kind of monitor *CANNOT* view any games or demos that write their own copperlists. Can you help them out? Unfortunately it's not easy.

Deinterlacing is done in AGA by doing two things. Firstly, different horizontal and vertical frequencies are set (these are set to unusual values for anyone used to Amiga or PC displays! For example, DblPal is set by default to 27kHz horizontal and 48Hz vertical). It's important to realise that the vertical frequency changes too! Secondly, for non-interlaced screens, bitplane scandoubling is enabled (bit BSCAN2 in FMODE). This repeats each scanline twice. A side effect of this is that the bitplane modulos are unavailable for user control.

So... There are three options.

1. Write nasty copperlist code to work with both standard and promoted displays (Not a good idea!)
2. Use the OS and set up your displays legally, asking the Display Database for a screenmode that is available for the current monitor.
3. Give up, and say your demo requires a 15kHz monitor.

I think most people will go for option 3. The Commodore 1084/1085, Philips 8833/8852 and the Commodore 1950/1960/1940/1942 monitors are all capable of running 15kHz screens.
exec.library/AllocMem()
AllocMem -- allocate memory given certain requirements

    memoryBlock = AllocMem(byteSize, attributes)
    D0            -198     D0        D1

    void *AllocMem(ULONG, ULONG);
graphics.library/AllocBitMap()
AllocBitMap -- Allocate a bitmap and attach bitplanes to it. (V39)

    bitmap=AllocBitMap(sizex,sizey,depth, flags, friend_bitmap)
           -918        d0    d1    d2     d3     a0

    struct BitMap *AllocBitMap(ULONG,ULONG,ULONG,ULONG, struct BitMap *);

Allocates and initializes a bitmap structure. Allocates and initializes bitplane data, and sets the bitmap's planes to point to it.

IN: sizex = the width (in pixels) for the bitmap data.
    sizey = the height (in pixels).
    depth = the number of bitplanes deep for the allocation.
    flags = BMF_CLEAR - Clear the bitmap.
            BMF_DISPLAYABLE - bitmap displayable on AGA machines in all modes.
            BMF_INTERLEAVED - bitplanes are interleaved
    friend_bitmap = pointer to another bitmap, or NULL. If this pointer is present, the bitmap will be allocated so blitting between the two is simplified.

SEE ALSO FreeBitMap()
graphics.library/FreeBitMap()
FreeBitMap -- free a bitmap created by AllocBitMap

    FreeBitMap(bm)
    -924       a0

    VOID FreeBitMap(struct BitMap *)

Frees bitmap and all associated bitplanes

IN: bm = A pointer to a BitMap.
Here is the assembler version of the code: See startup.asm for an integrated example of this code:
; Setup code - assumes V39 Kickstart or higher

FixSpritesSetup:
        move.l  _IntuitionBase,a6
        lea     wbname,a0
        jsr     _LVOLockPubScreen(a6)

        tst.l   d0
        beq.s   .error
        move.l  d0,wbscreen
        move.l  d0,a0

        move.l  sc_ViewPort+vp_ColorMap(a0),a0
        lea     taglist,a1
        move.l  _GfxBase,a6             ; open graphics.library first!
        jsr     _LVOVideoControl(a6)

        move.l  resolution,oldres       ; store old resolution

        move.l  #SPRITERESN_140NS,resolution
        move.l  #VTAG_SPRITERESN_SET,taglist

        move.l  wbscreen,a0
        move.l  sc_ViewPort+vp_ColorMap(a0),a0
        lea     taglist,a1
        jsr     _LVOVideoControl(a6)    ; set sprites to lores

        move.l  wbscreen,a0
        move.l  _IntuitionBase,a6
        jsr     _LVOMakeScreen(a6)
        jsr     _LVORethinkDisplay(a6)

; Sprites are now set back to 140ns in a system friendly manner!
.error
        rts

ReturnSpritesToNormal:
; If you mess with sprite resolution you must return resolution
; back to workbench standard on return! This code will do that...
        move.l  wbscreen,d0
        beq.s   .error
        move.l  d0,a0
        move.l  oldres,resolution       ; change taglist
        lea     taglist,a1
        move.l  sc_ViewPort+vp_ColorMap(a0),a0
        move.l  _GfxBase,a6
        jsr     _LVOVideoControl(a6)    ; return sprites to normal.

        move.l  _IntuitionBase,a6
        move.l  wbscreen,a0
        jsr     _LVOMakeScreen(a6)

        move.l  wbscreen,a1
        sub.l   a0,a0
        jsr     _LVOUnlockPubScreen(a6)
.error
        rts
wbname
intuition.library/LockPubScreen()
LockPubScreen -- Put a lock on a Public Screen. screen = LockPubScreen( Name ) D0 -510 A0 struct Screen *LockPubScreen( UBYTE * ); Prevents a public screen (or the Workbench) from closing.
intuition.library/UnlockPubScreen()
UnlockPubScreen -- Remove lock from a Public Screen.

    UnlockPubScreen( Name, [Screen] )
    -516             A0    A1

    VOID UnlockPubScreen( UBYTE *, struct Screen * );

Releases a lock from LockPubScreen()

IN: Usually Name = NULL and Screen = pointer returned by LockPubScreen()
graphics.library/VideoControl()
VideoControl -- Parse tags on viewport colormap. err = VideoControl( cmap , tags ) d0 -708 a0 a1 ULONG VideoControl( struct ColorMap *, struct TagItem * ); Process the tag commands on the colormap. IN: cm = pointer to struct ColorMap tags = pointer to a table of videocontrol tagitems. OUT: error = NULL if no error occured.
graphics.library/SetChipRev()
SetChipRev -- Enables Chip Set features

    chipbits = SetChipRev(Rev)
               -888       d0

IN:  Rev - Revision to be enabled ($ffffffff for best possible)
OUT: chipbits - State of chipset on exit.

Only call this routine once. It is called by the OS in SetPatch, but you should use it if you are writing Non-DOS demos or games.
OwnBlitter()/DisownBlitter()
If you are using the blitter in your code and you are leaving the system intact (as you should) always use the graphics.library functions OwnBlitter() and DisownBlitter() to take control of the blitter. Remember to free it for system use; many system functions (including floppy disk data decoding) use the blitter. OwnBlitter() does not trash any registers. I guess DisownBlitter() doesn't either, although Chris may well correct me on this. They are fast enough to use around your blitter code, so don't just OwnBlitter() at the beginning of your code and DisownBlitter() at the end, only OwnBlitter() when you need to.

graphics.library/OwnBlitter()
OwnBlitter -- get the blitter for private usage OwnBlitter() -456 void OwnBlitter( void ); If blitter is available return immediately with the blitter locked for your exclusive use. If the blitter is not available put task to sleep. It will be awakened as soon as the blitter is available. When the task first owns the blitter the blitter may still be finishing up a blit for the previous owner. You must do a WaitBlit before actually using the blitter registers. Calls to OwnBlitter() do not nest. If a task that owns the blitter calls OwnBlitter() again, a lockup will result. (Same situation if the task calls a system function that tries to own the blitter).
graphics.library/DisownBlitter()
DisownBlitter - return blitter to free state. DisownBlitter() -462 void DisownBlitter( void ); Free blitter up for use by other blitter users.
graphics.library/QBlit()
QBlit -- Queue up a request for blitter usage QBlit( bp ) -276 a1 void QBlit( struct bltnode * ); Link a request for the use of the blitter to the end of the current blitter queue. The pointer bp points to a blit structure containing, among other things, the link information, and the address of your routine which is to be called when the blitter queue finally gets around to this specific request. When your routine is called, you are in control of the blitter ... it is not busy with anyone else's requests. This means that you can directly specify the register contents and start the blitter. Your code must be written to run either in supervisor or user mode on the 68000. IN: bp - pointer to a blit structure Your routine is called when the blitter is ready for you. In general requests for blitter usage through this channel are put in front of those who use the blitter via OwnBlitter and DisownBlitter. However for small blits there is more overhead using the queuer than Own/Disown Blitter.
graphics.library/QBSBlit
QBSBlit -- Synchronize the blitter request with the video beam.

    QBSBlit( bsp )
    -294     a1

    void QBSBlit( struct bltnode * );

Call a user routine for use of the blitter, enqueued separately from the QBlit queue. Calls the user routine contained in the blit structure when the video beam is located at a specified position onscreen. Useful when you are trying to blit into a visible part of the screen and wish to perform the data move while the beam is not trying to display that same area (prevents showing part of an old display and part of a new display simultaneously). Blitter requests on the QBSBlit queue take precedence over those on the regular blitter queue. The beam position is specified in the blitnode.

IN: bsp - pointer to a blit structure.
graphics.library/WaitBlit()
WaitBlit -- Wait for the blitter to finish. WaitBlit() -228
Blitter Timing
Another common cause for demo crashes is blitter timing.
Assuming that a particular routine will be slow enough that a blitter wait is not needed is silly. Always check for blitter finished, and wait if you need to. Don't assume the blitter will always run at the same speed too. Think about how your code would run if the processor or blitter were running at 100 times the current speed. As long as you keep this in mind, you'll be in a better frame of mind for writing code that works on different Amigas. Another big source of blitter problems is using the blitter in interrupts. Many demos do all processing in the interrupt, with only a
.wt     btst    #6,$bfe001      ; is left mouse button clicked?
        bne.s   .wt
loop outside of the interrupt. However, some demos do stuff outside the interrupt too. Warning. If you use blitter in both your interrupt and your main code, (or for that matter if you use the blitter via the copper and also in your main code), you may have big problems.... Take this for example:
        lea     $dff000,a5
        move.l  GfxBase,a6
        jsr     _LVOWaitBlit(a6)
        move.l  #-1,BLTAFWM(a5)
        move.l  #source,BLTAPT(a5)
        move.l  #dest,BLTDPT(a5)
        move.w  #%100111110000,BLTCON0(a5)
        move.w  #0,BLTCON1(a5)
        move.w  #64*height+width/2,BLTSIZE(a5)  ; trigger blitter
There is *nothing* stopping an interrupt, or the copper, triggering a blitter operation between the WaitBlit() call and your final BLTSIZE blitter trigger. This can lead to total system blowup. Code that may, by luck, work on standard speed machines may die horribly on faster processors due to timing differences causing this type of problem to occur. You can prevent this by using OwnBlitter(). The safest way to avoid this is to keep all your blitter calls together, use the copper exclusively, or write a blitter-interrupt routine to do your blits for you, which is very good because you avoid getting stuck in a waitblit-loop.

Always use the graphics.library WaitBlit() routine for your end of blitter code. It does not change any registers, it takes into account any revision of blitter chip and any unusual circumstances, and on an Amiga 1200 will execute faster (because it is in 32-bit ROM) than any code that you could write in chipram.
Calculating LF Bytes
Instead of calculating your LF-bytes all the time you can do this
A       EQU     %11110000
B       EQU     %11001100
C       EQU     %10101010

        move.w  #(A!B)&C,d0
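The three constants work because each bit position of the LF byte corresponds to one combination of the A, B and C inputs, so evaluating the logic expression on the constants yields the minterm byte directly. A sketch of the same trick, in Python for illustration only (lf_byte is a hypothetical helper):

```python
A = 0b11110000
B = 0b11001100
C = 0b10101010

def lf_byte(func):
    """Evaluate a logic function of (A, B, C) and keep the 8 minterm bits."""
    return func(A, B, C) & 0xFF

copy_a = lf_byte(lambda a, b, c: a)                      # D = A
a_or_b_and_c = lf_byte(lambda a, b, c: (a | b) & c)      # the example above
cookie_cut = lf_byte(lambda a, b, c: (a & b) | (~a & c)) # masked copy
```

Here copy_a is $F0, the LF byte in the %100111110000 BLTCON0 value used earlier, and (A!B)&C works out to $A8.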
Blitter clears
If you use the blitter to clear large areas, you can generally improve speed on higher processors (68020+) by replacing it by a cache-loop that clears with movem.l instead:
        moveq   #0,d0
        moveq   #0,d1
        moveq   #0,d2
        moveq   #0,d3
        moveq   #0,d4
        moveq   #0,d5
        moveq   #0,d6
        sub.l   a0,a0
        sub.l   a1,a1
        sub.l   a2,a2
        sub.l   a3,a3
        sub.l   a4,a4
        sub.l   a5,a5

        lea     EndofBitplane,a6
        move.w  #(bytes in plane/156)-1,d7
.Clear  movem.l d0-d6/a0-a5,-(a6)
        movem.l d0-d6/a0-a5,-(a6)
        movem.l d0-d6/a0-a5,-(a6)
        dbf     d7,.Clear
; final couple of movems may be needed to clear last few bytes of screen...
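The 156 in the loop count is 13 registers x 4 bytes x 3 movems. A quick sketch of working out the dbf count and the leftover bytes for a given plane size (Python for the arithmetic only, hypothetical function name):

```python
REGISTERS = 13                        # d0-d6 and a0-a5
BYTES_PER_LOOP = REGISTERS * 4 * 3    # three movem.l per pass = 156 bytes

def clear_loop_plan(plane_bytes):
    """Return (dbf count, bytes left over for the final movems)."""
    passes, leftover = divmod(plane_bytes, BYTES_PER_LOOP)
    if passes == 0:
        raise ValueError("plane is smaller than one pass of the loop")
    return passes - 1, leftover       # dbf loops count+1 times
```

For a 320x256 plane (40*256 = 10240 bytes) this gives a dbf count of 64 with 100 bytes still to clear afterwards.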
This loop was (on my 1200) almost three times faster than the blitter. With 68000-68010 you can gain some time by NOT using blitter-nasty and the movem-loop.
So, use A,D,A&D for the fastest operation. Use A&C for 2-source operations (e.g. collision check or so).
Programming CDTV/A570
Until now there has been no CDTV documentation available to the public... Well, here are a few tips.....
Using cdtv.device
The CDTV can be controlled by the cdtv.device, which is a standard Amiga device. Open the cdtv.device as standard, and issue commands to it to play audio, read data, etc... Examine cdtv.i, included in the source directory. For example: To play track 2 on an audio CD in a CDTV, use the following:
include "cdtv.i"
        ...... your code here ......

        move.l  MyCDTVRequest,a1        ; set this up as for any
                                        ; other device (eg trackdisk.device)
        move.w  #CDTV_PLAYTRACK,IO_COMMAND(a1)
        move.l  #2,IO_OFFSET(a1)        ; track number
        move.l  #1,IO_LENGTH(a1)        ; number of tracks to play
        move.l  4.w,a6
        jsr     _LVOSendIO(a6)          ; send command
If you need to gain extra memory, you can shut down the cdtv.device (apparently) by issuing a CDTV_STOP command to the device.
If it returns NULL then it's not an A570, if it returns an address then it's an A570.

exec.library/FindResident()
FindResident - find a resident module by name resident = FindResident(name) D0 -96 A1 struct Resident *FindResident(STRPTR);
Search the system resident tag list for a resident tag ("ROMTag") with the given name. If found return a pointer to the resident tag structure, else return zero. IN: name - pointer to name. OUT: resident - pointer to the resident tag structure (or NULL)
AmigaCD 32 information
As HTC7 was going to press the AmigaCD32 had been launched in Germany, and the UK launch is imminent (July 16th).

AmigaCD32 is: 68020 14MHz processor unit, double speed CD-ROM. It will run AmigaCD, CD+G, CDTV and CD Audio discs. It contains the AGA chipset and Kickstart 3.1. It has two joystick/mouse ports, Composite video, RF (PAL), S-VHS and AUX (A4000 keyboard port). There are *NO* other Amiga ports. No RGB (so no monitors...), no Serial or Parallel (so no Parnet!!!!), and most strange of all - no floppy disk drive port :-(

It is being sold as a games console to rival Nintendo and Sega. The most interesting new feature is a new piece of hardware to do *fast* chunky to planar pixel conversion. Hopefully this will be fitted to the Amiga 1200 and 4000 in time...

Programming is done in the same way as any other Amiga model. There are some new libraries and devices, including lowlevel.library, that allows direct control of the new joypad controller (with 10 buttons).
LoadView( View ) -222 A1 void LoadView( struct View * ); IN: View - a pointer to the View structure which contains the pointer to the constructed coprocessor instructions list, or NULL. If the View pointer is non-NULL, the new View is displayed, according to your instructions. The vertical blank routine will pick this pointer up and direct the copper to start displaying this View. If the View pointer is NULL, no View is displayed, and the hardware defaults back to standard chipset defaults (mostly). Even though a LoadView(NULL) is performed, display DMA will still be active. Sprites will continue to be displayed after a LoadView(NULL) unless an OFF_SPRITE is subsequently performed.
graphics.library/WaitTOF()
WaitTOF -- Wait for the top of the next video frame. WaitTOF() -270 void WaitTOF( void ); Wait for vertical blank to occur and all vertical blank interrupt routines to complete before returning to caller.
However (as many of you have found out), this actually triggers just before the end of the previous line (around 4 or 5 low-res pixels in from the maximum overscan border). For most operations this is not a problem (and indeed gives a little extra time to initialise stuff for the next line), but if you are changing the background colour ($dff180), then there is a noticeable 'step' at the end of the scanline. The correct way to do a copper wait to avoid this problem is
$xx07,$fffe.
This just misses the previous scanline, so the background colour is changed exactly at the start of the scanline, not before.
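As a sketch of how the two words of such a wait are put together for a given scanline (Python for the bit layout only; the function name is made up, and lines above 255 need the usual extra wait that this does not handle):

```python
def copper_wait_line_start(line):
    """Return the two copper words ($xx07,$fffe) that wait for `line`."""
    if not 0 <= line <= 255:
        raise ValueError("vertical position must fit in 8 bits")
    first = (line << 8) | 0x07     # vertical position in the high byte, horizontal $07
    second = 0xFFFE                # compare all position bits
    return first, second
```

In a copperlist, line $2c would then be written as dc.w $2c07,$fffe.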
1. Preface
The sources of this text have more or less indirectly been some books from my school. Some sources worth mentioning are:

    Elementary Linear Algebra (by Howard Anton, released by John Wiley)
    Calculus - A complete course (By Robert K. Adams)
    The DiscWorld series (by T. Pratchett)

By reading this text, you should also be able to know what it is you're doing. Not just converting given formulas to 680x0 code, but also knowing what the background of it is. If you know the theory behind your routine, you also know how to optimize it! NO text here will mention GLENZ-vectors, since they are amazingly depressive. This text is meant for democoders on all computers that support a good graphic interface, which is fast enough to do normal concave objects in one frame (not PC).

sqr() means square root in this text.

I'm curious about what support Commodore has for this kind of programming in their latest OS. It could be a great idea if rotations etc. that use many multiplications were programmed in ROM. The methods described are used by most well-known demo-coders in "vector" demos.

I've coded Blue House I+2, most of Rebels Megademo II, my own fantastic and wonderful cruncher (not released), Amed (also not released), some intros, the rubiks snake in Rebels Aars-intro, and the real slideshow from ECES. Sorry for most of my demos not working on other machines than real A500's, but that's the only computer I've used for bugtesting.

The meaning of this text is that it shall be a part of How To Code.txt and that the same rules work for this text as for that. The rights of this part stay with the author. Sourcecodes should work with most assemblers except for Insert sorting, which needs a 68020 assembler.

Hi to all my friends who really supported me by comments like:
"How can you write all that text?" "Who will read all that?" "Can I have it first, so I can get more bbs-access?" "Why are you writing that?" "I want to play Zool!" (-My youngest brother) "My dog is sick..." "You definitely have the right approach to this!" "" (-Commodore Sweden) (But in swedish of course!)
The reason why Terry Pratchetts DiscWorld series is taken as a serious source is that he is a great visualizer of strange mathematical difficulties. If you ever have problems with inspiration, sit back, read and try to imagine how demos would look like in the DiscWorld... (Glenz-Turtles?) Now read this text and impress me with a great AGA-demo... (C) MOVEMENT 1993. "Death to the pis" /T. Domar
2. Introduction to vectors
What is a vector? If you have seen demos, you have probably seen effects that are called, loosely, vectors. They may be balls, filled polygons, lines, objects and many other things. What these demos have in common is the vector calculation of the positions of the objects. It can be in one, two or three dimensions (or more, but then you can't see the ones above 3).

You can for example have a cube. Each corner on the cube represents a vector TO the center of rotation. All vectors go FROM something TO something; normally we use vectors that go from a point (0,0) to a point (a,b). This vector has the quantity of (a,b).

Definition of vector: A QUANTITY of both VALUE and DIRECTION. Or, in a layman's loose terms: a line.

A line has a length that we can call r, and a direction we can call t. We can write this vector (r,t) = (length,angle). But there is also another way, which is more used when dealing with vector objects with given coordinates. The line from (0,0) to (x,y) has the length sqr(x*x+y*y) and that is the VALUE of the vector. The direction can be seen as the angle between the x-axis and the line described by the vector. If we study this in two dimensions, we can have an example vector as follows:
     ^ y
     |
     |     _.(a,b)
     |     /|
     |    /
     |   / V
     |  /
     | /
     |/ - t=angle between x-axis and vector V
  ---+------------>
   (0,0)           x
We can call this vector V, and, as we can see, it goes from the point (0,0) to (a,b). We can denote this vector as V=(a,b). Now we have both a value of V (the length between (0,0) and (a,b)) and a direction of it (the angle in the diagram). If we look at the diagram, we can see that the length of the vector can be computed with Pythagoras' theorem, like:
r=sqr(a*a+b*b)
and t is the angle (it can be calculated with t=arctan(b/a)).

Three Dimensions?

Now that we have seen what a vector is in two dimensions, what is a vector in three? In three dimensions, every point has three coordinates, and so must the vector.
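The two ways of writing a 2D vector, (a,b) and (r,t), convert into each other as follows. This is a Python sketch for checking the maths, not demo code. It uses atan2 rather than a plain arctan(b/a) so that a=0 and the left half-plane come out right, which is an addition to the formula in the text:

```python
import math

def to_polar(a, b):
    """(a,b) -> (r,t): length by Pythagoras, angle from the x-axis."""
    r = math.sqrt(a * a + b * b)
    t = math.atan2(b, a)      # equals arctan(b/a) when a > 0
    return r, t

def from_polar(r, t):
    """(r,t) -> (a,b)."""
    return r * math.cos(t), r * math.sin(t)
```

For example the vector (3,4) has length 5 and an angle of about 0.927 radians.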
V=(a,b,c)
What happens to the angle now? Here we can have different definitions, but let's think a little. If we start with giving ONE angle, we can only reach points on one PLANE, but we want to get a direction in SPACE. If we try with TWO angles, we will get a better result. One angle can represent the angle between the z-axis and the vector, the other the rotation AROUND the z-axis. For more problems in this area (there's many) study calculus of several variables and specially polar transformations in triple integrals, or just surface integrals in vector fields.
* A coordinate system can be "Translated" to a new point with the translation formula:
x'=x-k y'=y-l z'=z-m
Where (k,l,m) is the OLD point where the NEW coordinate system should have its point (0,0,0) This is a good operation if you want to ROTATE around A NEW POINT! * A vector can be rotated (Check chapter 4) The vector is always rotated around the point (0,0,0) so you may have to TRANSLATE it. * We can take scalar product and cross-product of vectors (see any book about introduction to linear algebra for these. everything is evaluated in this text, so you don't have to know what this is)
3. Coding techniques
Presenting a three dimensional point on a two dimensinal screen Assume that you have a point in space (3d) that you want to take a photo of. A photo is 2d, so this should give us some sort of an answer. Look at the picture below:
[Diagram: a point in space, the screen (="photo") in front of it, and the eye of the observer on the z-axis, with the x, y and z axes marked]
Inspecting this gives us the following formula: Projected Y = Distance of screen * Old Y / ( Distance of point ) (The distances is of course the Z coordinates from the Eyes position) And a similar way of thinking gives us the projection of X.
    New Y=k*y/(z+dist)
    New X=k*x/(z+dist)
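A sketch of the projection formula, with k as the screen constant and dist as the distance from the rotation point to the eye. Python is used only to try the numbers out, and the default values for k and dist are arbitrary choices for this example:

```python
def project(x, y, z, k=256, dist=512):
    """Project a 3D point onto the screen: X=k*x/(z+dist), Y=k*y/(z+dist)."""
    depth = z + dist
    if depth <= 0:
        raise ValueError("point is at or behind the eye")
    return k * x / depth, k * y / depth
```

Points further away (larger z) land closer to the middle of the screen, which is what gives the perspective.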
Here k is a constant for this screen, and dist is the distance from the ROTATION point to the EYE on the Z-axis.

A way of presenting real numbers with Integers

For 68000 coding (compatible with all better processors) it is comfortable to be able to do multiplications etc. with words (680x0,{x>=2} can do it with longwords, but this won't work very well with lower x's). But we need the fractional parts of the numbers too, so what do we do? We can try to use numbers that are multiplied by a constant p. Then we can do the following operation:
[cos(a)*p] * 75 (for example from a list with cos(x) mult. with p)
But as you can see this number grows for each time we do another multiplication, so what we have to do is to divide by p again:
[cos(a)*p] * 75 / p
If you are a knower of digital electronics, you say "Oh no, not a division that takes so long time". But if you choose p carefully (i.e. p = 2 or 4 or 8 ...) you can use shifting instead of clean division. Look at this example:
        muls    10(a0),d0       ;10(a0) is from a list of cos*256 values
                                ;(muls, since cos values are signed)
        asr.l   #8,d0           ;and we "divide" by 256!
Now we have done a multiplication of a fixed point number! (A hint to get the error a little smaller: clear a Dxregister and use an addx after the asr, and you will get a round-off error instead:
        moveq   #0,d7
        :
        :
        muls    10(a0),d0
        asr.l   #8,d0           ;last bit shifted out goes to X
        addx.l  d7,d0           ;add X to round to nearest
        :
        rts
This halves the error!) The same thinking comes in with divisions, but in the other way:
        :
        ext.l   d0
        ext.l   d1
        asl.l   #8,d0
        divs    d1,d0
        :
        rts
Additions and subtractions are the same as normal integer operations: (no shifting needed)
        :
        add.w   10(a0),d0
        :
        :
        sub.w   16(a1),d1
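The fixed-point operations above can be tried out with this sketch, using p=256 (a shift of 8). It is Python standing in for the 680x0 instructions, and the function names are made up for this example. Python's >> on negative numbers behaves like asr, and the division is written to truncate toward zero in the way divs does:

```python
SHIFT = 8                      # p = 256

def fx_mul(a, b):
    """Multiply, then shift right: like muls followed by asr.l #8."""
    return (a * b) >> SHIFT

def fx_mul_round(a, b):
    """As fx_mul, plus the last bit shifted out: the addx rounding trick."""
    product = a * b
    return (product >> SHIFT) + ((product >> (SHIFT - 1)) & 1)

def fx_div(a, b):
    """Shift left, then divide: like asl.l #8 followed by divs."""
    n = a << SHIFT
    q = abs(n) // abs(b)
    return -q if (n < 0) != (b < 0) else q
```

With 128 standing for 0.5, fx_mul(128, 3) gives 1 while fx_mul_round(128, 3) gives 2, which is the halved error mentioned above.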
So, with multiplications you MUL first, then LSR. With divisions you LSL first, then DIV. If you wish to have higher accuracy with the multiplications, the 68020 and higher processors offer a cheap way to do floating point operations (32-bit total) instead. You can also do integer multiplications 32*32->32, and use 16-bit coses and sins instead, which enables you to use 'swap' instead of 'lsr'.

How can I use Sin and Cos in my assembler code?

The easiest and fastest way is to include a sinus-list in your program. Make a basic-program that counts from 0 to 2*pi, for example 1024 times. Save the values and include them into your code. If you have words and 1024 different sinus values then you can get sinus and cosinus this way:
        lea     sinuslist(pc),a0        ;sinuslist is calculated list
        and.w   #$7fe,d0                ;d0 is angle
        move.w  (a0,d0.w),d1            ;d1=sin(d0)
        add.w   #$200,d0
        and.w   #$7fe,d0
        move.w  (a0,d0.w),d0            ;d0=cos(original d0)
        :
        :
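The lookup works because cos(x) = sin(x + a quarter turn), and a quarter of 1024 entries is 256 entries, i.e. $200 bytes when each entry is a word. A sketch of the same table and lookup, in Python for illustration. It indexes by entry rather than by byte offset, and uses the 255.4 scale from the BASIC program in this section:

```python
import math

VALS = 1024
SCALE = 255.4
SINUSLIST = [int(math.sin(i / VALS * 2 * math.pi) * SCALE) for i in range(VALS)]

def sin_cos(angle):
    """Return (sin, cos) for an angle given in 1/1024ths of a full turn."""
    i = angle & (VALS - 1)                    # wrap, like and.w #$7fe
    s = SINUSLIST[i]
    c = SINUSLIST[(i + VALS // 4) & (VALS - 1)]   # quarter turn ahead
    return s, c
```

The masking means any angle value is safe to pass in, so angles can simply be incremented forever without range checks.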
Your program could look like this: (AmigaBasic, HiSoft basic) (NEVER use AmigaBasic on other processors than 68000)

    open "ram:sinuslist.s" for output as 1
    pi=3.141592654#
    vals=1024
    nu=7
    pretxt$=chr$(10)+chr$(9)+chr$(9)+"dc.w"+chr$(9)
    for L=0 to vals
      angle=L/vals*2*pi
      y$="$"+hex$(int(sin(angle)*255.4))
      nu=nu+1
      if nu=8 then print #1,pretxt$;:nu=0 else print #1,",";
      print #1,y$;
    next L
    close 1

You can of course do a program that calculates the sins in assembler code, by using ieee-libs or coding your own floating point routines. The relevant algorithm is... (for sinus)

    indata: v=angle (given in radians)
            Laps=number of terms (less=faster and more error, integer)

    1>    Mlop=1
          Dfac=1
          Ang2=angle*angle
          Talj=angle
          sign=1
          Result=0
    2>    FOR terms=1 TO Laps
    2.1>    Temp=Talj/Dfac
    2.2>    Result=Result+sign*Temp
    2.3>    Talj=Talj*Ang2
    2.4>    Mlop=Mlop+2
    2.5>    Dfac=Dfac*Mlop*(Mlop-1)
    2.6>    sign=-sign
    3>    RETURN sin()=Result

where the returned sin() is between -1 and 1...

The algorithm uses Maclaurin polynomials, and is therefore recommended only for values that are not very far away from 0.
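Here is a sketch of the series in runnable form, keeping the variable names of the algorithm. It is Python, for checking the numbers only. This version advances the factorial two steps per term (1!, 3!, 5!, ...) and adds each signed term to the running total, as the Maclaurin series sin(v) = v - v^3/3! + v^5/5! - ... requires:

```python
import math

def maclaurin_sin(angle, laps):
    """Approximate sin(angle) with `laps` terms of the Maclaurin series."""
    mlop = 1              # current odd number: 1, 3, 5, ...
    dfac = 1.0            # mlop factorial
    ang2 = angle * angle
    talj = angle          # numerator: angle to the power mlop
    sign = 1
    result = 0.0
    for _ in range(laps):
        result += sign * talj / dfac
        talj *= ang2
        mlop += 2
        dfac *= mlop * (mlop - 1)
        sign = -sign
    return result
```

Near zero a handful of terms is plenty, but far from zero the result is useless with few terms: three terms at angle 6.0 give 34.8. Reduce the angle into a small range first.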
Sine: Hypotenuse/side not close to angle , /| Length>/ |< Length * sin(a) /a | '---+ Length * cos(a) If we put in this in the original rotation formula (V'=rot(V,a)=V(r,t+a)) we can see that we can convert r and t to x and y with: x=r*cos(t) y=r*sin(t) Let's get back to our problem of the rotated vector V=(x,0). Here is r=x (=sqrt(x*x+0*0)), t=0 (=arctan(0/x) if we put this in our formula we get: V=(r,t) if r=x, t=0 If we rotate this vector with the angle a we get: V=(r,t+a) And if we translate back to our coordinate denotion: V=(r*cos(t+a),r*sin(t+a))=(x*cos(a),x*sin(a)) ^We insert x=r, t=0 And that is the formula for rotation of a vector that has no y-composant. For a vector V=(0,y) we get: r=y, t=pi/2 (=90 degrees) since we now are in the y-axis, which is 90 degrees from the x-axis. V=(r,t) => V'=(r,t+a) => V'=(r*cos(t+a),r*sin(t+a)) => V'=(y*cos(pi/2+a),y*sin(pi/2+a)) Now, there are a few trigonometric formulas that says that cos(pi/2+a)= =sin(a) and sin(pi/2+a)=-cos(a) We get: V'=( y * sin(a) , y * ( -cos(a) ) ) But if we look in the general case, we have a vector V that has both x and y composants. Now we can use the single-cases rotation formulas for calculating the general case with an addition: Vx'=rot((x,0),a) = (x*cos(a) ,x*sin(a)) + Vy'=rot((0,y),a) = ( +y*sin(a), -y*cos(a)) ---------------------------------------------------------V' =rot((x,y),a) = (x*cos(a)+y*sin(a),x*sin(a)-y*cos(a)) (Vx' means rotation of V=(x,0) and Vy' is rotation of V=(0,y)) And we have the rotation of a vector given in coordinates! FINAL FORMULA OF ROTATION IN TWO DIMENSIONS rot( (x,y), a)=( x*cos(a)+y*sin(a) , x*sin(a)-y*cos(a) ) x-composant ^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^ y-composant
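The final formula is easy to try out numerically. The sketch below (Python, for checking only, with made-up function names) implements it exactly as written above. Note that with a=0 it returns (x,-y), so as well as rotating it mirrors the y-axis, while still preserving lengths. The textbook rotation, which uses cos(pi/2+a)=-sin(a) and sin(pi/2+a)=cos(a), is given alongside for comparison. Whichever you choose, use the same one everywhere:

```python
import math

def rot_as_given(x, y, a):
    """rot((x,y),a) exactly as in the final formula above."""
    c, s = math.cos(a), math.sin(a)
    return x * c + y * s, x * s - y * c

def rot_standard(x, y, a):
    """Textbook counter-clockwise rotation, for comparison."""
    c, s = math.cos(a), math.sin(a)
    return x * c - y * s, x * s + y * c
```

Both versions agree on vectors along the x-axis, for example (1,0) rotated by 90 degrees ends up at (0,1). They differ in the sign of the y contribution.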
* Three dimensions

Now we are getting somewhere! In the two-dimensional case, we rotated x and y coordinates, and we didn't see any z coordinates that changed. Therefore we call this a rotation around the Z axis. Now, the simplest thing to do in three dimensions is to still do the same thing: just rotate around one axis at a time to get the new coordinates. Leave out the variable that represents the coordinate of the current rotation axis, and you can use the very same expression.

If you want to rotate only one or two points, you can use the normal method of rotation, because then you won't have to calculate a 3x3 transformation matrix. But if you have more points, I recommend the optimized version. Let's first look at ONE way to do this:

NORMAL METHOD OF ROTATING A VECTOR WITH THREE GIVEN ANGLES IN 3D:

Assume we want to rotate V=(x,y,z) around the z-axis with the angle a, around y with b and around x with c.

The first rotation we do is around the Z-axis:

  U=(x,y) (x,y from V-vector) =>
  => U'=rot(U,a)=rot((x,y),a)=(x',y')

Then we want to rotate around the Y-axis:

  W=(x',z) (x' is from U' and z is from V) =>
  => W'=rot(W,b)=rot((x',z),b)=(x'',z')

And finally around the X-axis:

  T=(y',z') (y' is from U' and z' is from W') =>
  => T'=rot(T,c)=rot((y',z'),c)=(y'',z'')

The rotated vector V' is the coordinate vector (x'',y'',z'') !

With this method we can extend our rot-command to:

  V''= rot(V,angle1,angle2,angle3)   where V is the original vector!
  ( V''= rot((x,y,z),angle1,angle2,angle3) )

I hope that didn't look too complicated. As I said, there are optimizations of this method. These optimizations can be skipping one of the rotations above, or some precalculations. ORDER is very important. You won't get the same answer if you rotate around X, Y, Z in another order with the same angles as before.
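To see the three steps and the importance of ORDER in practice, here is a small Python sketch. The function names are my own; rot is the two-dimensional formula from the text.

```python
import math

def rot(u, v, a):
    """Two-dimensional rotation formula from the text."""
    return (u * math.cos(a) + v * math.sin(a),
            u * math.sin(a) - v * math.cos(a))

def rot3_zyx(p, a, b, c):
    """Normal method: around Z with a, then Y with b, then X with c."""
    x, y, z = p
    x1, y1 = rot(x, y, a)      # around the Z-axis
    x2, z1 = rot(x1, z, b)     # around the Y-axis
    y2, z2 = rot(y1, z1, c)    # around the X-axis
    return (x2, y2, z2)

def rot3_xyz(p, a, b, c):
    """Same angles, but the axes are taken in the opposite order."""
    x, y, z = p
    y1, z1 = rot(y, z, c)      # around the X-axis
    x1, z2 = rot(x, z1, b)     # around the Y-axis
    x2, y2 = rot(x1, y1, a)    # around the Z-axis
    return (x2, y2, z2)
```

For the point (1,0,0) with a = c = 90 degrees and b = 0, the first order ends up at (0,0,1) and the second at (0,1,0), so the two orders really do give different answers.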
Optimizations
For xyz vectors we can write the equations to form the rotations:

let c1=cos(angle1), c2=cos(angle2), c3=cos(angle3),
    s1=sin(angle1), s2=sin(angle2), s3=sin(angle3)

(the 2D formula was: (x*cos(a)+y*sin(a),x*sin(a)-y*cos(a)) )

  x' = x*c1+y*s1
  y' = x*s1-y*c1
  x''= x'*c2+z*s2      <- Rotated x-coordinate
  z' = x'*s2-z*c2
  y''= y'*c3+z'*s3     <- Rotated y-coordinate
  z''= y'*s3-z'*c3     <- Rotated z-coordinate
which gives:

  x''= (x*c1+y*s1)*c2+z*s2
        ^^^^^^^^^^^=x'
     = c2*c1 *x + c2*s1 *y + s2 *z
       ^^^^^ xx   ^^^^^ xy   ^^ xz

  y''= (x*s1-y*c1)*c3+((x*c1+y*s1)*s2-z*c2)*s3
     = c3*s1 *x - c3*c1 *y + s3*s2*c1 *x + s3*s2*s1 *y - s3*c2 *z
     = (s3*s2*c1+c3*s1) *x + (s3*s2*s1-c3*c1) *y + (-s3*c2) *z
       ^^^^^^^^^^^^^^^^ yx   ^^^^^^^^^^^^^^^^ yy   ^^^^^^^^ yz

  z''= (x*s1-y*c1)*s3-((x*c1+y*s1)*s2-z*c2)*c3
     = s3*s1 *x - s3*c1 *y - c3*s2*c1 *x - c3*s2*s1 *y + c3*c2 *z
     = (-c3*s2*c1+s3*s1) *x + (-c3*s2*s1-s3*c1) *y + (c3*c2) *z
       ^^^^^^^^^^^^^^^^^ zx   ^^^^^^^^^^^^^^^^^ zy   ^^^^^^^ zz

Now, look at the pattern of the solutions: for x'' we have calculated something times the original (x,y,z), and the same for y'' and z''. What is the connection? Say that you rotate many given vectors with three angles that are the same for all vectors, then you get this scheme of multiplications. When you rotated as above you had to use twelve multiplications to do one rotation, but now we precalculate these 'constants' and manage to get down to nine multiplications!

FINAL FORMULA FOR ROTATIONS IN THREE DIMENSIONS WITH THREE ANGLES

(x,y,z is the original (x,y,z) coordinate. c1=cos(angle1), s1=sin(angle1), c2=cos(angle2) and so on...)

If you want to rotate a lot of coordinates with the same angles you first calculate these values:

  xx=c2*c1
  xy=c2*s1
  xz=s2
  yx=c3*s1+s3*s2*c1
  yy=-c3*c1+s3*s2*s1
  yz=-s3*c2
  zx=s3*s1-c3*s2*c1
  zy=-s3*c1-c3*s2*s1
  zz=c3*c2

Then, for each coordinate, you use the following multiplication to get the rotated coordinates:

  x''=xx * x + xy * y + xz * z
  y''=yx * x + yy * y + yz * z
  z''=zx * x + zy * y + zz * z
So, you only have to calculate the constants once for every new set of angles, and THEN you use nine multiplications for every point you wish to rotate to get the new set of points. Look at the end of this text for an example of how this can be implemented in 68000 assembler.

If you wish to skip one angle, you can optimize further. If you want to remove angle3, set c3=1 and s3=0, put that into your constant calculation, and it will be optimized for you.

What method you want to use depends of course on how much you want to code, but I prefer the optimized version since it's more to be proud of... If you only rotate a few points with the same angles, the first (non-optimized) version might be the choice.
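The nine constants can be checked against the step-by-step method with a short Python sketch. Function names are my own; the constants are the ones listed in the text.

```python
import math

def rotation_constants(angle1, angle2, angle3):
    """Precalculate the nine constants once per set of angles."""
    c1, s1 = math.cos(angle1), math.sin(angle1)
    c2, s2 = math.cos(angle2), math.sin(angle2)
    c3, s3 = math.cos(angle3), math.sin(angle3)
    return ((c2 * c1, c2 * s1, s2),
            (c3 * s1 + s3 * s2 * c1, -c3 * c1 + s3 * s2 * s1, -s3 * c2),
            (s3 * s1 - c3 * s2 * c1, -s3 * c1 - c3 * s2 * s1, c3 * c2))

def rotate_point(m, p):
    """Nine multiplications per point."""
    x, y, z = p
    return tuple(row[0] * x + row[1] * y + row[2] * z for row in m)

def rotate_stepwise(p, angle1, angle2, angle3):
    """The normal method (twelve multiplications), for comparison."""
    x, y, z = p
    c1, s1 = math.cos(angle1), math.sin(angle1)
    c2, s2 = math.cos(angle2), math.sin(angle2)
    c3, s3 = math.cos(angle3), math.sin(angle3)
    x1, y1 = x * c1 + y * s1, x * s1 - y * c1
    x2, z1 = x1 * c2 + z * s2, x1 * s2 - z * c2
    y2, z2 = y1 * c3 + z1 * s3, y1 * s3 - z1 * c3
    return (x2, y2, z2)
```

A typical frame would call rotation_constants once and rotate_point once per point of the object.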
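If you want to, you can check that the transformation matrix has a determinant of absolute value 1 (with the formulas above it comes out as -1, because each two-dimensional step also mirrors one coordinate).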
5. Polygons!
The word "polygon" means many corners, which means that it has a lot of points (corners) with lines drawn between them. If you have, for example, 5 points, you can draw lines from point 1 to point 2, from point 2 to point 3, from point 3 to point 4 and from point 4 to point 5. If you want a CLOSED polygon you also draw a line from point 5 to point 1.

  Points: [diagram: five points numbered 1 to 5]
  Open polygon of points above: [diagram: lines 1-2, 2-3, 3-4 and 4-5]
  Closed polygon of points above: [diagram: as the open one, plus the line 5-1]

"Filled vectors" are created by drawing polygons and filling the inside. Normally the following algorithm is used: first you define all "corners" of the polygon as vectors, which allows you to rotate it and draw it at new angles, and then you draw one line from point 1 to point 2, and so on. The last line is from point 5 to point 1.

When you're finished you use a BLITTER-FILL operation to fill the area. You will need a special line drawing routine for drawing these lines so the BLITTER-FILL works. I have an example of a working line-drawing routine in the appendices (K-seka! Just for CJ!). Further theory about what demands there are on the line drawing routine will be discussed later (App. B 2). There are also other ways to get a filled area (mostly for computers without a blitter, or for special purposes on those that have one). Information about that will be in later issues.
An object is in this text a three-dimensional thing created with polygons. We don't have to think about what's inside, we just surround a mathematically defined sub-room with polygons. But what happens to surfaces that are on the other side of the object? And if there are hidden "parts" of the object, what can we do about them?

We begin with a cube. It is easy to imagine, and so is the rotation of it. We can see that no part of the cube is over another part of the cube in the viewer's eyes (compare, for example, with a torus, where there are sometimes parts that hide other parts of the same object). Some areas are of course AIMING AWAY FROM THE VIEWER, but we can calculate in what direction the polygon is facing (to or from the viewer).

Always define the polygons in objects in the same direction (clockwise or non-clockwise) in all of the object. Imagine that you stand on the OUTSIDE MIDDLE of the plane, and pick all points in a clockwise order. Which one you start with has nothing to do with it, just the order of them.

Pick three points from a plane (point 1, point 2 and point 3). If no two of the three points are equal, and they do not all lie on one line, these points define a plane. You will then only need three points to define the direction of the plane. Examine the following calculation:

  c=(x3-x1)*(y2-y1)-(x2-x1)*(y3-y1)

(This is AFTER 3d->2d projection, so there's no z-coordinate. If you want to know what this does, look in appendix B.)

This needs three points, which is the minimum number of coordinates a polygon must have to not be a line or a point (THINK). This involves two multiplications per plane, but that isn't very much compared to rotation and 3d->2d projection. But let us study what this equation gives:

If c is negative, the normal vector of the plane which the three points span is heading INTO the viewer ( = The plane is fronting the viewer => plane should be drawn )...

If c is POSITIVE, the normal vector of the plane is heading AWAY from the viewer ( = The plane cannot be seen by the viewer => DON'T draw the plane) ...
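The test is short enough to try in a few lines of Python. The function names are my own; the formula and the sign rule are the ones given above.

```python
def facing_value(p1, p2, p3):
    """c from the text, computed on projected (2D) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return (x3 - x1) * (y2 - y1) - (x2 - x1) * (y3 - y1)

def should_draw(p1, p2, p3):
    """Negative c: the plane is fronting the viewer, so draw it."""
    return facing_value(p1, p2, p3) < 0
```

Taking the same three points in the opposite order flips the sign of c, which is why every plane of the object must be defined in the same direction.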
But to question 2, what happens if parts of the object cover OTHER parts of the object...
Object optimization
Assume that you have a CONVEX object. If it is closed, you have almost as few points as you have planes. If you keep one list of every coordinate that exists (no point appears twice in this list), and for each polygon a list that shows which points to fetch from it, you can cut down widely on the number of ROTATIONS. For example:

  /* A cube */
  /* order is important! Here is clockwise */

  end_of_plane=0

  pointlist
          dc.l    pt4,pt3,pt2,pt1,end_of_plane
          dc.l    pt5,pt6,pt2,pt1,end_of_plane
          dc.l    pt6,pt7,pt3,pt2,end_of_plane
          dc.l    pt7,pt8,pt4,pt3,end_of_plane
          dc.l    pt8,pt5,pt1,pt4,end_of_plane
          dc.l    pt5,pt6,pt7,pt8,end_of_plane

  pt1     dc.w    -1,-1,-1
  pt2     dc.w    1,-1,-1
  pt3     dc.w    1,-1,1
  pt4     dc.w    -1,-1,1
  pt5     dc.w    -1,1,-1
  pt6     dc.w    1,1,-1
  pt7     dc.w    1,1,1
  pt8     dc.w    -1,1,1

Now, you only have to rotate the points pt1-pt8, which is eight points. If you had computed four points for each plane, you would have to compute 24 rotations instead.
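The saving can be counted with a small Python sketch of the same cube. The index lists correspond to the pointlist above (pt1 is index 0); the function double is just a stand-in for the real rotation and is my own invention.

```python
# Eight shared points and six planes that refer to them by index.
POINTS = [(-1, -1, -1), (1, -1, -1), (1, -1, 1), (-1, -1, 1),
          (-1, 1, -1), (1, 1, -1), (1, 1, 1), (-1, 1, 1)]
PLANES = [(3, 2, 1, 0), (4, 5, 1, 0), (5, 6, 2, 1),
          (6, 7, 3, 2), (7, 4, 0, 3), (4, 5, 6, 7)]

def transform_object(points, planes, transform):
    """Transform every shared point once, then build the planes by lookup."""
    moved = [transform(p) for p in points]
    return [[moved[i] for i in plane] for plane in planes]

calls = 0

def double(p):
    """Stand-in for the rotation: doubles the point and counts the calls."""
    global calls
    calls += 1
    return tuple(2 * c for c in p)

cube = transform_object(POINTS, PLANES, double)
```

After the call, the six planes hold 24 corner coordinates in total, but the transform has only run eight times.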
To get the normal of these we take the cross product of them:

              |  i      j      k   |
  N = V1xV2 = |x2-x1  y2-y1  z2-z1 | =
              |x3-x1  y3-y1  z3-z1 |

    = ( (y2-y1)*(z3-z1)-(y3-y1)*(z2-z1),       <- n1
       -((x2-x1)*(z3-z1)-(x3-x1)*(z2-z1)),     <- n2
        (x2-x1)*(y3-y1)-(x3-x1)*(y2-y1) )      <- n3

Now, we have N. We also have the LIGHTSOURCE coordinates (given). To get COS of the ANGLE between two vectors we can use the scalar product between N and L (=lightsource vector) divided by the length of N and L:

* <N,L>/(||N||*||L||) =
*
* (n1*l1+n2*l2+n3*l3)/(sqrt(n1*n1+n2*n2+n3*n3)*sqrt(l1*l1+l2*l2+l3*l3))

(can be (n1*l1+n2*l2+n3*l3)/k if k is a precalculated constant)

This number is between -1 and 1 and is cos of the angle between the vectors L and N. The SQUARE ROOTS take much time, but if you keep the object intact (do only rotations/translations etc.) and always pick the same points in the object, then ||N|| is intact and can be precalculated. If you make sure the length of L is always 1, you won't have to divide by this, which saves many cycles.

The number will, as said, be between -1 and 1. You may have to multiply the number with something before dividing so that you have a larger range to pick colours from. If the number is negative, set it to zero.

The number can be NEGATIVE when it should be POSITIVE. This is because you took the points in the wrong order, and you only have to negate the result instead.

If you didn't understand a thing of this, look at the formulas with a '*' in the border. n1 means the x-coordinate of N, n2 the y-coordinate and so on, and the same thing with L.
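In floating point the whole calculation looks roughly like the following Python sketch. The function names and the 16-colour range are my own assumptions for illustration; a real routine would use the precalculated lengths described above.

```python
import math

def plane_normal(p1, p2, p3):
    """N = V1 x V2 with V1 = p2-p1 and V2 = p3-p1."""
    v1 = [b - a for a, b in zip(p1, p2)]
    v2 = [b - a for a, b in zip(p1, p3)]
    return (v1[1] * v2[2] - v2[1] * v1[2],
            -(v1[0] * v2[2] - v2[0] * v1[2]),
            v1[0] * v2[1] - v2[0] * v1[1])

def light_level(n, l, colours=16):
    """cos of the angle between N and L, clamped at zero, plus a colour index."""
    dot = sum(a * b for a, b in zip(n, l))
    length = math.sqrt(sum(a * a for a in n)) * math.sqrt(sum(b * b for b in l))
    cos_angle = max(0.0, dot / length)
    return cos_angle, int(cos_angle * (colours - 1))
```

A plane lying in the xy-plane gets full brightness from a light along its normal, about 0.71 at 45 degrees, and zero from behind.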
The heaviest weights must fall to the bottom, bringing the VALUES with them. The values in this case can be the 3d->2d projected x and y coordinates plus bob information. The weights can be the Z coordinates before projection.

Begin with the first two elements and check which element is the HEAVIEST. If it is ABOVE the lighter element, move the WEIGHT and all information connected with it to the place where the light element was, and put the light data where the heavy was. (This operation is called a 'swap' operation.) Step down ONE element and check elements 2 and 3... step further until you're at the bottom of the list.

The first round won't fix the sorting; you will have to go round the list THE SAME NUMBER OF TIMES AS YOU HAVE OBJECTS minus one!!!! The many comparisons call for a faster technique...

Algorithm:

  1> FOR outer loop=1 TO Items-1
  1.1> FOR inner loop=1 TO Items-1
  1.1.1> IF Item(inner loop)>Item(inner loop+1)
  1.1.1.1> Swap Item(inner loop),Item(inner loop+1)

(Items is the number of entries to sort, Item() is the weight of the current item)
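The algorithm translates almost directly to Python. In this sketch each item is a (weight, value) pair of my own choosing, so the values travel with their weights when two items are swapped.

```python
def bubble_sort(items):
    """Sort (weight, value) pairs in place, heaviest last."""
    count = len(items)
    for _ in range(count - 1):              # Items-1 rounds
        for i in range(count - 1):          # compare neighbours
            if items[i][0] > items[i + 1][0]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items
```

Note that the inner loop always does Items-1 comparisons, so four entries already cost nine comparisons.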
  4.1> ADD checklist(L),checklist(L+1)
  5> FOR L=0 TO number of objects
  5.1> PUT ENTRY at 2ndBUF(checklist(transformed weight))
  5.2> ADD ENTRYSIZE TO checklist(transformed weight)

Now, your data is nicely sorted in the list 2ndBUF, and the original list is left as it was (except for the Z-transformation). (ENTRYSIZE is the size of the entry, so if you have x,y,z coordinates in words, your size is 3 words=6 bytes.)

Also try to think a little about what you get when you transform. The subtraction is useful since it minimizes the loops, but lsr-ing the weights takes time and makes the result worse. Of course you don't have to scan the list every time, just make sure that you know what the lowest possible and the highest possible weight is.
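The idea behind this sort (count how many entries have each weight, turn the counts into positions, then drop every entry straight into its place) can be sketched in Python like this. It is not a line-by-line translation of the steps above; the names are my own and positions are counted in entries instead of bytes.

```python
def distribution_sort(entries, lowest, highest):
    """Sort (weight, value) pairs into a second buffer without comparing them."""
    checklist = [0] * (highest - lowest + 1)
    for weight, _ in entries:                # count each transformed weight
        checklist[weight - lowest] += 1
    position = 0
    for k in range(len(checklist)):          # counts -> start positions
        count = checklist[k]
        checklist[k] = position
        position += count
    second_buf = [None] * len(entries)
    for entry in entries:                    # put each entry in its place
        k = entry[0] - lowest
        second_buf[checklist[k]] = entry
        checklist[k] += 1
    return second_buf
```

The work grows with the number of entries plus the size of the weight range, which is why it pays to know the lowest and highest possible weight in advance.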
Quicksort is recursive, which means that you will have to call the routine from within itself. This is not at all complicated, you just have to put some of your old variables on the stack for safe-keeping. What it does is this:

  +> The first entry in the list is the PIVOT ENTRY.
  |  For each other ENTRY, we put it either BEFORE or AFTER
  |  the PIVOT. If it is lighter than the PIVOT we put it BEFORE,
  |  otherwise we put it AFTER.
  |  Now we have two new lists, all entries BEFORE the PIVOT,
  |  and all entries AFTER the PIVOT (but not the pivot itself,
  |  which is already sorted).
  |  Now we quicksort all entries BEFORE the pivot separately
  +< and then we quicksort all entries AFTER the pivot.

(We do this by calling the routine we're already in.) This may cause problems with the stack if there are too many things to sort. The recursion loop is broken when there's <=1 entry to sort. Contrary to some people's belief, you don't need any extra lists to solve this.

Algorithm:

  Inparameters: (PivotEntry=first element of list
                 List size=size of current list)

  1> If list size <= 1 then exit
  2> PivotWeight=Weight(PivotEntry)
  3> for l=2nd Entry to list size-1
  3.1> if weight(l) > PivotWeight
  3.1.1> insert entry in list 1
  3.2> ELSE
  3.2.1> insert entry in list 2
  4> Sort list 1 (bsr quicksort(first entry list 1, size of list 1))
  5> Sort list 2 (bsr quicksort(first entry list 2, size of list 2))
  6> Link list 1 -> PivotEntry -> list 2

(PivotEntry = FirstEntry. It doesn't have to be like this, but I prefer it since I find it easier.)
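Here is the numbered algorithm as a Python sketch. For readability it builds two new lists at every level, which the text points out is not needed in a real implementation; the (weight, value) pairs are my own choice. As in the numbered steps, heavier entries go into list 1, so the result has the heaviest entry first.

```python
def quicksort(entries):
    """Return (weight, value) pairs sorted with the heaviest first."""
    if len(entries) <= 1:                    # 1> nothing to sort
        return list(entries)
    pivot = entries[0]                       # 2> first entry is the pivot
    list1 = [e for e in entries[1:] if e[0] > pivot[0]]    # 3.1>
    list2 = [e for e in entries[1:] if e[0] <= pivot[0]]   # 3.2>
    # 4>, 5> sort both lists, 6> link list 1 -> pivot -> list 2
    return quicksort(list1) + [pivot] + quicksort(list2)
```

Each call only handles the entries on one side of its pivot, so the recursion stops as soon as a list has one entry or none.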
and when you've checked that the blitter is finished, you start bobbing out all images, and when the frame is displayed, you swap screens so you display your finished screen the next frame.
* * * * * *
For this routine, you must have a sine table of 1024 values, three words with angles and a place (9 words) to store the transformation matrix.

   __ .
  /( |( )|/ '(|)
  / )|(||/ |)

Calculate_Constants
        lea     Coses_Sines(pc),a0
        lea     Angles(pc),a2
        lea     Sintab(pc),a1
        move.w  (a2),d0
        and.w   #$7fe,d0
        move.w  (a1,d0.w),(a0)          ;s1
        add.w   #$200,d0
        and.w   #$7fe,d0
        move.w  (a1,d0.w),2(a0)         ;c1
        move.w  2(a2),d0
        and.w   #$7fe,d0
        move.w  (a1,d0.w),4(a0)         ;s2
        add.w   #$200,d0
        and.w   #$7fe,d0
        move.w  (a1,d0.w),6(a0)         ;c2
        move.w  4(a2),d0
        and.w   #$7fe,d0
        move.w  (a1,d0.w),8(a0)         ;s3
        add.w   #$200,d0
        and.w   #$7fe,d0
        move.w  (a1,d0.w),10(a0)        ;c3

;xx=c2*c1
;xy=c2*s1
;xz=s2
;yx=c3*s1+s3*s2*c1
;yy=-c3*c1+s3*s2*s1
;yz=-s3*c2
;zx=s3*s1-c3*s2*c1;s2*c1+c3*s1
;zy=-s3*c1-c3*s2*s1;c3*c1-s2*s1
;zz=c3*c2

        lea     Constants(pc),a1
        move.w  6(a0),d0
        move.w  (a0),d1
        move.w  d1,d2
        muls    d0,d1
        asr.l   #8,d1
        move.w  2(a0),d3
        muls    d3,d0
        asr.l   #8,d0
        move.w  d0,(a1)                 ;xx
        ;neg.w  d1
        move.w  d1,2(a1)                ;xy
        move.w  4(a0),4(a1)             ;xz
        move.w  8(a0),d4
        move.w  d4,d6
        muls    4(a0),d4
        asr.l   #8,d4
        move.w  d4,d5
        muls    d2,d5
        muls    10(a0),d2
        muls    d3,d4
        muls    10(a0),d3
        add.l   d4,d2
        sub.l   d5,d3
        asr.l   #8,d2
        asr.l   #8,d3
        move.w  d2,6(a1)                ;yx
        neg.w   d3
        move.w  d3,8(a1)                ;yy
        muls    6(a0),d6
        asr.l   #8,d6
        neg.w   d6
        move.w  d6,10(a1)               ;yz
        move.w  10(a0),d0
        move.w  d0,d4
        muls    4(a0),d0
        asr.l   #8,d0
        move.w  d0,d1
        move.w  8(a0),d2
        move.w  d2,d3
        muls    (a0),d0
        muls    2(a0),d1
        muls    (a0),d2
        muls    2(a0),d3
        sub.l   d1,d2
        asr.l   #8,d2
        move.w  d2,12(a1)               ;zx
        add.l   d0,d3
        asr.l   #8,d3
        neg.w   d3
        move.w  d3,14(a1)               ;zy
        muls    6(a0),d4
        asr.l   #8,d4
        move.w  d4,16(a1)               ;zz
        rts

Coses_Sines     dc.w    0,0,0,0,0,0             ;s1,c1,s2,c2,s3,c3
Angles          dc.w    0,0,0
Constants       dc.w    0,0,0,0,0,0,0,0,0

;Sintab is a table of 1024 sine values with a radius of 256
;that I have further down my code...
Screen_Widht=40         ;40 bytes wide screen...

fill_lines:             ;(a6=$dff000, a0=start of bitplane to draw in)
        cmp.w   d1,d3
        beq.s   noline
        ble.s   lin1
        exg     d1,d3
        exg     d0,d2
lin1:   sub.w   d2,d0
        move.w  d2,d5
        asr.w   #3,d2
        ext.l   d2
        sub.w   d3,d1
        muls    #Screen_Widht,d3
        add.l   d2,d3
        add.l   d3,a0
        and.w   #$f,d5
        move.w  d5,d2
        eor.b   #$f,d5
        ror.w   #4,d2
        or.w    #$0b4a,d2
        swap    d2
        tst.w   d0
        bmi.s   lin2
        cmp.w   d0,d1
        ble.s   lin3
        move.w  #$41,d2
        exg     d1,d0
        bra.s   lin6
lin3:   move.w  #$51,d2
        bra.s   lin6
lin2:   neg.w   d0
        cmp.w   d0,d1
        ble.s   lin4
        move.w  #$49,d2
        exg     d1,d0
        bra.s   lin6
lin4:   move.w  #$55,d2
lin6:   asl.w   #1,d1
        move.w  d1,d4
        move.w  d1,d3
        sub.w   d0,d3
        ble.s   lin5
        and.w   #$ffbf,d2
lin5:   move.w  d3,d1
        sub.w   d0,d3
        or.w    #2,d2
        lsl.w   #6,d0
        add.w   #$42,d0
bltwt:  btst    #6,2(a6)
        bne.s   bltwt
        bchg    d5,(a0)
        move.l  d2,$40(a6)
        move.l  #-1,$44(a6)
        move.l  a0,$48(a6)
        move.w  d1,$52(a6)
        move.l  a0,$54(a6)
        move.w  #Screen_Widht,$60(a6)   ;width
        move.w  d4,$62(a6)
        move.w  d3,$64(a6)
        move.w  #Screen_Widht,$66(a6)   ;width
        move.l  #-$8000,$72(a6)
        move.w  d0,$58(a6)
noline: rts
WghtOffs=6
NextOffs=0

QuickSort       ;(a5=start of sortlist,
                ; d0=0 (pointer to first entry, first time=0)
                ; d1=number of entries)
cmp.w ble.s moveq moveq move.w move.w move.w subq.w .Permute cmp.w
;don't sort if <=1 entries ;size list 1 ;size list 2 ;first Nentry=d0 ;d2=Pivot weight ;d3=2nd entry ;Dbf-loop+skip first ;entry weight
* __ .
* /( |( )|/ '(|)
* / )|(||/ |)

WghtOffs=4
EntrySize=6

InsertSort      ;(a5=start of data
                ; a4=start of checklist
                ; a3=start of 2ndBUF
                ; d0 is lowest value of entries
                ; d1 is highest value
                ; d2 is number of entries
        movem.l a4/a5,-(a7)
        sub.w   d0,d1                   ;max size of checklist this sort.
        subq.w  #1,d2                   ;Dbf-loops...
        subq.w  #1,d1
        move.w  d1,d3                   ;clear used entries
.ClearChecklist
        clr.w   (a4)+
        dbf     d3,.ClearChecklist
        move.w  d2,d3                   ;transform...
.Transform
        sub.w   d0,WghtOffs(a5)
        addq.w  #EntrySize,a5
        dbf     d3,.Transform
        movem.l (a7),a4/a5
        move.w  d2,d3                   ;Insert next line instead for
.AddisList
        move.w  WghtOffs(a5),d0         ;68000 compatibility...
        addq.w  #4,(a5,d0.w*2)          ;add.w d0,d0
                                        ;addq.w #4,(a5,d0.w)
        addq.w  #EntrySize,a5
        dbf     d3,.AddisList
        moveq   #-4,d0                  ; #-lwdsize
.GetMemPos
        add.w   d0,(a4)
        move.w  (a4)+,d0
        dbf     d1,.GetMemPos
        movem.l (a7)+,a4/a5
.PutNewList
        move.w  WghtOffs(a5),d0
        move.w  (a4,d0.w),d0
        move.l  a5,(a3,d0.w)
        addq.w  #EntrySize,a5
        dbf     d2,.PutNewList
        ;In this case you have a list of ADDRESSES to
        ;each object. I made it this way to
        ;make it more flexible (you maybe have more
        ;data in each entry than me?).
        rts
as a calculation of the normal vector of the plane that the polygon in question spanned. We had three points:
p1(x1,y1) p2(x2,y2) p3(x3,y3)
If we select p1 as base-point, we can construct the following vectors of the rest of the points:
V1=(x3-x1,y3-y1,p) V2=(x2-x1,y2-y1,q)
Where p and q in the z value denote that we are not interested in this value, but we must take it in our calculations anyway. (These values are NOT the same as the original z-values after the 3d->2d projection.) Now, we can get the normal vector of the plane that these vectors span by a simple cross-product:

            |   i       j     k |
  V1 x V2 = |(x3-x1) (y3-y1)  p |
            |(x2-x1) (y2-y1)  q |
But we are only interested in the Z-direction of the result-vector of this operation, which is the same as getting only the Z-coordinate out of the cross-product:
Z of (V1xV2) = (x3-x1)*(y2-y1)-(x2-x1)*(y3-y1)
Now if Z is positive, this means that the resultant vector is aiming INTO the screen (positive z-values). QED

/Asterix

B 2. How to make a fill line out of the blitters line-drawing
You can't use the blitter line-drawing as it is and draw lines around a polygon without a few special changes.
- First, make sure it draws lines as it should; many line-drawers I've seen draw lines to the wrong points.
- Make sure you use the Exclusive-or minterm instead of the or minterm.
- Always draw lines DOWNWARDS (or UPWARDS, if you prefer that).
- Before drawing the line and before the blit-check, eor the FIRST POINT ON THE SCREEN THAT THE LINE WILL PASS.
- Use fill-type line mode.
These are the rotation matrices around the x, y and z axes. If you would use these you'll get 12 muls, four for each axis. But if you multiply these three matrices with each other you'll get only 9 muls. Why 9 ??? Simple: after multiplying you'll get a 3x3 matrix, and 3*3=9 ! It doesn't matter if you do not know how to multiply these matrices. It's not important here so I'll just give the 3x3 matrix after multiplying:

(c = cos, s = sin, A/B/G are Alpha, Beta and Gamma.)

  | cA*cB             -cB*sA              sB    |
  | cG*sA-sB*cA*sG     cA*cG+sG*sA*sB     cB*sG |
  |-sG*sA-sB*cA*cG    -cA*sG+sA*sB*cG     cG*cB |

I hope I typed everything without errors :) Ok, how can we make some coordinates using this matrix? Again, the trick is all in multiplying. To get the new (x,y,z) we need the original points and multiply these with the matrix. I'll work with a simplified matrix (e.g. H = cA*cB etc...).

      x   y   z      ( <= original coordinates)
    -------------
    | H   I   J |
    | K   L   M |
    | N   O   P |

  New X = x * H + y * I + z * J
  New Y = x * K + y * L + z * M
  New Z = x * N + y * O + z * P

Ha! That's a lot more than 9 muls. Well, actually not. To use the matrix you'll have to precalculate it. Always rotate your original points and store the results somewhere else.

Just change the angles to the sintable to rotate the shape. If you rotate the points rotated the previous frame you will lose all detail until nothing is left. So, every frame looks like this:

- Precalculate the new matrix with the given angles.
- Calculate the points with the stored matrix.

The resulting points are relative to (0,0), so they can be negative too. Just use an add to get them in the middle of the screen.

NOTE: Always use muls, divs, asl, asr etc. Data can be both positive and negative. Also, set the original coordinates as big as possible, and after rotating divide them again. This will improve the quality of the movement. (Michael Vissers)
When doing a muls with a value and then downshifting the value, use an 'addx' to get roundoff error instead of truncation error, for example:

        moveq   #0,d7
DoMtxMul
        :
        muls    (a0),d0         ;Do a muls with a sin value *256
        asr.l   #8,d0
        addx.w  d7,d0           ;roundoff < trunc
        :

When you do an 'asr' the last outshifted bit goes to the x-flag. If you use an addx with source=0 => dest=dest+'x-flag'. This halves the error, and makes complicated vector objects less 'hacky'.
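The effect of the extra addx is easy to check in Python, where >> on integers is also an arithmetic shift. In this sketch (names are my own) the last bit shifted out by the 8-bit shift is bit 7 of the product, which is what ends up in the x-flag.

```python
def mul_trunc(value, sine256):
    """muls + asr.l #8: the result is always rounded downwards."""
    return (value * sine256) >> 8

def mul_round(value, sine256):
    """muls + asr.l #8 + addx: add the last outshifted bit."""
    product = value * sine256
    x_flag = (product >> 7) & 1
    return (product >> 8) + x_flag
```

With truncation the result can be up to 255/256 too low; with the addx it is never more than half a step away from the exact value.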
CALL:   MACRO
        jsr     _LVO\1(a6)
        ENDM

SystemOff:
        move.l  GfxBase,a6
        sub.l   a1,a1
        CALL    LoadView
        CALL    WaitTOF
        CALL    WaitTOF
        CALL    OwnBlitter              ;Claim ownership of blitter
        move.l  $4.w,a6
        CALL    Forbid                  ;Forbid multitasking
        moveq.l #INTB_VERTB,d0          ; INTB_COPER for copper interrupt
        lea     VBlankServer(pc),a1
        CALL    AddIntServer            ;Add my interrupt to system list
        rts

SystemOn:
        move.l  $4.w,a6
        moveq.l #INTB_VERTB,d0          ;Change for copper interrupt.
        lea     VBlankServer(pc),a1
        CALL    RemIntServer            ;Remove my interrupt
        CALL    Permit                  ;Permit multitasking
        move.l  GfxBase,a6
        CALL    DisownBlitter           ;Give blitter back
        move.l  ...,a1
        CALL    LoadView                ;Load original view
        rts

IntLevel3:
        movem.l d2-d7/a2-a4,-(sp)       ; all other registers can be trashed
        ...
        movem.l (sp)+,d2-d7/a2-a4

        ; If you set your interrupt to priority 10 or higher then
        ; a0 must point at $dff000 on exit.

        moveq   #0,d0                   ; must set Z flag on exit!
        rts                             ;Not rte!!!
exec.library/AddIntServer()
NAME
    AddIntServer -- add an interrupt server to a system server chain

SYNOPSIS
    AddIntServer(intNum, interrupt)
    -168         D0-0:4  A1

    void AddIntServer(ULONG, struct Interrupt *);

FUNCTION
    This function adds a new interrupt server to a given server chain. The node is located on the chain in a priority dependent position. If this is the first server on a particular chain, interrupts will be enabled for that chain.

    Each link in the chain will be called in priority order until the chain ends or one of the servers returns with the 68000's Z condition code clear (indicating non-zero). Servers on the chain should return with the Z flag clear if the interrupt was specifically for that server, and no one else. VERTB servers should always return Z set. (Take care with High Level Language servers, the language may not have a mechanism for reliably setting the Z flag on exit).

    Servers are called with the following register conventions:

        D0 - scratch
        D1 - scratch

        A0 - scratch
        A1 - server is_Data pointer (scratch)

        A5 - jump vector register (scratch)
        A6 - scratch

        all other registers must be preserved

INPUTS
    intNum - the Paula interrupt bit number (0 through 14). Processor level seven interrupts (NMI) are encoded as intNum 15. The PORTS, COPER, VERTB, EXTER and NMI interrupts are set up as server chains.

    interrupt - pointer to an Interrupt structure. By convention, the LN_NAME of the interrupt structure must point to a descriptive string so that other users may identify who currently has control of the interrupt.
WARNING Some compilers or assemblers may optimize code in unexpected ways, affecting the conditions codes returned from the function. Watch out for a "MOVEM" instruction (which does not affect the condition codes) turning into "MOVE" (which does). BUGS The graphics library's VBLANK server, and some user code, currently assume that address register A0 will contain a pointer to the custom chips. If you add a server at a priority of 10 or greater, you must compensate for this by providing the expected value ($DFF000).
exec.library/RemIntServer()
NAME RemIntServer -- remove an interrupt server from a server chain SYNOPSIS RemIntServer(intNum, interrupt) -174 D0 A1 void RemIntServer(ULONG,struct Interrupt *); FUNCTION This function removes an interrupt server node from the given server chain. If this server was the last one on this chain, interrupts for this chain are disabled. INPUTS intNum - the Paula interrupt bit (0..14) interrupt - pointer to an interrupt server node BUGS Before V36 Kickstart, the feature that disables the interrupt would not function. For most server chains this does not cause a problem.
; remember SSP ; swap USP and SSP ; push return address on stack
That last was needed because it was a subroutine that RTSes (boy did I have problems working out my crashes before I fixed that). Then I have my exit code:

ReturnWithOS
        tst.l   crashstack
        beq     .nocrash
        move.l  crashstack,sp
        clr.l   crashstack
        RTE
.nocrash
my exit code goes on after this. This made it possible to escape from an interrupt without having to care for what the exception frames look like. (CJ) I haven't tried this because my code never crashes. ;-)
Monitors
If you are using AA-chipmodes, or want to make your code compatible with them, you must also make sure your code works with every MONITOR on the market, not only with every computer (thanks to Alien/A poor group in Ankara but not Bronx for spotting this in my JoyRide2 Intro). This is *extremely* difficult. See Monitor Problems in the AGA chapter.
Keyboard Timings
If you have to read the keyboard by hardware, be very careful with your timings. Not only do different processor speeds affect the keyboard timings (for example, in the game F-15 II Strike Eagle on an Amiga 3000 the key repeat delay is ridiculously short, you ttyyppee lliikkee tthhiiss aallll tthhee ttiimmee. You use up an awful lot of Sidewinders very quickly!), but there are differences between different makes of keyboard, some Amiga 2000's came with Cherry keyboards, these have small function keys the same size as normal alphanumeric keys - these keyboards have different timings to the normal Mitsumi keyboards. Use an input handler to read the keyboard. The Commodore guys have spent ages writing code to handle all the different possible hardware combinations around, why waste time reinventing the wheel?
Hardware Differences
The A1200 has a different keyboard to older Amigas. One of the side effects of this is that older hardware-hitting keyboard read routines appear unable to register more than one keypress at a time. I currently do not know whether this is a limitation in hardware and if it is possible to read multi-key presses (excepting the obvious CTRL/ALT/SHIFT type combinations) at all... A bit annoying for games writers I would think.

If you are using your own hardware code and interrupts to read the keyboard on faster computers, make sure you ALWAYS have the given time-delay for ALL keyboards you want your program to work with (the delay between or.b #$40,$bfee01 and and.b #$bf,$bfee01)! Don't trust delay loops since the cache can speed those up rather drastically! I have seen too many programs, even commercial ones, that just ignore this and have NO delay code or just a simple dbf-loop. After about 15 keypresses your keyboard is dead and there is no code to reset it. If you can, skip having your own keyboard routines, since they mostly fail anyway.
Important differences:

  Size:         Kickstart 1.2 was 256Kb. Kickstart 3.0 is 512Kb

  Offsets:      *EVERYTHING* changes. Do not make any assumptions about any data in rom, for example reset locations, topaz.font data position.

  Libraries:    Many disk-based libraries under 1.3 are now in ROM, along with disk-validator and other things....

  Workbench:    Workbench is much improved. Use it.

  OS Functions: *Many* new OS functions in all libraries. Now much easier to use, and faster. Much faster than under 1.2/1.3
d0 now contains version number. Compare with the following (all values in decimal)
  V0 to V32 - Obsolete! No longer supported.
  V33       - Kickstart 1.2
  V34       - Kickstart 1.3 (1.2 with autoboot for HD)
  V35       - Early beta-kickstart 1.4. Obsolete
  V36       - Obsolete! Early V2.00-V2.03 supplied with Amiga 3000.
              Amiga 3000 owners should upgrade to at least V37
  V37       - Kickstart 2.04. Final release version of Kickstart 2
  (V38)     - Workbench 2.1 (exec.library should not show this version.
              All true V38 libraries are disk based)
  V39       - Kickstart 3.0
  V40       - Kickstart 3.1 - Currently only in AmigaCD 32
;which Version of Exec ? ;old one -> goto old_kick ;else use Exec-Function
.old_kick:
        lea     .Reset_Code(pc),a5
        jsr     _LVOSupervisor(a6)      ;get Supervisor-status
        ;never reaching this point

        cnop    0,4
.Reset_Code:
        lea     ROMEND,a0
        sub.l   SIZE_OFFSET(a0),a0
        move.l  4(a0),a0
        subq.l  #2,a0
        reset
        jmp     (a0)                    ; and in the same
        END
Trackloaders
Use CIA timers! DON'T use processor timing. If you use processor timing you will MESS UP the diskdrives in accelerated Amigas. Use AddICRVector to allocate your timers, don't hit $bfxxxx addresses!!! On second thoughts. DON'T use trackloaders! Use Dos...
68000 optimisation
Written by Irmen de Jong, March '93. Some notes added by CJ.

NOTE! Not all these optimisations can be automatically applied. Make sure they will not affect other areas in your code!
----------------------------------------------------------------------------
Original          Possible optimisation   Examples/notes
----------------------------------------------------------------------------
STANDARD WELL-KNOWN OPTIMISATIONS
RULE: use Quick-type/Short branch! Use INLINE subroutines if they are small!
----------------------------------------------------------------------------
BRA/BSR xx        BRA.s/BSR.s xx          if xx is close to PC
MOVE.X #0         CLR.X/MOVEQ/SUBA.X      move.l #0,count -> clr.l count
                                          move.l #0,d0 -> moveq #0,d0
                                          move.l #0,a0 -> sub.l a0,a0
CLR.L Dx          MOVEQ #0,Dx
CMP #0            TST
MOVE.L #nn,dx     MOVEQ #nn,dx            possible if -128<=nn<=127
ADD.X #nn         ADDQ.X #nn              possible if 1<=nn<=8
SUB.X #nn         SUBQ.X #nn              same...
JMP/JSR xx        BRA/BSR xx              possible if xx is close to PC
JSR xx;RTS        JMP xx                  save a RTS
BSR xx;RTS        BRA xx                  same... (assuming routine doesn't
                                          rely on anything in the stack)
LSL/ASL #1/2,xx   ADD xx,xx [ADD xx,xx]   lsl #2,d0 -> 2 times add d0,d0
MULU #yy,xx       LSL/ASL #1-8,xx         mulu #2,d0 -> asl #1,d0 -> add d0,d0
 where yy is a power of 2, 2..256         BEWARE: STATUS FLAGS ARE "WRONG"
DIVU #yy,xx       LSR/ASR #.. SWAP        divu #16,d0 -> lsr #4,d0
 where yy is a power of 2, 2..256         BEWARE: STATUS FLAGS ARE "WRONG",
                                          AND HIGHWORD IS NOT THE REMAINDER.
ADDRESS-RELATED OPTIMISATIONS
RULE: use short addressing/quick adds!
----------------------------------------------------------------------------
MOVEA.L #nn       MOVEA.W #nn             Movea is "sign-extending" thus
                                          possible if 0<=nn<=$7fff
ADDA.X #nn        LEA nn()                adda.l #800,a0 -> lea 800(a0),a0
                                          possible if -$8000<=nn<=$7fff
LEA nn()          ADDQ.W #nn              lea 6(a0),a0 -> addq.w #6,a0
                                          possible if 1<=nn<=8
$0000nnnn.l       $nnnn.w                 move.l 4,a6 -> move.l 4.w,a6
                                          possible if 0<=nnnn<=$7fff
                                          (nnnn is SIGN EXTENDED to LONG!)
                  LEA xx,Ay               try xx(PC) with the LEA
                  LEA nnnn(Ax),Ay         copy&add in one
OFFSET-RELATED OPTIMISATIONS
RULE: use PC-relative addressing or basereg addressing!
      put your code&data in ONE segment if possible!
----------------------------------------------------------------------------
MOVE.X nnnn       MOVE.X nnnn(pc)         lea copper,a0 -> lea copper(pc),a0..
LEA nnnn          LEA nnnn(pc)            ...possible if nnnn is close to PC
(Ax,Dx.l)         (Ax,Dx.w)               possible if 0<=Dx<=$7fff
If PC-relative doesn't work, use Ax as a pointer to your data block. Use indirect addressing to get to your data: move.l Data1-Base(Ax),Dx etc.
TRICKY OPTIMISATIONS
----------------------------------------------------------------------------
BSET #xx,yy       ORI.W #2^xx,yy          0<=xx<=15
BCLR #xx,yy       ANDI.W #~(2^xx),yy      "
BCHG #xx,yy       EORI.W #2^xx,yy         "
BTST #xx,yy       ANDI.W #2^xx,yy         "
                                          Best improvement if yy=a data reg.
                                          BEWARE: STATUS FLAGS ARE "WRONG".

SILLY OPTIMISATIONS (FOR OPTIMISING COMPILER OUTPUTS ETC)
----------------------------------------------------------------------------
MOVEM (one reg.)  MOVE.l                  movem d0,-(sp) -> move.l d0,-(sp)
MOVE xx,-(sp)     PEA xx                  possible if xx=(Ax) or constant.
0(Ax)             (Ax)
MULU/MULS #0      CLR.L                   moveq #0,Dx with data-registers.
MULU #1,xx        SWAP CLR SWAP           high word is cleared with mulu #1
MULS #1,xx        SWAP CLR SWAP EXT.L     see MULU, and sign extended.
                                          BEWARE: STATUS FLAGS ARE "WRONG"
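If the 2^xx notation in the TRICKY table looks odd, here is the same idea as a small C sketch (again not assembler, and the helper names are invented for this example). Each bit instruction is replaced by a logical operation with a one-bit mask. Note one difference the table hides behind its flags warning: BTST leaves yy alone, whereas an ANDI on the register itself overwrites it, so the test below works on a copy.

```c
#include <stdint.h>

/* BSET #n  ->  OR with 2^n */
static uint16_t bit_set(uint16_t v, unsigned n)
{
    return (uint16_t)(v | (1u << n));
}

/* BCLR #n  ->  AND with ~(2^n) */
static uint16_t bit_clear(uint16_t v, unsigned n)
{
    return (uint16_t)(v & ~(1u << n));
}

/* BCHG #n  ->  EOR with 2^n */
static uint16_t bit_toggle(uint16_t v, unsigned n)
{
    return (uint16_t)(v ^ (1u << n));
}

/* BTST #n  ->  AND with 2^n on a copy, then look at zero/non-zero */
static int bit_test(uint16_t v, unsigned n)
{
    return (v & (1u << n)) != 0;
}
```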
Example: imagine you want to eor 4096 bytes beginning at (a0). Solution one:
        move.w  #4096-1,d7
.1      eor.b   d0,(a0)+
        dbra    d7,.1

Consider the loop from above. 4096 times an eor.b and a dbra takes time. What do you think about this:

        move.w  #4096/4-1,d7
.1      eor.l   d0,(a0)+        ; d0 contains byte repeated 4 times
        dbra    d7,.1
Eors 4096 bytes too! But only needs 1024 eor.l/dbras. Yeah, I hear you smart guys cry: what about 1024 eor.l without any loop?! Right, that IS the fastest solution, but is VERY memory consuming (2 Kb). Instead, join a loop and a few eor.l:
        move.w  #4096/4/4-1,d7
.1      eor.l   d0,(a0)+
        eor.l   d0,(a0)+
        eor.l   d0,(a0)+
        eor.l   d0,(a0)+
        dbra    d7,.1
This is faster than the loop before. I think about 8 or 16 eor.l's is just fine, depending on the size of the mem to be handled (and the wanted speed!). Also, mind the cache on 68020+ processors, the loop code must be small enough to fit in it for highest speeds. Try to do as much as possible within one loop (but considering the text above) instead of a few loops after each other.
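To convince yourself that the unrolled longword loop really does the same job as the byte loop, here is a rough C sketch of both (C rather than assembler so it can be run anywhere; the function names are just for this example, and the unrolled version assumes the size is a multiple of 16 bytes, as 4096 is).

```c
#include <stddef.h>
#include <stdint.h>

/* Solution one: one byte and one loop test per iteration. */
static void xor_bytes(uint8_t *p, size_t n_bytes, uint8_t key)
{
    for (size_t i = 0; i < n_bytes; i++)
        p[i] ^= key;
}

/* Four longword eors per loop test. n_bytes must be a multiple of 16. */
static void xor_unrolled(uint32_t *p, size_t n_bytes, uint8_t key)
{
    uint32_t k = key * 0x01010101u;     /* key byte repeated 4 times */

    for (size_t i = 0; i < n_bytes / 16; i++) {
        p[0] ^= k;
        p[1] ^= k;
        p[2] ^= k;
        p[3] ^= k;
        p += 4;
    }
}
```

Because every byte of the longword key is the same, the result does not depend on byte order, so the two routines give identical buffers.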
MEMORY CLEARING/FILLING. ----------------------------------------------------------------------------
A common problem is how to clear or fill some memory in a short time. If it is CHIP-MEMORY, use the blitter (only D-channel, see below). In this case you can still do other things with your 680x0 while the blitter is busy erasing. If it is FAST-MEMORY, you can use the method from above, with clr.l instead of eor.l, but there is a much faster way:
        move.l  sp,TempSp
        lea     MemEnd,sp
        moveq   #0,d0
        ;...and likewise for d1-d6...
        moveq   #0,d7
        move.l  d0,a0
        ;...and likewise for a1-a5...
        move.l  d0,a6
        movem.l d0-d7/a0-a6,-(sp)       ; 15 registers * 4 = 60 bytes cleared
Now, repeat this instruction as often as required to erase the memory. (memsize/60 times). You may need an additional movem.l to erase the last few bytes. Get sp(=a7) back at the end with (guess..):
move.l TempSp,sp
If you are low on mem, put a few movem.l in a loop. But, now you need a loop-counter register, so you'll only clear 56 bytes in one movem.l.
In the case of CHIP memory, you can use both the blitter and the processor simultaneously to clear much CHIP mem in a VERY short time... It takes some experimentation to find the best sizes to clear with the blitter and with the processor. BUT, ALWAYS USE A WaitBlit() AFTER CLEARING SIMULTANEOUSLY, even if you think you know that the blitter is finished before your processor is.
and it magically works! For those without Devpac, the relevant code is included in this archive as constartup.i
The return code is the value in D0 at the end of your program, so for a clean exit, always clear d0 immediately before your final RTS. Of course you can use the return code in your code to allow conditional branching after your code in a script file. For example:
* Simple example - assemble to checkbutton

        opt     l-

        btst    #6,$bfe001      ; check left mouse button (hardware)
        bne.s   .notpressed
        moveq   #0,d0
        rts
.notpressed
        moveq   #5,d0
        rts
Assemble this, and you have a program that can tell if the mouse button is pressed during bootup. Ideal for switching between startup sequences, for example with this AmigaDOS script file:

checkbutton
if WARN
  execute s:startup-nomousepressed
else
  execute s:startup-mousepressed
endif
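The logic of checkbutton is small enough to sketch in C as well. This is only an illustration, not a replacement for the assembler version: the function takes the byte you would read from $bfe001 as a parameter instead of touching the hardware, and the two constants simply mirror the AmigaDOS return code values 0 (OK) and 5 (WARN) that the script's `if WARN` tests for.

```c
#include <stdint.h>

enum {
    RC_OK   = 0,    /* script takes the "else" branch      */
    RC_WARN = 5     /* script takes the "if WARN" branch   */
};

/*
 * ciaapra is the byte read from $bfe001. Bit 6 is clear while the
 * left mouse button is held down, and set when it is released.
 */
static int checkbutton_code(uint8_t ciaapra)
{
    if (ciaapra & (1u << 6))
        return RC_WARN;     /* not pressed */
    return RC_OK;           /* pressed     */
}
```

So a released button gives 5 and runs s:startup-nomousepressed, and a held button gives 0 and runs s:startup-mousepressed, exactly as in the assembler example.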
Now, only essential system activity will dare to steal time from your code. This means you can now carry on using dos.library to load files from hard drives, CD-ROM, etc, while your code is running. Try using this instead of Forbid() and Permit(), and insert a new floppy disk while your code is running. Wow... The system recognises the disk change.... But remember to add your input handler!!! Of course this is purely up to you. You may prefer to Forbid() when your code is running (it makes it easier to write).
Sprite Initialisation
Some people don't correctly initialise the sprites they don't want to use. (This reminds me of Soundtracker.) A common error is unwanted sprites pointing at address $0. If the longword at address $0 isn't zero you'll get some funny looking sprites at unpredictable places. The right way of getting rid of sprites is to point them at an address you know for sure contains $00000000 (0.l). With AGA you may need to point at FOUR longwords of 0 on a 64-bit boundary:

        CNOP    0,8
pointhere:
        dc.l    0,0,0,0
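If you build your data from C instead, the same "four zero longwords on a 64-bit boundary" block might look something like the sketch below. This assumes a C11 compiler, and the names are made up for the example. It only takes care of the alignment and the zeros: on a real Amiga the block must also live in chip memory, which plain C cannot express and your compiler or linker has to arrange.

```c
#include <stdalign.h>
#include <stdint.h>

/* Four zero longwords, forced onto an 8-byte (64-bit) boundary. */
static alignas(8) const uint32_t null_sprite[4] = { 0, 0, 0, 0 };

/* Returns non-zero if p sits on a 64-bit boundary. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}
```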
The second problem is people turning off the sprite DMA at the wrong time. Vertical stripes on the screen are not always beautiful. Wrong time means that you turn off the DMA when it is "drawing" a sprite. It is very easy to avoid this. Just turn off the DMA when the raster is in the vertical blank area. Currently V39 Kickstart has a bug where sprite resolution and width are not always reset when you run your own code. See Fixing Sprites in AGA
And then it goes on to use 300 as a hard coded value, never referring to DMAWait! Now, until I can get some free time to write a reliable scanline-wait routine to replace their DBRA loops (does anyone want to write a better Protracker player? Free fame & publicity :-), I suggest you change the references to 300 in the code (except in the data tables!) to DMAWait, and you make the DMAWait value *MUCH* higher. I use 1024 on this Amiga 3000 without any apparent problem, but perhaps it's safer to use a value around 2000. Amiga 4000/040 owners and those with 68040 cards tell me that values between 1800 and 2000 are reasonable... There is a better Protracker replay routine in the source/ folder.
.pal
.pal
This test *may* work under 1.3, but the code in the Kickstart 1.2/1.3 ROM is totally broken, so it can guess wrong about NTSC/PAL quite often! Check startup.asm for a way to combine the two tests together... This is fine *EXCEPT* for one thing... It only tells you what video system the machine was booted under. If you have a PAL machine and you run a 60Hz interlaced Workbench (for less flicker) it's fine, because the demo still runs in 50Hz (as long as your system runs from 50Hz power). However, NTSC owners can lose out: if their display is capable of PAL (by running a PAL fixer or a PAL display mode) this code completely ignores that and runs NTSC anyway. If NTSC users select PAL from their boot menu (2.x and 3.0 only) then it will work. For demos and games you'd probably only want to run 50Hz anyway..
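As a rough picture of what "combining the two tests" means, here is the decision written out in C. This is a sketch under stated assumptions, not the code from startup.asm: the three values are passed in as parameters (on a real machine they would come from the exec library version, the graphics library display flags, and the exec vertical blank frequency), the PAL flag is taken to be bit 2 as tested by `btst #2`, and the old-Kickstart fallback is assumed to be the 50Hz vertical blank check.

```c
#include <stdint.h>

#define DISPLAYFLAG_PAL (1u << 2)   /* the bit tested with btst #2 */

/*
 * exec_version  : library version of exec (36 or more = Kickstart 2+)
 * display_flags : display flags word from graphics.library
 * vblank_hz     : vertical blank frequency reported by exec
 */
static int booted_pal(uint16_t exec_version,
                      uint16_t display_flags,
                      uint8_t vblank_hz)
{
    if (exec_version >= 36)
        return (display_flags & DISPLAYFLAG_PAL) != 0;  /* trustworthy */

    return vblank_hz == 50;     /* 1.x fallback, may guess wrong */
}
```

Note that, just like the assembler tests, this only tells you what the machine was booted as, not what the display can actually show.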
Now, if you want to force a machine into the other display system you need some magic pokes. Here you go (beware, other bits in $dff1dc can do nasty things. One bit can reverse the polarity of the video sync, which is not too healthy for some monitors, I've heard...). To turn an NTSC system into PAL (50Hz):
move.w #32,$dff1dc ; Magically PAL
Remember: Not all displays can handle both display systems! Commodore 1084/1084S, Philips 8833/8852 and multisync monitors will, but very few US TVs will handle PAL signals. It might be polite for PAL demos to ask NTSC users if they wish to switch to PAL (by the magic poke) or quit.
; $VER: startup.asm V7.tested (17.4.92)
; Valid on day of purchase only. No re-admission. No rain-checks.
; Now less bugs and more likely to work.
;
; Tested with Hisoft Devpac V3 and Argasm V1.09d
;
; Now added OS legal code to switch sprite resolutions. Ok, big deal,
; this 'demo' doesn't use any sprites. But that's the sort of effort
; I go to on your behalf! - CJ

        opt     l-,CHKIMM               ; auto link, optimise on

        section mycode,code             ; need not be in chipram

        incdir  "include:"
        include "exec/types.i"
        include "exec/funcdef.i"        ; keep code simple and
        include "exec/exec.i"           ; the includes!
        include "libraries/dosextens.i"
        include "graphics/gfxbase.i"
        include "intuition/screens.i"
        include "graphics/videocontrol.i"

                                        ; Well done CBM!
                                        ; They keep on
                                        ; forgetting these!

                                        ; Allows startup from icon
        move.l  4.w,a6                  ; get ExecBase
        lea     intname(pc),a1          ;
        moveq   #39,d0                  ; Kickstart 3.0 or higher
        jsr     _LVOOpenLibrary(a6)
        move.l  d0,_IntuitionBase       ; store intuitionbase
; Note! if this fails then kickstart is <V39.

        move.l  4.w,a6                  ; get ExecBase
        lea     gfxname(pc),a1          ; graphics name
        moveq   #33,d0                  ; Kickstart 1.2 or higher
        jsr     _LVOOpenLibrary(a6)
        tst.l   d0
        beq     End                     ; failed to open? Then quit
        move.l  _GfxBase,a6
.skip   sub.l   a1,a1
        jsr     _LVOLoadView(a6)
        jsr     _LVOWaitTOF(a6)
        jsr     _LVOWaitTOF(a6)

; Note: Something could come along in between the LoadView and
; your copper setup. But only if you decide to run something
; else after you start loading the demo. That's far too stupid
; to bother testing for in my opinion!!! If you want to stop
; this, then a Forbid() won't work (WaitTOF() disables
; Forbid state) so you'll have to do Forbid() *and* write your
; own WaitTOF() replacement. No thanks... I'll stick to running
; one demo at a time :-)

        move.l  4.w,a6
        cmp.w   #36,LIB_VERSION(a6)     ; check for Kickstart 2
        blt.s   .oldks                  ; nope...
; kickstart 2 or higher.. We can check for NTSC properly...

        move.l  _GfxBase,a6
        btst    #2,gb_DisplayFlags(a6)
        bne.s   .pal
        bra.s   .ntsc

.oldks
#mycopper,$dff080.L .lp
#mycopperntsc,$dff080.L
.lp
CloseDown:
        tst.l   _IntuitionBase          ; Intuition open?
        beq.s   .sk                     ; if not, skip...
        bsr     ReturnSpritesToNormal
.sk
        move.l  wbview(pc),a1
        move.l  _GfxBase,a6
        jsr     _LVOLoadView(a6)
        jsr     _LVOWaitTOF(a6)
        jsr     _LVOWaitTOF(a6)

        move.l
        move.l
        jsr                             ; and rethink....

        move.l
        move.l
        jsr                             ; close graphics.library

        move.l  _IntuitionBase,d0
        beq.s   End                     ; if not open, don't close!
        move.l  d0,a1
        jsr     _LVOCloseLibrary(a6)

End:    moveq   #0,d0                   ; clear d0 for exit
        rts                             ; back to workbench/cli
; This bit fixes problems with sprites in V39 kickstart.
; It is only called if intuition.library opens, which in this
; case is only if V39 or higher kickstart is installed. If you
; require intuition.library you will need to change the
; openlibrary code to open V33+ Intuition and add a V39 test
; before calling this code (which is only required for V39+
; Kickstart)
FixSpritesSetup:
        move.l  _IntuitionBase,a6
        lea     wbname,a0
        jsr     _LVOLockPubScreen(a6)

        tst.l   d0
        beq.s   .error
        move.l  d0,wbscreen
        move.l  d0,a0

        move.l  sc_ViewPort+vp_ColorMap(a0),a0
        lea     taglist,a1
        move.l  _GfxBase,a6             ; open graphics.library first!
        jsr     _LVOVideoControl(a6)    ;

        move.l  resolution,oldres       ; store old resolution

        move.l  #SPRITERESN_140NS,resolution
        move.l  #VTAG_SPRITERESN_SET,taglist

        move.l  wbscreen,a0
        move.l  sc_ViewPort+vp_ColorMap(a0),a0
        lea     taglist,a1
        jsr     _LVOVideoControl(a6)    ; set sprites to lores

        move.l  wbscreen,a0
        move.l  _IntuitionBase,a6
        jsr     _LVOMakeScreen(a6)

        jsr     _LVORethinkDisplay(a6)

; Sprites are now set back to 140ns in a system friendly manner!

.error
        rts

ReturnSpritesToNormal:
; If you mess with sprite resolution you must return resolution
; back to workbench standard on return! This code will do that...

        move.l  wbscreen,d0
        beq.s   .error
        move.l  d0,a0

        move.l  oldres,resolution       ; change taglist
        lea     taglist,a1
        move.l  sc_ViewPort+vp_ColorMap(a0),a0
        move.l  _GfxBase,a6
        jsr     _LVOVideoControl(a6)    ; return sprites to normal.

        move.l  _IntuitionBase,a6
        move.l  wbscreen,a0
        jsr     _LVOMakeScreen(a6)

        move.l  wbscreen,a1
        sub.l   a0,a0
        jsr     _LVOUnlockPubScreen(a6)
.error
        rts
wbview          dc.l    0
_GfxBase        dc.l    0
_IntuitionBase  dc.l    0
oldres          dc.l    0
wbscreen        dc.l    0

taglist         dc.l    VTAG_SPRITERESN_GET
resolution      dc.l    SPRITERESN_ECS
                dc.l    TAG_DONE,0

wbname          dc.b    "Workbench",0
gfxname         dc.b    "graphics.library",0
intname         dc.b    "intuition.library",0
        section mydata,data_c

mycopper
        dc.w    $100,$0200
        dc.w    $180,$00
        dc.w    $8107,$fffe     ; wait for $8107,$fffe
        dc.w    $180
co      dc.w    $f0f            ; background red
        dc.w    $d607,$fffe     ; wait for $d607,$fffe
        dc.w    $180,$ff0       ; background yellow
        dc.w    $ffff,$fffe
        dc.w    $ffff,$fffe
mycopperntsc dc.w dc.w dc.w dc.w dc.w dc.w dc.w dc.w end
; otherwise no display! ; ; ; ; wait for $6e07,$fffe background red wait for $b007,$fffe background yellow