gee, imagine me stuffing it up :P (this one's wrapped properly)
Posted by TheGun (202.12.144.19) on June 25, 1999 at 02:33:21:
In Reply to: asm3.htm posted by TheGun on June 25, 1999 at 02:31:19:
65816 Opcodes
Covered in this Document:
This document covers the opcodes native to the 65816 CPU. If you think you can skim over this section once or twice and then come back for reference when you need it, you won't get very far at all.
Even though some opcodes are more important (read: common) than others, you are advised to study this entire listing many times over. Not having a thorough understanding of opcodes when you are reading or writing code will cause problems. Granted, there's a lot of information here, but assembly is as much the ability to memorize and regurgitate (think typical high school busy-work) as any other talent.
Opcode Structure
Until now, the words opcode and instruction have been used fairly interchangeably. From now on, the correct terminology will be used - an instruction is a category of opcodes (such as LDA), but an opcode is a single instruction bound to one addressing mode (LDA Absolute). In other words, ADC is an instruction as the addressing mode is not specified, but ADC #$1234 is an opcode with operand. No two opcodes are the same - each addressing mode of each instruction has its own hex code. For example:
LDA #$1234 ; in a hex editor, this would appear as A9 34 12
LDA $1234 ; in a hex editor, this would appear as AD 34 12
As you should see, the two operations above use the same instruction (LDA), but appear differently in a hex editor - LDA Immediate being A9h and LDA Absolute being ADh. A9h and ADh are two different opcodes, but share the same instruction.
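If it helps to see the instruction/opcode split as data, here is a minimal sketch in Python (used purely as a modelling language, it has nothing to do with the 65816 itself). The OPCODES table and the encode helper are names made up for this illustration, and a 2-byte operand is assumed.

```python
# Hypothetical lookup table: (instruction, addressing mode) -> opcode byte.
# Only the two opcodes quoted in the text are listed.
OPCODES = {
    ("LDA", "immediate"): 0xA9,
    ("LDA", "absolute"): 0xAD,
}

def encode(instruction, mode, operand):
    """Return the opcode byte followed by a 2-byte operand, low byte first."""
    opcode = OPCODES[(instruction, mode)]
    return bytes([opcode, operand & 0xFF, (operand >> 8) & 0xFF])

print(encode("LDA", "immediate", 0x1234).hex(" ").upper())  # A9 34 12
print(encode("LDA", "absolute", 0x1234).hex(" ").upper())   # AD 34 12
```

Same instruction, same operand, two different opcodes - which is exactly what the hex editor shows.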
The Opcodes
This section will have the following format:
Instruction
Description
Examples
Flags Affected
Addressing Mode | Syntax      | # of Bytes
#               | XXX #$9000  | 3*
ab              | XXX $0780   | 3
abl             | XXX $089044 | 4
[d],y           | XXX [$0A],y | 2
* Exceptions
This borrows greatly from other documents, residing at 6502.txt and *****, whose authors I am unsure of. Whoever created them, I would like to say now that their efforts are greatly appreciated.
The instructions are grouped so that similar opcodes follow each other. It's a bit more logical than alphabetical sorting.
Of great importance is the Addressing Mode table. The addressing modes listed there are the only possible functions of that instruction.
If you wrote LDX $800000 and tried to assemble it, your assembler would (hopefully) give you an invalid operand error. That is because LDX Absolute Long is just not possible - that instruction was not allocated a hex code when the official spec was written, so it didn't make its way into the chip. Basically, if you want to use a certain instruction in your code, make sure the addressing mode you're trying to use is valid.
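The check an assembler performs here can be sketched in a few lines of Python. This is only an illustration: LDX_MODES and ldx_opcode are hypothetical names, and the table holds just the five LDX addressing modes with their opcode bytes.

```python
# Hypothetical table of the addressing modes LDX supports, with their opcode bytes.
LDX_MODES = {"#": 0xA2, "ab": 0xAE, "ab,y": 0xBE, "d": 0xA6, "d,y": 0xB6}

def ldx_opcode(mode):
    """Look up the LDX opcode for a mode, or fail the way an assembler would."""
    if mode not in LDX_MODES:
        raise ValueError("invalid operand: LDX has no '%s' addressing mode" % mode)
    return LDX_MODES[mode]

print(hex(ldx_opcode("ab")))   # 0xae
try:
    ldx_opcode("abl")          # the LDX $800000 case
except ValueError as err:
    print(err)
```

There is simply no table entry for LDX Absolute Long, so the lookup fails before any bytes are produced.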
LDA - Load Accumulator
As you must already know, this instruction loads either 1 or 2 bytes into
the A register. The number of bytes loaded depends on the status of the
P register's M flag.
D = 0100h
DB = 80h
S = 01FDh
M flag = 0
LDA $8000 ; Load into A the 2 bytes at $808000 (absolute addressing)
LDA $60 ; Load 2 bytes from the address $000160 (direct addressing)
LDA $01,s ; Load 2 bytes from $0001FE (stack relative)
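The three addresses in that example can be worked out by hand, but here is a rough Python sketch of the arithmetic in case it helps. The function names are invented for this illustration, and it only covers these three addressing modes.

```python
def absolute(db, addr):
    """Absolute addressing: the DB register supplies the bank byte."""
    return (db << 16) | addr

def direct(d, offset):
    """Direct addressing: D + offset, always in bank 00."""
    return (d + offset) & 0xFFFF

def stack_relative(s, offset):
    """Stack relative addressing: S + offset, always in bank 00."""
    return (s + offset) & 0xFFFF

D, DB, S = 0x0100, 0x80, 0x01FD
print("%06X" % absolute(DB, 0x8000))      # 808000
print("%06X" % direct(D, 0x60))           # 000160
print("%06X" % stack_relative(S, 0x01))   # 0001FE
```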
When you load data into the A register, you inherently alter flags in the P register. Remember that some flags in the P register are constantly updated relative to your actions. Here are the flags you could alter, along with the data that would trigger the change.
N (Negative) LDA #$8000 ; since the high bit of A will become set after this, it's presumed negative
Z (Zero) LDA #$0000 ; since A will now be zeroed, the Zero flag becomes set
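As a rough model of how a load decides N and Z, here is a short Python sketch. load_flags is a made-up helper for this illustration; m_flag follows the convention used in this document (1 = 1 byte, 0 = 2 bytes).

```python
def load_flags(value, m_flag):
    """Return (N, Z) as a load would set them; m_flag=1 means 8-bit, 0 means 16-bit."""
    bits = 8 if m_flag else 16
    value &= (1 << bits) - 1
    n = (value >> (bits - 1)) & 1   # N copies the highest bit of the loaded value
    z = 1 if value == 0 else 0      # Z is set only when every bit is zero
    return n, z

print(load_flags(0x8000, 0))  # (1, 0)
print(load_flags(0x0000, 0))  # (0, 1)
print(load_flags(0x0080, 1))  # (1, 0)
```

Note the last line: 80h counts as negative only when the register is 1 byte wide.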
Addressing Mode | Syntax         | Bytes
#               | LDA #$50       | 2*
ab              | LDA $8000      | 3
ab,x            | LDA $8000,X    | 3
ab,y            | LDA $8000,Y    | 3
abl             | LDA $C01000    | 4
abl,x           | LDA $C01000,X  | 4
d               | LDA $01        | 2
d,x             | LDA $01,X      | 2
(d)             | LDA ($50)      | 2
(d),y           | LDA ($50),Y    | 2
[d]             | LDA [$03]      | 2
[d],y           | LDA [$03],Y    | 2
(d,x)           | LDA ($80,X)    | 2
d,s             | LDA $03,S      | 2
(d,s),y         | LDA ($03,S),Y  | 2
* Operand is 1 byte when M flag = 1, 2 bytes if M is 0
LDX - Load X Register
This instruction is very similar to LDA, but with far fewer addressing modes
to use. This lack of flexibility is fitting, as you can't perform
arithmetic on X anyway. The number of bytes loaded into X depends on
the IndeX flag of the P register - if that flag is set you only load 1
byte.
D = 0100h
X flag = 0
LDX #$0120 ; Load into X the constant 0120h (immediate addressing)
LDX $60 ; Load 2 bytes from the address $000160 (direct addressing)
Flags:
N (Negative) LDX #$8000
Z (Zero) LDX #$0000
Addressing Mode | Syntax       | Bytes
#               | LDX #$80     | 2*
ab              | LDX $8000    | 3
ab,y            | LDX $8000,Y  | 3
d               | LDX $04      | 2
d,y             | LDX $04,Y    | 2
* Operand is 1 byte when X flag = 1, 2 bytes if X is 0
LDY - Load Y Register
This instruction is almost the same as LDX, differing only in 2 of the
addressing modes. The number of bytes loaded into Y depends on the
IndeX flag of the P register - if that flag is set you only load 1 byte.
D = 0100h
X flag = 0
LDY #$0120 ; Y = 0120h (immediate addressing)
LDY $60 ; Load 2 bytes from the address $000160 (direct addressing)
Flags:
N (Negative) LDY #$8000
Z (Zero) LDY #$0000
Addressing Mode | Syntax       | Bytes
#               | LDY #$80     | 2*
ab              | LDY $8000    | 3
ab,x            | LDY $8000,X  | 3
d               | LDY $04      | 2
d,x             | LDY $04,X    | 2
* Operand is 1 byte when X flag = 1, 2 bytes if X is 0
STA - Store Accumulator
This instruction stores the contents of the A register to a location
specified by the operand. The number of bytes you store is affected by
the M flag (as almost all Accumulator instructions are). If M=0, the
low byte of A will be stored at a location, then the high byte will be
stored in the following location (location + 1).
D = 0100h
DB = 80h
S = 01FDh
M flag = 0
STA $1000 ; Store 2 bytes at $801000 (absolute addressing)
STA $60 ; Store 2 bytes at $000160 (direct addressing)
STA $01,s ; Store 2 bytes at $0001FE (stack relative)
None of the flags in the P register are affected by the STA, STX and STY operations.
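The low-byte-first ordering of a 2-byte store is easy to get backwards, so here is a small Python sketch of it. The sta function and the dictionary standing in for memory are assumptions made for this illustration only.

```python
def sta(memory, address, a, m_flag):
    """Store A at address: 1 byte if m_flag is 1, else low byte then high byte."""
    memory[address] = a & 0xFF
    if m_flag == 0:
        memory[address + 1] = (a >> 8) & 0xFF

ram = {}                      # sparse stand-in for memory: address -> byte
sta(ram, 0x801000, 0x1234, 0)
print({"%06X" % k: "%02X" % v for k, v in ram.items()})
# {'801000': '34', '801001': '12'}
```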
Addressing Mode | Syntax         | Bytes
ab              | STA $1000      | 3
ab,x            | STA $1000,X    | 3
ab,y            | STA $1000,Y    | 3
abl             | STA $7E0000    | 4
abl,x           | STA $7E0000,X  | 4
d               | STA $03        | 2
d,x             | STA $03,X      | 2
(d)             | STA ($06)      | 2
(d),y           | STA ($06),Y    | 2
[d]             | STA [$10]      | 2
[d],y           | STA [$10],Y    | 2
(d,x)           | STA ($10,X)    | 2
d,s             | STA $01,S      | 2
(d,s),y         | STA ($01,S),Y  | 2
STX - Store X Register
This instruction is very similar to STA - again differing only in the
available addressing modes. Also, the number of bytes you store depends
on the Index bit in the P register (the X bit).
D = 0100h
DB = 7Eh
X flag = 1
STX $9000 ; Store 1 byte at $7E9000 (absolute addressing)
STX $60 ; Store 1 byte at $000160 (direct addressing)
Addressing Mode | Syntax     | Bytes
ab              | STX $2000  | 3
d               | STX $97    | 2
d,y             | STX $97,y  | 2
STY - Store Y Register
As you should have guessed, this is virtually identical to STX except for a
single addressing mode. The P register's X flag controls the number of
bytes stored.
D = 0100h
DB = 04h
X flag = 0
STY $9000 ; Store 2 bytes at $049000 (absolute addressing)
STY $F0 ; Store 2 bytes at $0001F0 (direct addressing)
Addressing Mode | Syntax     | Bytes
ab              | STY $F000  | 3
d               | STY $0A    | 2
d,x             | STY $0A,X  | 2
ADC - Add with Carry
This instruction adds the operand onto the value in A, and also adds the Carry flag (hence Add with Carry).
You may remember that the carry flag is set (amongst other
circumstances) when an addition results in a number larger than the A
register can hold. This quality can be used to obtain addition results
larger than 2 bytes - after adding 2 values, if the carry flag is set
you know the answer's greater than FFFFh.
As
always, though, the size of the numbers added depends on the M flag -
if it's set to 1, you can only add 1 byte from the operand onto A's
lower byte, giving addition results from 0 to FFh (1FEh using the carry
bit). When M=0, addition can give answers from 0 to FFFFh (1FFFEh using
a carry).
Since the carry bit is always
added, it is customary (and strongly advised) that this flag is cleared
before using ADC. This is done with the CLC (Clear Carry) opcode.
DB = C0h
S = 01FFh
M flag = 0
X flag = 0
PHX ; push 2 bytes from X onto the stack (high byte at $0001FF, low byte at $0001FE)
CLC ; make sure the Carry flag is clear (0)
ADC $01,s ; add the A register, the carry flag and the 2 bytes at $0001FE
PLX ; pull X back off the stack
LDA #$0100 ; A = 0100h
CLC ; Carry flag = 0
ADC #$8000 ; A now equals 8100h
Flags:
N (Negative) LDA #$7000
CLC
ADC #$8000 ; the high bit of A will become set after this operation (A = F000h)
V (Overflow) LDA #$7000
CLC
ADC #$7000 ; A and the operand are positive but the result's negative - a signed overflow is triggered
Z (Zero) LDA #$8000
CLC
ADC #$8000 ; since A will now be zeroed, the Zero flag becomes set (the carry and overflow flags would also be set here)
C (Carry) LDA #$F000
CLC
ADC #$2000 ; doing this sum on paper gives 11000h - a carry out of the highest bit
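The four flag cases can be checked with a small Python model of a 16-bit ADC. This is only a sketch under stated assumptions: adc16 is a made-up name, M = 0 is assumed, and decimal mode is ignored completely.

```python
def adc16(a, operand, carry):
    """16-bit binary ADC. Returns (result, flags) with flags as a dict of 0/1."""
    total = a + operand + carry
    result = total & 0xFFFF
    flags = {
        "N": result >> 15,
        # V: both inputs share a sign that the result does not
        "V": ((a ^ result) & (operand ^ result) & 0x8000) >> 15,
        "Z": int(result == 0),
        "C": int(total > 0xFFFF),
    }
    return result, flags

for a, op in [(0x7000, 0x8000), (0x7000, 0x7000), (0x8000, 0x8000), (0xF000, 0x2000)]:
    result, flags = adc16(a, op, 0)   # carry=0 plays the part of CLC
    print("%04X + %04X = %04X" % (a, op, result), flags)
```

Passing carry=1 instead shows why forgetting the CLC gives answers that are off by one.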
Addressing Mode | Syntax         | Bytes
#               | ADC #$80       | 2*
ab              | ADC $1000      | 3
ab,x            | ADC $1000,X    | 3
ab,y            | ADC $1000,Y    | 3
abl             | ADC $C11000    | 4
abl,x           | ADC $C11000,X  | 4
d               | ADC $09        | 2
d,x             | ADC $09,X      | 2
(d)             | ADC ($0B)      | 2
(d),y           | ADC ($0B),Y    | 2
[d]             | ADC [$0D]      | 2
[d],y           | ADC [$0D],Y    | 2
(d,x)           | ADC ($0B,X)    | 2
d,s             | ADC $01,S      | 2
(d,s),y         | ADC ($01,S),Y  | 2
* Operand is 1 byte when M flag = 1, 2 bytes if M is 0
SBC - Subtract with Borrow
This instruction subtracts the operand from the value in A, and uses the
Carry flag as somewhere to borrow from. If the subtraction didn't
require a borrow, the Carry flag remains set (since you -always- set the
carry flag before a SBC, otherwise there'd be nowhere to borrow from!).
If the subtraction required a borrow, the Carry flag would be zeroed.
If you don't know what borrowing means in relation to subtraction, read
a maths book.
Of course, you will have
realized that the number of bytes you subtract from and with depends on
the M flag's setting. Why do you think I've been repeating that all
this time?!
DB = C0h
S = 01FFh
M flag = 0
X flag = 0
PHY ; push 2 bytes from Y onto the stack (low byte at $0001FE, high byte at $0001FF)
SEC ; make sure the Carry flag is set (so we can borrow from it)
SBC $01,s ; subtract Y (the two bytes at $0001FE) from A, storing the result in A
PLY ; pull Y back off the stack
LDA #$0100 ; A = 0100h
SEC ; Carry flag = 1
SBC #$8000 ; A now equals 8100h, Carry flag cleared (we needed a borrow)
Flags:
N (Negative) LDA #$1000
SEC
SBC #$8000 ; the high bit of A will become set after this operation (A = 9000h)
V (Overflow) LDA #$7000
SEC
SBC #$9000 ; positive minus negative should be positive, but the result (E000h) is negative - a signed overflow is triggered
Z (Zero) LDA #$8000
SEC
SBC #$8000 ; since A will now be zeroed, the Zero flag becomes set
C (Carry) LDA #$1000 ; this subtraction requires a borrow - that borrow is 'taken' from the Carry flag,
SEC ; so after the SBC, the carry flag would be cleared
SBC #$2000
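Signed overflow on subtraction is easy to misjudge, so here is a rough Python model of a 16-bit SBC to test cases against. As before this is a sketch only: sbc16 is a hypothetical name, M = 0 is assumed and decimal mode is ignored.

```python
def sbc16(a, operand, carry):
    """16-bit binary SBC: A - operand - (1 - carry). Returns (result, flags)."""
    # The CPU subtracts by adding the operand's one's complement plus the carry.
    inverted = operand ^ 0xFFFF
    total = a + inverted + carry
    result = total & 0xFFFF
    flags = {
        "N": result >> 15,
        "V": ((a ^ result) & (inverted ^ result) & 0x8000) >> 15,
        "Z": int(result == 0),
        "C": int(total > 0xFFFF),   # 1 = no borrow was needed
    }
    return result, flags

for a, op in [(0x0100, 0x8000), (0x1000, 0x2000), (0x7000, 0x9000), (0x8000, 0x8000)]:
    result, flags = sbc16(a, op, 1)   # carry=1 plays the part of SEC
    print("%04X - %04X = %04X" % (a, op, result), flags)
```

In this model 1000h - 2000h leaves V clear (the signed answer, -1000h, fits in 16 bits), while 7000h - 9000h sets it.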
Addressing Mode | Syntax         | Bytes
#               | SBC #$80       | 2*
ab              | SBC $0100      | 3
ab,x            | SBC $0100,X    | 3
ab,y            | SBC $0100,Y    | 3
abl             | SBC $808100    | 4
abl,x           | SBC $808100,X  | 4
d               | SBC $77        | 2
d,x             | SBC $77,X      | 2
(d)             | SBC ($88)      | 2
(d),y           | SBC ($88),Y    | 2
[d]             | SBC [$99]      | 2
[d],y           | SBC [$99],Y    | 2
(d,x)           | SBC ($A0,X)    | 2
d,s             | SBC $01,S      | 2
(d,s),y         | SBC ($01,S),Y  | 2
* Operand is 1 byte when M flag = 1, 2 bytes if M is 0
ASL - Arithmetic Shift Left
This instruction shifts the operand left by one bit, effectively doubling
the source. The highest bit of the operand is shifted into the Carry
flag, useful for creating a bitplane of on/off flags (once you shift,
you test the carry flag). Quick doubling and bitplanes are the most
common uses for this instruction. The lowest bit of the target is
always zeroed after an ASL.
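This instruction can operate on either the A register or a memory location, and the number of bytes affected at a time is governed by the M flag. In the case of the A register being shifted when M=1, the process is simple: bit 7 moves into the Carry flag, bits 6-0 each move up one place, and bit 0 becomes 0.
On the other hand, if a 2-byte memory location is shifted (M=0), the reverse byte ordering of the 65816 makes things a bit more complicated: the low byte is stored first, so bit 7 of the byte at the address moves into bit 0 of the byte at address + 1, and bit 7 of that second (high) byte is what ends up in the Carry flag.
Generally you won't have to worry about this complication, but it helps to be aware of nuances like this, especially when debugging 'interesting' code.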
D = 0000h
$000080 = 00h
$000081 = FFh
M flag = 0
LDA $80 ; A = %1111111100000000 = FF00h, Carry flag unknown
ASL A ; A = %1111111000000000 = FE00h, Carry flag is set
LDA #$0100 ; A = %0000000100000000, Carry flag unknown
ASL A ; A = %0000001000000000, Carry flag is cleared
Flags:
N (Negative) LDA #$4000
ASL A ; A's highest bit will become set after this
Z (Zero) LDA #$8000
ASL A ; the only set bit will be shifted into the Carry flag, leaving A as 0000h
C (Carry) ASL $80 ; the high byte ($000081) has its highest bit set, which is moved into the Carry flag
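To see the byte ordering at work when ASL targets memory, here is a short Python sketch. asl_memory16 and the dictionary used as memory are invented for this illustration, and M = 0 is assumed.

```python
def asl_memory16(memory, address):
    """ASL on a 2-byte memory value stored low byte first. Returns the new Carry flag."""
    value = memory[address] | (memory[address + 1] << 8)
    carry = (value >> 15) & 1           # old highest bit (bit 7 of the high byte)
    value = (value << 1) & 0xFFFF       # lowest bit becomes 0
    memory[address] = value & 0xFF
    memory[address + 1] = value >> 8
    return carry

ram = {0x80: 0x00, 0x81: 0xFF}          # same bytes as the example above
carry = asl_memory16(ram, 0x80)
print("%02X %02X carry=%d" % (ram[0x80], ram[0x81], carry))  # 00 FE carry=1
```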
Addressing Mode | Syntax       | Bytes
A               | ASL A        | 1
ab              | ASL $8000    | 3
ab,x            | ASL $8000,X  | 3
d               | ASL $90      | 2
d,x             | ASL $90,X    | 2
LSR - Logical Shift Right
Similar to ASL, this instruction shifts the operand right by one bit, effectively halving and
rounding down the source. The lowest bit of the operand is shifted
into the Carry flag, giving similar bitplane uses to ASL. The highest
bit of the operand is always made zero after LSR is executed.
It is useful that the Carry flag is altered by this instruction, as it allows you to divide by powers of 2 with a remainder. For example:
- Say the A register contains "3" ( %00000011 )
- If this is shifted right one bit, it will become 1 ( %00000001 )
- Since the Carry flag is now set, though, you can say "3 divided by 2 equals 1 (the A register) with remainder 1 (the Carry flag)".
The actual code involved in this division/remainder use for LSR is a bit too complex for this section, but will be covered in a later section.
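Until that later section, the idea can at least be modelled in a few lines of Python. This is not the 65816 routine itself, just a sketch of the arithmetic; lsr and divide_by_power_of_two are names made up here.

```python
def lsr(value, bits=8):
    """Shift right one bit. Returns (result, carry); carry is the bit shifted out."""
    value &= (1 << bits) - 1
    return value >> 1, value & 1

quotient, remainder = lsr(3)
print(quotient, remainder)   # 1 1

def divide_by_power_of_two(value, power, bits=16):
    """Repeated LSR; the bits shifted out, reassembled, form the remainder."""
    remainder = 0
    for i in range(power):
        value, carry = lsr(value, bits)
        remainder |= carry << i
    return value, remainder

print(divide_by_power_of_two(0x001D, 2))   # (7, 1), since 29 = 7 * 4 + 1
```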
D = 0000h
$000080 = 00h
$000081 = FFh
M flag = 0
LDA $80 ; A = %1111111100000000 = FF00h, Carry flag unknown
LSR A ; A = %0111111110000000 = 7F80h, Carry flag is cleared
LDA #$0100 ; A = %0000000100000000 = 0100h, Carry flag unknown
LSR A ; A = %0000000010000000 = 0080h, Carry flag is cleared
Flags:
N (Negative) Since the highest bit is cleared, the N flag is -always- set to 0 by LSR
Z (Zero) LDA #$0001
LSR A ; the only set bit will be shifted into the Carry flag, leaving A as 0000h
C (Carry) LDA #$FFFF
LSR A ; the lowest bit (which is set) is moved into the Carry flag
Addressing Mode | Syntax       | Bytes
A               | LSR A        | 1
ab              | LSR $1000    | 3
ab,x            | LSR $1000,X  | 3
d               | LSR $05      | 2
d,x             | LSR $05,X    | 2
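ROL - Rotate Left
Again, this instruction is similar to ASL and LSR, though it is not quite as destructive. Whereas ASL and LSR moved zeroes into the lowest/highest bits, the ROL instruction moves the Carry flag into the lowest bit (and the highest bit into the Carry flag), so it's possible to continuously rotate the operand without eventually destroying the data therein. This feature of the ROL/ROR instructions lets you shift a large, contiguous block of memory left or right without zeroing bits in-between. Here is a demonstration, assuming M = 1, A = C0h and the Carry flag clear:
ROL A ; A = %10000000 = 80h, Carry flag set
ROL A ; A = %00000001 = 01h, Carry flag set
ROL A ; A = %00000011 = 03h, Carry flag cleared
In that example, you could keep ROL'ing A until it reached C0h again (9 rotations in total, as the Carry flag acts as a ninth bit).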
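The "shift a large block of memory" trick boils down to an ASL on the lowest word followed by a ROL on each higher word, with the Carry flag ferrying the bit across. A minimal Python sketch, assuming M = 0 and using helper names invented for this illustration:

```python
def asl16(value):
    """ASL on a 16-bit value. Returns (result, carry_out)."""
    return (value << 1) & 0xFFFF, value >> 15

def rol16(value, carry_in):
    """ROL on a 16-bit value: Carry enters bit 0, bit 15 leaves into Carry."""
    return ((value << 1) | carry_in) & 0xFFFF, value >> 15

def shift_left_32(low_word, high_word):
    """Shift a 32-bit number held as two 16-bit words: ASL the low word, ROL the high."""
    low_word, carry = asl16(low_word)
    high_word, carry = rol16(high_word, carry)
    return low_word, high_word, carry

low, high, carry = shift_left_32(0x8001, 0x0000)
print("%04X%04X carry=%d" % (high, low, carry))   # 00010002 carry=0
```

The bit that fell out of the low word was not lost - it arrived in bit 0 of the high word.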
D = 0000h
$000080 = 00h
$000081 = FFh
M flag = 0
LDA $80 ; A = %1111111100000000 = FF00h
SEC ; A = %1111111100000000 = FF00h, Carry flag is set
ROL A ; A = %1111111000000001 = FE01h, Carry flag set
ROL A ; A = %1111110000000011 = FC03h, Carry flag set
Flags:
N (Negative) LDA #$4000
ROL A ; A's highest bit will become set
Z (Zero) CLC ; so we don't rotate a 1 into A
LDA #$8000
ROL A ; the only set bit will be shifted into the Carry flag, leaving A as 0000h
C (Carry) LDA #$FFFF
ROL A ; the highest bit (which is set) is moved into the Carry flag
Addressing Mode | Syntax       | Bytes
A               | ROL A        | 1
ab              | ROL $1200    | 3
ab,x            | ROL $1200,X  | 3
d               | ROL $03      | 2
d,x             | ROL $03,X    | 2
ROR - Rotate Right
As you should have guessed, this instruction is very similar to the
previous 3 - in this case the operand is shifted to the right by 1 bit,
the Carry flag is moved into the highest bit, and the lowest bit
(shifted out of existence) is moved into the Carry flag.
D = 0000h
$000080 = 00h
$000081 = FFh
M flag = 0
LDA $80 ; A = %1111111100000000 = FF00h
SEC ; A = %1111111100000000 = FF00h, Carry flag is set
ROR A ; A = %1111111110000000 = FF80h, Carry flag cleared
ROR A ; A = %0111111111000000 = 7FC0h, Carry flag cleared
Flags:
N (Negative) SEC
ROR A ; A's highest bit will become set (Carry flag shifted into A)
Z (Zero) CLC ; so we don't rotate a 1 into A
LDA #$0001
ROR A ; the only set bit will be shifted into the Carry flag, leaving A as 0000h
C (Carry) LDA #$FFFF
ROR A ; the lowest bit (which is set) is moved into the Carry flag
Addressing Mode | Syntax       | Bytes
A               | ROR A        | 1
ab              | ROR $1000    | 3
ab,x            | ROR $1000,X  | 3
d               | ROR $09      | 2
d,x             | ROR $09,X    | 2
PHA - Push Accumulator
PHA, and the next 6 push instructions, are all extremely similar. PHA
stores the contents of the A register (1 or 2 bytes, depending on the M
flag) at the memory location pointed to by the S register. This action
changes the value of the S register to point to the next free byte on
the stack.
S = 01FFh
M flag = 0
LDA #$7700
PHA ; 77h (the high byte) is stored at $0001FF, 00h (the low byte) at $0001FE
; S register now contains 01FDh ( S = S - # of bytes pushed )
No flags are affected by the Push instructions.
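Since all the other push instructions work the same as PHA, here's a brief listing of what they push and how many bytes end up on the stack:
PHB | DB register | 1 byte
PHD | D register  | 2 bytes
PHK | PB register | 1 byte
PHP | P register  | 1 byte
PHX | X register  | 1 byte if X flag = 1, 2 bytes if X flag = 0
PHY | Y register  | 1 byte if X flag = 1, 2 bytes if X flag = 0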
The
reason for pushing the PB register may not be obvious, as PB can only
be modified with a jump-style instruction. It's most often used to make
sure the DB register points to the same bank as PB, so any addressing
modes that are DB sensitive will load from the bank whose code is
currently being run.
PLA - Pull Accumulator
PLA, and the next 5 pull instructions, are also extremely similar to one
another (surprised?). PLA loads into A either 1 or 2 bytes from the S
register + 1 (+1 because S points to the next free byte). The S register is then incremented by the number of bytes pulled.
S = 01FFh
M flag = 1
PHB ; The DB register is stored at $0001FF, the Stack register is decremented by 1
PLA ; load a byte from S+1, which is the value of DB we just pushed
That bit of code allows us to transfer DB to A, something not possible with the set of transfer instructions.
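How S moves during a push and a pull can be modelled with a couple of Python functions. This is just a sketch for 2-byte values (M = 0 or X = 0); push16, pull16 and the dictionary standing in for memory are assumptions of this illustration.

```python
def push16(memory, s, value):
    """Push a 2-byte value: high byte first, then low byte. Returns the new S."""
    memory[s] = (value >> 8) & 0xFF
    s = (s - 1) & 0xFFFF
    memory[s] = value & 0xFF
    s = (s - 1) & 0xFFFF
    return s

def pull16(memory, s):
    """Pull a 2-byte value: low byte from S+1, high byte from S+2. Returns (value, new S)."""
    low = memory[(s + 1) & 0xFFFF]
    high = memory[(s + 2) & 0xFFFF]
    return low | (high << 8), (s + 2) & 0xFFFF

stack = {}
s = push16(stack, 0x01FF, 0x7700)
print("%04X %02X %02X" % (s, stack[0x01FF], stack[0x01FE]))   # 01FD 77 00
value, s = pull16(stack, s)
print("%04X %04X" % (value, s))                               # 7700 01FF
```

Because the high byte goes on first, the value sits in memory low byte first, the same ordering every other 2-byte value uses.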
You
should remember that the push instructions don't alter any flags -
however the pull instructions do. For most of these instructions, the
flags altered are the same as those for a typical LDA instruction, since
after all a PLA is effectively doing LDA (S+1).
Flags affected by PLA, PLB, PLD, PLX and PLY:
N (Negative) LDA #$80 ; Negative flag is set
PHA ; flags unchanged
LDA #$10 ; Negative flag is cleared
PLB ; Negative flag is set again - DB now contains 80h
Z (Zero) LDA #$00 ; Zero flag is set
PHA ; Zero flag unchanged
INC A ; Zero flag cleared (A = 01h)
PLA ; Zero flag set again (A = 00h)
The
PLP instruction can alter any and all of the flags in the P register -
since you're simply pulling a byte off the stack and sticking it in P:
M flag = 1
LDA #$20
PHA
PLP ; bit 5 of P is now set, all other flags are cleared.
PLP cannot alter the emulation bit, however, as that bit is only changeable with the XCE instruction.
BCC - Branch if Carry Clear
The branch instructions are (almost all) conditional operators - they alter
which course your code takes depending on conditions specified by the P
register. These instructions are immensely important to assembly on
any cpu - think how limited code would be that couldn't say "If this,
run code x, if that, run code y". Understanding these instructions is vital to any 65816 assembly work you wish to do.
DB = 00h
$000800 = 40h
M flag = 1
LDA $0800 ; A = 40h
ASL A ; Carry flag becomes clear (high bit moved into carry)
BCC SomeCode ; if the Carry flag is clear, branch to the code label SomeCode
LDA #$05 ; if the Carry flag was set, this code would be executed as the above branch would fail
SomeCode ; this is a code label - the assembler uses these to figure out where branches go
STA $0800 ; Store A at $000800. The value stored depends on whether the BCC was successful.
In
this example, the ASL A affected the carry flag. If that flag became
clear, the BCC SomeCode would make the CPU jump to SomeCode. If the
flag became set, the CPU would look at the BCC instruction and
completely ignore it, going straight on to the next instruction. Here
we see a basic 'if' statement - if the highest bit of $000800 was set,
store #$05 at $000800. If the highest bit was clear, store the shifted
value there instead.
No flags are affected by the branch instructions, whether they succeed or fail.
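If you think better in a high-level language, the BCC routine above is roughly the following Python. bcc_example is a made-up name, and M = 1 is assumed as in the example.

```python
def bcc_example(value):
    """Model of the routine above with M = 1: returns the byte stored back at $0800."""
    a = value
    carry = (a >> 7) & 1        # ASL A moves the old high bit into Carry
    a = (a << 1) & 0xFF
    if carry:                   # BCC not taken: fall through to LDA #$05
        a = 0x05
    return a                    # STA $0800

print("%02X" % bcc_example(0x40))   # 80
print("%02X" % bcc_example(0x80))   # 05
```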
BCS - Branch if Carry Set
BCS is extremely similar to BCC, only the branching condition is
reversed. This is helpful for similar conditions to BCC, such as
bitplanes, as well as extended addition. By extended addition, I mean
calculating sums that would otherwise be too large for the A register to
express:
$000080 = 00h
$000081 = 80h
D = 0000h
M flag = 0
LDA $80 ; A = 8000h
CLC
ADC #$8000 ; A = 0000h, Carry flag becomes set
STA $80
BCS Carry ; if the carry flag became set, jump to the code label Carry
RTS ; if the carry flag was clear, this code would have been executed
Carry
INC $82 ; if a carry occurred, increment the next 2 bytes up (the high word, at $82)
RTS ; RTS causes a subroutine to return to the code that called it
This code isn't the most efficient that could have been written, but it serves the purpose. Here, a number has 8000h added to it, then if the result is greater than FFFFh (carry flag set, in other words), the 2 bytes at $82 are incremented. Remember the reverse byte ordering - $80 is the lowest byte, $81 the high byte. This can be extended to say $82 is an even higher byte, and $83 is higher still. This code actually allows 32 bit addition - though some extra code would be needed for it to be actually useful.
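Here is the same carry-into-the-next-word idea as a Python sketch, in case the byte juggling is easier to follow that way. add_to_32bit is a hypothetical helper, and the bytes at $82 and $83 are assumed to start at 00h (the example above doesn't say).

```python
def add_to_32bit(memory, address, amount):
    """Add a 16-bit amount to a 32-bit little-endian number made of two 16-bit words."""
    low = memory[address] | (memory[address + 1] << 8)
    total = low + amount                    # CLC / ADC
    low = total & 0xFFFF
    memory[address] = low & 0xFF            # STA
    memory[address + 1] = low >> 8
    if total > 0xFFFF:                      # BCS Carry
        high = memory[address + 2] | (memory[address + 3] << 8)
        high = (high + 1) & 0xFFFF          # INC on the next word up
        memory[address + 2] = high & 0xFF
        memory[address + 3] = high >> 8

ram = {0x80: 0x00, 0x81: 0x80, 0x82: 0x00, 0x83: 0x00}
add_to_32bit(ram, 0x80, 0x8000)
print(" ".join("%02X" % ram[a] for a in range(0x80, 0x84)))   # 00 00 01 00
```

Read low byte first, those four bytes are 00010000h, which is 8000h + 8000h.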
BEQ - Branch if Equal
BEQ, along with its sister instruction BNE, branches depending on the status of
the Zero flag. If the zero flag becomes set by some action, BEQ will
succeed (jump to new code, in other words). If the zero flag is clear,
the branch will fail and the cpu will continue processing like the
opcode wasn't there.
This instruction is
useful for seeing if a variable is zero or not (duh). It's useful for
things like joypad testing, where values of zero mean nothing is
happening:
$004218 = 00h ; these memory locations are SNES registers - not like normal memory
$004219 = 00h
M flag = 0
LDA $004218 ; $004218 returns the status of player 1's joypad
BEQ NoAction ; if no buttons have been pressed, don't do any joypad processing
; insert joypad response code here ;
NoAction ; code label
RTS ; return to calling code
If
any of the bits in A had become set, the BEQ would have failed and
joypad processing would have occurred. As it happens, no buttons had
been pressed so the joypad processing was skipped altogether.
BNE - Branch if Not Equal
The companion to BEQ, this opcode will branch if the zero flag is cleared.
This is useful for the same reasons as BEQ - it's pretty much a
personal decision which to use (though sometimes one makes more sense
than the other does). BNE is especially useful for loops, where a value
is continuously being counted down.
DB = 7Eh
M flag = 1
X flag = 1
LDA #$00 ; set up the A, X and Y registers
LDX #$08
LDY #$00
Repeat ; another label - all these do to your code is make it more readable
STA $8000,y ; store 00h at $7E8000+Y
INY ; add 1 to Y
DEX ; subtract 1 from X
BNE Repeat ; if X hasn't reached 0 yet, loop back to Repeat
The loop at Repeat will cycle through 8 times, storing 8 copies of 00h starting at $7E8000 (that is, $7E8000-$7E8007). It is a good example of what the index registers are designed for - Y is indexing the storing of values, and X is counting down the loop.
BMI - Branch if Minus
As this instruction tests the setting of the N flag, it is useful for both
detecting negative numbers and quickly testing the high bit of a
variable. If you decide to use 7 bit values for your text, with the
sign bit denoting a special action or substring, you could have code
like the following:
M flag = 1
LDA $118400,x ; load a byte from $118400+X - N flag is set if it's negative
BMI Special ; if the highest bit is set, jump to the label Special
; normal text code ;
Special ; code label
; special char code ;
Hopefully
you understand the concept of branching now - if the condition for the
branch is met (in this case, if the N flag is set), the cpu jumps to
wherever the branch points. If the condition fails, the cpu continues
on to the next instruction following the branch.
BPL - Branch if Plus
This instruction is the opposite of BMI - it branches if the N flag is clear (the last action gave a positive result). This is useful for seeing if the high bit is clear, so it lends itself to waiting for the snes to reach its VBlank. The VBlank is the period when you can safely update the on-screen graphics, as the snes has finished drawing a frame and is waiting for the electron gun to get back to the top of the screen.
M flag = 1
TestVBL
LDA $004210 ; the high bit of this register is set if the VBlank period has been reached
BPL TestVBL ; keep loading $004210 until the high bit is set
In this very common loop, the register $4210 is continuously tested to see if its high bit is set - at which point you can safely update vram/sprites.
BRA - Branch Always
Contrary to the previous conditional branch operations, this opcode forces the
cpu to jump without testing any of the P register's flags - hence the
name. This is useful for cleaning up after other branches, as sometimes
you want your code to continue past another, conditional section:
M flag = 1
LDA $7E9011 ; load a variable
BMI SomeCode ; if the high bit's set, jump to SomeCode
; if the high bit was clear, this code is executed:
; insert unimportant code ;
BRA CleanUp ; now we jump to CleanUp
SomeCode
; more unimportant code ;
CleanUp ; whether or not the BMI was successful, this code is run
; code that was required either way ;
RTS
If
the BRA statement wasn't in there, as soon as the code following the
BMI was run, the cpu would have continued on to run whatever is at
SomeCode, which is often not desirable. The BRA statement lets us
bypass SomeCode and go straight to CleanUp.
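BVC - Branch if Overflow Clear
I've never actually had to use this instruction, or its mirror image BVS, so it's not too easy to think up an example. Basically it just branches if the V flag is clear, a state that can result from a myriad of actions (ADC, SBC, BIT, CLV, PLP and so on).
BVS - Branch if Overflow Set
BVS branches if the V flag is set. See BVC.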
BRL - Branch Always Long
This instruction is exactly the same as BRA, only you can branch further. If you remember the addressing modes (as you surely do :) all the other branch instructions have a 1 byte signed operand, letting them jump a maximum of 128 bytes backwards or 127 bytes forwards in your code. The BRL (and PER) instruction takes a 2 byte operand, letting you jump 32768 bytes backwards or 32767 bytes forwards. Although that makes it almost identical in functionality to the JMP instruction, remember the operand is relative to the current location, so there's nothing stopping you copying your code in a hex editor, pasting it somewhere else and still having it run properly - something a JMP instruction would merrily crash.
Whether
you use BRA or BRL in your code pretty much depends on what kind of
errors you get - if you assemble your code and get "error - branch out
of range" all over the place, you'll need to either optimize your code
or stick in a few BRL's here and there.
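The "branch out of range" check is simple arithmetic: the operand is the distance from the instruction after the branch to the target. A Python sketch of what an assembler works out (branch_operand is a made-up name, and bank wrapping is ignored):

```python
def branch_operand(branch_address, target_address, long_branch=False):
    """Return the signed displacement a branch at branch_address needs to reach target."""
    size = 3 if long_branch else 2                  # opcode + operand bytes
    displacement = target_address - (branch_address + size)
    low, high = (-32768, 32767) if long_branch else (-128, 127)
    if not low <= displacement <= high:
        raise ValueError("branch out of range")
    return displacement

print(branch_operand(0x8000, 0x8010))   # 14
print(branch_operand(0x8010, 0x8000))   # -18
```

Since only the distance is stored, the same bytes work wherever the code ends up, which is the relocation point made above.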
MVN - Block Move Negative
As described in the previous document, these block move instructions (MVN and MVP) use the A, X and Y registers to move data from somewhere to somewhere else. They have a few uses, but DMA is generally used instead.
M flag = 0
X flag = 0
LDA #$007F ; transfer 80h bytes (A+1)
LDX #$8000 ; from $7E8000
LDY #$8001 ; to $7E8001
MVN $7E, $7E
A is assigned the number of bytes to transfer minus 1, X the starting word address and Y the destination word address. The operand then specifies the source bank and destination bank. After the move, A will equal FFFFh, X will be whatever it started at + A + 1 (using A's initial value), and Y will also be its initial value + A + 1.
The
setting of the M flag is completely ignored by the MVN/MVP instructions.
If the X flag is set to 1, however, these instructions assume the high
bytes of X and Y are 00h.
MVN copies bytes forwards in memory, starting at X -> Y, then X+1 -> Y+1, then X+2 -> Y+2 etc.
No flags are affected by MVN/MVP.
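Because MVN copies one byte at a time going forwards, the overlapping move in the example above ends up spreading the first byte across the whole block. A rough Python model shows this; mvn and the dictionary memory are assumptions of this sketch, and the starting byte AAh is invented.

```python
def mvn(memory, a, x, y, src_bank, dst_bank):
    """Model of MVN with 16-bit A, X and Y. Returns the final (A, X, Y)."""
    while True:
        memory[(dst_bank << 16) | y] = memory.get((src_bank << 16) | x, 0)
        x = (x + 1) & 0xFFFF
        y = (y + 1) & 0xFFFF
        a = (a - 1) & 0xFFFF
        if a == 0xFFFF:          # A has passed through 0: the move is finished
            return a, x, y

ram = {0x7E8000: 0xAA}
a, x, y = mvn(ram, 0x007F, 0x8000, 0x8001, 0x7E, 0x7E)
print("%04X %04X %04X" % (a, x, y))                          # FFFF 8080 8081
print(all(ram[0x7E8000 + i] == 0xAA for i in range(0x81)))    # True
```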
Addressing Mode | Syntax       | Bytes
axy             | MVN $7E,$7F  | 3
MVP - Block Move Positive
Similar to MVN, but X and Y are decremented instead of incremented. The block move starts with X -> Y, then X-1 -> Y-1, then X-2 -> Y-2 etc. until A passes through 0. Only really useful for zeroing ram in front of the stack, and other trivial matters.
Addressing Mode | Syntax       | Bytes
axy             | MVP $90,$7E  | 3
STZ - Store Zero
This handy instruction stores a zero in the memory location you specify. If
the M flag is set to 0, 2 bytes are zeroed, compared to 1 byte zeroed
if M = 1.
M flag = 1
X flag = 0
DB = 7Eh
LDX #$0000
LDY #$0800
Repeat
STZ $1000,x ; store 00h at $7E1000+X (STZ has no ab,y mode, so X has to be the index)
INX
DEY ; repeat this loop 800h times
BNE Repeat
This
code will store 00h in the first 800h bytes at $7E1000. Not the
fastest way to zero memory, but effective and readable nonetheless.
None of the flags in the P register are affected by STZ.
Addressing Mode | Syntax       | Bytes
ab              | STZ $8000    | 3
ab,x            | STZ $1000,X  | 3
d               | STZ $80      | 2
d,x             | STZ $50,X    | 2
XBA - Exchange B and A
This instruction doesn't do anything to the DB register, it's just an
operand-free way to swap the high and low bytes of A. The need to swap
the low and high bytes of A (known as A and B for this instruction
alone) pops up every now and then, so it's worth knowing about. When
the M flag is set to 1, it's useful to store a temporary byte variable
with XBA (the high byte of A will be otherwise untouchable, much like
pushing it onto the stack). Will wonders never cease.
Another
name for the A register stems from this instruction - 'C' denotes A as
being 2 bytes (A = 1 byte, B = 1 byte, C = 2 bytes). A being called C
rears its ugly head in the register transfer instructions (TCD instead of TAD).
Flags:
N (Negative) LDA #$8000 ; let's assume the M flag is 0 for now
XBA ; A is now 0080h - the new low byte has its high bit set, so the N flag is set
Z (Zero) LDA #$00FF ; even if the M flag is set to 0,
XBA ; the zero flag is set if A's low byte becomes zero (A is now FF00h)
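The catch with XBA's flags is that they only ever look at the new low byte, whatever the M flag says. A small Python sketch of that behaviour (xba here is just a modelling function, not anything from an assembler):

```python
def xba(c):
    """Swap the two bytes of the 16-bit accumulator. Returns (new value, N, Z)."""
    swapped = ((c << 8) | (c >> 8)) & 0xFFFF
    low = swapped & 0xFF            # flags always reflect the new low byte only
    return swapped, low >> 7, int(low == 0)

print("%04X N=%d Z=%d" % xba(0x8000))   # 0080 N=1 Z=0
print("%04X N=%d Z=%d" % xba(0x00FF))   # FF00 N=0 Z=1
```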
CMP - Compare Accumulator
The Compare instructions, along with the branching ones, are the most
fundamental ways to perform if..else analysis. In all its simple glory,
you load a value into a register, 'compare' it with another value, then
branch somewhere depending on the result.
The compare instructions actually simulate the SBC command in every way shape and form, EXCEPT that the register in question is never
altered. Comparing does NOT alter the overflow flag, but the N, Z and C
flags are all altered by the same conditions as SBC. That is, SBC #$50
and CMP #$50 would set the same flags as each other (excluding V), but
the CMP wouldn't alter the A register as SBC would. And, of course, as
CMP focuses on the A register, the number of bytes you fetch from the
operand, and the number of bytes you compare against in A, are dependent
on the M flag.
The number of uses for
CMP means there's no 'ultimate' example that will display every known
use for the instruction, but here is a routine use for it:
D = 0000h
M flag = 1
$000080 = 03h
LDA $80 ; A = 03h
CMP #$01 ; 03h - 01h = 02h -> Zero flag is cleared (result not zero)
BEQ Code01 ; BEQ fails because Z = 0
CMP #$02 ; 03h - 02h = 01h -> Z = 0
BEQ Code02 ; branch fails
CMP #$03 ; 03h - 03h = 00h -> Z = 1
BEQ Code03 ; BEQ succeeds - cpu jumps to Code03 (wherever that is)
BRA Normal ; if none of the previous BEQ's worked, BRA to Normal
Throughout that series of CMP instructions, the value of A remained constant. There was actually nothing stopping you replacing each CMP #$xx instruction with LDA $80, SEC, SBC #$xx, as the correct code would have eventually been found.
Flags:
N (Negative) LDA #$1000
CMP #$2000 ; 1000h - 2000h = F000h -> most significant bit is set so N flag is set
BMI SomeCode
Z (Zero) LDA #$1000
CMP #$2000 ; 1000h - 2000h = F000h -> result not zero, so Z flag cleared
BNE SomeCode
C (Carry) LDA #$1000
CMP #$2000 ; 1000h - 2000h = F000h -> a borrow was required (A < operand), so C = 0
BCC SomeCode
The
Carry flag is an interesting one for the compare instructions - the
previous setting of C is obliterated after a compare is executed. That
is, it wouldn't matter if you put a CLC or SEC before a compare
instruction, the result would be the same.
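All three compare instructions boil down to "subtract, keep the flags, throw away the answer". Here is a minimal Python model of that; compare is an invented name, and bits stands in for the M flag (or the X flag for CPX/CPY).

```python
def compare(register, operand, bits=16):
    """Model of CMP/CPX/CPY: a subtraction that only sets flags. Returns (N, Z, C)."""
    mask = (1 << bits) - 1
    result = (register - operand) & mask
    n = result >> (bits - 1)
    z = int(result == 0)
    c = int((register & mask) >= (operand & mask))   # C = 1 when no borrow is needed
    return n, z, c

print(compare(0x1000, 0x2000))        # (1, 0, 0)
print(compare(0x1000, 0x1000))        # (0, 1, 1)
print(compare(0x03, 0x01, bits=8))    # (0, 0, 1)
```

Note there is no carry input at all, which is why a CLC or SEC beforehand makes no difference.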
Addressing Mode | Syntax         | Bytes
#               | CMP #$87       | 2*
ab              | CMP $8000      | 3
ab,x            | CMP $8000,X    | 3
ab,y            | CMP $8000,Y    | 3
abl             | CMP $7E3000    | 4
abl,x           | CMP $7E3000,X  | 4
d               | CMP $03        | 2
d,x             | CMP $03,X      | 2
(d)             | CMP ($06)      | 2
(d),y           | CMP ($06),Y    | 2
[d]             | CMP [$09]      | 2
[d],y           | CMP [$09],Y    | 2
(d,x)           | CMP ($0C,X)    | 2
d,s             | CMP $01,S      | 2
(d,s),y         | CMP ($01,S),Y  | 2
* Operand is 1 byte when M flag = 1, 2 bytes if M is 0
CPX functions identically to CMP, except that it compares the operand against X instead of A. The number of addressing modes has been drastically reduced, though CPX is rarely used for anything but Immediate addressing.
The number of bytes of X that are compared (and taken into account by the subtraction) is governed by the X flag.
DB = 00h
M flag = 1
X flag = 0
LDX #$0000
Repeat
STZ $0000,x
INX
CPX #$1F00 ; once X reaches 1F00h, set the zero flag
BNE Repeat ; keeps looping until X reaches 1F00h (zeroes 1F00h bytes)
Flags:
N (Negative) LDX #$1000
CPX #$2000 ; 1000h - 2000h = F000h -> most significant bit is set so N flag is set
BMI SomeCode
Z (Zero) LDX #$1000
CPX #$1000 ; 1000h - 1000h = 0000h -> result zero, so Z flag set
BEQ SomeCode
C (Carry) LDX #$6000
CPX #$5000 ; 6000h - 5000h = 1000h -> carry was NOT required (X >= operand), so C = 1
BCS SomeCode
Addressing Mode   Syntax        Bytes
#                 CPX #$89      2*
ab                CPX $1200     3
d                 CPX $12       2
* Operand is 1 byte when X flag = 1, 2 bytes if X is 0
CPY is the same as CPX in every way - even addressing modes - except for the fact that it focuses on the Y register. The number of bytes of Y that are compared (and taken into account by the subtraction) is governed by the X flag, which sets the width of both index registers.
For examples/flags/addressing modes, see CPX.
NOP is a single-byte instruction that simply eats up clock cycles - 2 to be exact. It doesn't alter any flags or any registers at all - it just takes 2 clock cycles to run. NOP is most useful for time-sensitive hardware-related issues, such as multiplication. In the world of snes hardware multiplication/division (there is none built into the 65816, so Nintendo added several registers capable of these functions), you have to wait 15 or so clock cycles after you store the values to be computed, so a common way to waste that time is with NOP.
In
terms of hacking, though, NOP is a useful way to clear out unwanted
checksum calculation, copy-protection routines or other unwanted code.
These
3 instructions, BRK (Break), COP (Coprocessor) and STP (Stop) are
completely and utterly useless in the SNES universe - they are simply
remnants carried over from the fact that the 65816 was actually used in
real computers, computers that needed these extra interrupts.
If you really want to learn about these instructions, consult the all-knowing, all-seeing EPR.
CLC is a handy little instruction that clears the Carry flag of the P register. Useful for setting up addition, and not a heck of a lot else.
CLD clears the Decimal flag, thus leaving the snes in the good wholesome state of ordinary binary (hexadecimal) arithmetic.
By clearing the Interrupt Disable flag in the P register with CLI, you allow IRQs to take control of the CPU when they are triggered. More specifically, the cpu will jump to the IRQ vector if you enabled the Horizontal or Vertical Interrupts. The NMI that fires at the start of Vertical Blank (just after the last visible scanline, around line 224 in NTSC mode) is a different matter: it is non-maskable as far as the I flag is concerned, and is switched on and off through hardware register $4200 instead.
The actual usage of interrupts is a bit complex to explain here, and will be covered later.
CLV clears the Overflow flag, which is only ever much use if you're attempting signed addition/subtraction (remember you trigger signed overflows when adds/subs overflow into the high bit of A).
Setting the Carry flag with SEC is always advised before an SBC instruction, and apart from that it's also useful for the ROR/ROL instructions to move a 1 into a variable's top or bottom bit.
Setting the Decimal flag to 1 with SED invokes the 65816's decimal mode, where ADC and SBC treat their operands as binary-coded decimal (BCD): each nibble holds one decimal digit, so 09h + 01h gives 10h instead of 0Ah. Loads, stores and every other instruction work exactly as before - nothing is converted for you.
There are some remote uses for decimal mode, such as keeping a score or counter in BCD so that each nibble can be printed on the screen as a decimal digit without any division, but not much else.
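For what it's worth, decimal mode only changes how ADC and SBC do their arithmetic: each nibble is treated as one decimal digit (BCD). Below is a minimal Python sketch of an 8-bit ADC with the Decimal flag set. The name adc_decimal8 is invented for this example, and it assumes both inputs are already valid BCD (no nibble above 9).

```python
def adc_decimal8(a, operand, carry=0):
    """8-bit ADC with the Decimal flag set: both inputs are BCD bytes."""
    lo = (a & 0x0F) + (operand & 0x0F) + carry
    hi = (a >> 4) + (operand >> 4)
    if lo > 9:            # low digit went past 9: carry into the high digit
        lo -= 10
        hi += 1
    carry_out = 0
    if hi > 9:            # high digit went past 9: carry out of the byte
        hi -= 10
        carry_out = 1
    return (hi << 4) | lo, carry_out

result, carry = adc_decimal8(0x25, 0x38)
print(f"{result:02X}h carry={carry}")
```

Reading the result byte as hex digits gives the decimal answer directly, which is why BCD is handy for scores and counters that have to be drawn on screen.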
By setting the Interrupt Disable flag with SEI you turn off any IRQs the snes tries to conjure up (the NMI is not masked by this flag, and has to be switched off through register $4200). That means you keep control over the cpu for as long as you want, without having to fear an IRQ appearing and destroying the stack. This instruction is almost always executed at the very beginning of most snes games, as initialization routines don't really need to know (or care) when an interrupt wants attention.
The
AND instruction should be immediately familiar to any C/C++ programmer -
it simply performs a bitwise AND with A and the operand, storing the
result in A ( A &= operand ). If you're not a C/C++ programmer (god
help you), the AND instruction looks at the bits in A and the bits in
the operand, then stores in A only the bits that were set in both:
A   Operand   Result
0   0         0
1   0         0
0   1         0
1   1         1

A:        11011110 11011110
Operand:  00011100 11000111
Result:   00011100 11000110

A:        10001100 10001100
Operand:  00110011 00110011
Result:   00000000 00000000
AND is a useful way to isolate certain parts of a value - AND #$0F will leave the low nibble of a variable in A for example.
And, of course, the number of bytes you AND with depends on the M flag.
$000080 = 45h
M flag = 1
D = 0000h
LDA $80 ; A = 45h
AND #$40 ; A = 40h (01000101b & 01000000b)
BNE Bit6Set ; if a bit remains set, branch (zero flag clear)
Here,
we AND a variable in A with 40h, which will leave either bit 6 set or
all bits clear. Testing individual bits is a typical use for AND.
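The same masking can be tried out in a few lines of Python. This is just an illustrative model (the name and_a and the bits parameter are not from any assembler); it returns the new A along with the N and Z flags.

```python
def and_a(a, operand, bits=8):
    """Return (new A, N, Z) after AND at the given register width."""
    mask = (1 << bits) - 1
    result = a & operand & mask
    n = (result >> (bits - 1)) & 1     # N = top bit of the result
    z = 1 if result == 0 else 0        # Z = no bits survived the mask
    return result, n, z

a, n, z = and_a(0x45, 0x40)            # the bit 6 test from the example above
print(f"A={a:02X}h N={n} Z={z}")
```

With Z clear after the AND, a BNE would be taken, which is exactly the bit 6 test from the example.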
Flags:
Z (Zero) LDA #$FF ; all bits set
AND #$00 ; all bits clear -> Z flag set (11111111b & 00000000b)
N (Negative) LDA #$FF ; all bits set -> N flag set
AND #$7F ; high bit cleared -> N flag cleared
Addressing Mode   Syntax            Bytes
#                 AND #$0F          2*
ab                AND $9000         3
ab,x              AND $9000,X       3
ab,y              AND $9000,Y       3
abl               AND $819000       4
abl,x             AND $819000,X     4
d                 AND $03           2
d,x               AND $03,X         2
(d)               AND ($06)         2
(d),y             AND ($06),Y       2
[d]               AND [$F0]         2
[d],y             AND [$F0],Y       2
(d,x)             AND ($70,X)       2
d,s               AND $03,S         2
(d,s),y           AND ($03,S),Y     2
* Operand is 1 byte when M flag = 1, 2 bytes if M is 0
As with AND, the ORA instruction works the same as its C/C++ equivalent ( A |= operand ). For those not up to date on boolean logic, when you OR two numbers, any bits that were set in either number are set in the result. Here are the same example numbers as AND:
A   Operand   Result
0   0         0
1   0         1
0   1         1
1   1         1

A:        11011110 11011110
Operand:  00011100 11000111
Result:   11011110 11011111

A:        10001100 10001100
Operand:  00110011 00110011
Result:   10111111 10111111
ORA
can be used to combine two variables (read joypad 1 and 2 at the same
time, for example), as well as more advanced functions such as
overlapping font tiles (variable width fonts, in other words). Also,
the number of bytes computed depends on the M flag.
M flag = 0
LDA $004218 ; load player 1's joypad information (2 bytes)
ORA $00421A ; combine with player 2's joypad information (2 bytes)
That
piece of code will let the game read joypad information whether the
player is using joypad 1, 2 or both at once. Final Fantasy II is an
example of this. ORA is also useful for setting individual bits in A,
as ORA #$80 will set the negative bit of A, for instance.
Flags:
Z (Zero) LDA #$00 ; zero flag set
ORA #$FF ; zero flag cleared - A = FFh
N (Negative) LDA #$01
ORA #$80 ; sets high bit in A -> N flag is set
Addressing Mode   Syntax            Bytes
#                 ORA #$80          2*
ab                ORA $1000         3
ab,x              ORA $1000,X       3
ab,y              ORA $1000,Y       3
abl               ORA $7E9000       4
abl,x             ORA $7E9000,X     4
d                 ORA $43           2
d,x               ORA $43,X         2
(d)               ORA ($46)         2
(d),y             ORA ($46),Y       2
[d]               ORA [$90]         2
[d],y             ORA [$90],Y       2
(d,x)             ORA ($00,X)       2
d,s               ORA $01,S         2
(d,s),y           ORA ($01,S),Y     2
* Operand is 1 byte when M flag = 1, 2 bytes if M is 0
In keeping with the bitwise operators, EOR performs an exclusive OR between A and the operand ( A ^= operand ). When you exclusively OR on the 65816, if a bit of the operand is set, the corresponding bit in A is flipped - 0 becomes 1, 1 becomes 0. If a bit in the operand is clear, the corresponding bit in A is left untouched.
A   Operand   Result
0   0         0
1   0         1
0   1         1
1   1         0

A:        11011110 11011110
Operand:  00011100 11000111
Result:   11000010 00011001

A:        10001100 10001100
Operand:  00110011 00110011
Result:   10111111 10111111
This
instruction is mostly used to get the twos-complement of a variable for
the addition of negative values. The number of bytes EOR affects
depends on the M flag.
M flag = 0
LDA $80 ; contains number of bytes to go backwards
EOR #$FFFF ; this and the INC perform 2's complement
INC
CLC
ADC $82 ; subtract offset from $82
That somewhat cryptic code will make sense to people familiar with binary math, but not many others. EOR does have its other uses, though since this is an assembly document, not a dissection of algorithms, they won't be discussed here.
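For anyone not familiar with the binary math, here is the EOR/INC/CLC/ADC sequence modelled in Python. The helper names negate16 and add16 are made up for this sketch; the point is only that flipping every bit and adding 1 turns "add" into "subtract" once everything wraps at 16 bits.

```python
def negate16(value):
    """EOR #$FFFF then INC: the two's complement of a 16-bit value."""
    flipped = value ^ 0xFFFF           # EOR #$FFFF flips every bit
    return (flipped + 1) & 0xFFFF      # INC, wrapping at 16 bits

def add16(a, b):
    """CLC then ADC: 16-bit addition with the carry out discarded."""
    return (a + b) & 0xFFFF

offset = negate16(0x0005)              # 5 bytes backwards
print(f"{offset:04X}h", f"{add16(offset, 0x0100):04X}h")
```

Adding the negated offset to 0100h lands on 00FBh, the same as 0100h - 5.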
Flags:
Z (Zero) LDA #$FF ; zero flag cleared
EOR #$FF ; all bits in A are flipped -> A = 00h and Z flag is set
N (Negative) LDA #$01 ; N flag is cleared
EOR #$80 ; flips high bit in A -> N flag is set in this case
Addressing Mode   Syntax            Bytes
#                 EOR #$FF          2*
ab                EOR $1E00         3
ab,x              EOR $1E00,X       3
ab,y              EOR $1E00,Y       3
abl               EOR $C01000       4
abl,x             EOR $C01000,X     4
d                 EOR $04           2
d,x               EOR $04,X         2
(d)               EOR ($07)         2
(d),y             EOR ($07),Y       2
[d]               EOR [$09]         2
[d],y             EOR [$09],Y       2
(d,x)             EOR ($0A,X)       2
d,s               EOR $01,S         2
(d,s),y           EOR ($01,S),Y     2
* Operand is 1 byte when M flag = 1, 2 bytes if M is 0
BIT
is a useful instruction for testing variables without actually altering
any registers - useful for times when you would normally use AND to
test bits, but you don't want to corrupt whatever's in A. BIT operates
in two 'modes' if you will - one where the operand is immediate ( #$????
) and one where it's any other addressing mode.
When using immediate addressing, the operand is ANDed with A and the Z flag altered depending on the result of the AND. The A register isn't actually altered, however. This way, you could say ' BIT #$20 ' to test bit 5 of A, without trashing all the other bits.
When using any other addressing, the operand is ANDed with the A register (again without actually altering A) and the Z flag is set/cleared from that result, but the N and V flags are copied straight from the highest and second-highest bits of the memory operand itself, not from the result of the AND.
Confusing at first,
but this instruction is worth understanding. As is expected, the number
of bytes affected depends on the M flag.
M flag = 1
LDA $90 ; load a variable
BIT #$10 ; is bit 4 set?
BNE Bit4Set ; A is still intact at this point
In
this bit of code, the byte at $90 is AND'ed with #$10 - but the result
is -NOT- stored anywhere. If the result of the AND was non-zero, bit 4
of $90 must have been set so the branch succeeds.
Flags ($80 = 7Fh in these examples):
Z (Zero) LDA $80 ; A = 7Fh, zero flag cleared
BIT #$0F ; if any of the low four bits of A are set, Z = 0
N (Negative) LDA #$80 ; N flag is set by the load
BIT $80 ; bit 7 of the memory operand (7Fh) is 0 -> N cleared; 80h & 7Fh = 00h -> Z set
V (Overflow) LDA #$C0 ; bits 6 & 7 of A are set
BIT $80 ; bit 6 of the memory operand (7Fh) is 1 -> V set; C0h & 7Fh = 40h -> Z cleared
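Since BIT's two 'modes' are easy to mix up, here is a small Python model of the flag logic. It's a sketch only: bit_test and its parameters are invented names, and it assumes the documented 65816 behaviour that Z comes from A AND operand, while N and V (non-immediate modes only) are copied from the top two bits of the memory operand.

```python
def bit_test(a, memory, immediate=False, bits=8):
    """Return the flags BIT changes as a dict; A itself is never modified."""
    mask = (1 << bits) - 1
    flags = {"Z": 1 if (a & memory & mask) == 0 else 0}
    if not immediate:
        # N and V come straight from the memory operand, not from the AND
        flags["N"] = (memory >> (bits - 1)) & 1
        flags["V"] = (memory >> (bits - 2)) & 1
    return flags

print(bit_test(0xC0, 0x7F))                   # memory operand holds 7Fh
print(bit_test(0x7F, 0x0F, immediate=True))   # BIT #$0F only touches Z
```

The immediate call returns only Z, which mirrors the fact that BIT #$xx leaves N and V alone.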
Addressing Mode   Syntax          Bytes
#                 BIT #$C0        2*
ab                BIT $1000       3
ab,x              BIT $1000,X     3
d                 BIT $04         2
d,x               BIT $04,X       2
* Operand is 1 byte when M flag = 1, 2 bytes if M is 0
TSB is basically a shortcut for OR'ing two numbers and storing the result. Firstly, it logically OR's the operand with the A register, then stores the result at the operand. That just saves you having to execute ORA followed by STA. The number of bytes affected depends on the M flag.
One quirk of TSB is in its setting of the Z flag. Instead of setting the flag if A OR'ed with the operand is zero, it sets it if A AND'ed with the operand is zero. In this respect, it sets the Z flag under the same conditions as BIT would.
M flag = 0
LDA #$8000 ; A = 8000h (negative)
TSB $80 ; $80 = A | $80

LDA #$8000 ; this sets $80 the same as the code above
ORA $80
STA $80
In that piece of code, the value at $80 (whatever it is) has its highest bit set. In the second bit of code using ORA, the A register would have been changed from 8000h to whatever A | $80 was. The TSB command does not alter A under any circumstances.
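A quick Python model makes the Z flag quirk easier to see. The function name tsb is just for illustration; it returns the new memory value and Z, and deliberately never returns a new A because A doesn't change.

```python
def tsb(a, memory, bits=16):
    """Return (new memory value, Z) after TSB. A is left untouched."""
    mask = (1 << bits) - 1
    z = 1 if (a & memory & mask) == 0 else 0   # Z comes from A AND memory
    return (memory | a) & mask, z              # memory becomes A OR memory

print(tsb(0x8000, 0x0001))   # high bit was clear before: Z = 1
print(tsb(0x8000, 0x8000))   # high bit was already set: Z = 0
```

In other words, Z tells you whether any of the bits you were about to set were already set beforehand.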
Flags
Z (Zero) LDA #$80 ; zero flag cleared
TSB $55 ; Z flag is set if A AND'ed with operand is zero
Addressing Mode   Syntax        Bytes
ab                TSB $C000     3
d                 TSB $04       2
TRB is similar to TSB in that it replaces another common action - zeroing certain bits in a memory location. Quite simply, TRB zeroes any bits at a memory location that are set in A. So, if bit 7 of A is set, bit 7 of the memory location will be cleared. The actual logic behind this instruction is to get the complement of A (flip every bit), AND it with the memory location, then store the result at that memory location. It is identical to performing EOR #$FF, AND memory, STA memory, though A is not altered. The M flag affects the number of bytes computed.
M flag = 1
LDA #$80 ; high bit set
TRB $80 ; high bit of $80 will now be cleared
To
figure this out manually, first flip all the bits in A, giving you 7Fh
(80h = 10000000b, 01111111b = 7Fh). Then AND this with the memory
location, and it's obvious the high bit will be zeroed and the rest
untouched, regardless of what is already at that memory location.
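The manual working above can be double-checked with a small Python model. Again, trb is a hypothetical name, and the sketch assumes the Z flag follows the same A AND memory test as TSB and BIT, taken on the memory value before it is changed.

```python
def trb(a, memory, bits=8):
    """Return (new memory value, Z) after TRB. A is left untouched."""
    mask = (1 << bits) - 1
    z = 1 if (a & memory & mask) == 0 else 0   # tested before memory changes
    return memory & ~a & mask, z               # clear every bit that is set in A

print(trb(0x80, 0xFF))   # high bit cleared, the rest untouched
```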
Flags
Z (Zero) LDA #$FF ; zero flag cleared - this will reset all bits at a memory location
TRB $55 ; Z flag is set if A AND'ed with the original memory value is zero (same test as TSB and BIT)
Addressing Mode   Syntax        Bytes
ab                TRB $C000     3
d                 TRB $04       2
INC is an extremely common, and extremely simple, instruction. It simply adds 1 to the operand, be it the A register or a memory location. The number of bytes the increment affects is 1 or 2, depending on the M flag, but with the flags set by INC you could easily expand it to a 4 byte counter if need be. Also, it has become a common feature in assemblers that INC by itself with no operand corresponds to INC A.
M flag = 0
LDA $90 ; A = ?
CLC
ADC #$8000 ; add #$8000 to $90
STA $90
BCC NoCarry ; if the result's less than 10000h, skip the INC
INC $92 ; if a carry occurred, inc the 3rd and 4th bytes of $90 (32 bit counter)
NoCarry
RTS
Here's your basic 32-bit counter code - if the answer's too big for $90, increment $92 ($90 = low, $91 = high, $92 = higher, $93 = highest).
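The carry handling in the counter can be modelled in Python like so. This is only a sketch with made-up names (add_to_counter32, low_word, high_word); the two words stand in for the 16-bit values at $90 and $92.

```python
def add_to_counter32(low_word, high_word, amount):
    """Add a 16-bit amount to a counter held as two 16-bit words."""
    total = low_word + amount                  # CLC / ADC
    new_low = total & 0xFFFF                   # STA $90
    if total > 0xFFFF:                         # carry set, so BCC is not taken
        high_word = (high_word + 1) & 0xFFFF   # INC $92
    return new_low, high_word

low, high = add_to_counter32(0x9000, 0x0000, 0x8000)
print(f"{high:04X}{low:04X}h")
```

9000h + 8000h overflows 16 bits, so the low word wraps to 1000h and the high word picks up the carry.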
Flags:
Z (Zero) LDA #$00 ; zero flag set
INC ; A = 01h, zero flag cleared
N (Negative) LDA #$7F ; N flag is clear
INC ; A = 80h, N flag is set
Addressing Mode   Syntax          Bytes
A                 INC A           1
ab                INC $1100       3
ab,x              INC $1100,X     3
d                 INC $20         2
d,x               INC $20,X       2
INX
is ridiculously simple - it adds 1 to the X register. That's about it,
really. Whether 1 or 2 bytes in X are affected depends on the X flag's
setting.
M flag = 1
X flag = 0
LDX #$0000
LDY #$1000
Repeat
STZ $2000,x ; zero 1000h bytes at $2000
INX
DEY
BNE Repeat
Flags:
Z (Zero) LDX #$FFFF ; zero flag cleared
INX ; X = 0000h, zero flag set
N (Negative) LDX #$7FFF ; N flag is clear
INX ; X = 8000h, N flag is set
INY: see INX - this instruction works exactly the same, but on the Y register.
The opposite to INC, DEC subtracts 1 from either A or a memory address. Much the same as INC, it has too many uses to bother listing. The M flag controls how many bytes are affected by the decrement.
M flag = 0
LDA #$0800 ; a surprising number of squaresoft games do this
WasteTime
DEC
BNE WasteTime ; loop until A reaches zero (the label sits after the LDA so A isn't reloaded)
Flags:
Z (Zero) LDA #$00 ; zero flag set
DEC ; A = FFh, zero flag cleared
N (Negative) LDA #$00 ; N flag is clear
DEC ; A = FFh, N flag is set
Addressing Mode   Syntax          Bytes
A                 DEC A           1
ab                DEC $1000       3
ab,x              DEC $1000,X     3
d                 DEC $00         2
d,x               DEC $00,X       2
DEX (Decrement X) does exactly what you think it should - subtracts 1 from X. The number of bytes in X affected depends on the X flag.
M flag = 0
X flag = 0
DB = 00h
LDX #$1000
ZeroVRAM
STZ $2118 ; write 0000h into the SNES video ram ($2118 is another register)
DEX
BNE ZeroVRAM
Flags:
Z (Zero) LDX #$0001 ; zero flag clear
DEX ; X = 0000h, zero flag set
N (Negative) LDX #$0000 ; N flag is clear
DEX ; X = FFFFh, N flag is set
DEY: see DEX - it works the same way on the Y register.
TAX is the first of the register transfer instructions - operand-free ways to copy one register's contents to another without resorting to push/pull instructions. There are a number of reasons you'd want to use these instructions - they're fast (2 cycles versus 7 for push/pull), can help avoid messy use of the stack, and can supplement instructions with few addressing modes. By supplementing, I mean you could use LDA's absolute long addressing to fetch a value, then TAX it to X, getting around the fact that LDX can't use absolute long addressing.
One interesting quirk about the transfer instructions is how many bytes they copy - what if the M flag is 1 but the X flag is 0? What about the other way around? To deal with this, each of the transfer instructions has a rule governing how many bytes to copy. In the case of TAX:
The number of bytes transferred is the current width of X, as set by the X flag
So, whenever the X flag is 0, the full 2 bytes of A are transferred across. When the X flag = 1, only the low byte of A is transferred to the low byte of X.
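The width rule for TAX fits in a few lines of Python. This is a minimal sketch with an invented function name; it assumes A is passed as its full 16-bit contents (the 'C' register) and relies on the fact that the high byte of X is always 00h while the X flag is 1.

```python
def tax(a, x_flag):
    """Return the new X after TAX. a is the full 16 bits of A; x_flag is 0 or 1."""
    if x_flag == 0:
        return a & 0xFFFF     # 16-bit index: both bytes of A are copied
    return a & 0x00FF         # 8-bit index: only the low byte is copied

print(f"{tax(0x1234, 0):04X}h", f"{tax(0x1234, 1):04X}h")
```

Notice the M flag doesn't appear at all - only the destination's width matters for TAX.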
M flag = 0
X flag = 0
LDA [$03],y ; LDX doesn't have the [d],y addressing mode
TAX ; transfer the 2 bytes loaded to X
Flags:
Z (Zero) LDA #$0000 ; zero flag set
LDX #$0001 ; zero flag clear
TAX ; zero flag set
N (Negative) LDA #$8000 ; N flag set
LDX #$0000 ; N flag clear
TAX ; N flag set
TAY works exactly the same as TAX (follows the same rule and all), but copies A's contents to Y.
The number of bytes transferred is controlled by the setting of the X flag
TCD calls the A register C, and for good reason. As 'C' denotes the high and low bytes of A together, it means that 2 bytes are always transferred, regardless of the M flag's setting.
TCD
is useful for setting up a new direct page somewhere other than 0000h.
This has been used in commercial games to allocate temporary memory and
allow a low-level implementation of threads. For example, most games
set their D register to something different when talking to the SPC, so
1-byte operands can be used where 2 would be required normally.
2 bytes are always transferred by TCD
Flags are affected in the same way as TAX
TCS allows you to change where the stack register points. This is helpful for initializing the snes as the stack defaults to the position 01FFh, which doesn't leave much room for allocating memory and such. As 'C' is used in the instruction mnemonic, 2 bytes are always transferred.
2 bytes are always transferred by TCS
No flags are affected by TCS
Again, since C is in the mnemonic, TDC always transfers 2 bytes. TDC is a useful way to zero A (LDA #$0000), as the D register is normally set to 0000h. There are times it isn't, though, which can cause huge problems if you're expecting 0000h instead of 1E00h.
2 bytes are always transferred by TDC
Flags are affected in the same way as TAX.
As the name implies, TSC transfers the 2 bytes in the stack register to the A register, useful for allocating memory in front of the stack. Flags are affected in the same way as TAX.
2 bytes are always transferred by TSC
M flag = 0
TSC ; A = S
SEC
SBC #$000F ; A holds the address 0Fh bytes in front of the stack
TCD ; $00 would now access the memory 0Fh bytes in front of the stack
Allocating
memory in this fashion can get both extremely messy and complex, but as
it is extremely useful in 65816 coding it will be covered in the next
section.
TSX is almost the same as TSC, but in this case the transfer isn't always 2 bytes in size. If the X flag is set to 1, only the stack's low byte is moved to X's low byte. Otherwise, 2 bytes are transferred. Apart from that, the P register's flags are affected identically to TAX.
The number of bytes transferred is governed by the X flag
The opposite to TAX, TXA transfers the contents of X to A. Flags affected are the same as TAX.
The M flag governs the number of bytes transferred. If the X flag = 1, the high byte of the transfer will be 00h
TXS is used for the same reasons as TCS - to set the stack to wherever you want it. As 2 bytes are always transferred, if X is 1 byte wide the high byte transferred will be 00h. Like TCS, TXS does not affect any flags.
2 bytes are always transferred. If the X flag = 1, the high byte of the transfer will be 00h
TXY
is useful for addressing modes where only Y can be used, such as (d),y
and [d],y. The number of bytes transferred depends on the X flag.
Flags are affected identically to TAX.
The number of bytes transferred is dictated by the X flag
TYA is identical to TXA in every way, except Y is transferred instead of X.
TYX: see TXY, but with the two index registers reversed.
JMP allows you to jump to any bit of code inside the current bank, as dictated by PB. Although sounding similar to BRL, its extra addressing modes set it apart, allowing complex structures like indirect jump tables to be created simply and easily. JMP doesn't record the current PC address in the stack like JSR does, so you can't simply return from whatever code you jump to. This problem can be solved with the PER instruction, however, or simply with JSR's own indexed indirect capability.
Indirect jumps are a favorite
of many games that use IRQ interrupts - you point the interrupt to a
RAM location so you can continually change where the IRQ jumps to.
M flag = 1
X flag = 1
LDA $40 ; load a variable
ASL ; double it
TAX ; use X as the index to..
JMP ($9000,x) ; a jump table
In this bit of code, we load up whatever is in $40, double it (as each member of a jump table requires 2 bytes), transfer it to X, then jump 'through' that location. To expand on that, $9000 contains a number of 2-byte addresses of code in the current PB bank. By adding (variable * 2) to the base location of $9000, we jump to the piece of code corresponding to that variable. Jump tables are used in almost every snes game ever made, often as a quick replacement for huge chains of CMP and BEQ instructions.
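To make the lookup concrete, here's the jump table modelled in Python. Everything in it is hypothetical: the function name, the dictionary standing in for ROM, and the three target addresses were picked only for the example.

```python
def jump_table_target(memory, table_base, index):
    """Model JMP (table_base,X) with X = index * 2."""
    x = (index * 2) & 0xFF                  # ASL then TAX with 8-bit registers
    entry = (table_base + x) & 0xFFFF
    low = memory[entry]                     # addresses are stored low byte first
    high = memory[(entry + 1) & 0xFFFF]
    return (high << 8) | low                # new PC within the current bank

memory = {0x9000: 0x00, 0x9001: 0x80,       # entry 0 -> $8000
          0x9002: 0x34, 0x9003: 0x81,       # entry 1 -> $8134
          0x9004: 0x00, 0x9005: 0xA0}       # entry 2 -> $A000

print(f"{jump_table_target(memory, 0x9000, 1):04X}h")
```

One lookup replaces a whole ladder of CMP/BEQ pairs, and the cost stays the same no matter how many entries the table has.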
No flags are altered by JMP.
Addressing Mode   Syntax            Bytes
ab                JMP $99A0         3
(ab)              JMP ($1000)       3
(ab,x)            JMP ($1000,X)     3
JML is an extension to JMP that allows the PB register to be altered as well - letting you jump to any code in the 65816's full 24-bit address space. It doesn't have the (ab,x) addressing mode of JMP, but its full 24-bit range can be very useful. The indirect form reads a 3-byte address from memory, which standard 65816 syntax writes with square brackets. No flags are altered by JML.
Addressing Mode   Syntax          Bytes
abl               JML $C00000     4
[ab]              JML [$0200]     3
JSR is an extremely useful instruction. It firstly pushes a 16-bit return address onto the stack, then jumps to the code pointed to by the operand. (Strictly speaking, the address pushed is that of the last byte of the JSR itself, and RTS adds 1 to it when it is pulled - the end result is the same.) When an RTS instruction is encountered, the address that was pushed onto the stack is pulled into PC, thus returning the cpu to the code just following the JSR. In simple terms, it lets you jump to some code, process something, execute RTS, and return to where the JSR was called. Same principle as a function call in C/C++. As with JMP and JML, no flags are altered.
X Flag = 1
LDX #60 ; X = 60 decimal
Waiter
JSR Wait ; CPU jumps to Wait, pushes the address of DEX onto the stack
DEX ; count down X
BNE Waiter
Wait
WAI ; waits for an interrupt to occur
RTS ; once an interrupt hits, return to wherever the JSR came from
This
code actually does something - wastes 1 second of cpu time to be
specific. First of all, X is set to 60, which is the number of times
the NMI interrupt hits every second. Then, we JSR to Wait, which puts
the 65816 into a power-down state until an interrupt hits. Once the
interrupt has been triggered, the RTS pulls the address of DEX back off
the stack and into the PC register, forcing the cpu to continue
processing at DEX.
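The push/pull bookkeeping behind JSR and RTS can be sketched in Python with a list standing in for the stack. The names jsr and rts and the addresses are invented for the example; the one real detail it models is that the cpu pushes the address of the JSR's last byte (high byte first) and RTS adds 1 after pulling it.

```python
def jsr(pc, target, stack):
    """pc is the address of the JSR opcode; returns the new PC."""
    return_minus_one = (pc + 2) & 0xFFFF      # last byte of the 3-byte JSR
    stack.append(return_minus_one >> 8)       # high byte is pushed first
    stack.append(return_minus_one & 0xFF)     # then the low byte
    return target

def rts(stack):
    """Pull the saved address and resume just past the JSR."""
    low = stack.pop()
    high = stack.pop()
    return (((high << 8) | low) + 1) & 0xFFFF

stack = []
pc = jsr(0x8000, 0x9000, stack)               # JSR at $8000 calling $9000
print(f"in subroutine at {pc:04X}h, stack = {stack}")
print(f"back at {rts(stack):04X}h")
```

If anything extra is left on the list between the two calls, rts pulls the wrong bytes - the same failure you get from unbalanced pushes in a real subroutine.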
Addressing Mode   Syntax            Bytes
ab                JSR $9000         3
(ab,x)            JSR ($9000,X)     3
JSL
is an extension of JSR Absolute, allowing full 24-bit jumps to anywhere
in the 65816 address space. As such, it pushes 3 bytes onto the stack
to record the next instruction, and requires a RTL instruction to
return, instead of RTS.
Addressing Mode   Syntax          Bytes
abl               JSL $ED4000     4
RTS is the companion instruction to JSR, allowing you to return to whatever code called it. Though it sounds fun and wholesome, all RTS does is pull two bytes off the stack and put them into the PC register. Unfortunately, if you've been pushing and pulling a lot of values in your subroutine, a misplaced RTS will simply pull off of the stack whatever the last thing was that you pushed on.
Whenever you're using subroutines, you have to make sure all your push actions have been pulled off at some point, or an RTS will direct the cpu to who-knows-where.
As
was the case with RTS/JSR, RTL allows you to return from a JSL
instruction back to the original code. RTL pulls 2 bytes off the stack
and into PC, then a third byte into PB. As always, caution must be
taken to make sure the stack has had everything pulled back off that was
pushed after the JSL, or horrid things will happen.
RTI
is yet another return-from-subroutine type instruction, but specially
tailored for interrupts. When an interrupt hits, it immediately causes
the 65816 to jump to the appropriate vector. Before this jump, the
interrupt calling routine will push the 3 bytes of the next instruction
onto the stack, followed by the P register (equivalent of JSL, PHP).
Because of this extra byte being pushed, RTI is designed to perform a
PLP instruction, then dump 3 more bytes from the stack into PC and PB
registers. Despite RTI automatically preserving the P register for you,
almost all commercial games decide to PHP immediately inside the
interrupt code anyway. Go figure, I guess.
The name of PEA (Push Effective Address) is quite misleading - it actually pushes the 2-byte operand straight onto the stack. A name of Push Effective Immediate would have made more sense. It's useful for times when you want to get a certain value straight onto the stack, without first having to load it into a register then pushing it.
The number of bytes pushed is always 2, regardless of any setting in the P register. No flags are affected.
M flag = 1
LDA #$00
PHA
LDA #$FF
PHA

PEA $00FF
The two pieces of code above perform the same action, though in the first bit A is overwritten, which may not be desired. The syntax for PEA is interesting - it appears to mean "push onto the stack the two bytes at $00FF", however this is definitely NOT the case. It simply means "push 00h then FFh onto the stack" (high byte first, so the value ends up in memory in the usual low-byte-first order).
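Push order is an easy thing to get backwards, so here is a tiny Python model of PEA. The function name and the list-as-stack are made up for the sketch; it assumes the standard 65816 rule that a 16-bit push puts the high byte on first and the low byte on last.

```python
def pea(operand, stack):
    """Push a 16-bit immediate the way PEA does; the end of the list is the top."""
    stack.append((operand >> 8) & 0xFF)   # high byte goes on first
    stack.append(operand & 0xFF)          # low byte goes on last (top of stack)
    return stack

print([f"{b:02X}h" for b in pea(0x00FF, [])])
```

The operand is used as a plain number here - nothing is ever read from address $00FF.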
Addressing Mode   Syntax        Bytes
#                 PEA $5050     3
PEI is a friendly looking instruction that pushes onto the stack the 2 bytes stored at a location on the D-Page. Useful for the same reasons as PEA. Two bytes are always pushed by PEI, and no flags are altered.
M flag = 1
LDA $01 ; high byte of the value stored at $00
PHA
LDA $00 ; low byte
PHA

PEI ($00)
Both the above pieces of code give the same end result - the 2 bytes stored at $00 and $01 (the pointer itself, not the data it points to) are pushed onto the stack.
Addressing Mode   Syntax        Bytes
(d)               PEI ($01)     2
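PER is another of the load-directly-into-the-stack instructions, this time pushing a 2-byte address worked out relative to the program counter: the address of the instruction following the PER, plus the 16-bit signed operand. Apart from pushing a return address by hand before a JMP, I can think of few uses for this off the top of my head.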
As is implied by the name, XCE (exchange Carry and Emulation) allows you to set/clear the hidden emulation flag of the 65816. At some point, all snes games are going to execute the famous CLC, XCE sequence to get the cpu out of emulation mode, which it thoughtfully starts up in.
After
the XCE has been executed, the Carry flag is assigned the previous
value of the Emulation bit. In the case of a CLC, XCE at startup, the
Carry flag would be set to 1 afterwards. This has all kinds of uses if
you're writing an NES emulator and want to know what mode you're in (?).
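The swap itself is about as simple as an instruction gets. A Python model, with an invented function name and the two bits passed as plain 0/1 values:

```python
def xce(carry, emulation):
    """Return (new carry, new emulation) after XCE swaps the two bits."""
    return emulation, carry

carry, emulation = xce(0, 1)   # CLC, XCE at power-on (cpu starts in emulation mode)
print(f"carry={carry} emulation={emulation}")
```

After the startup sequence the cpu is in native mode (emulation = 0) and the carry holds the old emulation bit, i.e. 1.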
As demonstrated earlier, executing WAI makes the 65816 idle until an interrupt hits. If interrupts are disabled (I flag = 1), the cpu still wakes up when an interrupt request arrives, but instead of jumping to the interrupt vector it simply continues at the next instruction.
Congrats!
Now
that you're completely fluent with the opcodes (hohoho), it's time to
use them in some meaningful code. The next section covers assembly
techniques and common, useful bits of code for any hacking work - Coding in 65816 Assemblers.