gee, imagine me stuffing it up :P (this one's wrapped properly)



Posted by TheGun (202.12.144.19) on June 25, 1999 at 02:33:21:

In Reply to: asm3.htm posted by TheGun on June 25, 1999 at 02:31:19:


65816 Opcodes

This document covers the opcodes native to the 65816 cpu. If you think you can skim over this section once or twice, then come back for reference when you need it, you won't get very far at all.

Even though some opcodes are more important (read: common) than others, you are advised to study this entire listing many times over. Not having a thorough understanding of opcodes when you are reading or writing code will cause problems. Granted there's a lot of information here, but assembly is as much the ability to memorize and regurgitate (think typical high school busy-work) as any other talent.


Opcode Structure

Until now, the words opcode and instruction have been used fairly interchangeably. From now on, the correct terminology will be used - an instruction is a category of opcodes (such as LDA), but an opcode is a single instruction bound to one addressing mode (LDA Absolute). In other words, ADC is an instruction as the addressing mode is not specified, but ADC #$1234 is an opcode with operand. No two opcodes are the same - each addressing mode of each instruction has its own hex code specifying such. For example:


LDA #$1234 ; in a hex editor, this would appear as A9 34 12
LDA $1234 ; in a hex editor, this would appear as AD 34 12

As you should see, the two operations above use the same instruction (LDA), but appear differently in a hex editor - LDA Immediate being A9h and LDA Absolute being ADh. A9h and ADh are two different opcodes, but share the same instruction.
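To make the instruction/opcode split a bit more concrete, here is a minimal Python sketch that builds the bytes you'd see in a hex editor. It isn't taken from any real assembler - the table and the function name encode_lda are made up for this example, and it assumes a 2-byte operand (M flag = 0).

```python
# Hypothetical two-entry table: one instruction (LDA), two different opcodes.
LDA_OPCODES = {"immediate": 0xA9, "absolute": 0xAD}

def encode_lda(mode, operand):
    """Return the opcode byte followed by a 2-byte operand, low byte first."""
    return bytes([LDA_OPCODES[mode], operand & 0xFF, (operand >> 8) & 0xFF])

print(encode_lda("immediate", 0x1234).hex(" ").upper())  # A9 34 12
print(encode_lda("absolute", 0x1234).hex(" ").upper())   # AD 34 12
```

Note how the operand bytes come out in reverse order (34 12) in both cases - only the first byte tells the cpu which addressing mode is meant.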


The Opcodes

This section will have the following format:


Instruction

Description

Examples

Flags Affected


Addressing Mode    Syntax             # of Bytes
#                  XXX #$9000         3*
ab                 XXX $0780          3
abl                XXX $089044        4
[d],y              XXX [$0A],y        2

* Exceptions


This borrows greatly from other documents, residing at 6502.txt and *****, whose authors I am unsure of. Whoever created them, I would like to say now that their efforts are greatly appreciated.

The instructions are grouped so that similar opcodes follow each other. It's a bit more logical than alphabetical sorting.

Of great importance is the Addressing Mode table. The addressing modes listed there are the only ones that instruction supports. If you wrote LDX $800000 and tried to assemble it, your assembler would (hopefully) give you an invalid operand error. That is because LDX Absolute Long is just not possible - that instruction was not allocated a hex code when the official spec was written, so it didn't make its way into the chip. Basically, if you want to use a certain instruction in your code, make sure the addressing mode you're trying to use is valid.


LDA - Load into A

As you must already know, this instruction loads either 1 or 2 bytes into the A register. The number of bytes loaded depends on the status of the P register's M flag.


D = 0100h
DB = 80h
S = 01FDh
M flag = 0

LDA $8000 ; Load into A the 2 bytes at $808000 (absolute addressing)
LDA $60 ; Load 2 bytes from the address $000160 (direct addressing)
LDA $01,s ; Load 2 bytes from $0001FE (stack relative)
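The addresses in those comments can be worked out by hand, but a small Python sketch may help show where each one comes from. The function names are invented for this example, and it only covers the three modes used above.

```python
def absolute(db, addr):
    """Absolute: the bank comes from DB, the offset from the operand."""
    return (db << 16) | (addr & 0xFFFF)

def direct(d, offset):
    """Direct: always bank 00, the offset is D + operand (wrapping at 64K)."""
    return (d + offset) & 0xFFFF

def stack_relative(s, offset):
    """Stack relative: always bank 00, the offset is S + operand."""
    return (s + offset) & 0xFFFF

D, DB, S = 0x0100, 0x80, 0x01FD
print(f"${absolute(DB, 0x8000):06X}")     # $808000
print(f"${direct(D, 0x60):06X}")          # $000160
print(f"${stack_relative(S, 0x01):06X}")  # $0001FE
```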

When you load data into the A register, you inherently alter flags in the P register. Remember that some flags in the P register are constantly updated relative to your actions. Here are the flags you could alter, along with the data that would trigger the change.


N (Negative) LDA #$8000 ; since the high bit of A will become set after this, it's presumed negative

Z (Zero) LDA #$0000 ; since A will now be zeroed, the Zero flag becomes set


Addressing Mode    Syntax             Bytes
#                  LDA #$50           2*
ab                 LDA $8000          3
ab,x               LDA $8000,X        3
ab,y               LDA $8000,Y        3
abl                LDA $C01000        4
abl,x              LDA $C01000,X      4
d                  LDA $01            2
d,x                LDA $01,X          2
(d)                LDA ($50)          2
(d),y              LDA ($50),Y        2
[d]                LDA [$03]          2
[d],y              LDA [$03],Y        2
(d,x)              LDA ($80,X)        2
d,s                LDA $03,S          2
(d,s),y            LDA ($03,S),Y      2

* Operand is 1 byte when M flag = 1, 2 bytes if M is 0




LDX - Load into X

This instruction is very similar to LDA, but with far fewer addressing modes to use. This lack of flexibility is fitting, as you can't perform arithmetic on X anyway. The number of bytes loaded into X depends on the IndeX flag of the P register - if that flag is set you only load 1 byte.


D = 0100h
X flag = 0

LDX #$0120 ; Load into X the constant 0120h (immediate addressing)
LDX $60 ; Load 2 bytes from the address $000160 (direct addressing)

Flags:
N (Negative) LDX #$8000

Z (Zero) LDX #$0000


Addressing Mode    Syntax             Bytes
#                  LDX #$80           2*
ab                 LDX $8000          3
ab,y               LDX $8000,Y        3
d                  LDX $04            2
d,y                LDX $04,Y          2

* Operand is 1 byte when X flag = 1, 2 bytes if X is 0




LDY - Load into Y

This instruction is almost the same as LDX, differing only in 2 of the addressing modes. The number of bytes loaded into Y depends on the IndeX flag of the P register - if that flag is set you only load 1 byte.


D = 0100h
X flag = 0

LDY #$0120 ; Y = 0120h (immediate addressing)
LDY $60 ; Load 2 bytes from the address $000160 (direct addressing)

Flags:
N (Negative) LDY #$8000

Z (Zero) LDY #$0000


Addressing Mode    Syntax             Bytes
#                  LDY #$80           2*
ab                 LDY $8000          3
ab,x               LDY $8000,X        3
d                  LDY $04            2
d,x                LDY $04,X          2

* Operand is 1 byte when X flag = 1, 2 bytes if X is 0




STA - Store from A


This instruction stores the contents of the A register to a location specified by the operand. The number of bytes you store is affected by the M flag (as almost all Accumulator instructions are). If M=0, the low byte of A will be stored at a location, then the high byte will be stored in the following location (location + 1).


D = 0100h
DB = 80h
S = 01FDh
M flag = 0

STA $1000 ; Store 2 bytes at $801000 (absolute addressing)
STA $60 ; Store 2 bytes at $000160 (direct addressing)
STA $01,s ; Store 2 bytes at $0001FE (stack relative)
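The low-byte-first ordering is easy to get backwards, so here is a small Python sketch of what a store does to memory. Memory is modelled as a plain dictionary and the function name sta is just for illustration.

```python
def sta(memory, address, a, m_flag):
    """Store A at address: 1 byte if M = 1, else low byte then high byte."""
    memory[address] = a & 0xFF
    if m_flag == 0:
        memory[address + 1] = (a >> 8) & 0xFF

memory = {}
sta(memory, 0x801000, 0x1234, m_flag=0)
print({f"${k:06X}": f"{v:02X}h" for k, v in memory.items()})
# {'$801000': '34h', '$801001': '12h'}
```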

None of the flags in the P register are affected by the STA, STX and STY operations.


Addressing Mode    Syntax             Bytes
ab                 STA $1000          3
ab,x               STA $1000,X        3
ab,y               STA $1000,Y        3
abl                STA $7E0000        4
abl,x              STA $7E0000,X      4
d                  STA $03            2
d,x                STA $03,X          2
(d)                STA ($06)          2
(d),y              STA ($06),Y        2
[d]                STA [$10]          2
[d],y              STA [$10],Y        2
(d,x)              STA ($10,X)        2
d,s                STA $01,S          2
(d,s),y            STA ($01,S),Y      2




STX - Store from X


This instruction is very similar to STA - again differing only in the available addressing modes. Also, the number of bytes you store depends on the Index bit in the P register (the X bit).


D = 0100h
DB = 7Eh
X flag = 1

STX $9000 ; Store 1 byte at $7E9000 (absolute addressing)
STX $60 ; Store 1 byte at $000160 (direct addressing)


Addressing Mode    Syntax             Bytes
ab                 STX $2000          3
d                  STX $97            2
d,y                STX $97,y          2




STY - Store from Y


As you should have guessed - virtually identical to STX except for a single addressing mode. The P register's X flag controls the number of bytes stored.


D = 0100h
DB = 04h
X flag = 0

STY $9000 ; Store 2 bytes at $049000 (absolute addressing)
STY $F0 ; Store 2 bytes at $0001F0 (direct addressing)


Addressing Mode    Syntax             Bytes
ab                 STY $F000          3
d                  STY $0A            2
d,x                STY $0A,X          2




ADC - Add with Carry

This instruction adds the operand onto the value in A, and also adds the Carry flag (hence Add with Carry). You may remember that the carry flag is set (amongst other circumstances) when an addition results in a number larger than the A register can hold. This quality can be used to obtain addition results larger than 2 bytes - after adding 2 values, if the carry flag is set you know the answer's greater than FFFFh.

As always, though, the size of the numbers added depends on the M flag - if it's set to 1, you can only add 1 byte from the operand onto A's lower byte, giving addition results from 0 to FFh (1FEh using the carry bit). When M=0, addition can give answers from 0 to FFFFh (1FFFEh using a carry).

Since the carry bit is always added, it is customary (and strongly advised) that this flag is cleared before using ADC. This is done with the CLC (Clear Carry) opcode.


DB = C0h
S = 01FFh
M flag = 0
X flag = 0

PHX ; push 2 bytes from X onto the stack (at locations $0001FF and $0001FE)
CLC ; make sure the Carry flag is clear (0)
ADC $01,s ; add the A register, the carry flag and the 2 bytes at $0001FE
PLX ; pull X back off the stack

LDA #$0100 ; A = 0100h
CLC ; Carry flag = 0
ADC $8000 ; add the 2 bytes at $C08000 - if they hold 8000h, A now equals 8100h

Flags:
N (Negative) LDA #$7000
CLC
ADC #$8000 ; the high bit of A will become set after this operation

V (Overflow) LDA #$7000
CLC
ADC #$7000 ; A and the operand are positive but the result's negative - a signed overflow is triggered

Z (Zero) LDA #$8000
CLC
ADC #$8000 ; 8000h + 8000h = 10000h, so A is left as 0000h and the Zero flag becomes set (the carry and overflow flags are also set here)

C (Carry) LDA #$F000 ; doing this sum on paper would give you a carry after the highest bit (10000h)
CLC
ADC #$2000
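If you'd like to check those flag results without a debugger handy, here is a minimal Python model of ADC. It's a sketch under a couple of assumptions: decimal mode is off, and the function name and the bits parameter (16 for M = 0, 8 for M = 1) are my own invention.

```python
def adc(a, operand, carry, bits=16):
    """Binary-mode ADC sketch. Returns (result, flags)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    total = a + operand + carry
    result = total & mask
    flags = {
        "N": int(bool(result & sign)),
        # Overflow: both inputs share a sign that the result does not have.
        "V": int(bool(~(a ^ operand) & (a ^ result) & sign)),
        "Z": int(result == 0),
        "C": int(total > mask),
    }
    return result, flags

for a, operand in [(0x7000, 0x8000), (0x7000, 0x7000),
                   (0x8000, 0x8000), (0xF000, 0x2000)]:
    result, flags = adc(a, operand, carry=0)
    print(f"{a:04X}h + {operand:04X}h = {result:04X}h", flags)
```

Passing carry=1 shows why the CLC matters: every result comes out one higher than expected.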


Addressing Mode    Syntax             Bytes
#                  ADC #$80           2*
ab                 ADC $1000          3
ab,x               ADC $1000,X        3
ab,y               ADC $1000,Y        3
abl                ADC $C11000        4
abl,x              ADC $C11000,X      4
d                  ADC $09            2
d,x                ADC $09,X          2
(d)                ADC ($0B)          2
(d),y              ADC ($0B),Y        2
[d]                ADC [$0D]          2
[d],y              ADC [$0D],Y        2
(d,x)              ADC ($0B,X)        2
d,s                ADC $01,S          2
(d,s),y            ADC ($01,S),Y      2

* Operand is 1 byte when M flag = 1, 2 bytes if M is 0




SBC - Subtract with Carry

This instruction subtracts the operand from the value in A, and uses the Carry flag as somewhere to borrow from. If the subtraction didn't require a borrow, the Carry flag remains set (since you -always- set the carry flag before a SBC, otherwise there'd be nowhere to borrow from!). If the subtraction required a borrow, the Carry flag would be zeroed. If you don't know what borrowing means in relation to subtraction, read a maths book.

Of course, you will have realized that the number of bytes you subtract from and with depends on the M flag's setting. Why do you think I've been repeating that all this time?!


DB = C0h
S = 01FFh
M flag = 0
X flag = 0

PHY ; push 2 bytes from Y onto the stack (low byte at $0001FE, high byte at $0001FF)
SEC ; make sure the Carry flag is set (so we can borrow from it)
SBC $01,s ; subtract Y (the two bytes at $0001FE) from A, storing the result in A
PLY ; pull Y back off the stack

LDA #$0100 ; A = 0100h
SEC ; Carry flag = 1
SBC $8000 ; subtract the 2 bytes at $C08000 - if they hold 8000h, A now equals 8100h and the Carry flag is cleared (we needed a borrow)

Flags:
N (Negative) LDA #$1000 ; the high bit of A will become set after this operation
SEC
SBC #$8000

V (Overflow) LDA #$8000 ; A is negative and the operand is positive, but the answer (7FFFh) is positive - a signed overflow is triggered
SEC
SBC #$0001

Z (Zero) LDA #$8000 ; since A will now be zeroed, the Zero flag becomes set
SEC
SBC #$8000

C (Carry) LDA #$1000 ; this subtraction requires a borrow - that borrow is 'taken' from the Carry flag,
SEC ; so after the SBC, the carry flag would be cleared
SBC #$2000
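The borrow logic is easier to see when you know the usual trick behind it: subtracting is the same as adding the inverted operand plus the Carry flag. Here is a minimal Python sketch of that, assuming decimal mode is off (the function name and the bits parameter are mine, not anything official).

```python
def sbc(a, operand, carry, bits=16):
    """Binary-mode SBC sketch: A - operand - (1 - carry)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    total = a + (operand ^ mask) + carry   # add the inverted operand
    result = total & mask
    flags = {
        "N": int(bool(result & sign)),
        # Overflow: the inputs differ in sign and the result's sign differs from A's.
        "V": int(bool((a ^ operand) & (a ^ result) & sign)),
        "Z": int(result == 0),
        "C": int(total > mask),            # 1 means no borrow was needed
    }
    return result, flags

for a, operand in [(0x1000, 0x8000), (0x8000, 0x0001),
                   (0x8000, 0x8000), (0x1000, 0x2000)]:
    result, flags = sbc(a, operand, carry=1)
    print(f"{a:04X}h - {operand:04X}h = {result:04X}h", flags)
```

Calling it with carry=0 gives a result one lower than expected, which is exactly what happens if you forget the SEC.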


Addressing Mode    Syntax             Bytes
#                  SBC #$80           2*
ab                 SBC $0100          3
ab,x               SBC $0100,X        3
ab,y               SBC $0100,Y        3
abl                SBC $808100        4
abl,x              SBC $808100,X      4
d                  SBC $77            2
d,x                SBC $77,X          2
(d)                SBC ($88)          2
(d),y              SBC ($88),Y        2
[d]                SBC [$99]          2
[d],y              SBC [$99],Y        2
(d,x)              SBC ($A0,X)        2
d,s                SBC $01,S          2
(d,s),y            SBC ($01,S),Y      2

* Operand is 1 byte when M flag = 1, 2 bytes if M is 0




ASL - Arithmetic Shift Left

This instruction shifts the operand left by one bit, effectively doubling the source. The highest bit of the operand is shifted into the Carry flag, useful for creating a bitplane of on/off flags (once you shift, you test the carry flag). Quick doubling and bitplanes are the most common uses for this instruction. The lowest bit of the target is always zeroed after an ASL.

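This instruction can operate on either the A register or a memory location, so the number of bytes affected at a time is governed by the M flag. In the case of the A register being shifted when M=1, the process is simple: bit 7 moves into the Carry flag, bits 6 to 0 each move up one place, and bit 0 becomes 0.

On the other hand, if a 2-byte memory location is shifted (M=0), the reverse byte ordering of the 65816 makes things a bit more complicated: the low byte sits at the lower address, so bit 7 of the first byte in memory moves into bit 0 of the second byte, and bit 7 of the second byte is what ends up in the Carry flag.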

Generally you won't have to worry about this complication, but it helps to be aware of nuances like this, especially when debugging 'interesting' code.
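For the curious, here is a rough Python sketch of a 2-byte ASL on memory, showing how the two bytes are joined low byte first before the shift. Memory is modelled as a dictionary and asl_memory16 is a made-up name.

```python
def asl_memory16(memory, address):
    """ASL on a 2-byte value (M = 0). Returns the new Carry flag."""
    value = memory[address] | (memory[address + 1] << 8)  # low byte first
    carry = (value >> 15) & 1        # old bit 15 goes to the Carry flag
    value = (value << 1) & 0xFFFF    # bit 0 becomes 0
    memory[address] = value & 0xFF
    memory[address + 1] = value >> 8
    return carry

memory = {0x000080: 0x00, 0x000081: 0xFF}   # the word FF00h
carry = asl_memory16(memory, 0x000080)
print(f"{memory[0x81]:02X}{memory[0x80]:02X}h, Carry = {carry}")  # FE00h, Carry = 1
```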


D = 0000h
$000080 = 00h
$000081 = FFh
M flag = 0

LDA $80 ; A = %1111111100000000 = FF00h, Carry flag unknown
ASL A ; A = %1111111000000000 = FE00h, Carry flag is set

LDA #$0100 ; A = %0000000100000000, Carry flag unknown
ASL A ; A = %0000001000000000, Carry flag is cleared

Flags:
N (Negative) LDA #$4000
ASL A ; A's highest bit will become set after this

Z (Zero) LDA #$8000
ASL A ; the only set bit will be shifted into the Carry flag, leaving A as 0000h

C (Carry) ASL $80 ; the high byte ($000081) has its highest bit set, which is moved into the Carry flag


Addressing Mode    Syntax             Bytes
A                  ASL A              1
ab                 ASL $8000          3
ab,x               ASL $8000,X        3
d                  ASL $90            2
d,x                ASL $90,X          2




LSR - Logical Shift Right

Similar to ASL, this instruction shifts the operand right by one bit, effectively halving and rounding down the source. The lowest bit of the operand is shifted into the Carry flag, giving similar bitplane uses to ASL. The highest bit of the operand is always made zero after LSR is executed.

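It is useful that the Carry flag is altered by this instruction, as it allows you to divide by powers of 2 with a remainder. For example, shifting 0005h right once leaves 0002h in A and sets the Carry flag: 5 divided by 2 is 2, remainder 1.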

The actual code involved in this division/remainder use for LSR is a bit too complex for this section, but will be covered in a later section.
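In the meantime, the idea itself can be sketched in a few lines of Python (this is only a model of the arithmetic, not the 65816 code promised above, and both function names are invented): each LSR drops one bit into the Carry flag, and those dropped bits together form the remainder.

```python
def lsr(value):
    """One LSR: returns (shifted value, Carry), Carry being the old bit 0."""
    return value >> 1, value & 1

def divide_by_power_of_two(value, shifts):
    """Divide by 2**shifts with repeated LSRs, rebuilding the remainder
    from the Carry flag produced by each shift."""
    remainder = 0
    for i in range(shifts):
        value, carry = lsr(value)
        remainder |= carry << i
    return value, remainder

print(divide_by_power_of_two(0x0005, 1))  # (2, 1)
print(divide_by_power_of_two(0x001D, 2))  # (7, 1)
```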


D = 0000h
$000080 = 00h
$000081 = FFh
M flag = 0

LDA $80 ; A = %1111111100000000 = FF00h, Carry flag unknown
LSR A ; A = %0111111110000000 = 7F80h, Carry flag is cleared

LDA #$0100 ; A = %0000000100000000 = 0100h, Carry flag unknown
LSR A ; A = %0000000010000000 = 0080h, Carry flag is cleared

Flags:
N (Negative) Since the highest bit is cleared, the N flag is -always- set to 0 by LSR

Z (Zero) LDA #$0001
LSR A ; the only set bit will be shifted into the Carry flag, leaving A as 0000h

C (Carry) LDA #$FFFF
LSR A ; the lowest bit (which is set) is moved into the Carry flag


Addressing Mode    Syntax             Bytes
A                  LSR A              1
ab                 LSR $1000          3
ab,x               LSR $1000,X        3
d                  LSR $05            2
d,x                LSR $05,X          2




ROL - Rotate Left

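Again, this instruction is similar to ASL and LSR, though it is not quite as destructive. Whereas ASL and LSR moved zeroes into the lowest/highest bits, the ROL instruction moves the Carry flag into the lowest bit (the highest bit moves out into the Carry flag), so it's possible to continuously rotate the operand without eventually destroying the data therein. This feature of the ROL/ROR instructions lets you shift a large, contiguous block of memory left or right without zeroing bits in-between. As a demonstration, say M=1, A = C0h and the Carry flag is clear: one ROL gives A = 80h with the Carry flag set, a second gives A = 01h with the Carry flag still set, and a third gives A = 03h with the Carry flag clear.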

In that example, you could keep ROL'ing A until it reached C0h again.
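Here is a minimal Python sketch of that rotation. The starting state is an assumption on my part (M = 1, A = C0h, Carry flag clear), and rol8 is a made-up name. Since the Carry flag acts as a ninth bit, it takes 9 ROLs for A to come back around.

```python
def rol8(a, carry):
    """ROL with M = 1: the old Carry enters bit 0, the old bit 7 leaves into Carry."""
    return ((a << 1) | carry) & 0xFF, (a >> 7) & 1

a, carry = 0xC0, 0          # assumed starting state
for step in range(1, 10):
    a, carry = rol8(a, carry)
    print(f"ROL #{step}: A = {a:02X}h, Carry = {carry}")
```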


D = 0000h
$000080 = 00h
$000081 = FFh
M flag = 0

LDA $80 ; A = %1111111100000000 = FF00h
SEC ; A = %1111111100000000 = FF00h, Carry flag is set
ROL A ; A = %1111111000000001 = FE01h, Carry flag set
ROL A ; A = %1111110000000011 = FC03h, Carry flag set

Flags:
N (Negative) LDA #$4000
ROL A ; A's highest bit will become set

Z (Zero) CLC ; so we don't rotate a 1 into A
LDA #$8000
ROL A ; the only set bit will be shifted into the Carry flag, leaving A as 0000h

C (Carry) LDA #$FFFF
ROL A ; the highest bit (which is set) is moved into the Carry flag


Addressing Mode    Syntax             Bytes
A                  ROL A              1
ab                 ROL $1200          3
ab,x               ROL $1200,X        3
d                  ROL $03            2
d,x                ROL $03,X          2




ROR - Rotate Right

As you should have guessed, this instruction is very similar to the previous 3 - in this case the operand is shifted to the right by 1 bit, the Carry flag is moved into the highest bit, and the lowest bit (shifted out of existence) is moved into the Carry flag.


D = 0000h
$000080 = 00h
$000081 = FFh
M flag = 0

LDA $80 ; A = %1111111100000000 = FF00h
SEC ; A = %1111111100000000 = FF00h, Carry flag is set
ROR A ; A = %1111111110000000 = FF80h, Carry flag cleared
ROR A ; A = %0111111111000000 = 7FC0h, Carry flag cleared

Flags:
N (Negative) SEC
ROR A ; A's highest bit will become set (Carry flag shifted into A)

Z (Zero) CLC ; so we don't rotate a 1 into A
LDA #$0001
ROR A ; the only set bit will be shifted into the Carry flag, leaving A as 0000h

C (Carry) LDA #$FFFF
ROR A ; the lowest bit (which is set) is moved into the Carry flag


Addressing Mode    Syntax             Bytes
A                  ROR A              1
ab                 ROR $1000          3
ab,x               ROR $1000,X        3
d                  ROR $09            2
d,x                ROR $09,X          2




PHA - Push A

PHA, and the next 6 push instructions, are all extremely similar. PHA stores the contents of the A register (1 or 2 bytes, depending on the M flag) at the memory location pointed to by the S register. This action changes the value of the S register to point to the next free byte on the stack.


S = 01FFh
M flag = 0

LDA #$7700
PHA ; 77h (the high byte) is stored at $0001FF, 00h (the low byte) at $0001FE
; S register now contains 01FDh ( S = S - # of bytes pushed )

No flags are affected by the Push instructions.
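The way S moves is worth seeing once in slow motion. Below is a small Python sketch of a 2-byte push and pull, with memory as a dictionary and both function names made up for the example. The high byte goes on first, so the value still reads low byte first in memory.

```python
def push16(memory, s, value):
    """Push 2 bytes: high byte first, S decremented after each byte."""
    memory[s] = (value >> 8) & 0xFF
    memory[s - 1] = value & 0xFF
    return s - 2

def pull16(memory, s):
    """Pull 2 bytes: S is incremented before each read, low byte first."""
    value = memory[s + 1] | (memory[s + 2] << 8)
    return value, s + 2

memory = {}
s = push16(memory, 0x01FF, 0x7700)
print(f"$0001FF = {memory[0x01FF]:02X}h, $0001FE = {memory[0x01FE]:02X}h, S = {s:04X}h")
# $0001FF = 77h, $0001FE = 00h, S = 01FDh
value, s = pull16(memory, s)
print(f"pulled {value:04X}h, S = {s:04X}h")  # pulled 7700h, S = 01FFh
```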

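Since all the other push instructions work the same as PHA, here's a brief listing of what they push and how many bytes end up on the stack:

Instruction    Pushes    Bytes
PHB            DB        1
PHD            D         2
PHK            PB        1
PHP            P         1
PHX            X         1 if the X flag = 1, 2 if X is 0
PHY            Y         1 if the X flag = 1, 2 if X is 0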

The reason for pushing the PB register may not be obvious, as PB can only be modified with a jump-style instruction. It's most often used to make sure the DB register points to the same bank as PB, so any addressing modes that are DB sensitive will load from the bank whose code is currently being run.




PLA - Pull A

PLA, and the next 5 pull instructions, are also extremely similar to one another (surprised?). PLA loads into A either 1 or 2 bytes from the S register + 1 (+1 because S points to the next free byte). The S register is then incremented by the number of bytes pulled.


S = 01FFh
M flag = 1

PHB ; The DB register is stored at $0001FF, the Stack register is decremented by 1
PLA ; load a byte from S+1, which is the value of DB we just pushed

That bit of code allows us to transfer DB to A, something not possible with the set of transfer instructions.

You should remember that the push instructions don't alter any flags - however the pull instructions do. For most of these instructions, the flags altered are the same as those for a typical LDA instruction, since after all a PLA is effectively doing LDA (S+1).


Flags affected by PLA, PLB, PLD, PLX and PLY:
N (Negative) LDA #$80 ; Negative flag is set
PHA ; flags unchanged
LDA #$10 ; Negative flag is cleared
PLB ; Negative flag is set again - DB now contains #$80

Z (Zero) LDA #$00 ; Zero flag is set
PHA ; Zero flag unchanged
INC A ; Zero flag cleared (A = 01h)
PLA ; Zero flag set again (A = 00h)

The PLP instruction can alter any and all of the flags in the P register - since you're simply pulling a byte off the stack and sticking it in P:


M flag = 1

LDA #$20
PHA
PLP ; bit 5 of P is now set, all other flags are cleared.

PLP cannot alter the emulation bit, however, as that bit is only changeable with the XCE instruction.




BCC - Branch on Carry Clear

The branch instructions are (almost all) conditional operators - they alter which course your code takes depending on conditions specified by the P register. These instructions are immensely important to assembly on any cpu - think how limited code would be that couldn't say "If this, run code x, if that, run code y". Understanding these instructions is vital to any 65816 assembly work you wish to do.


DB = 00h
$000800 = 40h
M flag = 1

LDA $0800 ; A = 40h
ASL A ; Carry flag becomes clear (high bit moved into carry)
BCC SomeCode ; if the Carry flag is clear, branch to the code label SomeCode

LDA #$05 ; if the Carry flag was set, this code would be executed as the above branch would fail

SomeCode ; this is a code label - the assembler uses these to figure out where branches go
STA $0800 ; Store A at $000800. The value stored depends on whether the BCC was successful.

In this example, the ASL A affected the carry flag. If that flag became clear, the BCC SomeCode would make the CPU jump to SomeCode. If the flag became set, the CPU would look at the BCC instruction and completely ignore it, going straight on to the next instruction. Here we see a basic 'if' statement - if the highest bit of $000800 was set, store #$05 at $000800. If the highest bit was clear, store the shifted value there instead.

No flags are affected by the branch instructions, whether they succeed or fail.




BCS - Branch on Carry Set

BCS is extremely similar to the BCC, only the branching condition is reversed. This is helpful for similar conditions to BCC, such as bitplanes, as well as extended addition. By extended addition, I mean calculating sums that would otherwise be too large for the A register to express:


$000080 = 00h
$000081 = 80h
D = 0000h
M flag = 0

LDA $80 ; A = 8000h
CLC
ADC #$8000 ; A = 0000h, Carry flag becomes set
STA $80
BCS Carry ; if the carry flag became set, jump to the code label Carry

RTS ; if the carry flag was clear, this code would have been executed

Carry
INC $82 ; if a carry occurred, increment the next word up (the 2 bytes at $82)
RTS ; RTS causes a subroutine to return to the code that called it

This code isn't the most efficient that could have been written, but it serves the purpose. Here, a number has 8000h added to it, then if the result doesn't fit in 2 bytes (carry flag set, in other words), the 2 bytes at $82 are incremented. Remember the reverse byte ordering - $80 is the lowest byte, $81 the high byte. This can be extended to say $82 is an even higher byte, and $83 is higher still. This code actually allows 32 bit addition - though some extra code would be needed for it to be actually useful.
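The same carry-propagation idea, sketched in Python with the matching 65816 steps noted in the comments. The function name add32 and the way the two words are passed around are just for illustration.

```python
def add32(low, high, addend):
    """Add a 16-bit addend to a 32-bit value held as two 16-bit words."""
    total = low + addend              # CLC / ADC
    low = total & 0xFFFF              # STA $80
    if total > 0xFFFF:                # BCS Carry
        high = (high + 1) & 0xFFFF    # INC $82
    return low, high

print([f"{w:04X}h" for w in add32(0x8000, 0x0000, 0x8000)])  # ['0000h', '0001h']
print([f"{w:04X}h" for w in add32(0x1000, 0x0000, 0x8000)])  # ['9000h', '0000h']
```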




BEQ - Branch if Equal

BEQ, along with its sister instruction BNE, branches depending on the status of the Zero flag. If the zero flag becomes set by some action, BEQ will succeed (jump to new code, in other words). If the zero flag is clear, the branch will fail and the cpu will continue processing like the opcode wasn't there.

This instruction is useful for seeing if a variable is zero or not (duh). It's useful for things like joypad testing, where values of zero mean nothing is happening:


$004218 = 00h ; these memory locations are SNES registers - not like normal memory
$004219 = 00h
M flag = 0

LDA $004218 ; $004218 returns the status of player 1's joypad
BEQ NoAction ; if no buttons have been pressed, don't do any joypad processing

; insert joypad response code here ;

NoAction ; code label
RTS ; return to calling code

If any of the bits in A had become set, the BEQ would have failed and joypad processing would have occurred. As it happens, no buttons had been pressed so the joypad processing was skipped altogether.




BNE - Branch if Not Equal

The companion to BEQ, this opcode will branch if the zero flag is cleared. This is useful for the same reasons as BEQ - it's pretty much a personal decision which to use (though sometimes one makes more sense than the other does). BNE is especially useful for loops, where a value is continuously being counted down.


DB = 7Eh
M flag = 1
X flag = 1

LDA #$00 ; set up the A, X and Y registers
LDX #$08
LDY #$00

Repeat ; another label - all these do to your code is make it more readable
STA $8000,y ; store 00h at $7E8000+Y
INY ; add 1 to Y
DEX ; subtract 1 from X
BNE Repeat ; if X hasn't reached 0 yet, loop back to Repeat

The loop at Repeat will cycle through 8 times, storing 8 copies of 00h starting at $7E8000. It is a good example of what the index registers are designed for - Y is indexing the storing of values, and X is counting down the loop.




BMI - Branch if Minus

As this instruction tests the setting of the N flag, it is useful for both detecting negative numbers and quickly testing the high bit of a variable. If you decide to use 7 bit values for your text, with the sign bit denoting a special action or substring, you could have code like the following:


M flag = 1

LDA $118400,x ; load a byte from $118400+X - N flag is set if it's negative
BMI Special ; if the highest bit is set, jump to the label Special

; normal text code ;

Special ; code label
; special char code ;

Hopefully you understand the concept of branching now - if the condition for the branch is met (in this case, if the N flag is set), the cpu jumps to wherever the branch points. If the condition fails, the cpu continues on to the next instruction following the branch.




BPL - Branch if Plus

This instruction is the opposite of BMI - it branches if the N flag is clear (the last action gave a positive result). This is useful for seeing if the high bit is clear, so it lends itself to waiting for the snes to reach its VBlank. The VBlank is the period when you can safely update the on-screen graphics, as the snes has finished drawing a frame and is waiting for the electron gun to get back to the top of the screen.


M flag = 1

TestVBL
LDA $004210 ; the high bit of this register is set if the VBlank period has been reached
BPL TestVBL ; keep loading $004210 until the high bit is set

In this very common loop, the PPU register $4210 is continuously tested to see if its high bit is set - at which point you can safely update vram/sprites.




BRA - Branch Always

Contrary to the previous conditional branch operations, this opcode forces the cpu to jump without testing any of the P register's flags - hence the name. This is useful for cleaning up after other branches, as sometimes you want your code to continue past another, conditional section:


M flag = 1

LDA $7E9011 ; load a variable
BMI SomeCode ; if the high bit's set, jump to SomeCode

; if the high bit was clear, this code is executed:
; insert unimportant code ;
BRA CleanUp ; now we jump to CleanUp

SomeCode
; more unimportant code ;

CleanUp ; whether or not the BMI was successful, this code is run
; code that was required either way ;
RTS

If the BRA statement wasn't in there, as soon as the code following the BMI was run, the cpu would have continued on to run whatever is at SomeCode, which is often not desirable. The BRA statement lets us bypass SomeCode and go straight to CleanUp.




BVC - Branch on Overflow Clear

I've never actually had to use this instruction, or its mirror image BVS, so it's not too easy to think up an example. Basically it just branches if the V flag is clear, which can be done by a myriad of actions.




BVS - Branch on Overflow Set

See BVC.




BRL - Branch Long

This instruction is exactly the same as BRA, only you can branch further. If you remember the addressing modes (as you surely do :) all the other branch instructions have a 1 byte signed operand, letting them jump a maximum of 128 bytes backwards or 127 bytes forwards in your code. The BRL (and PER) instructions take a 2 byte operand, letting you jump 32768 bytes backwards or 32767 bytes forwards. Although that makes it almost identical in functionality to the JMP instruction, remember the operand is relative to the current location, so there's nothing stopping you copying your code in a hex editor, pasting it somewhere else and still having it run properly - something a JMP instruction would merrily crash.

Whether you use BRA or BRL in your code pretty much depends on what kind of errors you get - if you assemble your code and get "error - branch out of range" all over the place, you'll need to either optimize your code or stick in a few BRL's here and there.
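To see where that "branch out of range" error comes from, here is a rough Python sketch of how an assembler could work out a branch operand. The function name and parameters are hypothetical, and it assumes the branch and its target sit in the same bank. The displacement is counted from the instruction that follows the branch.

```python
def branch_operand(branch_address, target, long_branch=False):
    """Signed displacement for a branch located at branch_address."""
    size = 3 if long_branch else 2            # opcode byte + operand bytes
    offset = target - (branch_address + size)
    limit = 0x8000 if long_branch else 0x80
    if not -limit <= offset < limit:
        raise ValueError("branch out of range")
    return offset & (0xFFFF if long_branch else 0xFF)

print(f"{branch_operand(0x8000, 0x8010):02X}h")                    # 0Eh
print(f"{branch_operand(0x8005, 0x8000):02X}h")                    # F9h
print(f"{branch_operand(0x8000, 0x8100, long_branch=True):04X}h")  # 00FDh
```

Trying branch_operand(0x8000, 0x8100) without long_branch=True raises the error, which is the cue to reach for BRL.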




MVN - Move Negative

As described in the previous document, these block move instructions use the A, X and Y registers to move data from somewhere to somewhere else. They have a few uses, but dma is generally used instead.


M flag = 0
X flag = 0

LDA #$007F ; transfer 80h bytes (A+1)
LDX #$8000 ; from $7E8000
LDY #$8001 ; to $7E8001
MVN $7E, $7E

A is assigned the number of bytes minus 1 to transfer, X the starting word address and Y the destination word address. The operand then specifies the source bank and destination bank. After the move, A will equal FFFFh, X will be whatever it started at + A + 1, and Y will also be its initial value + A + 1 (using the value A held before the move).

The setting of the M flag is completely ignored by the MVN/MVP instructions. If the X flag is set to 1, however, these instructions assume the high bytes of X and Y are 00h.

MVN copies bytes forwards in memory, starting at X -> Y, then X+1 -> Y+1, then X+2 -> Y+2 etc.

No flags are affected by MVN/MVP.
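Here is a minimal Python model of MVN, assuming 16-bit index registers (X flag = 0) and using a dictionary for memory; the function name and argument order are mine. Run on the example above, it shows a side effect of copying forwards one byte at a time: because the destination overlaps the source one byte ahead, the first byte gets smeared across the whole block.

```python
def mvn(memory, src_bank, dst_bank, a, x, y):
    """Block move that increments X and Y. Returns the final (A, X, Y)."""
    while True:
        memory[(dst_bank << 16) | y] = memory.get((src_bank << 16) | x, 0)
        x = (x + 1) & 0xFFFF
        y = (y + 1) & 0xFFFF
        a = (a - 1) & 0xFFFF
        if a == 0xFFFF:          # stop once A has wrapped past zero
            return a, x, y

memory = {0x7E8000: 0xAA}
a, x, y = mvn(memory, 0x7E, 0x7E, a=0x007F, x=0x8000, y=0x8001)
print(f"A = {a:04X}h, X = {x:04X}h, Y = {y:04X}h")  # A = FFFFh, X = 8080h, Y = 8081h
print(all(memory[0x7E8000 + i] == 0xAA for i in range(0x81)))  # True
```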


Addressing Mode    Syntax             Bytes
axy                MVN $7E,$7F        3




MVP - Move Positive

Similar to MVN, but X and Y are decremented instead of incremented. The block move starts with X -> Y, then X-1 -> Y-1, then X-2 -> Y-2 etc. until A passes through 0. Only really useful for zeroing ram in front of the stack, and other trivial matters.


Addressing Mode    Syntax             Bytes
axy                MVP $90,$7E        3




STZ - Store Zero

This handy instruction stores a zero in the memory location you specify. If the M flag is set to 0, 2 bytes are zeroed, compared to 1 byte zeroed if M = 1.


M flag = 1
X flag = 0
DB = 7Eh

LDX #$0000
LDY #$0800

Repeat
STZ $1000,x ; store 00h at $7E1000+X (STZ has no ab,y mode, so X does the indexing)
INX
DEY ; repeat this loop 800h times
BNE Repeat

This code will store 00h in the first 800h bytes at $7E1000. Not the fastest way to zero memory, but effective and readable nonetheless.

None of the flags in the P register are affected by STZ.


Addressing Mode    Syntax             Bytes
ab                 STZ $8000          3
ab,x               STZ $1000,X        3
d                  STZ $80            2
d,x                STZ $50,X          2




XBA - Exchange B with A

This instruction doesn't do anything to the DB register, it's just an operand-free way to swap the high and low bytes of A. The need to swap the low and high bytes of A (known as A and B for this instruction alone) pops up every now and then, so it's worth knowing about. When the M flag is set to 1, it's useful to store a temporary byte variable with XBA (the high byte of A will be otherwise untouchable, much like pushing it onto the stack). Will wonders never cease.

Another name for the A register stems from this instruction - 'C' denotes A as being 2 bytes (A = 1 byte, B = 1 byte, C = 2 bytes). A being called C rears its ugly head in the register transfer instructions (TCD instead of TAD).


Flags:
N (Negative) LDA #$8000 ; let's assume the M flag is 0 for now
XBA ; A is now 0080h - the N flag is set from bit 7 of the new low byte

Z (Zero) LDA #$00FF ; even if the M flag is set to 0,
XBA ; the zero flag is set if A's low byte becomes zero
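A quick Python sketch of the swap, with the flags taken from the new low byte only (the function name xba and the returned dictionary are just for this example):

```python
def xba(c):
    """Swap the two bytes of the 16-bit accumulator.
    N and Z follow the new low byte, whatever the M flag says."""
    swapped = ((c << 8) | (c >> 8)) & 0xFFFF
    low = swapped & 0xFF
    return swapped, {"N": low >> 7, "Z": int(low == 0)}

for c in (0x00FF, 0x8000, 0x1234):
    swapped, flags = xba(c)
    print(f"{c:04X}h -> {swapped:04X}h", flags)
```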




CMP - Compare with A

The Compare instructions, along with the branching ones, are the most fundamental ways to perform if..else analysis. In all its simple glory, you load a value into a register, 'compare' it with another value, then branch somewhere depending on the result.

The compare instructions actually simulate the SBC command in every way shape and form, EXCEPT that the register in question is never altered. Comparing does NOT alter the overflow flag, but the N, Z and C flags are all altered by the same conditions as SBC. That is, SBC #$50 and CMP #$50 would set the same flags as each other (excluding V), but the CMP wouldn't alter the A register as SBC would. And, of course, as CMP focuses on the A register, the number of bytes you fetch from the operand, and the number of bytes you compare against in A, are dependent on the M flag.

The number of uses for CMP means there's no 'ultimate' example that will display every known use for the instruction, but here is a routine use for it:


D = 0000h
M flag = 1
$000080 = 03h

LDA $80 ; A = 03h

CMP #$01 ; 03h - 01h = 02h -> Zero flag is cleared (result not zero)
BEQ Code01 ; BEQ fails because Z = 0

CMP #$02 ; 03h - 02h = 01h -> Z = 0
BEQ Code02 ; branch fails

CMP #$03 ; 03h - 03h = 00h -> Z = 1
BEQ Code03 ; BEQ succeeds - cpu jumps to Code03 (wherever that is)

BRA Normal ; if none of the previous BEQ's worked, BRA to Normal

Throughout that series of CMP instructions, the value of A remained constant. There was actually nothing stopping you replacing all the CMP #$xx instructions with LDA $80, SEC, SBC #$xx, as the correct code would have eventually been found.


Flags:
N (Negative) LDA #$1000
CMP #$2000 ; 1000h - 2000h = F000h -> most significant bit is set so N flag is set
BMI SomeCode

Z (Zero) LDA #$1000
CMP #$2000 ; 1000h - 2000h = F000h -> result not zero, so Z flag cleared
BNE SomeCode

C (Carry) LDA #$1000
CMP #$2000 ; 1000h - 2000h = F000h -> a borrow was required (A < operand), so C = 0
BCC SomeCode

The Carry flag is an interesting one for the compare instructions - the previous setting of C is obliterated after a compare is executed. That is, it wouldn't matter if you put a CLC or SEC before a compare instruction, the result would be the same.
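
If it helps to see the flag rules spelled out, here's a rough model in Python (not 65816 code, just a sketch, and the function name cmp_flags is my own invention). It assumes the compare is a plain unsigned subtraction whose result is thrown away, with width standing in for the M flag (8 when M = 1, 16 when M = 0).

```python
def cmp_flags(reg, operand, width=16):
    """Model CMP/CPX/CPY: subtract without storing, return (N, Z, C)."""
    mask = (1 << width) - 1
    reg &= mask
    operand &= mask
    result = (reg - operand) & mask
    n = (result >> (width - 1)) & 1   # top bit of the result
    z = int(result == 0)              # set when both values are equal
    c = int(reg >= operand)           # set when no borrow was needed
    return n, z, c

print(cmp_flags(0x1000, 0x2000))      # (1, 0, 0)
print(cmp_flags(0x03, 0x03, width=8)) # (0, 1, 1)
```

The previous state of the Carry flag never enters the calculation, which is why a CLC or SEC in front of a compare makes no difference.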


Addressing Mode   Syntax           Bytes
#                 CMP #$87         2*
ab                CMP $8000        3
ab,x              CMP $8000,X      3
ab,y              CMP $8000,Y      3
abl               CMP $7E3000      4
abl,x             CMP $7E3000,X    4
d                 CMP $03          2
d,x               CMP $03,X        2
(d)               CMP ($06)        2
(d),y             CMP ($06),Y      2
[d]               CMP [$09]        2
[d],y             CMP [$09],Y      2
(d,x)             CMP ($0C,X)      2
d,s               CMP $01,S        2
(d,s),y           CMP ($01,S),Y    2

* Operand is 1 byte when M flag = 1, 2 bytes if M is 0




CPX - Compare with X

This instruction functions identically to CMP, though obviously it compares the operand against X instead of A. The number of addressing modes has been drastically reduced, though CPX is rarely used for anything but Immediate addressing.

The number of bytes X is compared against (and taken into account by the subtraction) is governed by the X flag.


DB = 00h
M flag = 1
X flag = 0

LDX #$0000

Repeat
STZ $0000,x
INX
CPX #$1F00 ; once X reaches 1F00h, set the zero flag
BNE Repeat ; keeps looping until X reaches 1F00h (zeroes 1F00h bytes)

Flags:
N (Negative) LDX #$1000
CPX #$2000 ; 1000h - 2000h = F000h -> most significant bit is set so N flag is set
BMI SomeCode

Z (Zero) LDX #$1000
CPX #$1000 ; 1000h - 1000h = 0000h -> result zero, so Z flag set
BEQ SomeCode

C (Carry) LDX #$6000
CPX #$5000 ; 6000h - 5000h = 1000h -> a borrow was NOT required (X >= operand), so C = 1
BCS SomeCode


Addressing Mode   Syntax           Bytes
#                 CPX #$89         2*
ab                CPX $1200        3
d                 CPX $12          2

* Operand is 1 byte when X flag = 1, 2 bytes if X is 0




CPY - Compare with Y

This instruction is the same as CPX in every way - even addressing modes - except for the fact that it focuses on the Y register. The number of bytes Y is compared against (and taken into account by the subtraction) is governed by the X flag, which sets the width of both index registers.

For examples/flags/addressing modes, see CPX.




NOP - No Operation

This single-byte instruction simply eats up clock cycles - 2 to be exact. It doesn't alter any flags or any registers at all - just takes 2 clock cycles to run. NOP is most useful for time-sensitive hardware-related issues, such as multiplication. In the world of snes hardware multiplication/division (there is none built into the 65816, so nintendo added several registers capable of these functions), you have to wait 15 or so clock cycles after you store the values to be computed, so a common way to waste that time is with NOP.

In terms of hacking, though, NOP is a useful way to clear out unwanted checksum calculation, copy-protection routines or other unwanted code.




BRK, COP and STP

These 3 instructions, BRK (Break), COP (Coprocessor) and STP (Stop) are completely and utterly useless in the SNES universe - they are simply remnants carried over from the fact that the 65816 was actually used in real computers, computers that needed these extra interrupts.

If you really want to learn about these instructions, consult the all-knowing, all-seeing EPR.




CLC - Clear Carry Flag

This handy little instruction clears the Carry flag of the P register. Useful for setting up addition, and not a heck of a lot else.




CLD - Clear Decimal Flag

This clears the Decimal flag, thus leaving the snes in the good wholesome state of hexadecimal arithmetic.




CLI - Clear Interrupt Disable Flag

By clearing the Interrupt Disable flag in the P register, you allow IRQ interrupts to take control of the CPU when they are triggered. More specifically, the cpu will jump to the IRQ vector if you enabled the Horizontal or Vertical Interrupts. The NMI that fires when you reach the Vertical Blank (scanline 225 or so in NTSC mode) is a different story - it is non-maskable, so the I flag has no effect on it. It is switched on and off through register $4200 instead.

The actual usage of interrupts is a bit complex to explain here, and will be covered later.




CLV - Clear Overflow Flag

This clears the Overflow flag, which is only ever much use if you're attempting signed addition/subtraction (remember you trigger signed overflows when adds/subs overflow the high bit of A).




SEC - Set Carry Flag

Setting the Carry flag is always advised before a SBC instruction, and apart from that it's also useful for the ROR/ROL instructions to move a 1 into a variable's top or bottom bit.




SED - Set Decimal Flag

Setting the Decimal Flag to 1 invokes the 65816's decimal mode, where ADC and SBC treat their values as BCD (binary coded decimal) - each nibble holds one decimal digit from 0 to 9. Loads, stores and every other instruction are unaffected, and nothing is converted for you: in decimal mode, 0009h + 0001h gives 0010h instead of 000Ah.

There are some remote uses for decimal mode, such as keeping a score or counter in BCD so it can be printed on the screen one nibble per digit, but not much else.
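
As far as I know, decimal mode only changes how ADC and SBC do their arithmetic. Here's a rough Python model of a decimal-mode addition, digit by digit. The helper name adc_decimal is made up for this sketch, and it assumes both inputs are already valid packed BCD (no nibble above 9).

```python
def adc_decimal(a, operand, carry=0, digits=4):
    """Add two packed-BCD values one nibble (decimal digit) at a time."""
    result = 0
    for i in range(digits):
        digit = ((a >> (4 * i)) & 0xF) + ((operand >> (4 * i)) & 0xF) + carry
        carry = int(digit > 9)        # a decimal carry into the next digit
        if carry:
            digit -= 10
        result |= digit << (4 * i)
    return result, carry

value, carry = adc_decimal(0x0009, 0x0001)
print(hex(value), carry)              # 0x10 0
```

Each nibble of the result is a printable decimal digit, which is what makes BCD handy for scores and counters.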




SEI - Set Interrupt Disable Flag

By setting the Interrupt Disable flag you turn off any IRQ interrupts the snes tries to conjure up (the NMI is non-maskable, and has to be disabled through register $4200 instead). With both out of the way you have complete control over the cpu for as long as you want, without having to fear an interrupt appearing and destroying the stack. This instruction is almost always executed at the very beginning of most snes games, as initialization routines don't really need to know (or care) when you're entering VBlank.




AND - Bitwise And with A

The AND instruction should be immediately familiar to any C/C++ programmer - it simply performs a bitwise AND with A and the operand, storing the result in A ( A &= operand ). If you're not a C/C++ programmer (god help you), the AND instruction looks at the bits in A and the bits in the operand, then stores in A only the bits that were set in both:


A         0  1  0  1
Operand   0  0  1  1
Result    0  0  0  1


A:        11011110 11011110
Operand:  00011100 11000111
Result:   00011100 11000110


A:        10001100 10001100
Operand:  00110011 00110011
Result:   00000000 00000000

AND is a useful way to isolate certain parts of a value - AND #$0F will leave the low nibble of a variable in A for example.

And, of course, the number of bytes you AND with depends on the M flag.


$000080 = 45h
M flag = 1
D = 0000h

LDA $80 ; A = 45h
AND #$40 ; A = 40h (01000101b & 01000000b)
BNE Bit6Set ; if a bit remains set, branch (zero flag clear)

Here, we AND a variable in A with 40h, which will leave either bit 6 set or all bits clear. Testing individual bits is a typical use for AND.
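
For anyone who thinks better in a high-level language, the same bit test can be sketched in Python. This is only a model of what AND does to A and the N/Z flags. The name and_a is a made-up helper, and width stands in for the M flag (8 when M = 1, 16 when M = 0).

```python
def and_a(a, operand, width=8):
    """Model AND: return the new A plus the N and Z flags."""
    mask = (1 << width) - 1
    result = a & operand & mask
    n = (result >> (width - 1)) & 1   # N copies the top bit of the result
    z = int(result == 0)              # Z is set when no bits survive
    return result, n, z

result, n, z = and_a(0x45, 0x40)
print(hex(result), n, z)              # 0x40 0 0 -> BNE would branch
```

Because Z comes out clear whenever the tested bit survives the AND, a BNE straight afterwards acts as "branch if bit set".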


Flags:
Z (Zero) LDA #$FF ; all bits set
AND #$00 ; all bits clear -> Z flag set (11111111b & 00000000b)

N (Negative) LDA #$FF ; all bits set -> N flag set
AND #$7F ; high bit cleared -> N flag cleared


Addressing Mode   Syntax           Bytes
#                 AND #$0F         2*
ab                AND $9000        3
ab,x              AND $9000,X      3
ab,y              AND $9000,Y      3
abl               AND $819000      4
abl,x             AND $819000,X    4
d                 AND $03          2
d,x               AND $03,X        2
(d)               AND ($06)        2
(d),y             AND ($06),Y      2
[d]               AND [$F0]        2
[d],y             AND [$F0],Y      2
(d,x)             AND ($70,X)      2
d,s               AND $03,S        2
(d,s),y           AND ($03,S),Y    2

* Operand is 1 byte when M flag = 1, 2 bytes if M is 0




ORA - Bitwise OR with A

As with AND, this bitwise operator works the same as the C/C++ equivalent ( A |= operand ). For those not up to date on boolean logic, when you OR two numbers, any bits that were set in either number are set in the result. Here are the same example numbers as AND:


A         0  1  0  1
Operand   0  0  1  1
Result    0  1  1  1


A:        11011110 11011110
Operand:  00011100 11000111
Result:   11011110 11011111


A:        10001100 10001100
Operand:  00110011 00110011
Result:   10111111 10111111

ORA can be used to combine two variables (read joypad 1 and 2 at the same time, for example), as well as more advanced functions such as overlapping font tiles (variable width fonts, in other words). Also, the number of bytes computed depends on the M flag.


M flag = 0

LDA $004218 ; load player 1's joypad information (2 bytes)
ORA $00421A ; combine with player 2's joypad information (2 bytes)

That piece of code will let the game read joypad information whether the player is using joypad 1, 2 or both at once. Final Fantasy II is an example of this. ORA is also useful for setting individual bits in A, as ORA #$80 will set the negative bit of A, for instance.
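
The joypad trick is easy to model in Python. This is a sketch only: ora is a hypothetical helper, and the two pad values are invented readings, not real button layouts.

```python
def ora(a, operand, width=16):
    """Model ORA: return the new A plus the N and Z flags."""
    mask = (1 << width) - 1
    result = (a | operand) & mask
    n = (result >> (width - 1)) & 1   # N copies the top bit of the result
    z = int(result == 0)              # Z is only set if both inputs were zero
    return result, n, z

pad1 = 0x8000   # hypothetical reading from $4218
pad2 = 0x0080   # hypothetical reading from $421A
combined, n, z = ora(pad1, pad2)
print(hex(combined), n, z)            # 0x8080 1 0
```

Any button held on either pad shows up in the combined value, so the game logic only has to test one variable.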


Flags:
Z (Zero) LDA #$00 ; zero flag set
ORA #$FF ; zero flag cleared - A = FFh

N (Negative) LDA #$01
ORA #$80 ; sets high bit in A -> N flag is set


Addressing Mode   Syntax           Bytes
#                 ORA #$80         2*
ab                ORA $1000        3
ab,x              ORA $1000,X      3
ab,y              ORA $1000,Y      3
abl               ORA $7E9000      4
abl,x             ORA $7E9000,X    4
d                 ORA $43          2
d,x               ORA $43,X        2
(d)               ORA ($46)        2
(d),y             ORA ($46),Y      2
[d]               ORA [$90]        2
[d],y             ORA [$90],Y      2
(d,x)             ORA ($00,X)      2
d,s               ORA $01,S        2
(d,s),y           ORA ($01,S),Y    2

* Operand is 1 byte when M flag = 1, 2 bytes if M is 0




EOR - Exclusive OR with A

In keeping with bitwise operators, EOR performs an exclusive or between A and the operand ( A ^= operand ). When you exclusively OR on the 65816, if a bit of the operand is set, the corresponding bit in A is flipped - 0 becomes 1, 1 becomes 0. If a bit in the operand is clear, the corresponding bit in A is left untouched.


A         0  1  0  1
Operand   0  0  1  1
Result    0  1  1  0


A:        11011110 11011110
Operand:  00011100 11000111
Result:   11000010 00011001


A:        10001100 10001100
Operand:  00110011 00110011
Result:   10111111 10111111

This instruction is mostly used to get the twos-complement of a variable for the addition of negative values. The number of bytes EOR affects depends on the M flag.


M flag = 0

LDA $80 ; contains number of bytes to go backwards
EOR #$FFFF ; this and the INC perform 2's complement
INC
CLC
ADC $82 ; subtract offset from $82

That somewhat cryptic code will make sense to people familiar with binary math, but not many others. EOR does have its uses, though since this is an assembly document, not a dissection of algorithms, it won't be discussed here.
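
For the curious, the two's complement trick can be checked in a few lines of Python. This is a sketch under the assumption of 16-bit values (M flag = 0), and negate and subtract_via_add are names made up for the illustration.

```python
def negate(value, width=16):
    """EOR with all ones, then INC: the two's complement of value."""
    mask = (1 << width) - 1
    return ((value ^ mask) + 1) & mask

def subtract_via_add(base, offset, width=16):
    """CLC, ADC of the negated offset: the same as base - offset."""
    mask = (1 << width) - 1
    return (base + negate(offset, width)) & mask

print(hex(negate(0x0005)))                    # 0xfffb
print(hex(subtract_via_add(0x0100, 0x0005)))  # 0xfb
```

Adding FFFBh to a 16-bit value and throwing away the carry gives the same answer as subtracting 5, which is all the EOR/INC/ADC sequence is doing.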


Flags:
Z (Zero) LDA #$FF ; zero flag cleared
EOR #$FF ; all bits in A are flipped -> A = 00h and Z flag is set

N (Negative) LDA #$01 ; N flag is cleared
EOR #$80 ; flips high bit in A -> N flag is set in this case


Addressing Mode   Syntax           Bytes
#                 EOR #$FF         2*
ab                EOR $1E00        3
ab,x              EOR $1E00,X      3
ab,y              EOR $1E00,Y      3
abl               EOR $C01000      4
abl,x             EOR $C01000,X    4
d                 EOR $04          2
d,x               EOR $04,X        2
(d)               EOR ($07)        2
(d),y             EOR ($07),Y      2
[d]               EOR [$09]        2
[d],y             EOR [$09],Y      2
(d,x)             EOR ($0A,X)      2
d,s               EOR $01,S        2
(d,s),y           EOR ($01,S),Y    2

* Operand is 1 byte when M flag = 1, 2 bytes if M is 0




BIT - Test Bits with A

BIT is a useful instruction for testing variables without actually altering any registers - useful for times when you would normally use AND to test bits, but you don't want to corrupt whatever's in A. BIT operates in two 'modes' if you will - one where the operand is immediate ( #$???? ) and one where it's any other addressing mode.

When using immediate addressing, the operand is ANDed with A and the Z flag altered depending on the result of the AND. The A register isn't actually altered, however. This way, you could say ' BIT #$20 ' to test bit 5 of A, without trashing all the other bits.

When using any other addressing mode, the operand is ANDed with the A register (again without actually altering A) and the Z flag is set/cleared from the result. The N and V flags, however, are copied straight from the highest and second-highest bits of the memory operand itself, regardless of what's in A.

Confusing at first, but this instruction is worth understanding. As is expected, the number of bytes affected depends on the M flag.


M flag = 1

LDA $90 ; load a variable
BIT #$10 ; is bit 4 set?
BNE Bit4Set ; A is still intact at this point

In this bit of code, the byte at $90 is AND'ed with #$10 - but the result is -NOT- stored anywhere. If the result of the AND was non-zero, bit 4 of $90 must have been set so the branch succeeds.


Flags $80 = 7Fh in these examples:
Z (Zero) LDA $80 ; zero flag cleared
BIT #$0F ; if any of the low four bits are set, Z = 0

N (Negative) LDA #$80 ; N flag is set
BIT $80 ; high bit of the value at $80 (7Fh) is moved into N -> N = 0

V (Overflow) LDA #$C0 ; bits 6 & 7 of A are set
BIT $80 ; bit 6 of the value at $80 (7Fh) is moved into V -> V = 1
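
The two 'modes' of BIT can be modelled side by side in Python. This is a sketch, with bit_test being a made-up name. It assumes, as far as I know the chip behaves, that in the non-immediate form N and V are copied from the top two bits of the memory operand itself rather than from the AND result.

```python
def bit_test(a, operand, immediate, width=8):
    """Model BIT: A is never changed, only flags are returned."""
    mask = (1 << width) - 1
    flags = {"Z": int((a & operand & mask) == 0)}
    if not immediate:
        flags["N"] = (operand >> (width - 1)) & 1   # top bit of memory
        flags["V"] = (operand >> (width - 2)) & 1   # next bit down
    return flags

print(bit_test(0x7F, 0x0F, immediate=True))    # {'Z': 0}
print(bit_test(0x80, 0x7F, immediate=False))   # {'Z': 1, 'N': 0, 'V': 1}
```

The immediate form leaves N and V alone, which is why the first call only reports Z.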


Addressing Mode   Syntax           Bytes
#                 BIT #$C0         2*
ab                BIT $1000        3
ab,x              BIT $1000,X      3
d                 BIT $04          2
d,x               BIT $04,X        2

* Operand is 1 byte when M flag = 1, 2 bytes if M is 0




TSB - Test and Set Bits with A

This instruction is basically a shortcut for OR'ing two numbers and storing the result. Firstly, it logically OR's the operand with the A register, then stores the result at the operand. That just saves you having to execute ORA followed by STA. The number of bytes affected depends on the M flag.

One quirk of TSB is in its setting of the Z flag. Instead of setting the flag if A OR'ed with the operand is zero, it sets it if A AND'ed with the operand is zero. In this respect, it sets the Z flag under the same conditions as BIT would.


M flag = 0

LDA #$8000 ; A = 8000h (negative)
TSB $80 ; $80 = A | $80

LDA #$8000 ; this sets $80 the same as the code above
ORA $80
STA $80

In that piece of code, the value at $80 (whatever it is) has its highest bit set. In the second bit of code using ORA, the A register would have been changed from 8000h to whatever A | $80 was. The TSB command does not alter A under any circumstances.


Flags
Z (Zero) LDA #$80 ; zero flag cleared
TSB $55 ; Z flag is set if A AND'ed with operand is zero


Addressing Mode   Syntax           Bytes
ab                TSB $C000        3
d                 TSB $04          2




TRB - Test and Reset Bits

TRB is similar to TSB in that it replaces another common action - zeroing certain bits in a memory location. Quite simply, TRB zeroes any bits at a memory location that are set in A. So, if bit 7 of A is set, bit 7 of the memory location will be cleared. The actual logic behind this instruction is to get the complement of A (flip every bit), AND it with the memory location, then store the result at that memory location. It is identical to performing EOR #$FF, AND memory, STA memory, though A is not altered. The M flag affects the number of bytes computed.


M flag = 1

LDA #$80 ; high bit set
TRB $80 ; high bit of $80 will now be cleared

To figure this out manually, first flip all the bits in A, giving you 7Fh (80h = 10000000b, 01111111b = 7Fh). Then AND this with the memory location, and it's obvious the high bit will be zeroed and the rest untouched, regardless of what is already at that memory location.


Flags
Z (Zero) LDA #$FF ; zero flag cleared - this will reset all bits at a memory location
TRB $55 ; Z flag is set if A AND'ed with the operand is zero (same test as TSB)
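
TSB and TRB are close enough to model together. Here's a Python sketch where each function returns the new memory value and the Z flag. The function names are just the mnemonics in lowercase, and it assumes (as far as I know) that both instructions set Z from A AND'ed with the original memory value.

```python
def tsb(a, memory_value, width=8):
    """Set in memory every bit that is set in A."""
    mask = (1 << width) - 1
    z = int((a & memory_value & mask) == 0)
    return (memory_value | a) & mask, z

def trb(a, memory_value, width=8):
    """Clear in memory every bit that is set in A."""
    mask = (1 << width) - 1
    z = int((a & memory_value & mask) == 0)
    return memory_value & ~a & mask, z

print(tsb(0x80, 0x55))   # (213, 1) -> memory becomes D5h
print(trb(0x80, 0xFF))   # (127, 0) -> memory becomes 7Fh
```

In both cases A comes out exactly as it went in, which is the whole point of using these over ORA/AND followed by STA.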


Addressing Mode   Syntax           Bytes
ab                TRB $C000        3
d                 TRB $04          2




INC - Increment Memory or A

INC is an extremely common, and extremely simple, instruction. It simply adds 1 to the operand, be it the A register or a memory location. The number of bytes the increment affects (in A or a memory location) is 1 or 2, depending on the M flag, but with the flags set by INC you could easily expand it to a 4 byte counter if needs be. Also, it has become a common feature in assemblers that having INC by itself with no operand corresponds to INC A.


M flag = 0

LDA $90 ; A = ?
CLC
ADC #$8000 ; add #$8000 to $90
STA $90
BCC NoCarry ; if the result's less than 10000h, skip the INC

INC $92 ; if a carry occurred, inc the 3rd and 4th bytes of $90 (32 bit counter)

NoCarry
RTS

Here's your basic 32-bit counter code - if the answer's too big for $90, increment $92 ($90 = low, $91 = high, $92 = higher, $93 = highest).
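
The same counter logic, modelled in Python as a sanity check. This is a sketch only: add_to_counter32 is a hypothetical name, and the two arguments stand for the 16-bit words at $90 and $92.

```python
def add_to_counter32(low_word, high_word, amount):
    """Add a 16-bit amount to a 32-bit counter kept as two 16-bit words."""
    total = low_word + amount
    carry = total > 0xFFFF                     # what BCC tests after the ADC
    low_word = total & 0xFFFF
    if carry:
        high_word = (high_word + 1) & 0xFFFF   # the INC $92
    return low_word, high_word

low, high = add_to_counter32(0x9000, 0x0000, 0x8000)
print(hex(low), hex(high))                     # 0x1000 0x1
```

When the low word doesn't overflow, the high word is left untouched, exactly as the BCC skipping the INC would do.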


Flags:
Z (Zero) LDA #$00 ; zero flag set
INC ; A = 01h, zero flag cleared

N (Negative) LDA #$7F ; N flag is clear
INC ; A = 80h, N flag is set


Addressing Mode   Syntax           Bytes
A                 INC A            1
ab                INC $1100        3
ab,x              INC $1100,X      3
d                 INC $20          2
d,x               INC $20,X        2




INX - Increment X

INX is ridiculously simple - it adds 1 to the X register. That's about it, really. Whether 1 or 2 bytes in X are affected depends on the X flag's setting.


M flag = 1
X flag = 0

LDX #$0000
LDY #$1000

Repeat
STZ $2000,x ; zero 1000h bytes at $2000
INX
DEY
BNE Repeat

Flags:
Z (Zero) LDX #$FFFF ; zero flag cleared
INX ; X = 0000h, zero flag set

N (Negative) LDX #$7FFF ; N flag is clear
INX ; X = 8000h, N flag is set




INY - Increment Y

See INX - this instruction works exactly the same.




DEC - Decrement Memory or A

The opposite to INC, this instruction subtracts 1 from either A or a memory address. Much the same as INC, it has too many uses to bother listing. The M flag controls how many bytes are affected by the decrement.


M flag = 0

LDA #$0800 ; a surprising number of squaresoft games do this

WasteTime
DEC
BNE WasteTime

Flags:
Z (Zero) LDA #$00 ; zero flag set
DEC ; A = FFh, zero flag cleared

N (Negative) LDA #$00 ; N flag is clear
DEC ; A = FFh, N flag is set


Addressing Mode   Syntax           Bytes
A                 DEC A            1
ab                DEC $1000        3
ab,x              DEC $1000,X      3
d                 DEC $00          2
d,x               DEC $00,X        2




DEX - Decrement X

Decrement X does exactly what you think it should - subtracts 1 from X. The number of bytes in X affected depends on the X flag.


M flag = 0
X flag = 0
DB = 00h

LDX #$1000

ZeroVRAM
STZ $2118 ; write 0000h into the SNES video ram ($2118 is another register)
DEX
BNE ZeroVRAM

Flags:
Z (Zero) LDX #$0001 ; zero flag clear
DEX ; X = 0000h, zero flag set

N (Negative) LDX #$0000 ; N flag is clear
DEX ; X = FFFFh, N flag is set




DEY - Decrement Y

See DEX.




TAX - Transfer A to X

TAX is the first of the register transfer instructions - operand free ways to copy one register's contents to another without resorting to push/pull instructions. There are a number of reasons you'd want to use these instructions - they're fast (2 cycles versus 7 for push/pull), can help avoid messy use of the stack, and can supplement instructions with few addressing modes. By supplementing, I mean you could use LDA's absolute long addressing to fetch a value, then TAX it to X, getting around the fact that LDX can't use absolute long addressing.

One interesting quirk about the transfer instructions is how many bytes they copy - what if the M flag is 1 but the X flag is 0? What about the other way around? To deal with this, each of the transfer instructions has a rule governing how many bytes to copy. In the case of TAX:

The number of bytes transferred is the current width of X, as set by the X flag

So, whenever the X flag is 0, the full 2 bytes of A are transferred across. When the X flag is 1, only the low byte of A is transferred to the low byte of X.


M flag = 0
X flag = 0

LDA [$03],y ; LDX doesn't have the [d],y addressing mode
TAX ; transfer the 2 bytes loaded to X

Flags:
Z (Zero) LDA #$0000 ; zero flag set
LDX #$0001 ; zero flag clear
TAX ; zero flag set

N (Negative) LDA #$8000 ; N flag set
LDX #$0000 ; N flag clear
TAX ; N flag set
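
The width rule can be modelled in a few lines of Python. This is a sketch with a made-up function name, and it takes the full 16-bit C (A & B) as input since that's what TAX reads from when the X flag is 0, even if the M flag is 1.

```python
def tax(c, x_flag):
    """Model TAX: return the new X plus the N and Z flags."""
    width = 8 if x_flag else 16
    x = c & ((1 << width) - 1)        # only the low byte when X flag = 1
    n = (x >> (width - 1)) & 1        # top bit of whatever was transferred
    z = int(x == 0)
    return x, n, z

print(tax(0x8000, x_flag=0))          # (32768, 1, 0)
print(tax(0x8000, x_flag=1))          # (0, 0, 1)
```

The same value in A gives opposite flags depending on the X flag, because with 8-bit index registers only the low byte (00h here) ever reaches X.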




TAY - Transfer A to Y

Works exactly the same as TAX (follows the same rule and all), but copies A's contents to Y.

The number of bytes transferred is controlled by the setting of the X flag




TCD - Transfer C (A & B) to D

This instruction calls the A register C, and for good reason. As 'C' denotes the high and low bytes of A, it means that 2 bytes are always transferred, regardless of the M flag's setting.

TCD is useful for setting up a new direct page somewhere other than 0000h. This has been used in commercial games to allocate temporary memory and allow a low-level implementation of threads. For example, most games set their D register to something different when talking to the SPC, so 1-byte operands can be used where 2 would be required normally.

2 bytes are always transferred by TCD

Flags are affected in the same way as TAX




TCS - Transfer C to S

This instruction allows you to change where the stack register points. This is helpful for initializing the snes as the stack defaults to the position 01FFh, which doesn't leave much room for allocating memory and such. As 'C' is used in the instruction mnemonic, 2 bytes are always transferred.

2 bytes are always transferred by TCS

No flags are affected by TCS




TDC - Transfer D to C

Again, since C is in the mnemonic, it means 2 bytes are always transferred. TDC is a useful way to zero A (LDA #$0000), as the D register is normally set to 0000h. There are times it isn't, though, which can cause huge problems if you're expecting 0000h instead of 1E00h.

2 bytes are always transferred by TDC

Flags are affected in the same way as TAX.




TSC - Transfer Stack to C

As the name implies, this instruction transfers the 2 bytes in the stack register to the A register, useful for allocating memory in front of the stack. Flags are affected in the same way as TAX.

2 bytes are always transferred by TSC


M flag = 0

TSC ; A = S
SEC
SBC #$000F ; A = S - 0Fh, an address 0Fh bytes in front of the stack
TCD ; $00 would now access the memory 0Fh bytes in front of the stack

Allocating memory in this fashion can get both extremely messy and complex, but as it is extremely useful in 65816 coding it will be covered in the next section.




TSX - Transfer S to X

This instruction is almost the same as TSC, but in this case the transfer isn't always 2 bytes in size. If the X flag is set to 1, only the stack's low byte is moved to X's low byte. Otherwise, 2 bytes are transferred. Apart from that, the P register's flags are affected identically to TAX.

The number of bytes transferred is governed by the X flag




TXA - Transfer X to A

The opposite to TAX, this transfers the contents of X to A. Flags affected are the same as TAX.

The M flag governs the number of bytes transferred. If the X flag = 1, the high byte of the transfer will be 00h




TXS - Transfer X to S

TXS is used for the same reasons as TCS - to set the stack to wherever you want it. As 2 bytes are always transferred, if X is 1 byte wide the high byte transferred will be 00h. As with TCS, no flags are affected by TXS.

2 bytes are always transferred. If the X flag = 1, the high byte of the transfer will be 00h




TXY - Transfer X to Y

TXY is useful for addressing modes where only Y can be used, such as (d),y and [d],y. The number of bytes transferred depends on the X flag. Flags are affected identically to TAX.

The number of bytes transferred is dictated by the X flag




TYA - Transfer Y to A

This instruction is identical to TXA in every way, except Y is transferred instead of X.




TYX - Transfer Y to X

See TXY, but with the two index registers reversed.




JMP - Jump to new PC

JMP allows you to jump to any bit of code inside the current bank, as dictated by PB. Although sounding similar to BRL, its extra addressing modes set it apart, allowing complex structures like indirect jump tables to be created simply and easily. JMP doesn't record the current PC address in the stack like JSR does, so you can't simply return from whatever code you jump to. This problem can be solved with the PER instruction, however, or simply with JSR's own indirect indexed capability.

Indirect jumps are a favorite of many games that use IRQ interrupts - you point the interrupt to a RAM location so you can continually change where the IRQ jumps to.


M flag = 1
X flag = 1

LDA $40 ; load a variable
ASL ; double it
TAX ; use X as the index to..
JMP ($9000,x) ; a jump table

In this bit of code, we load up whatever is in $40, double it (as each member of a jump table requires 2 bytes), transfer it to X, then jump 'through' that location. To expand on that, $9000 contains a number of 2 byte addresses to code in the current PB bank. By adding (variable*2) to the base location of $9000, we jump to the piece of code corresponding to that variable. Jump tables are used in almost every snes game ever made, often as a quick replacement for huge chains of CMP, BEQ statements.

No flags are altered by JMP.
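
Here's the jump table lookup modelled in Python, in case the indirection is hard to picture. It's a sketch only: jump_table_target is a hypothetical name, and the table contents in rom are invented for the example.

```python
def jump_table_target(memory, table_base, variable):
    """Return the 16-bit address JMP (table_base,x) would jump to."""
    x = (variable << 1) & 0xFF                    # ASL, then TAX (8-bit X)
    low = memory[(table_base + x) & 0xFFFF]       # addresses are stored
    high = memory[(table_base + x + 1) & 0xFFFF]  # low byte first
    return (high << 8) | low

# Hypothetical table at $9000 holding three 2-byte code addresses
rom = {0x9000: 0x00, 0x9001: 0xA0,
       0x9002: 0x40, 0x9003: 0xA1,
       0x9004: 0x80, 0x9005: 0xA2}

print(hex(jump_table_target(rom, 0x9000, 1)))     # 0xa140
```

Adding a new case is just a matter of appending another 2-byte address to the table, with no extra compare and branch needed.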


Addressing Mode   Syntax           Bytes
ab                JMP $99A0        3
(ab)              JMP ($1000)      3
(ab,x)            JMP ($1000,X)    3




JML - Jump Long

JML is an extension to JMP that allows the PB register to be altered as well - letting you jump to any code in the 65816's full 24-bit address space. It doesn't have the (ab,x) addressing mode of JMP, but its full 24-bit range can be very useful. No flags are altered by JML.


Addressing Mode   Syntax           Bytes
abl               JML $C00000      4
[ab]              JML [$0200]      3




JSR - Jump to Subroutine

JSR is an extremely useful instruction. It firstly pushes a 16-bit return address onto the stack (strictly speaking the address of the JSR's last byte, one less than the address of the following instruction), then jumps to the code pointed to by the operand. When a RTS instruction is encountered, the address that was pushed onto the stack is pulled, incremented and placed into PC, thus returning the cpu to the code just following the JSR. In simple terms, it lets you jump to some code, process something, execute RTS, and return to where the JSR was called. Same principle as a function call in C/C++. As with JMP and JML, no flags are altered.


X Flag = 1

LDX #60 ; X = 60 decimal

Waiter
JSR Wait ; CPU pushes the return address onto the stack, then jumps to Wait
DEX ; count down X
BNE Waiter
BRA Done ; finished - skip over the Wait subroutine instead of falling into it

Wait
WAI ; waits for an interrupt to occur
RTS ; once an interrupt hits, return to wherever the JSR came from

Done
; execution carries on here

This code actually does something - wastes 1 second of cpu time to be specific. First of all, X is set to 60, which is the number of times the NMI interrupt hits every second. Then, we JSR to Wait, which puts the 65816 into a power-down state until an interrupt hits. Once the interrupt has been triggered, the RTS pulls the address of DEX back off the stack and into the PC register, forcing the cpu to continue processing at DEX.
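
What JSR and RTS do to the stack can be modelled in Python. This is only a sketch: the Stack class and function names are made up, the addresses are invented, and it assumes (as far as I know) that JSR pushes the address of its own last byte, high byte first, with RTS adding 1 after pulling it.

```python
class Stack:
    """A tiny model of the 65816 stack: S points at the next free byte."""
    def __init__(self, s=0x01FF):
        self.memory = {}
        self.s = s

    def push(self, byte):
        self.memory[self.s] = byte & 0xFF
        self.s = (self.s - 1) & 0xFFFF

    def pull(self):
        self.s = (self.s + 1) & 0xFFFF
        return self.memory[self.s]

def jsr(stack, jsr_address, target):
    """Push the address of the JSR's last byte, then return the new PC."""
    return_address = (jsr_address + 2) & 0xFFFF   # JSR is 3 bytes long
    stack.push(return_address >> 8)               # high byte first
    stack.push(return_address & 0xFF)
    return target

def rts(stack):
    """Pull 2 bytes, add 1, and return the new PC."""
    low = stack.pull()
    high = stack.pull()
    return (((high << 8) | low) + 1) & 0xFFFF

stack = Stack()
pc = jsr(stack, 0x8000, 0x9000)
print(hex(pc), hex(rts(stack)))                   # 0x9000 0x8003
```

If the subroutine pushes a byte and forgets to pull it, rts() picks that byte up as half of the return address and the cpu ends up somewhere unintended.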


Addressing Mode   Syntax           Bytes
ab                JSR $9000        3
(ab,x)            JSR ($9000,X)    3




JSL - Jump to Subroutine Long

JSL is an extension of JSR Absolute, allowing full 24-bit jumps to anywhere in the 65816 address space. As such, it pushes 3 bytes onto the stack to record the next instruction, and requires a RTL instruction to return, instead of RTS.


Addressing Mode   Syntax           Bytes
abl               JSL $ED4000      4


RTS - Return from Subroutine

This is the companion instruction to JSR, allowing you to return to whatever code called it. Though it sounds fun and wholesome, all RTS does is pull two bytes off the stack and put them into the PC register. Unfortunately, if you've been pushing and pulling a lot of values in your subroutine, a misplaced RTS will simply pull off of the stack whatever the last thing was that you pushed on.

Whenever you're using subroutines, you have to make sure all your push actions have been pulled off at some point, or a RTS will direct the cpu to who-knows-where.




RTL - Return from Subroutine Long

As was the case with RTS/JSR, RTL allows you to return from a JSL instruction back to the original code. RTL pulls 2 bytes off the stack and into PC, then a third byte into PB. As always, caution must be taken to make sure the stack has had everything pulled back off that was pushed after the JSL, or horrid things will happen.




RTI - Return from Interrupt

RTI is yet another return-from-subroutine type instruction, but specially tailored for interrupts. When an interrupt hits, it immediately causes the 65816 to jump to the appropriate vector. Before this jump, the interrupt calling routine will push the 3 bytes of the next instruction onto the stack, followed by the P register (equivalent of JSL, PHP). Because of this extra byte being pushed, RTI is designed to perform a PLP instruction, then dump 3 more bytes from the stack into PC and PB registers. Despite RTI automatically preserving the P register for you, almost all commercial games decide to PHP immediately inside the interrupt code anyway. Go figure, I guess.




PEA - Push Effective Absolute

The name of this instruction is quite misleading - it actually pushes the 2-byte operand straight into the stack. A name of Push Effective Immediate would have made more sense. It's useful for times when you want to get a certain value straight into the stack, without first having to load it into a register then pushing it.

The number of bytes pushed is always 2, regardless of any setting in the P register. No flags are affected.


M flag = 1

LDA #$00
PHA
LDA #$FF
PHA

PEA $00FF

The two pieces of code above perform the same action, though in the first bit A is overwritten, which may not be desired. The syntax for PEA is interesting - it appears to mean "push onto the stack the two bytes at $00FF", however this is definitely NOT the case. It simply means "push 00h then FFh onto the stack" - high byte first, so the value ends up in memory low byte first like any other 16-bit value.
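
The push order is easier to see in a model. Here's a Python sketch (pea is a made-up function name, and the starting stack pointer of 01FFh is just an example) that assumes, as far as I know the chip behaves, that 16-bit pushes go high byte first.

```python
def pea(memory, s, operand):
    """Push a 16-bit operand, high byte first; return the new S."""
    memory[s] = (operand >> 8) & 0xFF   # high byte lands at the higher address
    s = (s - 1) & 0xFFFF
    memory[s] = operand & 0xFF          # low byte lands just below it
    s = (s - 1) & 0xFFFF
    return s

ram = {}
s = pea(ram, 0x01FF, 0x00FF)
print(hex(s), hex(ram[0x01FF]), hex(ram[0x01FE]))   # 0x1fd 0x0 0xff
```

Because the stack grows downwards, pushing the high byte first leaves the pair in the usual low-byte-first order in memory, so a later 2-byte pull reads it back correctly.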


Addressing Mode   Syntax           Bytes
#                 PEA $5050        3




PEI - Push Effective Indirect

This friendly looking instruction pushes 2 bytes onto the stack through a location on the D-Page. Useful for the same reasons as PEA. Two bytes are always pushed by PEI, and no flags are altered.


M flag = 1

LDA $01
PHA
LDA $00
PHA

PEI ($00)

Both the above pieces of code give the same end result - the 2 bytes stored at $00 and $01 on the D-Page (the pointer itself, not whatever it points to) are pushed onto the stack.


Addressing Mode   Syntax           Bytes
(d)               PEI ($01)        2




PER - Push Effective Relative

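PER is another of the load-directly-into-the-stack instructions, this time pushing the 2 byte result of adding the operand to the address of the instruction following the PER. Apart from pushing a return address before a JMP, I can think of few uses for this off the top of my head.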


SEP - Set Bits in P




REP - Reset Bits in P




XCE - Exchange Carry with Emulation Bit

As is implied by the name, this instruction allows you to set/clear the hidden emulation flag of the 65816. At some point, all snes games are going to execute the famous CLC, XCE sequence to get the cpu out of emulation mode, which it thoughtfully starts up in.

After the XCE has been executed, the Carry flag is assigned the previous value of the Emulation bit. In the case of a CLC, XCE at startup, the Carry flag would be set to 1 afterwards. This has all kinds of uses if you're writing an NES emulator and want to know what mode you're in (?).




WAI - Wait for Interrupt

As demonstrated earlier, executing this instruction makes the 65816 idle until an interrupt hits. If IRQs are disabled (I flag = 1), the cpu still wakes up when an interrupt arrives, but instead of jumping to the IRQ vector it simply continues with the instruction after the WAI.



Congrats!

Now that you're completely fluent with the opcodes (hohoho), it's time to use them in some meaningful code. The next section covers assembly techniques and common, useful bits of code for any hacking work - Coding in 65816 Assemblers.

