asm hacking: "faking" short jumps from your code (in a new bank) to old code

A technique for assembler hacking. There's a lot of explanation here of what I'm doing. The actual code is very short. You can just scroll down and copy it if you don't want to understand it. But I wouldn't recommend that (problems might crop up).

So: I've been lurking these forums for the last while. After reading a bunch of documents, I've finally started toying around with actual hacking. I have fairly ambitious plans, but I don't want to make any promises or go into detail until I have some degree of success.

Anyway: I've noticed people complaining about the difficulty of inserting new code or making significant changes to code in existing banks. In particular, C1 (where a lot of complex graphic routines seem to reside) seems to have only a few bytes of free space, which means you can't really add any new subroutines, nor make existing ones significantly longer. You can't just use another bank, because interfacing between banks is difficult unless the code is specifically designed for it. The destination bank has to "know" the jump comes from another bank, and use the appropriate RTL statement, instead of a regular RTS.

In the existing code, there are subroutines that have their own "long access" helper subroutines. These tiny subroutines just accept long jumps, then make a short jump to the subroutine they provide access to, and then RTL. This means the code can be accessed from another bank, and also with a regular JSR from the same bank.

Now, you could just write your own new "long access" subroutines and insert them into unused bytes in the destination bank. But every subroutine you want to jump to from a separate bank needs its own unique access subroutine. C1 in particular has so few free bytes, and so many separate tiny subroutines, that this is a problem. (It is also a problem in other banks.)

Here's my solution. This allows us to write our own code in a brand new bank in an expanded rom, and have it call any subroutine in C1 or any existing bank. Only two bytes of new code have to be inserted into the destination bank.

The two bytes I added to C1:
Code:
01FFE5: 60 6B

This is right after the last byte of existing code. It is an RTS followed by an RTL. Most of the work is done in the new bank, where we presumably have a lot of space. Our new code will jump to these two bytes in the destination bank, after pushing the right values onto the stack.
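If you want to double-check where those two bytes go in the ROM file, here is a minimal Python sketch of the address arithmetic. It assumes a HiROM layout where bank $C0 starts at file offset 0, which is consistent with the offset shown above. The function name and the `header` parameter (0x200 for a ROM with a copier header) are my own illustration, not part of any tool.

```python
def hirom_file_offset(bank: int, addr: int, header: int = 0) -> int:
    """Map a CPU address in banks $C0-$FF to a ROM file offset.

    Assumption: HiROM layout, bank $C0 maps to file offset 0.
    `header` is 0x200 if the ROM file has a copier header, else 0.
    """
    if not 0xC0 <= bank <= 0xFF:
        raise ValueError("bank outside $C0-$FF")
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address outside $0000-$FFFF")
    return ((bank - 0xC0) << 16) + addr + header


# C1/FFE5 is where the RTS byte goes; the RTL byte follows it.
print(f"{hirom_file_offset(0xC1, 0xFFE5):06X}")  # 01FFE5
```

If you put the two bytes somewhere else, run the same calculation with your own bank and address.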

Now, how to fake a short jump.

Write your new ASM code, and use this macro wherever you would do a JSR if you were in the same bank.
Code:
_long_short_jump_c1 .MACRO c1addr
phk
per #9
pea #$FFE5
pea c1addr-1
jmp >$c1ffe5
.ENDM

(Look up these opcodes if you want to edit hexes by hand.)
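For hand editing, here is a rough Python sketch that spells out the 14 bytes the macro should produce. The opcode values are the standard 65816 ones (PHK $4B, PER $62, PEA $F4, long JMP $5C). I am assuming the assembler emits the 9 as a raw 16-bit displacement for PER. The function name and the target address $1234 are made up for the example.

```python
def encode_macro(c1addr: int, stub: int = 0xFFE5, bank: int = 0xC1) -> bytes:
    """Return the machine code for one expansion of the macro.

    c1addr: 16-bit address of the destination subroutine (hypothetical here).
    stub:   16-bit address of the RTS/RTL pair in the destination bank.
    """
    ret = (c1addr - 1) & 0xFFFF  # RTS adds 1 after pulling, so push target-1
    return bytes([
        0x4B,                                # PHK
        0x62, 0x09, 0x00,                    # PER, displacement 9
        0xF4, stub & 0xFF, stub >> 8,        # PEA stub address
        0xF4, ret & 0xFF, ret >> 8,          # PEA c1addr-1
        0x5C, stub & 0xFF, stub >> 8, bank,  # JMP long to bank:stub
    ])


print(encode_macro(0x1234).hex(" ").upper())
# 4B 62 09 00 F4 E5 FF F4 33 12 5C E5 FF C1
```

Operands are stored low byte first, which is why $FFE5 shows up as E5 FF.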

Anyway, to go through the logic step by step:

phk: push the current program bank onto the stack. Then...
per #9: push onto the stack the address of the instruction that follows the PER, plus a constant value of 9.

With the above, you push onto the stack exactly what would be pushed onto the stack if you did a long JSR (aka JSL) from this part of the code. These values will be pulled when an RTL executes, making the cpu return here.

The +9 is important. The three instructions after the PER take up 3 + 3 + 4 = 10 bytes, so +9 sets the address on the stack to the last byte of the last instruction that is part of this macro. When we do an RTL and pull these values from the stack, the program counter is incremented by 1 before the next instruction is read. This means we will resume reading code right after this macro.
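The arithmetic behind the 9 can be checked with a few lines of Python. This is only a sketch: the instruction sizes are the standard 65816 encodings, and the macro start address $8000 is a hypothetical value picked for the example.

```python
# Size in bytes of each instruction in the macro, in order.
SIZES = [("phk", 1), ("per", 3), ("pea_stub", 3), ("pea_target", 3), ("jml", 4)]


def per_displacement() -> int:
    """Displacement that makes PER push the address of the macro's last byte."""
    names = [name for name, _ in SIZES]
    after_per = sum(size for _, size in SIZES[names.index("per") + 1:])
    return after_per - 1  # last byte, not one past the end


def resume_address(macro_start: int) -> int:
    """Where RTL lands: pushed address plus 1 (16-bit, same bank)."""
    next_after_per = macro_start + 1 + 3  # skip PHK and PER
    pushed = next_after_per + per_displacement()
    return (pushed + 1) & 0xFFFF


print(per_displacement())                # 9
print(f"{resume_address(0x8000):04X}")   # 800E, i.e. start + 14 bytes
```

If you add or remove instructions between the PER and the end of the macro, the displacement has to change to match.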

pea #$FFE5: push the absolute 16-bit address in C1 where we wrote those two new bytes. (Obviously, use a different value if you wrote them to a different local address.)

This pushes the same thing that a 16-bit JSR command would push if its last byte were at $C1FFE5 (a JSR pushes the address of its own last byte, and the matching RTS adds 1). So, we're faking that we've made a short jump from our two-byte access function, to somewhere else.

pea c1addr-1: The c1addr-1 is a variable in the macro, not a literal value. What you literally want to push to the stack is the 16-bit address of your destination subroutine, minus 1. The minus 1 is important, unless you want to start on the second byte of the destination subroutine (you don't).
jmp >$c1ffe5: Finally we do a long jump to the entry-point in C1. Note this is a plain long JMP, and not a subroutine call like JSR or JSL. That means it does not in itself push anything to the stack. It is not designed to be paired with a return statement.

OK, so we jump to $c1ffe5. The first thing we'll read is an RTS. This is a (short-address) return from subroutine. But, counter to its name, we're using it to jump to the destination subroutine. Got it?

The top values on the stack are the local address of the destination subroutine. We pushed those there ourselves. So, we "return" to our destination code.

It doesn't matter that we never actually "left" from that location, nor that there is not actually a JSR statement there. Nothing stops us from using a "return" statement in the opposite way intended.

So then, the subroutine in C1 will execute as normal. At the end it will call its own RTS. This pulls $FFE5 from the top of the stack (we put it there manually), adds 1, and lands on $FFE6, where the second opcode we wrote is waiting: the RTL. The top of the stack is now the appropriate 24-bit address in our new block of code (again, we wrote it ourselves in our macro). So, we'll return to our new code, and continue executing as normal.
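To see the whole round trip in one place, here is a small Python model of the stack. It is a sketch under assumptions: the new bank $D0, the macro location $8000 and the target $1234 are hypothetical, and the destination subroutine is assumed to leave the stack balanced before its own RTS. Only the pushes and pulls are modelled, nothing else about the CPU.

```python
def simulate(macro_start: int, c1addr: int, new_bank: int = 0xD0):
    """Trace where execution goes for one macro call. Returns (bank, pc) tuples."""
    stack = []  # end of the list is the top of the stack
    trace = []

    def push8(value):
        stack.append(value & 0xFF)

    def push16(value):
        push8(value >> 8)  # high byte first, as the CPU does
        push8(value)

    def pull16():
        low = stack.pop()
        high = stack.pop()
        return (high << 8) | low

    # The macro itself.
    push8(new_bank)                   # phk
    push16(macro_start + 4 + 9)       # per #9 (address after PER, plus 9)
    push16(0xFFE5)                    # pea #$FFE5
    push16((c1addr - 1) & 0xFFFF)     # pea c1addr-1
    bank = 0xC1                       # jmp >$c1ffe5 pushes nothing

    # RTS at C1/FFE5: pull 16 bits, add 1.
    trace.append((bank, (pull16() + 1) & 0xFFFF))
    # The destination subroutine runs, then hits its own RTS.
    trace.append((bank, (pull16() + 1) & 0xFFFF))
    # RTL at C1/FFE6: pull 16 bits, add 1, then pull the bank.
    pc = (pull16() + 1) & 0xFFFF
    bank = stack.pop()
    trace.append((bank, pc))

    assert not stack  # everything we pushed has been consumed
    return trace


for bank, pc in simulate(0x8000, 0x1234):
    print(f"{bank:02X}/{pc:04X}")
# C1/1234
# C1/FFE6
# D0/800E
```

The three lines of output are the destination subroutine, the RTL in the stub, and the first byte after the macro in the new bank.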

I've tested this by copying the NMI routine used to update graphics in battle (originally found at C1/0BAA) to a new bank. This subroutine happens to call a ton of other C1 subroutines, so this technique is very necessary. I made the necessary replacements to the JSR statements, and changed nothing else. Then I redirected the NMI to my new code (the address is set in C2).

It works like a charm. The game behaves exactly as normal, which it should, since the code is essentially identical, but in a new location.

Now that I have a proof of concept, I can change the code as I see fit without any significant limitations on space, and still interface with the existing code.

Theoretically, these extra pushes and jumps entail some overhead CPU cycles. I don't know if this will be a problem. There is no significant slowdown in my snes9x emulator. Since the NMI function is called every frame, and it uses a lot of in-bank jumps, this would be one of the "worst" functions to move to a new bank. But I don't notice any issues. Will it be a problem in another emulator, or on a slower computer, or with real hardware? I don't think so, but I can't say.
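For a rough idea of the overhead, here is a back-of-the-envelope Python sketch. The numbers are the nominal cycle counts from the 65816 instruction tables. It ignores memory speed and anything else the hardware does, so treat the result as an estimate rather than a measurement. The call count in the usage line is a made-up figure.

```python
# Nominal 65816 cycle counts per instruction (assumed from the opcode tables).
CYCLES = {"PHK": 3, "PER": 6, "PEA": 5, "JML": 4, "RTS": 6, "RTL": 6, "JSR": 6}


def extra_cycles_per_call() -> int:
    """Cycles the macro costs beyond the plain JSR it replaces."""
    macro = CYCLES["PHK"] + CYCLES["PER"] + 2 * CYCLES["PEA"] + CYCLES["JML"]
    stub = CYCLES["RTS"] + CYCLES["RTL"]  # the two bytes added to C1
    return macro + stub - CYCLES["JSR"]


def extra_cycles(calls: int) -> int:
    """Total extra cycles for a given number of macro calls."""
    return calls * extra_cycles_per_call()


print(extra_cycles_per_call())  # 29
print(extra_cycles(20))         # 580, for a hypothetical 20 calls
```

The destination subroutine's own RTS is not counted, since it runs either way.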

So anyway, this technique is now known, if it wasn't before. I hope it helps some people with their asm hacks.



Messages In This Thread
asm hacking: "faking" short jumps from your code (in a new bank) to old code - by Eggers - 07-27-2013, 05:45 PM
