Expanded palette hack

#11

11-28-2019, 12:26 AM (This post was last modified: 11-28-2019, 08:56 AM by assassin. Edit Reason: idea for another section; then 1st section's code didn't work; then had "STA $nnC0,X"s backwards. that's like 9 edits; abandoning thread for 24+ hrs! )

assassin


as for Square code, there's a loop that includes two sequences of "ASL / ROR $10" to reverse the bit order in bytes. what i have now tweaks the ends, saving 3 * 2 = 6 cycles per loop iteration (of 64).

but it just dawned on me: we're reversing two bytes back-to-back, so intersperse the fuckers!

Code:
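LDA $03C0,X
STA $10        ; scratch copy of the first byte
LDA $10C0,X    ; second byte goes in A
; repeat the next 2 instructions 8 times
ROL            ; A bit 7 -> carry; carry -> A bit 0
ROR $10        ; carry -> $10 bit 7; $10 bit 0 -> carry
; end of repeated pair
ROL            ; pull the last carried bit into A
STA $03C0,X    ; A now holds the first byte, bit-reversed
LDA $10
STA $10C0,X    ; $10 now holds the second byte, bit-reversed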
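for anyone who wants to sanity-check the interleaving without firing up an emulator, here's a rough Python model of the sequence above. it assumes an 8-bit accumulator (M flag set) and models only A, carry, and Var $10. the function names are made up for this sketch; they're not anything from the game or an assembler.

```python
def rol_a(a, carry):
    """ROL A (8-bit): shift left, old carry enters bit 0, old bit 7 becomes carry."""
    return ((a << 1) | carry) & 0xFF, (a >> 7) & 1


def ror_mem(m, carry):
    """ROR mem (8-bit): shift right, old carry enters bit 7, old bit 0 becomes carry."""
    return (m >> 1) | (carry << 7), m & 1


def reverse_pair(byte_03c0, byte_10c0, carry=0):
    """Model of the interleaved sequence; returns the new ($03C0,X), ($10C0,X) bytes."""
    scratch = byte_03c0                 # LDA $03C0,X / STA $10
    a = byte_10c0                       # LDA $10C0,X
    for _ in range(8):                  # the pair repeated 8 times
        a, carry = rol_a(a, carry)              # ROL
        scratch, carry = ror_mem(scratch, carry)  # ROR $10
    a, carry = rol_a(a, carry)          # final ROL
    return a, scratch                   # STA $03C0,X ; LDA $10 / STA $10C0,X


def reverse_bits(value):
    """Reference bit reversal of one byte, used only for checking."""
    return int(format(value, "08b")[::-1], 2)
```

in this model the carry flag on entry doesn't matter: whatever was in it gets rotated through A and falls out on the final ROL, so no CLC is needed up front.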

CPU cycles:
asl a = 2
ror d = 5
lda d = 3
---
rol a = 2
sta d = 3
----------

that'll save (7 * 8) - 5 + 3 = 54 cycles per loop iteration (of 64).
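the same tally as a tiny Python snippet, in case someone wants to plug in different counts. the per-instruction numbers are the ones listed above, the -5 and +3 adjustments are taken from the formula as-is, and the variable names are just for this sketch.

```python
# Cycle counts as listed in the post (8-bit accumulator, direct-page operand).
CYCLES = {"asl a": 2, "ror d": 5, "lda d": 3, "rol a": 2, "sta d": 3}

pair_cost = CYCLES["asl a"] + CYCLES["ror d"]   # one shift pair = 7 cycles
saved_per_iteration = pair_cost * 8 - 5 + 3     # the post's formula: (7 * 8) - 5 + 3
saved_per_loop = saved_per_iteration * 64       # the loop runs 64 iterations

print(saved_per_iteration, saved_per_loop)
```

so over the full 64 iterations that works out to 54 * 64 = 3456 cycles.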

EDIT 4: switched back to using Var $10 as opposed to XBA-pair initial method, as it's faster.

===============

elsewhere and unrelated:

if we bring down C1_3E37 thru C1_3E47 to after the added "REP #$30", we can word-ize the loop. it'll add 3 instructions taking 4 bytes (and clobber Variable $11, which we're no longer using and was just a scratch variable anyway), but we cut the iterations in half.

inserting an "STA $1A" (or STX?) at C1_3E2B will let us zap 4 instructions of the new code. we can do something similar with Y and $1C, but we'll need an "STZ $1D", so we're saving 2 bytes and negligible cycles at most. it's a bit odd to relocate the $1A store but not the $1C, yet if speed is paramount...
