Expanded palette hack
11-28-2019, 12:26 AM
(This post was last modified: 11-28-2019, 08:56 AM by assassin. Edit Reason: idea for another section; then 1st section's code didn't work; then had "STA $nnC0,X"s backwards. that's like 9 edits; abandoning thread for 24+ hrs!)
as for Square code, there's a loop that includes two sequences of "ASL / ROR $10" to reverse the bit order in bytes. what i have now tweaks the ends, saving 3 * 2 = 6 cycles per loop iteration (of 64).
but it just dawned on me: we're reversing two bytes back-to-back, so intersperse the fuckers!
Code:
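LDA $03C0,X    ; first byte to reverse
STA $10        ; park it in scratch Var $10
LDA $10C0,X    ; second byte stays in A
; do next 2 instructions 8 times
ROL            ; A bit 7 -> carry, old carry -> A bit 0
ROR $10        ; carry -> $10 bit 7, $10 bit 0 -> carry
ROL            ; 9th ROL pulls in the last bit and pushes out the stale initial carry
STA $03C0,X    ; A now holds the first byte, bit-reversed
LDA $10        ; $10 now holds the second byte, bit-reversed
STA $10C0,X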
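For anyone who wants to sanity-check the interleaving without firing up an emulator, here is a rough Python model of the sequence above. It is only a sketch: `rol`, `ror`, and `reverse_pair` are made-up helper names, it assumes 8-bit A and memory, and it models nothing but the carry flag. The point it tries to show is that whatever the carry was on entry, the ninth ROL pushes it back out, so both bytes come out cleanly reversed.

```python
def rol(value, carry):
    """8-bit rotate left through carry. Returns (new_value, new_carry)."""
    return ((value << 1) | carry) & 0xFF, (value >> 7) & 1


def ror(value, carry):
    """8-bit rotate right through carry. Returns (new_value, new_carry)."""
    return (value >> 1) | (carry << 7), value & 1


def reverse_pair(byte_03c0, byte_10c0, carry=0):
    """Model of the interleaved routine (hypothetical helper, not ROM code).

    Returns the values that end up stored to $03C0,X and $10C0,X.
    """
    scratch = byte_03c0              # LDA $03C0,X / STA $10
    acc = byte_10c0                  # LDA $10C0,X
    for _ in range(8):               # ROL / ROR $10, eight times
        acc, carry = rol(acc, carry)
        scratch, carry = ror(scratch, carry)
    acc, carry = rol(acc, carry)     # the extra ROL at the end
    return acc, scratch              # STA $03C0,X ; LDA $10 / STA $10C0,X


if __name__ == "__main__":
    new_03c0, new_10c0 = reverse_pair(0x01, 0x80)
    print(f"{new_03c0:02X} {new_10c0:02X}")  # 80 01
```

The bits of each byte travel through the other one's register on the way out, which is why the STA targets look crossed relative to where each byte was parked.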
CPU cycles:
asl a = 2
ror d = 5
lda d = 3
---
rol a = 2
sta d = 3
----------
that'll save (7 * 8) - 5 + 3 = 54 cycles per loop iteration (of 64).
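Spelling out that arithmetic, under my reading of the tally: the entries above the "---" are what goes away (one whole 8x "ASL / ROR $10" run plus an LDA), and the entries below it are what the interleaved version adds (one ROL and one STA). The constant names are just labels for this sketch, and the whole-loop figure assumes the saving is the same on all 64 iterations.

```python
# Cycle counts as listed in the post (8-bit A, direct-page operand).
ASL_A = 2
ROR_DP = 5
LDA_DP = 3
ROL_A = 2
STA_DP = 3

ITERATIONS = 64  # loop count quoted in the post

per_bit = ASL_A + ROR_DP              # 7 cycles per reversed bit
added = ROL_A + STA_DP                # 5 cycles of new overhead
saved_per_iteration = per_bit * 8 - added + LDA_DP
saved_total = saved_per_iteration * ITERATIONS

if __name__ == "__main__":
    print(saved_per_iteration)  # 54
    print(saved_total)          # 3456
```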
EDIT 4: switched back to using Var $10 as opposed to XBA-pair initial method, as it's faster.
===============
elsewhere and unrelated:
if we bring down C1_3E37 thru C1_3E47 to after the added "REP #$30", we can word-ize the loop. it'll add 3 instructions taking 4 bytes (and clobber Variable $11, which we're no longer using and was just a scratch variable anyway), but we cut the iterations in half.
inserting an "STA $1A" (or STX?) at C1_3E2B will let us zap 4 instructions of the new code. we can do something similar with Y and $1C, but we'll need an "STZ $1D", so we're saving 2 bytes and negligible cycles at most. it's a bit odd to relocate the $1A store but not the $1C, yet if speed is paramount...
The following 2 users say Thank You to assassin for this post:
• C-Dude (11-28-2019), Turbotastic (11-28-2019)