Users browsing this thread: 1 Guest(s)
Expanded palette hack

#11
Posts: 200
Threads: 1
Thanks Received: 10
Thanks Given: 0
Joined: Oct 2015
Reputation: 18
Status
None
as for Square code, there's a loop that includes two sequences of "ASL / ROR $10" to reverse the bit order in bytes.  what i have now tweaks the ends, saving 3 * 2 = 6  cycles per loop iteration (of 64).

but it just dawned on me: we're reversing two bytes back-to-back, so intersperse the fuckers!

Code:
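LDA $03C0,X     ; first byte to reverse (8-bit A assumed)
STA $10         ; park it in scratch Variable $10
LDA $10C0,X     ; second byte to reverse stays in A

; do next 2 instructions 8 times
ROL             ; A's top bit -> carry, previous carry -> A's bottom bit
ROR $10         ; carry -> $10's top bit, $10's bottom bit -> carry

ROL             ; 9th ROL pulls the last bit of $10 into A
STA $03C0,X     ; A now holds the first byte, bits reversed
LDA $10         ; $10 now holds the second byte, bits reversed
STA $10C0,X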

CPU cycles:

asl a = 2
ror d = 5

lda d = 3

---

rol a = 2
sta d = 3

----------

that'll save (7 * 8) - 5 + 3 = 54 cycles per loop iteration (of 64).
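
To check that the interleaved rotates really land each reversed byte where it belongs, the carry chain can be modelled off the console. This is only a sketch in Python, not 65816 code: `rol8`, `ror8`, `reverse_pair` and `reverse_bits` are made-up helper names, and 8-bit accumulator mode is assumed.

```python
def rol8(a, carry):
    """Model of ROL A in 8-bit mode: returns (new_a, new_carry)."""
    return ((a << 1) | carry) & 0xFF, (a >> 7) & 1


def ror8(m, carry):
    """Model of ROR on an 8-bit memory byte: returns (new_m, new_carry)."""
    return (m >> 1) | (carry << 7), m & 1


def reverse_pair(first, second, carry=0):
    """Mirror the ROL / ROR $10 sequence above.

    first  = byte loaded from $03C0,X and parked in Variable $10
    second = byte loaded from $10C0,X and kept in A
    Returns (value stored to $03C0,X, value stored to $10C0,X).
    """
    scratch = first
    a = second
    for _ in range(8):
        a, carry = rol8(a, carry)              # ROL
        scratch, carry = ror8(scratch, carry)  # ROR $10
    a, carry = rol8(a, carry)                  # the 9th ROL
    return a, scratch


def reverse_bits(b):
    """Reference bit reversal for one byte."""
    return int(f"{b:08b}"[::-1], 2)


if __name__ == "__main__":
    a, scratch = reverse_pair(0b00000001, 0b11110000)
    print(f"{a:08b} {scratch:08b}")
```

In this model the incoming carry does not matter: whatever was in it enters A on the first ROL and is pushed back out by the 9th.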

EDIT 4: switched back to using Var $10 as opposed to XBA-pair initial method, as it's faster.

===============

elsewhere and unrelated:

if we bring down C1_3E37 thru C1_3E47 to after the added "REP #$30", we can word-ize the loop.  it'll add 3 instructions taking 4 bytes (and clobber Variable $11, which we're no longer using and was just a scratch variable anyway), but we cut the iterations in half.

inserting an "STA $1A" (or STX?) at C1_3E2B will let us zap 4 instructions of the new code.  we can do something similar with Y and $1C, but we'll need an "STZ $1D", so we're saving 2 bytes and negligible cycles at most.  it's a bit odd to relocate the $1A store but not the $1C, yet if speed is paramount...
[-] The following 2 users say Thank You to assassin for this post:
  • C-Dude (11-28-2019), Turbotastic (11-28-2019)

#12
Posts: 377
Threads: 34
Thanks Received: 10
Thanks Given: 7
Joined: Dec 2018
Reputation: 18
Status
Moog
(11-28-2019, 12:26 AM)assassin Wrote: inserting an "STA $1A" (or STX?) at C1_3E2B will let us zap 4 instructions of the new code.  we can do something similar with Y and $1C, but we'll need an "STZ $1D", so we're saving 2 bytes and negligible cycles at most.  it's a bit odd to relocate the $1A store but not the $1C, yet if speed is paramount...
The way you described this, it sounds like this optimization would be beneficial even if the palette hack isn't in place.
Am I understanding you correctly? This reduces the processing time any time a byte needs to have its bits reversed?

#13
Posts: 200
Threads: 1
Thanks Received: 10
Thanks Given: 0
Joined: Oct 2015
Reputation: 18
Status
None
C-Dude Wrote: Am I understanding you correctly? This reduces the processing time any time a byte needs to have its bits reversed?
yes, provided there's a second byte that needs its bits reversed at the same time.  this optimization intersperses the two reversals; each shift or rotate now does double duty.

Quote: The way you described this, it sounds like this optimization would be beneficial even if the palette hack isn't in place.
to be sure, the optimization you describe, not the one you quoted.  the latter was in the "elsewhere and unrelated" section of my post. :P
[-] The following 1 user says Thank You to assassin for this post:
  • C-Dude (11-30-2019)

#14
Posts: 377
Threads: 34
Thanks Received: 10
Thanks Given: 7
Joined: Dec 2018
Reputation: 18
Status
Moog
(11-29-2019, 11:01 PM)assassin Wrote:
C-Dude Wrote: Am I understanding you correctly? This reduces the processing time any time a byte needs to have its bits reversed?
yes, provided there's a second byte that needs its bits reversed at the same time.  this optimization intersperses the two reversals; each shift or rotate now does double duty.

Quote: The way you described this, it sounds like this optimization would be beneficial even if the palette hack isn't in place.
to be sure, the optimization you describe, not the one you quoted.  the latter was in the "elsewhere and unrelated" section of my post. :P

Oh, I'm sorry about that! I read your post the other day, but didn't get around to responding.  Your last paragraph originally talked about saving cycles, and I thought I was quoting that when I responded.  I meant to quote "that'll save (7 * 8) - 5 + 3 = 54 cycles per loop iteration (of 64)."

#15
Posts: 2,548
Threads: 98
Thanks Received: 147
Thanks Given: 156
Joined: Aug 2009
Reputation: 52
Status
Nattak'd
I got the As65 assembler up and working, but upon assembling I'm getting a whole bunch of "Illegal addressing mode" errors on many different lines, even using Eggers' provided ASM file. Your (assassin's) optimized version has fewer of these errors, but they're still present. I'm not sure what the assembler has a problem with, or how Eggers avoided them. The test ASM files provided with the assembler download work fine and spit out the .obj file as described, but Eggers' ASM file doesn't...

xkas and asar sadly produce basically the same number of errors. I'm not sure what could be done if the recommended assembler for the patch doesn't agree with the included ASM file. ¯\_(ツ)_/¯

I could list all the lines the errors are on and hope someone can figure it out, but that's about it!

Here's a pic of errors from assembling Eggers ASM file:
https://imgur.com/lUqZNO2


We are born, live, die and then do the same thing over again.
[-] The following 1 user says Thank You to Gi Nattak for this post:
  • assassin (12-02-2019)

#16
Posts: 200
Threads: 1
Thanks Received: 10
Thanks Given: 0
Joined: Oct 2015
Reputation: 18
Status
None
thanks.  haha; that's ridiculous.  apparently, AS65 won't understand parameter-less "ASL" or "LSR" (nor "ROR" or "ROL"), and wants them turned into explicit "ASL A" and "LSR A".  my guess is that plain "INC" and "DEC" have the same affliction, but Eggers' code doesn't use any of those.
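
If hand-editing every bare shift gets tedious, the rewrite could be scripted. This is a minimal Python sketch, assuming the source uses `;` comments and optional `label:` prefixes, and that only the four mnemonics named above need the explicit `A`. The function name is invented for illustration, and INC/DEC are left out because it is only a guess that they are affected.

```python
import re

# A shift/rotate mnemonic with no operand, optionally preceded by "label:"
# and optionally followed by a ';' comment.
BARE_SHIFT = re.compile(
    r"^(\s*(?:\w+:\s*)?)(ASL|LSR|ROL|ROR)(\s*)(;.*)?$",
    re.IGNORECASE,
)


def make_accumulator_explicit(line):
    """Turn e.g. 'ROL' into 'ROL A'; leave every other line untouched."""
    match = BARE_SHIFT.match(line)
    if match is None:
        return line
    prefix, mnemonic, gap, comment = match.groups()
    return f"{prefix}{mnemonic} A{gap}{comment or ''}"


if __name__ == "__main__":
    sample = ["LDA $10C0,X", "ROL", "ROR $10", "ASL   ; times two"]
    for line in sample:
        print(make_accumulator_explicit(line))
```

Lines that already carry an operand, such as "ROR $10" or "ASL A", do not match the pattern and pass through unchanged.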

EDIT: dunno the text editor situation of the various people reading/posting in this thread, but some preemptive help:
https://superuser.com/questions/193929/n...ers/193933
https://synapse.wordpress.com/2012/02/01...n-windows/

i was dismayed to see that Windows 7 64-bit no longer includes the handy DOS EDIT, which i normally use for line number error codes.  then i was more dismayed to learn that Word 2007 doesn't let you show line numbers *and* turn off Word Wrap in the same document view.  you can still Ctrl+G to goto Lines in the unwrapped (Outline, Draft) views, but the destination will still be based on the wrapped count.  yeah, using a full-on Word processor to edit an .asm file is silly overkill to begin with, but i was desperate!

so those two Notepad links (especially the 2nd) were lifesavers!

eventually, i'll have to download an advanced Notepad alternative (something i've been vowing for over a decade, when hearing that MetaPad was better, but i'm reaallly lazy).  but holy shit; the hoop-jumping required just to see such simple information in a document.  Microsoft is insane.

#17
Posts: 200
Threads: 1
Thanks Received: 10
Thanks Given: 0
Joined: Oct 2015
Reputation: 18
Status
None
(11-26-2019, 02:44 AM)assassin Wrote: what i'm focused on going forward are two things in the section from "lbg_replace_done" to "lbg_done_replacement_colors":

1. as you can see, i'm mulling optimizations from "EOR / STA" to TSB and to TRB.  it looks like i put each of those in the right spot based off his comments, but based on his actual code (there and at "lbg_find_unused_color" and "lbg_find_expanded_color"), i think i've got the instructions switched.  thoughts?

did so.  my TSB and TRB were in the right place, because Eggers' comments were correct.  (weird how i'd been reasonably confident otherwise.  and i'm not talking about those coding sessions where i've been up too long and the brain is running on fumes; it's a given that those produce failure often.  i mean where i'm plenty alert, looking right at the code, and somehow conclude the wrong thing.  [see also: two drafts of my bit reversal optimization, where i stored results to the wrong bytes.])
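
For anyone following along, why the placement matters can be shown with a tiny model. TSB ORs the accumulator's bits into memory and TRB clears them, while a load / EOR / store toggles them, so each swap is only a drop-in when the bits' prior state is known. This is a Python sketch with made-up function names, assuming the "EOR / STA" pattern toggles the mask bits in one flag byte; processor flags are ignored.

```python
def eor_sta(mem, mask):
    """Load / EOR mask / store: toggles the mask bits in the byte."""
    return mem ^ mask


def tsb(mem, mask):
    """Result byte of TSB (test and set bits): memory OR accumulator."""
    return mem | mask


def trb(mem, mask):
    """Result byte of TRB (test and reset bits): memory AND NOT accumulator."""
    return mem & ~mask & 0xFF


if __name__ == "__main__":
    mask = 0b00000100
    clear_before = 0b00010001   # mask bit currently 0
    set_before = 0b00010101     # mask bit currently 1
    print(eor_sta(clear_before, mask) == tsb(clear_before, mask))
    print(eor_sta(set_before, mask) == trb(set_before, mask))
```

Swap the two and the results diverge: TSB on an already-set bit leaves it set where EOR would clear it, and TRB on an already-clear bit leaves it clear where EOR would set it.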

Quote:2.
Code:
JMP lbg_find_unused_color_pre        ; Return to the beginning of the loop

i half-think he wanted to do "JMP lbg_find_unused_color" there, based on:
a. the nature of most loops (e.g. not re-initializing an index to its starting value)
b. the way he preserves X right before looping
c. how he never branches to "[label_name]_pre*" -titled labels from below.  they're for loop setups as opposed to loop contents, apparently.

if i'm right, changing this one line could save as much time as my other dozens of optimizations put together!

thus, more scrutiny on "lbg_calculate_replacement_colors" through "lbg_done_replacement_colors" would be much appreciated.

yeah, i've since gotten confident enough to incorporate this one.  though instead of a jump to "lbg_find_unused_color" , i added a new label shortly after that to avoid a pointless read.

note that this 2nd optimization switches the order of the blocks discussed in #1, for space/speed reasons -- not because anything was wrong with them.

i'm considering further optimizing this section by also picking up where we left off in the inner, "lbg_find_expanded_color_pre" loop.  probably by writing X to Variable $10, which is temporarily free.  admittedly, there's less urgency to this tweak, because expanded palettes have at most 4 slots that are pointlessly repeated currently, versus up to 12 for the non-expanded ones.

both the existing optimization and the considered one work on the knowledge that:
- unused main palette entries and used expanded ones are each claimed first-come, first-serve, in ascending order.
- code can mark main entries as used and expanded as unused, but never the reverse.  so lower-numbered entries won't suddenly become candidates for a switch midway through.

the two combined mean that if we're replacing Expanded Palette M with Main Palette N, we'll never need to go back and look at Expanded < M or Main < N.
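
The argument can be sanity-checked with a small model that compares restarting both scans from entry 0 against resuming where the last match left off. This is a Python sketch only: the function names, the list-of-booleans representation and the 16/4 table sizes are assumptions for illustration, not Eggers' actual data layout.

```python
def pair_restarting(main_used, expanded_used):
    """Restart both scans from entry 0 for every replacement."""
    main_used, expanded_used = list(main_used), list(expanded_used)
    pairs, probes = [], 0
    while True:
        n = m = None
        for i, used in enumerate(main_used):      # first unused main entry
            probes += 1
            if not used:
                n = i
                break
        for j, used in enumerate(expanded_used):  # first used expanded entry
            probes += 1
            if used:
                m = j
                break
        if n is None or m is None:
            return pairs, probes
        main_used[n], expanded_used[m] = True, False
        pairs.append((m, n))


def pair_resuming(main_used, expanded_used):
    """Resume each scan just past the previous match."""
    main_used, expanded_used = list(main_used), list(expanded_used)
    pairs, probes = [], 0
    n = m = 0
    while True:
        while n < len(main_used):
            probes += 1
            if not main_used[n]:
                break
            n += 1
        while m < len(expanded_used):
            probes += 1
            if expanded_used[m]:
                break
            m += 1
        if n == len(main_used) or m == len(expanded_used):
            return pairs, probes
        main_used[n], expanded_used[m] = True, False
        pairs.append((m, n))
        n += 1  # skip the entries just claimed, as with the new label
        m += 1


if __name__ == "__main__":
    main = [True] * 12 + [False] * 4   # assumed: 12 of 16 main entries in use
    expanded = [True] * 4              # assumed: 4 expanded entries in use
    print(pair_restarting(main, expanded))
    print(pair_resuming(main, expanded))
```

Both functions return the same (expanded, main) pairs; only the number of entries probed differs, which is the saving being described.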

Quote:if i'm right, changing this one line could save as much time as my other dozens of optimizations put together!

probably an overstatement, in retrospect, but it should still be a significant help.