Guidance on code memory usage

jaymzjulian · March 10, 2020, 01:44:53 AM

EDIT: You can safely ignore this one if you like, since I managed to make my optimizer way better and got it down to about 2000 lines, which is fine.

So, let me preface this by saying that I'm trying to do something stupid: specifically, I wrote a static recompiler from 6502 to GSBASIC, because I wanted to convert some arcade games, and this seemed like a sensible approach (if you've seen the stuff at https://norbertkehrer.github.io/ast_js.html , it's basically the same sort of thing, but applied to GSBASIC).

So this actually works on a bunch of test cases, but if I try to do all of Asteroids, well, of course the Vectrex32 throws an out-of-memory error and reboots - which kind of makes sense, since it's 7395 lines of BASIC code. But I still think I can optimize my way out of this, which leads me to my question:

When loading BASIC code, _what_ takes RAM and what doesn't? What I mean by that:
* If I make my variable names shorter, does that save RAM, or are they tokenized out?
* If I use a hex representation of numbers, does that help, since they are shorter, or are they tokenized out?
* Are comments removed entirely, or do they count for the purposes of this?
* Does one big function, vs. a bunch of little functions, make a difference as far as you know?

I did some testing, and I can definitely load about 50% of that size, so hopefully some optimization will help me... but I was wondering where to focus that, in case there is some "big win" that isn't obvious from the outside.

Vectrex32 · March 10, 2020, 08:23:40 AM

Variable names are tokenized out (except for a lookup table).
Integers, whether written in decimal or hex, are compiled to 32 bits.
Comments are discarded.
Every function has some overhead, so I guess one big function would take less memory than many small ones.

Are you compiling every assembly instruction to one line of code? The biggest win would be if your recompiler could understand the intent of the assembly code and optimize the generated BASIC. But of course, that's really hard to do.

How do you handle jumps? GSBASIC doesn't have a GOTO and even if the 6502 code isn't spaghetti code, it's hard to figure out the structure.

- Bob

jaymzjulian · March 10, 2020, 08:18:04 PM

The eventual intention is to be better at it than I am - the current model separates code into sections based on jumps, and puts each of those into a separate function - so there's code that looks like:

if pc==target_a
    func_target_a
elif pc==target_b
    func_target_b
endif
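To make the "sections based on jumps" idea concrete, here is a rough Python sketch of splitting a disassembly into sections. It is only an illustration, not the recompiler's actual code: the `(address, mnemonic, target)` tuple format, the `BRANCHES` set, and the name `split_into_blocks` are all assumptions made up for this example, and `jsr`/`rts` handling is left out.

```python
# Hypothetical instruction format: (address, mnemonic, branch_target_or_None).
# The mnemonic set below is an assumption for this sketch (no jsr/rts handling).
BRANCHES = {"jmp", "bne", "beq", "bpl", "bmi", "bcc", "bcs", "bvc", "bvs"}

def split_into_blocks(instructions):
    """Group instructions into sections keyed by their starting address.

    A new section starts at the first instruction, at every branch target,
    and at every instruction that directly follows a branch.
    """
    if not instructions:
        return {}
    leaders = {instructions[0][0]}
    for i, (addr, op, target) in enumerate(instructions):
        if op in BRANCHES:
            if target is not None:
                leaders.add(target)
            if i + 1 < len(instructions):
                leaders.add(instructions[i + 1][0])

    blocks = {}
    current = None
    for addr, op, target in instructions:
        if addr in leaders:
            current = addr
            blocks[current] = []
        blocks[current].append((addr, op, target))
    return blocks

program = [
    (0x8000, "ldx", None),
    (0x8002, "dex", None),
    (0x8003, "bne", 0x8002),
    (0x8005, "jmp", 0x8000),
]
blocks = split_into_blocks(program)
for start, body in sorted(blocks.items()):
    print(hex(start), [op for _, op, _ in body])
```

Each key of `blocks` would correspond to one generated function (and one `pc==target` test in the dispatcher above).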

There is a _little_ intelligence around loop detection, to turn obvious while loops into while loops (it detects dey/dex/bpl/bne pairs, essentially), but to be honest this is all very v0.1 "can it even work?" stuff. If it works out but is slow, the next step will be to make an AST out of it and run a proper optimizer/code generator across that. I have about 20% confidence that this will work out at all though (I've never written a static recompiler before)! But hey, who knows....
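As a rough illustration of the dey/dex + bpl/bne detection described above, a pattern match along these lines might look as follows in Python. This is a sketch under assumptions, not the actual detector: the `(address, mnemonic, target)` tuple format and the name `find_countdown_loops` are hypothetical, and it only recognizes a decrement immediately followed by a backward branch.

```python
def find_countdown_loops(instructions):
    """Return (loop_start, branch_address) pairs for count-down loops.

    A loop is reported when a dex/dey is immediately followed by a
    bne/bpl whose target lies at or before the decrement (a backward branch).
    """
    loops = []
    for prev, cur in zip(instructions, instructions[1:]):
        prev_addr, prev_op, _ = prev
        cur_addr, cur_op, target = cur
        if prev_op in ("dex", "dey") and cur_op in ("bne", "bpl"):
            if target is not None and target <= prev_addr:
                loops.append((target, cur_addr))
    return loops

# Hypothetical fragment: copy loop counted down in X.
program = [
    (0x8000, "ldx", None),
    (0x8002, "lda", None),
    (0x8004, "sta", None),
    (0x8007, "dex", None),
    (0x8008, "bne", 0x8002),
    (0x800A, "rts", None),
]
loops = find_countdown_loops(program)
print([(hex(start), hex(end)) for start, end in loops])
```

Everything between `loop_start` and the branch could then be emitted as the body of a single while loop instead of a separate dispatched section.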

Vectrex32 · March 10, 2020, 09:00:56 PM

If you're turning jumps into function calls, how do you know when to return from the functions?

jaymzjulian · March 10, 2020, 10:30:52 PM

Quote from: Vectrex32 on March 10, 2020, 09:00:56 PM
If you're turning jumps into function calls, how do you know when to return from the functions?

Registers/stack/memory are just globals - it's not as smart as "turn them into functions that take params", but rather "subroutines that work on globals".

Vectrex32 · March 11, 2020, 07:45:46 AM

But you still should return at some point, otherwise you keep creating scopes (even if they're empty) and adding to the call stack.
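One common way to address this is to have each section return the next `pc` to a dispatcher loop, so the call stack never gets deeper than one section. The Python sketch below only illustrates that general pattern; it is not how the recompiler in this thread works, and `block_a`, `block_b`, `BLOCKS`, and `run` are made-up names for the example.

```python
# Each section works on shared state and RETURNS the next pc
# instead of calling the next section directly.
def block_a(state):
    state["x"] -= 1                       # like DEX
    return "a" if state["x"] != 0 else "b"  # like BNE back to a, else fall through

def block_b(state):
    state["done"] = True
    return None                           # None means "stop"

BLOCKS = {"a": block_a, "b": block_b}

def run(start, state):
    """Dispatcher loop: call one section at a time until a section returns None."""
    pc = start
    steps = 0
    while pc is not None:
        pc = BLOCKS[pc](state)
        steps += 1
    return steps

state = {"x": 5000, "done": False}
steps = run("a", state)
print(steps, state)
```

Because every section returns before the next one starts, 5000 iterations of `block_a` use constant stack depth, whereas having each section call the next one directly would nest 5000 calls deep.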
