{%hackmd theme-dark %}

# LucidDreams - LLVM IR Transpiler for DreamMaker Binaries (DMB)

###### tags: `Lucid Dreams` `Development`

## Goals

* Translate .DMB files to LLVM IR libraries for transpilation into any LLVM supported language.
* Design a runtime API for consuming the resulting library.
* Implement a basic runtime to use for testing features and as a reference implementation.

## Design Principles/Guidelines

LLVM IR represents a "breaking out of the sandbox" for DreamMaker code, giving it a huge amount of potential flexibility elsewhere in the programming ecosystem, and really graduating it into the territory of a real language. To that end, we have to rethink how DM code is approached.

* DMB's structure as a library file that the runtime executes **must** be preserved.
* RSC files, DMB embedded or otherwise, will not be manipulated, and instead will be left to the runtime implementer to parse and load into their environment.
* The primary function of Lucid, the transpiler, will be to convert the basic structure of DMB assembly/bytecode to LLVM IR, and provide the instructions, a procedure specification, and lookup tables necessary to drive a runtime.
* No resources will be provided with the Lucid output library; instead we'll require runtimes to load and interop with their resources directly, and do so when prompted by the library.

## Architectural Overview

Fundamentally broken down into three parts:

* Typedefs & Variable Declarations
* Object/Procedure Declaration & Invocation
* Closures & Scoping

[This document describes the broad strokes of the DMB format](https://github.com/20kdc/byond-data-docs/blob/master/formats/DMB.md)

There are some gaps in the format document, and part of the work needed to complete the third goal will be figuring out how to interpret those values. For now we can simply copy the data over into the IR's data section(?).

[This document describes a crash course in LLVM IR mapping from higher languages.](https://buildmedia.readthedocs.org/media/pdf/mapping-high-level-constructs-to-llvm-ir/latest/mapping-high-level-constructs-to-llvm-ir.pdf)

#### Notes

Static Single Assignment (SSA) is a constraint we'll need to code around. It's gonna be a pain in the butt.

### Typedefs and Variable Declarations

#### Notes

Looks like the best way to go about making typedefs is a library of IR structs that represent the basic types and let us pack whatever we want as runtime metadata into that structure. Might be superfluous, but I'd wager the LLVM optimizer will optimize those out in the event it's just a StringID.

SSA is a bitch here. There's a lot of variable redeclaration that happens in DM in general, so we'll need some well-defined strategies for implementing things like phi nodes, and generator strategies for having multiple temporary variables without ballooning our memory usage. I'm not looking forward to this bit.

_EDIT:_ See [this section of the LLVM IR tutorial for an example on using `alloca` to drive mutable typed variables.](https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl07.html) Some notable flags we'll want to run this with during the optimization step include `opt -mem2reg` (spelled `opt -passes=mem2reg` under the new pass manager), which promotes `alloca`s to SSA registers and handles phi node creation. A couple of basic unit tests should ensure this holds true in the future.

To quote the LLVM IR Tutorial Chapter 8, "Simple arrays are very easy and are quite useful for many different applications. Adding them is mostly an exercise in learning how the LLVM [getelementptr](https://llvm.org/docs/LangRef.html#getelementptr-instruction) instruction works: it is so nifty/unconventional, [it has its own FAQ](https://llvm.org/docs/GetElementPtr.html)!" This is basically lists.

[Garbage Collection is a first class feature in LLVM IR!](https://llvm.org/docs/GarbageCollection.html) Thank fuck!
Oh wait, no, "Note that LLVM does not itself provide a garbage collector — this should be part of your language’s runtime library." Double fuck. I'm wagering a guess here that whatever we create will be the standard for a while, so consider this another major milestone feature. LLVM does provide the hooks into the compiled application to support runtime GC, so the majority of the heavy lifting is done.

Once we replicate the generation of static definition tables like global procs, variables, and resources, I think this'll get easier, but I don't know enough about the format at the moment; the reverse engineering documents are dense with version-based qualifiers, which makes reading through and getting a full picture difficult. I'll probably write an accompanying document that simplifies all of that down to the most recent version.

Oh, joy, LLVM IR doesn't actually have any concept of strings and the like, so we're going to have to write a DMB-compliant string library alongside the existing transpiling work. This is gonna be the place where we spend an inordinate amount of time circling back.

### Object/Procedure Declaration & Invocation

#### Notes

I'm not thrilled with the prospect of having to magick into existence a set of parameters from every proc definition and get the defaults out of it, but it has to happen. Luckily DM is quite similar to LLVM in this respect, so I won't gripe too hard other than to say _fuck having SSA_ here in the case where argument variables are redefined during proc execution. Lord knows I'm guilty of doing this in DM, so I can only imagine it happens elsewhere.

So there's this quirk that DM has that I fucking hate: `set name "foo"` run in multiple places will actually _overwrite_ the proc definition in the table. I really hope that by the time we've got the DMB this has been compiled out; otherwise we'll _yet again_ be fighting SSA to get around this. I understand why it exists but I still fucking hate it.

`.
= ..()` is that one little thing that's going to be mildly annoying. In DMB this is just a `CALLPARENT` opcode and some `SETVAL src` instructions etc. To my knowledge inheritance is squarely in the realm of higher level languages, so having some concept of parent functions in the table is gonna be necessary. I'm like 80% sure we could optimize this by trusting the DM compiler somewhat that root procs won't have a parent, and thus any call to a parent proc could be a `proc_table[procid-1]` macro and be done with it.

### Closures and Scoping

#### Notes

I am both thrilled and terrified at this prospect. I'm thinking 90% of this will be out of scope thankfully (haha) by the time it ends up with us, and it's just going to be basic transpilation and extracting local references/global references back to their values in the appropriate tables/compilation contexts. But in the event that's not the case: _Fuck._

## BYOND Interop API

### Runtime Interface

Basically the first goal that doesn't directly involve LLVM IR generation is creating a C header that defines the functions the runtime is expected to provide.

#### Notes

Root inheritance tree looks something like:

```
      datum
        |
      atom
   /  /  \  \
area turf obj mob
```

Notable exceptions include `/world`, `/client`, `/list`, and `/savefile`.

The API provided by the runtime needs to emulate this structure. There are basic class lifecycle methods (`New`, `Del`), stream operators (`Write`, `Read`), and the `Topic` proc at the top level in `datum`.

In writing this I'm noticing there's going to be a need to do some careful pre-translation analysis of which procs are associated with which types, so that we get sufficient information to generate a lookup table that plays well with the path system BYOND has.

In looking through the proc list, I'm about 80% sure `sleep` is codeword for a generator. I'm not thrilled about this. In LLVM IR's documentation linked above there's a pattern for creating a generator using a lot of labels.
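
A rough sketch of that label-driven generator idea, written in C rather than IR since the runtime interface is going to be a C header anyway. It uses a `switch` on a saved resume point in place of IR labels. Every name here (`countdown_frame`, `countdown_step`, `proc_status`) is made up for illustration; nothing about it comes from the DMB format or from how BYOND actually schedules procs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame for one suspended proc. Locals that must survive
   a sleep live here instead of on the C stack. */
typedef struct {
    int resume_point;  /* which case label to jump to on the next call */
    int i;             /* loop counter, preserved across suspensions */
    int total;         /* running sum, preserved across suspensions */
} countdown_frame;

typedef enum { PROC_SLEEPING, PROC_DONE } proc_status;

/* Roughly the DM logic:
       for(i = 1, i <= 3, i++)
           total += i
           sleep(1)
   Each call runs until the next sleep, then hands control back. */
proc_status countdown_step(countdown_frame *f)
{
    assert(f != NULL);
    switch (f->resume_point) {
    case 0:                          /* first entry */
        f->total = 0;
        for (f->i = 1; f->i <= 3; f->i++) {
            f->total += f->i;
            f->resume_point = 1;     /* remember where to come back to */
            return PROC_SLEEPING;    /* "sleep": yield to the caller */
    case 1:;                         /* re-entry lands here, mid-loop */
        }
    }
    f->resume_point = 2;             /* no matching case: stays finished */
    return PROC_DONE;
}
```

The caller (a scheduler, presumably) zero-initializes a frame and keeps calling `countdown_step` until it returns `PROC_DONE`. The ugly part is visible even at this size: every local that crosses a `sleep` has to be hoisted into the frame.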
I'm thinking the runtime API for this is probably going to be the most complex and need the most design work; the precedent set here for `sleep` will largely determine how well the runtime provides 1:1 behavior with normal BYOND at first. It's possible we could have a compatibility mode flag that compiles with different IR functions provided to run `sleep`, in the event we want to change the behavior in the future.

_EDIT:_ Looks like what I need is called [Continuation-passing Style](https://en.wikipedia.org/wiki/Continuation-passing_style)
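
To make the continuation-passing idea concrete, here's a minimal sketch in C of what a non-blocking `sleep` could look like from the runtime side. This is an assumption about one possible shape, not a design decision: `rt_sleep`, `rt_tick`, `continuation_fn`, the fixed-size queue, and the example `greet` proc are all hypothetical names, and real BYOND scheduling order is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* A continuation is "the rest of the proc" plus the state it needs. */
typedef void (*continuation_fn)(void *ctx);

typedef struct {
    int wake_tick;           /* tick at which the continuation is due */
    continuation_fn resume;  /* code to run after the sleep */
    void *ctx;               /* the proc's saved state */
} pending_sleep;

#define MAX_PENDING 16
static pending_sleep queue[MAX_PENDING];
static size_t queue_len = 0;
static int current_tick = 0;

/* Hypothetical runtime entry point: instead of blocking, record what
   to run later. Returns 0 on success, -1 if the queue is full. */
int rt_sleep(int ticks, continuation_fn resume, void *ctx)
{
    assert(resume != NULL);
    if (queue_len == MAX_PENDING)
        return -1;
    queue[queue_len].wake_tick = current_tick + ticks;
    queue[queue_len].resume = resume;
    queue[queue_len].ctx = ctx;
    queue_len++;
    return 0;
}

/* Advance the world by one tick and resume everything that is due. */
void rt_tick(void)
{
    size_t i = 0;
    current_tick++;
    while (i < queue_len) {
        if (queue[i].wake_tick <= current_tick) {
            pending_sleep due = queue[i];
            queue[i] = queue[--queue_len];  /* swap-remove; order not kept */
            due.resume(due.ctx);            /* may call rt_sleep again */
        } else {
            i++;
        }
    }
}

/* A proc shaped like "step 1; sleep(2); step 2", split at the sleep. */
typedef struct { int log[2]; int n; } greet_ctx;

static void greet_after_sleep(void *p)
{
    greet_ctx *c = p;
    c->log[c->n++] = 2;                     /* step 2 */
}

int greet(greet_ctx *c)
{
    c->n = 0;
    c->log[c->n++] = 1;                     /* step 1 */
    return rt_sleep(2, greet_after_sleep, c);
}
```

Calling `greet` runs only the first half; the second half runs from inside `rt_tick` two ticks later. A compatibility flag like the one mentioned above could, in principle, just swap which `rt_sleep` implementation gets linked in.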