# Simplification of early phases of the CakeML compiler
This is a roadmap to simplification of the CakeML compiler's early phases.
## Motivation
In its current form:
- The CakeML compiler reserves a slot in the globals array for every top-level bound name, even though many of the names won't be used past `flat_elim` and `clos_known`. This is an inefficiency.
- The `clos_known` compiler pass is complicated and trying to rediscover information (which closure is called through which global) that was lost earlier in the compilation chain. It also performs inlining but due to its position in the chain of passes it cannot inline any expression that contains a closure.
- The FlatLang and ClosLang ILs include a complicated `Letrec` construct. Proofs about these languages would become simpler if we were able to remove `Letrec` from FlatLang and ClosLang.
- The `source_to_flat` pass loses too much program structure. At present, if a user writes `local fun f n = ... in fun g k = ... end` the output of `source_to_flat` makes `f` and `g` seem equally globally available in its output. This hampers future optimisations.
## New design
The new design will address all of the points above. Early phases of the compiler will be:
1. source-to-source passes:
- dead code elimination
- perform inlining (mandatory for simple functions such as `+`, but optional for recursive functions such as `map`)
- lift constants to top-level declarations
- lambda lifting, in particular all letrecs are listed to the declaration level
2. the source-to-flat pass modified so that:
- performs the duties of the current `clos_call`
- allocates global slots _and code table slots_
- all declaration-level closures are turned into code table entries, the others are put into globals
- intead of outputting a list of declarations, this pass should generate strucutred output that, via its shape, indicates which functions can refer by name to which other functions
- the list of top-level declarations will be packaged into a main function in the output and that main function will be tagged as execute-once (so that clos_interp still works)
3. FlatLang:
- there will be a code table (like currently in ClosLang)
- there won't be any declarations
- there won't be any `Letrec`
- `flat_elim` will be removed
4. ClosLang:
- there won't be any `Letrec`
- `clos_known` will be removed
- `clos_call` will be removed
5. The rest of the compiler is unchanged, except the structured output of `source_to_flat` should ideally be propagated through the entire compiler.
## Roadmap
Note that the none of the proposed changed require changes to the source semantics.
New source-to-source passes needed:
- dead code elimination at source level
- inlining of simple functions such as `+`
- lambda lifting (particularly if we want to get rid of `Letrec` in FlatLang and ClosLang)
- lifting of constants (note that this can do the actual lifting of lambdas for the lambda lifting pass)
Changes to FlatLang:
- addition of code table (like in ClosLang)
- removal of top-level declarations, instead semantics will start execution more like in BVL, i.e. by calling a function
Changes to ClosLang:
- top-level expressions to be removed, instead semantics will start execution more like in BVL, i.e. by calling a function