# Zig compiler/bootstrapping/stage2 notes

These notes pertain to commit `57ac835a0` of the zig repository on GitHub.

Note: this commit might be out of date by the time you read this; however, I had to pick *something* as a point of reference.

## The bootstrapping process, road to self-hosted/stage2

Most of this can be found in the zig compiler internals talk: https://www.youtube.com/watch?v=8MbREuiLQrM (time range `9m41s` to `13m46s`) (non-yt video host link pending).

Since zig already has its own numbering system to refer to "stages of bootstrapping", here I will use letters to designate the various levels of bootstrapped-ness of a zig compiler (referred to here as "states").

### state A: zig compiler source

#### what we have

All we have is the zig source tree (no zig compiler).

#### getting to the next state

* compile `src/stage1/*.cpp` into object code
  * **TODO**: interesting sub-details of that process (hint: start by looking at what CMake is doing)
* mush the object code together into `libzigstage1.a` (any other outputs?)
  * **TODO**: on linux, list the members of `libzigstage1.a` (e.g. with `ar t`, since `ldd` only works on dynamically linked binaries, not static archives) and compare them with the cpp source files
  * **TODO**: ditto, but for other OSen

### state B: stage 1 static library

#### what we have

We have `libzigstage1.a`.

#### getting to the next state

* use `stage1.cpp`, which depends on `libzigstage1.a`, to compile a `zig0.exe` binary (note: the extension might not actually be .exe; I'm just using it to indicate an executable)
  * show `ldd` (or equivalent) output for `zig0.exe`
  * **TODO**: what are the limitations of `zig0.exe`?

### state C: ("stage 0")

#### what we have

We have `zig0.exe`.

#### getting to the next state

1. use `zig0.exe` to compile the completed development efforts contained in `src/*.zig` files (more on what "completed development efforts" means to come shortly). Notable among this zig source is `stage1.zig`.
   * **TODO**: explain the significance of `stage1.zig`
2. link this output from `zig0.exe` together with the previously built `libzigstage1.a` to build `zig.exe` (the stage1 compiler)

### state D: stage1

#### what we have

We have a `zig.exe` (called the *stage1* compiler). Note: it still depends on the `libzigstage1.a` built previously.

### Interjection

The terms "self-hosted compiler" and "stage2" are often used to refer to the desired outcome of the next state transitions. The main goals and defining aspects of these subsequent transitions are:

* a new pure zig codegen (*backend*), with LLVM codegen made optional
  * This backend already exists and can currently be invoked with `-fno-LLVM`. It is a work in progress, meaning:
    * It can't yet build itself.
    * It is not stable.
    * It is not yet featureful enough to compile all the existing code that the stage1 LLVM backend can.
  * **TODO**: explain some rationale behind having a no-LLVM backend
    * what's the deal with `parse_f128` & gnu extensions?
  * **TODO**: explain the zig `compiler-rt` biz and its relation to the eponymous LLVM library, weak linking, etc.
* build another compiler, which:
  * does not have any link dependency on the stage1 code (`libzigstage1.a`)
    * Much of this work is completed. This is the substance of the `src/*.zig` source.
  * can use the aforementioned zig backend as a default (by then it should be sufficiently stable and featureful to do so)
  * See this PR: [move `zig cc`, `zig translate-c`, `zig libc`, main(), and linking from stage1 to stage2](https://github.com/ziglang/zig/pull/6250)
* build an optional LLVM backend tailored for use by the stage2 compiler
  * **TODO**: what does this mean? Why can't the existing LLVM backend just be used by stage2 here?
  * **TODO**: elaborate on the rationale
    * it is expected that the no-LLVM backend will not reach performance parity with LLVM for a long time (if ever?)
* (bonus step) replace/simplify the `stage1` code used to bootstrap earlier, using a stage2 compiler to emit replacement code in C.
* **TODO**: elaborate on the present state of affairs.
  * If we stick with an LLVM backend, how much of `libzigstage1.a` can be eliminated from compilers built using stage1 currently? How complete is the stage2 code?
* **TODO**: explain in which places LLVM is currently used in the stage1 compiler.
  * is it done from `libzigstage1.a`? (yes, see `src/stage1/codegen.cpp` for instance)
  * but what's the deal with `src/zig_llvm.cpp`?
* **TODO**: link to more of the issues in the issue tracker that give some finer details of the current state of affairs
* **TODO**: what other goals are prerequisites of reaching "self-hosted compiler"/"stage2"?
* Some misc issues from the repo pertaining to the topic:
  * [The Grand Bootstrapping Plan](https://github.com/ziglang/zig/issues/853)
  * [implement the self-hosted LLVM backend](https://github.com/ziglang/zig/issues/6541)
  * [implement self-hosted CPU model and features detection](https://github.com/ziglang/zig/issues/4591)
  * [self-hosted compiler: passing behavior tests, std lib tests, and no longer relying on stage1 backend](https://github.com/ziglang/zig/issues/89)
  * [consolidate libc and libssp into compiler-rt](https://github.com/ziglang/zig/issues/7265)

**Here is an imagined set of next states/steps on the way to stage2.**

I have split the process of replacing `libzigstage1.a` into two steps here, though this may not reflect how it is done in reality.

### state D (stage 1, cont'd)

#### getting to the next state

* build a new `zig.exe`, which does not depend on `libzigstage1.a` code (except for the LLVM backend), from an existing stage1 compiler
  * This is still considered a "stage1" compiler due to the lack of a finished non-LLVM backend.
  * **TODO**: is the above statement accurate?

### state E: zig compiler sans-libzigstage1.a

#### what we have

* we have a `zig.exe` with no direct dependency on anything produced from `src/stage1/*.cpp` (except maybe for LLVM codegen)
  * **TODO**: explain how zig uses LLVM in this case. Is it `zig_llvm.cpp`?

#### getting to the next state

* use the no-LLVM backend to generate another `zig.exe` (called "stage2")
  * **TODO**: is that correct? Will the stage1 compiler use the no-LLVM backend like that, in some way?

### state F: stage2

#### what we have

* we have a stage2 `zig.exe`, built from a stage1 `zig.exe` (which in turn could have been built from a different stage1 compiler, or not), but in any case this time it was done without the LLVM backend (maybe, see the TODO above).
* it does not depend at all on `libzigstage1.a` or other stage1 artifacts

#### getting to the next state

* compile another compiler with the stage2 compiler, using nothing created directly with cpp or LLVM

### state G: stage3

#### what we have

* we have a stage3 `zig.exe`, built from stage2 with no stage1 dependencies.
* if, using this compiler, you build the same source code that was used to build this compiler with the no-LLVM backend, you get a *replica* of this compiler binary back as output (if targeting the same environment as the host).

#### getting to the next state

* replace the cpp bootstrap code from states A and B with simpler, non-LLVM C code via the C source backend of a zig stage2/stage3 compiler
  * At first, the source will be written in zig and transpiled to C, but very quickly the C output will be tailored to be maintained on its own, without zig.
  * issue: [further reduce bootstrapping dependencies by making stage1 output C rather than LLVM IR](https://github.com/ziglang/zig/issues/5246)
  * **TODO**: talk about some of the challenges involved in doing so:
    * getting hardware details without a working zig compiler?
    * other?

### Done!

We have a fully self-hosted compiler with minimal bootstrap dependencies.

#### Some overall questions still remaining

* which parts are mostly working and being shipped right now? (tag with dates)
* which parts aren't being shipped yet, but are ready to be?
* which parts still aren't working/implemented at all, or are still in a very preliminary state?
* **TODO**: answer these

## Glossary

* "self-hosted compiler" can mean a variety of things depending on context
  * most generally: the code in `src/*.zig` files
    * some of this is already being used in the `zig.exe` emitted by `zig0.exe`
  * it can also mean "the things that we can do with the LLVM backend but can't yet do with the pure zig backend"
    * such as? (**TODO**)
* stage2
  * **TODO**: explanation
* ZIR
  * **TODO**: explanation

## Some source- and command-level details

### Overview of some stage2 source files

#### `astgen.zig`

* converts the abstract syntax tree (AST) into untyped ZIR (`zir.zig`)
  * hint: the AST itself is generated by the parser in the standard library, under `std.zig`

#### `zir_sema.zig`

* analyze ZIR
* do comptime stuff
* ascribe types to the IR
* emit:
  * typed IR (`ir.zig`)
  * typed values
    * `type.zig`
    * `value.zig`
    * `TypedValue.zig`

#### `codegen.zig`

* convert typed IR into machine code via the backends in `codegen/*.zig`

#### `Cache.zig`

* zig cache stuff
  * **TODO**: elaborate

#### `link.zig`

* link the code generated by `codegen.zig` (or any code, really)
  * **TODO**: explain: is this a full-fledged linker or a mere frontend to other linker tools?
  * **TODO**: explain: if the latter, talk about defaults/linker selection options and the considerations involved

#### `Module.zig`

* declares some structs used throughout various stages of the compiler (`Scope`, `Decl`)
* coordinates invocations of various parts of the compiler pipeline:
  * creating the AST
  * invoking `astgen.zig`
  * dispatching `zir_sema.zig` to analyze the IR
* hint: `Module.zig` is tightly coupled with `zir_sema.zig`; they used to be one and the same until commit `0965724e31666d`
* could be considered the backbone of the self-hosted compiler

#### `Compilation.zig`

* deals with all the busy work of putting everything together
  * **TODO**: elaborate (hint: inspect the main loop of the function `performAllTheWork`)

#### `main.zig`

* parse CLI options and pass control to `Compilation.zig`

### building libzigstage1.a with the current cpp implementation

* zig parser (`src/stage1/parse.cpp`)
  * **TODO**: details
* AST node generation (`???.cpp`)
  * **TODO**: details
* unanalyzed ZIR generator (`src/stage1/???.cpp`: function `abcxyz`)
  * **TODO**: details
* `src/stage1/ir.cpp`: `ir_analyze` (line 32162) seems to be the entry point
  * **TODO**: details
* **TODO**: explain what will be replaced by zig, what is implemented in that regard, and what still remains

## Future directions of the zig compiler

### Short term

* finish the self-hosted compiler

### Medium term

* sort out a fancier type system?

### Long term

* replace large C/C++/Rust codebases
* take over the world
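## Appendix: a toy model of the bootstrap chain

The distinction these notes keep returning to is between the compiler that *built* an artifact and the code that is *linked into* it: the stage1 `zig.exe` is built by `zig0.exe` yet still links `libzigstage1.a`, while the imagined stage2/stage3 compilers link none of it. The Python sketch below models states A through G as data to make that distinction explicit. It is purely illustrative: the artifact names come from these notes, but the `Artifact` class, `link_closure`, and `build_chain` are hypothetical helpers that are not part of zig or its CMake build, and states E through G follow the imagined (not necessarily real) steps described above. External libraries such as LLVM, Clang, and LLD are deliberately left out.

```python
"""Toy model of the bootstrap states A-G described in these notes.

Hypothetical: the artifact names come from the notes, but this data
structure and its helpers are illustrative only and are not part of
zig or its CMake build. External libraries (LLVM, Clang, LLD) are omitted.
"""
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

HOST = "host C/C++ toolchain"


@dataclass(frozen=True)
class Artifact:
    name: str
    built_by: Optional[str]      # None: built directly by the host C/C++ toolchain
    links: Tuple[str, ...] = ()  # other artifacts linked into this one


ARTIFACTS: Dict[str, Artifact] = {
    a.name: a
    for a in [
        # state A -> B: src/stage1/*.cpp compiled and archived (driven by CMake)
        Artifact("libzigstage1.a", None),
        # state B -> C: stage1.cpp compiled and linked against libzigstage1.a
        Artifact("zig0.exe", None, ("libzigstage1.a",)),
        # state C -> D: zig0.exe compiles src/*.zig; result linked with libzigstage1.a
        Artifact("zig.exe (stage1)", "zig0.exe", ("libzigstage1.a",)),
        # imagined states E-G: no link dependency on libzigstage1.a
        Artifact("zig.exe (state E)", "zig.exe (stage1)"),
        Artifact("zig.exe (stage2)", "zig.exe (state E)"),
        Artifact("zig.exe (stage3)", "zig.exe (stage2)"),
    ]
}


def link_closure(name: str) -> Set[str]:
    """Return every artifact linked into `name`, directly or transitively."""
    seen: Set[str] = set()
    stack = list(ARTIFACTS[name].links)
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(ARTIFACTS[dep].links)
    return seen


def build_chain(name: str) -> List[str]:
    """Return the compiler that built `name`, the one that built it, ..., ending at HOST."""
    chain: List[str] = []
    builder = ARTIFACTS[name].built_by
    while builder is not None:
        chain.append(builder)
        builder = ARTIFACTS[builder].built_by
    chain.append(HOST)
    return chain


if __name__ == "__main__":
    for art in ARTIFACTS:
        linked = sorted(link_closure(art)) or ["nothing"]
        print(f"{art}: links {', '.join(linked)}; built via {' <- '.join(build_chain(art))}")
```

Running the script prints one line per artifact, e.g. `zig.exe (stage1): links libzigstage1.a; built via zig0.exe <- host C/C++ toolchain`. Keeping `built_by` and `links` as separate fields is the point of the design: walking `links` answers "does this binary still contain stage1 code?", while walking `built_by` answers "which compilers had to exist first?".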