# Bootstrap staging cleanup
This PR documents the terminology of bootstrap stages and proposes a (mostly) unified approach for assigning stage numbers to individual bootstrap steps and built artifacts. It also proposes some changes to the bootstrap `Mode` enum.
First, some basic terminology:
- Stage 0 `rustc` - beta compiler + beta std[^beta]
    - Links to beta std
    - Rust programs built using stage 0 `rustc` *link to* beta std
- Stage 1 `rustc` - in-tree compiler sources built with stage 0 `rustc`
    - Links to beta std
- Stage 1 `std` - in-tree library sources built with stage 1 `rustc`
    - Rust programs built using stage 1 `rustc` *link to* stage 1 `std`
- Stage 2 `rustc` - in-tree compiler sources built with stage 1 `rustc`
    - Links to stage 1 `std`
- Stage 2 `std` - in-tree library sources built with stage 2 `rustc`
[^beta]: Yes, it can be another rustc/cargo configured externally instead of beta, but that's not very important for this document.
### Design
This document assumes the "stage corresponds to what gets built, not what is used to build it" model. This is a consequence of the terminology introduced above. Note that this holds for everything except for std.
- std stage N is built with rustc stage N.
- Anything else stage N is built with rustc stage N - 1.
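The two rules above can be written down as a small function. This is only a sketch to make the numbering concrete: `Artifact` and `build_compiler_stage` are hypothetical names made up for this note, not existing bootstrap APIs, and stage 0 is treated as "not built from in-tree sources" as described in the terminology section.

```rust
/// Kind of artifact being built. Hypothetical type for illustration only;
/// bootstrap's real `Mode` enum is more fine-grained.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Artifact {
    /// The standard library.
    Std,
    /// Anything else (compiler, tools, ...).
    Other,
}

/// Returns the stage of the `rustc` that builds `artifact` at `stage`,
/// or `None` for stage 0, which is never built from in-tree sources.
pub fn build_compiler_stage(artifact: Artifact, stage: u32) -> Option<u32> {
    if stage == 0 {
        return None;
    }
    match artifact {
        // std stage N is built with rustc stage N.
        Artifact::Std => Some(stage),
        // Anything else at stage N is built with rustc stage N - 1.
        Artifact::Other => Some(stage - 1),
    }
}
```

For example, `build_compiler_stage(Artifact::Other, 2)` gives `Some(1)`, matching "stage 2 `rustc` is built with stage 1 `rustc`".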
With this model, it does not seem to make sense to do anything with stage 0 *sources*. In other words, checking/building/testing stage 0 doesn't make much sense (apart from specialized use-cases like running compiletest on a "fake" stage0 compiler). Everything should start from stage 1, which should also be the default stage for most (all?) commands.
Ralf Jung helpfully described this with diagrams [here](https://github.com/rust-lang/rust/issues/142246).
Another interpretation of this numbering is that `N` means "the N-th local build": stage 1 rustc is simply the first rustc built by bootstrap, stage 1 std is the first std built by bootstrap, etc.
## Command matrix
The tables below show the current behavior of bootstrap and the behavior we want from it. The tables use abbreviations to be more compact: `rN` means `rustc stage N` and `sN` means `std stage N`.
### Compiler (`Mode::Rustc` in bootstrap)
| Command | `--stage` | Current | Desired |
|------------------|----------:|------------------------------|--------------------------|
| `check compiler` | - | Check compiler with `r0` | Same |
| `check compiler` | 0 | Check compiler with `r0` | Error |
| `check compiler` | 1 | Check compiler with `r1` | Check compiler with `r0` |
| `check compiler` | 2 | Check compiler with `r2` | Check compiler with `r1` |
| `build compiler` | - | Build compiler with `r0` | Same |
| `build compiler` | 0 | Build compiler with `r0` | Error |
| `build compiler` | 1 | Build compiler with `r0` | Same |
| `build compiler` | 2 | Build compiler with `r1` | Same |
| `test compiler` | - | Build `r1` and `s1`, test it | Same |
| `test compiler` | 0 | Error | Same |
| `test compiler` | 1 | Build `r1` and `s1`, test it | Same |
| `test compiler` | 2 | Build `r2` and `s2`, test it | Same |
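The "Desired" column of the compiler table is regular enough to be expressed as one function. The sketch below assumes that a missing `--stage` behaves like `--stage 1`, which is what the table rows imply. All names in it (`Command`, `Plan`, `plan_compiler_command`) are hypothetical and do not exist in bootstrap.

```rust
/// Subcommand applied to the `compiler` path (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Command {
    Check,
    Build,
    Test,
}

/// What bootstrap would do in the "Desired" column (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Plan {
    /// Check or build the compiler sources using `rustc` stage `with_rustc`.
    UseCompiler { with_rustc: u32 },
    /// Build `rustc` and `std` at `stage`, then test them.
    BuildAndTest { stage: u32 },
}

/// Maps a command and an optional `--stage` value to the desired behavior.
pub fn plan_compiler_command(cmd: Command, stage: Option<u32>) -> Result<Plan, String> {
    // Assumption: no `--stage` behaves like `--stage 1`.
    let stage = stage.unwrap_or(1);
    if stage == 0 {
        // Every stage 0 row in the "Desired" column is an error.
        return Err(format!("{:?} compiler: stage 0 is not supported", cmd));
    }
    Ok(match cmd {
        // The stage N compiler is checked/built with rustc stage N - 1.
        Command::Check | Command::Build => Plan::UseCompiler { with_rustc: stage - 1 },
        // Tests run against rustc and std of the requested stage.
        Command::Test => Plan::BuildAndTest { stage },
    })
}
```

With this shape, the only difference between `check` and `build` is the action performed, not the choice of compiler, which is the main change relative to the "Current" column.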
### Library (`Mode::Std` in bootstrap)
| Command | `--stage` | Current | Desired |
|-----------------|----------:|------------------------------|-------------------------|
| `check library` | - | Check library with `r0` | Same |
| `check library` | 0 | Warning | Same |
| `check library` | 1 | Check library with `r1` | Same |
| `check library` | 2 | Check library with `r2` | Same |
| `build library` | - | Build library with `r0` | Same |
| `build library` | 0 | No-op | Error |
| `build library` | 1 | Build library with `r1` | Same |
| `build library` | 2 | Build library with `r2` | Same |
| `test library` | - | Build `r1` and `s1`, test it | Same |
| `test library` | 0 | Error | Same |
| `test library` | 1 | Build `r1` and `s1`, test it | Same |
| `test library` | 2 | Build `r2` and `s2`, test it | Same |
### Rustc tools (`Mode::ToolRustc` in bootstrap)
Tools that depend on compiler (rmeta) artifacts, like miri. In the steps below, when the compiler is checked with `rN`, then `<tool> stage (N+1)` links to the generated rmeta files.
I propose to rename this mode to `Mode::UsesRustcPrivate`, or something similar, to make it clearer what's going on.
The table below holds for `miri`, `clippy`, `rustfmt`, `error_index_generator`, `rust-analyzer`.
| Command | `--stage` | Current | Desired |
|--------------|----------:|--------------------------------------------------------|-----------------------------------------------|
| `check miri` | - | Check compiler and miri with `r0` | Same |
| `check miri` | 0 | Check compiler and miri with `r0` | Error |
| `check miri` | 1 | Build `r1`, check compiler and miri with `r1` | Check compiler and miri with `r0` |
| `check miri` | 2 | Build `r2`, check compiler and miri with `r2` | Build `r1`, check compiler and miri with `r1` |
| `build miri` | - | Build compiler and miri with `r0` | Same |
| `build miri` | 0 | Build compiler and miri with `r0` | Error |
| `build miri` | 1 | Build compiler and miri with `r0` | Same |
| `build miri` | 2 | Build `r1`, build compiler and miri with `r1` | Same |
| `test miri` | - | Build compiler and miri with `r0`, test it | Same |
| `test miri` | 0 | Error | Error |
| `test miri` | 1 | Build compiler and miri with `r0`, test it | Same |
| `test miri` | 2 | Build `r1`, build compiler and miri with `r1`, test it | Same |
`rustdoc` currently behaves slightly differently.
| Command | `--stage` | Current | Desired |
|-----------------|----------:|-----------------------------------------------------------|--------------------------------------------------|
| `check rustdoc` | - | Check compiler and rustdoc with `r0` | Same |
| `check rustdoc` | 0 | Check compiler and rustdoc with `r0` | Error |
| `check rustdoc` | 1 | Build `r1`, check compiler and rustdoc with `r1` | Check compiler and rustdoc with `r0` |
| `check rustdoc` | 2 | Build `r2`, check compiler and rustdoc with `r2` | Build `r1`, check compiler and rustdoc with `r1` |
| `build rustdoc` | - | Build compiler and rustdoc with `r0` | Same |
| `build rustdoc` | 0 | Error | Error |
| `build rustdoc` | 1 | Build compiler and rustdoc with `r0` | Same |
| `build rustdoc` | 2 | Build `r1`, build compiler and rustdoc with `r1` | Same |
| `test rustdoc` | - | Build `r1`, build compiler and rustdoc with `r0`, test it | Same |
| `test rustdoc` | 0 | Error | Same |
| `test rustdoc` | 1 | Build compiler and rustdoc with `r0`, test it | Same |
| `test rustdoc` | 2 | Build `r1`, build compiler and rustdoc with `r1`, test it | Same |
### Bootstrap (stage0) tools (`Mode::ToolBootstrap` in bootstrap)
Tools that can be compiled with the stage0 compiler, e.g. `opt-dist`. These tools have the following invariants:
- They can be built with the stage0 compiler (and it doesn't really make sense to build them with anything else).
- They are always built for the host target, i.e. for the target of the computer that compiles them.
    - This stems from the fact that the stage0 compiler can only compile Rust programs for the host target.
- They should never be distributed in our `dist` archives, because of the target limitation above (note: this invariant does not currently hold, we should switch these away from `ToolBootstrap` to `ToolRustcHost`, see below).
We can use one of the following two approaches for them:
1) Ignore `--stage` altogether, and always build them using stage 0 rustc. This is what bootstrap does now.
    - It seems like these tools really don't have to participate in staging at all.
    - It also keeps backwards compatibility, although I suspect most people just build these tools without `--stage`, because, as stated above, they don't really deal with staging in any way.
2) We can apply the same numbering scheme as for compiler/library, and treat these tools as being "stage 1" when built with stage 0 rustc.
    - This makes these tools more consistent with the rest of the bootstrap steps.
    - This allows us to actually build these tools using the in-tree compiler through bootstrap, although this doesn't seem to be a useful use case, as this wasn't supported before and no one seems to want it.
### Rustc host tools (currently does not exist)
There are several tools that are currently either `ToolRustc` (`LlvmBitcodeLinker`), `ToolStd` (`LldWrapper`) or `ToolBootstrap` (`WasmComponentLd`), but that really form a separate tool type. They are tools that are executed by `rustc` as *host tools*. So when you compile with `rustc` on x64 Linux, that `rustc` will execute these tools, which will run on the same target. These are essentially all linkers.
Some notes:
- Currently, these are being compiled unnecessarily. For example, a non-cross-compiling stage 2 build builds `LlvmBitcodeLinker` twice, for absolutely no reason. We can just build it once using the stage 0 compiler (PR WIP :) ).
- Even if they are host tools, they are actually stored in the target bindir (`rustlib/<target>/bin/*`), for the host target of `rustc`. That's how rustup/rustc works.
I propose to call this type of tool `Mode::ToolRustcHost`, or something like that. These tools have the following property: they can be built using a compiler that can compile code for the host target of the compiler that will use these tools at runtime. This is a similar property to the stdlib used by rustc at runtime.
- If we are not cross-compiling, we can just build them with the stage 0 compiler.
- If we are cross-compiling and building a compiler for target `T`, we have to build a stage N compiler that can compile code for `T`, and then build the host tool with it. N should correspond to the top level stage, in order to avoid unnecessary builds. For example, if you do a stage 2 build of a cross-compiled rustc, using stage 1 here would unnecessarily build an additional rustc.
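The two cases above could be sketched roughly as follows. This is an illustration of the rule as stated in this note, not bootstrap code: `HostToolCompiler` and `host_tool_compiler` are hypothetical names, and "cross-compiling" is approximated here by simply comparing the two target triples.

```rust
/// Which compiler builds a "rustc host tool" (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HostToolCompiler {
    /// The stage 0 compiler is enough.
    Stage0,
    /// An in-tree compiler of the given stage that can compile code for the target.
    InTree { stage: u32 },
}

/// Picks the compiler used to build a host tool.
///
/// `build_host` is the machine running bootstrap, `rustc_host` is the host
/// target of the `rustc` that will execute the tool at runtime, and
/// `top_level_stage` is the stage requested for the whole build.
pub fn host_tool_compiler(
    build_host: &str,
    rustc_host: &str,
    top_level_stage: u32,
) -> HostToolCompiler {
    if build_host == rustc_host {
        // Not cross-compiling: stage 0 can already produce code for this target.
        HostToolCompiler::Stage0
    } else {
        // Cross-compiling: use the top-level stage, as described above,
        // to avoid building an extra compiler.
        HostToolCompiler::InTree { stage: top_level_stage }
    }
}
```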
### Cargo
Cargo is a bit weird, because we can always build it with the stage0 compiler, but we also need at least stage 1 `rustc` to run cargo tests (to make sure that they still work). We also can't use `Mode::ToolBootstrap` for Cargo, because we likely compile Cargo for non-host targets on CI. Keeping `Mode::UsesRustcPrivate` for it should work, I suppose. It's kind of similar, except that Cargo doesn't link to the compiler library; it just uses the compiler as a binary.
## Other proposals
- We can remove check/build stages from profiles and just use default stage 1 for everything.