# Faster bevy compilation times
All measurements were taken with the CPU frequency governor set to `performance` (`echo performance | sudo -k tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor`)
on a system with no programs open except a terminal. The standard compile fast config was
used as the base. All measurements are for debug mode.
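To double-check the governor setting before a benchmark session, something like the following could be used. This is only a sketch: the function name `governors_all_performance` and its optional root-directory argument are made up for illustration and are not part of the original setup.

```bash
# Succeed only if every CPU's scaling governor reads "performance".
# The optional first argument overrides the sysfs root (handy for testing).
governors_all_performance() {
    local root="${1:-/sys/devices/system/cpu}" f found=0
    for f in "$root"/cpu*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue          # unmatched glob or unreadable file
        found=1
        [ "$(cat "$f")" = "performance" ] || return 1
    done
    [ "$found" -eq 1 ]                   # fail if no governor file was found
}
```

Usage: `governors_all_performance || echo "governor is not set to performance"`.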
## Cargo overhead
To establish a lower bound on the compilation time, I first measured how long Cargo needs to determine that everything is up to date when nothing has changed.
```bash
$ hyperfine --warmup 10 --runs 100 "cargo build --example breakout"
Benchmark #1: cargo build --example breakout
Time (mean ± σ): 164.3 ms ± 0.9 ms [User: 136.6 ms, System: 37.4 ms]
Range (min … max): 162.8 ms … 167.4 ms 100 runs
```
# No-op change
The first actual thing to test is a no-op change, which puts the most strain on incremental compilation and the linker, but doesn't involve any codegen.
## Baseline
```bash
$ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && cargo build --example breakout"
Benchmark #1: touch examples/game/breakout.rs && cargo build --example breakout
Time (mean ± σ): 2.065 s ± 0.362 s [User: 2.181 s, System: 0.604 s]
Range (min … max): 1.722 s … 4.387 s 100 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
```
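A possible refinement that was not used for the numbers in this document: hyperfine's `--prepare` option runs a command before every timed run, which would keep the `touch` itself out of the measurement. A minimal sketch, where the `noop_bench` name and the `HYPERFINE` override variable are hypothetical:

```bash
# Benchmark a no-op rebuild of one example, touching the source file
# outside of the timed region via hyperfine's --prepare option.
# HYPERFINE can be overridden (e.g. with `echo`) for a dry run.
noop_bench() {
    local example="$1"                              # e.g. breakout
    local src="${2:-examples/game/$example.rs}"     # file to touch
    "${HYPERFINE:-hyperfine}" --warmup 10 --runs 100 \
        --prepare "touch $src" \
        "cargo build --example $example"
}
```

Usage: `noop_bench breakout`. The cost of `touch` is tiny compared to the build, so the results should be close to the ones shown here.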
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done
time: 1.064; rss: 257MB link_binary
time: 1.090; rss: 257MB link_binary
time: 1.068; rss: 257MB link_binary
time: 1.093; rss: 257MB link_binary
time: 1.131; rss: 256MB link_binary
time: 1.107; rss: 256MB link_binary
time: 1.087; rss: 256MB link_binary
time: 1.087; rss: 259MB link_binary
time: 1.103; rss: 258MB link_binary
time: 1.130; rss: 258MB link_binary
time: 1.155; rss: 256MB link_binary
```
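The link times in the summary are rounded means of samples like the ones above. One way to compute such a mean is sketched below; the helper name `mean_link_time` is made up, and it assumes the `time: <seconds>; rss: ... link_binary` line shape shown in this document.

```bash
# Print the mean of the `time:` column for all `link_binary` lines on stdin.
# Expects lines shaped like: "time: 1.064; rss: 257MB link_binary"
mean_link_time() {
    awk '/link_binary$/ {
             t = $2; sub(/;$/, "", t)    # strip the trailing semicolon
             sum += t; n++
         }
         END { if (n > 0) printf "%.3f\n", sum / n }'
}
```

Appending `| mean_link_time` to the loop above prints a single number. For the eleven baseline samples that is about 1.10, in line with the 1.1s reported in the summary.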
## `-Cprefer-dynamic`
```bash
$ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && cargo rustc --example breakout -- -Cprefer-dynamic"
Benchmark #1: touch examples/game/breakout.rs && cargo rustc --example breakout -- -Cprefer-dynamic
Time (mean ± σ): 2.097 s ± 0.411 s [User: 2.207 s, System: 0.603 s]
Range (min … max): 1.708 s … 4.151 s 100 runs
```
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && cargo rustc --example breakout -- -Cprefer-dynamic -Ztime-passes |& rg "link_binary$"; done
time: 1.043; rss: 302MB link_binary
time: 1.132; rss: 246MB link_binary
time: 1.099; rss: 253MB link_binary
time: 1.169; rss: 254MB link_binary
time: 1.116; rss: 252MB link_binary
time: 1.160; rss: 251MB link_binary
time: 1.055; rss: 252MB link_binary
time: 1.073; rss: 253MB link_binary
time: 1.195; rss: 254MB link_binary
time: 1.108; rss: 254MB link_binary
time: 1.142; rss: 252MB link_binary
```
## Bevy dylib (implies `-Cprefer-dynamic` as one of the crates is only available as dylib)
```bash
$ git apply - <<EOF
diff --git a/Cargo.toml b/Cargo.toml
index 8d18c296..71bc971e 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -15,6 +15,9 @@ categories = ["game-engines", "graphics", "gui", "rendering"]
readme = "README.md"
exclude = ["assets/**/*", "tools/**/*", ".github/**/*", "crates/**/*"]
+[lib]
+crate-type = ["dylib"]
+
[features]
default = [
"bevy_audio",
EOF
$ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && cargo build --example breakout"
Benchmark #1: touch examples/game/breakout.rs && cargo build --example breakout
Time (mean ± σ): 686.3 ms ± 4.4 ms [User: 570.7 ms, System: 173.6 ms]
Range (min … max): 679.9 ms … 716.0 ms 100 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
```
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done
time: 0.151; rss: 294MB link_binary
time: 0.141; rss: 246MB link_binary
time: 0.130; rss: 251MB link_binary
time: 0.141; rss: 251MB link_binary
time: 0.130; rss: 255MB link_binary
time: 0.131; rss: 251MB link_binary
time: 0.135; rss: 251MB link_binary
time: 0.131; rss: 251MB link_binary
time: 0.130; rss: 252MB link_binary
time: 0.130; rss: 252MB link_binary
time: 0.133; rss: 252MB link_binary
```
## `-Zfunction-sections=no`
```bash
$ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zfunction-sections=no\" cargo build --example breakout"
Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=no" cargo build --example breakout
Time (mean ± σ): 6.613 s ± 0.235 s [User: 5.439 s, System: 1.137 s]
Range (min … max): 6.423 s … 7.090 s 10 runs
```
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=no" cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done
time: 5.796; rss: 266MB link_binary
time: 5.968; rss: 268MB link_binary
time: 6.048; rss: 267MB link_binary
time: 5.976; rss: 269MB link_binary
time: 5.917; rss: 267MB link_binary
time: 5.805; rss: 268MB link_binary
time: 6.081; rss: 266MB link_binary
time: 6.472; rss: 267MB link_binary
time: 5.996; rss: 266MB link_binary
time: 6.041; rss: 268MB link_binary
time: 5.864; rss: 266MB link_binary
```
In this case the overhead of having a separate section for each function is, I guess, smaller than
the time saved by not linking functions that are dead code. On the rustc-perf benchmark suite this
flag actually saves time.
## `-Zfunction-sections=no` + dylib
```bash
$ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zfunction-sections=no\" cargo build --example breakout"
Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=no" cargo build --example breakout
Time (mean ± σ): 1.607 s ± 0.019 s [User: 1.391 s, System: 0.232 s]
Range (min … max): 1.596 s … 1.746 s 100 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
```
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=no" cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done
time: 1.036; rss: 257MB link_binary
time: 1.023; rss: 259MB link_binary
time: 1.052; rss: 259MB link_binary
time: 1.035; rss: 256MB link_binary
time: 1.135; rss: 260MB link_binary
time: 1.137; rss: 259MB link_binary
time: 1.034; rss: 256MB link_binary
time: 1.084; rss: 258MB link_binary
time: 1.044; rss: 259MB link_binary
time: 1.083; rss: 260MB link_binary
time: 1.172; rss: 256MB link_binary
```
## `-Zrelro-level=off`
```bash
$ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zrelro-level=off\" cargo build --example breakout"
Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zrelro-level=off" cargo build --example breakout
Time (mean ± σ): 9.362 s ± 0.248 s [User: 7.580 s, System: 1.800 s]
Range (min … max): 9.233 s … 9.924 s 10 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
```
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zrelro-level=off" cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done
time: 8.609; rss: 263MB link_binary
time: 8.615; rss: 265MB link_binary
time: 8.604; rss: 264MB link_binary
time: 8.596; rss: 262MB link_binary
time: 8.593; rss: 262MB link_binary
time: 8.651; rss: 261MB link_binary
time: 8.636; rss: 262MB link_binary
time: 8.611; rss: 260MB link_binary
time: 8.624; rss: 259MB link_binary
time: 8.617; rss: 262MB link_binary
time: 8.603; rss: 260MB link_binary
```
Disabling RELRO makes it possible to use the PLT for lazy resolution of symbols, which seems to put
extra strain on the linker.
## `-Zrelro-level=off` + dylib
```bash
$ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zrelro-level=off\" cargo build --example breakout"
Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zrelro-level=off" cargo build --example breakout
Time (mean ± σ): 1.584 s ± 0.024 s [User: 1.366 s, System: 0.235 s]
Range (min … max): 1.556 s … 1.717 s 100 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
```
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zrelro-level=off" cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done
time: 1.039; rss: 267MB link_binary
time: 1.049; rss: 264MB link_binary
time: 1.065; rss: 266MB link_binary
time: 1.077; rss: 264MB link_binary
time: 1.085; rss: 267MB link_binary
time: 1.076; rss: 267MB link_binary
time: 1.069; rss: 263MB link_binary
time: 1.079; rss: 264MB link_binary
time: 1.129; rss: 265MB link_binary
time: 1.050; rss: 264MB link_binary
time: 1.136; rss: 264MB link_binary
```
## `[profile.dev] panic=abort`
```bash
$ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && cargo build --example breakout"
Benchmark #1: touch examples/game/breakout.rs && cargo build --example breakout
Time (mean ± σ): 2.150 s ± 0.407 s [User: 2.127 s, System: 0.550 s]
Range (min … max): 1.736 s … 4.452 s 100 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
```
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done
time: 0.999; rss: 250MB link_binary
time: 1.040; rss: 247MB link_binary
time: 1.159; rss: 249MB link_binary
time: 1.024; rss: 249MB link_binary
time: 1.063; rss: 247MB link_binary
time: 1.074; rss: 248MB link_binary
time: 1.018; rss: 247MB link_binary
time: 1.090; rss: 249MB link_binary
time: 1.088; rss: 249MB link_binary
time: 1.161; rss: 248MB link_binary
time: 1.031; rss: 248MB link_binary
```
## `-Cpanic=abort` + dylib
Compilation fails with ``error: the linked panic runtime `panic_unwind` is not compiled with this
crate's panic strategy `abort`.``.
## cg_clif (with implicit `-Zfunction-sections=no`, `-Zrelro-level=off` and `-Cpanic=abort`)
```bash
$ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && ../build/cargo.sh build --example breakout"
Benchmark #1: touch examples/game/breakout.rs && ../build/cargo.sh build --example breakout
Time (mean ± σ): 4.364 s ± 0.284 s [User: 3.289 s, System: 0.864 s]
Range (min … max): 4.130 s … 4.964 s 10 runs
```
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && ../build/cargo.sh rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done
time: 3.570; rss: 245MB link_binary
time: 3.549; rss: 208MB link_binary
time: 3.514; rss: 208MB link_binary
time: 3.509; rss: 208MB link_binary
time: 3.526; rss: 208MB link_binary
time: 3.513; rss: 209MB link_binary
time: 3.514; rss: 207MB link_binary
time: 3.519; rss: 207MB link_binary
time: 3.523; rss: 208MB link_binary
time: 3.515; rss: 207MB link_binary
time: 3.525; rss: 207MB link_binary
```
https://github.com/bjorn3/rustc_codegen_cranelift/issues/762 is the tracking issue for linker performance in cg_clif.
## cg_clif + dylib (with implicit `-Zfunction-sections=no`, `-Zrelro-level=off` and `-Cpanic=abort`)
```bash
$ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && ../build/cargo.sh build --example breakout"
Benchmark #1: touch examples/game/breakout.rs && ../build/cargo.sh build --example breakout
Time (mean ± σ): 1.961 s ± 0.054 s [User: 1.421 s, System: 0.551 s]
Range (min … max): 1.892 s … 2.078 s 10 runs
```
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && ../build/cargo.sh rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done
time: 1.016; rss: 206MB link_binary
time: 1.077; rss: 204MB link_binary
time: 1.017; rss: 205MB link_binary
time: 1.029; rss: 205MB link_binary
time: 1.020; rss: 205MB link_binary
time: 1.117; rss: 205MB link_binary
time: 1.485; rss: 205MB link_binary
time: 1.241; rss: 205MB link_binary
time: 1.029; rss: 205MB link_binary
time: 1.097; rss: 206MB link_binary
time: 1.120; rss: 204MB link_binary
```
## cg_clif with `-Zfunction-sections=yes` (with implicit `-Zrelro-level=off` and `-Cpanic=abort`)
```bash
$ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zfunction-sections=yes\" ../build/cargo.sh build --example breakout"
Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=yes" ../build/cargo.sh build --example breakout
Time (mean ± σ): 6.515 s ± 0.118 s [User: 5.174 s, System: 1.310 s]
Range (min … max): 6.398 s … 6.754 s 10 runs
```
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=yes" ../build/cargo.sh rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done
time: 5.610; rss: 245MB link_binary
time: 5.625; rss: 208MB link_binary
time: 5.611; rss: 207MB link_binary
time: 5.603; rss: 210MB link_binary
time: 5.633; rss: 208MB link_binary
time: 5.599; rss: 209MB link_binary
time: 5.603; rss: 208MB link_binary
time: 5.591; rss: 207MB link_binary
time: 5.578; rss: 209MB link_binary
time: 5.588; rss: 207MB link_binary
time: 5.607; rss: 208MB link_binary
```
## cg_clif with dylib and `-Zfunction-sections=yes` (with implicit `-Zrelro-level=off` and `-Cpanic=abort`)
```bash
$ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zfunction-sections=yes\" ../build/cargo.sh build --example breakout"
Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=yes" ../build/cargo.sh build --example breakout
Time (mean ± σ): 1.969 s ± 0.062 s [User: 1.422 s, System: 0.557 s]
Range (min … max): 1.907 s … 2.055 s 10 runs
```
```bash
$ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=yes" ../build/cargo.sh rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done
time: 1.026; rss: 205MB link_binary
time: 1.025; rss: 205MB link_binary
time: 1.021; rss: 205MB link_binary
time: 1.033; rss: 207MB link_binary
time: 1.047; rss: 204MB link_binary
time: 1.106; rss: 207MB link_binary
time: 1.074; rss: 206MB link_binary
time: 1.022; rss: 205MB link_binary
time: 1.051; rss: 205MB link_binary
time: 1.039; rss: 206MB link_binary
time: 1.047; rss: 205MB link_binary
```
## Summary for no-op changes
|name|time|link time|
|---|---|---|
|cargo overhead|0.2s|N/A|
|baseline|2.1s|1.1s|
|`-Cprefer-dynamic`|2.1s|1.1s|
|**dylib**|**0.7s**|**0.1s**|
|`-Zfunction-sections=no`|6.6s|6.0s|
|`-Zfunction-sections=no` + dylib|1.6s|1.1s|
|`-Zrelro-level=off`|9.4s|8.6s|
|`-Zrelro-level=off` + dylib|1.6s|1.1s|
|`panic=abort`|2.2s|1.1s|
|`panic=abort` + dylib|error|N/A|
|cg_clif|4.4s|3.5s|
|cg_clif + dylib|2.0s|1.1s|
|cg_clif + `-Zfunction-sections=yes`|6.5s|5.6s|
|cg_clif + dylib + `-Zfunction-sections=yes`|2.0s|1.0s|
The only change with a clear improvement in compilation time is building the main Bevy crate as a dylib. The rest have either a neutral or a negative effect.
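To put a number on the improvement, the hyperfine means can be compared directly. A small sketch, with a hypothetical helper name `speedup`:

```bash
# Print how many times faster `cur` is than `base` (both in the same unit).
speedup() {
    awk -v base="$1" -v cur="$2" 'BEGIN {
        if (cur <= 0) exit 1             # avoid dividing by zero
        printf "%.1f\n", base / cur
    }'
}
```

With the measured means in milliseconds, `speedup 2065 686.3` prints `3.0`, so the dylib build of the no-op change is roughly three times faster than the baseline.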
# Adding a `println!()` to main
This causes at least one codegen unit to be recompiled. Each benchmark run alternates between two copies of the file, so every reported time covers two builds.
I only ran a subset of the previous benchmarks, as I don't expect the rest to bring an improvement in this case either.
I also don't list linker times, as those should be comparable to the previous benchmark.
## Baseline
```bash
$ diff -u examples/game/breakout1.rs examples/game/breakout2.rs
--- examples/game/breakout1.rs 2020-11-04 13:09:35.528245302 +0100
+++ examples/game/breakout2.rs 2020-11-04 13:08:50.008110939 +0100
@@ -6,7 +6,6 @@
/// An implementation of the classic game "Breakout"
fn main() {
- println!();
App::build()
.add_plugins(DefaultPlugins)
.add_resource(Scoreboard { score: 0 })
$ hyperfine --warmup 1 --runs 10 "cp examples/game/breakout1.rs examples/game/breakout.rs; cargo build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; cargo build --example breakout"
Benchmark #1: cp examples/game/breakout1.rs examples/game/breakout.rs; cargo build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; cargo build --example breakout
Time (mean ± σ): 6.744 s ± 0.314 s [User: 9.153 s, System: 1.244 s]
Range (min … max): 6.211 s … 7.280 s 10 runs
```
## Bevy dylib
```bash
$ hyperfine --warmup 1 --runs 10 "cp examples/game/breakout1.rs examples/game/breakout.rs; cargo build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; cargo build --example breakout"
Benchmark #1: cp examples/game/breakout1.rs examples/game/breakout.rs; cargo build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; cargo build --example breakout
Time (mean ± σ): 3.796 s ± 0.075 s [User: 5.875 s, System: 0.467 s]
Range (min … max): 3.662 s … 3.872 s 10 runs
```
## cg_clif
```bash
$ hyperfine --warmup 1 --runs 10 "cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout"
Benchmark #1: cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout
Time (mean ± σ): 10.126 s ± 0.293 s [User: 7.832 s, System: 2.168 s]
Range (min … max): 9.852 s … 10.798 s 10 runs
```
## cg_clif + dylib
```bash
$ hyperfine --warmup 1 --runs 10 "cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout"
Benchmark #1: cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout
Time (mean ± σ): 5.127 s ± 0.145 s [User: 4.071 s, System: 1.077 s]
Range (min … max): 4.959 s … 5.360 s 10 runs
```
## cg_clif + dylib + jit
Patch to prevent jit mode from running the code:
```diff
$ cd rustc_codegen_cranelift && git diff
diff --git a/src/driver/jit.rs b/src/driver/jit.rs
index 3f47df7..1c8712d 100644
--- a/src/driver/jit.rs
+++ b/src/driver/jit.rs
@@ -103,7 +103,7 @@ pub(super) fn run_jit(tcx: TyCtxt<'_>) -> ! {
// useful as some dynamic linkers use it as a marker to jump over.
argv.push(std::ptr::null());
- let ret = f(args.len() as c_int, argv.as_ptr());
+ let ret = 0;
std::process::exit(ret);
}
```
Patch to let Bevy run in jit mode, as rustc spawns a new thread to run the code in:
```diff
$ git diff crates/bevy_winit
diff --git a/crates/bevy_winit/src/lib.rs b/crates/bevy_winit/src/lib.rs
index a8c20cdb..579183bb 100644
--- a/crates/bevy_winit/src/lib.rs
+++ b/crates/bevy_winit/src/lib.rs
@@ -142,7 +142,7 @@ where
}
pub fn winit_runner(mut app: App) {
- let mut event_loop = EventLoop::new();
+ let mut event_loop: EventLoop<_> = winit::platform::unix::EventLoopExtUnix::new_any_thread();
let mut create_window_event_reader = EventReader::<CreateWindow>::default();
let mut app_exit_event_reader = EventReader::<AppExit>::default();
```
```bash
$ hyperfine --warmup 1 --runs 10 "cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh jit --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh jit --example breakout"
Benchmark #1: cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh jit --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh jit --example breakout
Time (mean ± σ): 4.579 s ± 0.006 s [User: 3.576 s, System: 1.033 s]
Range (min … max): 4.569 s … 4.588 s 10 runs
```
This is still a bit slower than cg_llvm + dylib, but that is expected, as jit mode doesn't cache any codegen units in the incremental cache.
# Conclusion
The only way I could find to improve compilation times of Bevy was to make the `bevy` crate a dylib.
If you still want to retain the ability to statically link to `bevy`, you could set
`crate-type = ["rlib", "dylib"]` and use `-Cprefer-dynamic` in the fast compilation profile.
Edit(2020-11-10): [bevy#808](https://github.com/bevyengine/bevy/pull/808) has been merged, which makes it possible to optionally compile bevy as a dylib.
# Extra: Cargo performance
> New 2020-11-05
## Breakout
```bash
$ cargo build --example breakout
$ touch examples/game/breakout.rs
$ perf record --call-graph dwarf,50000 sh -c "cargo build --example breakout"
```
https://www.speedscope.app/#profileURL=https://gist.githubusercontent.com/bjorn3/562a8772b08ce58d0a7746a0f7c12b72/raw/e5646219a679b8fc3082c12578814e4b583ea3dd/bevy_breakout_fresh.collapsed


|program|cpu usage|
|---|---|
|rustc|50%|
|cargo|28%|
|ld.lld|19%|
|clang|1.8%|
|sh|0.7%|
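A per-program breakdown like this can be derived from folded ("collapsed") stacks by summing the sample counts per program. The sketch below is one way to do that, not necessarily how the table above was produced. It assumes each line has the form `comm;frame;frame;... COUNT`, with the command name as the first stack element; the function name `cpu_share_by_program` is made up.

```bash
# Sum sample counts per program from folded stacks on stdin and print
# each program with its share of all samples, largest first.
cpu_share_by_program() {
    awk '{
             count = $NF                 # last field is the sample count
             split($1, frames, ";")      # first stack element = program
             samples[frames[1]] += count
             total += count
         }
         END {
             for (prog in samples)
                 printf "%s %.1f%%\n", prog, 100 * samples[prog] / total
         }' | LC_ALL=C sort -k2,2nr
}
```

Usage: `cpu_share_by_program < bevy_breakout_fresh.collapsed`.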
## Old vs new resolver
```bash
$ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && cargo build --example breakout" "touch examples/game/breakout.rs && cargo build -Zfeatures=all -Zpackage-features --example breakout"
Benchmark #1: touch examples/game/breakout.rs && cargo build --example breakout
Time (mean ± σ): 678.8 ms ± 21.5 ms [User: 573.7 ms, System: 160.8 ms]
Range (min … max): 668.1 ms … 812.3 ms 100 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark #2: touch examples/game/breakout.rs && cargo build -Zfeatures=all -Zpackage-features --example breakout
Time (mean ± σ): 690.2 ms ± 20.4 ms [User: 582.6 ms, System: 164.8 ms]
Range (min … max): 683.3 ms … 891.4 ms 100 runs
Warning: The first benchmarking run for this command was significantly slower than the rest (891.4 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
Summary
'touch examples/game/breakout.rs && cargo build --example breakout' ran
1.02 ± 0.04 times faster than 'touch examples/game/breakout.rs && cargo build -Zfeatures=all -Zpackage-features --example breakout'
```
The new resolver is a bit slower: about 2% (1.02 ± 0.04 times), which is within the measurement noise.
## Avoid constructing `anyhow::Error` when not necessary (merged)
This avoids an unnecessary backtrace capture for each thread: https://github.com/rust-lang/cargo/pull/8844
## Avoiding a new thread for each fresh job
Saves about 2% on my system: https://github.com/rust-lang/cargo/pull/8837
## Conclusion
The Cargo overhead is higher than I hoped. I opened [rust-lang/cargo#8833](https://github.com/rust-lang/cargo/issues/8833) for this.