# Faster bevy compilation times All measurements done with `echo performance | sudo -k tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor` on a system without any programs open except for a terminal. The standard compile fast config was used as base. The measurements are for debug mode. ## Cargo overhead To establish a lower-bound on the compilation time I first tested the amount of time Cargo needs to consider all files clean when there have been no changes. ```bash $ hyperfine --warmup 10 --runs 100 "cargo build --example breakout" Benchmark #1: cargo build --example breakout Time (mean ± σ): 164.3 ms ± 0.9 ms [User: 136.6 ms, System: 37.4 ms] Range (min … max): 162.8 ms … 167.4 ms 100 runs ``` # No-op change The first actual thing to test is a no-op change, which puts most strain on incremental compilation and the linker, but doesn't involve any codegen. ## Baseline ```bash $ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && cargo build --example breakout" Benchmark #1: touch examples/game/breakout.rs && cargo build --example breakout Benchmark #1: cargo build --example breakout Time (mean ± σ): 2.065 s ± 0.362 s [User: 2.181 s, System: 0.604 s] Range (min … max): 1.722 s … 4.387 s 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. ``` ``` $ for i in $(seq 0 10); do touch examples/game/breakout.rs && cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done time: 1.064; rss: 257MB link_binary time: 1.090; rss: 257MB link_binary time: 1.068; rss: 257MB link_binary time: 1.093; rss: 257MB link_binary time: 1.131; rss: 256MB link_binary time: 1.107; rss: 256MB link_binary time: 1.087; rss: 256MB link_binary time: 1.087; rss: 259MB link_binary time: 1.103; rss: 258MB link_binary time: 1.130; rss: 258MB link_binary time: 1.155; rss: 256MB link_binary ``` ## `-Cprefer-dynamic` ```bash $ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && cargo rustc --example breakout -- -Cprefer-dynamic" Benchmark #1: touch examples/game/breakout.rs && cargo rustc --example breakout -- -Cprefer-dynamic Time (mean ± σ): 2.097 s ± 0.411 s [User: 2.207 s, System: 0.603 s] Range (min … max): 1.708 s … 4.151 s 100 runs ``` ``` $ for i in $(seq 0 10); do touch examples/game/breakout.rs && cargo rustc --example breakout -- -Cprefer-dynamic -Ztime-passes |& rg "link_binary$"; done time: 1.043; rss: 302MB link_binary time: 1.132; rss: 246MB link_binary time: 1.099; rss: 253MB link_binary time: 1.169; rss: 254MB link_binary time: 1.116; rss: 252MB link_binary time: 1.160; rss: 251MB link_binary time: 1.055; rss: 252MB link_binary time: 1.073; rss: 253MB link_binary time: 1.195; rss: 254MB link_binary time: 1.108; rss: 254MB link_binary time: 1.142; rss: 252MB link_binary ``` ## Bevy dylib (implies `-Cprefer-dynamic` as one of the crates is only available as dylib) ```bash $ git apply - <<EOF diff --git a/Cargo.toml b/Cargo.toml index 8d18c296..71bc971e 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -15,6 +15,9 @@ categories = ["game-engines", "graphics", "gui", "rendering"] readme = "README.md" exclude = ["assets/**/*", "tools/**/*", ".github/**/*", "crates/**/*"] +[lib] +crate-type = ["dylib"] + [features] default = [ "bevy_audio", EOF $ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && cargo build --example breakout" Benchmark #1: touch examples/game/breakout.rs && cargo build --example breakout Time (mean ± σ): 686.3 ms ± 4.4 ms [User: 570.7 ms, System: 173.6 ms] Range (min … max): 679.9 ms … 716.0 ms 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. ``` ```bash $ for i in $(seq 0 10); do touch examples/game/breakout.rs && cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done time: 0.151; rss: 294MB link_binary time: 0.141; rss: 246MB link_binary time: 0.130; rss: 251MB link_binary time: 0.141; rss: 251MB link_binary time: 0.130; rss: 255MB link_binary time: 0.131; rss: 251MB link_binary time: 0.135; rss: 251MB link_binary time: 0.131; rss: 251MB link_binary time: 0.130; rss: 252MB link_binary time: 0.130; rss: 252MB link_binary time: 0.133; rss: 252MB link_binary ``` ## `-Zfunction-sections=no` ```bash $ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zfunction-sections=no\" cargo build --example breakout" Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=no" cargo build --example breakout Time (mean ± σ): 6.613 s ± 0.235 s [User: 5.439 s, System: 1.137 s] Range (min … max): 6.423 s … 7.090 s 10 runs ``` ```bash $ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=no" cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done time: 5.796; rss: 266MB link_binary time: 5.968; rss: 268MB link_binary time: 6.048; rss: 267MB link_binary time: 5.976; rss: 269MB link_binary time: 5.917; rss: 267MB link_binary time: 5.805; rss: 268MB link_binary time: 6.081; rss: 266MB link_binary time: 6.472; rss: 267MB link_binary time: 5.996; rss: 266MB link_binary time: 6.041; rss: 268MB link_binary time: 5.864; rss: 266MB link_binary ``` In this case the overhead of having a single section for each function is smaller the time saved not linking functions that are dead code I guess. On the rustc-perf benchmark suite this actually saves time. ## `-Zfunction-sections=no` + dylib ```bash $ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zfunction-sections=no\" cargo build --example breakout" Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=no" cargo build --example breakout Time (mean ± σ): 1.607 s ± 0.019 s [User: 1.391 s, System: 0.232 s] Range (min … max): 1.596 s … 1.746 s 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. ``` ```bash $ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=no" cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done time: 1.036; rss: 257MB link_binary time: 1.023; rss: 259MB link_binary time: 1.052; rss: 259MB link_binary time: 1.035; rss: 256MB link_binary time: 1.135; rss: 260MB link_binary time: 1.137; rss: 259MB link_binary time: 1.034; rss: 256MB link_binary time: 1.084; rss: 258MB link_binary time: 1.044; rss: 259MB link_binary time: 1.083; rss: 260MB link_binary time: 1.172; rss: 256MB link_binary ``` ## `-Zrelro-level=off` ```bash $ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zrelro-level=off\" cargo build --example breakout" Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zrelro-level=off" cargo build --example breakout Time (mean ± σ): 9.362 s ± 0.248 s [User: 7.580 s, System: 1.800 s] Range (min … max): 9.233 s … 9.924 s 10 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. ``` ```bash $ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zrelro-level=off" cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done time: 8.609; rss: 263MB link_binary time: 8.615; rss: 265MB link_binary time: 8.604; rss: 264MB link_binary time: 8.596; rss: 262MB link_binary time: 8.593; rss: 262MB link_binary time: 8.651; rss: 261MB link_binary time: 8.636; rss: 262MB link_binary time: 8.611; rss: 260MB link_binary time: 8.624; rss: 259MB link_binary time: 8.617; rss: 262MB link_binary time: 8.603; rss: 260MB link_binary ``` This makes it possible to use the PLT for lazy resolution of symbols. This seems to put extra strain on the linker. ## `-Zrelro=level=off` + dylib ```bash $ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zrelro-level=off\" cargo build --example breakout" Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zrelro-level=off" cargo build --example breakout Time (mean ± σ): 1.584 s ± 0.024 s [User: 1.366 s, System: 0.235 s] Range (min … max): 1.556 s … 1.717 s 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. ``` ```bash $ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zrelro-level=off" cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done time: 1.039; rss: 267MB link_binary time: 1.049; rss: 264MB link_binary time: 1.065; rss: 266MB link_binary time: 1.077; rss: 264MB link_binary time: 1.085; rss: 267MB link_binary time: 1.076; rss: 267MB link_binary time: 1.069; rss: 263MB link_binary time: 1.079; rss: 264MB link_binary time: 1.129; rss: 265MB link_binary time: 1.050; rss: 264MB link_binary time: 1.136; rss: 264MB link_binary ``` ## `[profile.dev] panic=abort` ```bash $ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && cargo build --example breakout" Benchmark #1: touch examples/game/breakout.rs && cargo build --example breakout Time (mean ± σ): 2.150 s ± 0.407 s [User: 2.127 s, System: 0.550 s] Range (min … max): 1.736 s … 4.452 s 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. ``` ```bash $ for i in $(seq 0 10); do touch examples/game/breakout.rs && cargo rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done time: 0.999; rss: 250MB link_binary time: 1.040; rss: 247MB link_binary time: 1.159; rss: 249MB link_binary time: 1.024; rss: 249MB link_binary time: 1.063; rss: 247MB link_binary time: 1.074; rss: 248MB link_binary time: 1.018; rss: 247MB link_binary time: 1.090; rss: 249MB link_binary time: 1.088; rss: 249MB link_binary time: 1.161; rss: 248MB link_binary time: 1.031; rss: 248MB link_binary ``` ## `-Cpanic=abort` + dylib Compilation fails with ``error: the linked panic runtime `panic_unwind` is not compiled with this crate's panic strategy `abort`.``. ## cg_clif (with implicit `-Zfunction-sections=no`, `-Zrelro-level=off` and `-Cpanic=abort`) ```bash $ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && ../build/cargo.sh build --example breakout" Benchmark #1: touch examples/game/breakout.rs && ../build/cargo.sh build --example breakout Time (mean ± σ): 4.364 s ± 0.284 s [User: 3.289 s, System: 0.864 s] Range (min … max): 4.130 s … 4.964 s 10 runs ``` ```bash $ for i in $(seq 0 10); do touch examples/game/breakout.rs && ../build/cargo.sh rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done time: 3.570; rss: 245MB link_binary time: 3.549; rss: 208MB link_binary time: 3.514; rss: 208MB link_binary time: 3.509; rss: 208MB link_binary time: 3.526; rss: 208MB link_binary time: 3.513; rss: 209MB link_binary time: 3.514; rss: 207MB link_binary time: 3.519; rss: 207MB link_binary time: 3.523; rss: 208MB link_binary time: 3.515; rss: 207MB link_binary time: 3.525; rss: 207MB link_binary ``` https://github.com/bjorn3/rustc_codegen_cranelift/issues/762 is the tracking issue for linker performance in cg_clif. ## cg_clif + dylib (with implicit `-Zfunction-sections=no`, `-Zrelro-level=off` and `-Cpanic=abort`) ```bash $ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && ../build/cargo.sh build --example breakout" Benchmark #1: touch examples/game/breakout.rs && ../build/cargo.sh build --example breakout Time (mean ± σ): 1.961 s ± 0.054 s [User: 1.421 s, System: 0.551 s] Range (min … max): 1.892 s … 2.078 s 10 runs ``` ```bash $ for i in $(seq 0 10); do touch examples/game/breakout.rs && ../build/cargo.sh rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done time: 1.016; rss: 206MB link_binary time: 1.077; rss: 204MB link_binary time: 1.017; rss: 205MB link_binary time: 1.029; rss: 205MB link_binary time: 1.020; rss: 205MB link_binary time: 1.117; rss: 205MB link_binary time: 1.485; rss: 205MB link_binary time: 1.241; rss: 205MB link_binary time: 1.029; rss: 205MB link_binary time: 1.097; rss: 206MB link_binary time: 1.120; rss: 204MB link_binary ``` ## cg_clif with `-Zfunction-sections=yes` (with implicit `-Zrelro-level=off` and `-Cpanic=abort`) ```bash $ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zfunction-sections=yes\" ../build/cargo.sh build --example breakout" Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=yes" ../build/cargo.sh build --example breakout Time (mean ± σ): 6.515 s ± 0.118 s [User: 5.174 s, System: 1.310 s] Range (min … max): 6.398 s … 6.754 s 10 runs ``` ```bash $ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=yes" ../build/cargo.sh rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done time: 5.610; rss: 245MB link_binary time: 5.625; rss: 208MB link_binary time: 5.611; rss: 207MB link_binary time: 5.603; rss: 210MB link_binary time: 5.633; rss: 208MB link_binary time: 5.599; rss: 209MB link_binary time: 5.603; rss: 208MB link_binary time: 5.591; rss: 207MB link_binary time: 5.578; rss: 209MB link_binary time: 5.588; rss: 207MB link_binary time: 5.607; rss: 208MB link_binary ``` ## cg_clif with dylib and `-Zfunction-sections=yes` (with implicit `-Zrelro-level=off` and `-Cpanic=abort`) ```bash $ hyperfine --warmup 1 --runs 10 "touch examples/game/breakout.rs && RUSTFLAGS=\"-Zfunction-sections=yes\" ../build/cargo.sh build --example breakout" Benchmark #1: touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=yes" ../build/cargo.sh build --example breakout Time (mean ± σ): 1.969 s ± 0.062 s [User: 1.422 s, System: 0.557 s] Range (min … max): 1.907 s … 2.055 s 10 runs ``` ```bash $ for i in $(seq 0 10); do touch examples/game/breakout.rs && RUSTFLAGS="-Zfunction-sections=yes" ../build/cargo.sh rustc --example breakout -- -Ztime-passes |& rg "link_binary$"; done time: 1.026; rss: 205MB link_binary time: 1.025; rss: 205MB link_binary time: 1.021; rss: 205MB link_binary time: 1.033; rss: 207MB link_binary time: 1.047; rss: 204MB link_binary time: 1.106; rss: 207MB link_binary time: 1.074; rss: 206MB link_binary time: 1.022; rss: 205MB link_binary time: 1.051; rss: 205MB link_binary time: 1.039; rss: 206MB link_binary time: 1.047; rss: 205MB link_binary ``` ## Summary for no-op changes |name|time|link time| |---|---|---| |cargo overhead|0.2s|N/A| |baseline|2.1s|1.1s| |`-Cprefer-dynamic`|2.1s|1.1s| |**dylib**|**0.9s**|**0.1s**| |`-Zfunction-sections=no`|6.6s|6s| |`-Zfunction-sections=no` + dylib|1.6s|1.1s| |`-Zrelro-level=off`|9.4s|8.6s| |`-Zrelro-level=off` + dylib|1.6s|1.1s| |`panic=abort`|2.2s|1.1s| |`panic=abort` + dylib|error|N/A| |cg_clif|4.4s|3.5s| |cg_clif + dylib|2.0s|1.1s| |cg_clif + `-Zfunction-sections=yes`|6.5s|5.6s| |cg_clif + dylib + `-Zfunction-sections=yes`|2.0s|1.0s| The only thing with a clear improvement on compilation time is using a dylib for the main Bevy crate. The rest either has neutral or negative effect. # Adding a println!() to main. This will cause at least one codegen unit to be recompiled. I only ran a subset of the previous benchmarks as I think the rest won't cause an improvement in this case either. I also won't list linker times as those should be comparable to the previous benchmark. ## Baseline ```bash $ diff -u examples/game/breakout1.rs examples/game/breakout2.rs --- examples/game/breakout1.rs 2020-11-04 13:09:35.528245302 +0100 +++ examples/game/breakout2.rs 2020-11-04 13:08:50.008110939 +0100 @@ -6,7 +6,6 @@ /// An implementation of the classic game "Breakout" fn main() { - println!(); App::build() .add_plugins(DefaultPlugins) .add_resource(Scoreboard { score: 0 }) $ hyperfine --warmup 1 --runs 10 "cp examples/game/breakout1.rs examples/game/breakout.rs; cargo build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; cargo build --example breakout" Benchmark #1: cp examples/game/breakout1.rs examples/game/breakout.rs; cargo build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; cargo build --example breakout Time (mean ± σ): 6.744 s ± 0.314 s [User: 9.153 s, System: 1.244 s] Range (min … max): 6.211 s … 7.280 s 10 runs ``` ## Bevy dylib ```bash $ hyperfine --warmup 1 --runs 10 "cp examples/game/breakout1.rs examples/game/breakout.rs; cargo build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; cargo build --example breakout" Benchmark #1: cp examples/game/breakout1.rs examples/game/breakout.rs; cargo build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; cargo build --example breakout Time (mean ± σ): 3.796 s ± 0.075 s [User: 5.875 s, System: 0.467 s] Range (min … max): 3.662 s … 3.872 s 10 runs ``` ## cg_clif ```bash $ hyperfine --warmup 1 --runs 10 "cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout" Benchmark #1: cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout Time (mean ± σ): 10.126 s ± 0.293 s [User: 7.832 s, System: 2.168 s] Range (min … max): 9.852 s … 10.798 s 10 runs ``` ## cg_clif + dylib ```bash $ hyperfine --warmup 1 --runs 10 "cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout" Benchmark #1: cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh build --example breakout Time (mean ± σ): 5.127 s ± 0.145 s [User: 4.071 s, System: 1.077 s] Range (min … max): 4.959 s … 5.360 s 10 runs ``` ## cg_clif + dylib + jit Patch to prevent jit mode from running the code: ```diff $ cd rustc_codegen_cranelift && git diff diff --git a/src/driver/jit.rs b/src/driver/jit.rs index 3f47df7..1c8712d 100644 --- a/src/driver/jit.rs +++ b/src/driver/jit.rs @@ -103,7 +103,7 @@ pub(super) fn run_jit(tcx: TyCtxt<'_>) -> ! { // useful as some dynamic linkers use it as a marker to jump over. argv.push(std::ptr::null()); - let ret = f(args.len() as c_int, argv.as_ptr()); + let ret = 0; std::process::exit(ret); } ``` Patch for bevy to run in jit mode, as rustc spawns a new thread to run in: ```diff $ git diff crates/bevy_winit diff --git a/crates/bevy_winit/src/lib.rs b/crates/bevy_winit/src/lib.rs index a8c20cdb..579183bb 100644 --- a/crates/bevy_winit/src/lib.rs +++ b/crates/bevy_winit/src/lib.rs @@ -142,7 +142,7 @@ where } pub fn winit_runner(mut app: App) { - let mut event_loop = EventLoop::new(); + let mut event_loop: EventLoop<_> = winit::platform::unix::EventLoopExtUnix::new_any_thread(); let mut create_window_event_reader = EventReader::<CreateWindow>::default(); let mut app_exit_event_reader = EventReader::<AppExit>::default(); ``` ``` $ hyperfine --warmup 1 --runs 10 "cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh jit --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh jit --example breakout" Benchmark #1: cp examples/game/breakout1.rs examples/game/breakout.rs; ../build/cargo.sh jit --example breakout; cp examples/game/breakout2.rs examples/game/breakout.rs; ../build/cargo.sh jit --example breakout Time (mean ± σ): 4.579 s ± 0.006 s [User: 3.576 s, System: 1.033 s] Range (min … max): 4.569 s … 4.588 s 10 runs ``` This is still a bit slower than cg_llvm + dylib, but that is expected as it doesn't cache any codegen unit in the incremental cache. # Conclusion The only way I could find to improve compilation times of Bevy was to make the `bevy` crate a dylib. If you still want to retain the ability to statically link to `bevy`, you could change set `crate-type = ["rlib", "dylib"]` and use `-Cprefer-dynamic` in the fast compilation profile. Edit(2020-11-10): [bevy#808](https://github.com/bevyengine/bevy/pull/808) has been merged which makes it possible to optionally compile bevy as dylib # Extra: Cargo performance > New 2020-11-05 ## Breakout ```bash $ cargo build --example breakout $ touch examples/game/breakout.rs $ perf record --call-graph dwarf,50000 sh -c "cargo build --example breakout" ``` https://www.speedscope.app/#profileURL=https://gist.githubusercontent.com/bjorn3/562a8772b08ce58d0a7746a0f7c12b72/raw/e5646219a679b8fc3082c12578814e4b583ea3dd/bevy_breakout_fresh.collapsed ![full profile of the build](https://i.imgur.com/goaUPfV.png) ![profile zoomed into cargo](https://i.imgur.com/gu3o2cF.png) |program|cpu usage| |---|---| |rustc|50%| |cargo|28%| |ld.lld|19%| |clang|1.8%| |sh|0.7%| ## Old vs new resolver ```bash $ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && cargo build --example breakout" "touch examples/game/breakout.rs && cargo build -Zfeatures=all -Zpackage-features --example breakout" Benchmark #1: touch examples/game/breakout.rs && cargo build --example breakout Time (mean ± σ): 678.8 ms ± 21.5 ms [User: 573.7 ms, System: 160.8 ms] Range (min … max): 668.1 ms … 812.3 ms 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark #2: touch examples/game/breakout.rs && cargo build -Zfeatures=all -Zpackage-features --example breakout Time (mean ± σ): 690.2 ms ± 20.4 ms [User: 582.6 ms, System: 164.8 ms] Range (min … max): 683.3 ms … 891.4 ms 100 runs Warning: The first benchmarking run for this command was significantly slower than the rest (891.4 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run. Summary 'touch examples/game/breakout.rs && cargo build --example breakout' ran 1.02 ± 0.04 times faster than 'touch examples/game/breakout.rs && cargo build -Zfeatures=all -Zpackage-features --example breakout' ``` The new resolver is a bit slower. ## Avoid constructing `anyhow::Error` when not necessary (merged) This avoids an unnecessary backtrace capture for each thread: https://github.com/rust-lang/cargo/pull/8844 ## Avoiding a new thread for each fresh job Saves about 2% on my system: https://github.com/rust-lang/cargo/pull/8837 ## Conclusion The Cargo overhead is higher than I hoped. I opened [rust-lang/cargo#8833](https://github.com/rust-lang/cargo/issues/8833) for this.