
Performance notes on PR #76896

This PR implements part of #54089. In debug builds, it stops duplicating #[inline] functions into every CGU that uses them. Builds that request any level of optimization keep the current behavior.

I would like to go ahead and merge despite the regressions, because:

  1. There are improvements on real-world crates: -35% on regex-debug, -12% on encoding-debug, and -4.5% on ripgrep-debug.

  2. The strategy above is logically better than the current state because we hand less code to LLVM overall. The regressions seen below stem from other, pre-existing issues with CGU partitioning that this change does not cause.
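To make the "less code to LLVM" argument concrete, here is a minimal sketch of a toy model. The `Cgu` struct, the function ids, and the sizes are all hypothetical and are not rustc data structures; the sketch only compares how much inline-function code is emitted when each CGU gets its own copy versus when each function is emitted once.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for a codegen unit (hypothetical, not a rustc type).
/// `inline_callees` holds ids of the `#[inline]` functions the CGU uses;
/// each id is an index into a separate `sizes` slice.
struct Cgu {
    inline_callees: Vec<usize>,
}

/// Old debug-build behavior in this toy model: every CGU carries its own
/// copy of each inline function it uses, so sizes are summed per CGU.
fn size_with_duplication(cgus: &[Cgu], sizes: &[usize]) -> usize {
    cgus.iter()
        .map(|cgu| {
            // A function called several times in one CGU is still copied once.
            let unique: BTreeSet<usize> = cgu.inline_callees.iter().copied().collect();
            unique.iter().map(|&f| sizes[f]).sum::<usize>()
        })
        .sum()
}

/// New debug-build behavior in this toy model: each inline function that is
/// used anywhere is emitted exactly once.
fn size_without_duplication(cgus: &[Cgu], sizes: &[usize]) -> usize {
    let used: BTreeSet<usize> = cgus
        .iter()
        .flat_map(|cgu| cgu.inline_callees.iter().copied())
        .collect();
    used.iter().map(|&f| sizes[f]).sum()
}
```

With three hypothetical functions of sizes 10, 20, and 5 shared across three CGUs, the duplicated total grows with the number of users while the single-copy total stays at the sum of the distinct functions.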

Results Triage

Test case | Regressed due to CGU issue 1 | Regressed due to CGU issue 2 | Is very small benchmark
clap-rs-debug incr-patched: println :heavy_check_mark:
issue-46449-debug (all) :heavy_check_mark: :heavy_check_mark:
webrender-wrench-debug (all) :heavy_check_mark:
tokio-webpush-simple-debug :heavy_check_mark: :heavy_check_mark:
regression-31157-debug full & incr-full :heavy_check_mark: :heavy_check_mark:
deeply-nested-debug incr-full :heavy_check_mark: :heavy_check_mark:
cargo-debug (all) :heavy_check_mark:
hyper-2-debug full :heavy_check_mark:
hyper-2-debug incr-full :heavy_check_mark:
futures-debug full :heavy_check_mark: :heavy_check_mark:
futures-debug incr-full :heavy_check_mark: :heavy_check_mark:
webrender-debug full & incr-full :heavy_check_mark:

CGU Issue 1

This is the known issue where small changes to MIR size, or to anything else that could affect the size estimate for a CGU (such as including more functions or removing previously included ones), can cause CGUs to be partitioned differently. A different partitioning can have major implications for LLVM performance.
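The sensitivity described above can be illustrated with a toy partitioner. This is a sketch under stated assumptions, not rustc's actual partitioning code: it assumes a simple "merge the two smallest units until the target count is reached" rule, and the unit names and sizes are made up.

```rust
/// Toy partitioner (hypothetical): repeatedly merges the smallest unit into
/// the second-smallest until at most `max_units` remain. Each unit is a list
/// of item names plus a size estimate.
fn merge_smallest(
    mut units: Vec<(Vec<&'static str>, usize)>,
    max_units: usize,
) -> Vec<Vec<&'static str>> {
    while units.len() > max_units && units.len() >= 2 {
        // Sort largest first so the smallest unit sits at the end.
        units.sort_by(|a, b| b.1.cmp(&a.1));
        let (names, size) = units.pop().unwrap();
        let target = units.last_mut().unwrap();
        target.0.extend(names);
        target.1 += size;
    }
    // Normalize ordering so two results can be compared directly.
    let mut groups: Vec<Vec<&'static str>> = units
        .into_iter()
        .map(|(mut names, _)| {
            names.sort();
            names
        })
        .collect();
    groups.sort();
    groups
}
```

In this model, starting from units `a` (20), `b` (18), `c` (16), and `d` (5) with a target of two units yields the groups `{a, b}` and `{c, d}`. Shrinking `d` to 3 yields `{a}` and `{b, c, d}` instead, so a two-point change in one size estimate moves `b` into a different unit.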

You can tell this is happening when the codegen_module and related queries are run more frequently in the detailed self-profile data.
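One way to spot this mechanically is to diff per-query invocation counts between two profiling runs. The sketch below assumes the counts have already been extracted into maps keyed by query name; that input format and the function name `increased_queries` are assumptions for illustration, not part of any self-profile tooling.

```rust
use std::collections::BTreeMap;

/// Returns `(query, old_count, new_count)` for every query that was invoked
/// more often in `after` than in `before`. A query missing from `before`
/// is treated as having been invoked zero times.
fn increased_queries(
    before: &BTreeMap<&str, u64>,
    after: &BTreeMap<&str, u64>,
) -> Vec<(String, u64, u64)> {
    after
        .iter()
        .filter_map(|(name, &new)| {
            let old = before.get(*name).copied().unwrap_or(0);
            (new > old).then(|| (name.to_string(), old, new))
        })
        .collect()
}
```

If `codegen_module` shows up in the result for a benchmark, that is consistent with the CGUs having been partitioned differently, though the counts alone do not prove the cause.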

CGU Issue 2

What seems to be happening here is that the same number of CGUs is compiled, but the CGUs contain different sets of functions, which leads LLVM to do more work in some cases.

I don't believe this is a serious issue, since innocuous changes like reorganizing modules can cause exactly the same effect.

Other:

  • "ripgrep-debug incr-patched: println" and "ripgrep-debug incr-patched: unchanged"
    • Self-profile shows codegen_copy_artifacts_from_incr_cache called two more times than previously (~1% more, which matches the ~1% regression).
    • Regression is ~1% instruction count, 0.037 seconds difference.