# Analysis of `rustc-benchmarking-data`
lqd gathered a lot of data in the [`rustc-benchmarking-data`](https://github.com/lqd/rustc-benchmarking-data) repository. This document is nnethercote's analysis of it (with a few additional comments from others). It is long, detailed, and quite dry. It is aimed at Rust compiler developers, and not intended for a general audience. It is also not the highest quality prose, in part because it is likely to become out of date in the not too distant future as performance work addresses things this measurement and analysis has identified.
See the [roadmap](https://hackmd.io/YJQSj_nLSZWl2sbI84R1qA) for a higher-level view of rustc perf work for 2022.
As well as an analysis, it will serve as a means of tracking who is doing/has done what work. Task assignments are shown in square brackets, e.g. “[name]”.
## round-1-cachegrind-check
**Executive summary**
- [x] `parse_tt` and other functions related to macro parsing are the hottest, and correlate highly with allocations. \[nnethercote, this [blog post](https://nnethercote.github.io/2022/04/12/how-to-speed-up-the-rust-compiler-in-april-2022.html) has details\]
- [x] `memcpy` is high in functions using BitSets a lot for dataflow analysis, e.g. in `http-0.2.6`. \[nnethercote, [#93984](https://github.com/rust-lang/rust/pull/93984)\]
- [x] Metadata decoding/file reading is roughly constant for all crates; this can be a moderately high proportion of time for tiny crates. Likewise the LLVM `SetImpliedBits` function getting target feature information.
- [x] There is a long tail of moderate opportunities for wins on a few crates. It is worth looking at each of them briefly, as there are probably a few easy wins.
### Hot functions in a single crate
- [x] `deunicode-1.3.1` dominated by `core::ascii::escape_default` \[martingms, [#94776](https://github.com/rust-lang/rust/pull/94776)\]
- [x] `tinyvec-1.5.1` dominated by `<rustc_mir_build::build::Builder>::diverge_cleanup` \[nnethercote: not worth fixing within rustc, but there are several possible fixes within `tinyvec` itself. See [#161](https://github.com/Lokathor/tinyvec/issues/161) for details.\]
- [x] `unicode-normalization-0.1.19` dominated by `try_eval_bits` \[nnethercote, [#97936](https://github.com/rust-lang/rust/pull/97936)\]
### Widely used functions
This shows all functions across all benchmarks, weighted by their `Ir` percentage. This demonstrates breadth of usage.
I've excluded malloc, memcpy, and dlopen/elf stuff, which made up lots of slots.
This table is hard to read, but metadata decoding dominates because of its effect on small crates. The next section ("Hot functions in multiple crates") breaks hot functions down more and is probably more useful.
```
33116.2 counts (weighted fractional, erased)
( 8) 913.6 ( 2.8%, 45.9%): compiler/rustc_serialize/src/opaque.rs:<rustc_span::SourceFile as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode
( 9) 802.4 ( 2.4%, 48.4%): library/alloc/src/vec/mod.rs:<rustc_span::SourceFile as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode
( 10) 757.8 ( 2.3%, 50.7%): compiler/rustc_span/src/lib.rs:<rustc_span::SourceFile as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode
( 13) 466.1 ( 1.4%, 56.3%): ???:SetImpliedBits(llvm::FeatureBitset&, llvm::FeatureBitset const&, llvm::ArrayRef<llvm::SubtargetFeatureKV>)
( 14) 406.1 ( 1.2%, 57.5%): hashbrown-0.12.0/src/raw/mod.rs:<hashbrown::map::RawEntryBuilderMut<rustc_middle::ty::context::Interned<rustc_middle::ty::TyS>, (), core::hash::BuildHasherDefault<rustc_hash::FxHasher>>>::from_hash::<hashbrown::map::equivalent<rustc_middle::ty::sty::TyKind, rustc_middle::ty::context::Interned<rustc_middle::ty::TyS>>::{closure#0}>
( 15) 367.0 ( 1.1%, 58.6%): compiler/rustc_serialize/src/leb128.rs:<rustc_metadata::rmeta::decoder::DecodeContext as rustc_serialize::serialize::Decoder>::read_u32
( 17) 342.1 ( 1.0%, 60.7%): library/core/src/slice/iter/macros.rs:<core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, <rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}> as core::iter::traits::iterator::Iterator>::fold::<(),core::iter::traits::iterator::Iterator::for_each::call<rustc_metadata::rmeta::decoder::ImportedSourceFile, <alloc::vec::Vec<rustc_metadata::rmeta::decoder::ImportedSourceFile> as alloc::vec::spec_extend::SpecExtend<rustc_metadata::rmeta::decoder::ImportedSourceFile, core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, <rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}>>>::spec_extend::{closure#0}>::{closure#0}>
( 18) 340.9 ( 1.0%, 61.7%): library/core/src/slice/iter/macros.rs:<rustc_span::source_map::SourceMap>::new_imported_source_file
( 19) 340.0 ( 1.0%, 62.8%): compiler/rustc_span/src/lib.rs:<rustc_span::source_map::SourceMap>::new_imported_source_file
( 21) 243.4 ( 0.7%, 64.4%): compiler/rustc_metadata/src/rmeta/decoder.rs:<core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, <rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}> as core::iter::traits::iterator::Iterator>::fold::<(), core::iter::traits::iterator::Iterator::for_each::call<rustc_metadata::rmeta::decoder::ImportedSourceFile, <alloc::vec::Vec<rustc_metadata::rmeta::decoder::ImportedSourceFile> as alloc::vec::spec_extend::SpecExtend<rustc_metadata::rmeta::decoder::ImportedSourceFile, core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, <rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}>>>::spec_extend::{closure#0}>::{closure#0}>
( 22) 239.5 ( 0.7%, 65.2%): library/core/src/num/uint_macros.rs:<rustc_data_structures::sip128::SipHasher128>::short_write_process_buffer::<u64>
( 24) 220.0 ( 0.7%, 66.5%): compiler/rustc_span/src/lib.rs:<core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, <rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}> as core::iter::traits::iterator::Iterator>::fold::<(), core::iter::traits::iterator::Iterator::for_each::call<rustc_metadata::rmeta::decoder::ImportedSourceFile, <alloc::vec::Vec<rustc_metadata::rmeta::decoder::ImportedSourceFile> as alloc::vec::spec_extend::SpecExtend<rustc_metadata::rmeta::decoder::ImportedSourceFile, core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, <rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}>>>::spec_extend::{closure#0}>::{closure#0}>
( 25) 214.2 ( 0.6%, 67.1%): compiler/rustc_serialize/src/leb128.rs:<rustc_serialize::opaque::Decoder as rustc_serialize::serialize::Decoder>::read_usize
( 27) 185.4 ( 0.6%, 68.3%): hashbrown-0.12.0/src/map.rs:<hashbrown::map::RawEntryBuilderMut<rustc_middle::ty::context::Interned<rustc_middle::ty::TyS>, (), core::hash::BuildHasherDefault<rustc_hash::FxHasher>>>::from_hash::<hashbrown::map::equivalent<rustc_middle::ty::sty::TyKind, rustc_middle::ty::context::Interned<rustc_middle::ty::TyS>>::{closure#0}>
( 28) 183.1 ( 0.6%, 68.9%): library/std/src/sys/unix/alloc.rs:__rdl_alloc
( 29) 182.1 ( 0.5%, 69.4%): compiler/rustc_middle/src/ty/context.rs:<rustc_middle::ty::context::CtxtInterners>::intern_ty
( 30) 181.7 ( 0.5%, 70.0%): compiler/rustc_middle/src/ty/sty.rs:<rustc_middle::ty::sty::TyKind as core::hash::Hash>::hash::<rustc_hash::FxHasher>
```
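To make the column layout of the table above concrete, here is a minimal sketch of how such a ranking could be computed. It assumes each function's weight is simply the sum of its `Ir` percentages over all benchmarks; the exact weighting used by the real tooling is not spelled out here, and the names `Row` and `rank_by_weight` as well as the sample numbers are hypothetical.

```rust
use std::collections::HashMap;

/// One row of the ranked table: summed weight, share of the grand total,
/// and cumulative share up to and including this row.
#[derive(Debug, PartialEq)]
struct Row {
    name: String,
    weight: f64,
    pct: f64,
    cumulative_pct: f64,
}

/// `samples` holds (function, Ir percentage in one benchmark) pairs.
/// Assumption: a function's weight is the plain sum of its percentages.
fn rank_by_weight(samples: &[(&str, f64)]) -> Vec<Row> {
    let mut totals: HashMap<&str, f64> = HashMap::new();
    for &(name, pct) in samples {
        *totals.entry(name).or_insert(0.0) += pct;
    }
    let grand_total: f64 = totals.values().sum();

    let mut sorted: Vec<(&str, f64)> = totals.into_iter().collect();
    // Heaviest first; ties are broken by name so the order is deterministic.
    sorted.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(b.0)));

    let mut running = 0.0;
    sorted
        .into_iter()
        .map(|(name, weight)| {
            running += weight;
            Row {
                name: name.to_string(),
                weight,
                pct: 100.0 * weight / grand_total,
                cumulative_pct: 100.0 * running / grand_total,
            }
        })
        .collect()
}

fn main() {
    // Illustrative numbers only, not taken from the real data set.
    let samples = [
        ("decode", 5.0),
        ("parse_tt", 6.0),
        ("decode", 5.0),
        ("intern_ty", 4.0),
    ];
    for (i, row) in rank_by_weight(&samples).iter().enumerate() {
        println!(
            "({:3}) {:8.1} ({:4.1}%, {:5.1}%): {}",
            i + 1,
            row.weight,
            row.pct,
            row.cumulative_pct,
            row.name
        );
    }
}
```

A function that is only moderately hot but appears in every benchmark can outrank one that dominates a single crate, which is why metadata decoding floats to the top of this kind of table.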
### Hot functions in multiple crates
This section lists all the functions that hit 1.5% or higher in one benchmark and appear in more than one benchmark. It's a long list. Related functions (i.e. functions that are hot in tandem) are grouped together.
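The selection rule can be sketched as a small filter. This is only an illustration of the criterion as stated (peak of at least 1.5% in some benchmark, and present in more than one benchmark), not the script that produced the lists; the record layout and the name `hot_in_multiple` are assumptions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// `records` holds (function, benchmark, Ir percentage) triples.
/// Returns the functions whose peak percentage reaches `threshold` and
/// which show up in more than one distinct benchmark.
fn hot_in_multiple<'a>(records: &[(&'a str, &'a str, f64)], threshold: f64) -> Vec<&'a str> {
    // Per function: (peak percentage seen, set of benchmarks it appears in).
    let mut stats: BTreeMap<&'a str, (f64, BTreeSet<&'a str>)> = BTreeMap::new();
    for &(func, bench, pct) in records {
        let entry = stats.entry(func).or_insert((0.0, BTreeSet::new()));
        entry.0 = entry.0.max(pct);
        entry.1.insert(bench);
    }
    stats
        .into_iter()
        .filter(|(_, (peak, benches))| *peak >= threshold && benches.len() > 1)
        .map(|(func, _)| func)
        .collect()
}

fn main() {
    let records = [
        // These two percentages appear in the parse_tt list in this section.
        ("parse_tt", "async-std-1.10.0", 5.99),
        ("parse_tt", "yansi-0.5.0", 4.28),
        // Illustrative percentage: hot, but only in a single crate.
        ("escape_default", "deunicode-1.3.1", 30.0),
        // Hypothetical function that never reaches the threshold.
        ("cold_fn", "crate-a", 0.3),
        ("cold_fn", "crate-b", 0.2),
    ];
    println!("{:?}", hot_in_multiple(&records, 1.5));
}
```

Functions that are hot in exactly one crate fail the second condition, which is why they are covered separately under "Hot functions in a single crate".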
----
\[nnethercote, mostly related to macro parsing, greatly improved, see [here](https://nnethercote.github.io/2022/04/12/how-to-speed-up-the-rust-compiler-in-april-2022.html) for details]
```
_int_free/_int_malloc/malloc/free/malloc_consolidate, etc., as represented by
_int_free
315: 6.67% async-std-1.10.0
375: 5.85% yansi-0.5.0
401: 5.60% time-macros-0.2.3
582: 4.50% inotify-0.10.0
591: 4.47% web-sys-0.3.56
667: 4.19% nix-0.23.1
685: 4.13% vsdb-0.13.10
687: 4.11% cloudabi-0.1.0
692: 4.10% vsdb_derive-0.2.2
706: 4.06% pest_generator-2.1.3
726: 3.99% futures-lite-1.12.0
736: 3.95% scroll_derive-0.11.0
739: 3.92% num-derive-0.3.3
744: 3.91% raw-cpuid-10.2.0
751: 3.89% clap_derive-3.0.12
755: 3.89% prost-derive-0.9.0
760: 3.88% tonic-build-0.6.2
763: 3.86% pyo3-macros-backend-0.15.1
764: 3.86% diesel_derives-1.4.1
765: 3.85% wasm-bindgen-backend-0.2.79
```
This table undersells the cost of allocations a lot, because it's only showing `_int_free` results. But it also oversells a little in a different way, because jemalloc is more efficient than glibc malloc (which is measured here). We can probably assume allocations in general account for double the percentage in this table. See the DHAT results for more data.
Note that these crates correlate highly with the crates where `parse_tt` and related functions are hot.
----
\[nnethercote, greatly improved, see [here](https://nnethercote.github.io/2022/04/12/how-to-speed-up-the-rust-compiler-in-april-2022.html) for details]
These are all the macro parsing functions.
```
macro_parser::parse_tt
359: 5.99% async-std-1.10.0
462: 5.08% async-std-1.10.0
550: 4.63% time-macros-0.2.3
642: 4.28% yansi-0.5.0
879: 3.65% time-macros-0.2.3
1115: 3.32% yansi-0.5.0
2884: 2.29% num-derive-0.3.3
2939: 2.25% pest_generator-2.1.3
3095: 2.12% ctor-0.1.21
3103: 2.11% scroll_derive-0.11.0
3112: 2.10% tonic-build-0.6.2
3167: 2.06% vsdb_derive-0.2.2
3265: 2.00% stdweb-derive-0.5.3
3500: 1.91% mockall_derive-0.11.0
3512: 1.91% wasm-bindgen-backend-0.2.79
3581: 1.89% futures-macro-0.3.19
3620: 1.88% wayland-scanner-0.30.0-alpha3
3623: 1.88% clap_derive-3.0.12
3806: 1.82% prost-derive-0.9.0
3880: 1.80% diesel_derives-1.4.1
```
```
<rustc_parse::parser::Parser>::{bump,bump_with}
603: 4.43% bump time-macros-0.2.3
791: 3.80% bump async-std-1.10.0
2784: 2.36% bump yansi-0.5.0
3240: 2.01% bump_with time-macros-0.2.3
3673: 1.87% bump web-sys-0.3.56
4078: 1.73% bump_with async-std-1.10.0
4228: 1.67% bump num-derive-0.3.3
4549: 1.57% bump futures-macro-0.3.19
4566: 1.56% bump ctor-0.1.21
5171: 1.39% bump mockall_derive-0.11.0
5288: 1.37% bump pest_generator-2.1.3
5343: 1.35% bump wasm-bindgen-backend-0.2.79
5359: 1.35% bump tonic-build-0.6.2
5453: 1.32% bump vsdb_derive-0.2.2
5634: 1.27% bump wayland-scanner-0.30.0-alpha3
5647: 1.27% bump stdweb-derive-0.5.3
5845: 1.22% bump scroll_derive-0.11.0
5872: 1.22% bump enum-as-inner-0.3.3
5918: 1.20% bump clap_derive-3.0.12
5934: 1.20% bump ref-cast-impl-1.0.6
```
```
<rustc_parse::parser::TokenCursor>::{next,next_desugared}
633: 4.30% next time-macros-0.2.3
844: 3.70% next async-std-1.10.0
2770: 2.37% next yansi-0.5.0
3312: 1.98% next web-sys-0.3.56
3504: 1.91% next_desugared time-macros-0.2.3
4214: 1.68% next num-derive-0.3.3
4490: 1.59% next futures-macro-0.3.19
4569: 1.56% next ctor-0.1.21
5071: 1.42% next mockall_derive-0.11.0
5240: 1.38% next pest_generator-2.1.3
5274: 1.37% next wasm-bindgen-backend-0.2.79
5328: 1.35% next tonic-build-0.6.2
5396: 1.33% next vsdb_derive-0.2.2
5576: 1.28% next wayland-scanner-0.30.0-alpha3
5651: 1.27% next stdweb-derive-0.5.3
5714: 1.25% next enum-as-inner-0.3.3
5844: 1.22% next scroll_derive-0.11.0
5874: 1.21% next clap_derive-3.0.12
5888: 1.21% next_desugared async-std-1.10.0
5999: 1.18% next ref-cast-impl-1.0.6
```
```
<rustc_ast::tokenstream::Cursor>::next_with_spacing
1580: 3.03% time-macros-0.2.3
2386: 2.61% async-std-1.10.0
4437: 1.61% yansi-0.5.0
5322: 1.36% web-sys-0.3.56
6027: 1.17% num-derive-0.3.3
6302: 1.10% futures-macro-0.3.19
6351: 1.09% ctor-0.1.21
6867: 0.97% mockall_derive-0.11.0
6870: 0.97% pest_generator-2.1.3
6939: 0.96% tonic-build-0.6.2
7000: 0.95% wasm-bindgen-backend-0.2.79
7118: 0.93% vsdb_derive-0.2.2
7406: 0.89% stdweb-derive-0.5.3
7409: 0.89% wayland-scanner-0.30.0-alpha3
7619: 0.86% scroll_derive-0.11.0
7735: 0.85% ref-cast-impl-1.0.6
7758: 0.85% enum-as-inner-0.3.3
7771: 0.85% clap_derive-3.0.12
7917: 0.82% pyo3-macros-backend-0.15.1
8071: 0.80% prost-derive-0.9.0
```
```
<rustc_expand::mbe::macro_parser::MatcherPos as core::clone::Clone>::clone
<rustc_expand::mbe::macro_parser::MatcherPosHandle as core::clone::Clone>::clone
4659: 1.54% MatcherPos async-std-1.10.0
6090: 1.15% MatcherPos time-macros-0.2.3
6831: 0.98% MatcherPos yansi-0.5.0
14457: 0.39% MatcherPos inotify-0.10.0
18643: 0.31% MatcherPosHandle async-std-1.10.0
23304: 0.26% MatcherPos funty-2.0.0
24134: 0.25% MatcherPosHandle time-macros-0.2.3
26172: 0.24% MatcherPos rustfix-0.6.0
29551: 0.22% MatcherPos async-std-1.10.0
33695: 0.19% MatcherPosHandle yansi-0.5.0
```
----
\[nnethercote, [#93984](https://github.com/rust-lang/rust/pull/93984), completed, addresses the biggest of these: keccak, http, vte\]
```
memcpy
(also: 9.19% for keccak in rustc-perf)
274: 7.35% http-0.2.6
2466: 2.56% vte-0.10.1
2721: 2.40% js-sys-0.3.56
2801: 2.35% unic-ucd-segment-0.9.0
2940: 2.25% aes-gcm-0.9.4
2953: 2.24% pbkdf2-0.10.0
2976: 2.21% stdweb-derive-0.5.3
2999: 2.20% c2-chacha-0.3.3
3047: 2.15% pest_generator-2.1.3
3072: 2.13% rls-data-0.19.1
3073: 2.13% tonic-build-0.6.2
3076: 2.13% num-derive-0.3.3
3101: 2.11% ctor-0.1.21
3115: 2.10% sentry-types-0.24.2
3116: 2.10% pest-2.1.3
3117: 2.10% lsp-types-0.91.1
3122: 2.10% mockall_derive-0.11.0
3124: 2.09% wasm-bindgen-backend-0.2.79
3128: 2.09% postgres-protocol-0.6.3
3136: 2.09% cargo_metadata-0.14.1
```
The high numbers for keccak and http-0.2.6 are due to large bitsets in borrowck dataflow analysis. Note that keccak-0.1.0 has some significant changes vs. keccak in rustc-benchmarks.
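To illustrate why large bitsets turn into `memcpy` traffic, here is a toy dense bitset. It is not rustc's `BitSet` implementation; `DenseBitSet`, the domain size, and the block count are all hypothetical. The point is only that a dense representation stores one bit per element of the domain, so every clone copies the whole word array regardless of how few bits are set.

```rust
/// Toy dense bitset: one bit per domain element, packed into 64-bit words.
#[derive(Clone, Debug, PartialEq)]
struct DenseBitSet {
    words: Vec<u64>,
}

impl DenseBitSet {
    fn new_empty(domain_size: usize) -> Self {
        DenseBitSet { words: vec![0; (domain_size + 63) / 64] }
    }

    /// Sets `bit`; returns true if it was not already set.
    fn insert(&mut self, bit: usize) -> bool {
        let (word, mask) = (bit / 64, 1u64 << (bit % 64));
        let changed = (self.words[word] & mask) == 0;
        self.words[word] |= mask;
        changed
    }

    fn contains(&self, bit: usize) -> bool {
        (self.words[bit / 64] & (1u64 << (bit % 64))) != 0
    }

    /// In-place union; returns true if `self` changed.
    fn union(&mut self, other: &DenseBitSet) -> bool {
        let mut changed = false;
        for (a, b) in self.words.iter_mut().zip(&other.words) {
            let merged = *a | *b;
            changed |= merged != *a;
            *a = merged;
        }
        changed
    }

    /// Bytes copied by one clone of this set.
    fn bytes(&self) -> usize {
        self.words.len() * std::mem::size_of::<u64>()
    }
}

fn main() {
    // Hypothetical sizes: a large domain and one state per basic block.
    let domain_size = 100_000;
    let blocks = 1_000;

    let mut entry = DenseBitSet::new_empty(domain_size);
    entry.insert(42);

    // Giving every block its own copy of the state clones the full word
    // array each time, even though only a single bit is set.
    let per_block: Vec<DenseBitSet> = (0..blocks).map(|_| entry.clone()).collect();
    let copied: usize = per_block.iter().map(DenseBitSet::bytes).sum();
    println!("{} bytes per clone, {} bytes copied in total", entry.bytes(), copied);
}
```

In this toy model the copying cost scales with domain size times number of clones, independent of how sparse the sets are.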
----
[Hard to improve. On x86-64 we query ~50 target feature flags, for things like SSE*, AVX*, etc. This is within `target_features` in `compiler/rustc_codegen_llvm/src/llvm_util.rs`. We check one flag at a time because the LLVM interface makes it hard to do otherwise, and LLVM is moderately slow to check each one. Even though it's a significant fraction of execution time for small programs, the absolute time is low, so it doesn't seem worth any further effort.]
```
???:SetImpliedBits(llvm::FeatureBitset&, llvm::FeatureBitset const&, llvm::ArrayRef<llvm::SubtargetFeatureKV>)
1425: 3.11% opaque-debug-0.3.0
1436: 3.10% new_debug_unreachable-1.0.4
1443: 3.10% tinyvec_macros-0.1.0
1570: 3.03% matches-0.1.9
1692: 2.97% cfg-if-1.0.0
1699: 2.97% pin-utils-0.1.0
1733: 2.95% match_cfg-0.1.0
1775: 2.93% fuchsia-cprng-0.1.1
1918: 2.85% cty-0.2.2
1923: 2.85% unic-ucd-version-0.9.0
1999: 2.82% if_chain-1.0.2
2114: 2.76% assert_matches-1.5.0
2144: 2.74% more-asserts-0.2.2
2206: 2.71% wincolor-1.0.3
2217: 2.71% winapi-util-0.1.5
2219: 2.71% fsevent-sys-4.1.0
2256: 2.68% miow-0.4.0
2292: 2.66% cpufeatures-0.2.1
2309: 2.65% schannel-0.1.19
2354: 2.63% byte-tools-0.3.1
```
This is significant only for very small crates. It's getting some target feature information from LLVM.
----
\[nnethercote, [#97575](https://github.com/rust-lang/rust/pull/97575) fixes it]
```
<rustc_span::SourceFile as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode
404: 5.57% (2793830 Ir) fsevent-sys-4.1.0
406: 5.56% (2793830 Ir) winapi-util-0.1.5
412: 5.55% (2793830 Ir) wincolor-1.0.3
420: 5.49% (2793830 Ir) miow-0.4.0
426: 5.44% (2793830 Ir) schannel-0.1.19
444: 5.22% (2793830 Ir) output_vt100-0.1.2
446: 5.22% (2793830 Ir) precomputed-hash-0.1.1
448: 5.19% (2793830 Ir) typeable-0.1.2
453: 5.12% (2793830 Ir) encoding_index_tests-0.1.4
464: 5.06% (2929437 Ir) crossbeam-0.8.1
471: 5.03% (2793830 Ir) fsevent-2.1.2
476: 5.01% (2843100 Ir) enum_primitive-0.1.1
481: 4.97% (2793830 Ir) block-cipher-0.99.99
482: 4.97% (2793830 Ir) stream-cipher-0.99.99
484: 4.95% (2793830 Ir) string_cache_shared-0.3.0
489: 4.93% (2475668 Ir) winapi-util-0.1.5
490: 4.93% (2475668 Ir) fsevent-sys-4.1.0
493: 4.92% (2475668 Ir) wincolor-1.0.3
496: 4.90% (2793830 Ir) maplit-1.0.2
498: 4.89% (2793830 Ir) mac-0.1.1
```
```
...::imported_source_files::...
3064: 2.14% (1071721 Ir) fsevent-sys-4.1.0
3075: 2.13% (1071721 Ir) winapi-util-0.1.5
3077: 2.13% (1071721 Ir) wincolor-1.0.3
3098: 2.11% (1071721 Ir) miow-0.4.0
3131: 2.09% (1071721 Ir) schannel-0.1.19
3248: 2.00% (1071721 Ir) output_vt100-0.1.2
3256: 2.00% (1071721 Ir) precomputed-hash-0.1.1
3289: 1.99% (1071721 Ir) typeable-0.1.2
3350: 1.96% (1071721 Ir) encoding_index_tests-0.1.4
3369: 1.95% (1130584 Ir) crossbeam-0.8.1
3430: 1.93% (1093218 Ir) enum_primitive-0.1.1
3446: 1.93% (1071721 Ir) fsevent-2.1.2
3498: 1.91% (1071721 Ir) stream-cipher-0.99.99
3501: 1.91% (1071721 Ir) block-cipher-0.99.99
3547: 1.90% (1071721 Ir) string_cache_shared-0.3.0
3631: 1.88% (1071721 Ir) maplit-1.0.2
3664: 1.87% (1071721 Ir) mac-0.1.1
3746: 1.84% (1071721 Ir) wayland-protocols-0.30.0-alpha3
3834: 1.82% (1059387 Ir) num-0.4.0
3882: 1.80% (1071721 Ir) mio-named-pipes-0.1.7
```
```
<rustc_span::source_map::SourceMap>::new_imported_source_file
3074: 2.13% (1067761 Ir) winapi-util-0.1.5
3078: 2.13% (1067761 Ir) fsevent-sys-4.1.0
3089: 2.12% (1067761 Ir) wincolor-1.0.3
3090: 2.12% (1064836 Ir) fsevent-sys-4.1.0
3094: 2.12% (1064836 Ir) winapi-util-0.1.5
3097: 2.12% (1064836 Ir) wincolor-1.0.3
3114: 2.10% (1067761 Ir) miow-0.4.0
3130: 2.09% (1064836 Ir) miow-0.4.0
3148: 2.08% (1067761 Ir) schannel-0.1.19
3153: 2.07% (1064836 Ir) schannel-0.1.19
3271: 2.00% (1067761 Ir) precomputed-hash-0.1.1
3275: 1.99% (1064836 Ir) precomputed-hash-0.1.1
3288: 1.99% (1064836 Ir) output_vt100-0.1.2
3292: 1.99% (1067761 Ir) output_vt100-0.1.2
3300: 1.98% (1064836 Ir) typeable-0.1.2
3322: 1.98% (1067761 Ir) typeable-0.1.2
3357: 1.96% (1067761 Ir) encoding_index_tests-0.1.4
3389: 1.95% (1064836 Ir) encoding_index_tests-0.1.4
3425: 1.94% (1123069 Ir) crossbeam-0.8.1
3426: 1.94% (1126309 Ir) crossbeam-0.8.1
```
```
<rustc_metadata::rmeta::decoder::DecodeContext as rustc_serialize::serialize::Decoder>::read_u32
4352: 1.63% pretty_env_logger-0.4.0
6813: 0.98% rand_os-0.2.2
6968: 0.96% async-compression-0.3.12
6969: 0.96% thread-id-4.0.0
7084: 0.94% strum-0.23.0
7124: 0.93% tokio-buf-0.2.0-alpha.1
7210: 0.92% matchers-0.1.0
7228: 0.92% void-1.0.2
7344: 0.90% crypto-hash-0.3.4
7352: 0.90% gethostname-0.2.2
7391: 0.90% errno-0.2.8
7521: 0.88% inotify-sys-0.1.5
7579: 0.87% atomic-waker-1.0.0
7606: 0.86% malloc_buf-1.0.0
7670: 0.86% terminal_size-0.1.17
7679: 0.86% remove_dir_all-0.7.0
7759: 0.85% hostname-0.3.1
7865: 0.83% ident_case-1.0.1
7906: 0.82% atty-0.2.14
7944: 0.82% clicolors-control-1.0.1
```
This is metadata decoding. The relative numbers are high for many crates, but mostly for very short-running ones. The amount of decoding is roughly constant, presumably because it covers common libraries like `std` and `core`.
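The effect of a fixed cost on small crates can be made concrete with a little arithmetic. The sketch below takes the `SourceFile` decode cost of 2793830 `Ir` and two of the percentages from the first table above, and derives the total instruction count each percentage implies. The function names and the 5-billion-`Ir` "large crate" are hypothetical, used only for contrast.

```rust
/// Share (in percent) that a fixed instruction cost takes of a total run.
fn fixed_cost_share(fixed_ir: u64, total_ir: u64) -> f64 {
    100.0 * fixed_ir as f64 / total_ir as f64
}

/// Total instruction count implied by a fixed cost and its observed share.
fn implied_total_ir(fixed_ir: u64, share_pct: f64) -> f64 {
    fixed_ir as f64 * 100.0 / share_pct
}

fn main() {
    // Fixed decode cost and percentages as listed in the table above.
    let fixed_ir = 2_793_830;
    for (krate, pct) in [("fsevent-sys-4.1.0", 5.57), ("mac-0.1.1", 4.89)] {
        println!(
            "{krate}: about {:.1}M Ir in total",
            implied_total_ir(fixed_ir, pct) / 1e6
        );
    }

    // Hypothetical large crate: the same fixed cost becomes negligible.
    let large_total_ir = 5_000_000_000;
    println!(
        "hypothetical large crate: {:.3}%",
        fixed_cost_share(fixed_ir, large_total_ir)
    );
}
```

The identical `Ir` count across many rows of the table is what suggests a constant cost. The varying percentages then mostly reflect how small each crate's total is.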
----
\[nnethercote, [#94316](https://github.com/rust-lang/rust/pull/94316)\]
```
rustc_lexer::unescape::scan_escape
1355: 3.15% pkcs8-0.8.0
4650: 1.54% der-0.6.0-pre.0
5519: 1.30% bitvec-1.0.0
6190: 1.12% snafu-0.7.0
7261: 0.91% unicode_categories-0.1.1
7732: 0.85% web-sys-0.3.56
10163: 0.58% elliptic-curve-0.12.0-pre.1
12835: 0.44% rusoto_s3-0.47.0
13723: 0.41% pkcs8-0.8.0
14869: 0.38% bumpalo-3.9.1
```
```
rustc_lexer::unescape::unescape_literal::<<rustc_ast::ast::LitKind>::from_lit_token and friends
2843: 2.32% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> pkcs8-0.8.0
6105: 1.15% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> der-0.6.0-pre.0
7234: 0.91% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> bitvec-1.0.0
7904: 0.82% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> pkcs8-0.8.0
8997: 0.69% <rustc_ast::ast::Lit>::from_lit_token lexical-6.0.1
9719: 0.62% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> snafu-0.7.0
9847: 0.61% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> pkcs8-0.8.0
11220: 0.52% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> pkcs8-0.8.0
11318: 0.51% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> pkcs8-0.8.0
13177: 0.43% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> elliptic-curve-0.12.0-pre.1
14008: 0.40% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> der-0.6.0-pre.0
17420: 0.33% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> rusoto_s3-0.47.0
18179: 0.32% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> bitvec-1.0.0
18485: 0.31% <rustc_ast::ast::Lit>::from_lit_token lexical-core-0.8.2
19087: 0.30% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> der-0.6.0-pre.0
19243: 0.30% <rustc_ast::ast::Lit>::from_lit_token lexical-6.0.1
19244: 0.30% <rustc_ast::ast::Lit>::from_lit_token lexical-6.0.1
23059: 0.26% <rustc_ast::ast::Lit>::from_lit_token web-sys-0.3.56
23736: 0.26% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> der-0.6.0-pre.0
23924: 0.26% <rustc_ast::ast::Lit>::from_lit_token async-compression-0.3.12
```
----
[nnethercote, [#98153](https://github.com/rust-lang/rust/pull/98153)]
```
<rustc_lint::builtin::MissingDoc as rustc_lint::passes::LateLintPass>::enter_lint_attrs
1631: 3.00% structopt-0.3.26
3259: 2.00% structopt-0.3.26
4776: 1.50% mockall-0.11.0
6739: 1.00% mockall-0.11.0
9250: 0.67% derive_builder-0.10.2
10079: 0.59% tracing-0.1.29
12858: 0.44% derive_builder-0.10.2
14079: 0.40% tracing-0.1.29
16334: 0.35% pest_derive-2.1.0
16479: 0.34% bitflags-1.3.2
```
----
[lcnr + nnethercote, [#97345](https://github.com/rust-lang/rust/pull/97345)]
```
super_relate_consts::<rustc_infer::infer::equate::Equate>
super_relate_consts::<rustc_infer::infer::combine::ConstInferUnifier>
1294: 3.19% equate::Equate> bitmaps-3.1.0
6584: 1.03% equate::Equate> hex-0.4.3
6740: 1.00% equate::Equate> bitmaps-3.1.0
8574: 0.74% combine::ConstInferUnifier> nalgebra-0.30.1
11867: 0.49% equate::Equate> secrecy-0.8.0
15796: 0.36% equate::Equate> nalgebra-0.30.1
18242: 0.32% equate::Equate> hex-0.4.3
27985: 0.23% combine::ConstInferUnifier> nalgebra-0.30.1
28678: 0.22% equate::Equate> bytemuck-1.7.3
38119: 0.17% equate::Equate> bytestring-1.0.0
```
```
super_relate_tys::<rustc_infer::infer::equate::Equate>
super_relate_tys::<rustc_infer::infer::combine::Generalizer>
4144: 1.71% equate::Equate bitmaps-3.1.0
7745: 0.85% combine::Generalizer pbkdf2-0.10.0
9182: 0.67% equate::Equate hex-0.4.3
9981: 0.60% equate::Equate nalgebra-0.30.1
11379: 0.51% combine::Generalizer aes-gcm-0.9.4
15196: 0.37% combine::Generalizer quickcheck-1.0.3
15212: 0.37% combine::Generalizer sha3-0.10.0
17959: 0.32% equate::Equate secrecy-0.8.0
18288: 0.31% equate::Equate vsdb-0.13.10
18587: 0.31% combine::Generalizer jsonrpc-client-transports-18.0.0
```
----
[This code was heavily optimised a couple of years ago for rustc-perf benchmarks like `keccak` and `inflate`, and further improvements are difficult. [#97674](https://github.com/rust-lang/rust/pull/97674) has some small improvements.]
```
process_obligations
2037: 2.80% wast-39.0.0
2614: 2.47% wast-39.0.0
2926: 2.26% wast-39.0.0
3012: 2.18% rustc-serialize-0.3.24
3017: 2.18% wasmparser-0.82.0
3854: 1.81% rustc-serialize-0.3.24
4291: 1.65% rustc-serialize-0.3.24
4434: 1.61% wasmparser-0.82.0
4457: 1.60% wast-39.0.0
4482: 1.59% wast-39.0.0
4726: 1.51% wast-39.0.0
4856: 1.48% inflate-0.4.5
4971: 1.45% mime-0.3.16
5144: 1.40% wasmparser-0.82.0
5555: 1.29% mime-0.3.16
5684: 1.26% wast-39.0.0
5924: 1.20% rustc-serialize-0.3.24
6093: 1.15% wast-39.0.0
6141: 1.14% wasmparser-0.82.0
6188: 1.13% inflate-0.4.5
6250: 1.11% rustc-serialize-0.3.24
6284: 1.10% primitive-types-0.10.1
6328: 1.09% rustc-serialize-0.3.24
6343: 1.09% inflate-0.4.5
6357: 1.09% wast-39.0.0
7032: 0.95% keccak-0.1.0
7038: 0.95% primitive-types-0.10.1
7159: 0.93% keccak-0.1.0
7166: 0.93% wasmparser-0.82.0
7168: 0.93% wasmparser-0.82.0
```
```
uninlined_get_root_key
4727: 1.51% (211196081 Ir) wast-39.0.0
5969: 1.19% (5880320 Ir) mime-0.3.16
6988: 0.95% (79707031 Ir) redis-0.21.5
10444: 0.57% (34953834 Ir) rustc-serialize-0.3.24
14051: 0.40% (2061753 Ir) keccak-0.1.0
14277: 0.39% (439948359 Ir) nalgebra-0.30.1
16888: 0.34% (22681340 Ir) http-0.2.6
17441: 0.33% (12036120 Ir) vte-0.10.1
18441: 0.31% (27171051 Ir) procfs-0.12.0
23024: 0.26% (6715340 Ir) rand-0.8.4
```
A few crates are over-represented: `wast-39.0.0`, `rustc-serialize-0.3.24`, `wasmparser-0.82.0`, `inflate-0.4.5`.
----
[This is caused by lots of type folding and interning, very hard to improve.]
```
hashbrown...::from_hash::
2898: 2.28% cexpr-0.6.0
3311: 1.98% combine-4.6.3
3877: 1.80% diesel-1.4.8
4972: 1.45% pest_meta-2.1.3
5334: 1.35% pbkdf2-0.10.0
5413: 1.33% der-parser-6.0.1
5442: 1.32% arbitrary-1.0.3
5701: 1.26% redis-0.21.5
5772: 1.24% actix-web-4.0.0-beta.21
6016: 1.18% bitvec-1.0.0
6029: 1.17% cookie_store-0.15.1
6044: 1.17% quickcheck-1.0.3
6088: 1.15% tera-1.15.0
6097: 1.15% elliptic-curve-0.12.0-pre.1
6162: 1.13% aes-gcm-0.9.4
6183: 1.13% clap-3.0.13
6200: 1.12% actix-http-3.0.0-beta.19
6379: 1.08% jsonrpc-client-transports-18.0.0
6412: 1.07% convert_case-0.5.0
6430: 1.07% cexpr-0.6.0
```
----
\[nnethercote, [#96210](https://github.com/rust-lang/rust/pull/96210) + [#96683](https://github.com/rust-lang/rust/pull/96683)\]
```
<rustc_parse::lexer::StringReader>::next_token
2433: 2.58% web-sys-0.3.56
5945: 1.19% bitflags-1.3.2
8053: 0.81% unicode_categories-0.1.1
8215: 0.78% quick-error-2.0.1
8584: 0.74% pin-project-lite-0.2.8
9592: 0.63% pest-2.1.3
10257: 0.58% mio-named-pipes-0.1.7
10363: 0.57% fixed-hash-0.7.0
10450: 0.57% web-sys-0.3.56
10505: 0.56% tracing-0.1.29
10606: 0.56% uint-0.9.2
10684: 0.55% downcast-rs-1.2.0
11257: 0.52% arrayref-0.3.6
11538: 0.50% web-sys-0.3.56
11961: 0.48% idna-0.2.3
12249: 0.47% jni-sys-0.3.0
12330: 0.46% static_assertions-1.1.0
12674: 0.45% assert_matches-1.5.0
12823: 0.44% parking_lot-0.12.0
13230: 0.43% web-sys-0.3.56
```
```
<rustc_parse::lexer::tokentrees::TokenTreesReader>::parse_token_tree
4048: 1.74% web-sys-0.3.56
7244: 0.91% bitflags-1.3.2
9726: 0.62% quick-error-2.0.1
9990: 0.60% pin-project-lite-0.2.8
11603: 0.50% unicode_categories-0.1.1
12702: 0.45% pest-2.1.3
12806: 0.44% fixed-hash-0.7.0
12822: 0.44% mio-named-pipes-0.1.7
12999: 0.44% tracing-0.1.29
13026: 0.43% downcast-rs-1.2.0
```
```
<rustc_lexer::cursor::Cursor>::advance_token
4211: 1.68% web-sys-0.3.56
7009: 0.95% bitflags-1.3.2
9521: 0.64% pin-project-lite-0.2.8
9932: 0.60% quick-error-2.0.1
11691: 0.49% unicode_categories-0.1.1
12004: 0.48% pest-2.1.3
12528: 0.45% mio-named-pipes-0.1.7
12625: 0.45% static_assertions-1.1.0
12643: 0.45% tracing-0.1.29
13025: 0.43% downcast-rs-1.2.0
```
----
\[nnethercote, [#93984](https://github.com/rust-lang/rust/pull/93984)\]
```
BitSet<...>::union
2929: 2.26% http-0.2.6
4775: 1.50% vte-0.10.1
6620: 1.02% language-tags-0.3.2
8138: 0.79% vte-0.10.1
10793: 0.54% tinyvec-1.5.1
11031: 0.53% stdweb-derive-0.5.3
11483: 0.50% language-tags-0.3.2
12603: 0.45% keccak-0.1.0
13153: 0.43% wasmparser-0.82.0
15617: 0.36% futures-macro-0.3.19
16358: 0.35% regalloc-0.0.34
16887: 0.34% http-0.2.6
17406: 0.33% json-0.12.4
21027: 0.28% inflate-0.4.5
21621: 0.28% num-derive-0.3.3
22142: 0.27% cranelift-codegen-meta-0.80.0
26357: 0.24% wasm-bindgen-backend-0.2.79
26554: 0.24% vte-0.10.1
27012: 0.23% enumset_derive-0.5.5
29555: 0.22% mockall_derive-0.11.0
```
----
[lcnr + nnethercote, [#97345](https://github.com/rust-lang/rust/pull/97345)]
```
<rustc_trait_selection::traits::select::SelectionContext>::match_impl
3016: 2.18% match_impl bitmaps-3.1.0
6666: 1.01% match_impl nalgebra-0.30.1
7145: 0.93% match_impl bitmaps-3.1.0
7985: 0.81% match_impl hex-0.4.3
12231: 0.47% match_impl scroll-0.11.0
12901: 0.44% match_impl bitmaps-3.1.0
12903: 0.44% match_impl bitmaps-3.1.0
13382: 0.42% match_impl bitmaps-3.1.0
13727: 0.41% match_impl ordered-float-2.10.0
14243: 0.40% match_impl nalgebra-0.30.1
14451: 0.39% match_impl bytestring-1.0.0
14817: 0.38% match_impl::{closure#0}> bitmaps-3.1.0
14829: 0.38% match_impl bitmaps-3.1.0
14956: 0.38% match_impl lzw-0.10.0
15418: 0.36% match_impl strsim-0.10.0
15450: 0.36% match_impl::{closure#0}> bitmaps-3.1.0
16229: 0.35% match_impl num-complex-0.4.0
16832: 0.34% match_impl hex-0.4.3
17045: 0.33% match_impl aes-gcm-0.9.4
17628: 0.32% match_impl subtle-2.4.1
```
----
[lcnr + nnethercote, [#97345](https://github.com/rust-lang/rust/pull/97345)]
```
fast_reject::simplify_type
4642: 1.54% bitmaps-3.1.0
10384: 0.57% nalgebra-0.30.1
11646: 0.50% num-complex-0.4.0
11790: 0.49% bytestring-1.0.0
11832: 0.49% ordered-float-2.10.0
13444: 0.42% hex-0.4.3
14691: 0.38% scroll-0.11.0
14982: 0.38% bigdecimal-0.3.0
18130: 0.32% lzw-0.10.0
19718: 0.30% aes-gcm-0.9.4
```
----
[lcnr + nnethercote, [#97345](https://github.com/rust-lang/rust/pull/97345)]
```
<rustc_infer::infer::InferCtxtInner>::rollback_to
4787: 1.50% (150578940 Ir) bitmaps-3.1.0
7036: 0.95% (95183010 Ir) bitmaps-3.1.0
8789: 0.72% (2176185 Ir) secrecy-0.8.0
10271: 0.58% (41184070 Ir) rustc-rayon-0.3.2
10343: 0.57% (5906612 Ir) hex-0.4.3
10497: 0.56% (4717135 Ir) scroll-0.11.0
10703: 0.55% (628248294 Ir) nalgebra-0.30.1
11310: 0.51% (29431780 Ir) serde_with-1.11.0
11410: 0.51% (174009408 Ir) diesel-1.4.8
11837: 0.49% (5214125 Ir) ordered-float-2.10.0
12520: 0.45% (1460629 Ir) strsim-0.10.0
13314: 0.42% (13746895 Ir) funty-2.0.0
13350: 0.42% (1903593 Ir) aes-gcm-0.9.4
13428: 0.42% (8132966 Ir) num-complex-0.4.0
13740: 0.41% (15534789 Ir) arbitrary-1.0.3
13947: 0.40% (1519390 Ir) bytemuck-1.7.3
14557: 0.39% (1717338 Ir) pbkdf2-0.10.0
14940: 0.38% (8150921 Ir) parity-scale-codec-2.3.1
15389: 0.37% (37069588 Ir) bitmaps-3.1.0
15499: 0.36% (2017806 Ir) smallvec-1.8.0
```
## round-2-llvm-lines-leaf-crate
**Executive summary**
- Very little room for improvement here.
----
The top functions in `std`, `alloc` and `core`, as weighted by "Lines" counts. (The percentages here are more useful as a relative measure than an absolute measure.)
```
13677742 counts (weighted integral, erased)
( 1) 269227 ( 2.0%, 2.0%): <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
( 2) 255665 ( 1.9%, 3.8%): alloc::raw_vec::RawVec<T,A>::grow_amortized
( 3) 228899 ( 1.7%, 5.5%): core::option::Option<T>::map
( 4) 158437 ( 1.2%, 6.7%): alloc::alloc::box_free
( 5) 154300 ( 1.1%, 7.8%): alloc::raw_vec::RawVec<T,A>::allocate_in
( 6) 151742 ( 1.1%, 8.9%): core::iter::traits::iterator::Iterator::try_fold
( 7) 140484 ( 1.0%, 9.9%): alloc::raw_vec::RawVec<T,A>::current_memory
( 8) 136639 ( 1.0%, 10.9%): core::iter::traits::iterator::Iterator::fold
( 9) 136108 ( 1.0%, 11.9%): core::result::Result<T,E>::map_err
( 10) 135024 ( 1.0%, 12.9%): core::mem::replace
( 11) 128457 ( 0.9%, 13.9%): <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
( 12) 114300 ( 0.8%, 14.7%): <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
( 13) 104392 ( 0.8%, 15.5%): core::ptr::read
( 14) 97865 ( 0.7%, 16.2%): <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
( 15) 94520 ( 0.7%, 16.9%): core::alloc::layout::Layout::array
( 16) 83034 ( 0.6%, 17.5%): core::slice::iter::Iter<T>::post_inc_start
( 17) 82289 ( 0.6%, 18.1%): core::ops::function::FnOnce::call_once
( 18) 78632 ( 0.6%, 18.6%): core::iter::adapters::map::map_fold::{{closure}}
( 19) 78011 ( 0.6%, 19.2%): core::result::Result<T,E>::map
( 20) 75512 ( 0.6%, 19.8%): core::slice::iter::Iter<T>::new
( 21) 73783 ( 0.5%, 20.3%): <&T as core::fmt::Debug>::fmt
( 22) 71914 ( 0.5%, 20.8%): <alloc::raw_vec::RawVec<T,A> as core::ops::drop::Drop>::drop
( 23) 70031 ( 0.5%, 21.3%): <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::next
( 24) 69193 ( 0.5%, 21.8%): core::ptr::metadata::from_raw_parts_mut
( 25) 69071 ( 0.5%, 22.4%): <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
( 26) 67503 ( 0.5%, 22.8%): alloc::vec::Vec<T,A>::push
( 27) 64384 ( 0.5%, 23.3%): core::fmt::ArgumentV1::new
( 28) 60300 ( 0.4%, 23.8%): core::mem::maybe_uninit::MaybeUninit<T>::assume_init
( 29) 55834 ( 0.4%, 24.2%): core::char::methods::encode_utf8_raw
( 30) 55700 ( 0.4%, 24.6%): alloc::vec::Vec<T,A>::extend_desugared
```
`grow_amortized` has been heavily optimized in the past, and the other top functions are generally very small and hard to improve upon.
One possibility: `map_fold`: just inline and remove it? Most affected: actix-router, quote, diesel_derives, bytecount. \[nnethercote, [#94442](https://github.com/rust-lang/rust/pull/94442), didn't help\]
## round-3-dhat
**Executive summary**
- [x] `parse_tt` and related macro parsing functions cause by far the most allocations, and correlate highly with Cachegrind results. [nnethercote, this [blog post](https://nnethercote.github.io/2022/04/12/how-to-speed-up-the-rust-compiler-in-april-2022.html) has details]
- [x] Large BitSets are the next best opportunity, featuring in several crates. \[nnethercote, [#93984](https://github.com/rust-lang/rust/pull/93984)\]
- [x] After that, a handful of areas that would help one or two crates and might be worth some effort to look for easy wins, e.g. `match_impl`, `super_relate_tys`, ena snapshot vecs, `escape_default`, `ModChild`, thir `mirror_expr_inner`, etc. [lcnr + nnethercote, [#97345](https://github.com/rust-lang/rust/pull/97345), deals with `match_impl`/`super_relate_tys`; nnethercote [#98569](https://github.com/rust-lang/rust/pull/98569), deals with `ModChild`; not much other scope for easy improvements]
----
Top 20 malloc users, from Cachegrind, and biggest source of allocations as determined by looking at the DHAT profiles.
```
315: 6.67% async-std-1.10.0 macro parsing
375: 5.85% yansi-0.5.0 macro parsing
401: 5.60% time-macros-0.2.3 macro parsing
582: 4.50% inotify-0.10.0 macro parsing
591: 4.47% web-sys-0.3.56 other parsing/AST stuff
667: 4.19% nix-0.23.1 macro parsing
685: 4.13% vsdb-0.13.10 super_relate_tys
687: 4.11% cloudabi-0.1.0 very spread out, no esp. hot places
692: 4.10% vsdb_derive-0.2.2 macro parsing
706: 4.06% pest_generator-2.1.3 macro parsing
726: 3.99% futures-lite-1.12.0 macro parsing (a little), spread out
736: 3.95% scroll_derive-0.11.0 macro parsing
739: 3.92% num-derive-0.3.3 macro parsing
744: 3.91% raw-cpuid-10.2.0 very spread out
751: 3.89% clap_derive-3.0.12 macro parsing
755: 3.89% prost-derive-0.9.0 macro parsing
760: 3.88% tonic-build-0.6.2 macro parsing
763: 3.86% pyo3-macros-backend-0.15.1 macro parsing
764: 3.86% diesel_derives-1.4.1 macro parsing
765: 3.85% wasm-bindgen-backend-0.2.79 macro parsing
```
----
Hottest program points (PPs) by allocation rate (blocks). This isn't a perfect metric because sometimes multiple distinct PPs are best considered in combination, which requires human understanding of the stack traces. But it's a good start: while the Cachegrind numbers can be high if there are lots of allocations spread across lots of places, single PPs that allocate a lot are more likely to be optimizable.
Ones marked with `**` are not in the top 20 Cachegrind list above.
```
251.3 / Minstr (5,362,131 blocks) / async-std-1.10.0
251.0 / Minstr (5,355,825 blocks) / async-std-1.10.0
212.0 / Minstr (2,110,433 blocks) / bitmaps-3.1.0 ** match_impl
176.2 / Minstr (3,759,571 blocks) / async-std-1.10.0
176.1 / Minstr (654,394 blocks) / time-macros-0.2.3
176.0 / Minstr (654,019 blocks) / time-macros-0.2.3
175.7 / Minstr (653,182 blocks) / time-macros-0.2.3
175.7 / Minstr (3,750,274 blocks) / async-std-1.10.0
173.8 / Minstr (646,121 blocks) / time-macros-0.2.3
154.2 / Minstr (158,177 blocks) / yansi-0.5.0
150.7 / Minstr (154,606 blocks) / yansi-0.5.0
98.5 / Minstr (280,913 blocks) / num-derive-0.3.3
97.5 / Minstr (164,019 blocks) / vsdb_derive-0.2.2
97.0 / Minstr (190,944 blocks) / pest_generator-2.1.3
90.7 / Minstr (225,904 blocks) / tonic-build-0.6.2
90.7 / Minstr (108,378 blocks) / scroll_derive-0.11.0
90.3 / Minstr (78,899 blocks) / ctor-0.1.21
87.3 / Minstr (89,576 blocks) / yansi-0.5.0
86.7 / Minstr (340,107 blocks) / clap_derive-3.0.12
86.1 / Minstr (88,312 blocks) / yansi-0.5.0
85.9 / Minstr (75,505 blocks) / stdweb-derive-0.5.3 ** macro parsing
81.7 / Minstr (312,670 blocks) / wasm-bindgen-backend-0.2.79
81.3 / Minstr (60,048 blocks) / enumflags2_derive-0.7.3 ** macro parsing
81.3 / Minstr (595,413 blocks) / mockall_derive-0.11.0 ** macro parsing
80.7 / Minstr (284,977 blocks) / wayland-scanner-0.30.0-alpha3 ** macro parsing
79.3 / Minstr (133,508 blocks) / futures-macro-0.3.19
79.0 / Minstr (257,950 blocks) / prost-derive-0.9.0
77.2 / Minstr (185,937 blocks) / diesel_derives-1.4.1
77.2 / Minstr (178,079 blocks) / structopt-derive-0.4.18 ** macro parsing
77.1 / Minstr (77,488 blocks) / hex-0.4.3 ** match_impl
```
Macro parsing dominates here, again. `match_impl` also shows up.
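
For reference, the rate in the left column is the number of blocks allocated at a PP divided by the total instruction count of the run, in millions. A minimal sketch of that arithmetic follows; the function name is illustrative, and the numbers in the usage note are made up rather than taken from the DHAT output.

```rust
/// Allocation rate of a program point: blocks allocated per million
/// instructions executed (the "/ Minstr" unit used in the tables).
/// `total_instrs` is the instruction count of the whole rustc invocation.
fn blocks_per_minstr(blocks: u64, total_instrs: u64) -> f64 {
    blocks as f64 / (total_instrs as f64 / 1_000_000.0)
}
```

For example, a hypothetical PP that allocates 1,000,000 blocks in a run of 4,000,000,000 instructions has a rate of 250.0 / Minstr. Working backwards from the first row above, 5,362,131 blocks at 251.3 / Minstr implies roughly 21.3 billion instructions for the `async-std-1.10.0` run.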
----
Hottest program points (PPs) by allocation rate (bytes). Excludes tiny crates dominated by metadata decoding, which all have a `bytes` value in the range 0.9-2.0MB, mostly around 1.4MB. The rightmost column indicates the hot allocation causes.
Ones marked with `**` are not in the top 20 Cachegrind list above.
```
33,740.63 / Minstr (720,052,608 bytes) / async-std-1.10.0
33,375.40 / Minstr (124,055,232 bytes) / time-macros-0.2.3
28,855.35 / Minstr (8,407,296 bytes) / secrecy-0.8.0 ** ena snapshot vecs
28,620.20 / Minstr (12,557,016 bytes) / aes-gcm-0.9.4 ** pred oblig. hashmaps
27,516.50 / Minstr (11,782,896 bytes) / pbkdf2-0.10.0 ** vtbl_impl
27,398.95 / Minstr (8,388,592 bytes) / deunicode-1.3.1 ** escape_default
24,645.81 / Minstr (3,985,120 bytes) / web-sys-0.3.56 parsing/AST stuff
22,248.21 / Minstr (9,761,328 bytes) / aes-gcm-0.9.4
22,085.03 / Minstr (471,312,600 bytes) / async-std-1.10.0
21,752.66 / Minstr (9,314,748 bytes) / pbkdf2-0.10.0 ** pred obligs
18,237.79 / Minstr (15,495,584 bytes) / unicode_categories-0.1.1 ** thir mirror_expr_inner
17,878.73 / Minstr (85,887,776 bytes) / pest-2.1.3 ** thir mirror_expr_inner
16,525.18 / Minstr (16,955,904 bytes) / yansi-0.5.0
16,447.43 / Minstr (5,035,623 bytes) / deunicode-1.3.1 escape_default
16,080.75 / Minstr (343,176,384 bytes) / async-std-1.10.0
15,484.04 / Minstr (57,553,672 bytes) / time-macros-0.2.3
14,426.90 / Minstr (307,881,792 bytes) / async-std-1.10.0
13,991.95 / Minstr (4,283,840 bytes) / deunicode-1.3.1 escape_default
13,621.47 / Minstr (50,101,664 bytes) / vte-0.10.1 BitSets
13,568.19 / Minstr (135,067,712 bytes) / bitmaps-3.1.0 ** match_impl
13,259.72 / Minstr (13,605,328 bytes) / yansi-0.5.0
13,140.27 / Minstr (79,387,384 bytes) / http-0.2.6 BitSets
13,140.27 / Minstr (79,387,384 bytes) / http-0.2.6 BitSets
12,873.49 / Minstr (6,820,112 bytes) / c2-chacha-0.3.3 ModChild
```
A much wider range of results here.
----
Hottest program points (PPs) by peak memory usage. The rightmost column indicates the hot allocation causes.
```
33.10% (3,985,120 bytes) / web-sys-0.3.56 Vec<TreeAndSpacing> in TokenStreamBuilder
32.29% (79,387,384 bytes) / http-0.2.6 BitSets
22.30% (25,059,520 bytes) / vte-0.10.1 BitSets
21.30% (6,422,572 bytes) / unicode_categories-0.1.1 LitToConstInput
20.95% (4,182,024 bytes) / rand_chacha-0.3.1 NameBinding, NameResolution, BindingKey
20.94% (51,517,192 bytes) / http-0.2.6 BitSets
20.94% (4,182,024 bytes) / c2-chacha-0.3.3 NameBinding, NameResolution, BindingKey
18.95% (17,086,432 bytes) / language-tags-0.3.2 BitSets
18.95% (17,086,168 bytes) / language-tags-0.3.2 BitSets
18.89% (21,230,008 bytes) / vte-0.10.1 BitSets
18.02% (5,035,623 bytes) / deunicode-1.3.1 Symbol, LitKind, encode_metadata_impl
15.76% (3,145,728 bytes) / rand_chacha-0.3.1 NameBinding, NameResolution, BindingKey
14.07% (5,609,280 bytes) / c2-chacha-0.3.3 NameBinding, NameResolution, BindingKey
13.30% (2,097,152 bytes) / keccak-0.1.0 ProjectionElem, PredicateInner
13.28% (8,832,256 bytes) / pest-2.1.3 as_operand, Expr
12.95% (2,042,880 bytes) / keccak-0.1.0 ProjectionElem, PredicateInner
12.89% (2,860,032 bytes) / serde_qs-0.8.5 DroplessArena
12.78% (8,388,600 bytes) / unic-ucd-segment-0.9.0 encode_metadata_impl
12.57% (2,097,152 bytes) / deunicode-1.3.1 Symbol, LitKind, encode_metadata_impl
12.14% (2,451,456 bytes) / actix-tls-3.0.1 DroplessArena
11.99% (41,901,440 bytes) / redis-0.21.5 obligation_forest::Node
11.95% (4,761,040 bytes) / rand_chacha-0.3.1 NameBinding, NameResolution, BindingKey
11.91% (7,922,432 bytes) / pest-2.1.3 as_operand, Expr
11.75% (8,637,520 bytes) / tinyvec-1.5.1 BitSet
11.75% (8,637,312 bytes) / tinyvec-1.5.1 BitSet
10.72% (2,451,456 bytes) / stdweb-derive-0.5.3 DroplessArena
10.50% (2,097,152 bytes) / c2-chacha-0.3.3 NameBinding, NameResolution, BindingKey
10.21% (5,968,560 bytes) / aes-0.7.5 ModChild
```
BitSets are again common. `DroplessArena` ones are difficult to act on because that covers many different types. Otherwise, fairly spread out.
----
Highest peak memory usage, absolute.
```
535,773,520 bytes / vsdb-0.13.10
350,086,000 bytes / lsp-types-0.91.1
312,125,839 bytes / nalgebra-0.30.1
295,757,875 bytes / diesel-1.4.8
245,575,943 bytes / http-0.2.6
224,398,810 bytes / combine-4.6.3
199,890,087 bytes / nix-0.23.1
195,042,951 bytes / gimli-0.26.1
185,666,858 bytes / rusoto_s3-0.47.0
179,543,658 bytes / object-0.28.3
174,479,216 bytes / proptest-1.0.0
169,319,484 bytes / wast-39.0.0
168,982,525 bytes / tendermint-proto-0.24.0-pre.1
161,741,679 bytes / goblin-0.4.3
148,107,835 bytes / h2-0.3.11
```
Other than `http-0.2.6`, which is dominated by BitSets, these are not all that interesting. No particularly hot allocation sites, a pretty similar mix, with higher ones tending to be `DroplessArena`, mir CFG building, metadata encoding, etc.
## round-4-line-counts
Biggest crates.
```
web-sys-0.3.56.txt : 155935 lines of rust
regex-syntax-0.6.25.txt : 45482 lines of rust
tokio-1.16.1.txt : 44900 lines of rust
nalgebra-0.30.1.txt : 43215 lines of rust
gimli-0.26.1.txt : 38441 lines of rust
unicode-normalization-0.1.19.txt : 27132 lines of rust
curve25519-dalek-4.0.0-pre.1.txt : 26478 lines of rust
object-0.28.3.txt : 26403 lines of rust
diesel-1.4.8.txt : 25390 lines of rust
ndarray-0.15.4.txt : 24502 lines of rust
rustls-0.20.2.txt : 23704 lines of rust
nix-0.23.1.txt : 22779 lines of rust
rusoto_s3-0.47.0.txt : 20968 lines of rust
trust-dns-proto-0.21.0-alpha.4.txt : 19930 lines of rust
image-0.23.14.txt : 19866 lines of rust
petgraph-0.6.0.txt : 19283 lines of rust
git2-0.13.25.txt : 18990 lines of rust
vsdbsled-0.34.7-patched.txt : 18900 lines of rust
bitvec-1.0.0.txt : 18437 lines of rust
actix-web-4.0.0-beta.21.txt : 17388 lines of rust
```
Smallest crates.
```
fuchsia-cprng-0.1.1.txt : 40 lines of rust
cranelift-codegen-shared-0.80.0.txt : 39 lines of rust
num-0.4.0.txt : 38 lines of rust
headers-core-0.2.0.txt : 38 lines of rust
waker-fn-1.1.0.txt : 36 lines of rust
thread-id-4.0.0.txt : 36 lines of rust
foreign-types-shared-0.3.0.txt : 36 lines of rust
darling_macro-0.13.1.txt : 34 lines of rust
new_debug_unreachable-1.0.4.txt : 29 lines of rust
opaque-debug-0.3.0.txt : 25 lines of rust
tinyvec_macros-0.1.0.txt : 22 lines of rust
byte-tools-0.3.1.txt : 21 lines of rust
precomputed-hash-0.1.1.txt : 13 lines of rust
winapi-build-0.1.1.txt : 12 lines of rust
string_cache_shared-0.3.0.txt : 10 lines of rust
enum-iterator-0.7.0.txt : 10 lines of rust
typeable-0.1.2.txt : 8 lines of rust
stream-cipher-0.99.99.txt : 3 lines of rust
jsonrpc-core-client-18.0.0.txt : 3 lines of rust
block-cipher-0.99.99.txt : 3 lines of rust
```
Not much to analyze here.
## round-5-cachegrind-debug
This is similar to round-1-cachegrind-check, but with additional LLVM costs, which aren't very interesting to analyze here.
## round-6-llvm-lines-project
This gave very similar results to round-2-llvm-lines-leaf-crate, so I haven't analyzed it.
## round-7-cargo-timing-check-j1
The use of `-j1` forces codegen to be non-parallel, which makes these results non-representative. See the `-j8` results instead.
## round-8-cargo-timing-debug-j1
The use of `-j1` forces codegen to be non-parallel, which makes these results non-representative. See the `-j8` results instead.
## round-9-cargo-timing-opt-j1
The use of `-j1` forces codegen to be non-parallel, which makes these results non-representative. See the `-j8` results instead.
## round-10-cachegrind-opt
This is similar to round-1-cachegrind-check, but with additional LLVM costs, which aren't very interesting to analyze here.
## round-13-cargo-timing-opt-j8
Most expensive crates. The counts are seconds of compile time, e.g. `syn` accounted for 643.5 seconds of compile time, which is 8.1% of the total. (There is of course overlap in crate compilation, so this doesn't say much about the critical path.)
```
7932.5 counts (weighted fractional, erased)
( 1) 643.5 ( 8.1%, 8.1%): syn v1.0.86
( 2) 235.4 ( 3.0%, 11.1%): serde v1.0.136
( 3) 202.1 ( 2.5%, 13.6%): tokio v1.16.1
( 4) 182.9 ( 2.3%, 15.9%): regex-syntax v0.6.25
( 5) 172.3 ( 2.2%, 18.1%): libc v0.2.116
( 6) 155.9 ( 2.0%, 20.1%): regex v1.5.4
( 7) 151.7 ( 1.9%, 22.0%): proc-macro2 v1.0.36
( 8) 138.8 ( 1.7%, 23.7%): serde_derive v1.0.136
( 9) 130.0 ( 1.6%, 25.4%): memchr v2.4.1
( 10) 99.6 ( 1.3%, 26.6%): libc v0.2.116 build script
( 11) 95.6 ( 1.2%, 27.8%): proc-macro2 v1.0.36 build script
( 12) 92.2 ( 1.2%, 29.0%): quote v1.0.15
( 13) 86.4 ( 1.1%, 30.1%): syn v1.0.86 build script
( 14) 78.3 ( 1.0%, 31.1%): http v0.2.6
( 15) 77.2 ( 1.0%, 32.0%): futures-util v0.3.19
( 16) 69.3 ( 0.9%, 32.9%): aho-corasick v0.7.18
( 17) 63.5 ( 0.8%, 33.7%): serde_json v1.0.78
( 18) 54.3 ( 0.7%, 34.4%): bytes v1.1.0
( 19) 50.7 ( 0.6%, 35.0%): h2 v0.3.11
( 20) 49.4 ( 0.6%, 35.7%): autocfg v1.0.1
( 21) 49.1 ( 0.6%, 36.3%): cc v1.0.72
( 22) 49.1 ( 0.6%, 36.9%): thiserror-impl v1.0.30
( 23) 49.0 ( 0.6%, 37.5%): log v0.4.14
( 24) 48.7 ( 0.6%, 38.1%): hyper v0.14.16
( 25) 48.1 ( 0.6%, 38.7%): num_cpus v1.13.1
( 26) 47.9 ( 0.6%, 39.3%): mio v0.7.14
( 27) 45.1 ( 0.6%, 39.9%): unicode-bidi v0.3.7
( 28) 42.1 ( 0.5%, 40.4%): log v0.4.14 build script
( 29) 42.0 ( 0.5%, 41.0%): unicode-xid v0.2.2
( 30) 40.9 ( 0.5%, 41.5%): num-traits v0.2.14
```
`syn`/`quote`/`proc-macro2` (and their build scripts) are the most frequent.
Very surprising to see so many build scripts in there! Definitely worth investigation.
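
The two percentage columns in the table above are each entry's share of the total and the running (cumulative) share. A small sketch of how they can be recomputed from the raw counts; the function name is illustrative and this is not the tool that produced the table.

```rust
/// Returns (share %, cumulative %) for each entry, given counts sorted
/// in descending order and the grand total over all entries.
fn shares(counts: &[f64], total: f64) -> Vec<(f64, f64)> {
    let mut running = 0.0;
    counts
        .iter()
        .map(|&c| {
            // Accumulate before computing the cumulative share so the
            // current entry is included in its own cumulative value.
            running += c;
            (100.0 * c / total, 100.0 * running / total)
        })
        .collect()
}
```

Calling `shares(&[643.5, 235.4, 202.1], 7932.5)` reproduces the first three rows to one decimal place: `syn` at 8.1% of the total, with cumulative shares of 8.1%, 11.1% and 13.6%.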
Some analysis of build script use-cases (areas where declaratively supporting the feature in cargo would remove the need for the script):
- setting conditional compilation flags depending on the compiler version, handling MSRV:
* [`syn`](https://github.com/dtolnay/syn/blob/master/build.rs)
* [`proc-macro2`](https://github.com/dtolnay/proc-macro2/blob/master/build.rs)
* [`libc`](https://github.com/rust-lang/libc/blob/master/build.rs)
* [`serde`](https://github.com/serde-rs/serde/blob/master/serde/build.rs)
- setting conditional compilation flags depending on the target:
* [`log`](https://github.com/rust-lang/log/blob/master/build.rs). This seems more of a convenience than something impossible without a script though: the crate could likely contain `cfg` expressions matching the same targets (doing so would remove this node from 120 crates' dependency graph in the dataset) \[https://github.com/rust-lang/log/issues/489\]
* [`proc-macro2`](https://github.com/dtolnay/proc-macro2/blob/master/build.rs): e.g. for the wasm target
* [`libc`](https://github.com/rust-lang/libc/blob/master/build.rs): e.g. for the FreeBSD target versions
* [`memchr`](https://github.com/BurntSushi/memchr/blob/master/build.rs): e.g. for SIMD
* [`serde`](https://github.com/serde-rs/serde/blob/master/serde/build.rs): e.g. for wasm/asm.js, and architectures where libstd supports atomics
* [`futures-core`](https://github.com/rust-lang/futures-rs/blob/master/futures-core/build.rs): e.g. for targets without atomic CAS ops
- parsing and checking other environment variables (although one could see the `target` use-case above as parsing the `TARGET` env var):
* `proc-macro2` also checks the `DOCS_RS` env var, likely to control and improve rustdoc output on docs.rs
* `libc` for CI to deny warnings, to check if it's a dependency of libstd, and to access cargo feature flags (which are probably equivalent to using `cfg!` expressions in the build script)
- setting conditional compilation flags derived from other feature flags (e.g. in `proc-macro2`)
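
As an illustration of the `cfg` alternative suggested for `log` above, a target check of this kind can be written directly in the crate, with no build script. This is a sketch only, not code from `log`: the function name is hypothetical, and it assumes a toolchain where `cfg(target_has_atomic)` is stable, which may rule it out for crates with an older MSRV.

```rust
/// Hypothetical helper: true when the target supports pointer-sized
/// atomic operations (including compare-and-swap). Decided at compile time.
#[cfg(target_has_atomic = "ptr")]
fn has_atomic_cas() -> bool {
    true
}

/// Fallback for targets without pointer-sized atomics.
#[cfg(not(target_has_atomic = "ptr"))]
fn has_atomic_cas() -> bool {
    false
}
```

In a real crate the attribute would more likely sit on the items that need atomics rather than on a boolean helper; the helper just keeps the example small.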
TODO: also investigate the build script compile times. Some of these scripts are simple (but use various parts of libstd), yet compile slowly (e.g. `syn`'s build script compiles in >400ms in 150 crates). We need to look into whether that's because of opt levels or something else; maybe some simple scripts could be interpreted.
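
To give a feel for the "simple, but uses various parts of libstd" case, here is a sketch of the core of a compiler-version-detecting build script. It is not the actual script of `syn` or any other crate: the cfg name `has_new_feature` and the 1.56 threshold are made-up placeholders. Even this small amount of logic pulls in `std::env`, `std::process`, string parsing and formatting.

```rust
use std::env;
use std::process::Command;

/// Extracts the minor version from `rustc --version` output,
/// e.g. "rustc 1.58.1 (<hash> <date>)" -> Some(58).
fn parse_minor(version: &str) -> Option<u32> {
    let mut parts = version.split_whitespace().nth(1)?.split('.');
    if parts.next()? != "1" {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Asks the compiler chosen by cargo (the `RUSTC` env var) for its version.
fn rustc_minor() -> Option<u32> {
    let rustc = env::var_os("RUSTC").unwrap_or_else(|| "rustc".into());
    let output = Command::new(rustc).arg("--version").output().ok()?;
    parse_minor(&String::from_utf8(output.stdout).ok()?)
}

/// What a hypothetical `build.rs` would call from `main`: emit a cfg flag
/// when the compiler is new enough. Name and threshold are placeholders.
fn emit_cfgs() {
    if rustc_minor().map_or(false, |minor| minor >= 56) {
        println!("cargo:rustc-cfg=has_new_feature");
    }
}
```

The crate would then gate code with `#[cfg(has_new_feature)]`. Declarative support in cargo for this use-case would remove the need to compile and run such a script at all.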
----
Most popular crates, i.e. how often they are dependencies for other crates.
```
10657 counts:
( 1) 224 ( 2.1%, 2.1%): libc v0.2.116 build script (run)
( 2) 224 ( 2.1%, 4.2%): libc v0.2.116
( 3) 223 ( 2.1%, 6.3%): cfg-if v1.0.0
( 4) 215 ( 2.0%, 8.3%): libc v0.2.116 build script
( 5) 200 ( 1.9%, 10.2%): unicode-xid v0.2.2
( 6) 199 ( 1.9%, 12.1%): proc-macro2 v1.0.36
( 7) 199 ( 1.9%, 13.9%): quote v1.0.15
( 8) 199 ( 1.9%, 15.8%): proc-macro2 v1.0.36 build script (run)
( 9) 197 ( 1.8%, 17.6%): proc-macro2 v1.0.36 build script
( 10) 193 ( 1.8%, 19.5%): syn v1.0.86 build script (run)
( 11) 193 ( 1.8%, 21.3%): syn v1.0.86
( 12) 191 ( 1.8%, 23.1%): syn v1.0.86 build script
( 13) 122 ( 1.1%, 24.2%): log v0.4.14 build script (run)
( 14) 122 ( 1.1%, 25.3%): log v0.4.14
( 15) 120 ( 1.1%, 26.5%): log v0.4.14 build script
( 16) 103 ( 1.0%, 27.4%): memchr v2.4.1
( 17) 103 ( 1.0%, 28.4%): memchr v2.4.1 build script (run)
( 18) 102 ( 1.0%, 29.4%): lazy_static v1.4.0
( 19) 101 ( 0.9%, 30.3%): memchr v2.4.1 build script
( 20) 88 ( 0.8%, 31.1%): autocfg v1.0.1
( 21) 76 ( 0.7%, 31.8%): serde v1.0.136
( 22) 76 ( 0.7%, 32.6%): serde v1.0.136 build script (run)
( 23) 74 ( 0.7%, 33.3%): serde v1.0.136 build script
( 24) 72 ( 0.7%, 33.9%): version_check v0.9.4
( 25) 63 ( 0.6%, 34.5%): pin-project-lite v0.2.8
( 26) 62 ( 0.6%, 35.1%): futures-core v0.3.19 build script
( 27) 62 ( 0.6%, 35.7%): futures-core v0.3.19
( 28) 62 ( 0.6%, 36.3%): futures-core v0.3.19 build script (run)
( 29) 60 ( 0.6%, 36.8%): once_cell v1.9.0
( 30) 58 ( 0.5%, 37.4%): fnv v1.0.7
```
`libc`, `cfg-if`, and `syn`/`quote`/`proc-macro2`/`unicode-xid` are the most popular.
----
The biggest projects, i.e. most crates compiled.
```
jsonrpc-client-transports-18.0.0: 184
actix-web-4.0.0-beta.21: 178
sentry-0.24.2: 149
rusoto_s3-0.47.0: 144
awc-3.0.0-beta.19: 141
warp-0.3.2: 129
tonic-0.6.2: 112
actix-connect-2.0.0: 110
rusoto_signature-0.47.0: 108
tera-1.15.0: 107
reqwest-0.11.9: 105
actix-http-3.0.0-beta.19: 100
tokio-postgres-0.7.5: 93
vsdb-0.13.10: 90
rusoto_credential-0.47.0: 88
glutin-0.28.0: 87
criterion-0.3.5: 82
jsonrpc-core-client-18.0.0: 81
trust-dns-resolver-0.21.0-alpha.4: 79
hyper-rustls-0.23.0: 79
tokio-tungstenite-0.16.1: 75
rustc-ap-rustc_data_structures-727.0.0: 75
log4rs-1.0.0: 73
ammonia-3.1.3: 73
hyper-tls-0.5.0: 72
jsonrpc-server-utils-18.0.0: 68
jsonrpc-pubsub-18.0.0: 68
trust-dns-proto-0.21.0-alpha.4: 66
tracing-opentelemetry-0.16.0: 65
sentry-backtrace-0.24.2: 63
```
219 out of 777 projects contain a single crate, i.e. zero dependencies.
---
Observations just from looking at some timings graphs.
- The `hyper` crate depends on the `h2` crate, but doesn't start building until `h2` is fully compiled, rather than when `h2`'s metadata is emitted before codegen. Is this necessary? E.g. in [`warp-0.3.2`](https://lqd.github.io/rustc-benchmarking-data/results/round-13-cargo-timing-opt-j8/cargo-timing-warp-0.3.2-opt-j8.html). [lqd, [hyper:#2770](https://github.com/hyperium/hyper/pull/2770), complete]
- Likewise for everything that depends on `syn`, e.g. in [`actix-connect-2.0.0`](https://lqd.github.io/rustc-benchmarking-data/results/round-13-cargo-timing-opt-j8/cargo-timing-actix-connect-2.0.0-opt-j8.html)
- Some build scripts that compile C code are very slow to run, e.g. `zstd-sys build script (run)` in [`awc-3.0.0-beta.19`](https://lqd.github.io/rustc-benchmarking-data/results/round-13-cargo-timing-opt-j8/cargo-timing-awc-3.0.0-beta.19-opt-j8.html). Can we do better with them? Prioritizing some of them earlier in the pipeline could help, thanks to increased parallelism. The same thing used to happen in Servo, and I've also seen it in crates depending on `openssl`; it is tracked in [this cargo issue](https://github.com/rust-lang/cargo/issues/7437). Note, however, that native library builds can also compete for tokens and build in parallel, so moving them earlier can in turn make them build more slowly because of higher contention and fewer resources.
## round-11-cargo-timing-check-j8
Most expensive crates, same idea as for round-13.
```
5196.0 counts (weighted fractional, erased)
( 1) 491.5 ( 9.5%, 9.5%): syn v1.0.86
( 2) 179.7 ( 3.5%, 12.9%): serde v1.0.136 lib (check)
( 3) 154.6 ( 3.0%, 15.9%): serde_derive v1.0.136
( 4) 132.4 ( 2.5%, 18.4%): libc v0.2.116 lib (check)
( 5) 110.8 ( 2.1%, 20.6%): libc v0.2.116 build script
( 6) 110.6 ( 2.1%, 22.7%): proc-macro2 v1.0.36
( 7) 109.5 ( 2.1%, 24.8%): proc-macro2 v1.0.36 build script
( 8) 100.0 ( 1.9%, 26.7%): syn v1.0.86 lib (check)
( 9) 98.2 ( 1.9%, 28.6%): syn v1.0.86 build script
( 10) 82.7 ( 1.6%, 30.2%): tokio v1.16.1 lib (check)
( 11) 64.2 ( 1.2%, 31.5%): futures-util v0.3.19 lib (check)
( 12) 63.5 ( 1.2%, 32.7%): quote v1.0.15
( 13) 57.0 ( 1.1%, 33.8%): autocfg v1.0.1
( 14) 52.2 ( 1.0%, 34.8%): thiserror-impl v1.0.30
( 15) 47.8 ( 0.9%, 35.7%): regex-syntax v0.6.25 lib (check)
( 16) 47.3 ( 0.9%, 36.6%): cc v1.0.72
( 17) 43.9 ( 0.8%, 37.4%): log v0.4.14 build script
( 18) 43.0 ( 0.8%, 38.3%): memchr v2.4.1 lib (check)
( 19) 42.5 ( 0.8%, 39.1%): memchr v2.4.1 build script
( 20) 38.9 ( 0.7%, 39.8%): version_check v0.9.4
( 21) 36.8 ( 0.7%, 40.6%): serde v1.0.136 build script
( 22) 36.7 ( 0.7%, 41.3%): typenum v1.15.0 build script
( 23) 36.0 ( 0.7%, 42.0%): http v0.2.6 lib (check)
( 24) 33.9 ( 0.7%, 42.6%): typenum v1.15.0 lib (check)
( 25) 32.3 ( 0.6%, 43.2%): zstd-sys v1.6.2+zstd.1.5.1 build script (run)
( 26) 31.3 ( 0.6%, 43.8%): jemalloc-sys v0.3.2 build script (run)
( 27) 31.3 ( 0.6%, 44.4%): unicode-xid v0.2.2
( 28) 31.1 ( 0.6%, 45.0%): cfg-if v1.0.0 lib (check)
( 29) 28.8 ( 0.6%, 45.6%): num-traits v0.2.14 lib (check)
( 30) 26.7 ( 0.5%, 46.1%): derive_more v0.99.17
```
Reasonably similar results to round-13.
## round-12-cargo-timing-debug-j8
Most expensive crates, same idea as for round-13.
```
6451.3 counts (weighted fractional, erased)
( 1) 663.9 (10.3%, 10.3%): syn v1.0.86
( 2) 222.7 ( 3.5%, 13.7%): serde v1.0.136
( 3) 154.1 ( 2.4%, 16.1%): serde_derive v1.0.136
( 4) 153.5 ( 2.4%, 18.5%): proc-macro2 v1.0.36
( 5) 144.6 ( 2.2%, 20.8%): libc v0.2.116
( 6) 135.1 ( 2.1%, 22.8%): tokio v1.16.1
( 7) 112.6 ( 1.7%, 24.6%): libc v0.2.116 build script
( 8) 108.7 ( 1.7%, 26.3%): proc-macro2 v1.0.36 build script
( 9) 97.6 ( 1.5%, 27.8%): syn v1.0.86 build script
( 10) 91.9 ( 1.4%, 29.2%): regex-syntax v0.6.25
( 11) 89.3 ( 1.4%, 30.6%): quote v1.0.15
( 12) 76.2 ( 1.2%, 31.8%): memchr v2.4.1
( 13) 73.9 ( 1.1%, 32.9%): futures-util v0.3.19
( 14) 62.4 ( 1.0%, 33.9%): regex v1.5.4
( 15) 58.4 ( 0.9%, 34.8%): http v0.2.6
( 16) 58.1 ( 0.9%, 35.7%): autocfg v1.0.1
( 17) 54.7 ( 0.8%, 36.5%): thiserror-impl v1.0.30
( 18) 52.1 ( 0.8%, 37.4%): cc v1.0.72
( 19) 43.6 ( 0.7%, 38.0%): log v0.4.14 build script
( 20) 42.5 ( 0.7%, 38.7%): memchr v2.4.1 build script
( 21) 41.8 ( 0.6%, 39.3%): unicode-xid v0.2.2
( 22) 40.9 ( 0.6%, 40.0%): log v0.4.14
( 23) 39.3 ( 0.6%, 40.6%): bytes v1.1.0
( 24) 38.7 ( 0.6%, 41.2%): serde_json v1.0.78
( 25) 38.6 ( 0.6%, 41.8%): version_check v0.9.4
( 26) 38.0 ( 0.6%, 42.4%): hyper v0.14.16
( 27) 37.9 ( 0.6%, 43.0%): serde v1.0.136 build script
( 28) 36.6 ( 0.6%, 43.5%): typenum v1.15.0 build script
( 29) 35.2 ( 0.5%, 44.1%): zstd-sys v1.6.2+zstd.1.5.1 build script (run)
( 30) 34.7 ( 0.5%, 44.6%): typenum v1.15.0
```
Reasonably similar results to round-13.
## round-14-self-profile-check
The heaviest relative queries seen. (More data [here](https://lqd.github.io/rustc-benchmarking-data/summaries/)). The `expand_crate` ones have some correlation with the hot macro parsing results seen with Cachegrind and DHAT.
```
expand_crate rel 83.73%, abs 44.35ms web-sys-0.3.56
expand_crate rel 70.53%, abs 3.62s async-std-1.10.0
metadata_register_crate rel 60.64%, abs 15.26ms jsonrpc-core-client-18.0.0
metadata_register_crate rel 59.59%, abs 24.52ms impl-codec-0.5.1
expand_crate rel 54.35%, abs 155.29ms yansi-0.5.0
typeck rel 53.87%, abs 1.41s redis-0.21.5
typeck rel 52.41%, abs 94.35ms keccak-0.1.0
specialization_graph_of rel 51.22%, abs 13.41s nalgebra-0.30.1
expand_crate rel 50.40%, abs 6.93ms opaque-debug-0.3.0
expand_crate rel 49.74%, abs 556.79ms time-macros-0.2.3
expand_crate rel 49.62%, abs 59.15ms enum-iterator-derive-0.7.0
expand_crate rel 49.29%, abs 350.36ms num-derive-0.3.3
expand_crate rel 49.10%, abs 9.20ms static_assertions-1.1.0
expand_crate rel 48.98%, abs 1.89s js-sys-0.3.56
expand_crate rel 48.64%, abs 10.33ms mac-0.1.1
expand_crate rel 48.61%, abs 6.49ms matches-0.1.9
expand_crate rel 48.54%, abs 114.46ms ctor-0.1.21
expand_crate rel 48.47%, abs 14.52ms fixed-hash-0.7.0
expand_crate rel 48.40%, abs 6.16ms pin-utils-0.1.0
expand_crate rel 48.34%, abs 6.21ms tinyvec_macros-0.1.0
expand_crate rel 48.23%, abs 10.45ms crossbeam-0.8.1
expand_crate rel 47.68%, abs 7.28ms cpufeatures-0.2.1
expand_crate rel 46.52%, abs 240.79ms pest_generator-2.1.3
expand_crate rel 46.31%, abs 18.75ms term_size-1.0.0-beta1
expand_crate rel 45.75%, abs 8.37ms miow-0.4.0
typeck rel 45.58%, abs 479.33ms vte-0.10.1
expand_crate rel 44.70%, abs 8.46ms wincolor-1.0.3
expand_crate rel 44.51%, abs 65.65ms enum-as-inner-0.3.3
expand_crate rel 44.50%, abs 490.77ms pear-0.2.3
expand_crate rel 44.43%, abs 8.63ms winapi-util-0.1.5
```
Slowest passes overall, weighted by percentages.
```
77399.4 counts (weighted fractional, erased)
( 1) 13800.9 (17.8%, 17.8%): typeck
( 2) 13736.9 (17.7%, 35.6%): expand_crate
( 3) 7133.2 ( 9.2%, 44.8%): mir_borrowck
( 4) 2454.5 ( 3.2%, 48.0%): evaluate_obligation
( 5) 2191.4 ( 2.8%, 50.8%): free_global_ctxt
( 6) 2120.5 ( 2.7%, 53.5%): metadata_register_crate
( 7) 2114.5 ( 2.7%, 56.3%): metadata_decode_entry_impl_trait_ref
( 8) 2072.1 ( 2.7%, 58.9%): hir_lowering
( 9) 1733.5 ( 2.2%, 61.2%): mir_built
( 10) 1692.4 ( 2.2%, 63.4%): specialization_graph_of
( 11) 1549.9 ( 2.0%, 65.4%): late_resolve_crate
( 12) 1473.5 ( 1.9%, 67.3%): parse_crate
( 13) 1214.3 ( 1.6%, 68.8%): type_op_prove_predicate
( 14) 979.1 ( 1.3%, 70.1%): check_impl_item_well_formed
( 15) 905.9 ( 1.2%, 71.3%): check_item_well_formed
( 16) 822.8 ( 1.1%, 72.3%): param_env
( 17) 667.8 ( 0.9%, 73.2%): generate_crate_metadata
( 18) 655.9 ( 0.8%, 74.1%): thir_body
( 19) 615.8 ( 0.8%, 74.9%): check_mod_item_types
( 20) 580.2 ( 0.7%, 75.6%): metadata_decode_entry_type_of
```
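
One plausible reading of "weighted by percentages" is that each crate contributes its per-query relative percentage, and these are summed per query across all crates. That is an assumption about how the table was built, but it fits the total: 77399.4 spread over the 777 projects mentioned earlier is about 99.6 per crate. A sketch of that aggregation, with illustrative names:

```rust
use std::collections::HashMap;

/// Sums each query's relative percentage across all crates and returns
/// the totals sorted from heaviest to lightest.
/// Each input tuple is (crate name, query name, relative percentage).
fn weighted_totals(rows: &[(&str, &str, f64)]) -> Vec<(String, f64)> {
    let mut totals: HashMap<&str, f64> = HashMap::new();
    for &(_krate, query, rel) in rows {
        *totals.entry(query).or_insert(0.0) += rel;
    }
    let mut sorted: Vec<(String, f64)> = totals
        .into_iter()
        .map(|(query, total)| (query.to_string(), total))
        .collect();
    // Percentages are finite, so `partial_cmp` cannot fail here.
    sorted.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    sorted
}
```

Under this weighting every crate counts equally regardless of its absolute compile time, which is why a query that dominates many tiny crates (such as `metadata_register_crate`) can rank highly.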
## round-15-self-profile-debug
The heaviest relative queries seen.
```
expand_crate rel 73.81%, abs 40.93ms web-sys-0.3.56-Debug-Full.txt
run_linker rel 68.10%, abs 152.87ms block-cipher-0.99.99-Debug-Full.txt
run_linker rel 67.83%, abs 141.60ms stream-cipher-0.99.99-Debug-Full.txt
run_linker rel 63.27%, abs 856.92ms wasm-bindgen-macro-0.2.79-Debug-Full.txt
run_linker rel 58.74%, abs 713.84ms pest_derive-2.1.0-Debug-Full.txt
run_linker rel 55.58%, abs 898.85ms darling_macro-0.13.1-Debug-Full.txt
metadata_register_crate rel 54.79%, abs 23.41ms jsonrpc-core-client-18.0.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 54.76%, abs 423.21ms rpassword-5.0.1-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 54.30%, abs 405.03ms color_quant-1.1.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 53.62%, abs 514.55ms predicates-tree-1.0.5-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 53.24%, abs 497.02ms slog-scope-4.4.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 53.21%, abs 579.14ms dirs-sys-next-0.1.2-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 52.92%, abs 556.50ms diff-0.1.12-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 52.65%, abs 603.78ms pem-1.0.2-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 52.47%, abs 488.49ms log-mdc-0.1.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 52.39%, abs 375.82ms heck-0.4.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 52.00%, abs 222.41ms shlex-1.1.0-Debug-Full.txt
run_linker rel 51.17%, abs 993.63ms pyo3-macros-0.15.1-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 51.12%, abs 588.33ms jobserver-0.1.24-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 50.60%, abs 612.43ms strsim-0.10.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 50.43%, abs 615.94ms convert_case-0.5.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 50.33%, abs 410.43ms tokio-tcp-0.2.0-alpha.1-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 50.13%, abs 591.25ms dotenv-0.15.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 50.11%, abs 643.20ms simplelog-0.11.2-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.82%, abs 451.52ms polling-2.2.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.72%, abs 611.30ms threadpool-1.8.1-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.64%, abs 233.63ms shell-escape-0.1.5-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.51%, abs 527.52ms proc-macro-crate-1.1.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.48%, abs 645.35ms stringprep-0.1.2-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.48%, abs 571.53ms futures-timer-3.0.2-Debug-Full.txt
```
Slowest passes overall, weighted by percentages.
```
77694.1 counts (weighted fractional, erased)
( 1) 21595.5 (27.8%, 27.8%): LLVM_module_codegen_emit_obj
( 2) 8452.2 (10.9%, 38.7%): LLVM_passes
( 3) 5659.6 ( 7.3%, 46.0%): expand_crate
( 4) 4990.5 ( 6.4%, 52.4%): typeck
( 5) 4766.1 ( 6.1%, 58.5%): codegen_module
( 6) 2562.4 ( 3.3%, 61.8%): mir_borrowck
( 7) 1740.9 ( 2.2%, 64.1%): finish_ongoing_codegen
( 8) 1651.1 ( 2.1%, 66.2%): run_linker
( 9) 1407.6 ( 1.8%, 68.0%): LLVM_module_optimize
( 10) 1232.0 ( 1.6%, 69.6%): free_global_ctxt
( 11) 1130.4 ( 1.5%, 71.0%): LLVM_module_codegen
( 12) 1113.2 ( 1.4%, 72.5%): evaluate_obligation
( 13) 1057.2 ( 1.4%, 73.8%): metadata_register_crate
( 14) 894.0 ( 1.2%, 75.0%): metadata_decode_entry_impl_trait_ref
( 15) 870.6 ( 1.1%, 76.1%): hir_lowering
( 16) 816.2 ( 1.1%, 77.1%): mir_drops_elaborated_and_const_checked
( 17) 794.9 ( 1.0%, 78.2%): specialization_graph_of
( 18) 720.7 ( 0.9%, 79.1%): parse_crate
( 19) 676.3 ( 0.9%, 80.0%): optimized_mir
( 20) 643.4 ( 0.8%, 80.8%): mir_built
```
## round-16-self-profile-opt
The heaviest relative queries seen.
```
expand_crate rel 65.50%, abs 43.94ms web-sys-0.3.56-Opt-Full.txt
run_linker rel 62.97%, abs 167.55ms block-cipher-0.99.99-Opt-Full.txt
run_linker rel 56.23%, abs 147.61ms stream-cipher-0.99.99-Opt-Full.txt
specialization_graph_of rel 49.22%, abs 13.32s nalgebra-0.30.1-Opt-Full.txt
LLVM_module_optimize rel 41.94%, abs 994.78ms rustc-demangle-0.1.21-Opt-Full.txt
metadata_register_crate rel 41.40%, abs 21.23ms impl-codec-0.5.1-Opt-Full.txt
metadata_register_crate rel 39.58%, abs 16.02ms jsonrpc-core-client-18.0.0-Opt-Full.txt
LLVM_module_optimize rel 38.36%, abs 459.10ms rpassword-5.0.1-Opt-Full.txt
LLVM_module_optimize rel 38.27%, abs 662.36ms slog-scope-4.4.0-Opt-Full.txt
LLVM_module_optimize rel 38.05%, abs 1.83s async-process-1.3.0-Opt-Full.txt
LLVM_module_optimize rel 37.84%, abs 1.26s textwrap-0.14.2-Opt-Full.txt
LLVM_module_optimize rel 36.96%, abs 614.37ms serial_test-0.5.1-Opt-Full.txt
LLVM_module_optimize rel 36.58%, abs 675.59ms log-mdc-0.1.0-Opt-Full.txt
LLVM_module_optimize rel 36.53%, abs 623.55ms actix-threadpool-0.3.3-Opt-Full.txt
LLVM_module_optimize rel 36.51%, abs 679.71ms futures-executor-0.3.19-Opt-Full.txt
LLVM_module_optimize rel 36.48%, abs 752.50ms blocking-1.1.0-Opt-Full.txt
LLVM_module_optimize rel 36.25%, abs 1.23s version_check-0.9.4-Opt-Full.txt
LLVM_module_optimize rel 36.25%, abs 764.37ms dirs-sys-0.3.6-Opt-Full.txt
expand_crate rel 36.21%, abs 10.49ms crossbeam-0.8.1-Opt-Full.txt
LLVM_module_optimize rel 36.12%, abs 2.41s async-global-executor-2.0.2-Opt-Full.txt
LLVM_module_optimize rel 35.89%, abs 586.79ms tokio-udp-0.2.0-alpha.1-Opt-Full.txt
LLVM_module_optimize rel 35.59%, abs 741.72ms tokio-tcp-0.2.0-alpha.1-Opt-Full.txt
LLVM_module_optimize rel 35.59%, abs 1.11s rusty-fork-0.3.0-Opt-Full.txt
LLVM_module_optimize rel 35.58%, abs 534.42ms wasm-bindgen-futures-0.4.29-Opt-Full.txt
LLVM_module_optimize rel 35.52%, abs 1.08s os_info-3.1.0-Opt-Full.txt
LLVM_module_optimize rel 35.45%, abs 1.25s dotenv-0.15.0-Opt-Full.txt
LLVM_module_optimize rel 35.37%, abs 267.58ms crypto-hash-0.3.4-Opt-Full.txt
LLVM_module_optimize rel 34.98%, abs 447.28ms hyper-rustls-0.23.0-Opt-Full.txt
typeck rel 34.95%, abs 518.49ms vte-0.10.1-Opt-Full.txt
LLVM_module_optimize rel 34.93%, abs 1.57s tokio-signal-0.3.0-alpha.1-Opt-Full.txt
```
Slowest passes overall, weighted by percentages.
```
77768.4 counts (weighted fractional, erased)
( 1) 13851.8 (17.8%, 17.8%): LLVM_module_optimize
( 2) 10520.5 (13.5%, 31.3%): LLVM_passes
( 3) 8220.3 (10.6%, 41.9%): LLVM_module_codegen_emit_obj
( 4) 8131.6 (10.5%, 52.4%): finish_ongoing_codegen
( 5) 7983.4 (10.3%, 62.6%): LLVM_lto_optimize
( 6) 3890.3 ( 5.0%, 67.6%): expand_crate
( 7) 3213.1 ( 4.1%, 71.8%): typeck
( 8) 1628.8 ( 2.1%, 73.9%): mir_borrowck
( 9) 1405.6 ( 1.8%, 75.7%): codegen_module
( 10) 1002.9 ( 1.3%, 77.0%): codegen_module_optimize
( 11) 936.9 ( 1.2%, 78.2%): LLVM_thin_lto_import
( 12) 894.2 ( 1.1%, 79.3%): free_global_ctxt
( 13) 802.3 ( 1.0%, 80.3%): evaluate_obligation
( 14) 768.9 ( 1.0%, 81.3%): metadata_register_crate
( 15) 604.3 ( 0.8%, 82.1%): metadata_decode_entry_impl_trait_ref
( 16) 601.3 ( 0.8%, 82.9%): codegen_module_perform_lto
( 17) 594.2 ( 0.8%, 83.6%): hir_lowering
( 18) 558.6 ( 0.7%, 84.4%): parse_crate
( 19) 544.0 ( 0.7%, 85.1%): specialization_graph_of
( 20) 512.3 ( 0.7%, 85.7%): mir_drops_elaborated_and_const_checked
```
## round-17-time-passes-check
**Executive summary**
- Crate expansion and type checking are the passes that increase memory usage the most.
----
`-Ztime-passes` gives both time and RSS (absolute and change) for each pass. Self-profiling covers time, so I'll just analyze the change in RSS for each stage. I don't entirely trust the RSS numbers produced by `-Ztime-passes`; they sometimes seem wonky, but here goes.
Weighted RSS changes. Note that the totals aren't that meaningful; what matters is the percentages.
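As an illustration of pulling the RSS change out of `-Ztime-passes` output, here is a minimal sketch. It assumes lines shaped like `time:   0.012; rss:   45MB ->   52MB (   +7MB) parse_crate`, with the pass name after a tab. The exact format varies between rustc versions, so treat the regex as an assumption to adjust. The sample lines are made up.

```python
import re

# Assumed line shape; adjust the pattern to match the rustc version in use.
LINE_RE = re.compile(
    r"time:\s*(?P<secs>[\d.]+);\s*"
    r"rss:\s*(?P<start>\d+)MB\s*->\s*(?P<end>\d+)MB\s*"
    r"\(\s*(?P<delta>[+-]?\d+)MB\)\s*"
    r"(?P<name>\S+)"
)

def parse_rss_changes(text):
    """Return a list of (pass_name, rss_delta_mb) for every matching line."""
    changes = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            changes.append((m.group("name"), int(m.group("delta"))))
    return changes

# Made-up sample output, for illustration only.
sample = """\
time:   0.012; rss:   45MB ->   52MB (   +7MB)\tparse_crate
time:   0.350; rss:   52MB ->   98MB (  +46MB)\texpand_crate
time:   0.020; rss:  140MB ->  101MB (  -39MB)\tfree_global_ctxt
"""

changes = parse_rss_changes(sample)
for name, delta in changes:
    print(f"{delta:+5d}MB  {name}")
```

Lines that do not match the assumed shape are skipped silently, so a format change shows up as missing passes rather than an error.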
```
184746.0 counts (weighted fractional, erased)
( 1) 48259.0 (26.1%, 26.1%): total
( 2) -36664.0 (-19.8%, 6.3%): free_global_ctxt
( 3) 33765.0 (18.3%, 24.6%): configure_and_expand
( 4) 29491.0 (16.0%, 40.5%): macro_expand_crate
( 5) 29448.0 (15.9%, 56.5%): expand_crate
( 6) 27825.0 (15.1%, 71.5%): type_check_crate
( 7) 11533.0 ( 6.2%, 77.8%): coherence_checking
( 8) 7046.0 ( 3.8%, 81.6%): item_bodies_checking
( 9) 6986.0 ( 3.8%, 85.4%): MIR_borrow_checking
( 10) 4205.0 ( 2.3%, 87.6%): type_collecting
( 11) 3477.0 ( 1.9%, 89.5%): hir_lowering
( 12) 3060.0 ( 1.7%, 91.2%): wf_checking
( 13) 2704.0 ( 1.5%, 92.6%): resolve_crate
( 14) 2451.0 ( 1.3%, 94.0%): late_resolve_crate
( 15) 1770.0 ( 1.0%, 94.9%): parse_crate
( 16) 1751.0 ( 0.9%, 95.9%): item_types_checking
( 17) 1358.0 ( 0.7%, 96.6%): misc_checking_1
( 18) 1329.0 ( 0.7%, 97.3%): generate_crate_metadata
( 19) 1141.0 ( 0.6%, 97.9%): misc_checking_3
( 20) 830.0 ( 0.4%, 98.4%): lint_checking
```
I don't think the `total` number is meaningful. `macro_expand_crate` and `expand_crate` are almost always identical. I'm not sure what to make of that; it seems suspicious.
## round-18-time-passes-debug
```
274400.0 counts (weighted fractional, erased)
( 1) 75783.0 (27.6%, 27.6%): total
( 2) -40315.0 (-14.7%, 12.9%): free_global_ctxt
( 3) 33988.0 (12.4%, 25.3%): configure_and_expand
( 4) 29827.0 (10.9%, 36.2%): macro_expand_crate
( 5) 29774.0 (10.9%, 47.0%): expand_crate
( 6) 28294.0 (10.3%, 57.3%): type_check_crate
( 7) 24573.0 ( 9.0%, 66.3%): codegen_crate
( 8) 23209.0 ( 8.5%, 74.8%): codegen_to_LLVM_IR
( 9) 11883.0 ( 4.3%, 79.1%): coherence_checking
( 10) 7118.0 ( 2.6%, 81.7%): item_bodies_checking
( 11) 7029.0 ( 2.6%, 84.2%): generate_crate_metadata
( 12) 7028.0 ( 2.6%, 86.8%): MIR_borrow_checking
( 13) 5132.0 ( 1.9%, 88.7%): monomorphization_collector_graph_walk
( 14) 4240.0 ( 1.5%, 90.2%): type_collecting
( 15) 3472.0 ( 1.3%, 91.5%): hir_lowering
( 16) 3114.0 ( 1.1%, 92.6%): wf_checking
( 17) 2732.0 ( 1.0%, 93.6%): resolve_crate
( 18) 2506.0 ( 0.9%, 94.5%): late_resolve_crate
( 19) 1731.0 ( 0.6%, 95.2%): item_types_checking
( 20) 1710.0 ( 0.6%, 95.8%): parse_crate
```
Numbers for front-end passes are similar to `round-17`, as expected. Codegen passes add some extra memory use, unsurprisingly.
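To put a number on "similar", the sketch below compares the weighted RSS changes of a few front-end passes, with values copied from the `round-17` and `round-18` tables above. The 5% threshold is an arbitrary choice for illustration, and the comparison assumes both rounds cover a comparable set of crates.

```python
# Weighted RSS changes copied from the round-17 (check) and round-18 (debug) tables.
round_17 = {
    "configure_and_expand": 33765.0,
    "macro_expand_crate": 29491.0,
    "expand_crate": 29448.0,
    "type_check_crate": 27825.0,
    "coherence_checking": 11533.0,
    "MIR_borrow_checking": 6986.0,
    "hir_lowering": 3477.0,
}
round_18 = {
    "configure_and_expand": 33988.0,
    "macro_expand_crate": 29827.0,
    "expand_crate": 29774.0,
    "type_check_crate": 28294.0,
    "coherence_checking": 11883.0,
    "MIR_borrow_checking": 7028.0,
    "hir_lowering": 3472.0,
}

def relative_change(before, after):
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before

changes = {name: relative_change(round_17[name], round_18[name]) for name in round_17}
for name, pct in changes.items():
    print(f"{name:24} {pct:+.2f}%")

# Arbitrary illustrative threshold for "similar".
THRESHOLD_PERCENT = 5.0
all_similar = all(abs(pct) < THRESHOLD_PERCENT for pct in changes.values())
print("all within threshold:", all_similar)
```

Among the passes listed here, the largest shift is `coherence_checking` at about +3%.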
## round-19-time-passes-opt
```
434209.0 counts (weighted fractional, erased)
( 1) 111637.0 (25.7%, 25.7%): LLVM_lto_optimize(*-cgu.N)
( 2) 94078.0 (21.7%, 47.4%): total
( 3) -39819.0 (-9.2%, 38.2%): free_global_ctxt
( 4) 33939.0 ( 7.8%, 46.0%): configure_and_expand
( 5) 30996.0 ( 7.1%, 53.2%): codegen_crate
( 6) 29712.0 ( 6.8%, 60.0%): macro_expand_crate
( 7) 29634.0 ( 6.8%, 66.8%): expand_crate
( 8) 28418.0 ( 6.5%, 73.4%): type_check_crate
( 9) 24642.0 ( 5.7%, 79.0%): codegen_to_LLVM_IR
( 10) 17904.0 ( 4.1%, 83.2%): finish_ongoing_codegen
( 11) 15878.0 ( 3.7%, 86.8%): link
( 12) 12012.0 ( 2.8%, 89.6%): coherence_checking
( 13) 7087.0 ( 1.6%, 91.2%): item_bodies_checking
( 14) 6953.0 ( 1.6%, 92.8%): MIR_borrow_checking
( 15) 5477.0 ( 1.3%, 94.1%): monomorphization_collector_graph_walk
( 16) 4179.0 ( 1.0%, 95.1%): type_collecting
( 17) 3479.0 ( 0.8%, 95.9%): hir_lowering
( 18) 3115.0 ( 0.7%, 96.6%): wf_checking
( 19) 2737.0 ( 0.6%, 97.2%): resolve_crate
( 20) 2513.0 ( 0.6%, 97.8%): generate_crate_metadata
```
`LLVM_lto_optimize` is the most memory-hungry pass, in general.