# Analysis of `rustc-benchmarking-data`

lqd gathered a lot of data in the `rustc-benchmarking-data` repository. This document is nnethercote's analysis of it (with a few additional comments from others). It is long, detailed, and quite dry. It is aimed at Rust compiler developers, and not intended for a general audience. It is also not the highest-quality prose, in part because it is likely to become out of date in the not-too-distant future as performance work addresses the things this measurement and analysis have identified.

See the roadmap for a higher-level view of rustc perf work for 2022.

As well as an analysis, this document serves as a means of tracking who is doing/has done what work. Task assignments are shown in square brackets, e.g. "[name]".
## round-1-cachegrind-check

### Executive summary
- `parse_tt` and other functions related to macro parsing are the hottest, and correlate highly with allocations. [nnethercote, this blog post has details]
- `memcpy` is high in functions using BitSets a lot for dataflow analysis, e.g. in `http-0.2.6`. [nnethercote, #93984]
- `SetImpliedBits` function getting target feature information.
- Hot functions in a single crate:
  - `deunicode-1.3.1` dominated by `core::ascii::escape_default` [martingms, #94776]
  - `tinyvec-1.5.1` dominated by `<rustc_mir_build::build::Builder>::diverge_cleanup` [nnethercote: not worth fixing within rustc, but there are several possible fixes within `tinyvec` itself. See #161 for details.]
  - `unicode-normalization-0.1.19` dominated by `try_eval_bits` [nnethercote, #97936]

### Widely used functions
This shows all functions across all benchmarks, weighted by their `Ir` (Cachegrind instruction-read) percentage. This demonstrates breadth of usage. I've excluded malloc, memcpy, and dlopen/elf stuff, which made up lots of slots.

This table is hard to read, but metadata decoding dominates because of its effect on small crates. The next section ("Hot functions in multiple crates") breaks hot functions down more and is probably more useful.
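For concreteness, the weighting described above can be sketched as follows. This is my reading of the methodology, not the actual analysis script: sum each function's `Ir` percentage over all benchmarks, so a function that is moderately hot everywhere ranks above one that is very hot in a single benchmark.

```rust
use std::collections::HashMap;

// Sketch (an assumption about the methodology, not the real script):
// aggregate per-benchmark Ir percentages per function, then rank by total.
fn weight_functions(per_benchmark: &[Vec<(&str, f64)>]) -> Vec<(String, f64)> {
    let mut totals: HashMap<&str, f64> = HashMap::new();
    for bench in per_benchmark {
        for &(func, ir_pct) in bench {
            *totals.entry(func).or_insert(0.0) += ir_pct;
        }
    }
    let mut ranked: Vec<(String, f64)> =
        totals.into_iter().map(|(f, p)| (f.to_string(), p)).collect();
    // Sort hottest-first by summed percentage.
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}

fn main() {
    let benches = vec![
        vec![("parse_tt", 5.0), ("decode_entry", 1.0)],
        vec![("decode_entry", 4.5)],
    ];
    // decode_entry totals 5.5, parse_tt 5.0, so decode_entry ranks first.
    println!("{:?}", weight_functions(&benches));
}
```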
### Hot functions in multiple crates
This section lists all the functions that hit 1.5% or higher in one benchmark and appear in more than one benchmark. It's a long list. Related functions (i.e. functions that are hot in tandem) are grouped together.
[nnethercote, mostly related to macro parsing, greatly improved, see here for details]
This table undersells the cost of allocations a lot, because it only shows `_int_free` results. But it also oversells a little in a different way, because jemalloc is more efficient than glibc malloc (which is measured here). We can probably assume allocations in general account for double the percentages in this table. See the DHAT results for more data.

Note that these crates correlate highly with the crates where `parse_tt` and related functions are hot. [nnethercote, greatly improved, see here for details]
These are all the macro parsing functions.
[nnethercote, #93984, completed, addresses the biggest of these: keccak, http, vte]
The high numbers for `keccak` and `http-0.2.6` are due to large bitsets in borrowck dataflow analysis. Note that `keccak-0.1.0` has some significant changes vs. `keccak` in rustc-benchmarks.
[Hard to improve. On x86-64 we query ~50 target feature flags, for things like SSE*, AVX*, etc. This is within `target_features` in `compiler/rustc_codegen_llvm/src/llvm_util.rs`. We check one flag at a time because the LLVM interface makes it hard to do otherwise, and LLVM is moderately slow to check each one. Even though it's a significant fraction of execution time for small programs, the absolute time is low, so it doesn't seem worth any further effort.]

This is significant only for very small crates. It's getting some target feature information from LLVM.
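The access pattern described above amounts to one LLVM round-trip per candidate feature. A simplified sketch, where `llvm_has_feature` is a hypothetical stand-in for the real per-feature FFI query (the actual code lives in `compiler/rustc_codegen_llvm/src/llvm_util.rs`):

```rust
// Hypothetical stand-in for the per-feature LLVM query; each call to the
// real thing crosses the FFI boundary, which is what makes the loop slow.
fn llvm_has_feature(feature: &str) -> bool {
    matches!(feature, "sse2" | "avx")
}

// One query per candidate: ~50 round-trips on x86-64.
fn supported_target_features<'a>(candidates: &[&'a str]) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|f| llvm_has_feature(f))
        .collect()
}

fn main() {
    let features = supported_target_features(&["sse2", "avx", "avx512f"]);
    println!("{:?}", features); // prints ["sse2", "avx"] with the stand-in above
}
```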
[nnethercote, #97575 fixes it]
Metadata decoding. High relative numbers for many, but mostly on very short-running crates, with constant amounts of decoding, presumably for decoding common libs like `std` and `core`.

[nnethercote, #94316]
[nnethercote, #98153]
[lcnr + nnethercote, #97345]
[This code was heavily optimised a couple of years ago for rustc-perf benchmarks like `keccak` and `inflate`, and further improvements are difficult. #97674 has some small improvements.]

A few crates are over-represented: `wast-39.0.0`, `rustc-serialize-0.3.24`, `wasmparser-0.82.0`, `inflate-0.4.5`.

[This is caused by lots of type folding and interning, very hard to improve.]
[nnethercote, #96210 + #96683]
[nnethercote, #93984]
[lcnr + nnethercote, #97345]
[lcnr + nnethercote, #97345]
[lcnr + nnethercote, #97345]
## round-2-llvm-lines-leaf-crate

### Executive summary
The top functions in `std`, `alloc` and `core`, as weighted by "Lines" counts. (The percentages here are more useful as a relative measure than an absolute measure.)

`grow_amortized` has been heavily optimized in the past, and the other top functions are generally very small and hard to improve upon.

One possibility: `map_fold`: just inline and remove it? Most affected: actix-router, quote, diesel_derives, bytecount. [nnethercote, #94442, didn't help]

## round-3-dhat
### Executive summary
- `parse_tt` and related macro parsing functions cause by far the most allocations, and correlate highly with the Cachegrind results. [nnethercote, this blog post has details]
- `match_impl`, `super_relate_tys`, ena snapshot vecs, `escape_default`, `ModChild`, thir `mirror_expr_inner`, etc. [lcnr + nnethercote, #97345, deals with `match_impl`/`super_relate_tys`; nnethercote, #98569, deals with `ModChild`; not much other scope for easy improvements]

Top 20 malloc users, from Cachegrind, and the biggest sources of allocations as determined by looking at the DHAT profiles.
Hottest program points (PPs) by allocation rate (blocks). This isn't a perfect metric, because sometimes multiple distinct PPs are best considered in combination, which requires human understanding of the stack traces. But it's a good start, and while the Cachegrind numbers can be high when lots of allocations are spread across lots of places, single PPs that allocate a lot are more likely to be optimizable.
Ones marked with `**` are not in the top 20 Cachegrind list above.

Macro parsing dominates here, again. `match_impl` also shows up.

Hottest program points (PPs) by allocation rate (bytes). Excludes tiny crates dominated by metadata decoding, which all have a `bytes` value in the range 0.9-2.0MB, mostly around 1.4MB. The rightmost column indicates the hot allocation causes.

Ones marked with `**` are not in the top 20 Cachegrind list above.

A much wider range of results here.
Hottest program points (PPs) by peak memory usage. The rightmost column indicates the hot allocation causes.
BitSets are again common. `DroplessArena` ones are difficult to action because that covers many different types. Otherwise, fairly spread out.

Highest peak memory usage, absolute.

Other than `http-0.2.6`, which is dominated by BitSets, these are not all that interesting. No particularly hot allocation sites, a pretty similar mix, with higher ones tending to be `DroplessArena`, MIR CFG building, metadata encoding, etc.

## round-4-line-counts
Biggest crates.
Smallest crates.
Not much to analyze here.
## round-5-cachegrind-debug
This is similar to round-1-cachegrind-check, but with additional LLVM costs, which aren't very interesting to analyze here.
## round-6-llvm-lines-project
This gave very similar results to round-2-llvm-lines-leaf-crate, so I haven't analyzed it.
## round-7-cargo-timing-check-j1
The use of `-j1` forces codegen to be non-parallel, which makes these results non-representative. See the `-j8` results instead.

## round-8-cargo-timing-debug-j1
The use of `-j1` forces codegen to be non-parallel, which makes these results non-representative. See the `-j8` results instead.

## round-9-cargo-timing-opt-j1
The use of `-j1` forces codegen to be non-parallel, which makes these results non-representative. See the `-j8` results instead.

## round-10-cachegrind-opt
This is similar to round-1-cachegrind-check, but with additional LLVM costs, which aren't very interesting to analyze here.
## round-13-cargo-timing-opt-j8
Most expensive crates. The counts are seconds of compile time, e.g. `syn` accounted for 643.5 seconds of compile time, which is 8.1% of the total. (There is of course overlap in crate compilation, so this doesn't say much about the critical path.)

`syn`/`quote`/`proc-macro2` (and their build scripts) are the most frequent.

Very surprising to see so many build scripts in there! Definitely worth investigation.
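For scale, the two quoted figures imply a dataset-wide total of roughly 643.5 / 0.081 ≈ 7944 seconds of compile time. This back-of-envelope figure is my own, not from the source data:

```rust
fn main() {
    // syn's share: 643.5 s is 8.1% of the total, so the implied total is:
    let syn_seconds = 643.5_f64;
    let syn_fraction = 0.081;
    let total = syn_seconds / syn_fraction; // ≈ 7944 s
    println!("implied total: {:.0} s (~{:.1} hours)", total, total / 3600.0);
}
```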
Some analysis of build script use-cases (areas where declaratively supporting the feature in cargo would remove the need for the script):

- Detecting the rustc version:
  - `syn`
  - `proc-macro2`
  - `libc`
  - `serde`
  - `log`. This seems more of a convenience than something impossible without a script, though: the crate could likely contain `cfg` expressions matching the same targets (doing so would remove this node from 120 crates' dependency graphs in the dataset) [https://github.com/rust-lang/log/issues/489]
- Detecting targets:
  - `proc-macro2`: e.g. for the wasm target
  - `libc`: e.g. for the FreeBSD target versions
  - `memchr`: e.g. for SIMD
  - `serde`: e.g. for wasm/asm.js, and architectures where libstd supports atomics
  - `futures-core`: e.g. for targets without atomic CAS ops
- Detecting env vars (same as the `target` use-case above, as parsing the `TARGET` env var):
  - `proc-macro2` also checks the `DOCS_RS` env var, likely to control and improve rustdoc output on docs.rs
  - `libc`: for CI to deny warnings, to check if it's a dependency of libstd, and to access cargo feature flags (which are probably equivalent to using `cfg!` expressions in the build script)
- Detecting nightly features (e.g. `proc-macro2`)
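As a concrete illustration of replacing a target-probing build script with declarative `cfg`, here is a hypothetical sketch (not taken from any of the crates above): the `futures-core`-style "no atomic CAS" fallback can often be expressed directly with the built-in `target_has_atomic` predicate, with no script emitting `cargo:rustc-cfg` lines at all.

```rust
// Hypothetical sketch: instead of a build script probing the target and
// emitting `cargo:rustc-cfg=...`, select the implementation declaratively
// with the built-in cfg predicate, removing the script (and its compile
// time) from the dependency graph entirely.

#[cfg(target_has_atomic = "ptr")]
fn atomics_impl() -> &'static str {
    "native atomics"
}

#[cfg(not(target_has_atomic = "ptr"))]
fn atomics_impl() -> &'static str {
    "lock-based fallback"
}

fn main() {
    // On mainstream 64-bit targets this takes the first branch.
    println!("{}", atomics_impl());
}
```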
TODO: also investigate the build script compile times. Some of these scripts are simple (but use various parts of libstd), yet compile slowly (e.g. `syn`'s build script compiles in >400ms in 150 crates). We need to look into whether that's because of opt levels or something else; maybe some simple scripts could be interpreted.

Most popular crates, i.e. how often they are dependencies of other crates.
`libc`, `cfg-if`, `unicode-xid`, and `syn`/`quote`/`proc-macro2`/`unicode-xid` are the most popular.

The biggest projects, i.e. most crates compiled.
219 out of 777 projects contain a single crate, i.e. zero dependencies.
Observations just from looking at some timings graphs.
- The `hyper` crate depends on the `h2` crate, but doesn't start building until `h2` is fully compiled, rather than when `h2`'s metadata is emitted before codegen. Is this necessary? E.g. in `warp-0.3.2`. [lqd, hyper: #2770, complete]
- `syn`, e.g. in `actix-connect-2.0.0`.
- `zstd-sys build script (run)` in `awc-3.0.0-beta.19`. Can we do better with them? Prioritizing some of them earlier in the pipeline could help, thanks to increased parallelism. The same thing used to happen on servo, and I've also seen it on crates depending on `openssl`; it is tracked in this cargo issue. Note: native library builds can also compete for tokens and build in parallel, so moving those earlier can in turn make them build slower because of higher contention and fewer resources.

## round-11-cargo-timing-check-j8
Most expensive crates, same idea as for round-13.
Reasonably similar results to round-13.
## round-12-cargo-timing-debug-j8
Most expensive crates, same idea as for round-13.
Reasonably similar results to round-13.
## round-14-self-profile-check
The heaviest relative queries seen. (More data here.) The `expand_crate` ones have some correlation with the hot macro parsing results seen with Cachegrind and DHAT.

Slowest passes overall, weighted by percentages.
## round-15-self-profile-debug
The heaviest relative queries seen.
Slowest passes overall, weighted by percentages.
## round-16-self-profile-opt
The heaviest relative queries seen.
Slowest passes overall, weighted by percentages.
## round-17-time-passes-check

### Executive summary
`-Ztime-passes` gives both time and RSS (absolute and change) for each pass. Self-profiling covers time, so I'll just analyze the change in RSS for each stage. I don't entirely trust the RSS numbers produced by `-Ztime-passes`; they sometimes seem wonky, but here goes.

Weighted RSS changes. Note that the totals aren't that meaningful; it's about the percentages.
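For reference, the `-Ztime-passes` lines look something like `time:   0.026; rss:   67MB ->   70MB (   +3MB)  parse_crate` (format from memory of recent nightlies; treat it as an assumption). Extracting the per-pass RSS delta, the quantity analyzed in this round, is then a small parsing exercise:

```rust
// Sketch of pulling the RSS delta out of a -Ztime-passes line. The line
// format is an assumption modeled on recent rustc nightlies; adjust the
// parsing if the real output differs.
fn rss_delta_mb(line: &str) -> Option<(i64, String)> {
    // Expect "... (   +3MB)  pass_name" at the end of the line.
    let open = line.rfind('(')?;
    let close = line[open..].find(')')? + open;
    let delta = line[open + 1..close]
        .trim()
        .trim_end_matches("MB")
        .parse::<i64>()
        .ok()?;
    let pass = line[close + 1..].trim().to_string();
    Some((delta, pass))
}

fn main() {
    let line = "time:   0.026; rss:   67MB ->   70MB (   +3MB)  parse_crate";
    println!("{:?}", rss_delta_mb(line)); // prints Some((3, "parse_crate"))
}
```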
I don't think the `total` number is meaningful. `macro_expand_crate` and `expand_crate` are almost always identical; not sure what to make of that, seems suspicious.

## round-18-time-passes-debug
Numbers for front-end passes are similar to round-17, as expected. Codegen passes add some extra memory use, unsurprisingly.

## round-19-time-passes-opt
`LLVM_lto_optimize` is the most memory-hungry pass, in general.