---
tags: perf.rlo, rustc
---
# perf.rlo
## What are the goals of this meeting?
* Use the meeting time itself to review the site and/or establish goals?
* What action items can we establish?
* Who are the stakeholders?
* Do we think we have enough of the right people present here to assign action items usefully?
* Do we think we have enough of the right people present here to make decisions?
## Topics
* Who are the groups of people that consume this data, and what does each of these groups want to see when visiting the site?
* Compiler team: curious as to the general trajectory of performance
* Those actively working on performance (perf team or motivated contributor)
* rustc dev tracking performance impact of their changes
* Curious non-dev person looking to see whether rustc is fast enough
* Curious person who uses Rust
* Who should establish the set of benchmarks?
* How often should the set change?
* How can we be confident this captures a reasonable slice of "representative" Rust programs?
* Do we need to gather expertise in data science?
* Is there research-worthy material here? simulacrum thinks we are running up against open research problems (or at least problems without well-known solutions)
* Changes/enhancements to discuss
* How to present the metrics we gather today
* Do the current defaults reflect what matters?
* Cannot currently visualize more than one metric at once.
* Should we change how the metrics are stored?
* What metrics to gather in the future
* Disk space usage is an obvious one we need
* pnkfelix thinks we also need more data on cache misses (closer to real-world impact than instruction counts); see the sketch after this list
* (Note that hardware perf counters have architectural limits on how many events can be counted at once)
* How do we track cross platform performance?
* What automation could be added to ease existing workflows (e.g., the triage report)?
* "Competitors" exist
* e.g. arewefastyet.rs
* should we embrace these (e.g., actively advertise them) and/or leverage them as staging grounds for ideas?
* That is, ideas w.r.t. what to benchmark, what metrics to gather, and how to present them.
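To make the cache-miss item above a bit more concrete, here is a minimal sketch of how such counters could be collected by shelling out to Linux `perf stat`. This is not existing perf.rlo code; the event names, the CSV field layout, and the `hello.rs` input are illustrative assumptions. Note that asking for more events than the CPU has hardware counters makes the kernel multiplex them, which is the architectural limit mentioned above.

```rust
// Hypothetical sketch (not perf.rlo code): collect instruction and cache-miss
// counts for a single rustc invocation via Linux `perf stat`.
use std::process::Command;

fn main() -> std::io::Result<()> {
    // `-x ,` asks perf for machine-readable CSV output; the counter table is
    // written to stderr. `hello.rs` is a stand-in for a real benchmark.
    let output = Command::new("perf")
        .args([
            "stat",
            "-x", ",",
            "-e", "instructions,cache-references,cache-misses",
            "rustc", "--edition", "2021", "hello.rs",
        ])
        .output()?;

    let stats = String::from_utf8_lossy(&output.stderr);
    for line in stats.lines() {
        // CSV fields (roughly): value, unit, event name, run time, percent enabled, ...
        // The "percent enabled" field drops below 100 when events are multiplexed.
        let fields: Vec<&str> = line.split(',').collect();
        if fields.len() >= 3 {
            println!("{:<20} {}", fields[2], fields[0]);
        }
    }
    Ok(())
}
```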
# Collected pain points
* Not clear which benchmarks matter. Am I supposed to weight them all equally? (Does the summary weight them all equally?)
* Noise is frequently present, particularly on some benchmarks, which makes it easy to essentially ignore those benchmarks.
* The compare-mode presentation shows, for each benchmark, one "full" build and then several incremental variants. This might be leading to over-emphasis on incremental performance.
* How do I tell which changes are likely noise vs. real gains and losses in performance? (One possible heuristic is sketched after this list.)
* Too much data: it is hard to boil it down to a single view of "how is the compiler doing?". The dashboard helps here, but is it representative?
* Responsibility for regressions is unclear: who tracks down the cause, fixes it, and decides whether to revert? (Is it the author? The performance triager?)
* Regressions introduced in a rollup have unclear provenance (which PR in the rollup caused them?)
* Hard to get a good overview of the performance of a particular diff across many metric types (e.g., wall time, CPU instructions, RSS)
* Who decides how much regression is "too much"?
* The detailed comparison view often does not correlate with the summarized comparison view for small regressions/speedups.
* Making decisions based on the compare view often seems fairly arbitrary ("eh, some minor regressions, some minor improvements, seems good enough" :shrug:)
* We don't have disk space as a metric. :)
* "well known" that some components are causing major increases in timing (e.g., simd in core/std), but unclear if we can say "stop adding intrinsics" or how to evaluate the tradeoff between features and performance. particularly performance of rustc developers vs end users, too
* The impact of cgu partitioning is not controlled for, while introducing systematic bias in measurements.
* overwhelming for newcomers on how to understand what they're looking at
* no measurement of runtime performance of generated code (kind of orthogonal, kind of not)
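Regarding the noise-vs-real-change pain point above, here is a minimal sketch of one possible heuristic: flag a delta as significant only if it falls well outside the spread of that benchmark's recent historical deltas. This is not perf.rlo's actual significance logic; the threshold `k` and the sample data are made up for illustration.

```rust
// Hypothetical heuristic (not perf.rlo's algorithm): a new percentage delta is
// "significant" only if it is more than `k` standard deviations away from the
// mean of recent historical deltas for the same benchmark.
fn is_significant(historical_deltas: &[f64], new_delta: f64, k: f64) -> bool {
    let n = historical_deltas.len() as f64;
    let mean = historical_deltas.iter().sum::<f64>() / n;
    let variance = historical_deltas
        .iter()
        .map(|d| (d - mean).powi(2))
        .sum::<f64>()
        / n;
    (new_delta - mean).abs() > k * variance.sqrt()
}

fn main() {
    // A noisy benchmark routinely swings by +/- 1.5%, so a 1% "regression"
    // is probably noise...
    let noisy = [0.8, -1.2, 1.5, -0.9, 1.1, -1.4];
    println!("noisy benchmark, +1.0%: {}", is_significant(&noisy, 1.0, 3.0));

    // ...while the same 1% change on a quiet benchmark is likely real.
    let quiet = [0.05, -0.1, 0.08, -0.04, 0.06, -0.07];
    println!("quiet benchmark, +1.0%: {}", is_significant(&quiet, 1.0, 3.0));
}
```

Any heuristic like this needs enough per-benchmark, per-metric history to estimate the noise floor, which ties back into the question above about how metrics are stored.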
## Understanding the issue
With a focus on rustc devs who use perf.rlo to gauge the performance impact of their changes to the compiler, it's important to understand a typical workflow:
* Dev introduces change
* Dev or a reviewer kicks off a perf run to see what the impact is
* TODO FINISH