---
tags: perf.rlo, rustc
---

# Real World v Synthesis: Dawn of Benchmarks

The main points pnkfelix wants to address today are:

1. Why we include pathological benchmarks in the test suite. (It is definitely motivated, but it's good to make sure everyone is on the same page about that point.)
2. How to categorize the existing set of benchmarks.

If you want to add a question/topic for the meeting, you can jump to that [section at the end](https://hackmd.io/kOkd1YSNRWesWYogxL4pGw?both#Meeting-QuestionsTopics).

## Background Text

Some summary text adapted from the zulip archives [categorize “real world” vs “pathological” benchmarks?][] and [T-compiler meeting 2021-12-16]:

[categorize “real world” vs “pathological” benchmarks?]: https://zulip-archive.rust-lang.org/stream/247081-t-compiler/performance/topic/categorize.20.E2.80.9Creal.20world.E2.80.9D.20vs.20.E2.80.9Cpathological.E2.80.9D.20benchmarks.3F.html
[T-compiler meeting 2021-12-16]: https://zulip-archive.rust-lang.org/stream/238009-t-compiler/meetings/topic/.5Bweekly.5D.202021-12-16.20.2354818.html#265175488

pnkfelix wants to make sure we avoid over-indexing our efforts based on pathological benchmarks that are solely meant to expose certain kinds of bad scenarios (by multiplying certain patterns of code by factors of 100x or 1000x). When such cases regress by 0.8%, it's not nearly as concerning as when real-world code like diesel or serde regresses by 0.8%.

Note from jyn514 and nagisa: pathological benchmarks that do a lot of a single thing aren't necessarily as pathological as they may seem – real code may do a lot of the same thing (many match arms, deeply nested types, large basic blocks) just as well. E.g. the reason deeply-nested was added was that there was no async-like benchmark before.

wwiser refines the above by noting that some of the pathological benchmarks are meant to catch exponential regressions. For instance, externs used to be quadratic in run time or something. We want to avoid going back to that, but a linear 10% regression wouldn't necessarily be the end of the world if it improved real-world code and was actually only linear.

## Categorization

There are many categorizations one could construct here. Our goal should probably be to have two categories, in the name of simplicity for people looking at [perf.rlo].

[perf.rlo]: https://perf.rust-lang.org/

But for the *meta* conversation we are having now (and presumably that the performance investigations will continue having in the future), we should at least be aware of finer-grain distinctions. Here are some properties, potentially overlapping, that I think it may be useful to define:

* *real-world*: directly reflects a case found in the wild.
* *synthetic*: constructed (or derived) for the purposes of benchmarking. Often in service of *micro-benchmarking*, where it's trying to zero in on one facet of the system. Has two main forms I'm aware of:
  * *distilled*: takes a real-world case and refines it to focus on the detail of interest. E.g. a benchmark that continues to act as an effective witness to some prior regression. (Thus this can have characteristics of real-world *and* synthesis...)
  * *toy*: a program that we do not expect to actually see in practice in the real world. (Note that these are still sometimes useful for *understanding*.)
* *pathological*: takes some pattern of coding and multiplies it. (Note that the real world can yield pathological scenarios.) This can be a method of distillation. But it can also yield toys... (A small generator sketch follows this list.)
* *important*: this labels benchmarks that represent code that is popularly used in the community or that matters to Rust stakeholders. We want to prioritize performance enhancements (and addressing regressions) for these benchmarks, and are potentially willing to take regressions elsewhere to achieve it.
* *outdated*: no longer models a real-world case of interest.
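To illustrate what "multiplying a pattern" means in practice, here is a minimal, purely hypothetical sketch of how one might generate a pathological toy input in the spirit of deep-vector's 135,000-element `vec!` literal (discussed in the concrete examples below). This is not how the rustc-perf benchmarks are actually produced; `generate_pathological_crate` and the shape of its output are invented for illustration.

```rust
// Hypothetical sketch: take one pattern (a vector-literal element) and
// multiply it by a large factor, so the parser and later compiler stages see
// one genuinely huge expression. Not the actual rustc-perf generator.
use std::fmt::Write;

fn generate_pathological_crate(elements: usize) -> String {
    let mut src = String::from("fn main() {\n    let v = vec![\n");
    for _ in 0..elements {
        // Write each element out explicitly rather than using `vec![0; N]`,
        // since the point is to stress the compiler on a huge literal.
        src.push_str("        0u32,\n");
    }
    src.push_str("    ];\n");
    writeln!(src, "    assert_eq!(v.len(), {elements});").unwrap();
    src.push_str("}\n");
    src
}

fn main() {
    // Dial the multiplier up or down to stress different compiler phases.
    let src = generate_pathological_crate(135_000);
    println!("generated {} bytes of source", src.len());
}
```

The point of such inputs is not that anyone writes this by hand, but that they make one compiler cost (here, handling a very large expression) dominate the profile.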
To be clear: pnkfelix is not suggesting that we present such fine-grain labels on [perf.rlo]. E.g., others have suggested a simple "primary"/"secondary" categorization, which is probably fine. But these fine-grain labels may be useful as metadata to *drive* the decisions about how benchmarks are categorized now and in the future (a hypothetical sketch of such metadata appears after the draft classification below).

So, some concrete examples taken from [the zulip chat](https://zulip-archive.rust-lang.org/stream/247081-t-compiler/performance/topic/categorize.20.E2.80.9Creal.20world.E2.80.9D.20vs.20.E2.80.9Cpathological.E2.80.9D.20benchmarks.3F.html#265087311):

* deep-vector just contains a `vec!()` literal with 135,000 zero elements, which is 100% artificial: *synthetic toy*, *pathological*
* tuple-stress contains an array with 64k tuples of real geographic data, which was stripped out of a real program: *synthetic distilled*, *pathological*
* inflate is a really old version of that crate that doesn't reflect the current version, and it's a weird example: *real-world*, *outdated*
* deeply-nested was added because there was no async-like benchmark before; heavily async code uses *lots* of nested types: *synthetic distilled*, *pathological*, *important*

## Once we have a classification, what should we do with it?

[simple ideas](https://zulip-archive.rust-lang.org/stream/247081-t-compiler/performance/topic/categorize.20.E2.80.9Creal.20world.E2.80.9D.20vs.20.E2.80.9Cpathological.E2.80.9D.20benchmarks.3F.html#264969884)

> I guess the dumb thing is to just split everything (tables, pages of graphs, etc.) into two halves: real first, then artificial

pnkfelix wants to stress: unimportant toy benchmarks can be invaluable in *dissecting* a problem once it is identified. So continuing to have access to them seems useful. What is potentially *not* useful is letting them be an important part of day-to-day workflows: developers should not be stressing about small (or maybe any) regressions to toy benchmarks, and they should not be occupying time during weekly performance triage *unless* motivated by some other regression.

## Draft Classification

[Contributed by nnethercote](https://zulip-archive.rust-lang.org/stream/247081-t-compiler/performance/topic/categorize.20.E2.80.9Creal.20world.E2.80.9D.20vs.20.E2.80.9Cpathological.E2.80.9D.20benchmarks.3F.html#265222970)

My draft classification:

- Real: cargo, clap-rs, cranelift-codegen, diesel, encoding, futures, html5ever, hyper-2, piston-image, regex, ripgrep, serde, stm32f4, style-servo, syn, tokio-webpush-simple, ucd, webrender, webrender-wrench
- Maybe real: inflate, keccak, wg-grammar, unicode_normalization
- Artificial: coercions, ctfe-stress-4, deeply-nested, deeply-nested-async, deeply-nested-closures, deep-vector, derive, externs, issue-*, many-assoc-items, match-stress-enum, match-stress-exhaustive_patterns, regression-31157, token-stream-stress, tuple-stress, unify-linearly, unused-warnings, wf-projection-stress-65510
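As a thought experiment only, here is one way the fine-grain labels could be recorded and then collapsed into the coarse "primary"/"secondary" split suggested above. Nothing here reflects the actual rustc-perf implementation; `Label`, `BenchmarkMeta`, `Coarse`, and `coarse_category` are all invented names, and the collapsing rule is just one possible choice.

```rust
/// Hypothetical fine-grain labels; not the actual rustc-perf metadata schema.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Label {
    RealWorld,
    SyntheticDistilled,
    SyntheticToy,
    Pathological,
    Important,
    Outdated,
}

/// Invented record associating a benchmark with its labels.
struct BenchmarkMeta {
    name: &'static str,
    labels: &'static [Label],
}

/// Coarse two-way split that a dashboard might actually display.
#[derive(Debug, PartialEq, Eq)]
enum Coarse {
    Primary,
    Secondary,
}

/// One possible rule: anything real-world or explicitly important is primary;
/// toys and outdated cases are secondary, regardless of other labels.
fn coarse_category(meta: &BenchmarkMeta) -> Coarse {
    let has = |l| meta.labels.contains(&l);
    if has(Label::Outdated) || has(Label::SyntheticToy) {
        Coarse::Secondary
    } else if has(Label::RealWorld) || has(Label::Important) {
        Coarse::Primary
    } else {
        Coarse::Secondary
    }
}

fn main() {
    // Labels taken from the concrete examples above.
    let deep_vector = BenchmarkMeta {
        name: "deep-vector",
        labels: &[Label::SyntheticToy, Label::Pathological],
    };
    let deeply_nested = BenchmarkMeta {
        name: "deeply-nested",
        labels: &[Label::SyntheticDistilled, Label::Pathological, Label::Important],
    };
    assert_eq!(coarse_category(&deep_vector), Coarse::Secondary);
    assert_eq!(coarse_category(&deeply_nested), Coarse::Primary);
    println!("{} is {:?}", deep_vector.name, coarse_category(&deep_vector));
}
```

The useful property of keeping the fine labels around is that the collapsing rule can be revisited later without re-auditing every benchmark.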
# Appendices

## Hennessy and Patterson breakdown

> [Hennessy & Patterson is always good for this sort of thing, too. Chapter 1 distinguishes: real applications, modified applications, kernels, toy benchmarks, and synthetic benchmarks](https://zulip-archive.rust-lang.org/stream/247081-t-compiler/performance/topic/categorize.20.E2.80.9Creal.20world.E2.80.9D.20vs.20.E2.80.9Cpathological.E2.80.9D.20benchmarks.3F.html#265254034)

(If time permits, pnkfelix will transcribe relevant definitions from their copy of H&P.)

## Breakdown from `nofib` suite

> [For anyone who likes thinking about this topic, sections 2.2-2.4 of http://web.mit.edu/~ezyang/Public/10.1.1.53.4124.pdf may be of interest. It's a Haskell benchmarking suite divided into "real", "spectral", and "imaginary"](https://zulip-archive.rust-lang.org/stream/247081-t-compiler/performance/topic/categorize.20.E2.80.9Creal.20world.E2.80.9D.20vs.20.E2.80.9Cpathological.E2.80.9D.20benchmarks.3F.html#265253955)

Transcribed from Will Partain's paper on [nofib][ezyang pdf].

[ezyang pdf]: http://web.mit.edu/~ezyang/Public/10.1.1.53.4124.pdf

### Real subset

* Written to ~~standard Haskell (version 1.2 or later)~~ stable Rust
* Written by someone trying to get a job done, not by someone trying to make a pedagogical or stylistic point
  * (pnkfelix wonders whether "trying to get a job done" includes expertise level; i.e. do you benchmark beginner code that has not had an optimization pass? Ah: see potential answer below)
* Performs some useful task such that someone other than the author might want to execute the program for other than watch-a-demo reasons
* Neither implausibly small nor impossibly large (the Glasgow Haskell compiler, written in Haskell, falls in the latter category)
  * Aside from pnkfelix: We are benchmarking the compiler itself on [perf.rlo]. Is `nofib` solely used for benchmarking the output code, or is it *also* used to benchmark Haskell compilers? (Perhaps we have inherently placed our task into an impossibly large area according to this document, and are just varying the inputs to that task... but still we must press on...)
* The run time and space for the compiled program must be neither too small (e.g. time less than five secs.) nor too large (e.g. such that a research student in a typical academic setting could not run it).

#### Other desiderata for the Real subset as a whole

* Written by diverse people, with varying functional-programming skills and styles, at different sites
* Include programs of varying "ages", from first attempts, to heavily-tuned rewritten-four-times behemoths, to transliterations-from-LML, etc.
* Span across as many different application areas as possible.
* The suite, as a whole, should be able to compile and run to completion overnight, in a typical academic Unix computing environment.

### Spectral subset

These don't quite meet the criteria for Real programs, usually the stipulation that someone other than the author might want to run them. Many of these programs fall into Hennessy and Patterson's category of "kernel" benchmarks, being "small, key pieces from real programs".

### Imaginary subset

The usual small toy benchmarks, e.g. `primes`, `kwic`, `queens`, and `tak`. These are distinctly unimportant, and you may get a special commendation if you ignore them completely. They can be quite useful as test programs, e.g. to answer the question, "Does the system work at all after Simon's changes?"

# Meeting Questions/Topics

* pnkfelix: Bi-classification sound good? Or should we go tri-classification like `nofib`?
* jack huey: an important question is less how we categorize benchmarks and more how we use the categorization to decide whether a PR is acceptable perf-wise.
* rylev: In general we leave it up to not-very-scientific heuristics on what is acceptable to merge or not. We'll likely want to decide what is acceptable for "important" benchmarks first and then decide how other categories differ.
* aaron hill: the set of metrics themselves should be expanded. We should display changes in disk usage / artifact sizes.
* aaron hill: Right now, the only way to see artifact size changes is to open each individual benchmark page.
* rylev: This is a well-known limitation and has simply not been implemented. We'd love to implement it but just haven't had the time yet.
* aaron hill: and there can often be trade-offs between compilation speed and disk usage.
* nagisa: Right now we use statistics extensively to hide changes in performance that we consider to be irrelevant (right now only those that are within noise). I think it is important that we are careful to not hide what we consider synthetic benchmarks the same way.
* pnkfelix: I think our current dashboard both shows too much (as in, too many benchmarks, or at least too many toy ones that are given equal prominence to real-world ones), but it also shows too little (having to switch between different metrics, rather than being able to see multiple metrics at once).
* rylev: We also need to take profile (e.g., release, debug, doc, etc.) and scenario (i.e., incremental compilation with different changes and cache states) into account.

# Discussion Topics

* Assuming some (simple) classification, maybe a binary one, how would we use it to improve things? (A hypothetical sketch of one such policy follows this list.)
* What metrics should we gather and make prominent, by default, in our dashboard?
* crater for benchmarks?
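To make the first discussion topic concrete, here is a purely hypothetical sketch of a category-aware acceptance check of the kind jack huey and rylev were asking about. The categories mirror the draft classification above, but the thresholds and the names (`Category`, `PerfDelta`, `needs_justification`) are invented for illustration; they are not the perf team's actual triage rules.

```rust
/// Invented categories mirroring the draft classification above.
#[derive(Clone, Copy)]
enum Category {
    Important,
    Real,
    Artificial,
}

/// Invented result record; positive `pct_change` means a regression.
struct PerfDelta {
    benchmark: &'static str,
    category: Category,
    pct_change: f64,
}

/// One possible policy: be strict on important/real benchmarks, and only flag
/// artificial ones when the regression is large enough to suggest a
/// complexity blow-up rather than a small constant-factor cost.
/// All thresholds here are made up for illustration.
fn needs_justification(d: &PerfDelta) -> bool {
    let threshold = match d.category {
        Category::Important => 0.5,
        Category::Real => 1.0,
        Category::Artificial => 10.0,
    };
    d.pct_change > threshold
}

fn main() {
    // The same 0.8% regression is treated very differently by category.
    let deltas = [
        PerfDelta { benchmark: "serde", category: Category::Important, pct_change: 0.8 },
        PerfDelta { benchmark: "deep-vector", category: Category::Artificial, pct_change: 0.8 },
    ];
    for d in &deltas {
        println!("{}: needs justification = {}", d.benchmark, needs_justification(d));
    }
}
```

Whatever the real policy ends up being, encoding it somewhere explicit would at least replace the current not-very-scientific heuristics with something that can be reviewed and revised.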
