# Rustc-perf - multi-collectors

This document aims to note some of the design considerations required to make `rustc-perf` both multi-architecture and multi-collector, and also to support new features, such as backfilling missing benchmark results for non-standard benchmark parameters.

The overall philosophy described below is to provide a base to build upon, balancing today's needs with tomorrow's ideals. It might be missing certain details and perhaps be too detailed in others.

For purposes of discussion, the table below details a set of keywords, or a glossary of terms. The naming aims to minimally identify the constituent parts of the system. However, the precise naming of these items is illustrative and open to improvement.

## Keywords

| Term | Meaning |
|------|---------|
| **artifact** | A single Rust compiler toolchain built from a specific commit SHA. |
| **metric** | A quantifiable metric gathered during the execution of the compiler (e.g. instruction count). |
| **benchmark** | A Rust crate that will be used for benchmarking the performance of `rustc` (a compile-time benchmark) or its codegen quality (a runtime benchmark). |
| **profile** | Describes how to run the compiler (e.g. `cargo build/check`). A profile is a **benchmark parameter**. |
| **scenario** | Further specifies how to invoke the compiler (e.g. incremental rebuild/full build). A scenario is a **benchmark parameter**. |
| **backend** | Codegen backend used when invoking `rustc`. A backend is a **benchmark parameter**. |
| **target** | Roughly the Rust target triple, e.g. `aarch64-unknown-linux-gnu`. A target is a **benchmark parameter**. |
| **benchmark suite** | A set of *benchmarks*. We have two suites - compile-time and runtime. |
| **test case** | A combination of a *benchmark* and its *benchmark parameters* that uniquely identifies a single *test*. For compile-time benchmarks it's *benchmark* + *profile* + *scenario* + *backend* + *target*; for runtime benchmarks it's just *benchmark*. |
| **test** | Identifies the act of benchmarking an *artifact* under a specific *test case*. Each test consists of several *test iterations*. |
| **test iteration** | A single actual execution of a *test*. |
| **collection** | A set of all *statistics* for a single *test iteration*. |
| **test result** | The result of gathering all *statistics* from a single *test*. Aggregates results from all *test iterations* of that *test*, so a *test result* is essentially the union of *collections*. Usually we just take the minimum of each statistic out of all its *collections*. |
| **statistic** | A single measured value of a *metric* in a *test result*. |
| **run** | A set of all *test results* for a set of *test cases* measured on a single *artifact*. |
| **benchmark request** | A request for benchmarking a *run* for a given *artifact*. It is either created from a try build on a PR, or automatically determined from merged master/release *artifacts*. |
| **collector** | A physical runner for benchmarking the compiler. |
| **cluster** | One or more collectors of the same target, for benchmarking the compiler. |
| **collector_id** | A unique identifier of a *collector* (hard-coded at first for simplicity). |
| **COLLECTOR_PER_TARGET_COUNT** | Number of collectors *per architecture* (initially `2 x x86_64`, `1 x AArch64`). |
| **COLLECTOR_COUNT_TOTAL** | Collectors across **all** architectures. |
| **job** | High-level "work item" that defines a set of *test cases* that should be benchmarked on a specific collector. |
| **job_queue** | Queue of *jobs*. |
| **MAX_JOB_RETRIES** | Maximum number of retries for a *job*. |
| **Assigning a job** | The act of allocating one or more *jobs* to a collector. |
| **website** | A standalone server responsible for inserting work into the queue. |
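To make the glossary concrete, here is a minimal Rust sketch of how these concepts might map onto types. All names and enum variants are illustrative assumptions, not the actual rustc-perf API:

```rust
// Illustrative types for the glossary above; not the actual rustc-perf API.

/// A single compiler toolchain built from a specific commit SHA.
struct Artifact {
    sha: String,
}

/// Benchmark parameters (the variants shown are examples, not an
/// exhaustive list).
enum Profile { Check, Debug, Opt, Doc }
enum Scenario { Full, IncrFull, IncrUnchanged, IncrPatched }
enum Backend { Llvm, Cranelift }

/// A test case uniquely identifies a single test.
enum TestCase {
    /// Compile-time: benchmark + profile + scenario + backend + target.
    CompileTime {
        benchmark: String,
        profile: Profile,
        scenario: Scenario,
        backend: Backend,
        target: String, // e.g. "aarch64-unknown-linux-gnu"
    },
    /// Runtime: the benchmark alone identifies the test case.
    Runtime { benchmark: String },
}

/// A job: a set of test cases to benchmark on a specific collector.
struct Job {
    collector_id: String,
    test_cases: Vec<TestCase>,
}
```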
## Launching a multi-collector system

To support both multi-collector and multi-architecture execution, we propose building a parallel system with separate DB tables: some for new concepts, some for duplicating old data in a better format. This would allow us to ship code incrementally and test it in a live environment on a parallel collector. Once we have confirmed that the new system works as expected, we will remove the old code and switch to the new system.

## Requirements

We want to support the following features, which affect the design of the whole system:

- The *artifact* (commit SHA of the compiler) is used as the unique identifier of benchmark results (of the *run*).
- When a *benchmark request* is created, it might request *test cases* that were not previously benchmarked for its parent commit (e.g. the Cranelift codegen backend). In that case it should be possible to *backfill* additional *test cases* into the *run* of the parent commit, even though it was benchmarked previously and marked as finished.
  - However, it will not be possible to remove old results, only append to them. And you can only append test results for test cases that were missing previously.
  - This should be useful for requesting non-default benchmark parameters on a PR, e.g. Clippy or rustdoc with JSON output.
- *Test cases* should be split into multiple subsets, so that each subset is always executed on exactly one *collector*.
  - They should be split based on the *target* and a subset of *benchmarks*.

## High-level design

From the outside, the whole system will behave quite similarly as [before](https://kobzol.github.io/rust/rustc/2023/08/18/rustc-benchmark-suite.html). The website will make sure that try build benchmark requests from PRs, and master and published artifacts from our CI, will be benchmarked in a timely manner.

Benchmarks are always recorded in the DB using *benchmark requests*. These can be created in two ways:

- When someone does `@rust-timer queue/build` on a PR, a benchmark request for a try build will be created and stored in the [benchmark_request](#benchmark_request-table) table, with the `waiting for artifacts` or `waiting for parent` status (`queue` vs `build`).
  - For the `build` command, we should also check if a request for the same commit SHA wasn't already made previously. We can either error here or allow backfilling data (but this should be super rare).
- When the website notices a missing master/published artifact, it will also be stored into this table.

The website will run a periodic cron job (e.g. every minute or so) that will do a number of things for different types of artifacts:

> Note that the descriptions here use some terminology described in the `benchmark_request` table. It's a dependency cycle and we have to unwrap it somewhere :)

### Published artifacts

The website will go through all recent published artifacts, and check if they are done by looking at the `sha` and `status` columns in the `benchmark_request` table (a sketch of this check follows the list).

- If the request is already marked as `completed`, nothing happens.
- If the request is `in progress`, nothing happens.
- If the request is missing, it will be immediately inserted into the table and will be [*enqueued*](#Enqueuing-a-commit).
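The sketch, assuming a hypothetical `Db` interface and the `enqueue` helper described in the [Enqueuing a commit](#Enqueuing-a-commit) section below:

```rust
// Hypothetical sketch of the published-artifacts cron step; `Db` and
// its methods are assumptions, the statuses follow the
// `benchmark_request` table described later.
fn handle_published_artifact(db: &mut Db, sha: &str, release_name: &str) {
    match db.find_request_status(sha) {
        // Already benchmarked, or being benchmarked right now: no-op.
        Some("completed") | Some("in progress") => {}
        // Other states are handled by the master/try logic.
        Some(_) => {}
        // Missing: insert a release request and enqueue it immediately.
        None => {
            let request = db.insert_release_request(sha, release_name);
            let _ = enqueue(db, &request); // error handling elided
        }
    }
}
```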
### Master artifacts

The website will go through all recent master commits, and check if they are done by looking at the `sha` and `status` columns in the `benchmark_request` table.

- If the request is already marked as `completed`, nothing happens.
- If the request is `in progress`, check [request completion](#Checking-request-completion).
- If the request is `waiting for parent` commit benchmark to be completed, nothing happens.
- If the request is missing, we will recursively find the set of parent master commits that are missing data (by looking at their status in `benchmark_request`).
  - If the set is non-empty, these commits will be handled recursively with the same logic as this commit.
  - If the set is empty, the request will be *enqueued*.

### Try artifacts

> The logic for try artifacts can either happen both in cron and in the GH webhook listener (which receives `@rust-timer queue/build` notifications), or only in cron.

The website will go through all try artifacts in `benchmark_request` that are not yet marked as `completed`.

- If the request is `waiting for artifacts`, do nothing (sometime later a GH notification will switch the status to `waiting for parent` once the artifacts are ready).
- If the request is `waiting for parent`:
  - Recursively find the set of **grandparent** master commits that are missing data (by looking at their status in `benchmark_request`). This could happen on the edge switch from `waiting for artifacts` to `waiting for parent` in the GH webhook handler, or it could happen in each cron invocation.
  - If that set is empty, generate all necessary **parent** jobs and check if they are all completed in the `job_queue`.
    - If yes, *enqueue* the request.
    - If not, insert these jobs into the job queue. This is where backfilling happens, as we can backfill e.g. new backends for a parent master commit that was only benchmarked for LLVM before.
- If the request is `in progress`, check [request completion](#Checking-request-completion).

## Enqueuing a commit

Enqueuing a commit means two things:

1) Generate all jobs for a request.
2) Insert them into `job_queue` AND ATOMICALLY set the request to have status `in progress`.

## Checking request completion

Once the website sees a try or a master request with status `in progress`, it will check if all its jobs in `job_queue` have been completed. We could either:

1) Store a FK for each job that links it to a single benchmark request. With this approach, we can simply query all jobs belonging to a given request and check if they are completed.
   - This would however mean that "fake" backfilled jobs that were inserted into the DB for a master parent commit would link to a benchmark request that wouldn't be fully consistent with the job (e.g. a job with the Cranelift backend would link to a request that does not ask for Cranelift). However, that might not be an issue :man-shrugging:
2) Alternatively, we can generate all jobs required for the try benchmark request, and check if all of them are in the DB with status `completed`. This has the benefit that the collector wouldn't have to touch `benchmark_request` at all (though it shouldn't really matter, as it would only read anyway).

Once we do that, and we figure out that a request was completed, we switch its state from `in progress` to `completed`, and if it was a try or a master request, send a comment to its PR.
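A hedged sketch of both steps, assuming a hypothetical storage interface (`Db`, `Transaction`, `BenchmarkRequest`, `generate_jobs` are all stand-ins). The key points are the single transaction in `enqueue` and the option-2 completion check that only reads:

```rust
// Hypothetical sketch; the storage layer is assumed, not real code.

/// Enqueuing a commit: generate all jobs for the request and, in one
/// transaction, insert them into `job_queue` and flip the request to
/// `in progress`. The transaction guarantees we can never observe a
/// request that is `in progress` without its jobs (or vice versa).
fn enqueue(db: &mut Db, request: &BenchmarkRequest) -> Result<(), DbError> {
    let jobs = generate_jobs(request);
    let mut tx = db.begin_transaction()?;
    for job in &jobs {
        tx.insert_job(request.id, job)?;
    }
    tx.set_request_status(request.id, "in progress")?;
    tx.commit()
}

/// Completion check (option 2 above): regenerate the set of jobs the
/// request needs and test whether each is already completed in
/// `job_queue`. A `failed` job counts as completed, same as `success`.
fn is_request_completed(db: &Db, request: &BenchmarkRequest) -> bool {
    generate_jobs(request)
        .iter()
        .all(|job| matches!(db.job_status(job).as_deref(), Some("failed" | "success")))
}
```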
## Job lifecycle

When a job is inserted into the job queue, it starts in the status `queued`.

### Collector

Once a collector tries to pick up a job, it does the following (a sketch of this pickup logic follows the list):

- If there is already a job for the collector in state `in progress`, it keeps that same status, but increments the retry counter, and then goes on to benchmark the job.
  - If the retry counter reaches a predetermined maximum, the job is marked as `failed` instead.
  - Invariant: there shouldn't ever be more than a single job in state `in progress` for a single collector.
- If there are jobs for the collector in state `queued`, it picks one up (according to the [job ordering](#Job-ordering)) and marks it as `in progress`.
  - Note that each job already contains a predetermined collector ID, so two collectors shouldn't ever race on the same job.
- If the collector fails expectedly while benchmarking a job (i.e. `Result`), and it thinks the error is unrecoverable, it marks the job as `failed`.
- If the collector fails unexpectedly while benchmarking a job (i.e. panic/crash), the job will stay `in progress` and should be picked up later once the collector restarts.
- If the benchmark job is successful, it is marked as `success`.
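A minimal sketch of the pickup logic, modeling the collector's view of `job_queue` as an in-memory `Vec` purely for illustration (in the real system these would be row updates); the value of `MAX_JOB_RETRIES` is an assumption:

```rust
// Sketch only: the queue is modeled in memory, statuses follow the text.
#[derive(PartialEq)]
enum JobStatus { Queued, InProgress, Failed, Success }

struct Job {
    collector_id: String,
    status: JobStatus,
    retry: u32,
}

// The document names MAX_JOB_RETRIES but not its value; 3 is an assumption.
const MAX_JOB_RETRIES: u32 = 3;

/// Pick the next job for `collector`, assuming `jobs` is already sorted
/// by the job ordering described later.
fn pick_next_job<'a>(jobs: &'a mut [Job], collector: &str) -> Option<&'a mut Job> {
    // Resume an interrupted job first; by invariant there is at most one
    // `in progress` job per collector.
    if let Some(i) = jobs
        .iter()
        .position(|j| j.collector_id == collector && j.status == JobStatus::InProgress)
    {
        jobs[i].retry += 1;
        if jobs[i].retry >= MAX_JOB_RETRIES {
            // Retried too many times: mark as failed and fall through.
            jobs[i].status = JobStatus::Failed;
        } else {
            return Some(&mut jobs[i]);
        }
    }
    // Otherwise pick the first queued job assigned to this collector.
    let i = jobs
        .iter()
        .position(|j| j.collector_id == collector && j.status == JobStatus::Queued)?;
    jobs[i].status = JobStatus::InProgress;
    Some(&mut jobs[i])
}
```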
### Website

In the cron job, the website goes through benchmark requests that are marked as `in progress`. For each such request, it:

- Gets all jobs for that request.
- Finds out if they are all *completed* (either `failed` or `success`). If not, it bails out.
- If they are, and it's a master/try artifact, it sends a PR comment to GitHub with the result of the benchmark request.

It also looks for jobs that have a non-NULL `completed_at` date, and if it is older than 30 days, it removes these jobs from `job_queue`.

## `benchmark_request` table

This table stores permanent benchmark requests for try builds on PRs and for master and published artifacts. If any benchmarking happens (through the website), there has to be a record of it in `benchmark_request`.

Columns with `?` are `NULL`able.

| Column | Data Type |
|--------------|--------------|
| id | auto int |
| tag | text |
| parent_sha | text? |
| commit_type | text |
| pr | int |
| created_at | timestamptz |
| completed_at | timestamptz? |
| status | text |
| backends | text |
| profiles | text |

- `tag` represents the commit SHA for master/try artifacts, and the release name for release artifacts.
- `commit_type` is `master`/`try`/`release`.
- `completed_at` is set when `status` becomes `completed`.
- The benchmark parameters included in this table determine what we can backfill:
  - `backends` => backfill Cranelift
  - `profiles` => backfill Clippy/DocJson

The `status` of the request can be:

- `waiting for artifacts`: a try build is waiting until CI produces the artifacts needed for benchmarking
- `waiting for parent`:
  - a master artifact waits for all its (grand)parent benchmark requests to be completed
  - a try artifact waits for all its (grand)parent benchmark requests to be completed, plus optionally for all its direct parent jobs to be completed (due to backfilling)
- `in progress`: jobs for this request are currently in `job_queue`, waiting to be benchmarked
- `completed`: all jobs have been completed, and a GH PR comment was sent for try/master builds

### Benchmark requests ordering

We need to figure out how to construct a "virtual queue" to display on the status page. This queue is also used to estimate when a given benchmark request will finish.

1) In-progress requests
   - Sort them by start time, then by PR number
2) Release requests
   - Sort them by release date, then by name
3) Requests whose parent is ready
   - Do a topological sort (topological index = transitive number of parents that are not finished yet)
   - Order by topological index, type (master before try), then PR number, then `created_at`
4) Requests that are waiting for artifacts
   - Order by PR number, then `created_at`

## `job_queue` table

This table stores ephemeral benchmark jobs, which specifically tell the collector which benchmarks it should execute. The jobs will be kept in the table for ~30 days after being completed, so that we can quickly figure out what master parent jobs we need to backfill when handling try builds. If you request a backfill of data after 30 days (which should be incredibly rare), new jobs will be created, but that shouldn't matter: the collector will pick them up, do essentially a no-op (because the test results will already be in the DB), and then mark the job as finished, at which point it will stay in the queue for another 30 days.

The table keeps the following invariant: each job stored in it has all its corresponding parent test cases benchmarked and stored in the DB.

| Column | Data Type |
|-----------------|-------------|
| id | auto int |
| request_id | FK to `benchmark_request` |
| target | text |
| backend | text |
| profile | text |
| benchmark_set | int |
| collector_id | text |
| started_at | timestamptz |
| completed_at | timestamptz |
| status | text |
| retry | int |
| error | text |

- `request_id` is a FK that allows fetching the commit SHA, PR number and commit type.
- `collector_id` could alternatively be a FK to the `collector_config` table.
- `status` is one of `queued`, `in progress`, `failed`, `success`.
- `retry` marks the number of times the job has been tried but has failed. A job can be retried up to a predetermined number of times (`MAX_JOB_RETRIES`).
- `error` contains a "global" error that happened during the job. Benchmark errors are stored in a separate `errors` table, which links to a given artifact and benchmark. But there can also be non-benchmark errors, such as a failure to download an artifact from CI (the most common error); those would be stored here.

### Job ordering

When a collector determines what job to pull from the queue, it should:

- Filter only jobs with `status` in TODO
- Order them by (`commit_type`, `pr`, `created_at`, `sha`)
  - `commit_type`: "release" then "master" then "try"
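A hedged sketch of this pull ordering as a Rust sort key. The `status` filter is still TODO above, so only the ordering is shown; `JobRow` is a hypothetical row joined with its request, since `job_queue` itself doesn't store `pr` or `sha`:

```rust
// Minimal stand-in for a job row joined with its benchmark request
// (`pr`, `sha` and `commit_type` come via the `request_id` FK).
struct JobRow {
    commit_type: String, // "release" | "master" | "try"
    pr: u32,
    created_at: i64, // creation time as a unix timestamp, for simplicity
    sha: String,
}

/// Release artifacts first, then master, then try.
fn commit_type_rank(commit_type: &str) -> u8 {
    match commit_type {
        "release" => 0,
        "master" => 1,
        _ => 2, // "try"
    }
}

/// Sort jobs into the order a collector should pull them: ties are
/// broken by PR, creation time and SHA, so the order is deterministic.
fn sort_jobs(jobs: &mut [JobRow]) {
    jobs.sort_by_key(|j| {
        (
            commit_type_rank(&j.commit_type),
            j.pr,
            j.created_at,
            j.sha.clone(),
        )
    });
}
```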
---
REST of the document (@kobzol ended here)
---

## High-level diagrammatic overview

![benchmark_request_job_queue](https://hackmd.io/_uploads/S1PtjH_7lg.jpg)
<figure>Overview of the job queue hierarchy</figure>

This structure is then used in combination with the following, which allows multiple collectors to read configuration to determine which jobs they should take:

![collector_config](https://hackmd.io/_uploads/B1CqirdXee.jpg)
<figure>Overview of how a collector consults some intermediary tables to know what benchmark "test iterations" it should perform</figure>

## benchmark_set

A `benchmark_set` tells a collector which benchmarks it must run. If, for example, the crate `serde` belongs to set 1, the collector assigned to set 1 will run every combination of profile, scenario, backend, and target for serde. Thus, when both the Cranelift and LLVM backends are requested, that same collector handles them all.

The configuration will be hardcoded in the GitHub repository and changes to it will be made through pull requests. This saves us from configuring things at the database level with little visibility of the changes. Some of the downsides of this static dependency:

- If a collector goes offline then the queue stalls. We could mitigate against this by having an idle collector take the job. If we had both collectors doing all jobs over time, we would know how long it takes each collector to benchmark a particular job. Or we might be able to send a message to Zulip to notify us.
- An open question: if we decommission a collector, what do we do with the old results? How do we re-balance the jobs?

## Benchmark set schema

While somewhat verbose, the schema below provides a way for a collector to look up what jobs it should be benchmarking. The simplest way to describe this would be a `benchmark_set_id` and a list of strings for the jobs. However, to allow for future extensibility (perhaps some jobs will have special configuration), a job object with a `"name": "<job_name>"` pairing seems a reasonable starting point. Currently the split between compile-time and runtime benchmarks is made by directory; it could be added here instead.

```json
{
    "<benchmark_set_id>": {
        "jobs": [
            { "name": "<job_name>" }
        ]
    }
}
```

In the absence of an admin dashboard for the maintenance of rustc-perf configuration, hardcoded JSON in the repo is seemingly the simplest approach. If we need to change the configuration, we can do so via submitting PRs, which gives us a form of audit trail for changes to the configuration, as opposed to SSH-ing into a database to update a table, which is fairly opaque. The downside of a hardcoded configuration is that it becomes another thing we need to update when adding a new benchmark to the repo. We could add an "Adding a New Benchmark" section to the README.md to provide a checklist describing the process for adding a new benchmark.

## How to split the benchmark jobs for multiple collectors?

For purposes of discussion, we will assume `COLLECTOR_PER_TARGET_COUNT = 2` and `JOB_COUNT = 4`. We need to roughly determine how long each job takes in order to split the jobs equally between the collectors. Say we have the following jobs:

- `A` 4 mins
- `B` 20 mins
- `C` 6 mins
- `D` 10 mins

In this case, one collector would take jobs `A`, `C` and `D`, and the other would take `B`, as this would be a perfect 20-minute split per collector. In the instance where there is only one collector, all jobs would need to be taken by that collector. In order to set this split up, we would need to compose a list of all the jobs.
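This is the classic balanced-partition problem; one simple greedy heuristic ("longest processing time first") recovers the split above. A minimal, runnable sketch with illustrative names:

```rust
/// Greedy "longest processing time first": sort jobs by descending
/// duration and always give the next job to the least-loaded collector.
fn split_jobs<'a>(jobs: &[(&'a str, u32)], collectors: usize) -> Vec<Vec<&'a str>> {
    let mut jobs: Vec<(&str, u32)> = jobs.to_vec();
    // Longest jobs first, so big jobs don't unbalance the tail end.
    jobs.sort_by(|a, b| b.1.cmp(&a.1));

    let mut assignment: Vec<Vec<&str>> = vec![Vec::new(); collectors];
    let mut load = vec![0u32; collectors];
    for (name, minutes) in jobs {
        // Index of the currently least-loaded collector (collectors > 0).
        let i = (0..collectors).min_by_key(|&i| load[i]).unwrap();
        assignment[i].push(name);
        load[i] += minutes;
    }
    assignment
}

fn main() {
    // The example above: A=4, B=20, C=6, D=10 minutes, two collectors.
    let split = split_jobs(&[("A", 4), ("B", 20), ("C", 6), ("D", 10)], 2);
    println!("{split:?}"); // [["B"], ["D", "C", "A"]] - a 20/20 split
}
```

With a single collector (`collectors = 1`), the same function degenerates to assigning every job to it, matching the single-collector case described above.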
## Information about collectors

Possibly we might want to have some information about the collectors that are running, so we can identify which collector ran which job. This may also be useful for internal bookkeeping. On this table there is a suggestion for a `last_heartbeat_at` column so we can detect if a collector has gone offline; the collector would have a cron job that periodically updates the date, and the website, which is responsible for queueing work, would determine if the collector is still alive or not. `is_active` denotes if the collector should be used for benchmarking.

**`collector_config` table**

| Column | Data Type |
|--------|------|
| id | UUID |
| target | TEXT |
| date_added | TIMESTAMPTZ |
| last_heartbeat_at | TIMESTAMPTZ |
| benchmark_set | UUID |
| is_active | BOOLEAN |

- `benchmark_set`: if this is NULL, it can be assumed the collector should do all of the benchmarking.

### Job Queue open questions (2025/06/04)

- How to handle iterations?
- How to handle errors? (retry count)
- Which benchmark parameters are in the job?
  - target, backend
- **How to represent the benchmark sets?**
- How to represent runtime benchmarks and the special rustc benchmark?

### Debugging

Have a temporary page where we can inspect the contents and ordering of the queue.
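One possible starting point for that page, reusing the `JobRow`/`sort_jobs` sketch from the job-ordering section; the handler name, the `all_jobs` helper and the plain-text output format are all hypothetical:

```rust
// Hypothetical debug endpoint: dump the queue in the same order the
// collectors would pull it. In practice more columns (status,
// collector_id, ...) would be shown.
fn render_queue_debug_page(db: &Db) -> String {
    let mut rows: Vec<JobRow> = db.all_jobs();
    sort_jobs(&mut rows); // same ordering as in "Job ordering"

    let mut out = String::from("commit_type | pr | created_at | sha\n");
    for r in &rows {
        out.push_str(&format!(
            "{} | {} | {} | {}\n",
            r.commit_type, r.pr, r.created_at, r.sha
        ));
    }
    out
}
```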
