Dorian Péron
  • NEW!
    NEW!  Connect Ideas Across Notes
    Save time and share insights. With Paragraph Citation, you can quote others’ work with source info built in. If someone cites your note, you’ll see a card showing where it’s used—bringing notes closer together.
    Got it
      • Create new note
      • Create a note from template
        • Sharing URL Link copied
        • /edit
        • View mode
          • Edit mode
          • View mode
          • Book mode
          • Slide mode
          Edit mode View mode Book mode Slide mode
        • Customize slides
        • Note Permission
        • Read
          • Only me
          • Signed-in users
          • Everyone
          Only me Signed-in users Everyone
        • Write
          • Only me
          • Signed-in users
          • Everyone
          Only me Signed-in users Everyone
        • Engagement control Commenting, Suggest edit, Emoji Reply
      • Invite by email
        Invitee

        This note has no invitees

      • Publish Note

        Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note No publishing access yet

        Your note will be visible on your profile and discoverable by anyone.
        Your note is now live.
        This note is visible on your profile and discoverable online.
        Everyone on the web can find and read all notes of this public team.

        Your account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

        Your team account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

        Explore these features while you wait
        Complete general settings
        Bookmark and like published notes
        Write a few more notes
        Complete general settings
        Write a few more notes
        See published notes
        Unpublish note
        Please check the box to agree to the Community Guidelines.
        View profile
      • Commenting
        Permission
        Disabled Forbidden Owners Signed-in users Everyone
      • Enable
      • Permission
        • Forbidden
        • Owners
        • Signed-in users
        • Everyone
      • Suggest edit
        Permission
        Disabled Forbidden Owners Signed-in users Everyone
      • Enable
      • Permission
        • Forbidden
        • Owners
        • Signed-in users
      • Emoji Reply
      • Enable
      • Versions and GitHub Sync
      • Note settings
      • Note Insights New
      • Engagement control
      • Make a copy
      • Transfer ownership
      • Delete this note
      • Save as template
      • Insert from template
      • Import from
        • Dropbox
        • Google Drive
        • Gist
        • Clipboard
      • Export to
        • Dropbox
        • Google Drive
        • Gist
      • Download
        • Markdown
        • HTML
        • Raw HTML
    Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
    Create Create new note Create a note from template
    Menu
    Options
    Engagement control Make a copy Transfer ownership Delete this note
    Import from
    Dropbox Google Drive Gist Clipboard
    Export to
    Dropbox Google Drive Gist
    Download
    Markdown HTML Raw HTML
    Back
    Sharing URL Link copied
    /edit
    View mode
    • Edit mode
    • View mode
    • Book mode
    • Slide mode
    Edit mode View mode Book mode Slide mode
    Customize slides
    Note Permission
    Read
    Only me
    • Only me
    • Signed-in users
    • Everyone
    Only me Signed-in users Everyone
    Write
    Only me
    • Only me
    • Signed-in users
    • Everyone
    Only me Signed-in users Everyone
    Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note No publishing access yet

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.

    Your account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

    Your team account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

    Explore these features while you wait
    Complete general settings
    Bookmark and like published notes
    Write a few more notes
    Complete general settings
    Write a few more notes
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    # Design document for MCDC coverage instrumentation in rustc This document aims to give some ideas of how to implement MCDC coverage instrumentation in rustc. It makes the assumption that we will be using LLVM as a backend for coverage analysis. It uses a bottom-up approach, where we enumerate what needs to be provided to LLVM for the analysis to work and use these requirements to guide our implementation choices. ## Alternative approaches This document does not dive into other approaches that were discussed in the context of the MCDC project goal. Namely, the Stable-MIR approach was discussed, compared to the approaches suggested below it would require preliminary work to first enrich MIR to be able to identify decisions, their shape and differentiate them from nested decisions, and a mechanism to modify the representation and feed it back into the compilation process. ## What LLVM expects For MCDC instrumentation, LLVM expects 2 things: coverage mappings, which allow reporting tools to map coverage decision results back to source code locations, and appropriate instrumentation of the decisions within functions. ### Mappings LLVM MCDC mappings are a kind of coverage mapping, like `CodeRegion` or `BranchRegion` (see [CounterMappingRegion](https://github.com/llvm/llvm-project/blob/b82c7fc65229c8b2b6a964f023f6ec59b3cf9210/llvm/include/llvm/ProfileData/Coverage/CoverageMapping.h#L232)). In particular, an MCDC decision requires 2 types of mappings to be given: - A top-level mapping `MCDCDecisionRegion`, which holds a span to the entire boolean expression, an indice to the bitmap to find the executed test vectors, and the number of conditions in the decision (see [mcdc::DecisionParameters](https://github.com/llvm/llvm-project/blob/b82c7fc65229c8b2b6a964f023f6ec59b3cf9210/llvm/include/llvm/ProfileData/Coverage/MCDCTypes.h#L28)) - For each condition, an additional `MCDCBranchRegion`, which holds the span of this specific condition, its conditionID and the 2 conditionIDs to be evaluated after it (see [mcdc::BranchParameters](https://github.com/llvm/llvm-project/blob/b82c7fc65229c8b2b6a964f023f6ec59b3cf9210/llvm/include/llvm/ProfileData/Coverage/MCDCTypes.h#L42)) With this data, the reporting tool can re-build the BDD and map test vectors to actual evaluations of the conditions. ### Decision instrumentation If a function contains at least one decision worthy of being instrumented, LLVM expects to find a call to the `llvm.instrprof.mcdc.parameters` intrinsic ([documentation here](https://llvm.org/docs/LangRef.html#llvm-instrprof-mcdc-parameters-intrinsic)) before entering the first decision. It is used to let LLVM know that MCDC instrumentation is expected in this function. Among the arguments passed to it, one is the total number of possible test vectors for all decisions within the function. Additionally, a condition bitmap should be allocated. A simple 32-bits integer is enough to handle the maximum size of decisions accepted by LLVM. While the global bitmap is only accessible through LLVM intrinsics, we are responsible for allocating and handling the condition bitmap. > Note: In clang, nested decisions are not instrumented at all. In the previous MCDC instrumentation of rustc, nested decisions were instrumented by allocating extra condition bitmaps to account for concurrent evaluation of decisions. Then, when leaving a decision, a call to the `llvm.instrprof.mcdc.tvbitmap.update` intrinsic ([doc here](https://llvm.org/docs/LangRef.html#llvm-instrprof-mcdc-tvbitmap-update-intrinsic)) shall be made. It is used to update the global bitmap to register. The overall value of the condition bitmap, after it has been successively updated with the true or false evaluation of each condition, uniquely identifies an executed MC/DC test vector and is used as a bit index into the global test vector bitmap. > Note: for this, clang took the approach of routing all exits of the decision to the same basic block with the call to the intrinsic, and then re-dispatch to the right outcome. ### Details on test vector indices The condition bitmap is used to represent the evaluated test vector. Its value at the end of the evaluation must be distinct for each test vector. For a decision with N conditions, The naive implementation would be to assign an ID in (0..N) to each condition, and allocate a condition bitmap of size N. Therefore, a test vector is simply computed by setting the N-th bit of the condition bitmap if the N-th condition is true. The downside to this approach is that the condition bitmap grows linearly with the number of conditions, and due to short-circuiting operators, many of these bits won’t ever be set. The way LLVM handles this is by looking at the Binary Decision Diagram, and inserting increments to the condition bitmap (which effectively becomes a test vector counter) on specific edges. The algorithm responsible for computing these increments is defined [here in LLVM](https://github.com/llvm/llvm-project/blob/d594d9f7f4dc6eb748b3261917db689fdc348b96/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp#L226). It is important to implement the same algorithm on the frontend, since the values it generates are used by the reporting tool (llvm-cov) to understand which test vectors were encountered. [The previous implementation re-implemented the algorithm](https://github.com/rust-lang/rust/pull/144999/changes#diff-d9a6d6537722bc1c1a8b47796feaf6610569d37d505cabdb54d7fdee2b5c3147L273), alternatively we could generate some bindings to the C++ API to compute the offsets to minimize the need to manually update the algorithm should it change. ### Example Let's work with the following example: `A && B || C`. This decision has 3 conditions: `A`, `B`, `C`. The BDD for this decisions looks like so ```mermaid flowchart TD A{A} B{B} C{C} T[T] F[F] A -- T --> B B -- T --> T B -- F --> C A -- F --> C C -- T --> T C -- F --> F ``` After analysis, the instrumented code could look like this (The values here give a correct solution, but may not be the exact ones LLVM would expect) ```mermaid flowchart TD A{A<br><small>tvidx = 0</small>} B{B} C{C} T["T<br><small>tvbitmap.update(..., tvidx)</small>"] F["F<br><small>tvbitmap.update(..., tvidx)</small>"] A -- T --> B B -- T<br><small>tvidx += 4</small> --> T B -- F --> C A -- F<br><small>tvidx += 1</small> --> C C -- T<br><small>tvidx += 2</small> --> T C -- F --> F ``` From this, we can infer the value of all test vectors in the following table: | A | B | C | OUTCOME | `tvidx` | |---|---|---|---------|---------| | T | T | * | True | 4 | | T | F | T | True | 2 | | T | F | F | False | 0 | | F | * | T | True | 3 | | F | * | F | False | 1 | Note the following properties: - Each path in the BDD produces a different `tvidx` - All possible values of `tvidx` are contained in 0..5 ## What we will implement We've identified a couple of potential directions for instrumentation, assuming a fully in-compiler instrumentation. Both rely on a traversal of THIR to extract mappings and identify expressions of interest. ### Approach #1: THIR to MIR instrumentation With this approach, we hook to the THIR-to-MIR step. Specifically, when transforming a boolean expression to MIR, a MCDC builder is created and will follow along with the generation of the condition nodes. It will both extract the code spans of interest and insert true and false markers after conditions evaluations. These markers are then used in the InstrumentCoverage MIR pass to insert condition bitmap increments as decided by the computation of test vectors indices (see [Computing test vector indices](#Precision-on-test-vector-indices)). Pros: - Decisions are easily identified thanks to having access to an AST form - We get the BDD generation for free, since it’s the MIR form - Integrates well with existing coverage instrumentation. Cons: - Instrumentation spans over several steps in the compiler, thus having a footprint > Note: This is a similar approach to the previous implementation ### Approach #2: Direct THIR instrumentation Decision coverage and MC/DC are criteria defined on source constructs. As such instrumenting on a representation still fairly close to the source allows for less bookkeeping overall. In this approach the goal is to gather all relevant data to generate the mappings and compute the BDD for all decisions like in the approach #1, but then to also modify the THIR tree to declare for each decision a temporary condition bitmap, update the said bitmap to compute the final tv_bitmap index on each condition, and insert placeholders for the tv bitmap declaration and tv bitmap update intrinsics. The sort of transformation we’d aim for would be something like: ```rust a && (b || c) {let cond_bitmap_n : u32 = 0; let res: bool = { {if a {cond_bitmap_n += [offset_for_a]; true} else {false}} && ( {if b {cond_bitmap_n += [offset_for_b]; true} else {false}} || {if c {cond_bitmap_n += [offset_for_c]; true} else {false}} ) }; placeholder_tv_bitmap_update(..., decision_n_idx, cond_bitmap_n); res } ``` Where: - `cond_bitmap_n` is the condition bitmap for a given decision (hence the `_n` suffix) - `[offset_for_x]` are constants computed when establishing the BDD (see [Computing test vector indices](#Details-on-test-vector-indices)) - `placeholder_tv_bitmap_update` is a dummy function to be replaced by the right intrinsic during codegen. The example above is in source form instead of THIR form as the expanded AST is a bit verbose. Pros: - Instrumentation closer to source level constructs - No need for special handling of expressions not generating branches (constants, last condition in an assignment, etc) Cons: - Distinct instrumentation pass from block & branch instrumentation, this one happening first it might generate spurious blocks and/or branches - A lot more nodes to be generated in the IR compared to the MIR based instrumentation - Rewriting some of the THIR nodes can be a bit fragile and requires careful update of all the indices when replacing a node with a new one. ## Concerning Pattern Matching Pattern matching surely is of interest for MC/DC analysis, since it is a principal way of making decisions in Rust. This design proposal is pretty much that of the paper [Towards MC/DC of Rust](https://elib.dlr.de/219590/1/219590%20-%20toward-modified-condition-decision-coverage-of-rust.pdf), and informed by the discussions that happened on [Zulip's "What's MCDC in Rust"](https://rust-lang.zulipchat.com/#narrow/channel/445688-safety-critical-consortium/topic/What.27s.20MC.2FDC.20in.20Rust.3F/with/577541230). We define the notion of "condition" for pattern matching using pattern refutability, i.e. the fact that a given pattern may match all or only a subset of the possible values that can reach it. A few examples (the paper dives into the details): - the `_` pattern is a wildcard that matches everything and this is irrefutable by definition. - the `true` boolean pattern by itself is refutable, since it would not match a `false` value. - the `true | false` boolean pattern is irrefutable, since it will matches all boolean values. - the `true` pattern seen after a `false` pattern is irrefutable. - more generally, "composite" patterns' refutability depends on the refutability of its children. We define MC/DC for patterns as follows: a whole pattern (e.g. a match arm, or a `let pat = expr`) is a decision, and its refutable subpatterns are its conditions. Example: ```rust if let Some(true) = opt { ... } // Can be interpreted as a decision looking like if opt.is_some() && opt.0 { ... } ``` ### Single pattern vs. multi-pattern constructs While single-pattern constructs like the `if-let` or the `let-else` are relatively simple to cover the same way you would cover a standard `if <boolean>` expression, in a `match` expression refutability depends on the patterns seen before. For example here the `false` is considered irrefutable. ```rust match (a, b) { (true, Enum::X) => ... (false, Enum::X) => ... _ => ... } ``` ### Irrefutability by context The design choice we made reflects the fact that it is considered good practice to write explicitely pattern values that will always match, for example the previous example is equivalent to ```rust match (a, b) { (true, Enum::X) => ... (_, Enum::X) => ... _ => ... } ``` yet the first version is preferred by good practice. If we instrumented patterns one by one, we would consider the `false` pattern to be refutable, thus would be an uncoverable condition, leading MC/DC to guide the user to write the less good practice version. For now, we plan to support "naive" pattern instrumentation that ignores context, and save context-aware analysis for further enhancement. > Note: Pattern matching instrumentation will require heavy analysis both on the THIR and MIR sides, further justifying our choice to implement MC/DC instrumentation in the compiler itself. ## Notions yet to be addressed - `.and_then(...)` functions and friends for idiomatic types `Option<T>` and `Result<T>` - The `Option<T>` enum can be considered as a boolean type, with Some being true and None being false, and considering the various logical functions defined in Std as boolean operators. This requires special cases in the compiler though, as these are implemented as inlined functions. - Constants (both explicit and deduced at compile time) ## Glossary - **Decision**: In MCDC context, a decision is a boolean expression, featuring one or more conditions, associated with logical boolean operators (or, and, not). - DO-330 wording: *Decision – A Boolean expression composed of conditions and zero or more Boolean operators. If a condition appears more than once in a decision, each occurrence is a distinct condition.* - **Condition**: In MCDC context, a condition is a simple boolean expression. These can be “True” and “False” literals, boolean variable references, boolean-returning function calls, etc... These can be seen as leaves of an boolean expression AST. - DO-330 wording: *Condition – A Boolean expression containing no Boolean operators except for the unary operator, that is, NOT.* - **Test Vector**: For a given decision, a test vector is a possible input for it. It assigns a boolean value to each condition. - **CFG**: Control Flow Graph, typically the MIR representation in rustc - **BDD**: Binary Decision Diagram, subgraph of the CFG that represents the evaluation process of a decision. A BDD only has one root node (the first condition to be evaluated), and 2 leaf nodes (one for the true outcome, one for the false) # Discussion ## On MIR vs THIR etc Nadrieril: if I correctly understand, we need to know the shape of the condition graph of a given decision to know how to increment indices correctly. If so, then we need to operate on MIR, because the shape of the condition graph is computed by MIR lowering and we don't want to duplicate this logic, particularly for patterns. One approach I can imagine is that THIR->MIR would insert some markers that say "a decision starts at this block and ends at these other blocks". Then a MIR pass can scan the identified subgraphs and insert the right extra statements. ## On LLVM as a backend Félix Fischer: If we're going to be using LLVM as a backend for coverage analysis, then perhaps [Mull](https://github.com/mull-project/mull)'s experience could be useful. Mull is a library for doing Mutation Testing (which we could perhaps call a form of stochastic MC/DC) through LLVM. Last I studied it was a while ago, but I recall the developers saying that the mutants were generated at the branching points that LLVM made from the program.

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password
    or
    Sign in via Google Sign in via Facebook Sign in via X(Twitter) Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    By signing in, you agree to our terms of service.

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully