Stable MIR Design

Note: The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Introduction

Many static analysis tools rely on the internals of the Rust compiler to analyze Rust programs efficiently. Most of them analyze the mid-level IR (MIR) together with rustc's type system. MIR is a much simpler language than Rust, but much richer than LLVM IR.

In order to do that today, these tools use rustc's internal APIs to extend the compiler. These mechanisms only work for nightly toolchain distributions, and the API changes very often, which makes maintenance very costly.

Another issue is that the semantics of MIR are not well documented, which can affect the results of these analyses.

The goal of the Stable MIR project is to provide a stable interface to the Rust compiler that allows tool developers to build sophisticated analyses at a reduced maintenance cost, without compromising the compiler's development speed.

Tenets

  1. Enable tool developers to implement sophisticated analyses with low maintenance cost.
  2. Do not compromise the development and innovation speed of the Rust compiler.
  3. Once stable (v1.0+), releases must follow semantic versioning (https://semver.org/).
  4. Backward-compatible changes are preferred but not required.
  5. The Rust compiler should expose an interface to Stable MIR versions released 3 months prior to its release. It may support multiple Stable MIR versions at the same time to achieve this.
  6. The SMIR should be performant, with minimal memory and processing overhead compared to using the internal MIR.
  7. Unstable APIs should be developed under feature gates and shall be excluded from semantic versioning.

Project Milestones

This is a big endeavor for both compiler and tool developers. We anticipate that tool developers will need to make changes to their code to transition from internal APIs to stable ones. Whenever possible, the new APIs should mimic the current APIs to reduce the transition burden to tool developers.

For compiler developers, once an API has been incorporated into the Stable MIR, it shall be maintained following the tenets and the Stable MIR API guidelines.

We will follow a gradual transition to the new model.

Minimum Viable Product (Stable MIR v0.1)

The first milestone will be landing a version of Stable MIR that allows a MIR consumer (e.g. Kani, Miri, or a mini-miri demo) to switch its use of MIR data structures completely over to it (even if it still uses other unstable APIs). We envision the following requirements for this release:

  • Provide an API to emit the body of a function solely with stable MIR data structures.
  • Provide opaque objects that store enough information for users to leverage other parts of the compiler, e.g. to extract the type of an Operand.

For this release, there will be no changes to how tools load the rustc library or find the functions/statics they want to get the stable MIR for. Also, only the nightly channel will be supported, even for merely accessing the stable MIR data structures.

Only "runtime-optimized" MIR will be supported.

API Coverage and Stabilization (v0.2+)

Subsequent releases of the Stable MIR will focus on increasing the API coverage to include other parts of the compiler that are required to semantically analyze the program as well as APIs used to provide a similar UX to the compiler.

  • Provide an API to visit all items (static variables, constants, and functions) of every crate compiled (current crate + dependencies).
  • Support emitting stable MIR from analysis MIR.

Long term goal (v1.0+)

The first stable release (v1.0) shall only be done once the SMIR has sufficient API coverage and the APIs are stable.

The following changes are orthogonal to the stabilization effort but shall be implemented to improve the overall experience, and they are in the scope of this project:

  • Completely decouple tools from rustc's internal APIs, including invoking rustc's driver, error reporting, and type queries.
  • Support different MIR dialects.
  • Make the interface usable with all toolchain channels (stable, beta, nightly).
  • Provide a server/client architecture that offers a more robust and interactive experience.
  • Allow Stable MIR to be used as an input to the compiler.
  • Possible future extension: share the MIR data structures with the stable MIR data structures via https://github.com/rust-lang/compiler-team/issues/233

MVP Design

The stable-mir will follow a similar approach to proc-macro2. Its implementation will be broken down into two main crates:

  • stable_mir: Public crate, to be published on crates.io, which will contain the stable data structures as well as proxy APIs to make calls to the compiler.
  • rustc_smir: The compiler crate that will translate from internal MIR to SMIR. This crate will also implement APIs that will be invoked by stable-mir to query the compiler for more information.

This will help tools communicate with the Rust compiler via stable APIs. Tools will depend on the stable_mir crate, which will invoke the compiler using APIs defined in rustc_smir. I.e.:

   ┌───────────────────────────────┐           ┌───────────────────────────────┐
   │   External Tool  ┌──────────┐ │           │ ┌──────────┐   Rust Compiler  │
   │                  │          │ │           │ │          │                  │
   │                  │stable_mir│ │           │ │rustc_smir│                  │
   │                  │          │ ├──────────►│ │          │                  │
   │                  │          │ │◄──────────┤ │          │                  │
   │                  │          │ │           │ │          │                  │
   │                  │          │ │           │ │          │                  │
   │                  └──────────┘ │           │ └──────────┘                  │
   └───────────────────────────────┘           └───────────────────────────────┘
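
The split shown in the diagram can be sketched roughly as follows. This is only an illustration under stated assumptions: the two crates are modeled as modules in a single file, and the names CompilerInterface, Body, FakeCompiler, and all_bodies are hypothetical and are not taken from the actual stable_mir or rustc_smir crates.

```rust
// Hypothetical sketch: in the design, `stable_mir` and `rustc_smir` are
// separate crates; they are modeled as modules here so the example compiles.
mod stable_mir {
    /// Stable, owned description of a function body (heavily simplified).
    #[derive(Clone, Debug, PartialEq)]
    pub struct Body {
        pub name: String,
        pub block_count: usize,
    }

    /// Interface the compiler side must implement (assumed name).
    pub trait CompilerInterface {
        fn function_names(&self) -> Vec<String>;
        fn body_of(&self, name: &str) -> Option<Body>;
    }

    /// Proxy API called by a tool; it only forwards to the compiler side.
    pub fn all_bodies(cx: &dyn CompilerInterface) -> Vec<Body> {
        cx.function_names()
            .iter()
            .filter_map(|name| cx.body_of(name))
            .collect()
    }
}

mod rustc_smir {
    use super::stable_mir::{Body, CompilerInterface};

    /// Stand-in for internal compiler state: (function name, basic blocks).
    pub struct FakeCompiler {
        pub internal_bodies: Vec<(&'static str, usize)>,
    }

    impl CompilerInterface for FakeCompiler {
        fn function_names(&self) -> Vec<String> {
            self.internal_bodies.iter().map(|(n, _)| n.to_string()).collect()
        }

        /// Translates the internal representation into the stable one.
        fn body_of(&self, name: &str) -> Option<Body> {
            self.internal_bodies
                .iter()
                .find(|(n, _)| *n == name)
                .map(|(n, blocks)| Body { name: n.to_string(), block_count: *blocks })
        }
    }
}

fn main() {
    let compiler = rustc_smir::FakeCompiler {
        internal_bodies: vec![("main", 3), ("helper", 1)],
    };
    // The tool only ever sees `stable_mir` types.
    for body in stable_mir::all_bodies(&compiler) {
        println!("{} has {} basic blocks", body.name, body.block_count);
    }
}
```

In this sketch the tool never names an internal type: it only calls the proxy function and receives owned, stable values back.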

A compile-time check will be added to trigger an error if stable_mir is not compatible with the current compiler.
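
One possible shape for such a check is a constant assertion that fails the build when the versions do not match. The sketch below is an assumption about how it could look, not the project's actual mechanism; STABLE_MIR_VERSION, SUPPORTED_BY_COMPILER, and is_supported are hypothetical names.

```rust
/// Version this (hypothetical) stable_mir build was written against.
const STABLE_MIR_VERSION: (u32, u32) = (0, 1);

/// Versions the (hypothetical) compiler side is able to serve.
const SUPPORTED_BY_COMPILER: [(u32, u32); 2] = [(0, 1), (0, 2)];

/// Returns true if `version` appears in `supported`.
/// Written with a `while` loop so it can run in a const context.
const fn is_supported(version: (u32, u32), supported: &[(u32, u32)]) -> bool {
    let mut i = 0;
    while i < supported.len() {
        if supported[i].0 == version.0 && supported[i].1 == version.1 {
            return true;
        }
        i += 1;
    }
    false
}

// Evaluated at compile time: if the assertion is false, the build fails
// with the message below instead of producing a binary.
const _: () = assert!(
    is_supported(STABLE_MIR_VERSION, &SUPPORTED_BY_COMPILER),
    "stable_mir is not compatible with this compiler"
);

fn main() {
    println!(
        "stable_mir v{}.{} is supported",
        STABLE_MIR_VERSION.0, STABLE_MIR_VERSION.1
    );
}
```

Changing STABLE_MIR_VERSION to a value that is not in the supported list turns the mismatch into a compilation error rather than a runtime failure.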

Multiple version support

In order to support multiple versions of stable_mir, rustc_smir will contain modules that can be mapped to different versions of stable_mir.

For example, stable_mir v0.1 will invoke APIs from rustc_smir::version_0_1::*, while v0.2 will invoke APIs from rustc_smir::version_0_2::*.
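
A minimal sketch of this layout is shown below, assuming a single file in which rustc_smir is a module. Only the module names version_0_1 and version_0_2 come from the text; the Body types, body_of, and internal_body are hypothetical placeholders.

```rust
mod rustc_smir {
    /// Stand-in for internal compiler data, which is free to change.
    /// Returns (basic block count, argument count) for an item.
    fn internal_body(_item: &str) -> (usize, usize) {
        (3, 2)
    }

    /// API surface consumed by stable_mir v0.1.
    pub mod version_0_1 {
        #[derive(Debug, PartialEq)]
        pub struct Body {
            pub block_count: usize,
        }

        pub fn body_of(item: &str) -> Body {
            let (block_count, _) = super::internal_body(item);
            Body { block_count }
        }
    }

    /// API surface consumed by stable_mir v0.2, which exposes more data.
    pub mod version_0_2 {
        #[derive(Debug, PartialEq)]
        pub struct Body {
            pub block_count: usize,
            pub arg_count: usize,
        }

        pub fn body_of(item: &str) -> Body {
            let (block_count, arg_count) = super::internal_body(item);
            Body { block_count, arg_count }
        }
    }
}

fn main() {
    // Each stable_mir release would bind to exactly one of these modules.
    let old = rustc_smir::version_0_1::body_of("main");
    let new = rustc_smir::version_0_2::body_of("main");
    println!("{:?} / {:?}", old, new);
}
```

Both modules translate from the same internal data, so the compiler can evolve its internals while each versioned module keeps its own stable shape.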

What about forward compatibility? See Open Questions section.

MIR / SMIR interoperability

Until we have enough coverage, users will still rely on internal APIs to initialize the Stable MIR constructs and to retrieve further information that hasn't been covered by stable APIs. E.g.: Getting the type of an operand.

There are a few ways that this can be implemented, and we still need to assess them before settling on one. A few options are:

  1. Some Stable MIR APIs will still handle internal compiler data structures, such as TyCtxt and Ty. These types would be re-exported using an unstable or internal module.
  2. Stable MIR constructs can be converted to internal ones.
  3. Expose internal compiler APIs that support Stable MIR constructs.

API Guidelines

MIR Datatypes

  • No interning: everything uses Box<T> instead of &'tcx T.
    • Types are opaque handles (struct Ty(String)) that, for now, only offer Display impls and no other API.
    • We'll quickly iterate to struct Ty(usize), with the API keeping a map from which it can recover the Ty<'tcx> and then query it for information (e.g. size, layout).
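
The Ty(usize) handle idea can be sketched as follows. This is an illustration only: InternalTy stands in for the compiler's Ty<'tcx>, and Tables, register, size_of, and name_of are hypothetical names rather than the actual API.

```rust
use std::collections::HashMap;

/// Opaque handle given to tools; the index has no meaning on its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Ty(usize);

/// Stand-in for the compiler's internal `Ty<'tcx>` (hypothetical).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct InternalTy {
    pub name: &'static str,
    pub size: u64,
}

/// Map kept on the API side to get from a handle back to the internal type.
#[derive(Default)]
pub struct Tables {
    types: Vec<InternalTy>,
    index: HashMap<InternalTy, usize>,
}

impl Tables {
    /// Returns the handle for `ty`, creating one if it is not known yet.
    pub fn register(&mut self, ty: InternalTy) -> Ty {
        if let Some(&idx) = self.index.get(&ty) {
            return Ty(idx);
        }
        let idx = self.types.len();
        self.types.push(ty.clone());
        self.index.insert(ty, idx);
        Ty(idx)
    }

    /// Queries answered by looking up the internal type behind the handle.
    pub fn size_of(&self, ty: Ty) -> u64 {
        self.types[ty.0].size
    }

    pub fn name_of(&self, ty: Ty) -> &str {
        self.types[ty.0].name
    }
}

fn main() {
    let mut tables = Tables::default();
    let u32_ty = tables.register(InternalTy { name: "u32", size: 4 });
    let u64_ty = tables.register(InternalTy { name: "u64", size: 8 });
    println!("{} is {} bytes", tables.name_of(u32_ty), tables.size_of(u32_ty));
    println!("{} is {} bytes", tables.name_of(u64_ty), tables.size_of(u64_ty));
}
```

Because the handle is a plain index, it is cheap to copy and carries no lifetime, while every query still goes through the side that owns the internal types.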

Type Datatypes

  • Re-use rustc_type_ir for our types. It is independent of TyCtxt but still exposes the entirety of TyKind and similar.

Appendix

Open Questions

  1. Should MIR dialects coverage be a requirement for v1.0?
  2. Should we support forward compatibility? I.e.: Using a new version of stable-mir with an older version of the compiler? Do we have a use case in mind?
  3. What traits should StableMIR types implement? For now, we've been adding Clone and Debug. We should come up with a guideline for when we can add more traits. E.g.:
    • Hash
    • Copy
    • PartialEq / Eq
    • PartialOrd / Ord
    • Serialize / Deserialize (serde): This one could be guarded by a feature.

FAQ

Requirements

The table below contains a more detailed list of requirements and in which version of stable-mir they should be met. For the target version, we use the following values:

  • v0.1: represents the minimum viable product.
  • v0.2+: represents subsequent releases of stable-mir that increase the API coverage.
  • v1.0+: represents the long term goal.

| #  | Requirements               | Description                                                                                              | Target Version* |
|----|----------------------------|----------------------------------------------------------------------------------------------------------|-----------------|
| 0  | Re-use rustc_smir          | Remove all unstable dependencies of rustc_smir.                                                          | v0.1            |
| 1  | MIR Datatypes              | Provide APIs that correspond to the current MIR datatypes.                                               | v0.1            |
| 2  | Support API                | Provide an API to retrieve code location, attributes, and debug information.                             | v0.2+           |
| 3  | Message API                | Provide an API to generate user-friendly and JSON messages.                                              | v0.2+           |
| 4  | Type API                   | Provide an API to retrieve type information (including layout).                                          | v0.2+           |
| 5  | Semantically versioned API | New versions of stable-mir follow semantic versioning (https://semver.org/).                             | v1.0            |
| 6  | MIR Dialects               | Support multiple MIR dialects.                                                                           | v1.0+           |
| 8  | Stable channel             | stable-mir can be used with all toolchain channels, including stable.                                    | v1.0+           |
| 9  | Server/Client              | The stable-mir APIs are implemented over IPC. Tools no longer need to link against the Rust compiler library. | v1.0+      |
| 10 | Load SMIR                  | The compiler is able to load SMIR definitions and compile them to the target platform.                   | v1.0+           |