Coupled borrows: Modelling approximate aliasing to support modern Rust features for Prusti

# Coupled borrows: Modelling approximate aliasing to support modern Rust features for Prusti - Author: _Markus de Medeiros_ - Last modified: April 2023 # §0. Proposal (for UBC thesis) [proposal]: #proposal Prusti is a modular static analysis framework for Rust built atop the Viper verification engine. Central to the Prusti design philosophy is its zero-cost integration: by leveraging the strong safety guarantees of Rust’s type system Prusti can synthesize a core proof of a Rust program’s memory safety without any intervention from a verification expert. On its own, Prusti’s automatically generated core proof can verify important properties of Rust code such as panic-freedom, and Prusti users can optionally add additional specifications at function boundaries to extend their proof with stronger correctness facts. Prusti’s synthesis is designed around an intermediate representation of a Rust program called the place capability summary (PCS). However, the production version of Prusti does not explicitly compute the PCS, and this has led to challenges in synthesizing proofs for several common features of safe Rust. In the summer of 2022 we designed an intermediate language which allowed a refinement of the PCS to be computed in many simple cases. Equipped with our basic framework, we are now well positioned to model features of Rust's type system which Prusti does not currently support, and to tune aspects of the theory towards Prusti’s future development. This project aims to design sound representations of Rust features in the PCS as well as efficient algorithms for their construction. To this end we will first model the internal state of the Polonius^[1] borrow checker with affine permissions, in order to capture relationships between aliased borrows at the same level of granularity as the compiler. Thereafter we hope extend our PCS model to include - Reborrows inside Loops, which are unsupported in Prusti. - Multiple lifetime parameters in function calls, which are unsupported in Prusti. Some other Prusti features which this model may unblock include - Shared borrows, which are partly supported in Prusti. - Nested references, which are unsupported in Prusti and whose theory is not yet clear. We also hope to prove algorithmic properties of the PCS construction- It is believed that inferring a PCS for a loop body is not only decidable (untrue in languages such as C) but can be computed as a join of at most two iterations of the loop body. This proposal builds on previous work done as part of a USRA in the summer of 2022. The main results of this USRA are: - A permissions-oriented operational semantics for Rust’s intermediate language MIR. - Algorithms for elementary PCS inference which interface using real compiler data. # §1. Summary [summary]: #summary Rust programs satisfy the Polonius borrow checker when, among other criteria, borrow's lifetime is never ended when it possibly aliases a live place. Our program represents the possible live aliases of a borrow in a directed acyclic hypergraph we call the _coupling DAG_. To compute the possible aliases of a borrow, we first reconstruct semantic meaning for each instance of Polonius's ``subset_base`` relation (the compiler's representation of all relationships between borrows) at all program points. Our interpretation of the ``subset_base`` facts then drive a fixed point analysis to compute a trace of the coupling DAG through the program, which encodes aliasing relations between permissions at the same level of precision as Polonius. The coupling graph is lowered (simplified, and made more precise to accommodate the Viper backend) to obtain the _reborrowing DAG_ as described in the summer of 2022. We note that coupled borrows (borrows which are possibly aliases) are free to expire only when all of their associated loans exit the coupling graph; thus borrow expiries are regulated by the approximate Polonius analysis, but are executed with encoding-specific information at the level of the MicroMir. # §2. Motivation [motivation]: #motivation This project has three primary goals: 1. Add support for more advanced features of Rust in Prusti's core proof inference, such as shared borrows and magic wand (loop invariant) inference. 2. Implement the PCS in Prusti in order to clarify Prusti's dependencies on the compiler. 3. Provide a common framework for other tools to view borrow checking facts from a permissions perspective. # §3. Guide-level explanation (Draft) [guide-level-explanation]: #guide-level-explanation ## §3.0. Preliminaries This section summarizes the work we completed in the summer of 2020, and the compiler background for the remainder of our analysis. ### §3.0.0. (omitted; thesis only?) The Rust compiler and MIR terminology - Places - MovePaths - Liveness - ### §3.0.1. Fractional Permissions (todo: integrate https://hackmd.io/@be6mqvt0QwS_i18WLiSLBA/S1n9vY6Jn/edit) Our model extends the work by Astrauskas et al. [1] on modelling Rust with fractional linear resources. :::info **Definition** (Permissions). A **permission** for a piece of data `d` alongside one of - `(e d)`: Exclusive - `(s d)`: Shared A permission can also be **tagged** with information _t_, giving us three additional kinds of permissions: - `(e@t d)`: Tagged Exclsuive - `(u@t d)`: Tagged Uninit - `(s@t d)`: Tagged Shared Permissions for Rust places encode the read and write permissions to that place: | Permission | Write? | Read? | Can Alias? | | - | - | - | - | | `e` | :heavy_check_mark: | :heavy_check_mark: | None | | `s` | :x: | :heavy_check_mark: | `s` | | `u` | :heavy_check_mark: | :x: | None | | `e@l` | :x: | :x: | None | | `s@l` | :x: | :x: | `s@l` | | `u@l` | :x: | :x: | None | fixme: I've included `u@l` for completeness, though I'm yet unconvinced that it's meaningful in safe Rust (no tagged places are ever in the free PCS, and unlike `e@l` it cannot block loan expries as it does not correspond to a live variable). ::: As in the Prusti paper, we interpret permissions as affine resources in a fractional linear logic. :::info **Definition** (PCS). A **PCS** defined as a separating conjunction of permissions, where - ``e`` and ``u`` have permission of value 1 for their underlying resource - ``s`` has permission of value ε for its underlying resource - ``(p@x d) * (p@y d)`` for all permissions ``p``, resources ``d``, and tags ``x, y`` when ``x != y`` - ``(p@x d) * (p d)`` for all permissions ``p``, resources ``d``, and tags ``x``. (fixme: is this right and/or necessary to the current document?) ::: Our final major deviation from the permissions model is the introduction of subtyping rules :::info **Definition** (Permissions Subtyping `<:`). - ``e p <: u p`` and ``e@t p <: u@t p``. - ``e p <: s p`` and ``e@t p <: s@t p``. ::: Intuitively, ``e`` represents exclusive (read and write) permissions to a place, ``s`` represents shared permissions to a place, and ``u`` represents write-only permissions (as in, for places which are not yet initialized). The first subtype relationship is used to model the restriction that programs can only read definitely initialized places, and the second is used to model two-phase borrows as a downcast. ### §3.0.2. Owned Semantics and the Free PCS Our prior work assigns a Hoare triple of permissions to each statement in the MIR, where the underlying data is a MIR place, and where tags are Polonius ``point``s. At each Polonius ``Start`` location, we assert that the PCS must contain the permissions associated with the statement's precondition. Those permissions are removed from the PCS and replaced by the permissions postcondition at the ``Mid`` location. The same MIR place can be represented in many degrees of granularity. For example, exclusive permission to the place ``x`` (where ``x`` is a reference to a struct with two fields ``f`` and ``g``) can be represented as ``{e x}``, ``{e *x}``, and ``{e *x.f, e *x.g}``. As in the Prusti paper, we say a place can be **unpacked** (conversely, **packed**) to a set of places depending on type place's type. Two PCS's which can be transformed into each other by a sequence of ``pack`` and ``unpack`` statements are said to be **equivalent up to repacking**, and this is an equivalence relation on the collection of PCS's. We call the PCS trace of a program up to repacking the **Free PCS** (as in, a PCS which is free to repack itself). The central problem solved in our prior work is showing that the operational semantics we assign to the MIR can construct a PCS from a Free PCS by interleaving ``pack``, ``unpack``, and subtype upcasts between statements in the MIR. The Free PCS can also be queried to repack itself to include any particular set of permissions (with or without type casting) provided that such a repacking is possible. Alone, the operational semantics and trace free PCS is enough to construct a PCS trace for any Rust program which does not contain borrows. ### §3.0.3. Polonius Polonius is a borrow checker which computes non-lexical lifetimes by tracking aliasing between MIR places. The Rust compiler emits a collection of _base_ or _input facts_ which are iterated to a a fixed point and emitted as _output facts_. The relevant Polonius atoms are as follows - *point*: a ``Start`` and ``Mid`` associated with each MIR location. - *loan*: a unique identifier for each syntactic borrow statement. - *origin*: an identifier corresponding to a linear resource up to repacking. The linear resources are either MIR places, or a temporary value associated to each loan. Polonius interprets each origin as a set of loans, as tracked by the ``origin_contains_loan_at`` fact. The relevant Polonius facts include: - ``loan_issued_at``: a loan is issued into an origin, at a point. - ``subset_base``: the loans in one origin flow into another origin, at a point. - ``origin_live_at``: the origin may be used in the future from this point. - ``loan_invalidated_at``: the loan cannot be contained in any live origin at this point. - ``loan_killed_at``: this loan can be removed from all origins which contain it. Invalidations of this loan are now allowed. ## §3.1. Translation of Polonius Facts Our analysis is an abstract interpretation Polonius facts and MIR statements. The first step is to reconstruct Polonius ### §3.1.0. Origin Left Hand Sides While each origin is associated to either a function argument, borrow temporary, or MIR place up to repacking, Polonius does not expose this information. :::info **Definition** (Resource). A **resource** is one of - A Mir ``Place``, representing exclusive permission to that place, - A _Borrow Temporary_, equipped with a Polonius ``BorrowIndex``. - An _universal resource_ equipped with a universal origin. ::: Function arguments are not yet implemented, so we will ignore the last case for the time being. :::info **Definition** (Normalization). A resource ``r`` is **maximally packed** if - `r` is a borrow temporary, or - `r` is a Local, or - `unpack(parent(r)) != { r }`. All linear resources can be **normalized** to a maximally packed place, by repeatedly applying ``pack`` statements. ::: We posit that each origin can be assigned a unique, maximally packed resource. Note that this does not mention permissions or tagging; the resources associated to an origin do not operate at that level of granularity. ### §3.1.1. Structural Edges Our analysis also relies on a characterization of ``subset_base`` facts. :::info **Definition** (Structural Edge): A **structural edge** is one of - `Move(from: Resource, to: Resource)` - `Reborrow(from: Resource, to: Resource)` - `LoanIssue(Loan, from: Resource)` _Note: This definition comes from the prototype; expect more variants to be added._ ::: (The reason for the name _edge_ will be more clear once we explain the other kinds of edges and the _coupling graph_) Structural edges make up the operational description of the aliasing between borrows in Polonius. Our translation emits one structural edge for each ``subset_base`` fact, and a more detailed explanation of that inference can be found in section 4. As a motivating example, consider the reborrow: ```rust /* Location: Start(p) */ let x = &mut (*y) // loan ID: bw0 ``` where the prior step of our analysis assigned the following origin left-hand-sides: ``` #1: {e x} #2: {e y} #3: {e bw0} ``` We can translate the Polonius facts the compiler emits at this location into an operational description of this reborrow: ``` loan_issued_at(#3, bw0, p) // new loan is issued into #3 subset_base(#2, #3, p) // loans flow from y into the new borrow // Translation: bw0 reborrows from a place in the // left-hand side of #2. ~> Reborrow((e bw0), (e y)) subset_base(#3, #1, p) // Loans in new borrow flow into x // Translation: bw0 is assigned to a place in the // left hand side of #3. ~> Move((e bw0), (e x)) ``` As another example, the here is how the model constructs a borrow from owned data ```rust let x: &'1 mut T = &'3 mut z ``` Whose facts can be explained as the edges ``` loan_issued_at(#3, bw0, p) // new loan is issued into #3 // no other subsets of #3 // Translation: bw0 borrows owned data into // left-hand side of #2. ~> LoanIssue((e bw0), (e z)) subset_base(#3, #1, p) // Loans in new borrow flow into x // Translation: bw0 is assigned to a place in the // left hand side of #3. ~> Move((e bw0), (e x)) ``` These examples are deliberately simple; the fact that a loan was issued, was a reborrow, and got moved into `x` could have been deduced from MIR syntax alone. However, by ensuring that our analysis can explain all ``subset_base`` facts we ensure that we do not approximate any aliasing differently to Polonius. Assigning structs with borrows is one example where it is essential that we explain each subset fact. (fixme: include an example). ### §3.1.2. Packing requirements Recall that Polonius origins are associated to a place only up to repacking. We will see that each origin's _left hand side_ is a Free PCS: a linear conjunction of resources which can be repacked to obtain any place in it's memory footprint. Some ``subset_base`` facts require an origin's left hand side to contain a particular place. For example, the reborrow ``let x = &mut (*y)`` requires the reborrowed-from origin to contain ``*y`` whereas the borrow assignment ``let x = y`` requires the moved-from origin to contain ``y``. To ensure that our analysis meets these requirements, we encode them as a ``packing_constraint`` fact wherever needed (see section 4 for details). ### §3.1.2. Other Facts Out analysis also requires some other Polonius facts, which can be directly lifted from the compiler output: - ``origin_contains_loan_at`` - ``loan_killed_at`` ## §3.2. Analysis: The Coupling Graph Once we have translated all of the Polonius input facts, we can move on to interpreting the output. ### §3.2.0. Coupled Borrows _Coupling_ is a term we use to describe the phenomenon of approximate aliasing lengthening the lifetimes of borrows. :::info **Definition** (Coupling). Two borrows are **coupled** when the borrow checker cannot prove that they are not aliased by the same place at compile time. ::: Since the compiler only expires borrows when their aliases are definitely dead each origin's liveness may extend the lifetimes of several coupled borrows, despite only aliasing one borrow at runtime. ### §3.2.1. The Coupling Graph The purpose of the coupling graph is to explain the ``origin_contains_loan_at`` fact in terms of resources. :::info **Definition** (The model-level, or mathematical Coupling Graph). The **Coupling Graph** is a directed, acyclic, hypergraph whose vertices are resources and whose edges are annotated with - Any structural edge, - `Coupled(pre: from: Set<Resource>, loans: Set<Loan>, post: Set<Resource>)`, - ``Pack(from: Set<Resource>, to: Resource)``, - ``Unpack(from: Resource, to: Set<Resource>)``. The edges in the coupling graph are related to their annotations. In particular | Edge annotation | From vertices | To vertices | |-----------------|---------------|-------------| | ``Move(from, to)`` | ``{ from }`` | ``{ to }`` | | ``Reborrow(from, to)`` | ``{ from }`` | ``{ to }`` | | ``LoanIssue(loan, from)`` | ``{ from }`` | ``{ e loan }`` | | ``Pack(from, to)`` | ``from`` | ``{ to }`` | | ``Unpack(from, to)`` | ``{ from }`` | ``to`` | | ``Coupled(pre, _, post)`` | ``pre`` | ``post`` | ::: The coupling graph in the form above should be read a mathematical object, and is not implemented that way in code. This is because our collected facts define a collection of _subgraphs_ of the coupling graph; the coupling graph itself is the smallest graph which contains all of the subsets. In order to maintain this relationship we implement the graph as follows: :::info **Definition** (The Coupling Graph, in implementation). The **Coupling Graph** is a map from _origins_ to _sets of edges_. ::: Edges in the coupling graph implementation are for the most part represented the same as in the mathematical coupling graph. The exception is coupled edges: which are implemented as several edges split based on left-hand side. :::info **Definition** (Coupling graph edges). The edges in the implemented coupling graph are one of - Any model-level coupling graph edge except ``Coupled`` - ``Opaque(lhs: Set<Resource>, coupled: CoupledBorrowIndex)``. Each opaque edge with the same ``CoupledBorrowIndex`` refers to the same ``coupled`` edge in the model-level graph, whose left-hand side is a union of the ``lhs`` for each opaque edge. The right-hand side of an opaque edge is the right-hand side of ``coupled``. The analysis keeps track of the edges associated to each ``CoupledBorrowIndex`` in a separate data structure alongside the graph itself. ::: As a concrete example, the coupled edge ``` cb1 = Coupled({e x, e y}, {bw0, bw1}, {e w, e z}) ``` is represented as two separate edges belonging to two separate origins ``` #1: ..., Opaque({e x}, cb1), ... #2: ..., Opaque({e y}, cb1), ... where #1 is the origin with left-hand-side x, and #2 is the origin with left-hand-side y. ``` An edge is allowed to expire only when all it's instances are eliminated from the coupling graph; representing coupled edges with several edges is a simple and compositional way to ensure ``cb1`` expires only when origins `#1` and `#2` are both dead. Each edge has a pre- and post-condition corresponding to the to- or from-vertices outlined above (with the exception of ``Coupled(pre, _)`` whose precondition is ``pre``). Because the coupling graph is acyclic, and the order of edges in an origin is uniquely defined, we can derive a pre- and post-condition for expriring an entire origin: :::info **Definition** (Origin signature). An origin $o$ which has edge set $S$ has a pre- and post-condition: $$ \textrm{pre}(o) := \left(\cup_{e \in S} \; \textrm{pre}(e)\right) - \left(\cup_{e \in S} \; \textrm{post}(e)\right) \\ \textrm{post}(o) := \left(\cup_{e \in S} \; \textrm{post}(e)\right) - \left(\cup_{e \in S} \; \textrm{pre}(e)\right) $$ The Hoare triple ``(pre (o)) o (post (o))`` is called the **signature** of the origin, and corresponds to the permissions required to expire that entire origin at once. ::: For the remainder of section 3, the _coupling graph_ will refer to the implementation of the coupling graph unless otherwise specified. ### §3.2.2. Straight Line Interpretation with the Coupling Graph First, we describe how the coupling graph is used on sections of code without branches or joins. #### §3.2.2.0. Interfacing with the Free PCS Modifications of the coupling graph can require that the Free PCS inhale or exhale certain permissions at each program point. The interleaving between the Free PCS and the coupling graph is governed by two principles. :::info **Principle 1** (Origin / Free PCS coherence). The untagged places in the left-hand sides of origins in the coupling graph must be represented exactly in the free PCS. - When the free PCS repacks a reference-typed place, the corresponding edges must be added to the coupling graph. - When the coupling graph repacks the left-hand side of an origin (ie. due to ``origin_packing_at`` facts) the free PCS must also do the same repacking. ::: The reborrowing graph changes in other ways as well, as governed by the ``subset_base`` facts and ``origin_live_at`` facts. For these situations: :::info **Principle 2** (Change in Vertices). The coupling graph only requests a resource from the free PCS (via a precondition) when it is not already a vertex in the graph. The coupling graph releases a resource to the free PCS (via a postcondition) only when that resource is no longer a vertex in the graph. ::: For example, the expiry of an origin with signature ``{e x} --* {e y}`` exhales ``e x`` from the free PCS and inhales ``e y``, presuming those vertices appear nowhere else in the coupling graph. If either ``e x`` or ``e y`` appear elsewhere in the coupling graph, we do not apply those inhales or exhales. The operational semantics for owned places is governed by the rules we outlined last summer. ##### Shared Borrows _Note: The following section has not been prototyped._ Note that the coupling graph does not make any special delineation between vertices representing shared and exclusive permissions. Besides coupling, this nonlinearity can be used to model shared borrows: ```rust let z: T = ...; // PCS: { e z } = { s z, u z } // Graph: [] let x: &'1 T = &z; // PCS: { s x, s z } // Graph : [ #1: { s x } --* { u z }] let y: &'2 T = &z; // PCS: { s x, s y, s z } // Graph : [ #1: { s x } --* { u z }; #2: { s y } --* { u z }] drop(x); // PCS: { s y, s z } // Graph : [{ s y } --* { u z }] drop(y); // PCS: { u z, s z } = { e z } // Graph : [] ``` #### §3.2.2.1. Interfacing with Backends Several backends (principally, Prusti) interpret the removal of edges from the coupling graph by emitting a sequence of annotations in their own language. Those annotations may include expiring borrows, moving references, or applying magic wands. When a backend interprets an origin expiry in the coupling graph, a similar principle to the free PCS applies: :::info **Guideline:** A backend should only emit annotations related to an edge expiry when that edge is removed from the _mathematical_ coupling graph. A edge in the _mathematical_ coupling graph possibly corresponds to several edges in the _implemented_ coupling graph. ::: Following this guideline is most essential for coupled edges, whose representation in the implemented coupling graph consists of several distinct edges. This guideline ensures a borrow is not expired while it possibly is blocked by a live place. ### §3.2.3. Coupling Inference, and Coupling at Joins Polonius makes a simple approximation at all join points in the control flow graph: live origins contain all loans they could have possibly contained up to the join. Our simplest model interprets all join points similarly: taking a complete approximation at every join point (section **§3.2.5.** will elaborate on this). :::info **Procedure:** (Join point coupling). To compute the **join** of a family of coupling graphs $G = \{ G_1, \ldots, G_n\}$ is a coupling graph $H$, as constructed by the following: 1. Invariant: the set of live origins in each $G_i$ will be the same. Denote the set of live origins as $O$. 2. Invariant: the set of loans in each graph is the same as well. Denote this set $B$. 3. Construct the finest partition $P = \{P_1, \ldots, P_k\}$ of $O$ such that for all $o_a, o_b \in O$, and $G_a, G_b \in G$, if any loan $l$ is contained in $o_a$ in $G_a$ and $o_b$ in $G_b$ then $o_a$ and $o_b$ are in the same partite set. - When we approximate all join points, this partition should be correspond to the ``origin_contains_loan_at`` fact after the join. However, as we will see in section **§3.2.5.**, we will not always want the coupling algorithm to refer to facts at particular locations. 4. For each $P_i \in P$: 4.1. For each origin $o \in P_i$, find a common repacking $\textrm{lhs}(o)$ for the origin in each input graph. Emit annotations to the free PCS's before the join to repack to these left-hand sides. Let $L_i = \cup_{o' \in P_i} \; \textrm{lhs}(o')$. 4.2. Collect the set $B_i$ of loans contained in any origin of $P_i$ in any graph of $G$. 4.3. Invariant: The family $\{B_i \,|\, P_i \in P\}$ is a partition of $B$. 4.4. Collect the union of all right-hand sides for origins $o \in P_i$ over graphs in $G$ as $R_i$. 4.5. Construct the following origins in $H$: ``` cbi = Coupled(Li, Bi, Ri) for each o' in Pi: H(o') = { Opaque(lhs(o'), cbi) } ``` 4.6. Emit annotations (eg. a magic wand) to fulfill the coupled borrows in each input graph. A minor detail: the ``CoupledBorrowIndex``es may need to be renamed in $H$ if their indices clash between graphs in $G$. ::: The above algorithm mentions _emitting annotations_ at several points. These should be considered as instructions for various backends to do only after a fixed point of coupling graphs is found; the exact location these annotations can depend on the backend itself. The model can also be parameterized by how the model handles flow-sensitivity, which may be useful for backends which require different levels of approximation. The algorithm is written under the presumption that all origins should be coupled, however at the start of step 4 we can also choose not to couple together partitions under certain conditions. One obvious situation in which not to couple not to do so is if $P_i = \{o'\}$ and $G_j(o')$ is the same for all $G_j \in G$. Flow-sensitive coupling is discussed in **§3.2.5.** Choosing which partitions to create coupled edges for depends on the level of additional detail the coupling graph is expected to provide a backend and can be changed without breaking the rest of the model. ### §3.2.4. Coupling graph invariants The coupling graph has several invariants related to Polonius facts: - **Subset Invariant**: Each origin is associated to a subset of the coupling graph at each point; if ``subset(O1, O2, P)``, then the edges associated with ``O1`` are a subset of the edges associated with ``O2`` at ``P``. - **Coherent Origins Invariant**: The leaves of any origin's graph must always be equivalent to the origin's left hand side, up to repacking and unkilling. - **Free PCS Invariant**: All unkilled vertices in the coupling graph are contained in the free PCS, and are packed the same. All borrow-typed places in the free PCS are unkilled vertices in the coupling graph. - **Eager Expiry Invariant**: The coupling graph is completely covered by the graphs associated with each live origin at that point. - **Reconstruction Invariant**: Each origin's set of loans can be unambiguously reconstructed into a subgraph of the coupling graph. - (fixme) this invariant is not actually necessary, if we reimplement the graph to keep a tree instead of a set of edges. - **Loan liveness Invariant**: At the fixed point (computed below), the set of loans ``{ L | origin_contains_loan_at(O, L, P)}`` equals the set of loans ``O`` contains in the graph. An origin contains a loan when the subgraph associated with that loan contains - a loan node for ``L``, or - a coupled loan edge whose annotation includes ``L``. Additionally, our analysis has the following desirable property: - **Convergence**: If a location ``l`` is dominated by $n$ loop heads whose in-degrees are $d_1, d_2, \ldots, d_n$, then the coupling graph can be computed with at most $1 + \prod_{i=1}^n d_i$ traversals of ``l``. This is due to a form of idempotency and monotonicity in the approximation, but I have not yet formalized that. ### §3.2.5. Subgraph Sharing and Flow Dependence The above representation of the coupling graph is an sound representation of the Polonius facts at each point in terms of permissions. However it has several drawbacks, including - approximations at all join points, even when flow-dependent information is statically knowable (for example at the join of an ``if`` statement) - excessive duplication of subset edges in their superset origins. Both are drawbacks that Polonius is facing, and because of our model's close relationship to Polonius we can alter the model to solve both at once. Interestingly, it turns out that these issues are also issues are related. #### §3.2.5.0. Subgraph Sharing First, we extend the simple model as follows :::info **Definition** (Flow-dependent Coupling Graph): Label every edge between basic blocks ``bbi`` and ``bbj`` in the MIR with a unique **Flow Identifier** ``bbi->bbj``. For each basic block ``bbx``, let **``preds(bbx)``** be the set of flow identifiers for each in-edge to ``bbx`` in the control flow graph. A **Flow Origin** is one of - An origin ``o``, or - a **dependent origin** ``o~j``, for any flow identifier ``j``, or - A **killed flow origin** ``f@l``, for any flow origin ``f`` and location ``l``. Finally, we extend the edges in the coupling graph to include - ``Subgraph(f)`` for any flow origin ``f``. The pre- and post-conditions of this edge are the pre- and post- conditions of the signature of ``f`` The **Flow-Dependent Coupling Graph** is a map from **Flow-Origins** to sets of edges. ::: Additionally we enforce that, for any origin ``o``, if the graph contains dependent origins ``o~j1, ..., o~jn`` then it also contains an origin ``o`` which is the join of ``o~j1, ..., o~jn``. This join does not need to be computed eagerly. A flow-dependent coupling graph can be interpreted as follows: - A killed flow origin is not considered live. When an origin expires, it can either be removed from the graph or killed at that location, depending on if a shared edge in an unkilled origin refers to it. - The graph must still be a DAG, subgraph cycles are not allowed. - New edges due to ``subset_base`` facts can only be added to regular origins, not killed or dependent origins. Clearly, the flow-dependent coupling graph subsumes the formulation given in **§3.2.1.** However, the flow-dependent coupling graph implements a form of _subgraph sharing_. As an example, here is how a (simplified) dependent coupling graph might look for a chain of reborrows ```rust // PCS: { e d } // Graph: [] let x: &'1 mut T = &mut y; // PCS: { e x } // Graph: [ // #1: { e x } -...bw0...-> { e d } // ] let y: &'2 mut T = &mut (*x); // PCS: { e y } // Graph: [ // #1: { e *x } -unpack-> ... bw0 ... -> { e d } // #2: { e y } -move-> { e *x } -Shared(#1)-> { e d } // ] ``` Note that none of the annotations are duplicated in origin ``#2``. #### §3.2.5.1. Persistent and Instantaneous effects In [an issue in the Polonius repo](https://github.com/rust-lang/polonius/issues/107#issuecomment-499427026) Niko discusses a deficiency in the current Polonius model: :::info **Definition** (Instantaneous and Deferred subsets): An **instantaneous subset** is a ``subset_base`` fact which is meant to enforce all loans in one origin be included in another subset at a point. A **deferred subset** is a subset constraint between origins which should be propagated forward until the origin expires. ::: Polonius currently does not have instantaneous subsets, and as mentioned in the issue, adding them may involve adding flow-sensitivity to the analysis. One major feature of the flow-dependent coupling graph is that it can can caputure the difference between instantaneous and deferred subsets precisely: - To interpret an instantaneous subset, copy edges from the subset origin into the superset origin. - To interpret a deferred subset, add a ``Shared`` edge of the subset origin in the superset origin. ##### §3.2.5.1.0. Caveat and speculation The Polonius interpretation of how ``outlives`` constraints are interpreted as subsets and propagated through the control flow graph are not decided. There are some situations where Polonius may want to break the our permissions-based interpretation to impose constraints between origins. For example: Polonius might enforce two lifetimes be equal with two deferred subsets ``#a <: #b <: #a``, even though cycles are meaningless in terms of permissions and irrepresentable in the coupling graph. While shared subgraphs are possibly a good model of _some_ deferred subsets _right now_, interpreting all constraints between origins in this manner is not necessary for the future. Our analysis is permitted to ignore subsets without a permissions-based interpretation, provided we can still treat origin liveness as an oracle. #### §3.2.5.2. Gradual Flow Dependency for Free Instead of fully approximating at each join point, in the flow-dependent coupling graph it is easy to propagate forward flow-dependent information inside dependent origins. Since we also propagate the joined version of dependent origins, backends can interpret coupled The following assumption is true under current Polonius, and we suspect that it will remain so in the future: :::warning **Assumption**: Origins do not have flow-dependent: - Signatures - Contained loans - Origin liveness To modify either of these would require a major rework of Polonius, and it would likely make borrow checking undecidable in many cases. ::: Under these assumptions, ``Shared`` edges never target flow-dependent origins. Under these assumptions, the graph sharing implementation can use a linear number of origins instead of the exponential number required to track all possible control flow paths. Consider the following heavily abbreviated example: ```rust // bb1: // Graph: [ // #1: ... // #2: ... // ] if ... {... /* bb1: #2 does not reborrow #1 */ } else if ... { /* bb2: #2 instantaneously shares from #1 */ } else { /* bb3: #2 deferred shares from #1 */} // bb4: // Graph: [ // #1: ... (unchanged) // #2~bb1->bb4: ... (unchanged) // #2~bb2->bb4: ... (includes old #1 edges) // #2~bb3->bb4: ... -Shared(#1)-> ... // #2: { ... } -Opaque(...)-> { ... } // ] let mut x: &'3 mut T = ...; // Graph: [ // #1: ... (unchanged) // #2~bb1->bb4: ... (unchanged) // #2~bb2->bb4: ... (includes old #1 edges) // #2~bb3->bb4: ... -Shared(#1)-> ... // #2: { ... } -Opaque(...)-> { ... } // #3: ... // ] if ... {... /* bb5: #3 does not reborrow #2 */ } else if ... { /* bb6: #3 instantaneously shares from #2 */ } else { /* bb7: #3 deferred shares from #2 */} // bb8: // Graph: [ // #1: ... (unchanged) // #2~bb1->bb4: ... (unchanged) // #2~bb2->bb4: ... (includes old #1 edges) // #2~bb3->bb4: ... -Shared(#1)-> ... // #2: { ... } -Opaque(...)-> { ... } // #3~bb1->bb4: ... (unchanged) // #3~bb2->bb4: ... (includes old #2 edges) // #3~bb3->bb4: ... -Shared(#2)-> ... // #3: { ... } -Opaque(...)-> { ... } // ] ``` Because Polonius does only chains approximate behavior (it does not scrutinize opaque edges) we can soundly represent all $3^2$ control flow paths through this program in $C + 3 + 3$ origins. Note that the kind of subset does not change the signature of the origin and thus does not change the opaque edge in ``#2``. Thus, the flow-dependency that Niko expects to add can _probably_ be represented in this model. See the caveat in §3.2.5.1.0. ##### (for Markus) how to write the section: Explain the proposal as if it was already included in the language and you were teaching it to another Rust programmer. That generally means: - Introducing new named concepts. - Explaining the feature largely in terms of examples. - Explaining how Rust programmers should *think* about the feature, and how it should impact the way they use Rust. It should explain the impact as concretely as possible. - If applicable, provide sample error messages, deprecation warnings, or migration guidance. - If applicable, describe the differences between teaching this to existing Rust programmers and new Rust programmers. - Discuss how this impacts the ability to read, understand, and maintain Rust code. Code is read and modified far more often than written; will the proposed feature make code easier to maintain? For implementation-oriented RFCs (e.g. for compiler internals), this section should focus on how compiler contributors should think about the change, and give examples of its concrete impact. For policy RFCs, this section should provide an example-driven introduction to the policy, and explain its impact in concrete terms. # §4 (Not finished) Reference-level explanation [reference-level-explanation]: #reference-level-explanation Note to Markus: Chat about MicroMir. Program with a bunch of overlays First, here is an architecture diagram as the prototype is being implemented now: ![](https://i.imgur.com/dkZQML7.png) In the diagram: - ovals represent inputs and outputs - rectangles represent passes, each of which is a pointwise trace of some struct through a program - parallelograms represent steps we impose a semantics on the program - dashed lines represent possible extensions to the system ### Statement Granularity Each pass can be represented as a map from a _location_ to some struct. The meaning of _location_ changes at several points in our system: 1. MIR Locations (``rustc_middle::mir::Location``). Represented as a basic block, and statement index. Passes: - MIR/SMIR input 2. Rich Locations (``rustc_borrowck::consumers::RichLocation``). Polinius-level locations: a ``Start`` and ``Mid`` locaiton for each MIR location. Passes: - Polonius input - Coupling Graph - FactTable 3. MicroMir steps. MicroMir steps are finer than RichLocations, and only exist between ``Start`` and ``Mid`` locations. Passes: - MicroMir - Free PCS - Reborrowing DAG (see: Elaboration) - Interleave - Annotation Semantics 4. Operational steps for borrows. We interpret ``subset_base`` facts as an (ordered) sequence of steps. These steps may or may not coincide with MicroMir steps, and are finer than RichLocations. - Operational description of ``subset_base`` _Note on Granularity:_ Part of the reason there are so many levels of granularity is because our fixed point analysis only operates on the RichLocation level. Provided - We rewrite the coupling graph calculuation to be a fixed point analysis at the micromir level (not yet possible in the analysis crate), and - The MicroMir is designed to accommodate all``subset_base`` facts a simpler architecture is possible: ![](https://i.imgur.com/KxQQUqv.png) The interleaving step in this architecture should also be fairly simple, as the hoare triples emitted by the coupling graph correspond exactly to the sequence of annotations at that point. The cost of refactoring the analysis crate to this structure is too great for the prototype, but I advocate for this design in the final version. ``` x = &mut a y = &mut b Coupling graph #1: x -> a #2: y -> b if ... { tmp = y y = x x = tmp } Coupling graph #1: {x, y} --Coupled--> {a, b} #2: {x, y} --Coupled--> {a, b} Reborrowing DAG Condition 1: x -> a y -> b Condition 2: x -> b y -> a ... test_y = y test_x = x Coupling graph hoare triple {x, y} ... {a, b} Reborrowing DAG {x, y} if condition 1 ... else ... {a, b} ``` ### §4.?.? Implementation: Interleaved translation of Polonius Facts (WIP): Unfortunately, the current prototype does not implement the above translation in such clean and distinct steps. The current prototype translates - If ``loan_issued_at(O, P, I)`` - ``O`` has Left Hand Side ``I``. - If ``subset_base(F, T)`` at a mir assignment, where the normalized assigned-from place has origin ``T``. - The normalized assigned-to place has origin ``F``. - ``subset_base(R, F)`` at a mir assignment, where the assigned-from place has origin ``F`` - The subset is a ``Reborrow`` structural edge. #### (WIP) This is the technical portion of the RFC. Explain the design in sufficient detail that: - Its interaction with other features is clear. - It is reasonably clear how the feature would be implemented. - Corner cases are dissected by example. The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work. # §5. Drawbacks (WIP) [drawbacks]: #drawbacks Why should we *not* do this? # §6. Rationale and alternatives (WIP) [rationale-and-alternatives]: #rationale-and-alternatives - Why is this design the best in the space of possible designs? - What other designs have been considered and what is the rationale for not choosing them? - What is the impact of not doing this? - If this is a language proposal, could this be done in a library or macro instead? Does the proposed change make Rust code easier or harder to read, understand, and maintain? # §7. Prior art (WIP) [prior-art]: #prior-art Discuss prior art, both the good and the bad, in relation to this proposal. A few examples of what this can include are: - For language, library, cargo, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had? - For community proposals: Is this done by some other community and what were their experiences with it? - For other teams: What lessons can we learn from what other communities have done here? - Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background. This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture. If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages. Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC. Please also take into consideration that rust sometimes intentionally diverges from common language features. # §8. Unresolved questions (WIP) [unresolved-questions]: #unresolved-questions - What parts of the design do you expect to resolve through the RFC process before this gets merged? - What parts of the design do you expect to resolve through the implementation of this feature before stabilization? - What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC? # §9. Future possibilities (WIP) [future-possibilities]: #future-possibilities Think about what the natural extension and evolution of your proposal would be and how it would affect the language and project as a whole in a holistic way. Try to use this section as a tool to more fully consider all possible interactions with the project and language in your proposal. Also consider how this all fits into the roadmap for the project and of the relevant sub-team. This is also a good place to "dump ideas", if they are out of scope for the RFC you are writing but otherwise related. If you have tried and cannot think of any future possibilities, you may simply state that you cannot think of anything. Note that having something written down in the future-possibilities section is not a reason to accept the current or a future RFC; such notes should be in the section on motivation or rationale in this or subsequent RFCs. The section merely provides additional information. # §10. Citations 1. [An alias-based formulation of the borrow checker](http://smallcultfollowing.com/babysteps/blog/2018/04/27/an-alias-based-formulation-of-the-borrow-checker/). Niko Matsakis, April 2018. 2. V. Astrauskas, P. Müller, F. Poli, and A. J. Summers. Leveraging Rust Types for Modular Specification and Verification. Object-Oriented Programming Systems, Languages, and Applications (OOPSLA) 2019, ACM, 147:1–147:30. 2019. [pdf](https://www.cs.ubc.ca/~alexsumm/papers/AstrauskasMuellerPoliSummers19.pdf) # §A1. Example program traces #### Shared Borrows ([Link to PCS Trace](https://hackmd.io/@be6mqvt0QwS_i18WLiSLBA/rJAAPt50o/edit)) ```rust struct T {} fn main() { let mut t = T {}; let x = &t; let y = &t; let tx = x; let ty = x; let tty = ty; let ttx = tx; t = T {}; } ``` #### Shared Borrows with Owned Structs ([Link to PCS Trace](https://hackmd.io/@be6mqvt0QwS_i18WLiSLBA/B18IMPpAs/edit)) ```rust #[derive(Copy, Clone)] struct T {} #[derive(Copy, Clone)] struct S {} #[derive(Copy, Clone)] struct Z {f: T, g: S} fn main() { let mut x: Z = Z {f: T {}, g: S {}}; let y = &x.g; let w = x; // copy x.f = T {}; let y_live = y; } ``` #### Borrow in a struct ([Link to PCS Trace](https://hackmd.io/@be6mqvt0QwS_i18WLiSLBA/SktSax7yh/edit)) ```rust struct T {} struct S<'a, 'b> { f: T, g: &'a mut T, h: &'b mut T } fn main() { // tagging of structs with borrows let mut d1 = T {}; let mut d2 = T {}; let mut d3 = T {}; let mut d4 = T {}; let mut x = S { f: T {}, g: &mut d1, h: &mut d2}; // borrow the struct let w = &mut x; // reborrow from the struct let z = &mut ((*w).g); // kill the borrow let x = S { f: T {}, g: &mut d3, h: &mut d4}; // ensure z is still live. let z_test = z; let x_test = x; } ``` #### Function call ([Link to Partial PCS Trace](https://hackmd.io/@be6mqvt0QwS_i18WLiSLBA/rJNrMiQy2/edit)) ```rust fn f<'a, 'b>(x: T, y: &'a mut T) -> &'b mut T { unimplemented!(); } fn main() { let mut d = T {}; let mut bd = &mut d; let mut x = f(T{}, bd); let z = x; } ``` #### Function call with annotation ([Link to Parital PCS Trace](https://hackmd.io/@be6mqvt0QwS_i18WLiSLBA/Sk1OdoQ1n/edit)) ```rust struct T {} fn f<'b, 'a : 'b>(x: T, y: &'a mut T) -> &'b mut T { unimplemented!(); } fn main() { let mut d = T {}; let mut bd = &mut d; let mut x = f(T{}, bd); let z = x; } ```

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.