# Pre-RFC: Rust Has Provenance

- Feature Name: rust_has_provenance
- Start Date: 2023-11-22
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Pointers in Rust have **two** components.

* The pointer's "address" says where in memory the pointer is currently pointing.
* The pointer's "provenance" says where in memory the pointer is allowed to access.

Accessing memory using a pointer with incorrect provenance causes Undefined Behavior (UB), regardless of the address value of the pointer.

Most of the remaining details, such as a specific provenance model, are deliberately left unspecified. This RFC aims to be as **minimal** as possible, so as to get the entire Rust Project on the "same page" about the long-term future development of the language.

**Note:** There was previously a [lang team meeting](https://github.com/rust-lang/lang-team/blob/master/design-meeting-minutes/2022-10-05-provenance.md) on this subject. The most relevant parts of those meeting notes will be copied fairly directly into this RFC.

# Motivation
[motivation]: #motivation

## Optimizations

Many (most?) optimizations done by compilers require some form of *alias analysis*. This is an analysis that reports when two memory operations might alias each other. Alias analysis benefits greatly from notions of provenance, since provenance generally means there is more UB and therefore more information with which to justify optimizations.

For example, without provenance, the following program must be defined behavior (DB), as the two marked lines are entirely equivalent. This means it is unsound to optimize to `print(0)`. However, with allocation-level provenance it is possible to call this program UB (i.e., it is possible to declare that `q` and `p+1` do not alias) without the commented-out line being UB, and so the optimization can be permitted.
```c
char p[1], q[1] = {0};
uintptr_t ip = (uintptr_t)(p+1);
uintptr_t iq = (uintptr_t)q;
if (iq == ip) {
  *(p+1) = 10;   // <-- This line
  // *q = 10;    // <-- And this line
  print(q[0]);   // can be optimized only with provenance
}
```

Similarly, it has long been desirable for it to be sound to optimize code like this:

```rust
fn foo(x: &mut i32) -> i32 {
    *x = 10;
    bar();
    *x
}
```

It's very difficult to see how to make this optimization sound without provenance. Ralf J. has [attempted](https://www.ralfj.de/blog/2017/07/17/types-as-contracts.html) such a model in the past, but it was unsuccessful in a number of ways.

## LLVM

LLVM IR (despite its lack of a clear spec) recognizes a notion of allocation-level provenance. Compiling Rust to LLVM IR is likely to be impossible if Rust does not recognize provenance. We'd probably have to insert a `black_box` after every allocation and every memory access, and it's not clear that even that would be enough. As far as I know there is no option to turn this off, and the assumptions are sufficiently widespread that it is unlikely that we could convince upstream to add one.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

No specific guide-level changes to the Rust Book or standard library documentation are intended at this time.

This isn't as big a deal as it sounds, since provenance is not an issue that ever needs to be considered within Safe Rust code.

Unsafe Rust programmers will, in the future, use various

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Within the Rust Reference, the "Behavior considered undefined" page has a bullet point:

> Evaluating a dereference expression (`*expr`) on a raw pointer that is dangling or unaligned, even in place expression context (e.g. `addr_of!(*expr)`).

This bullet point would instead read:

> Evaluating a dereference expression (`*expr`) on a raw pointer that lacks proper provenance or that is unaligned, even in place expression context (e.g.
`addr_of!(*expr)`).

# Drawbacks
[drawbacks]: #drawbacks

The biggest downside of provenance is complexity. The existence of provenance means that authors of unsafe code must always be concerned not only with whether the pointer they have points to the right place, but also with whether it has the right provenance (in practice, this means "was obtained the right way"). Not having provenance ensures that this is never a problem: all pointers that point to the right address are equally valid to use.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

All reasonably usable compiler backends use *some form* of provenance logic when optimizing code. There are essentially no alternatives to having provenance in some form.

# Prior art
[prior-art]: #prior-art

* "[A Provenance-aware Memory Object Model for C](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3005.pdf)" describes how the C standard is attempting to fit provenance concepts into C.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

The particulars of the exact provenance model are still largely undetermined.

The appropriate standard library API functions to let programmers correctly work with provenance would depend on deciding more

# Future possibilities
[future-possibilities]: #future-possibilities

Future RFCs will increase the specificity of how provenance works in Rust, including recommendations of specific standard library APIs to help correctly work with provenance.
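
As a purely illustrative sketch of what a provenance-aware helper could look like, the following Rust code moves a pointer to a new address while keeping the provenance of the pointer it started from. The name `with_addr_sketch` is hypothetical and is not proposed by this RFC. The sketch assumes only the documented behavior of `wrapping_add`, whose result stays tied to the allocation that the input pointer came from.

```rust
/// Hypothetical helper (not part of this RFC): returns a pointer with the
/// address `addr` that keeps the provenance of `ptr`.
fn with_addr_sketch<T>(ptr: *const T, addr: usize) -> *const T {
    // Only the numeric address is read here; it is never turned back
    // into a pointer by an integer-to-pointer cast.
    let old_addr = ptr as usize;
    let delta = addr.wrapping_sub(old_addr);
    // `wrapping_add` offsets the original pointer, so the result is still
    // derived from `ptr` and is only valid for `ptr`'s allocation.
    (ptr as *const u8).wrapping_add(delta) as *const T
}

fn main() {
    let data = [10u8, 20, 30, 40];
    let base: *const u8 = data.as_ptr();

    // Compute the address of the third element as a plain integer.
    let target = base as usize + 2;

    // Rebuild a pointer to that address from `base`, not from the integer.
    let p = with_addr_sketch(base, target);
    assert_eq!(p as usize, target);

    // The address is in bounds of `data` and `p` was derived from `base`,
    // so this read has both the right address and the right provenance.
    let value = unsafe { *p };
    println!("{}", value); // prints 30
}
```

The design point is that the new pointer is always built from an existing pointer rather than from a bare integer, so the "was obtained the right way" question has a clear answer whatever provenance model is eventually chosen.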