# Const Interning and Interior mutability
## Const interning
"Interning" of a const refers to the process of taking the final result of const evaluation and moving it into the global compiler state (`tcx`). For instance:
```rust
const C: [u8; 256] = {
    let mut v = [0u8; 256];
    v[16] = 1;
    v
};
```
When the constant is evaluated, we create an interpreter instance, and in that interpreter we generate an allocation of 256 bytes to hold the return value of the const initializer.
This is mutable state, which is crucial since the initializer can mutate its local memory like normal Rust code can.
When evaluation is done, we take the final state of the return value and copy it over to `tcx`, where it becomes a global allocation that is eventually passed to LLVM.
(For consts of very simple "scalar" types, we avoid creating a global allocation and just store their value. This is an optimization and not relevant for this discussion.
In some of the examples below, we use pairs to avoid triggering this optimization.)
However, some consts lead to more than one allocation.
For instance:
```rust
const C1: (&i32, i32) = (&0, 0);
const C2: (&Vec<i32>, i32) = (&Vec::new(), 0);
```
Both of these constants in the end involve *two* global allocations: one storing the pair, and one storing the `i32`/`Vec<i32>` that the first element of the pair points to.
We call the former the *base* allocation ("root" would also be a good term) and the latter a *nested* allocation.
There are actually *two* separate and largely independent mechanisms that can lead to such nested allocations: promotion and the "outer scope"/"tail expression" rule.
#### Promotion
`C1` above makes use of promotion.
The `&0` is moved to a separate fresh constant (it's not a normal constant item, promoteds are their own special thing) and evaluated there; the resulting constant is then simply referenced from `C1`.
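As a small illustration of promotion, the sketch below shows it both in an ordinary function and in a const initializer. The names `promoted_zero` and `read_c1` are made up for this example.

```rust
// A constant expression behind `&` is promoted to an anonymous global,
// so the reference is valid for `'static` and can be returned.
fn promoted_zero() -> &'static i32 {
    &0
}

// The same mechanism handles the `&0` inside a const initializer.
const C1: (&i32, i32) = (&0, 0);

// Reads through the reference stored in the base allocation of `C1`.
fn read_c1() -> i32 {
    *C1.0 + C1.1
}
```

Replacing `0` with a value whose type has a destructor (such as `Vec::new()`) would make `promoted_zero` fail to borrow-check, since such values are not promoted.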
#### Outer scope / tail expression
`C2` cannot make use of promotion since `Vec` has a destructor.
So usually, this should fail to borrow-check since the borrow of `Vec` is too short.
However, Rust has a special rule for borrows in tail expressions:
they get placed into the outer scope. This allows e.g.
```rust
let v: (&Vec<i32>, i32) = {
    let x = 5;
    (&Vec::new(), x)
};
dbg!(v.0); // the vector is live here!
```
This rule works even for tail expressions in `const`/`static`, in which case the "outer scope" is the program itself -- basically this creates new anonymous globals.
(The fact that this works has caused a long stream of pain and bugs. But there's enough code out there relying on this that we can't realistically take it back.)
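A minimal sketch of the same rule applied to a `const`, reusing `C2` from above; the helper `c2_len` is a hypothetical name used only to show that the nested allocation is usable at runtime.

```rust
// The `Vec` temporary is lifetime-extended to the "outer scope". For a const,
// that scope is the whole program, so it becomes an anonymous global.
const C2: (&Vec<i32>, i32) = (&Vec::new(), 0);

// Reads the nested allocation through the reference stored in `C2`.
fn c2_len() -> usize {
    C2.0.len()
}
```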
## Const semantics
Our intended semantics for consts is that they behave as if the initializer expression were inlined everywhere the const is used.
For consts that don't have any pointers in their value, this is trivially true -- we just compute the result once, store it in a global allocation (the "base" allocation), and make a copy each time the constant is used.
Note that `const` items are values (in contrast to `static` items which are places), i.e., the only way they will ever be referenced in Rust code is by making a copy of their contents.
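The copy-on-use behaviour can be sketched as follows, reusing the array constant from the first section; `modified_copy` is an illustrative name.

```rust
const C: [u8; 256] = {
    let mut v = [0u8; 256];
    v[16] = 1;
    v
};

// Each mention of `C` produces a fresh copy of the interned value.
fn modified_copy() -> [u8; 256] {
    let mut local = C;
    local[0] = 7; // writes to the local copy only
    local
}
```

Later uses of `C` are unaffected by the write, because `local` never aliases the global allocation.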
However, with nested allocations, we have to be careful.
Now instead of creating a new nested allocation each time the expression is evaluated (as would be the case if `(&Vec::new(), 0)` got inlined), only a single global allocation is used and shared everywhere.
This is observable solely via `ptr::eq`, so we hand-wave that pointer identity around consts (and promoteds) is not guaranteed.
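To make the `ptr::eq` point concrete, here is a hedged sketch with made-up helper names. It deliberately states no expected result for `same_address`, since either answer is permitted.

```rust
const C1: (&i32, i32) = (&0, 0);

// Two separate uses of the same const.
fn first_use() -> &'static i32 {
    C1.0
}

fn second_use() -> &'static i32 {
    C1.0
}

// May return either `true` or `false`; code must not depend on the answer.
fn same_address() -> bool {
    std::ptr::eq(first_use(), second_use())
}
```

Only the pointed-to values are guaranteed to agree; the addresses are an implementation detail.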
## Mutability concerns
Given this situation with nested allocations, we have to be very careful to not allow code like this:
```rust
const M: &'static mut i32 = &mut 0;
```
The "outer scope" rule would accept this code.
But if we do our usual copy of the final const value everywhere, then this does not match the intended const semantics.
We would create a single global allocation to store this `0`, and mutating through any copy of `M` would always mutate the same global state!
(Aside from violating the intended semantics, this would also be completely unsound due to data races.)
We therefore want to be quite sure that the final value of a `const` does not contain any pointer that permits mutation.
Only read-only pointers are allowed, i.e., pointers where writing through them would be UB anyway.
To this end, const-eval tracks a bit for each pointer (stored in the pointer provenance) saying whether this pointer allows mutation.
When creating a shared reference to a `Freeze` type, the resulting pointer is marked as immutable in this bit, and that marking is then preserved by all pointers derived from this shared reference.
When interning the final value of a const, if we find any pointer that is *not* immutable, we raise an error.
That is exactly the error that shows up in the [regression](https://github.com/rust-lang/rust/issues/121610).
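The tracking scheme described above can be sketched as a toy model. Everything here (`Prov`, `shared_ref`, `derive`, `intern`) is hypothetical and only mirrors the prose; it is not how rustc's interpreter is actually structured.

```rust
// Toy model only: `Prov` and `intern` are hypothetical and do not mirror
// rustc's actual data structures.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Prov {
    alloc_id: u32,
    immutable: bool,
}

impl Prov {
    // Pointer to freshly allocated interpreter memory: writes are allowed.
    fn fresh(alloc_id: u32) -> Self {
        Prov { alloc_id, immutable: false }
    }

    // Models `&place`. Only the pointee *type* is consulted, not the value.
    fn shared_ref(self, pointee_is_freeze: bool) -> Self {
        Prov { immutable: self.immutable || pointee_is_freeze, ..self }
    }

    // Models deriving another pointer (reborrow, projection, cast):
    // the bit is preserved.
    fn derive(self) -> Self {
        self
    }
}

// Models the interning check over every pointer found in the final value.
fn intern(pointers: &[Prov]) -> Result<(), String> {
    for p in pointers {
        if !p.immutable {
            return Err(format!("mutable pointer to alloc{} in final value", p.alloc_id));
        }
    }
    Ok(())
}
```

In this model, a reference to a type that might contain an `UnsafeCell` corresponds to `shared_ref(false)`, so it is rejected at interning time regardless of what the pointed-to value actually looks like.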
### Immutable values vs mutable types
In addition to the outer scope rule, we also have a value-based analysis that checks whether a constant contains mutable memory by looking at the actual value instead of just checking for `Freeze`. This caused us to accept
```rust
use std::cell::Cell;
pub struct Foo(Option<Cell<bool>>);
impl ::std::ops::Drop for Foo {
    fn drop(&mut self) {}
}
const NONE: &Foo = &Foo(None);
```
The same thing also works without the `Drop` impl, as long as promotion is avoided:
```rust
use std::cell::Cell;
pub struct Foo(Option<Cell<bool>>);
const NONE: &Foo = {
    let x = Foo(None);
    &x
};
```
even though the type `Foo` may contain interior mutability. This specific constant value does not contain interior mutability, but it was created from a reference to a type that may contain it. The order of operations can be seen in the following MIR:
```rust
let mut _0: &Foo;
let _1: &Foo;
let _2: Foo;
let mut _3: std::option::Option<std::cell::Cell<bool>>;
bb0: {
    StorageLive(_1);
    StorageLive(_2);
    StorageLive(_3);
    _3 = Option::<Cell<bool>>::None;
    _2 = Foo(move _3);
    StorageDead(_3);
    _1 = &_2;
    _0 = &(*_1);
    StorageDead(_1);
    return;
}
```
The statement `_1 = &_2;` creates the reference, and at that point we only look at the pointee type (not the value stored there) to decide whether the resulting pointer may point to mutable memory.
## Possible solutions
### Reject the shown example
A breaking change, albeit with small fallout: https://github.com/rust-lang/rust/issues/121610
### Add value based analysis to const eval
While we could implement a scheme to look at the value, that would be prohibitively expensive (we haven't actually benchmarked it, but validation simply is expensive, and running it on every reference would run it a lot on all but the simplest constants).
### Just allow mutable memory in constants
Not really an option, but let's talk about it for completeness:
According to Stacked Borrows, it is legal to use unsafe code to modify a `&Foo(None)` so that it becomes a `&Foo(Some(...))`:
```rust
fn main() {
    let x = NONE;
    // `ptr::write` needs a `*mut Foo`, so cast the shared reference.
    let y = x as *const Foo as *mut Foo;
    // Legal under Stacked Borrows because `Foo` contains an `UnsafeCell`
    unsafe {
        std::ptr::write(y, Foo(Some(Cell::new(true))));
    }
}
This means the `NONE` constant has always been legal to modify under the Stacked Borrows rules, but doing so was unsound, because we actually put the `Foo(None)` into immutable memory. So we'd have to actually make the allocation mutable, but then that constant could be used from multiple threads and mutated, even though it's very obviously `!Sync`.
Note that this would not mean we'd allow
```rust
use std::cell::Cell;
pub struct Foo(Option<Cell<bool>>);
impl ::std::ops::Drop for Foo {
    fn drop(&mut self) {}
}
pub const NONE: &Foo = &Foo(Some(Cell::new(true)));
```
we still reject that with static checks.
-------------------------------------------
Adding questions
## Tail scope and temporary lifetimes
from above:
> However, Rust has a special rule for borrows in tail expressions:
> they get placed into the outer scope.
scottmcm: Is this one of the things that the `super let` (and other temporary lifetimes) project has been trying to re-work? Can we take advantage of that work to make things easier here? I suppose if we have to continue to support existing code then probably not...