# Heap allocation in const eval: Design doc
## Prelude
Before we dig into actual allocations, we'll introduce some examples that motivate the implementation. First of all,
```rust
const FOO: Vec<i32> = Vec::new();
```
works just fine on the stable compiler, and we'll want this to keep working forever. The reason this is sound right now ([full discussion on zulip](https://rust-lang.zulipchat.com/#narrow/stream/146212-t-compiler.2Fconst-eval/topic/drop.20vs.20construction.20from.20const/near/216729919)) is that since there are no heap allocations possible at all in const contexts, we know that `FOO` cannot contain any heap allocations and thus know that invoking `drop(FOO)` will not actually deallocate anything, because nothing was allocated.
Now, since we want to actually allow heap allocations, we need to consider the difference between transient and non-transient allocations. Transient allocations are allocations that are only used for computing the final value of a constant, but are not part of the final constant itself. One example would be
```rust
const FOO: Vec<i32> = {
let foo = "foo".to_string();
assert!(foo.len() == 3);
Vec::new()
};
```
The allocation created by `to_string()` is deallocated before the evaluation of the constant is done, even though it was used during the evaluation itself.
Non-transient allocations are allocations that end up in the final constant. An illegal example would be
```rust
const FOO: Vec<i32> = vec![42];
```
This is illegal because invoking `drop(FOO)` at runtime would drop an allocation created during const eval, and `drop(FOO); drop(FOO);` would drop it twice, causing a double-free, even if we somehow managed to make a single `drop(FOO)` legal.
A legal way to put a heap allocation into a constant would be
```rust
const FOO: &[i32] = vec![42].leak();
```
since there is no way to ever deallocate the resulting constant, so it is indistinguishable from
```rust
const FOO: &[i32] = &[42];
```
which already works on the stable compiler today.
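As a rough illustration on today's stable compiler (the names `EMPTY`, `SLICE`, `use_twice` and `leaked_at_runtime` are made up for this sketch), every mention of a constant instantiates a fresh value, so mutating or dropping one copy never affects another. The last function is only a runtime analogue of the `vec![42].leak()` constant above, not the const version itself:

```rust
const EMPTY: Vec<i32> = Vec::new();
const SLICE: &[i32] = &[42];

// Each use of `EMPTY` instantiates a new `Vec` that owns no heap memory.
fn use_twice() -> (usize, usize) {
    let mut a = EMPTY;
    a.push(1); // allocates at runtime; the allocation belongs to `a` only
    let b = EMPTY; // a second, independent instance that is still empty
    (a.len(), b.len())
}

// Runtime analogue of `vec![42].leak()`: the allocation is never freed,
// so handing out a `&'static` slice is fine.
fn leaked_at_runtime() -> &'static [i32] {
    vec![42].leak()
}
```

Dropping `a` at the end of `use_twice` only frees what `a` allocated at runtime; nothing coming from the constant itself is ever freed.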
### Non-transient allocations
We'll start out with non-transient allocations, as these are the actually problematic ones. First of all, `Box<T>` actually has a second generic parameter: `Box<T, A>`, but the `A` is defaulted to `std::alloc::Global`. This means if you use `Box::new(42)`, you actually get `Box::<i32, Global>::new(42)` which will then invoke `Global::alloc` internally somehow (the details are not important). This gets very interesting with `Vec`, because while `Vec` is defined as `struct Vec<T, A = Global>`, the implementation of `new` is
```rust
impl<T> Vec<T> {
fn new() -> Self { /* code goes here */ }
}
```
which is actually
```rust
impl<T> Vec<T, Global> {
fn new() -> Self { /* code goes here */ }
}
```
This means you can invoke `Vec::new()`, even though it references `Global`, which does not implement `const AllocRef`. But you won't be able to do `x.push(42)` anymore, because that is defined in a different impl block:
```rust
impl<T, A: AllocRef> Vec<T, A> {
fn push(&mut self, t: T) { /* code goes here */ }
}
```
which requires that the allocator implements `const AllocRef`.
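The split between the two impl blocks can be mimicked on stable with a toy container. This is only a sketch: `MyVec`, and the local `AllocRef` and `Global` defined here, are hypothetical stand-ins and not the real standard library items.

```rust
// Hypothetical stand-ins for `AllocRef` and `Global`; not the real std items.
trait AllocRef {
    fn name(&self) -> &'static str;
}

struct Global;

impl AllocRef for Global {
    fn name(&self) -> &'static str {
        "Global"
    }
}

struct MyVec<T, A = Global> {
    items: Vec<T>,
    alloc: A,
}

// `MyVec<T>` means `MyVec<T, Global>`. There is no bound on the allocator
// in this block, so `new` can be a `const fn`.
impl<T> MyVec<T> {
    const fn new() -> Self {
        MyVec { items: Vec::new(), alloc: Global }
    }
}

// Everything that may allocate sits behind the `A: AllocRef` bound.
impl<T, A: AllocRef> MyVec<T, A> {
    fn push(&mut self, t: T) {
        self.items.push(t);
    }

    fn len(&self) -> usize {
        self.items.len()
    }

    fn allocator_name(&self) -> &'static str {
        self.alloc.name()
    }
}

const EMPTY: MyVec<i32> = MyVec::new();
```

In this sketch `push` is an ordinary runtime method. Under the proposal, calling it in a const context would additionally need the bound to be satisfied by a `const` impl.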
## `AllocRef`, `alloc::Global` and `alloc::ConstGlobal`
The general scheme for heap allocations in const contexts will use a custom allocator which never deallocates. So, even if you have a `Box<i32, ConstGlobal>` and you `drop` it, no deallocation will happen. This means that we'll be able to write
```rust
const FOO: Box<i32, ConstGlobal> = Box::new_in(42, ConstGlobal);
let x = FOO;
let y = FOO;
```
without causing any heap allocations whatsoever.
When used with reallocating data structures like `Vec`, there can be unexpected behaviour though:
```rust
const FOO: Vec<i32, ConstGlobal> = Vec::new_in(ConstGlobal);
FOO.push(42); // Panic due to "out of memory" in the given allocator
```
It's not too problematic, since the above code will cause the `const_item_mutation` lint to trigger. We may create additional lints for
```rust
let mut x = FOO;
x.push(42); // Panics
```
and suggest
```rust
let mut x: Vec<i32> = FOO.into_iter().collect();
```
or whatever API will be the default for moving between different allocators.
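The `const_item_mutation` lint can already be observed on stable with the default allocator. The sketch below uses hypothetical function names and a plain `Vec<i32>`, not `ConstGlobal`: `FOO.push(42)` operates on a temporary copy of the constant, and collecting into a fresh `Vec` yields a value that can grow independently.

```rust
const FOO: Vec<i32> = Vec::new();

// `FOO.push(42)` pushes onto a temporary instance of the constant, which is
// why rustc warns with `const_item_mutation`; the lint is silenced here.
#[allow(const_item_mutation)]
fn push_to_temporary() -> usize {
    FOO.push(42);
    // This is yet another fresh instance, so the push above is not visible.
    FOO.len()
}

// Moving the contents into a fresh `Vec` gives an independently growable value.
fn copy_then_push() -> Vec<i32> {
    let mut x: Vec<i32> = FOO.into_iter().collect();
    x.push(42);
    x
}
```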
The same issue occurs when attempting to use `ConstGlobal` at runtime:
```rust
let x = Box::new_in(42, ConstGlobal); // Panic due to "out of memory"
```
but such situations should be easy to lint against and are not a soundness problem.
While we could just make `ConstGlobal` actually heap allocate at runtime, but never free, @oli-obk is of the opinion that erroring (or panicking) is better than silently causing memory leaks. Users who want leaking allocations can use a leaking allocator defined in user space.
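To make the intended runtime behaviour concrete, here is a minimal mock on stable. It assumes a heavily simplified allocator trait: the local `AllocRef`, `AllocError` and `ConstGlobal` below are hypothetical and do not match the real unstable allocator API. Allocation always fails at runtime and deallocation does nothing.

```rust
use std::alloc::Layout;
use std::ptr::NonNull;

#[derive(Debug, PartialEq)]
struct AllocError;

// Simplified, hypothetical shape of the allocator trait.
trait AllocRef {
    fn alloc(&self, layout: Layout) -> Result<NonNull<u8>, AllocError>;

    /// # Safety
    /// `ptr` must have been returned by `alloc` with the same `layout`.
    unsafe fn dealloc(&self, ptr: NonNull<u8>, layout: Layout);
}

struct ConstGlobal;

impl AllocRef for ConstGlobal {
    // At runtime there is nothing to allocate from, so every request fails.
    fn alloc(&self, _layout: Layout) -> Result<NonNull<u8>, AllocError> {
        Err(AllocError)
    }

    // Memory created during const eval is never freed: this is a no-op.
    unsafe fn dealloc(&self, _ptr: NonNull<u8>, _layout: Layout) {}
}

// Helper showing that a runtime allocation attempt reports an error.
fn runtime_alloc_fails() -> bool {
    ConstGlobal.alloc(Layout::new::<i32>()).is_err()
}
```

A container built on such an allocator would turn the `Err` into the "out of memory" panic described above.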
### `impl const AllocRef for ConstGlobal`
Since we'll want to have different behaviour between runtime and compile-time, we'll need to introduce a new `const` intrinsic for allocating. There is no requirement to have anything for deallocating, as deallocation will just do nothing. We could consider adding a deallocation intrinsic anyway, in order to ensure that long-running const evaluations don't blow up memory beyond what they can actually still access.
This will allow users to develop an "oracle" which decides whether we're at runtime or at compile-time by using `catch_unwind`. Since `catch_unwind` is not `const fn`, this is a non-issue at this time. But it still is a first for successful compile-time behaviour with panicking runtime behaviour.
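Such an oracle could look roughly like the following runtime-only sketch. `alloc_in_const_global` is a hypothetical stand-in that simply panics, modelling what a `ConstGlobal` allocation would do at runtime under this proposal.

```rust
use std::panic;

// Hypothetical stand-in: at runtime, allocating in `ConstGlobal` would panic.
fn alloc_in_const_global(_value: i32) -> Box<i32> {
    panic!("out of memory in ConstGlobal");
}

// Returns `true` when the allocation attempt panics, i.e. "we are at runtime".
fn is_runtime() -> bool {
    panic::catch_unwind(|| alloc_in_const_global(42)).is_err()
}
```

During const eval the allocation would succeed and the same function would report `false`, if `catch_unwind` were callable there.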
An alternative would be to have an additional way to do `impl !AllocRef for ConstGlobal` even though there is a `impl const AllocRef for ConstGlobal`, but @oli-obk has no clue if we really want to open this can of worms. It should be considered before stabilization, but does not block an initial impl.
## Transient heap allocations via `Global`
We may still want to support `Global` allocations at some point, since that will make it easier to interact with code that is not generic over allocators. This has various problems though, as discussed in the prelude.
These problems can mostly be resolved by pushing the proof obligation of correctness to the user via auto traits similar to `Send` and `Sync` and post-monomorphization errors. Note that the post-monomorphization errors only guarantee soundness if the users correctly specified the unsafe trait impls for their types.
### `ConstSafe` and `ConstRefSafe` auto traits
We add (names bikesheddable!) `ConstSafe` and `ConstRefSafe` unsafe auto traits.
`ConstSafe` types may appear in constants directly. This includes all types except
* `&T: ConstSafe where T: ConstRefSafe`
* `&mut T: !ConstSafe`
Other types may (or may not) appear behind references by implementing the `ConstRefSafe` trait (or not):
* `*const T: !ConstRefSafe`
* `*mut T: !ConstRefSafe`
* `String: ConstRefSafe`
* `UnsafeCell<T>: !ConstRefSafe`
* `i32: ConstRefSafe + ConstSafe`.
* the same for other primitives
* `[T]: ConstRefSafe where T: ConstRefSafe`
* the data pointer of a fat pointer follows the same rules as the root value of an allocation
* rationale: the value itself could be on the heap, but you can't do anything bad with it since trait methods at worst can get a `&self` if you start with a `&Trait`. Further heap pointers inside the value are forbidden, just like in root values of constants.
* ... and so on (needs full list and rationale before stabilization)
Additionally, values that contain no pointers to heap allocations are allowed as the final value of a constant.
Our rationale is that
1. we want to forbid types like
```rust
struct Foo(*mut ());
```
whose methods convert the raw pointer to a raw pointer to the actual type (which might contain an unsafe cell) and then modify that value.
2. we want to allow types like `String` (at least behind references), since we know the user can't do anything bad with them as they have no interior mutability. `String` is pretty much equivalent to
```rust
struct String(*mut u8, usize, usize);
```
which is indistinguishable from the `Foo` type via purely type-based analysis.
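To illustrate the first point, here is a runtime sketch of such a problematic type. The methods `new` and `bump` are hypothetical and only show the pattern: the struct looks as harmless as the `String` layout above, yet it mutates its pointee through `&self`.

```rust
struct Foo(*mut ());

impl Foo {
    fn new(value: i32) -> Self {
        // Type-erase a heap pointer to an `i32`.
        Foo(Box::into_raw(Box::new(value)) as *mut ())
    }

    // Mutates the pointee even though it only takes `&self`.
    fn bump(&self) -> i32 {
        // SAFETY: the pointer comes from `Box<i32>` in `new`, is freed only in
        // `drop`, and `Foo` is neither `Send` nor `Sync`, so no other access
        // can happen concurrently.
        unsafe {
            let p = self.0 as *mut i32;
            *p += 1;
            *p
        }
    }
}

impl Drop for Foo {
    fn drop(&mut self) {
        // SAFETY: the pointer was created by `Box::into_raw` in `new`.
        unsafe { drop(Box::from_raw(self.0 as *mut i32)) }
    }
}
```

If a value like this were shared behind a reference in a constant, every use site would observe and modify the same pointee, which is exactly what the traits are meant to rule out.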
In order to distinguish these two types, we need to get some information from the user. The user can write
```rust
unsafe impl ConstRefSafe for String {}
```
and declare that they have read and understood the `ConstRefSafe` documentation and solemnly swear that `String` is only up to good things.
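Since auto traits and negative impls are not available on stable, the rules can only be approximated there. The sketch below assumes ordinary unsafe marker traits with explicit impls; the helper functions `requires_const_safe` and `requires_const_ref_safe` are hypothetical and exist only to exercise the bounds.

```rust
// Plain unsafe marker traits standing in for the proposed auto traits.
unsafe trait ConstSafe {}
unsafe trait ConstRefSafe {}

// Primitives may appear directly and behind references.
unsafe impl ConstSafe for i32 {}
unsafe impl ConstRefSafe for i32 {}

// The author of `String` promises it is fine behind a reference.
unsafe impl ConstRefSafe for String {}

// `[T]: ConstRefSafe where T: ConstRefSafe`
unsafe impl<T: ConstRefSafe> ConstRefSafe for [T] {}

// `&T: ConstSafe where T: ConstRefSafe`
unsafe impl<T: ConstRefSafe + ?Sized> ConstSafe for &T {}

// Compile-time checks: these only type-check if the bound holds.
fn requires_const_safe<T: ConstSafe + ?Sized>() -> bool {
    true
}

fn requires_const_ref_safe<T: ConstRefSafe + ?Sized>() -> bool {
    true
}

// `requires_const_ref_safe::<*mut ()>()` would be rejected by the compiler,
// because no impl exists for raw pointers.
```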
#### Backcompat issue 1
Now one issue with this is that we'd suddenly forbid
```rust
struct Foo(*mut ());
const FOO: Foo = Foo(std::ptr::null_mut());
```
which is perfectly sane and legal on stable Rust. The problems only happen once there are pointers to actual heap allocations or to mutable statics in the pointer field. Thus we allow any type directly in the root of a constant, as long as there are no such pointers in there.
#### Backcompat issue 2
Another issue is that
```rust
struct Foo(*mut ());
const FOO: &'static Foo = &Foo(std::ptr::null_mut());
```
is also perfectly sane and legal on stable Rust. Basically, as long as there are no heap pointers, we'll just allow any value, but if there are heap pointers, we require `ConstSafe` and `ConstRefSafe`.
## Related links
Proposal: https://github.com/rust-lang/const-eval/issues/20
Discussion on zulip:
* https://rust-lang.zulipchat.com/#narrow/stream/146212-t-compiler.2Fconst-eval/topic/.60ConstSafe.60.20and.20bitwise.20copies/near/180012229
* https://rust-lang.zulipchat.com/#narrow/stream/146212-t-compiler.2Fconst-eval/topic/Questions.20regarding.20Heap.20allocation.20in.20const.20eval/near/216728136
## Prior art
- C++20 allows heap allocation in `constexpr` ([link to proposal](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0784r7.html)).
- Dlang has very powerful compile-time function evaluation (CTFE).