owned this note
owned this note
Published
Linked with GitHub
# Niches and `UnsafeCell<T>` summary
## Background: UnsafeCell and niches
Presently, `UnsafeCell<T>` is considered to expose "niches" found in `T` that can be used for layout optimization.
This implies that `Option<UnsafeCell<bool>>`, for example, has the same size as a bool, since the invalid bit patterns of `bool` can be used to represent `None`.
## Potential for data races
However, as reported in [#68206], because the memory inside an `UnsafeCell` may be accessed from other threads, this introduces the potential for a data race:
* A `&Option<Mutex<bool>>`, for example, might be shared between two threads.
* One thread could be matching on the `&Option<_>` to determine if it is `Some` or `None`. This would require reading the `bool` value that is protected by the mutex (but this would happen without acquiring the mutex).
* Another thread could acquire the mutex and do writes to that same memory, introducing a data race.
* The same pattern can happen with a `&Option<RefCell<bool>>`, as one might use `borrow_mut()` to get an `&mut bool` that is then sent to another thread using rayon or some such library.
* Naturally the same pattern could apply to numerous types in third-party libraries (e.g., parking-lot) that are based on `UnsafeCell`.
## A deeper conflict
Even beyond data races, there is a deeper conflict at play here.
The first observation is that the design of `UnsafeCell<T>` generally defines "some portion" of a type which may be mutated even when shared. Things that are logically outside of the `UnsafeCell` are not affected by it.
Further, with enums, if you ignore niche optimizations, then there is conceptually a discriminant field and a payload field. **Niches allow these two conceptually distinct fields to overlap in memory.**
If you combine these two things, you start to see the problem with `Option<UnsafeCell<bool>>` -- the discriminant for the option is conceptually outside of the `UnsafeCell`, and hence unaffected by it. The `bool` meanwhile is inside the `UnsafeCell`. However, because of the niche, the discriminant is actually stored **in the same memory as the bool**.
This is particularly problematic for our attempts to formalize the unsafety rules in an operational and machine checkable way. Stacked borrows has the goal of defining the rules in terms of "which memory can be mutated and when" and not introducing very specific bits of "ghost state", such as tracking the "current variant" of an enum. But this implies that if you can mutate the payload of an `UnsafeCell`, you are also able to mutate the discriminant of the enum, unless those permission rules start to be expressed in a much more subtle and complex way that accounts somehow for niches.
This has manifested in two surprising examples thus far: [#68303] and [unsafe-code-guidelines#204].
### Issue 68303 -- reading discriminant invalidates reference to payload
Issue [#68303] is as follows:
```rust
use std::cell::RefCell;
fn main() {
let optional=Some(RefCell::new(false));
// handle acquires a **mutable lock** on the payload
let mut handle=optional.as_ref().unwrap().borrow_mut();
// accessing the discriminant **releases that lock**, because
// discriminant is in the same memory as the payload
optional.is_some();
// lock has expired here, error:
*handle=true;
}
```
This results (presently) in a miri error. Why is that? The comments above try to give an intution for it. The basic idea of stacked borrows is that creating a `&` or `&mut` reference to some memory effectively "locks it" so that it can only be accessed in a way compatible with the reference. So a `&` reference acquires a "read lock", making the memory immutable, and a `&mut` acquires a "write lock", making the memory inaccessible. These locks are automatically released when the memory is accessed in a way that is incompatible with the reference. Trying to use the reference after its lock has been released is an error.
In this case, the `handle` acquires a "write lock" on the payload of the option (the `bool` inside the `RefCell`, specifically). But then `optional.is_some()` accesses the discriminant of the `Option`. Remember that this **logically distinct** but (thanks to niches) **physically overlapping** memory. Since the actual memory is the same, the lock from `handle` is released.
In that case, it becomes an error to do `*handle = true` later on, as the lock hasbeen released.
This explanation is very specific to stacked borrows, but the conflict is more fundamental.
### Unsafe code guidelines #204 -- mutating payload invalidates discriminant
In [unsafe-code-guidelines#204], the problem is reversed. It contains a function `evil` that takes a `&Option<Cell<&T>>`. `evil` then mutates the payload of that `Cell`. This has the side-effect of also mutating the discriminant, effectively converting the `Option` (outside the `Cell`!) from a `Some` into a `None`:
```rust
/// Sets `x` to `None`.
fn evil<T>(x: &Option<Cell<&T>>) {
assert_eq!(
mem::size_of_val(x), mem::size_of_val(x.as_ref().unwrap()),
"layout optimization did not kick in?",
);
unsafe { *(x as *const _ as *mut usize) = 0; }
}
```
This could be quite surprising to innocent callers of `evil`:
```rust
fn main() {
let my = Some(Cell::new(&0));
evil(&my);
dbg!(my); // surprise! `my` is now `None`!
}
```
This would also imply that the compiler must be much more conservative around code that manipulates a `&Option<T>`, as the discriminant could now suddenly change from `Some` to `None` (and vice versa).
## What about Cell?
`Cell` is different from `RefCell` and `Mutex`, since it does not permit references to its interior. This means that many of the examples and potential problems do not apply to it, as they often rely on creating a reference to the interior and then accessing an enum discriminant that is logically outside that reference but physically overlapping. One might be tempted to think that it should still permit niches. However, the [unsafe-code-guidelines#204] example shows why this is probably a bad idea. In particular, while mutating the interior of a `Cell` cannot still alter the contents of enum discriminants outside the cell, and thus have surprising side-effects that will inhibit optimization and reasoning.
[RalfJung expands](https://github.com/rust-lang/rust/issues/68206#issuecomment-575274715): *I see an inherent conflict here between niches, that basically "overlay" multiple pieces of information on the same location, and trying to define interior mutability operationally (in terms of which memory can and cannot be mutated when) in a way that it only applies to "parts of" the information stored in that location.*
## Resolved
In the lang-team meeting on [2020-01-16], we discussed this and came to the conclusion that **we ought to inhibit niches in `UnsafeCell<T>` in order to resolve these various issues.** (This was a preliminary in-meeting consensus, not a formal decision.)
## Implications
This will cause potential regressions. These regressions are considered justified as this is a soundness fix, and that scenario is not very likely.
* Specifically, the type `Option<UnsafeCell<&i32>>` [is currently considered FFI-safe](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7027b7e22dab5bedeeb9c54679dd4ea7) (and same for `Cell`), as it would be represented as a `Option<&i32>` which in turn is a nullable pointer. After this change, that will no longer be the case.
## Possible future directions
It is possible that we could re-enable niches on `Cell`, although it is not clear that we want this (c.f. [unsafe-code-guidelines#204]). If we were to do so, we would either want to introduce some way for `Cell` to "opt back in" to niches or perhaps introduce a variant of `UnsafeCell` that *also* implies that its contents are guaranteed to be thread-local.
[#68206]: https://github.com/rust-lang/rust/issues/68206
[unsafe-code-guidelines#204]: https://github.com/rust-lang/unsafe-code-guidelines/issues/204
[2020-01-16]: https://github.com/rust-lang/lang-team/blob/master/minutes/2020-01-16.md
[#68303]: https://github.com/rust-lang/rust/issues/68303