# Safe(r) Transmute
Systems programming often requires viewing one type as another with no copying and only the strictly necessary runtime checks. However, this process, known as "transmuting" one type to another, is extremely dangerous, so much so that the docs for [std::mem::transmute](https://doc.rust-lang.org/std/mem/fn.transmute.html) are essentially a long list of ways to avoid doing so.
Transmuting cannot always be avoided, though. For instance, in extremely performance-sensitive use cases, it may be necessary to transmute from bytes instead of explicitly deserializing and copying bytes from a buffer into a struct.
While type transmuting is the general act of viewing one type as another type, one flavor of transmute is extremely common: viewing a slice of bytes (i.e., a byte buffer) as some arbitrary type and vice versa.
This pre-RFC attempts to solve only this problem while still leaving room for future improvements that allow for arbitrary transmute.
## Use Cases
Viewing bytes as a type and vice versa is useful in a wide range of use cases, such as:
* **Parsing**: Many file formats lay out bytes in a way compatible with C struct layouts, meaning copying is often not necessary. For example:
    * Network protocols like HTTP, TLS, etc.
    * Binary files like images, zip archives, executables, etc.
    * Memory-mapped files
* **Search**: High-performance search algorithms generally want to copy as little data as possible.
* **Kernel and Embedded Development**: In low-level contexts, there is often not enough stack space to copy items from memory to the stack to perform manipulations.
## Causes of Unsafety and Undefined Behavior (UB)
At the core of understanding the safety properties of transmutation is understanding Rust's layout properties (i.e., how Rust represents types in memory). The best resource I've found for understanding this is [Alexis Beingessner's blog post]() on the matter.
The following are the reasons that transmutation from some buffer of bytes is generally unsafe:
* **Wrong Size**: A buffer of bytes might not contain the correct number of bytes to encode a given type. Referring to uninitialized fields of a struct is UB. Of course, this assumes that the size of a given type is known ahead of time, which is not always the case.
* **Illegal Representations**: Safe transmutation of a slice of bytes to a type `T` is only possible if every possible value of those bytes corresponds to a valid value of type `T`. For example, this property doesn't hold for `bool` or for most enums. While `size_of::<bool>() == 1`, a `bool` can _only_ legally be either `0b1` or `0b0` - transmuting `0b10` to `bool` is UB.
* **Non-Deterministic Layout**: Certain types might not have a deterministic layout in memory. The Rust compiler is allowed to rearrange the layout of any type that does not have a well-defined layout associated with it. Explicitly setting the layout of a type is done through `#[repr(..)]` attributes. To be deterministic, both the order of the fields of a complex type and the exact values of their offsets from the beginning of the type must be known. This is generally only possible by marking a complex type `#[repr(C)]` and recursively ensuring that all fields of the struct are composed of types with deterministic layout.
* **Alignment**: Types must be "well-aligned", meaning that their address in memory is a multiple of the type's alignment (some power of 2). For example, the alignment of `u32` is 4, meaning that a valid `u32` must always start at a memory address evenly divisible by 4. Transmuting a slice of bytes that is not properly aligned for type `T` to `T` is UB.
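To make the size and alignment rules above concrete, here is a minimal sketch of the checks a hand-written conversion has to perform today. `AlignedBuf` and `view_u32` are hypothetical names invented for this illustration and are not part of the proposal.

```rust
use std::mem::{align_of, size_of};

/// A byte buffer forced to 4-byte alignment so the example is deterministic.
/// (Hypothetical helper type, not part of the proposal.)
#[repr(C, align(4))]
pub struct AlignedBuf(pub [u8; 8]);

/// Views `bytes` as a `u32` only if both the length and the address fit.
pub fn view_u32(bytes: &[u8]) -> Option<&u32> {
    // Wrong size: the slice must hold exactly one `u32`.
    if bytes.len() != size_of::<u32>() {
        return None;
    }
    // Alignment: the start address must be a multiple of `align_of::<u32>()`.
    if bytes.as_ptr() as usize % align_of::<u32>() != 0 {
        return None;
    }
    // Every 4-byte pattern is a valid `u32`, so no value check is needed.
    Some(unsafe { &*(bytes.as_ptr() as *const u32) })
}
```

On a target where `u32` has alignment 4, a sub-slice starting at offset 1 of an `AlignedBuf` is rejected, while the sub-slice starting at offset 0 is accepted.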
Transmuting from a type `T` to a slice of bytes can also be unsafe or cause UB:
* **Padding**: Since padding bytes (i.e., bytes internally inserted to ensure all elements of a complex type have proper alignment) are not initialized, viewing them is UB.
* **Non-Deterministic Layout**: The same issues that affect transmuting from bytes to type `T` apply when going in the other direction.
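The padding problem is easy to observe from safe code. The sketch below uses a hypothetical `Padded` struct (not from the proposal) to show where a `#[repr(C)]` layout inserts a padding byte; the expected numbers assume `u16` has alignment 2.

```rust
use std::mem::size_of;

/// Hypothetical example type: `#[repr(C)]` fixes the field order, but the
/// compiler still has to insert padding to align `b`.
#[repr(C)]
pub struct Padded {
    pub a: u8,
    // One implicit padding byte sits here so that `b` starts at an even offset.
    pub b: u16,
}

/// Byte offset of `b` inside `Padded`, computed from field addresses.
pub fn offset_of_b() -> usize {
    let p = Padded { a: 0, b: 0 };
    let base = &p as *const Padded as usize;
    let field = &p.b as *const u16 as usize;
    field - base
}

/// Number of bytes in `Padded` that belong to no field.
pub fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u16>())
}
```

Viewing a `Padded` value as bytes would expose that one uninitialized byte, which is exactly what `ToBytes` is meant to rule out.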
## Proposed Improvements
### Introduce traits for types that can be safely transformed to/from bytes
We first introduce the traits `FromAnyBytes` and `ToBytes` (names subject to bikeshedding - see below).
`FromAnyBytes` represents any type where all properly aligned and sized byte patterns are legal (from here on referred to as "byte-complete" types), such that any byte slice of the same size can be transmuted into the type in-place without further checking.
`ToBytes` represents any type that can be transmuted into bytes in-place, which in turn requires that the type not have any padding.
All core types that are byte-complete implement both `FromAnyBytes` and `ToBytes` (a full list appears below). Core types like `bool` that need further validation before being safely transmuted from bytes only implement `ToBytes`. Both traits can be safely opted into, using either `#[derive(...)]` or `impl` blocks, as long as the implementing types:
* Are only recursively composed of `FromAnyBytes` or `ToBytes` types, respectively
* Have a deterministic layout (such as types using `repr(C)` or `repr(transparent)`)
* Additionally, for `ToBytes`, contain no padding bytes.
The compiler will return an error when the type does not fit all of the necessary conditions.
### `FromAnyBytes` Definition
`FromAnyBytes` contains no methods and serves as a marker trait.
### `ToBytes` Definition
```rust
trait ToBytes: Sized {
    // Note: an array length of `size_of::<Self>()` is not expressible in a
    // trait on stable Rust today; the signatures show the intended shape.
    #[inline]
    fn to_bytes(&self) -> &[u8; std::mem::size_of::<Self>()] {
        // ... implementation
    }

    #[inline]
    fn to_bytes_mut(&mut self) -> &mut [u8; std::mem::size_of::<Self>()] {
        // ... implementation
    }

    #[inline]
    fn into_bytes(self) -> [u8; std::mem::size_of::<Self>()] {
        // ... implementation
    }

    // One more discussed below ...
}
```
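For intuition, here is a minimal sketch of what a `to_bytes` implementation could look like on stable Rust. It is a stand-in, not the proposed API: it returns a slice instead of a fixed-size array, and it models `ToBytes` as an `unsafe trait` with hand-written impls because nothing in this sketch verifies the "no padding" rule the way the proposed derive would. The `Pair` type is hypothetical.

```rust
use std::mem::size_of;
use std::slice;

/// Hypothetical stand-in for the proposed trait. Implementing it is `unsafe`
/// here only because this sketch cannot check the "no padding" requirement.
pub unsafe trait ToBytes: Sized {
    /// Borrows the in-memory representation of `self` as bytes.
    fn to_bytes(&self) -> &[u8] {
        // Sound only if `Self` has no padding, so every byte is initialized.
        unsafe { slice::from_raw_parts(self as *const Self as *const u8, size_of::<Self>()) }
    }
}

unsafe impl ToBytes for u32 {}

/// Two `u16` fields under `#[repr(C)]` leave no room for padding.
#[repr(C)]
pub struct Pair {
    pub x: u16,
    pub y: u16,
}

unsafe impl ToBytes for Pair {}
```

The returned bytes are in native byte order, so `v.to_bytes()` for a `u32` matches `v.to_ne_bytes()`.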
### Casting
This proposal introduces only one initial function for performing safe transmutes: `cast`.
```rust
trait ToBytes: Sized {
    // ... other methods seen above
    fn cast<U: FromAnyBytes>(self) -> U {
        // ... implementation
    }
}
```
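One way such a by-value cast might be implemented is sketched below. This is an illustration under assumptions, not the proposal's implementation: it is written as a free function rather than a trait method, both traits are modelled as hand-implemented `unsafe` marker traits, and the size requirement is enforced with a runtime assertion.

```rust
use std::mem::{size_of, transmute_copy, ManuallyDrop};

// Hypothetical stand-ins for the proposed marker traits.
pub unsafe trait FromAnyBytes: Sized {}
pub unsafe trait ToBytes: Sized {}

unsafe impl FromAnyBytes for u32 {}
unsafe impl ToBytes for u32 {}
unsafe impl FromAnyBytes for [u8; 4] {}
unsafe impl ToBytes for [u8; 4] {}

/// By-value cast between two types of equal size.
pub fn cast<T: ToBytes, U: FromAnyBytes>(from: T) -> U {
    assert_eq!(size_of::<T>(), size_of::<U>(), "size mismatch");
    // The bytes of `from` are moved into the result, so `from` must not be
    // dropped a second time.
    let from = ManuallyDrop::new(from);
    // `transmute_copy` copies the bytes out of the source, so the alignment
    // of `T` and `U` does not need to match for a by-value cast.
    unsafe { transmute_copy::<T, U>(&*from) }
}
```

For example, `cast::<u32, [u8; 4]>(7)` yields the same bytes as `7u32.to_ne_bytes()`.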
### Impact on Public API
The user must opt into a complex type implementing `FromAnyBytes` and `ToBytes`, because this has implications on the public API of the type. For instance, changing normally private details of a complex type such as ordering of private fields may become a breaking change.
### Padding
A struct that requires internal padding can become a struct that can derive `ToBytes` by explicitly defining padding fields.
```rust
#[derive(ToBytes)]
#[repr(C)]
struct Foo {
    field1: u8,
    _0: u8, // explicit padding field
    field2: u16,
}
```
Note that some structs may have "surprise" padding at the end and as such should not implement `ToBytes`. For example: `struct MyType(u32, u8)`.
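Until a derive exists, the "no padding" property can be approximated with a compile-time assertion. The sketch below is one possible pattern, not part of the proposal; it also assumes `MyType` is `#[repr(C)]` (the text above does not say) and that `u32` has alignment 4.

```rust
use std::mem::size_of;

#[repr(C)]
pub struct Foo {
    pub field1: u8,
    pub _0: u8, // explicit padding field
    pub field2: u16,
}

// Compile-time check: the struct is exactly as large as the sum of its
// fields, so no implicit padding exists. Compilation fails otherwise.
const _: () = assert!(size_of::<Foo>() == size_of::<u8>() * 2 + size_of::<u16>());

/// Assumed to be `#[repr(C)]` for this illustration.
#[repr(C)]
pub struct MyType(pub u32, pub u8);

/// Number of trailing padding bytes in `MyType`.
pub const MY_TYPE_PADDING: usize = size_of::<MyType>() - (size_of::<u32>() + size_of::<u8>());
```

Under those assumptions `MyType` occupies 8 bytes while its fields only account for 5, so the same assertion written for `MyType` would fail to compile.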
### Implementing `std` Types
The following core types will be marked as `FromAnyBytes` and `ToBytes`:
* `u8`, `u16`, `u32`, `u64`, `u128`, `usize`
* `i8`, `i16`, `i32`, `i64`, `i128`, `isize`
* `f32`, `f64`
* `()`
* all SIMD types that are byte-complete
* `Option<T>` where `T` is any `NonZeroU*` or `NonZeroI*` type
* `[T; N]` for any `T` implementing the corresponding trait.
    * Note that all types guarantee their size is a multiple of their alignment, so an array `[T; N]` can never contain padding that the type `T` doesn't itself contain.
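Two of the layout facts this list relies on can be checked directly with `size_of`. The helper names below are hypothetical and exist only for this illustration.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// An array is exactly `N` elements laid end to end, with nothing in between.
pub fn array_is_tightly_packed<T, const N: usize>() -> bool {
    size_of::<[T; N]>() == N * size_of::<T>()
}

/// `Option<NonZeroU32>` occupies the same space as a plain `u32`, because
/// the otherwise-forbidden zero value is free to encode `None`.
pub fn option_nonzero_is_u32_sized() -> bool {
    size_of::<Option<NonZeroU32>>() == size_of::<u32>()
}
```

Both functions return `true`; the first holds for any element type, including ones such as `(u32, u8)` that carry padding of their own.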
The following additional core types will be marked as `ToBytes` only:
* `bool`
* any `NonZeroU*` or `NonZeroI*` type
* `char`
    * Note that this will produce and consume UCS-4 characters and would require committing to the internal UCS-4 representation of `char`. We could, alternatively, omit the trait implementations for `char`.
All tuples composed of `FromAnyBytes` types will themselves implement `FromAnyBytes`. All tuples composed of `ToBytes` types without padding can implement `ToBytes`. (Providing such implementations in the standard library may require compiler assistance.)
### Enums
C-style enum types (with no fields in any variant) marked with `#[repr(C)]` or `#[repr($INT)]` may derive `ToBytes`.
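The asymmetry between the two directions is visible with ordinary conversions. The `Color` enum below is a hypothetical example: turning it into a byte can never fail, while turning a byte into it has to validate.

```rust
/// Hypothetical C-style enum with an explicit one-byte representation.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Color {
    Red = 1,
    Green = 2,
    Blue = 3,
}

impl Color {
    /// Every `Color` is a valid `u8`, so this direction needs no check.
    pub fn to_byte(self) -> u8 {
        self as u8
    }

    /// Most `u8` values are not a `Color`, so this direction must validate.
    pub fn from_byte(byte: u8) -> Option<Color> {
        match byte {
            1 => Some(Color::Red),
            2 => Some(Color::Green),
            3 => Some(Color::Blue),
            _ => None,
        }
    }
}
```

This is why such enums can be `ToBytes` but not `FromAnyBytes`: only 3 of the 256 possible byte values are legal.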
### Generics
While it is theoretically possible to derive `ToBytes` and/or `FromAnyBytes` for generic structs which are generic over types that are `ToBytes` and/or `FromAnyBytes`, this is left to future work.
### Endianness
Transmute deals with in-memory data in-place, and thus does not have any provisions to perform translations between native endianness and non-native endianness.
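In practice this means callers fix up byte order themselves after the in-place view. A small sketch, using a hypothetical `read_be_len` helper and the standard `u32::from_ne_bytes` and `u32::from_be` functions:

```rust
/// Reads a big-endian length field in two steps: a native-endian
/// reinterpretation (what an in-place view yields), then an explicit
/// byte-order fix-up.
pub fn read_be_len(raw: [u8; 4]) -> u32 {
    // Step 1: what a transmute would give us, in whatever order the host uses.
    let native_view = u32::from_ne_bytes(raw);
    // Step 2: interpret that value as big-endian data; a no-op on
    // big-endian hosts and a byte swap on little-endian ones.
    u32::from_be(native_view)
}
```

The result is the same on hosts of either endianness, for example `read_be_len([0, 0, 1, 0])` is 256.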
### Unsafe Impl
There is no way to `unsafe impl` either `FromAnyBytes` or `ToBytes` for a type that doesn't meet the requirements.
### Raw Pointers
Raw pointers could potentially implement both `ToBytes` and `FromAnyBytes`, and references or Option of references could potentially implement `ToBytes`. There may be uses for such implementations, but they also seem potentially error-prone.
We propose to evaluate them further and consider such implementations in the future, but to not provide such implementations in the initial version.
## Naming
The names for these traits are still subject to bikeshedding. There were several criteria used to select each trait name. First, the names should make their usages recognizable out of context although not necessarily sufficiently clear without prior exposure. It should be clear through the names how the two marker traits contrast with each other as well as potential extensions in the future.
The `FromAnyBytes` trait should convey that any combination of bytes the same length as `size_of::<T>()` is a valid representation of type `T` in memory. The `ToBytes` trait should convey that it is a well-defined operation to view the raw memory representation of the marked type.
Note that the working assumption is that these traits will live in the `std::mem` namespace.
Other names that were considered include:
* `FromValidBytes` / `AsValidBytes`
* `FromValidBytes` / `ToValidBytes`
* `SafeFromBytes` / `SafeToBytes`
* `FromBytes` / `AsBytes`
* `SafeTransmuteFrom` / `SafeTransmuteTo`
* `FromAnyBytes` / `ToBytesInPlace`
## Limitations
This proposal is purposefully fairly limited. It does not, for instance, give any way to convert to a type from bytes, only the tools to ensure that doing so would be safe.
## Possible Future Extensions
The following proposals could be made in the future in a way that is compatible with this proposal. This document is not advocating one way or another for their adoption, but these proposals were also considered in the creation of this document.
### Extension to `ToBytes` for Transmuting
The `ToBytes` trait could be extended to allow transmuting between types through a bytes intermediary.
```rust
trait ToBytes {
    // to_bytes() defined here

    /// Safely cast this type in-place to another type, returning a
    /// reference to the same memory. Panics if the alignment or size of
    /// the types differ.
    fn cast<T: FromAnyBytes>(&self) -> &T { /*...*/ }

    /// Safely cast this type in-place to another type, returning a mutable
    /// reference to the same memory. This requires `Self` to satisfy
    /// `FromAnyBytes`, because writes through the returned mutable
    /// reference will mutate `Self` without validation. As with the above,
    /// this method will panic if the size or alignment is off.
    fn cast_mut<T: FromAnyBytes>(&mut self) -> &mut T
    where
        Self: FromAnyBytes,
    { /*...*/ }
}
```
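A rough sketch of how the shared-reference cast could perform its checks is shown below. It is an assumption-laden illustration rather than the proposed implementation: `cast_ref` is a free function, the traits are hand-implemented `unsafe` markers, and the alignment check only requires that the source type be at least as strictly aligned as the target, which is weaker than the "must not differ" wording in the doc comment above.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-ins for the proposed marker traits.
pub unsafe trait FromAnyBytes: Sized {}
pub unsafe trait ToBytes: Sized {}

unsafe impl FromAnyBytes for u32 {}
unsafe impl ToBytes for u32 {}
unsafe impl FromAnyBytes for [u16; 2] {}
unsafe impl ToBytes for [u16; 2] {}

/// Reinterprets `&T` as `&U` in place, panicking if the layouts do not fit.
pub fn cast_ref<T: ToBytes, U: FromAnyBytes>(from: &T) -> &U {
    assert_eq!(size_of::<T>(), size_of::<U>(), "size mismatch");
    // Any address valid for `T` is also valid for a less strictly aligned `U`.
    assert!(align_of::<T>() >= align_of::<U>(), "insufficient alignment");
    // `T: ToBytes` means every byte is initialized; `U: FromAnyBytes` means
    // every bit pattern is a valid `U`.
    unsafe { &*(from as *const T as *const U) }
}
```

With these impls, a `&u32` can be viewed as `&[u16; 2]`, while the reverse direction panics on targets where `u32` is more strictly aligned than `u16`.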
### AlignOf and SizeOf
The `ToBytes` extension above requires two runtime checks to be provably safe. The size of the type being cast to must be no larger than the size of the type being cast from, and the two types must have compatible alignments.
These properties are (usually) possible to know at compile time, so having some sort of traits that encapsulate this in the type system, such as `SizeOf<N>` and `AlignOf<N>`, would be ideal.
That way a safe casting function could be written as:
```rust
fn safe_transmute<From, To, Size, Align>(from: From) -> To
where
    From: SizeOf<Size> + AlignOf<Align> + ToBytes,
    To: SizeOf<Size> + AlignOf<Align> + FromAnyBytes,
{
    // implementation
}
```
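The idea can be approximated today with const generics. The following is only a sketch under several assumptions: `SizeOf` and `AlignOf` are modelled as traits with a `const N: usize` parameter, their impls are written by hand (the proposal imagines the compiler supplying them), the type parameters are renamed `Src` and `Dst`, and a `Copy` bound is added to keep the body short.

```rust
use std::mem::{align_of, size_of, transmute_copy};

// Hypothetical traits; the hand-written impls stand in for compiler support.
pub trait SizeOf<const N: usize> {}
pub trait AlignOf<const N: usize> {}
pub unsafe trait ToBytes: Sized {}
pub unsafe trait FromAnyBytes: Sized {}

impl SizeOf<4> for u32 {}
impl AlignOf<4> for u32 {}
unsafe impl ToBytes for u32 {}

impl SizeOf<4> for f32 {}
impl AlignOf<4> for f32 {}
unsafe impl FromAnyBytes for f32 {}

// Guard the hand-written numbers at compile time.
const _: () = assert!(size_of::<u32>() == 4 && size_of::<f32>() == 4);
const _: () = assert!(align_of::<u32>() == 4 && align_of::<f32>() == 4);

/// Compiles only when both types advertise the same size and alignment.
pub fn safe_transmute<Src, Dst, const SIZE: usize, const ALIGN: usize>(from: Src) -> Dst
where
    Src: SizeOf<SIZE> + AlignOf<ALIGN> + ToBytes + Copy,
    Dst: SizeOf<SIZE> + AlignOf<ALIGN> + FromAnyBytes,
{
    // The trait bounds already force matching size and alignment, so no
    // runtime check is needed here.
    unsafe { transmute_copy::<Src, Dst>(&from) }
}
```

A call such as `safe_transmute::<u32, f32, 4, 4>(0x3f80_0000)` returns `1.0`, and a call pairing types with different advertised sizes fails to type-check.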
### `FromBytes`
Additionally, a `FromBytes` trait could be introduced for expressing conversions from bytes that might fail:
```rust
trait FromBytes: Sized {
    fn from_bytes(bytes: &[u8; std::mem::size_of::<Self>()]) -> Result<&Self, FromBytesError>;
}

#[non_exhaustive]
#[derive(Debug, PartialEq, Eq, Copy, Clone)]
enum FromBytesError {
    InsufficientAlignment,
    InsufficientBytes,
    InvalidValue,
}
```
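As an illustration of the fallible direction, here is a sketch of what an implementation for `bool` might look like. It deviates from the signature above in one labelled way: it takes a `&[u8]` slice, since an array length of `size_of::<Self>()` cannot be written in a trait on stable Rust, and it only inspects the first byte of the slice.

```rust
#[non_exhaustive]
#[derive(Debug, PartialEq, Eq, Copy, Clone)]
pub enum FromBytesError {
    InsufficientAlignment,
    InsufficientBytes,
    InvalidValue,
}

/// Slice-based variant of the trait sketched above (an assumption of this
/// example, not the proposed signature).
pub trait FromBytes: Sized {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError>;
}

impl FromBytes for bool {
    fn from_bytes(bytes: &[u8]) -> Result<&bool, FromBytesError> {
        // `bool` has size 1 and alignment 1, so only the length and the
        // value can be wrong; alignment can never fail here.
        match bytes {
            [] => Err(FromBytesError::InsufficientBytes),
            // Validate before reinterpreting: only 0 and 1 are legal.
            [byte, ..] if *byte <= 1 => Ok(unsafe { &*(byte as *const u8 as *const bool) }),
            _ => Err(FromBytesError::InvalidValue),
        }
    }
}
```

Here `bool::from_bytes(&[1])` succeeds with `&true`, while `bool::from_bytes(&[2])` reports `InvalidValue` instead of invoking UB.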
With this in place it may be tempting to have some sort of variant of the above `ToBytes` casting methods which can fail instead of panicking.
### Safe Unions
Unions whose fields all implement both `FromAnyBytes` and `ToBytes` can potentially allow reads of their fields without requiring unsafe, since writing to one field and reading from another acts as a transmute operation, and these traits make transmutes safe.
However, when a union's fields have differing lengths (referred to here as "unbalanced unions"), initializing a shorter field does not necessarily zero out the remainder of the union. This means initializing a union with a shorter field and then reading a longer field leads to reading from uninitialized memory.
To make this well-defined, it would perhaps be wise to add a new repr, `#[repr(zero_init)]`, which initializes the remainder of the union to zero when initializing any field. Thus, safe Rust can allow reading fields of unbalanced unions if and only if the union type implements `ToBytes` and `FromAnyBytes` and is `#[repr(zero_init)]`.
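For reference, this is what a balanced union looks like today, where every field read still needs `unsafe`. The `Word` union and `bytes_of` function are hypothetical names used only for this sketch.

```rust
/// A "balanced" union: both fields are 4 bytes and accept any bit pattern.
#[repr(C)]
pub union Word {
    pub value: u32,
    pub bytes: [u8; 4],
}

/// Writes through one field and reads through the other, which acts as a
/// transmute from `u32` to `[u8; 4]`.
pub fn bytes_of(value: u32) -> [u8; 4] {
    let w = Word { value };
    // Today this read requires `unsafe`; the extension described above would
    // let the compiler accept it in safe code for unions like this one.
    unsafe { w.bytes }
}
```

The result matches `value.to_ne_bytes()`, since both fields cover exactly the same four initialized bytes.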
## Alternatives
### typic
`typic` is an experimental crate for encoding a type's layout as a trait. This allows for easy comparison of two distinct types' layouts at compile time. As long as they implement equivalent layout types, they should be safe to transmute between (assuming proper alignment).
This approach is appealing since it seeks to solve general transmutation instead of only viewing bytes as types and vice versa.
What remains unclear is how this approach affects compiler performance and whether the compiler team is comfortable supporting this use of the type system.
More information on `typic` can be [found here](https://github.com/jswrenn/typic).
### `Compatible<T>`
The `Compatible<T>` proposal also seeks to address the general problem of transmute through use of the type system. The proposal suggests an extension to the type system that allows it to know whether two types are transitively compatible with one another.
The downside to this approach is that it requires changes to the trait system (albeit ones that are currently in the works in [chalk](https://github.com/rust-lang/chalk)).
More about this proposal can be [found here](https://gist.github.com/gnzlbg/4ee5a49cc3053d8d20fddb04bc546000).
### Padding
Instead of forbidding padding in `ToBytes`, an API that returns an `[std::mem::MaybeUninit<u8>; std::mem::size_of::<T>()]` could be provided. We believe that this is an ergonomics hit that can more easily be overcome through other means (e.g., adding explicit padding fields that are zeroed or introducing a `repr(zeroed)` attribute).
## Open Questions
* Is it possible that two types with the same size and alignment that are both `FromAnyBytes` and `ToBytes` can still not be convertible to one another due to differing ABI needs?
* Should all fields of a `FromAnyBytes` type be required to be marked public, given that exposing the type's in-memory representation already exposes its internals publicly?
## Acknowledgments
Shout out to the following crates for paving the way with many good ideas:
* [safe-transmute](https://crates.io/crates/safe-transmute)
* [zerocopy](https://crates.io/crates/zerocopy)
* [Compatible<T>](https://internals.rust-lang.org/t/pre-rfc-frombits-intobits/7071/24)
* [typic](https://crates.io/crates/typic)