# Design meeting: writing code once for multiple SIMD extension sets
Even within the same architecture, CPUs vary significantly in which SIMD ISA extensions they implement. For example, `x86` CPUs currently have about [12](https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512) (!) different possible configurations with respect to AVX-512 support alone.
Most libraries that are shipped to users in binary form thus tend to generate code for different sets of features, and then use runtime dispatch to choose between the different implementations depending on the specific extensions available.
This approach is very common, for example in modern image and video codec implementations.
Traditionally, this has been done by explicitly writing a version of the relevant code for [every architecture and set of SIMD instructions](https://github.com/memorysafety/rav1d/tree/main/src), often even in hand-coded ASM, with obvious downsides for both developer convenience and safety properties of the resulting code.
A different approach is to write the code once and have the compiler generate multiple versions; the C++ library [highway](https://github.com/google/highway) is a widely used example of this pattern, adopted by [multiple projects across different domains](https://github.com/google/highway?tab=readme-ov-file#examples).
The topic of how to best support this pattern in Rust has been discussed extensively in this [Zulip thread](https://rust-lang.zulipchat.com/#narrow/channel/257879-project-portable-simd/topic/struct_target_features.20.28RFC.20.233525.29), without reaching a consensus on which of the three approaches proposed below to proceed with (nor on many of the other design questions discussed below). The main question for this design meeting would be to figure out the best approach(es), among the variants of the proposals below or even a completely different one.
## Current state in Rust
As of writing this document, *efficiently* using SIMD instructions _locally_ (i.e. in a single function, as opposed to program-wide) requires annotating the function with an attribute, and making the function `unsafe`, for example:
```rust
#[target_feature(enable="avx")]
unsafe fn with_avx() {
    // can use AVX intrinsics here.
}
```
This attribute influences how LLVM does code generation in multiple ways:
- LLVM might apply automatic vectorization in a way that causes AVX to be used, even if the source code of the function does not explicitly indicate that.
- Other AVX-using functions can be inlined into `with_avx` (in particular, functions from `core::arch` that explicitly map to AVX instructions).
- The ABI of calls to `extern "C"` functions that take `AVX` registers as arguments changes; this was a potential source of surprising UB, and situations that could trigger this are being made into errors. See [#116558](https://github.com/rust-lang/rust/issues/116558) for details.
Since this attribute is required for efficient code to be generated, libraries that provide a way to generate multiple copies of a function with different SIMD instruction sets (such as [pulp](https://docs.rs/pulp/latest/pulp/index.html) and [multiversion](https://docs.rs/multiversion/latest/multiversion/)) use one of the following methods:
- They require intermediate functions to be declared `#[inline(always)]`, causing all the relevant SIMD code to be inlined into a single function (which is then multiversioned); this has obvious downsides in terms of code size.
- They require the user to perform dynamic dispatch again, which has a meaningful performance impact for small and medium-sized functions.
- They synthesize an intermediate function for each callee that the user does not want inlined; the user-provided callee is inlined into this intermediate function, which enables the appropriate features. This requires significant boilerplate (a sketch of this pattern is shown below).
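As an illustration, here is a minimal sketch of this kind of boilerplate (all names are illustrative and not taken from any specific crate): the user-provided `kernel_impl` is inlined into one wrapper per feature set, and a dispatcher selects a wrapper at runtime.
```rust
// The user writes `kernel_impl` once; each wrapper inlines it with the
// corresponding features enabled, and `kernel` dispatches at runtime.
#[inline(always)]
fn kernel_impl(data: &mut [u32]) {
    for x in data.iter_mut() {
        *x = x.wrapping_mul(3);
    }
}

#[target_feature(enable = "avx2")]
unsafe fn kernel_avx2(data: &mut [u32]) {
    // `kernel_impl` is inlined here and compiled with avx2 enabled.
    kernel_impl(data)
}

#[target_feature(enable = "sse4.1")]
unsafe fn kernel_sse41(data: &mut [u32]) {
    kernel_impl(data)
}

fn kernel(data: &mut [u32]) {
    // Runtime dispatch to the most capable version available.
    if is_x86_feature_detected!("avx2") {
        unsafe { kernel_avx2(data) }
    } else if is_x86_feature_detected!("sse4.1") {
        unsafe { kernel_sse41(data) }
    } else {
        kernel_impl(data)
    }
}
```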
### Requiring `unsafe`
Apart from the ABI issues mentioned above, `unsafe` is required (regardless of the source of the function) because CPUs that do not understand a certain SIMD ISA extension might interpret instructions as arbitrary other instructions.
[target feature 1.1](https://github.com/rust-lang/rust/issues/69098) removes the need for `unsafe` in free functions, making the functions safe to call *in a context-dependent way*. In particular, the call is safe iff the caller has a superset of the features of the callee.
Note that trait methods still require `unsafe` even with target feature 1.1.
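For example, a sketch assuming target feature 1.1 (function names are illustrative):
```rust
#[target_feature(enable = "avx2")]
fn uses_avx2() {
    // can use AVX2 intrinsics here
}

#[target_feature(enable = "avx2")]
fn also_uses_avx2() {
    uses_avx2(); // safe: the caller's features are a superset of the callee's
}

fn without_features() {
    // uses_avx2(); // ERROR: this context is not known to have avx2
    if is_x86_feature_detected!("avx2") {
        // the programmer asserts that the required features are present
        unsafe { uses_avx2() };
    }
}
```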
### SIMD abstraction choices
Assuming one does not want to rely on autovectorization (which is typically the case for the use cases in question), supporting this kind of function multiversioning inherently requires having some form of abstraction over SIMD ISAs.
Multiple such abstraction layers have been written over time, with somewhat different objectives:
- Efforts such as `std::simd` provide vector types that offer identical behaviour across different platforms. When the behaviour is not identical across platforms, one standard behaviour is chosen and emulated on platforms for which it would not be the native behaviour.
- `pulp` and `highway` aim to be as close to the hardware as possible. This implies that operations might have slightly different semantics across platforms; they also have a concept of "vector of native length", and may not implement some vector lengths for some targets.
The second approach typically results in higher performance, at the cost of somewhat more cumbersome code.
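To make the contrast concrete, here is a hedged sketch of the two styles (function names are illustrative; `std::simd` is nightly-only at the time of writing):
```rust
#![feature(portable_simd)]
use std::simd::Simd;

// Portable style: identical semantics on every platform, emulated where necessary.
fn add_portable(a: Simd<u32, 8>, b: Simd<u32, 8>) -> Simd<u32, 8> {
    a + b
}

// Close-to-hardware style: maps to a single AVX2 instruction, but is x86-64-specific
// and requires the feature to be enabled.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(
    a: core::arch::x86_64::__m256i,
    b: core::arch::x86_64::__m256i,
) -> core::arch::x86_64::__m256i {
    core::arch::x86_64::_mm256_add_epi32(a, b)
}
```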
Moreover, the most efficient implementation of an algorithm for one SIMD architecture might be quite different from the best implementation for another, for example requiring different kinds of pre-computation or storage, or simply adapting to the native vector size; thus, it is desirable for the multiversioning/abstraction mechanism to allow for some amount of platform-specific adaptation. See [this document](https://hackmd.io/@veluca93/flexible-feature-abstractions) for some examples of how the SIMD ISA might affect the optimal implementation.
## Proposal: `struct_target_features`
(this proposal is mostly based on [RFC 3525](https://github.com/rust-lang/rfcs/pull/3525))
One possible approach to ease function multiversioning is to allow attaching features to *types*. Such types would gain a _validity_ invariant that requires the appropriate features to be present for an instance of the given type to exist.
A function taking such a type as an argument would then be able to assume the features to be present, and the compiler could thus safely instruct LLVM to generate code accordingly.
Note that this implies that a type explicitly marked with features should be unsafe to construct, unless the caller already has the required features. This is consistent with the behaviour proposed by `target_feature_11`.
This allows using the existing infrastructure for generics to achieve multiversioning, for example like so:
```rust
// See below for a discussion of other possible ways to define such structs
#[target_features(enable = "avx2")]
struct WithAvx2;

#[target_features(enable = "avx512f")]
struct WithAvx512;

fn do_something<S: TargetSpecificOperation>(simd: S) {
    // code that can work under any set of target features here,
    // presumably using some form of SIMD abstraction.
    // Specializing the implementation of some operations depending
    // on the available instruction set can be done through traits.
    simd.target_specific_op();
}

trait TargetSpecificOperation {
    fn target_specific_op(self);
}

impl TargetSpecificOperation for WithAvx2 {
    // See below for considerations on how to inherit features
    // from arguments.
    fn target_specific_op(self) {
        // AVX2-specific code
    }
}

impl TargetSpecificOperation for WithAvx512 {
    fn target_specific_op(self) {
        // AVX512-specific code
    }
}

fn main() {
    if has_avx512() {
        do_something(unsafe { WithAvx512::new_unchecked() });
    } else if has_avx2() {
        do_something(unsafe { WithAvx2::new_unchecked() });
    }

    // *ALTERNATIVE*, with appropriate (user-defined) methods on
    // `WithAvx*` types that do feature detection:
    if let Some(avx512) = WithAvx512::new() {
        do_something(avx512);
    } else if let Some(avx2) = WithAvx2::new() {
        do_something(avx2);
    }
}
```
Beyond this basic idea, there are multiple specific design choices that could considerably change the shape of the resulting feature:
- Should user-provided types be annotatable this way, or only stdlib types?
- Should non-ZSTs be annotatable?
- Should types inherit target features from nested types?
- How should a function inheriting features from its arguments work?
### What types can be annotated?
- The main advantage of allowing non-ZSTs to be annotated is that it allows using features in the implementation of existing traits: e.g. an `Avx2U8Vec` type can implement `std::ops::Add` using `avx2` intrinsics, as long as `self` is not passed by reference (see below).
- Potentially, applying such attributes to `#[repr(simd)]` types could allow making them pass-by-value even in the Rust ABI, as any such case of pass-by-value would implicitly enable the required features, and thus ABI incompatibilities should no longer be possible.
- Allowing any struct to be annotated with `#[target_features(enable = ...)]` lets user code define its own subsets of features to dispatch on, with a familiar syntax.
- Restricting the creation of such types to the stdlib (or `core`) allows for more freedom in the design of, for example, zero-sized witness types to be used by user code. For example, witness types could be designed as a lang-item type `Witness<const FEATURES: TargetFeatures>`, where `TargetFeatures { avx: bool, avx2: bool, ...}` has fields that depend on the current target; an `impl` of such a type could then be provided to offer functionality that would otherwise require new auto traits or other compiler support.
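A hedged sketch of what such a stdlib-provided witness could look like (all names are illustrative, not an agreed-upon design):
```rust
// Hypothetical lang item: one boolean field per feature of the current target.
pub struct TargetFeatures {
    pub avx: bool,
    pub avx2: bool,
    pub avx512f: bool,
    // ...
}

// Hypothetical zero-sized witness: a value of this type may only exist if the
// requested features are available, so functions taking it by value may assume them.
pub struct Witness<const FEATURES: TargetFeatures>;

impl<const FEATURES: TargetFeatures> Witness<FEATURES> {
    /// Runtime detection: returns `Some` only if every requested feature is available.
    pub fn new() -> Option<Self> {
        unimplemented!("would be provided by the standard library")
    }

    /// # Safety
    /// The caller must guarantee that all requested features are available.
    pub unsafe fn new_unchecked() -> Self {
        Witness
    }
}
```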
### Inheriting features from nested types
Some reasonable options are:
- Not inheriting features at all
- Only inheriting features from fields if an attribute is present, e.g. `#[target_features(inherit)]`
- Unconditionally inheriting features for built-in composite types (e.g. tuples), but not for user-defined types
- Unconditionally inheriting features
If feature inheritance is desired, it should be compatible with validity invariants. As such, the following should probably be true, given a type `T` with features that is in some way contained in a type `S`:
- `S` containing `T` by value should cause features to be inherited
- `S` containing `&T` should probably *not* cause features to be inherited (see [here](https://github.com/rust-lang/unsafe-code-guidelines/issues/412) for the rationale on not recursing through references for validity invariants)
- `S` containing `[T; N]` with `N=0` should definitely not cause features to be inherited. With `N>0`, it probably should (unless doing so would be too confusing).
- `S` containing `*const T` or `*mut T` should definitely not cause features to be inherited.
- If `S` is a `union` or `enum`, it should not inherit features.
Note that *inherited* features do not require the struct to be unsafe to construct.
The rules that apply to inheriting features from struct fields should likely be the same as the rules for inheriting from function arguments.
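For illustration, under the rules sketched above (inherit by value, not through indirection), a user-defined struct could behave as follows (a hedged sketch using the proposed attribute):
```rust
#[target_features(enable = "avx2")]
struct Avx2; // unsafe to construct unless avx2 is known to be available

struct Kernel {
    simd: Avx2,      // contained by value: `Kernel` would inherit avx2
    data: *const u8, // raw pointer: no inheritance
}

struct KernelRef<'a> {
    simd: &'a Avx2,  // contained by reference: probably no inheritance (see above)
}

// Constructing `Kernel` needs no extra `unsafe`: obtaining the `Avx2` value was the
// point at which `unsafe` (or runtime feature detection) was already required.
fn make_kernel(simd: Avx2, data: *const u8) -> Kernel {
    Kernel { simd, data }
}
```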
### Behaviour in function definitions
The main question is whether to require functions that inherit features to have a `#[target_features(from_args)]` attribute (or equivalent).
Such an attribute could also be made more specific, e.g. `#[target_features(from_args(0, 1))]` or `#[target_features(from_args(arg.field))]`.
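A hedged sketch of how such an explicit opt-in could look, reusing the feature-carrying ZST style from above (names are illustrative):
```rust
#[target_features(enable = "avx2")]
struct Avx2;

// Explicit opt-in: the function is compiled with the features carried by its
// by-value arguments, and this is visible to MIR-level passes before monomorphization.
#[target_features(from_args)]
fn scale(_simd: Avx2, data: &mut [u32]) {
    // this body is compiled with avx2 enabled
    for x in data.iter_mut() {
        *x = x.wrapping_mul(3);
    }
}
```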
The main arguments for requiring such an attribute are twofold:
- Not having such an attribute might be surprising for people who are used to having codegen controlled by attributes.
- Currently, the [MIR inliner](https://github.com/rust-lang/rust/blob/de7cef75be8fab7a7e1b4d5bb01b51b4bac925c3/compiler/rustc_mir_transform/src/inline.rs) refuses to inline functions that differ in features. Without such an attribute, features would be unknowable until monomorphization, effectively disabling MIR inlining for generics; this is expected to cause an unacceptable performance degradation. However, this check is arguably overly conservative: for example, LLVM happily performs inlining between functions that differ in features if it can determine that doing so is sound, and we'd expect that doing similar checks on MIR would effectively remove this drawback.
Note: not inheriting features from values passed by reference would make implementing trait methods that take `&self` and wish to use features slightly more cumbersome, as it would require either annotating the trait method with `#[inline(always)]` and having it call a function taking `self` by value, or something along the lines of `#[unsafe(target_feature(enable = ...))]`, which does not require the annotated function to be `unsafe` (by moving the burden of proof to a safety invariant of one of the function arguments).
## Proposal: `#[target_features(caller)]`
(this proposal is mostly based on [this draft RFC](https://github.com/calebzulawski/rfcs/blob/contextual-target-features/text/0000-contextual-target-features.md))
The main idea of this proposal is that annotating functions with `#[target_features(caller)]` would cause them to inherit the features of the caller. The proposal also involves adding macros `is_{arch}_feature_enabled!` to generate different code depending on which instruction set the current function is being compiled for.
Example:
```rust
#[target_features(caller)]
fn do_something() {
    // code that can work under any set of target features here,
    // presumably using some form of SIMD abstraction.
    // Specializing the implementation of some operations depending
    // on the available instruction set:
    if is_x86_feature_enabled!("avx512f") {
        // target-specific code with AVX512
    } else {
        // target-specific code when AVX512 is not available
    }
}

#[target_features(enable = "avx512f")]
fn do_something_avx512() {
    do_something()
}

#[target_features(enable = "avx2")]
fn do_something_avx2() {
    do_something()
}

fn main() {
    if has_avx512() {
        unsafe { do_something_avx512(); }
    } else if has_avx2() {
        unsafe { do_something_avx2(); }
    }
}
```
This proposal comes with a few open questions, partially discussed below:
- What should happen when the address of such a function is taken?
- Should this attribute be allowed on trait method implementations?
- At which point of the compilation process should functions with this annotation be multiversioned?
- How should such a function interact with name mangling?
- A caller might have features that a callee does not benefit from; how could features *not* be passed to callees to save binary size?
- What should happen if the call graph contains cycles between `#[target_features(caller)]` functions?
Note: `#[target_features(caller)]` on non-trait methods could be implemented as sugar for `struct_target_features` in the presence of a `features_of_current_function!()` macro that constructs an appropriately-annotated ZST. Many of the discussion points below are also relevant to such an approach.
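A rough sketch of that desugaring, reusing the generic `do_something` from the `struct_target_features` example above (the macro is hypothetical):
```rust
#[target_feature(enable = "avx2")]
fn do_something_avx2() {
    // `features_of_current_function!()` would expand to a value of a ZST annotated
    // with exactly the features of the enclosing function (here, avx2).
    do_something(features_of_current_function!());
}
```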
### Taking addresses
Since there are now multiple copies of the function sharing the same name, converting the function to a function pointer (or `Fn` trait) needs special care.
There are two reasonable approaches:
- Converting to a function pointer/`Fn` trait only uses the baseline features for the current architecture. This could cause some very surprising behaviours (e.g. calling `iterator.map(fn_with_caller_features)` would use baseline features).
- Converting to a function pointer/`Fn` trait uses the features of the function that the call appears in. Note that this should ignore inlining.
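A hedged sketch of the surprising case under the first option (names are illustrative):
```rust
#[target_features(caller)]
fn triple(x: u32) -> u32 {
    x * 3
}

#[target_feature(enable = "avx2")]
fn sum_tripled(v: &[u32]) -> u32 {
    // Direct call: `triple` inherits avx2 from this function.
    let first = triple(v[0]);
    // Passed as an `Fn` value: under the first option above, this copy of `triple`
    // would be compiled with only baseline features, even though the caller has avx2.
    let rest: u32 = v[1..].iter().copied().map(triple).sum();
    first + rest
}
```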
If this attribute is allowed on the methods of an implementation of a trait `Trait` for a type `T`, similar considerations apply to converting `&T` to `&dyn Trait`.
### Time at which multiversioning happens
The main question is whether multiversioning should happen before MIR creation or before codegen.
Doing multiversioning before MIR creation might impact compilation times negatively, as potentially significantly more MIR would be generated.
Doing multiversioning at codegen time (effectively making `#[target_features(caller)]` functions equivalent to generics, from MIR's point of view) also comes with its own challenges. Most notably, even if inlining a `#[target_features(caller)]` function is by definition always fine, inlining of *callers* of `#[target_features(caller)]`-annotated functions during MIR optimizations would change program behaviour in an observable way if implemented naively. Thus, calls to `caller` functions need to be annotated to keep track of the original set of features.
Note that similar considerations apply to inlining function bodies that contain `is_{arch}_feature_enabled!` macro calls.
## Proposal: `#[target_features(generic)]`
This proposal can be seen as a more explicit version of `#[target_features(caller)]`.
The main idea of this proposal is that annotating a const generic function with `#[target_features(generic)]` would cause the function to obtain the features specified in the first const generic argument of appropriate type.
Example:
```rust
trait Simd {
    fn do_something(&self);
}

struct S<const FEATURES: str>();

impl Simd for S<{*"avx2"}> {
    // Note: as written, this function would not have target features. See below
    // for a proposal related to the main discussion, but somewhat orthogonal,
    // that would fix this issue.
    fn do_something(&self) {
        // target-specific code with AVX2
    }
}

impl Simd for S<{*"avx512f"}> {
    fn do_something(&self) {
        // target-specific code with AVX512
    }
}

#[target_features(generic)]
fn do_something<const FEATURES: str>(s: S<FEATURES>)
where
    S<FEATURES>: Simd,
{
    // code that can work under any set of target features here,
    // presumably using some form of SIMD abstraction.
    // Specializing the implementation of some operations depending
    // on the available instruction set
    s.do_something();
}

fn main() {
    if has_avx512() {
        unsafe { do_something(S::<{*"avx512f"}>()); }
    } else if has_avx2() {
        unsafe { do_something(S::<{*"avx2"}>()); }
    }
}
```
This proposal comes with a few open questions:
- What type should the generic argument have? Some reasonable options are `str` (requires `unsized_const_params`, but can re-use existing syntax), or a new lang item (e.g. with boolean fields for each feature).
- Is "first generic argument" the best semantics for the attribute? Should we allow specifying the index of the generic?
## Comparison between the proposals
In the author's opinion, the additional control granted by `#[target_features(generic)]` and `struct_target_features` is highly desirable.
It however comes at the cost of a more verbose syntax (somewhat mitigated by generic argument deduction).
With `#[target_features(caller)]`, it is possible to achieve similar levels of flexibility by making feature-generic functions also generic over an appropriate witness type. However, this duplicates the feature information and introduces the possibility of the two disagreeing.
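For example, a sketch combining `#[target_features(caller)]` with the witness types from the `struct_target_features` example above:
```rust
#[target_features(caller)]
fn do_something<S: TargetSpecificOperation>(simd: S) {
    simd.target_specific_op();
}

#[target_feature(enable = "avx2")]
fn do_something_avx2() {
    // The avx2 requirement is now stated twice: once on this function and once in
    // `WithAvx2`. Nothing ties the two together, so they can silently disagree.
    do_something(unsafe { WithAvx2::new_unchecked() });
}
```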
`struct_target_features` suffers significantly from not being able to rely on recursive reference validity, as having significantly different behaviour when passing by reference is a major footgun.
Neither `#[target_features(caller)]` nor `#[target_features(generic)]` offers a way to ergonomically create vector types that implement arbitrary traits; see the next section for a possible solution.
## `#[unsafe(target_feature(enable = ...))]`
Consider creating an `Avx2u32x8` type that contains 8 `u32` values, has a safety invariant of "`avx2` is available", and relies on `avx2` to implement `std::ops::Add`.
Today, the implementation of `Add` would look like this:
```rust
use core::arch::x86_64::*;

struct Avx2u32x8(__m256i);

impl std::ops::Add<Avx2u32x8> for Avx2u32x8 {
    type Output = Avx2u32x8;

    #[inline(always)]
    fn add(self, other: Self) -> Self {
        unsafe {
            Self(_mm256_add_epi32(self.0, other.0))
        }
    }
}
```
This relies on the calling function having the appropriate features (which is reasonable), and forces inlining (which might be less reasonable for more complex functions).
Even with `target_features_11`, the following is not possible, because `target_feature` is not allowed on safe trait methods:
```rust
impl std::ops::Add<Avx2u32x8> for Avx2u32x8 {
    type Output = Avx2u32x8;

    #[target_feature(enable = "avx2")]
    fn add(self, other: Self) -> Self {
        unsafe {
            Self(_mm256_add_epi32(self.0, other.0))
        }
    }
}
```
A workaround to still leave inlining decisions up to the compiler is the following:
```rust
impl std::ops::Add<Avx2u32x8> for Avx2u32x8 {
    type Output = Avx2u32x8;

    #[inline(always)]
    fn add(self, other: Self) -> Self {
        // Nested functions cannot refer to the outer `Self`, so the concrete
        // type is spelled out here.
        #[target_feature(enable = "avx2")]
        unsafe fn inner(this: Avx2u32x8, other: Avx2u32x8) -> Avx2u32x8 {
            Avx2u32x8(_mm256_add_epi32(this.0, other.0))
        }
        unsafe {
            inner(self, other)
        }
    }
}
```
However, this is somewhat cumbersome.
`#[unsafe(target_feature(enable = ...))]` has the same semantics as `#[target_feature(enable = ...)]`, except that the annotated function (or trait method) is entirely safe: it is up to the person adding the attribute to guarantee that the safety invariants of the function arguments ensure the function can only be called when the features are present.
Example:
```rust
impl std::ops::Add<Avx2u32x8> for Avx2u32x8 {
    type Output = Avx2u32x8;

    #[unsafe(target_feature(enable = "avx2"))]
    fn add(self, other: Self) -> Self {
        // no unsafe with https://github.com/rust-lang/libs-team/issues/494
        unsafe {
            Self(_mm256_add_epi32(self.0, other.0))
        }
    }
}
```