Even within the same architecture, CPUs vary significantly in which SIMD ISA extensions they implement. For example, x86
CPUs currently have about 12 (!) different possible configurations with respect to AVX-512 support alone.
Most libraries that are shipped to users in binary form thus tend to generate code for different sets of features, and then use runtime dispatch to choose between the different implementations depending on the specific extensions available.
This approach is very common, e.g. in modern image and video codec implementations.
Traditionally, this has been done by explicitly writing a version of the relevant code for every architecture and set of SIMD instructions, often even in hand-coded ASM, with obvious downsides for both developer convenience and safety properties of the resulting code.
A different approach is to write the code once and have the compiler generate multiple versions; the C++ library highway is a widely used example of this pattern, adopted by multiple projects across different domains.
The topic of how to best support this pattern in Rust has been discussed extensively in this Zulip thread, without reaching a consensus on which of the three approaches proposed below to proceed with (nor on many of the other design questions discussed below). The main question for this design meeting would be to figure out the best approach(es), among the variants of the proposals below or even a completely different one.
As of writing this document, efficiently using SIMD instructions locally (i.e. in a single function, as opposed to program-wide) requires annotating the function with an attribute and making the function unsafe, for example:
#[target_feature(enable="avx")]
unsafe fn with_avx() {
// can use AVX intrinsics here.
}
This attribute influences how LLVM does code generation in multiple ways:
- LLVM may use AVX instructions when generating code for with_avx, and may inline into it functions that have compatible target features (in particular, functions from core::arch that explicitly map to AVX instructions).
- The ABI of extern "C" functions that take AVX registers as arguments changes; this was a potential source of surprising UB, and situations that could trigger this are being made into errors. See #116558 for details.

Since this attribute is required for efficient code to be generated, libraries that want to provide a way to generate multiple copies of a function with different SIMD instruction sets (such as pulp and multiversion) use one of the following methods:

- annotating the relevant functions with #[inline(always)], causing all the relevant SIMD code to be inlined into a single function (which is then multiversioned); this has obvious downsides in terms of code size (see the sketch below).
- marking the relevant functions as unsafe.
Apart from the ABI issues mentioned above, unsafe
is required (regardless of the source of the function) because CPUs that do not understand a certain SIMD ISA extension might interpret instructions as arbitrary other instructions.
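As an illustration, here is a minimal sketch of the #[inline(always)] approach combined with runtime dispatch on x86_64 (the function names are placeholders, not the actual API of pulp or multiversion):

#[inline(always)]
fn scale(data: &mut [f32], factor: f32) {
    // Generic kernel, written once.
    for x in data.iter_mut() {
        *x *= factor;
    }
}

#[target_feature(enable = "avx2")]
unsafe fn scale_avx2(data: &mut [f32], factor: f32) {
    // The kernel body is inlined here, so LLVM may vectorize it using AVX2.
    scale(data, factor)
}

pub fn scale_dispatch(data: &mut [f32], factor: f32) {
    if is_x86_feature_detected!("avx2") {
        // Sound: we just checked that the CPU supports AVX2.
        unsafe { scale_avx2(data, factor) }
    } else {
        scale(data, factor)
    }
}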
target feature 1.1 removes the need for unsafe
in free functions, making the functions safe to call in a context-dependent way. In particular, the call is safe iff the caller has a superset of the features of the callee.
Note that trait methods still require unsafe
even with target feature 1.1.
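For illustration, a minimal sketch of the calling rules under target feature 1.1 (assuming a toolchain where it is available):

// A safe function that requires avx2.
#[target_feature(enable = "avx2")]
fn uses_avx2() {
    // AVX2 intrinsics could be used here.
}

#[target_feature(enable = "avx2")]
fn caller_with_avx2() {
    // Safe call: the caller's features are a superset of the callee's.
    uses_avx2();
}

fn caller_without_features() {
    if is_x86_feature_detected!("avx2") {
        // Still needs unsafe: this caller does not itself have the avx2 feature.
        unsafe { uses_avx2() };
    }
}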
Assuming one does not want to rely on autovectorization (which is typically the case for the use cases in question), supporting this kind of function multiversioning inherently requires having some form of abstraction over SIMD ISAs.
Multiple such abstraction layers have been written over time, with somewhat different objectives:
- std::simd provides vector types that offer identical behaviour across different platforms. When the behaviour is not identical across platforms, one standard behaviour is chosen and emulated on platforms for which it would not be the native behaviour (see the sketch below).
- pulp and highway aim to be as close to the hardware as possible. This implies that operations might have slightly different semantics across platforms; they also have a concept of "vector of native length", and may not implement some vector lengths for some targets.

The second approach typically results in higher performance, at the cost of somewhat more cumbersome code.
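As a concrete example of the first approach, a minimal std::simd sketch (nightly-only, under the portable_simd feature):

#![feature(portable_simd)]
use std::simd::f32x4;

// Behaves identically on all platforms; targets without suitable vector
// instructions fall back to emulation.
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    (f32x4::from_array(a) + f32x4::from_array(b)).to_array()
}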
Moreover, the most efficient implementation of an algorithm for a specific SIMD architecture can be quite different from the one for a different architecture, for example requiring different types of pre-computation or storage, or simply adapting to the native vector size; thus, it is desirable for the multiversioning/abstraction mechanism to allow for some amount of platform-specific adaptation. See this document for some examples of how the SIMD ISA might affect the optimal implementation.
struct_target_features
(this proposal is mostly based on RFC 3525)
One possible approach to ease function multiversioning is to allow attaching features to types. Such types would gain a validity invariant that requires the appropriate features to be present for an instance of the given type to exist.
A function taking such a type as an argument would then be able to assume the features to be present, and the compiler could thus safely instruct LLVM to generate code accordingly.
Note that this implies that a type explicitly marked with features should be unsafe to construct, unless the caller already has the required features. This is consistent with the behaviour proposed by target_feature_11
.
This allows using the existing infrastructure for generics to achieve multiversioning, for example like so:
// See below for a discussion of other possible ways to define such structs
#[target_features(enable = "avx2")]
struct WithAvx2;
#[target_features(enable = "avx512f")]
struct WithAvx512;
fn do_something<S: TargetSpecificOperation>(simd: S) {
// code that can work under any set of target features here,
// presumably using some form of SIMD abstraction.
// Specializing the implementation of some operations depending
// on the available instruction set can be done through traits.
simd.target_specific_op();
}
trait TargetSpecificOperation {
fn target_specific_op(self);
}
impl TargetSpecificOperation for WithAvx2 {
// See below for considerations on how to inherit features
// from arguments.
fn target_specific_op(self) {
// AVX2-specific code
}
}
impl TargetSpecificOperation for WithAvx512 {
fn target_specific_op(self) {
// AVX512-specific code
}
}
fn main() {
if has_avx512() {
do_something(unsafe { WithAvx512::new_unchecked() });
} else if has_avx2() {
do_something(unsafe { WithAvx2::new_unchecked() });
}
// *ALTERNATIVE*, with appropriate (user-defined) methods on
// `WithAvx*` types that do feature detection:
if let Some(avx512) = WithAvx512::new() {
do_something(avx512);
} else if let Some(avx2) = WithAvx2::new() {
do_something(avx2);
}
}
Beyond this basic idea, there are multiple specific design choices that could considerably change the shape of the resulting feature:

- Whether such types may contain data. For example, an Avx2U8Vec type can implement std::ops::Add using avx2 intrinsics, as long as self is not passed by reference (see below).
- Whether to restrict such types to #[repr(simd)] types; this could allow making them pass-by-value even in the Rust ABI, as any such case of pass-by-value would enable the required features implicitly and thus ABI incompatibilities should no longer be possible.
- How such types and their features are defined. Some reasonable options are:
  - an attribute such as #[target_features(enable = ...)], which lets user code define subsets of features to do dispatching for with a familiar syntax;
  - witness types provided by the language itself (for example in core), which allows for more freedom in the design of, for example, zero-sized witness types to be used by user code. For example, witness types could be designed as a lang-item type Witness<const FEATURES: TargetFeatures>, where TargetFeatures { avx: bool, avx2: bool, ... } has fields that depend on the current target; an impl of such a type could then be provided to offer functionality that would otherwise require new auto traits or other compiler support.
#[target_features(inherit)]
If feature inheritance is desired, it should be compatible with validity invariants. As such, the following should probably be true, given a type T with features that is in some way contained in a type S:

- S containing T by value should cause features to be inherited (see the example below).
- S containing &T should probably not cause features to be inherited (see here for rationale on not recursing through references for validity invariants).
- S containing [T; N] with N = 0 should definitely not cause features to be inherited. With N > 0, it probably should (unless doing so would be too confusing).
- S containing *T should definitely not cause features to be inherited.
- If S is a union or enum, it should not inherit features.

Note that inherited features do not require the struct to be unsafe to construct.
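For example, under these rules (and using the hypothetical attribute from this proposal), a struct containing a feature-annotated type by value would inherit its features without becoming unsafe to construct:

#[target_features(enable = "avx2")]
struct WithAvx2; // unsafe to construct unless avx2 is known to be available

struct Kernel {
    simd: WithAvx2, // contained by value: Kernel inherits the avx2 feature
    scale: f32,
}

// No unsafe needed here: the proof obligation was already discharged
// when the WithAvx2 value was created.
fn make_kernel(simd: WithAvx2, scale: f32) -> Kernel {
    Kernel { simd, scale }
}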
The rules that apply to inheriting features from struct fields should likely be the same as the rules for inheriting from function arguments.
The main question is whether to require functions that inherit features to have a #[target_features(from_args)]
attribute (or equivalent).
Such an attribute could also be made more specific, e.g. #[target_features(from_args(0, 1))]
or #[target_features(from_args(arg.field))]
.
The main arguments for requiring such an attribute are twofold:
Note: not inheriting features from values passed by reference would make implementing trait methods that take &self
and wish to use features slightly more cumbersome, as it would require either annotating the trait method with #[inline(always)]
and having it call a function taking self
by value, or something along the lines of #[unsafe(target_features(enable = ...))]
which does not require the annotated function to be unsafe
(by moving the burden of proof to a safety invariant of one of the function arguments).
#[target_features(caller)]
(this proposal is mostly based on this draft RFC)
The main idea of this proposal is that annotating functions with #[target_features(caller)]
would cause them to inherit the features of the caller. The proposal also involves adding macros is_{arch}_feature_enabled!
to generate different code depending on which instruction set the current function is being compiled for.
Example:
#[target_features(caller)]
fn do_something() {
// code that can work under any set of target features here,
// presumably using some form of SIMD abstraction.
// Specializing the implementation of some operations depending
// on the available instruction set
if is_x86_feature_enabled!("avx512f") {
// target-specific code with AVX512
} else {
// target-specific code when AVX512 is not available
}
}
#[target_features(enable = "avx512f")]
fn do_something_avx512() {
do_something()
}
#[target_features(enable = "avx2")]
fn do_something_avx2() {
do_something()
}
fn main() {
if has_avx512() {
unsafe {do_something_avx512();}
} else if has_avx2() {
unsafe {do_something_avx2();}
}
}
This proposal comes with a few open questions, partially discussed below:
- How should function pointers, Fn traits, and trait objects interact with #[target_features(caller)] functions?
- At which point in the compilation pipeline should the multiple versions of #[target_features(caller)] functions be generated?

Note: #[target_features(caller)] on non-trait methods could be implemented as sugar for struct_target_features in the presence of a features_of_current_function!() macro that constructs an appropriately-annotated ZST. Many of the discussion points below are also relevant to such an approach.
Since there are now multiple copies of the function sharing the same name, converting the function to a function pointer (or Fn
trait) needs special care.
There are two reasonable approaches:
- The function pointer or Fn trait only uses the baseline features for the current architecture. This could cause some behaviours that are very surprising (e.g. calling iterator.map(fn_with_caller_features) would use baseline features).
- The function pointer or Fn trait uses the features of the function in which the conversion appears. Note that this should ignore inlining.

If this attribute is allowed on trait method implementations of a type T for a trait Trait, similar considerations then apply to converting &T to &dyn Trait.
The main question is whether multiversioning should happen before MIR creation or before codegen.
Doing multiversioning before MIR creation might impact compilation times negatively, as potentially significantly more MIR would be generated.
Doing multiversioning at codegen time (effectively making #[target_features(caller)]
functions equivalent to generics, from MIR's point of view) also comes with its own challenges. Most notably, even if inlining a #[target_features(caller)]
function is by definition always fine, inlining of callers of #[target_features(caller)]
-annotated functions during MIR optimizations would change program behaviour in an observable way if implemented naively. Thus, calls to #[target_features(caller)] functions need to be annotated to keep track of the original set of features.
Note that similar considerations apply to inlining function bodies that contain is_{arch}_feature_enabled!
macro calls.
#[target_features(generic)]
This proposal can be seen as a more explicit version of #[target_feature(caller)]
.
The main idea of this proposal is that annotating a const generic function with #[target_features(generic)]
would cause the function to obtain the features specified in the first const generic argument of appropriate type.
Example:
trait Simd {
fn do_something(&self);
}
struct S<const FEATURES: str>();
impl Simd for S<{*"avx2"}> {
// Note: as written, this function would not have target features. See below
// for a proposal related to the main discussion, but somewhat orthogonal,
// that would fix this issue.
fn do_something(&self) {
// target-specific code with AVX2
}
}
impl Simd for S<{*"avx512f"}> {
fn do_something(&self) {
// target-specific code with AVX512
}
}
#[target_features(generic)]
fn do_something<const FEATURES: str>(s: S<FEATURES>) where S<FEATURES>: Simd {
// code that can work under any set of target features here,
// presumably using some form of SIMD abstraction.
// Specializing the implementation of some operations depending
// on the available instruction set
s.do_something();
}
fn main() {
    if has_avx512() {
        unsafe { do_something(S::<{*"avx512f"}>()); }
    } else if has_avx2() {
        unsafe { do_something(S::<{*"avx2"}>()); }
    }
}
This proposal comes with a few open questions:
- whether the const generic parameter should be of type str (requires unsized_const_params, but can re-use existing syntax) or a new lang item (e.g. with boolean fields for each feature).

In the author's opinion, the additional control granted by #[target_features(generic)]
and struct_target_features
is highly desirable.
It however comes at the cost of a more verbose syntax (somewhat mitigated by generic argument deduction).
With #[target_feature(caller)]
, it is possible to achieve similar levels of flexibility by making feature-generic functions also generic over an appropriate witness type. However, this duplicates the feature information and introduces the possibility of the two disagreeing.
struct_target_features
suffers significantly from not being able to rely on recursive reference validity, as having significantly different behaviour when passing by reference is a major footgun. However, after some discussion with t-opsem / t-lang it appears to be the case that it would be reasonable to rely on safety invariants (instead of validity invariants) for enabling features, bypassing this issue.
Neither #[target_feature(caller)]
nor #[target_feature(generic)]
offer a way to ergonomically create vector types that implement arbitrary traits; see the next section for a possible solution.
#[unsafe(target_features(enable = ...))]
Consider creating an Avx2u32x8 type that contains 8 u32 values, has a safety invariant of "avx2 is available", and relies on avx2 to implement std::ops::Add.
Today, the implementation of Add
would look like this:
use core::arch::x86_64::*;
struct Avx2u32x8(__m256i);
impl std::ops::Add<Avx2u32x8> for Avx2u32x8 {
type Output = Avx2u32x8;
#[inline(always)]
fn add(self, other: Self) -> Self {
unsafe {
Self(_mm256_add_epi32(self.0, other.0))
}
}
}
This relies on the caller function to have the appropriate features (which is reasonable), and forces inlining (which might be less reasonable for more complex functions).
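For instance, a typical caller today looks roughly like the following sketch, where the enclosing function provides the avx2 context that add relies on (sum_avx2 and sum are illustrative names, not part of the example above):

#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(a: Avx2u32x8, b: Avx2u32x8) -> Avx2u32x8 {
    // add() is #[inline(always)], so its body is compiled here, where avx2 is enabled.
    a + b
}

fn sum(a: Avx2u32x8, b: Avx2u32x8) -> Avx2u32x8 {
    assert!(is_x86_feature_detected!("avx2"));
    // Sound: avx2 availability was just checked above.
    unsafe { sum_avx2(a, b) }
}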
Even with target_feature_11
, the following is not possible, because target_feature
is not allowed on safe trait methods:
impl std::ops::Add<Avx2u32x8> for Avx2u32x8 {
type Output = Avx2u32x8;
#[target_feature(enable = "avx2")]
fn add(self, other: Self) -> Self {
unsafe {
Self(_mm256_add_epi32(self.0, other.0))
}
}
}
A workaround to still leave inlining decisions up to the compiler is the following:
impl std::ops::Add<Avx2u32x8> for Avx2u32x8 {
type Output = Avx2u32x8;
#[inline(always)]
fn add(self, other: Self) -> Self {
#[target_feature(enable = "avx2")]
// Note: nested functions cannot refer to Self, so the concrete type is spelled out.
unsafe fn inner(this: Avx2u32x8, other: Avx2u32x8) -> Avx2u32x8 {
Avx2u32x8(_mm256_add_epi32(this.0, other.0))
}
unsafe {
inner(self, other)
}
}
}
However, this is somewhat cumbersome.
The proposal is that #[unsafe(target_feature(enable = ...))] has the same semantics as #[target_feature(enable = ...)], except that the annotated function (or trait method) is entirely safe: it is up to the person adding the attribute to guarantee that the safety invariants of the function arguments ensure it can only be called if the features are guaranteed to be present.
Example:
impl std::ops::Add<Avx2u32x8> for Avx2u32x8 {
type Output = Avx2u32x8;
#[unsafe(target_feature(enable = "avx2"))]
fn add(self, other: Self) -> Self {
// with https://github.com/rust-lang/libs-team/issues/494, this unsafe block would not be needed
unsafe {
Self(_mm256_add_epi32(self.0, other.0))
}
}
}
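With this attribute, code that already holds Avx2u32x8 values (whose safety invariant guarantees that avx2 is available) can use the operator with no unsafe at all, for example:

fn demo(a: Avx2u32x8, b: Avx2u32x8) -> Avx2u32x8 {
    // Entirely safe: the arguments can only exist if avx2 is available,
    // which is exactly what #[unsafe(target_feature(enable = "avx2"))] relies on.
    a + b
}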