- Feature Name: `avx10`
- Start Date: 2024-10-07
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
# Summary
Adding `AVX10` target features to Rust
# Motivation
(Notation: names in CAPS, e.g. AVX512F, refer to a CPU ISA, and lowercase names in code font, e.g. `avx512f`, refer to a Rust or LLVM target feature.)
Intel has decided to streamline the mess of ISAs that AVX512 was, and started the AVX10 project.
It is a versioned system with a maximum vector width - e.g., AVX10.N-256 means the Nth iteration of the ISA, restricted to vectors (and operations) of at most 256 bits.
Each iteration has two variants: AVX10.N-256 and AVX10.N-512.
The first iteration hasn't added any new instructions, just repackaged AVX512 - if a CPU supports AVX10.1-512, it supports all AVX512 instructions, and if a CPU supports only AVX10.1-256, it supports all 128- and 256-bit AVX512 instructions.
The problem is this: Rust has target features such as `avx512f` to indicate support for AVX512.
LLVM, on the other hand, uses a different mechanism.
The LLVM feature `evex512` indicates the presence of 512-bit instructions, and the LLVM feature `avx512f` only means support of AVX512F instructions up to at least 256 bits - so LLVM uses `avx512f+evex512` to actually indicate the AVX512F support.
For this reason, the Rust feature `avx512f` actually maps to the LLVM features `avx512f+evex512` - so currently Rust code **cannot** target AVX10.N-256.
This RFC discusses the resulting complications and proposes two possible ways to add AVX10 support to Rust.
# `evex512` Approach
## Guide-level explanation
This proposal just designates the target feature `evex512` for 512-bit vectors - if a function uses 512-bit vectors, it must have this target feature enabled.
For example:
```Rust
#[target_feature(enable = "avx512f")]
fn do_256_bit_things() {
// Only 128- and 256-bit instructions/intrinsics from AVX512F (e.g. _mm256_reduce_add_pd) can be used here
}
#[target_feature(enable = "avx512f,evex512")]
fn do_512_bit_things() {
// All instructions/intrinsics from AVX512F can be used here
}
```
This also changes the `avx512*` target features to mean only that the 128- and 256-bit versions of that CPU feature are available.
So, to use 512-bit AVX512F intrinsics, a function must enable both `avx512f` and `evex512` (as shown in the example).
## Reference-level explanation
This proposal changes the semantics of all `avx512*` target features and adds a new target feature, `evex512`, which indicates that the CPU supports 512-bit instructions.
This mechanism would mean that the target feature `avx10.1-256`, when introduced, will imply `avx512f`, `avx512vl` etc., but **not** `evex512`.
And the target feature `avx10.1-512` will imply `avx10.1-256` and `evex512`.
This system essentially mirrors LLVM's solution to this issue.
A function with target feature `avx512f` will work with AVX512, AVX10.1-256 and AVX10.1-512 CPUs.
But a function with target features `avx512f` and `evex512` will work only with AVX512 and AVX10.1-512 CPUs.
The `is_x86_feature_detected` macro will also have to change - even though this means changing a stable API (AVX512 feature detection was stabilized in `1.27.0`).
Detection would behave as follows:
```Rust
// On an AVX512 or AVX10.1-512 CPU
println!("{}", is_x86_feature_detected!("avx512f")); // true
println!("{}", is_x86_feature_detected!("evex512")); // true
// On an AVX10.1-256 CPU
println!("{}", is_x86_feature_detected!("avx512f")); // true
println!("{}", is_x86_feature_detected!("evex512")); // false
```
Only functions with the target feature `evex512` will be able to use 512-bit vectors (even when not using AVX512 instructions) due to ABI constraints.
This is similar to the restriction on 256-bit vectors with `avx`.
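Putting detection and dispatch together, code written against this proposal might look like the following sketch (the `evex512` feature and the relaxed `avx512f` semantics are hypothetical until this RFC is implemented; the function names and placeholder bodies are illustrative):
```Rust
#[target_feature(enable = "avx512f,evex512")]
unsafe fn sum_512(data: &[f64]) -> f64 {
    // All of AVX512F is available here, including 512-bit
    // intrinsics such as _mm512_add_pd. (Placeholder body.)
    data.iter().sum()
}

#[target_feature(enable = "avx512f")]
unsafe fn sum_256(data: &[f64]) -> f64 {
    // Only the 128-/256-bit subset of AVX512F is available here,
    // e.g. _mm256_reduce_add_pd. (Placeholder body.)
    data.iter().sum()
}

fn sum(data: &[f64]) -> f64 {
    if is_x86_feature_detected!("avx512f") {
        if is_x86_feature_detected!("evex512") {
            // AVX512 and AVX10.1-512 CPUs
            return unsafe { sum_512(data) };
        }
        // AVX10.1-256 CPUs as well
        return unsafe { sum_256(data) };
    }
    data.iter().sum() // scalar fallback
}
```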
## Rationale
This preserves parity with LLVM features and introduces just one new target feature.
Using it should be fairly intuitive even with little experience, as `evex512` simply means that 512-bit vectors are available.
## Drawbacks
The main counterargument to this proposal seems to be the loss of association between CPU features and Rust target features.
This proposal would imply that a function with target feature `avx512f` will only be able to use a subset of AVX512F instructions.
This may be counterintuitive to programmers.
Also, introducing this might break a lot of code in the ecosystem, as many crates use AVX512 to accelerate computation.
Many of their uses of AVX512 target features would have to be augmented with `evex512`, otherwise a SIGILL may result.
This might be acceptable, though, because the AVX512 target features are still unstable.
[Ralf: I would also mention that using `avx512f` as the name for something that indicates 256-bit operations is terrible naming. If everybody else ~~jumps off a cliff~~ names something in a totally nonsensical way, do we follow, or do we pick sensible names instead?]
# `avx256` Approach
## Guide-level explanation
This proposal adds 14 new target features:
- `avx256f`
- `avx256vl`
- `avx256bw`
- `avx256dq`
- `avx256ifma`
- `avx256vnni`
- `avx256vbmi`
- `avx256vbmi2`
- `avx256cd`
- `avx256vpopcntdq`
- `avx256vp2intersect`
- `avx256bitalg`
- `avx256bf16`
- `avx256fp16`
(There are some more possible naming conventions, e.g. `avx512f-256`).
These will function like the corresponding `avx512*` target features, but only allow operations on vectors of up to 256 bits.
For example:
```Rust
#[target_feature(enable = "avx256f")]
fn do_256_bit_things() {
// Only 128- and 256-bit instructions/intrinsics from AVX512F (e.g. _mm256_reduce_add_pd) can be used here
}
#[target_feature(enable = "avx512f")]
fn do_512_bit_things() {
// All instructions/intrinsics from AVX512F can be used here
}
```
## Reference-level explanation
This doesn't change the semantics of any of the `avx512*` target features - this just adds new target features for each AVX512 ISA restricting it to only 256-bit vectors.
This mechanism would mean that the target feature `avx10.1-256`, when introduced, will imply `avx256f`, `avx256bw` etc., but **not** `avx512f`.
And the target feature `avx10.1-512` will imply `avx10.1-256`, `avx512f`, `avx512bw` etc.
This creates a parallel system to the existing AVX512 target features, but restricted to 256-bit vectors.
A function with target feature `avx256f` will work with AVX512, AVX10.1-256 and AVX10.1-512 CPUs.
But a function with target feature `avx512f` will work only with AVX512 and AVX10.1-512 CPUs.
The `is_x86_feature_detected` macro will keep its current behaviour, only gaining support for the new `avx256*` target features.
```Rust
// On an AVX512 or AVX10.1-512 CPU
println!("{}", is_x86_feature_detected!("avx256f")); // true
println!("{}", is_x86_feature_detected!("avx512f")); // true
// On an AVX10.1-256 CPU
println!("{}", is_x86_feature_detected!("avx256f")); // true
println!("{}", is_x86_feature_detected!("avx512f")); // false
```
Only functions with an `avx512*` target feature will be able to use 512-bit vectors (even when not using AVX512 instructions) due to ABI constraints.
This is similar to the restriction on 256-bit vectors with `avx`.
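Runtime dispatch under this approach might look like the following sketch (the `avx256f` feature is hypothetical until this RFC is implemented; the function names and placeholder bodies are illustrative):
```Rust
#[target_feature(enable = "avx512f")]
unsafe fn sum_512(data: &[f64]) -> f64 {
    // Unchanged meaning: all of AVX512F, including 512-bit
    // vectors. (Placeholder body.)
    data.iter().sum()
}

#[target_feature(enable = "avx256f")]
unsafe fn sum_256(data: &[f64]) -> f64 {
    // Only 128-/256-bit AVX512F instructions,
    // e.g. _mm256_reduce_add_pd. (Placeholder body.)
    data.iter().sum()
}

fn sum(data: &[f64]) -> f64 {
    if is_x86_feature_detected!("avx512f") {
        // AVX512 and AVX10.1-512 CPUs
        return unsafe { sum_512(data) };
    }
    if is_x86_feature_detected!("avx256f") {
        // AVX10.1-256 CPUs as well
        return unsafe { sum_256(data) };
    }
    data.iter().sum() // scalar fallback
}
```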
In effect, the `avx512f` target feature will map to LLVM feature `avx512f+evex512`, and the `avx256f` target feature will map to the LLVM feature `avx512f`.
AVX512VL is somewhat different, as it has only 128- and 256-bit instructions, so `avx512vl` and `avx256vl` differ only in what they imply: `avx512vl` implies `avx512f`, so it can use all AVX512F intrinsics as well as AVX512VL intrinsics.
On the other hand, `avx256vl` only implies `avx256f`, so it can use only 128- and 256-bit AVX512F instructions along with all AVX512VL intrinsics.
## Rationale
This is completely backward-compatible, and preserves the association of CPU features and Rust features - `avx512f` simply means the existence of AVX512F.
## Drawbacks
The main criticism seems to be that adding this many target features can confuse users.
Also, this creates target features that do not directly map to a CPU feature, which some people might object to.
# Prior art
Both GCC and Clang use an `evex512`-style solution - but they didn't break any code when they introduced it, thanks to negative target features.
Their target attribute `avx512f` enables `evex512` by default, and functions that don't need 512-bit vectors can use `no-evex512` to disable it.
This can't be used in Rust because negative target features are capable of causing ABI mismatches and unsoundness (e.g. [#116573](https://github.com/rust-lang/rust/issues/116573)).
So, we need to come up with our own solution to this.
# Future possibilities
**This would unblock the stabilization of AVX512 target features and intrinsics**, helping programmers leverage AVX512 capabilities from Stable Rust.
This could improve performance throughout the ecosystem, as CPUs supporting AVX512 (or at least the VEX variants of AVX512) are becoming commonplace.
Also, this would in future allow smooth integration of further AVX10 iterations (at the time of writing, AVX10.2 support has already landed in GCC).