# Relaxed-math mode proposal
This is a proposal to modify the [relaxed-simd] proposal to use a new "relaxed-math mode".
It is a compromise that would enable most of the relaxed-simd optimizations while creating a path to fully deterministic arithmetic.
## Proposal
* Remove `relaxed_dot_bf16x8_add_f32x4` (aka `bfloat16_dot_product`)
* Rename `relaxed_madd` (aka `relaxed_fma`) to `alternate_fma`.
* Rename `relaxed_nmadd` (aka `relaxed_fnma`) to `alternate_fnma`.
* Rename the remaining `relaxed_*` to `alternate_*`.
* Change the binary opcodes for `alternate_*` (after the `0xfd` prefix) to start at 0x100000 instead of 0x100.
* Define deterministic semantics for all `alternate_*` instructions:
  * For `swizzle`, `laneselect`, `q15mulr_s`, and `dot_i8x16_i7x16_*`, give them agreed-upon deterministic semantics (details TBD).
  * For `fma`/`fnma`, define them as IEEE 754 `fusedMultiplyAdd` (single rounding), adjusted for the negation in `fnma` and for Wasm's overall stance on exceptions, rounding modes, and NaNs.
  * Define the rest to be identical to their non-`alternate_` counterparts.
* Change Wasm's NaN semantics:
  * Whenever a non-bitwise floating-point instruction returns a NaN, the result is a canonical NaN (sign bit is zero, quiet bit is one, remaining mantissa bits are zero).
* Define a "relaxed-math mode", as an alternative to the fully deterministic "strict mode" defined by the changes above. In relaxed-math mode:
  * All the `alternate_*` instructions have the nondeterministic semantics proposed in the [relaxed-simd] proposal.
  * Wasm's NaN behavior is nondeterministic, using the NaN semantics previously specified in the core spec.
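
For reference, the Wasm binary format encodes the opcode that follows the `0xfd` prefix as a `u32` in unsigned LEB128, so the proposed move changes the encoded length: `0x100` takes two bytes after the prefix, while `0x100000` takes three. A minimal Python sketch of that encoding (the helper names are illustrative, not taken from any Wasm toolchain):

```python
SIMD_PREFIX = 0xFD  # prefix byte shared by the SIMD opcode space


def encode_u32_leb128(value: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last byte."""
    if not 0 <= value < 2**32:
        raise ValueError("value out of u32 range")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte: continuation bit clear
            return bytes(out)


def encode_prefixed_opcode(subopcode: int) -> bytes:
    """Prefix byte followed by the LEB128-encoded sub-opcode."""
    return bytes([SIMD_PREFIX]) + encode_u32_leb128(subopcode)
```

For example, `encode_prefixed_opcode(0x100)` gives the bytes `fd 80 02`, and `encode_prefixed_opcode(0x100000)` gives `fd 80 80 40`.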
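
The canonical-NaN rule can be modeled directly on bit patterns. The following Python sketch assumes values are represented by their IEEE 754 bit patterns; `canonicalize_f32_bits`, `canonicalize_f64_bits`, and `strict_f64_add_bits` are hypothetical helpers, not part of any engine. In strict mode an engine would apply the equivalent of this canonicalization to every NaN result of a non-bitwise instruction; in relaxed-math mode it could skip that step.

```python
import struct

# Canonical NaNs: sign bit 0, quiet bit 1, remaining mantissa bits 0.
CANONICAL_NAN_F32 = 0x7FC00000
CANONICAL_NAN_F64 = 0x7FF8000000000000


def f64_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def canonicalize_f32_bits(bits: int) -> int:
    """Replace any f32 NaN (quiet or signaling, any sign or payload) with the canonical NaN."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF and mantissa != 0:
        return CANONICAL_NAN_F32
    return bits  # non-NaN values, including infinities, pass through unchanged


def canonicalize_f64_bits(bits: int) -> int:
    """Replace any f64 NaN with the canonical NaN."""
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent == 0x7FF and mantissa != 0:
        return CANONICAL_NAN_F64
    return bits


def strict_f64_add_bits(a: float, b: float) -> int:
    """f64 addition followed by NaN canonicalization.

    The raw NaN the host CPU produces (e.g. for inf + -inf) may differ between
    architectures; after canonicalization the bits are the same everywhere.
    """
    return canonicalize_f64_bits(f64_bits(a + b))
```

For example, `strict_f64_add_bits(float("inf"), float("-inf"))` yields `0x7FF8000000000000` regardless of which NaN the host produced.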
## Outlook for toolchains
As in the existing relaxed-simd proposal, we'd expect toolchains like clang to use the `alternate_*` instructions only under a special flag, such as `-mrelaxed-simd`, and only through explicit source-code intrinsics such as `__builtin_alternate_*`.
## Outlook for implementors
Web implementors would use strict mode, relaxed-math mode, or a mix of both, since strict mode is a subset of relaxed-math mode. Privacy-focused implementations may want to use strict mode to reduce their fingerprinting surface.
Cloud/Edge implementors are expected to use CPUs with hardware `fma` instructions. Making the remaining `alternate_*` instructions deterministic is expected to have a relatively modest cost (about 1-10 extra instructions in the expected cases) in the context of a larger algorithm, so hopefully many implementors will offer strict (deterministic) mode to maximize the benefits for portability, migration, snapshot/restore, debuggability, and so on.
Other implementors could choose based on which host CPUs they wish to support and what level of nondeterminism they want to permit.
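
The dependence on hardware `fma` comes from rounding: a fused multiply-add rounds `a * b + c` once, while a separate multiply and add round twice, so the two can give different results. On a CPU without a fused instruction, a deterministic `fma` has to emulate the single rounding in software. The Python sketch below illustrates the difference for finite `f64` inputs only, using exact rational arithmetic from the standard `fractions` module; it is chosen for clarity rather than speed, and the function names are hypothetical.

```python
from fractions import Fraction


def fused_madd_f64(a: float, b: float, c: float) -> float:
    """a * b + c with a single rounding (finite inputs only).

    Computes the exact result as a rational number, then rounds once to the
    nearest f64. A real implementation would also need to handle NaN,
    infinities, overflow, and the sign of exact-zero results.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # CPython rounds this int/int quotient correctly


def unfused_madd_f64(a: float, b: float, c: float) -> float:
    """a * b + c with two roundings: one after the multiply, one after the add."""
    return a * b + c


# a * a is exactly 1 + 2**-26 + 2**-54; the 2**-54 term is lost when the
# product alone is rounded to f64, but it survives in the fused version.
a = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)
print(unfused_madd_f64(a, a, c))  # 0.0
print(fused_madd_f64(a, a, c))    # 5.551115123125783e-17 (== 2**-54)
```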
## Appeal to hardware designers
Hardware designers are encouraged to design instructions that support the strict-mode semantics. This can be done incrementally. `min` and `max` would be a great place to start: they follow IEEE 754-2019 `minimum` and `maximum`, and such instructions would help JavaScript and Java too!
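
For reference, IEEE 754-2019 `minimum` and `maximum` return NaN if either operand is NaN and treat -0.0 as less than +0.0. A plain compare-and-select is cheaper but depends on operand order in both of those cases. A Python sketch of the difference (function names are illustrative; NaN results are not canonicalized here):

```python
import math


def ieee_minimum(x: float, y: float) -> float:
    """IEEE 754-2019 minimum: NaN-propagating, and -0.0 < +0.0."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == 0.0 and y == 0.0:
        # +0.0 and -0.0 compare equal, so inspect the sign directly.
        return x if math.copysign(1.0, x) < 0 else y
    return x if x < y else y


def ieee_maximum(x: float, y: float) -> float:
    """IEEE 754-2019 maximum: NaN-propagating, and +0.0 > -0.0."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == 0.0 and y == 0.0:
        return x if math.copysign(1.0, x) > 0 else y
    return x if x > y else y


def select_min(x: float, y: float) -> float:
    """Naive compare-and-select: returns y whenever x < y is false."""
    return x if x < y else y
```

Here `select_min(math.nan, 1.0)` returns `1.0` while `select_min(1.0, math.nan)` returns NaN, and `select_min(-0.0, 0.0)` returns `+0.0`; `ieee_minimum` returns NaN and `-0.0` respectively, whatever the operand order.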
## Why remove `relaxed_dot_bf16x8_add_f32x4`?
The relaxed-simd proposal defines this instruction to be nondeterministic in [three different ways](https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#relaxed-bfloat16-dot-product) due to differences between popular architectures. It appears that any deterministic semantics we might choose for this instruction would be prohibitively expensive to implement on some popular architectures.
## What kind of nondeterminism would the `alternate_*` instructions use?
There are two main options being discussed:
- They could use list nondeterminism; confining the relaxed semantics to a special relaxed-math mode would then minimize the impact on the main language specification.
- Or we could use staged compilation and factor the nondeterminism out into an import.

This proposal is compatible with either option.
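
To make the two options concrete, here is a toy Python model of a single scalar lane of `alternate_fma`, restricted to finite `f64` inputs. Under list nondeterminism, the semantics is a set of permitted results: a single result in strict mode, and either the fused or the unfused result in relaxed-math mode, as in relaxed-simd. Under one possible reading of the staged-compilation option, the choice is supplied once from outside (modeled here by a hypothetical `use_fused` flag standing in for an import), after which execution is deterministic. All names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Set


def fused(a: float, b: float, c: float) -> float:
    """a * b + c rounded once (exact rational arithmetic, finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def unfused(a: float, b: float, c: float) -> float:
    """a * b + c rounded twice."""
    return a * b + c


def permitted_fma_results(a: float, b: float, c: float, relaxed_math_mode: bool) -> Set[float]:
    """List nondeterminism: the set of results a conforming engine may produce."""
    if relaxed_math_mode:
        return {fused(a, b, c), unfused(a, b, c)}
    return {fused(a, b, c)}  # strict mode: exactly one permitted result


def instantiate_fma(use_fused: bool) -> Callable[[float, float, float], float]:
    """Staged alternative: the nondeterministic choice is made once, up front
    (standing in for an imported definition), yielding a deterministic function."""
    return fused if use_fused else unfused


# 0.1 * 10.0 rounds to exactly 1.0, but the exact product is 1 + 2**-54.
strict = permitted_fma_results(0.1, 10.0, -1.0, relaxed_math_mode=False)
relaxed = permitted_fma_results(0.1, 10.0, -1.0, relaxed_math_mode=True)
fma_impl = instantiate_fma(use_fused=False)
print(fma_impl(0.1, 10.0, -1.0) in relaxed)  # True
print(fma_impl(0.1, 10.0, -1.0) in strict)   # False
```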
[relaxed-simd]: https://github.com/WebAssembly/relaxed-simd/