
# Relaxed-math mode proposal

This is a proposal to modify the [relaxed-simd] proposal to use a new "relaxed-math mode". It is a compromise that would enable most of the relaxed-simd optimizations, while creating a path to fully-deterministic arithmetic.

## Proposal

* Remove `relaxed_dot_bf16x8_add_f32x4` (aka `bfloat16_dot_product`).
* Rename `relaxed_madd` (aka `relaxed_fma`) to `alternate_fma`.
* Rename `relaxed_nmadd` (aka `relaxed_fnma`) to `alternate_fnma`.
* Rename the remaining `relaxed_*` to `alternate_*`.
* Change the binary opcodes for `alternate_*` (after the `0xfd` prefix) to start at 0x100000 instead of 0x100.
* Define deterministic semantics for all `alternate_*` instructions:
  * For `swizzle`, `laneselect`, `q15mulr_s`, and `dot_i8x16_i7x16_*`, define them to have some agreeable deterministic semantics TBD.
  * For `fma`/`fnma`, define them as IEEE 754 `fusedMultiplyAdd` (single rounding), adjusted for the negation in `fnma` and for Wasm's overall exception, rounding mode, and NaN stance.
  * Define the rest to be identical to their non-`alternate_` counterparts.
* Change Wasm's NaN semantics:
  * The result of any non-bitwise floating-point instruction when it returns a NaN is a canonical NaN (sign bit is zero, quiet bit is one, remaining mantissa bits are zero).
* Define a "relaxed-math mode". In this mode:
  * All the `alternate_*` instructions have the non-deterministic semantics proposed in the [relaxed-simd] proposal.
  * Wasm's NaN behavior is nondeterministic, using the NaN semantics previously specified in the core spec.

## Outlook for toolchains

As in the existing relaxed-simd proposal, we'd expect toolchains like clang to only use the `alternate_*` instructions under a special flag, like `-mrelaxed-simd`, and only with explicit source-code intrinsics such as `__builtin_alternate_*`.

## Outlook for implementors

Web implementors would use either strict mode or relaxed-math mode, or a mix of both since strict mode is a subset of relaxed-math mode.
Privacy-focused implementations may want to use strict mode to reduce their fingerprinting surface area.

Cloud/Edge implementors are expected to use CPUs with hardware `fma` instructions, and making the remaining `alternate_*` instructions deterministic is expected to have a relatively modest cost (about 1-10 extra instructions for the expected cases) in the context of a larger algorithm, so many implementors will hopefully offer deterministic mode, to maximize the portability/migration/snapshot-restore/debuggability/etc. opportunities.

Other implementors could choose based on which host CPUs they wish to support, and what level of nondeterminism they desire.

## Appeal to hardware designers

Hardware designers are encouraged to design instructions supporting the strict mode semantics. This can be done incrementally. `min` and `max` would be a great place to start; they follow IEEE 754-2019 `minimum` and `maximum`, and such instructions would help JavaScript and Java too!

## Why remove `relaxed_dot_bf16x8_add_f32x4`?

The relaxed-simd proposal defines this instruction to be nondeterministic in [three different ways](https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#relaxed-bfloat16-dot-product) due to differences between popular architectures. It appears that any deterministic semantics we might choose for this would be prohibitively expensive to implement on some popular architectures.

## What kind of nondeterminism would the `alternate_*` instructions use?

There are two main options being discussed:

- They could use list nondeterminism, and the idea would be that having relaxed semantics contained in a special relaxed-math mode would minimize the impact on the main language specification.
- Or we could use staged compilation, and factor out the nondeterminism to an import.

This proposal is ok with either.

[relaxed-simd]: https://github.com/WebAssembly/relaxed-simd/
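
To make the list-nondeterminism option concrete, here is a minimal sketch of how `alternate_fma` on a single f64 lane might be modeled. It assumes the relaxed-math result list contains exactly two entries, the fused (single-rounding) result and the unfused (two-rounding) result; the function names `fused_fma`, `unfused_fma`, and `alternate_fma_results` are hypothetical and not part of any specification. The sketch covers finite inputs only and does not model NaNs, infinities, or the sign of zero results.

```python
from fractions import Fraction


def fused_fma(a: float, b: float, c: float) -> float:
    """a*b + c with a single rounding (finite inputs only, assumption)."""
    # Fraction arithmetic is exact; float() then rounds once to nearest-even.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def unfused_fma(a: float, b: float, c: float) -> float:
    """a*b + c with two roundings: one after the multiply, one after the add."""
    return a * b + c


def alternate_fma_results(a: float, b: float, c: float, relaxed_math: bool) -> list:
    """Hypothetical model: the list of results an implementation may return.

    Strict mode permits only the fused result. Relaxed-math mode is assumed
    here to additionally permit the unfused result.
    """
    results = [fused_fma(a, b, c)]
    if relaxed_math:
        unfused = unfused_fma(a, b, c)
        if unfused not in results:
            results.append(unfused)
    return results


# Inputs chosen so that the exact product 1 - 2**-60 rounds to 1.0 as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
print(alternate_fma_results(a, b, -1.0, relaxed_math=False))
print(alternate_fma_results(a, b, -1.0, relaxed_math=True))
```

With these inputs the strict-mode list has one entry and the relaxed-math list has two, which also illustrates why strict mode is a subset of relaxed-math mode: every result allowed in strict mode is also allowed in relaxed-math mode.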