
# 64-bit changes

Main points of 64-bit:

- Registers are now 64-bit.
- Some instructions now operate on 64 bits only. Others stay 32-bit (ignoring the upper 32 bits of the register) but unconditionally sign-extend their result to 64 bits, and get a separate 64-bit variant.
- If an instruction writes to a register, it always overwrites all 64 bits (there are no instructions that do partial register updates like there are on 64-bit x86).
- When accessing memory, the upper 32 bits of addresses are always ignored.
- Operand encoding stays the same. The physical immediate encoding is still at most 32 bits, and when a 64-bit value is needed the immediate is sign-extended to 64 bits.
- Since some instructions now come in 32-bit and 64-bit variants, it'd be a good idea to have `32` and `64` in their names to clearly disambiguate between them (as opposed to leaving one without any number and only, e.g., the 64-bit one with `64`).

## A.5.3. Instructions with Arguments of Two Immediates.

Changes to existing instructions:

- `store_imm_u32` - ignores the upper 32 bits of the value (only the lower 32 bits are stored)

New instructions:

- `store_imm_u64` - sign-extends the immediate to 64 bits and stores it in memory

## A.5.5. Instructions with Arguments of One Register & One Immediate.
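As a reference for the list below, the overview rules (32-bit address truncation, sign-extended immediates) can be applied to the loads and stores of A.5.3 and this section, as in the minimal Rust sketch that follows. It rests on stated assumptions. Memory is a hypothetical flat little-endian `Memory` byte vector with no paging or fault handling. `effective_address` and `sext_imm` are illustrative helper names, not PolkaVM's API. `load_u32` is assumed to zero-extend.

```rust
/// Hypothetical flat, little-endian memory used only for illustration
/// (no paging, no access checks, no fault handling).
pub struct Memory {
    pub bytes: Vec<u8>,
}

/// The upper 32 bits of an address are always ignored.
pub fn effective_address(addr: u64) -> usize {
    addr as u32 as usize
}

/// Sign-extend an (at most 32-bit) immediate to 64 bits.
pub fn sext_imm(imm: u32) -> u64 {
    imm as i32 as i64 as u64
}

impl Memory {
    fn read<const N: usize>(&self, addr: u64) -> [u8; N] {
        let a = effective_address(addr);
        let mut buf = [0u8; N];
        buf.copy_from_slice(&self.bytes[a..a + N]);
        buf
    }

    fn write(&mut self, addr: u64, data: &[u8]) {
        let a = effective_address(addr);
        self.bytes[a..a + data.len()].copy_from_slice(data);
    }

    /// `load_i8`: the byte is now sign-extended to 64 bits.
    pub fn load_i8(&self, addr: u64) -> u64 {
        i8::from_le_bytes(self.read::<1>(addr)) as i64 as u64
    }

    /// `load_i16`: sign-extended to 64 bits.
    pub fn load_i16(&self, addr: u64) -> u64 {
        i16::from_le_bytes(self.read::<2>(addr)) as i64 as u64
    }

    /// `load_u32` (assumed): zero-extended to 64 bits.
    pub fn load_u32(&self, addr: u64) -> u64 {
        u32::from_le_bytes(self.read::<4>(addr)) as u64
    }

    /// `load_i32` (new): sign-extended to 64 bits.
    pub fn load_i32(&self, addr: u64) -> u64 {
        i32::from_le_bytes(self.read::<4>(addr)) as i64 as u64
    }

    /// `load_u64` (new): full 64-bit read.
    pub fn load_u64(&self, addr: u64) -> u64 {
        u64::from_le_bytes(self.read::<8>(addr))
    }

    /// `store_u32`: only the lower 32 bits of the register (value mod 2^32) are stored.
    pub fn store_u32(&mut self, addr: u64, value: u64) {
        self.write(addr, &(value as u32).to_le_bytes());
    }

    /// `store_u64` (new): the full 64-bit register is stored.
    pub fn store_u64(&mut self, addr: u64, value: u64) {
        self.write(addr, &value.to_le_bytes());
    }

    /// `store_imm_u64` (new): immediate sign-extended, then stored as 8 bytes.
    pub fn store_imm_u64(&mut self, addr: u64, imm: u32) {
        self.store_u64(addr, sext_imm(imm));
    }
}
```

For example, after `store_u32` writes `0xFFFF_FFFF_8000_0000`, `load_u32` reads back `0x8000_0000` while `load_i32` reads `0xFFFF_FFFF_8000_0000`. That difference is why the two 32-bit loads need distinct opcodes.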
Changes to existing instructions:

- `load_imm` - the immediate value (which can be at most 32 bits due to our instruction encoding) is sign-extended to 64 bits and loaded into the register
- `load_i8` - is now sign-extended to 64 bits
- `load_i16` - is now sign-extended to 64 bits
- `store_u32` - ignores the upper 32 bits of the register (add $\text{mod}\ 2^{32}$ to the equation)

New instructions:

- `load_i32` - reads a 32-bit value, sign-extends it to 64 bits and loads it into a register
- `load_u64` - reads a 64-bit value and loads it into a register
- `store_u64` - stores the full 64-bit register in memory

Opcode changes:

- Swap the opcodes of `load_i32` and `load_u32` (why: the raw instruction stays semantically the same with respect to arithmetic instructions, due to RISC-V's convention of always sign-extending 32-bit values)

## A.5.6. Instructions with Arguments of One Register & Two Immediates.

New instructions:

- `store_imm_ind_u64` - sign-extends the immediate to 64 bits and stores it in memory

## A.5.7. Instructions with Arguments of One Register, One Immediate and One Offset.

Changes to existing instructions:

- `load_imm_jump` - like `load_imm`, should sign-extend the value to 64 bits before loading it into the register
- `branch_*_imm` - the immediate is sign-extended to the full 64 bits before the comparison; signed comparisons are now 64-bit

## A.5.9. Instructions with Arguments of Two Registers & One Immediate.
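To show how the 32-bit and 64-bit variants in the list below differ, here is a minimal Rust sketch of a few of them. It also covers what the 128-bit `mul_upper` opcodes compute, which is the reason the `mul_upper_*_imm` forms are dropped. Registers are modeled as plain `u64` values and immediates as raw `u32` encodings. The exact placement of the `32`/`64` suffix in the function names is assumed for illustration. So is the shift amount of the 32-bit shifts being taken $\text{mod}\ 32$ (the text only states $\text{mod}\ 64$ for the new 64-bit ones).

```rust
/// Sign-extend the lower 32 bits of `x` to 64 bits.
pub fn sext32(x: u64) -> u64 {
    x as u32 as i32 as i64 as u64
}

/// Sign-extend a 32-bit immediate to 64 bits.
pub fn sext_imm(imm: u32) -> u64 {
    imm as i32 as i64 as u64
}

/// `add_imm_32`: 32-bit add (upper register bits ignored), result sign-extended.
pub fn add_imm_32(reg: u64, imm: u32) -> u64 {
    sext32((reg as u32).wrapping_add(imm) as u64)
}

/// `add_imm_64`: immediate sign-extended, 64-bit add mod 2^64.
pub fn add_imm_64(reg: u64, imm: u32) -> u64 {
    reg.wrapping_add(sext_imm(imm))
}

/// `mul32_imm`: product truncated to 32 bits, then sign-extended.
pub fn mul32_imm(reg: u64, imm: u32) -> u64 {
    sext32((reg as u32).wrapping_mul(imm) as u64)
}

/// `and_imm`: immediate sign-extended, AND applied to all 64 register bits.
pub fn and_imm(reg: u64, imm: u32) -> u64 {
    reg & sext_imm(imm)
}

/// `shlo_r_imm_32`: the register is truncated to 32 bits *before* shifting,
/// so none of its upper 32 bits can move into the lower half.
pub fn shlo_r_imm_32(reg: u64, imm: u32) -> u64 {
    sext32(((reg as u32) >> (imm % 32)) as u64)
}

/// `shar_r_imm_32`: arithmetic shift of the lower 32 bits, result sign-extended.
pub fn shar_r_imm_32(reg: u64, imm: u32) -> u64 {
    ((reg as u32 as i32) >> (imm % 32)) as i64 as u64
}

/// `shlo_r_imm_64`: full 64-bit logical shift, shift amount mod 64.
pub fn shlo_r_imm_64(reg: u64, imm: u32) -> u64 {
    reg >> (imm % 64)
}

/// `mul_upper_u_u`: upper 64 bits of an unsigned 128-bit product.
pub fn mul_upper_u_u(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) >> 64) as u64
}

/// `mul_upper_s_s`: upper 64 bits of a signed x signed 128-bit product.
pub fn mul_upper_s_s(a: u64, b: u64) -> u64 {
    ((a as i64 as i128 * b as i64 as i128) >> 64) as u64
}

/// `mul_upper_s_u`: `a` treated as signed, `b` as unsigned.
pub fn mul_upper_s_u(a: u64, b: u64) -> u64 {
    ((a as i64 as i128 * b as i128) >> 64) as u64
}
```

Note that `shlo_r_imm_32(0x1_0000_0000, 1)` yields `0`, while `shlo_r_imm_64(0x1_0000_0000, 1)` yields `0x8000_0000`. That leakage from the upper half into the lower half is exactly what the truncation rule for the 32-bit shifts prevents.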
Changes to existing instructions:

- `load_ind_i8` - is now sign-extended to 64 bits
- `load_ind_i16` - is now sign-extended to 64 bits
- `add_imm`, `shlo_l_imm`, `neg_add_imm`, `shlo_l_imm_alt`, `shlo_r_imm_alt`, `shar_r_imm_alt` - the result is sign-extended to 64 bits; add `32` to the name
- `and_imm`, `xor_imm`, `or_imm` - the immediate is sign-extended to 64 bits, and the bitwise operation is applied to the full 64-bit value of the register
- `mul_imm` - the result is truncated to 32 bits and then sign-extended to 64 bits; rename to `mul32_imm`
- `set_*_*_imm` - the immediate is sign-extended to 64 bits; signed comparisons are now 64-bit
- `shlo_r_imm`, `shar_r_imm` - the register operand is truncated to 32 bits (so that none of the upper 32 bits get right-shifted into the lower 32 bits), and the result is sign-extended to 64 bits; add `32` to the names
- `cmov_iz_imm`, `cmov_nz_imm` - the immediate is sign-extended to 64 bits

New instructions:

- `add_imm_64` - the immediate is sign-extended to 64 bits and added to the value of the register; the result is $\text{mod}\ 2^{64}$
- `mul_imm_64` - the immediate is sign-extended to 64 bits and multiplied with the value of the register; the result is $\text{mod}\ 2^{64}$
- `shlo_l_imm_64`, `shlo_r_imm_64`, `shar_r_imm_64` - 64-bit shifts; the shift amount in $v_{x}$ is now $\text{mod}\ 64$
- `neg_add_imm_64` - the immediate is sign-extended to 64 bits and the operation is 64-bit
- `shlo_l_imm_alt_64`, `shlo_r_imm_alt_64`, `shar_r_imm_alt_64` - the immediate is sign-extended to 64 bits, the shift amount is now $\text{mod}\ 64$, and the result is 64-bit
- `load_ind_i32` - reads a 32-bit value, sign-extends it to 64 bits and loads it into a register
- `load_ind_u64` - reads a 64-bit value and loads it into a register
- `store_ind_u64` - stores the full 64-bit register in memory

Deleted instructions:

- `mul_upper_s_s_imm`, `mul_upper_u_u_imm` - for 32-bit these did 64-bit multiplies, so they still saw some use. For 64-bit, however, the `mul_upper` opcodes now do a 128-bit multiply (and take the upper 64 bits), which is rare enough that it's probably not worth having a dedicated instruction for multiplying by an immediate.

Opcode changes:

- Swap the opcodes of `load_ind_u32` and `load_ind_i32`

## A.5.10. Instructions with Arguments of Two Registers & One Offset.

Changes to existing instructions:

- `branch_lt_s`, `branch_ge_s` - comparisons are now 64-bit

## A.5.11. Instruction with Arguments of Two Registers and Two Immediates.

Changes to existing instructions:

- `load_imm_jump_ind` - like `load_imm`, should sign-extend the value to 64 bits before loading it into the register

## A.5.12. Instructions with Arguments of Three Registers.

Changes to existing instructions:

- `add`, `sub`, `mul`, `div_u`, `div_s`, `rem_u`, `rem_s`, `shlo_l`, `shlo_r`, `shar_r` - sign-extend the result to 64 bits; rename the instructions to have `32` in the name
- `and` - now operates on 64 bits
- `xor` - now operates on 64 bits
- `or` - now operates on 64 bits
- `set_lt_s` - the comparison is now 64-bit
- `mul_upper_s_s` - now returns the upper 64 bits of a signed x signed 128-bit multiply (previously it returned the upper 32 bits of a signed x signed 64-bit multiply)
- `mul_upper_u_u`, `mul_upper_s_u` - same change as `mul_upper_s_s`

New instructions:

- `add_64`, `sub_64`, `mul_64`, `div_u_64`, `div_s_64`, `rem_u_64`, `rem_s_64`, `shlo_l_64`, `shlo_r_64`, `shar_r_64` - 64-bit variants of the existing instructions

---------------

Remaining changes to the instruction set until it's feature complete (I'd propose adding these as the final update after the 64-bit update; I'll add them as soon as possible once I have the 64-bit changes fully implemented):

- `load_imm64` - a load immediate instruction that takes a full 64-bit immediate without sign extension; this will need a dedicated "instructions with arguments of one extended width immediate" category or something like that
- the `Zbb` RISC-V extension (various bit-manipulation instructions; Alex wants this for accelerated Ethereum compatibility)
- `memcpy`, `memset` (essentially instructions to accelerate copying and clearing memory; for WASM I've measured up to a 20% performance improvement in certain use cases when the Polkadot runtime was compiled with these enabled)
- opcode reordering? Originally I picked the opcode numbers (except for `trap`) so that they're ordered by how frequently each instruction is used, to make instruction decoders a little more icache-friendly in certain cases (I had to pick *some* order, so why not the most efficient one). Obviously, after significant instruction set changes that order gets messed up. In general this is just a "nice to have" and not a huge deal if we skip it, but if we do it, it makes sense to do it as the very last thing, once all instructions are in.
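To illustrate why the proposed `load_imm64` matters, the Rust sketch below checks which constants a single sign-extending `load_imm` can produce. It also lowers any other 64-bit constant into three existing instructions (`load_imm`, `shlo_l_imm_64`, `add_imm_64`). This lowering is one possible sequence assumed for illustration, not necessarily what PolkaVM's toolchain emits. `Op`, `fits_load_imm`, `lower_constant` and `run` are hypothetical names, and all operations act on a single register.

```rust
/// Sign-extend the lower 32 bits of `x` to 64 bits.
pub fn sext32(x: u64) -> u64 {
    x as u32 as i32 as i64 as u64
}

/// Sign-extend a 32-bit immediate to 64 bits.
pub fn sext_imm(imm: u32) -> u64 {
    imm as i32 as i64 as u64
}

/// Hypothetical single-register model of the instructions used below.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    LoadImm(u32),    // r = sext(imm)
    ShloLImm64(u32), // r = r << (imm mod 64)
    AddImm64(u32),   // r = (r + sext(imm)) mod 2^64
}

/// A single `load_imm` only reaches values whose upper 33 bits are all equal.
pub fn fits_load_imm(v: u64) -> bool {
    sext32(v) == v
}

/// Lower an arbitrary 64-bit constant without a `load_imm64`.
pub fn lower_constant(v: u64) -> Vec<Op> {
    let lo = v as u32;
    if fits_load_imm(v) {
        return vec![Op::LoadImm(lo)];
    }
    // `add_imm_64` sign-extends `lo`, i.e. subtracts 2^32 when bit 31 is set,
    // so bump the upper half by one to compensate.
    let hi = ((v >> 32) as u32).wrapping_add(lo >> 31);
    vec![Op::LoadImm(hi), Op::ShloLImm64(32), Op::AddImm64(lo)]
}

/// Execute the sequence on a single register, starting from zero.
pub fn run(ops: &[Op]) -> u64 {
    let mut r = 0u64;
    for op in ops {
        r = match *op {
            Op::LoadImm(imm) => sext_imm(imm),
            Op::ShloLImm64(imm) => r << (imm % 64),
            Op::AddImm64(imm) => r.wrapping_add(sext_imm(imm)),
        };
    }
    r
}
```

With a full-width `load_imm64`, every branch of `lower_constant` would collapse into a single instruction.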