RV32 support for Tina coroutine
黃守維
GitHub
Task Description
My task is to add RV32I support to the Tina project, with the context switch written in inline assembly. Additionally, I need to:
- Consider preserving the RV32F/RV32D (floating-point) registers across context switches.
- Validate the implementation on rv32emu or the Spike emulator.
- Develop test programs (self-written) to ensure the proper functionality of coroutines.
- Finally, contribute the code back to the Tina project.
Tina
Symmetric Coroutines:
- The primary mechanism for symmetric coroutines is tina_swap().
- This allows switching between two fibers, including the main program.
Asymmetric Coroutines:
- Asymmetric coroutines use:
  - tina_resume() to enter a sub-coroutine.
  - tina_yield() to return control to the caller.
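To make the two styles concrete, here is a minimal sketch that does not use Tina at all. It relies on the POSIX ucontext API (getcontext, makecontext, swapcontext), which is assumed to be available (for example with glibc on Linux); run_demo, coro_body, and trace are hypothetical names. A direct swapcontext between two contexts corresponds to the symmetric style, and using it in a strict caller/callee pattern gives the resume/yield behaviour.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, coro_ctx;
static char coro_stack[64 * 1024];
static int trace[4];
static int trace_len = 0;

/* Body of the sub-coroutine: runs, "yields" once, then finishes. */
static void coro_body(void) {
    trace[trace_len++] = 1;
    swapcontext(&coro_ctx, &main_ctx); /* yield: switch back to the caller */
    trace[trace_len++] = 3;
    /* Returning here continues at uc_link, i.e. main_ctx. */
}

/* Drives the coroutine twice and records the interleaving in trace[]. */
static int run_demo(void) {
    trace_len = 0;
    getcontext(&coro_ctx);
    coro_ctx.uc_stack.ss_sp = coro_stack;
    coro_ctx.uc_stack.ss_size = sizeof coro_stack;
    coro_ctx.uc_link = &main_ctx;
    makecontext(&coro_ctx, coro_body, 0);

    swapcontext(&main_ctx, &coro_ctx); /* resume: enter the coroutine */
    trace[trace_len++] = 2;
    swapcontext(&main_ctx, &coro_ctx); /* resume: continue after its yield */
    trace[trace_len++] = 4;
    return trace_len;
}
```

Calling run_demo() from main records the steps in the order 1, 2, 3, 4, which shows control alternating between the caller and the coroutine.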
Assembly Architecture in Tina
In Tina, the tina.h file uses preprocessor conditionals to distinguish between different platforms, such as AArch32, AArch64, i386, AMD64 (System V and Win64), and RV64GC.
Context switching is implemented using:
- Inline assembly
- Dedicated assembly blocks for platform-specific functionality.
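The platform selection can be sketched as follows. This is only an illustration of the pattern: the DEMO_ABI macro and the demo_abi function are hypothetical, while __riscv, __riscv_xlen, and __riscv_flen are the macros predefined by GCC and Clang when targeting RISC-V.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical DEMO_ABI macro: mirrors the selection pattern, not Tina's names. */
#if defined(__riscv) && (__riscv_xlen == 32)
#  if !defined(__riscv_flen)
#    define DEMO_ABI "riscv32i"   /* no floating-point registers */
#  elif __riscv_flen == 32
#    define DEMO_ABI "riscv32f"   /* single-precision registers */
#  elif __riscv_flen == 64
#    define DEMO_ABI "riscv32d"   /* double-precision registers */
#  else
#    define DEMO_ABI "riscv32-unknown-flen"
#  endif
#elif defined(__riscv) && (__riscv_xlen == 64)
#  define DEMO_ABI "riscv64"
#else
#  define DEMO_ABI "non-riscv"
#endif

/* Returns the name of the branch the preprocessor selected for this build. */
static const char *demo_abi(void) {
    return DEMO_ABI;
}
```

Exactly one branch is compiled in, so each platform only ever sees its own assembly block.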
Analysis of RISC-V RV32I + F/D Registers
General Registers (Integer Registers)
- RISC-V RV32I includes 32 general-purpose registers, x0 to x31, each 32 bits wide.
- During coroutine switching, which registers must be saved depends on the calling convention (ABI): callee-saved registers have to survive a function call, while temporary (caller-saved) registers do not.
- Because a coroutine switch is entered through an ordinary function call, the compiler already treats every caller-saved register as clobbered. Saving and restoring sp, ra, and the callee-saved registers s0 to s11 is therefore enough to restore the coroutine's CPU state.
Floating-Point Registers (FPU Registers)
- If RV32F or RV32D is supported, there are an additional 32 floating-point registers, f0 to f31.
- Each register is 32 bits wide (RV32F) or 64 bits wide (RV32D).
Considerations for Coroutine Switching:
- To prevent disrupting floating-point operations between coroutine switches:
  - The callee-saved floating-point registers (fs0 to fs11) must be saved as part of the coroutine context.
  - These registers should be restored when switching back to the coroutine.
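The amount of register data a switch has to preserve follows directly from the register widths. The helper below is a small sketch with hypothetical names; it counts ra plus s0 to s11 (13 integer registers) and fs0 to fs11 (12 floating-point registers), and leaves out sp because the assembly in this note stores sp through a pointer argument rather than inside the frame.

```c
#include <assert.h>
#include <stddef.h>

/* ra + s0-s11, and fs0-fs11 (hypothetical constant names). */
enum { DEMO_INT_SAVED = 13, DEMO_FP_SAVED = 12 };

/* Bytes of register data to preserve; pass flen_bits = 0 when there is no F/D. */
static size_t demo_saved_bytes(unsigned xlen_bits, unsigned flen_bits) {
    return DEMO_INT_SAVED * (xlen_bits / 8u) + DEMO_FP_SAVED * (flen_bits / 8u);
}
```

For RV32I, RV32F, RV32D, and RV64 with D this gives 52, 100, 148, and 200 bytes, so the frame sizes used in this note (0x38, 0x68, 0x9C, and 0xD0) each leave a few bytes unused at the bottom of the frame.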
Referencing the Tina RV64 and Minicoro RV32I Approaches
Referencing the Tina RV64 Approach
The original RV64 approach uses assembly blocks to:
- Initialize the stack when starting a new coroutine.
- Save and restore both general-purpose and floating-point registers.
Below is an example of how the RV64 code handles this process:
#define TINA_ABI_aarch32 (__ARM_EABI__ && __GNUC__)
#define TINA_ABI_aarch64 (__aarch64__ && __GNUC__)
#define TINA_ABI_i386 ((__i386__ && __GNUC__) || (_M_IX86 && _MSC_VER))
#define TINA_ABI_SysV_AMD64 (__amd64__ && __GNUC__ && (__unix__ || __APPLE__ || __HAIKU__))
#define TINA_ABI_WIN64 ((__WIN64__ && __GNUC__) || (_M_AMD64 && _MSC_VER))
#define TINA_ABI_riscv64gc (__riscv && __riscv_xlen == 64 && __riscv_flen == 64)
asm("_tina_init_stack:");
asm(" addi sp, sp, -0xD0");
asm(" sd sp, (a1)");
asm(" sd ra, 0xC8(sp)");
asm(" sd s0, 0xC0(sp)");
asm(" sd s1, 0xB8(sp)");
asm(" sd s2, 0xB0(sp)");
asm(" sd s3, 0xA8(sp)");
asm(" sd s4, 0xA0(sp)");
asm(" sd s5, 0x98(sp)");
asm(" sd s6, 0x90(sp)");
asm(" sd s7, 0x88(sp)");
asm(" sd s8, 0x80(sp)");
asm(" sd s9, 0x78(sp)");
asm(" sd s10, 0x70(sp)");
asm(" sd s11, 0x68(sp)");
asm(" fsd fs0, 0x60(sp)");
asm(" fsd fs1, 0x58(sp)");
asm(" fsd fs2, 0x50(sp)");
asm(" fsd fs3, 0x48(sp)");
asm(" fsd fs4, 0x40(sp)");
asm(" fsd fs5, 0x38(sp)");
asm(" fsd fs6, 0x30(sp)");
asm(" fsd fs7, 0x28(sp)");
asm(" fsd fs8, 0x20(sp)");
asm(" fsd fs9, 0x18(sp)");
asm(" fsd fs10, 0x10(sp)");
asm(" fsd fs11, 0x08(sp)");
asm(" andi a2, a2, ~0xF");
asm(" mv sp, a2");
asm(" mv ra, x0");
asm(" tail _tina_start");
asm("_tina_swap:");
asm(" addi sp, sp, -0xD0");
asm(" sd sp, (a0)");
asm(" sd ra, 0xC8(sp)");
asm(" sd s0, 0xC0(sp)");
asm(" sd s1, 0xB8(sp)");
asm(" sd s2, 0xB0(sp)");
asm(" sd s3, 0xA8(sp)");
asm(" sd s4, 0xA0(sp)");
asm(" sd s5, 0x98(sp)");
asm(" sd s6, 0x90(sp)");
asm(" sd s7, 0x88(sp)");
asm(" sd s8, 0x80(sp)");
asm(" sd s9, 0x78(sp)");
asm(" sd s10, 0x70(sp)");
asm(" sd s11, 0x68(sp)");
asm(" fsd fs0, 0x60(sp)");
asm(" fsd fs1, 0x58(sp)");
asm(" fsd fs2, 0x50(sp)");
asm(" fsd fs3, 0x48(sp)");
asm(" fsd fs4, 0x40(sp)");
asm(" fsd fs5, 0x38(sp)");
asm(" fsd fs6, 0x30(sp)");
asm(" fsd fs7, 0x28(sp)");
asm(" fsd fs8, 0x20(sp)");
asm(" fsd fs9, 0x18(sp)");
asm(" fsd fs10, 0x10(sp)");
asm(" fsd fs11, 0x08(sp)");
asm(" ld sp, (a1)");
asm(" ld ra, 0xC8(sp)");
asm(" ld s0, 0xC0(sp)");
asm(" ld s1, 0xB8(sp)");
asm(" ld s2, 0xB0(sp)");
asm(" ld s3, 0xA8(sp)");
asm(" ld s4, 0xA0(sp)");
asm(" ld s5, 0x98(sp)");
asm(" ld s6, 0x90(sp)");
asm(" ld s7, 0x88(sp)");
asm(" ld s8, 0x80(sp)");
asm(" ld s9, 0x78(sp)");
asm(" ld s10, 0x70(sp)");
asm(" ld s11, 0x68(sp)");
asm(" fld fs0, 0x60(sp)");
asm(" fld fs1, 0x58(sp)");
asm(" fld fs2, 0x50(sp)");
asm(" fld fs3, 0x48(sp)");
asm(" fld fs4, 0x40(sp)");
asm(" fld fs5, 0x38(sp)");
asm(" fld fs6, 0x30(sp)");
asm(" fld fs7, 0x28(sp)");
asm(" fld fs8, 0x20(sp)");
asm(" fld fs9, 0x18(sp)");
asm(" fld fs10, 0x10(sp)");
asm(" fld fs11, 0x08(sp)");
asm(" addi sp, sp, 0xD0");
asm(" mv a0, a2");
asm(" ret");
Referencing the Minicoro RV32I Approach
Minicoro provides a lightweight implementation of coroutine context switching that includes support for the RISC-V RV32I architecture. It uses inline assembly to save and restore the callee-saved general-purpose registers and, when the F or D extension is present, the callee-saved floating-point registers.
Key Features of Minicoro's Approach:
- General-Purpose Register Management:
  - Saves and restores s0 through s11, ra, and sp to maintain the coroutine's execution context.
- Floating-Point Register Support:
  - Includes conditional handling for the RV32F and RV32D extensions, preserving floating-point computation results across coroutine switches.
- Scalable Design:
  - Uses preprocessor macros like __riscv_xlen and __riscv_flen to adapt the implementation to different configurations (e.g., RV32I, RV32F, RV32D).
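Unlike Tina's RV64 code, which pushes a frame onto the coroutine's own stack, the Minicoro excerpt in this section writes the registers into a context buffer addressed through a0 and a1. As a rough sketch, the struct below mirrors the offsets used by its RV32 store instructions in the __riscv_flen == 32 branch (s0 at 0x00 up to fs11 at 0x68). The type and field names are hypothetical and are not taken from Minicoro; uint32_t stands in for a 32-bit register so the layout can be checked on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the buffer layout implied by the store offsets. */
typedef struct demo_ctxbuf_rv32f {
    uint32_t s[12];   /* s0-s11 at 0x00-0x2c */
    uint32_t ra;      /* 0x30 */
    uint32_t ra_copy; /* 0x34: the excerpt stores ra here a second time */
    uint32_t sp;      /* 0x38 */
    float    fs[12];  /* fs0-fs11 at 0x3c-0x68 */
} demo_ctxbuf_rv32f;

/* Returns 1 when the struct offsets agree with the assembly offsets. */
static int demo_ctxbuf_layout_ok(void) {
    return offsetof(demo_ctxbuf_rv32f, s) == 0x00
        && offsetof(demo_ctxbuf_rv32f, ra) == 0x30
        && offsetof(demo_ctxbuf_rv32f, ra_copy) == 0x34
        && offsetof(demo_ctxbuf_rv32f, sp) == 0x38
        && offsetof(demo_ctxbuf_rv32f, fs) == 0x3c
        && sizeof(demo_ctxbuf_rv32f) == 0x6c;
}
```

Keeping the context in a separate buffer means the offsets count upwards from the buffer start, whereas an on-stack frame counts them relative to the adjusted sp.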
Below is an example of how the RV32I code handles this process:
#elif __riscv_xlen == 32
" sw s0, 0x00(a0)\n"
" sw s1, 0x04(a0)\n"
" sw s2, 0x08(a0)\n"
" sw s3, 0x0c(a0)\n"
" sw s4, 0x10(a0)\n"
" sw s5, 0x14(a0)\n"
" sw s6, 0x18(a0)\n"
" sw s7, 0x1c(a0)\n"
" sw s8, 0x20(a0)\n"
" sw s9, 0x24(a0)\n"
" sw s10, 0x28(a0)\n"
" sw s11, 0x2c(a0)\n"
" sw ra, 0x30(a0)\n"
" sw ra, 0x34(a0)\n"
" sw sp, 0x38(a0)\n"
#ifdef __riscv_flen
#if __riscv_flen == 64
" fsd fs0, 0x3c(a0)\n"
" fsd fs1, 0x44(a0)\n"
" fsd fs2, 0x4c(a0)\n"
" fsd fs3, 0x54(a0)\n"
" fsd fs4, 0x5c(a0)\n"
" fsd fs5, 0x64(a0)\n"
" fsd fs6, 0x6c(a0)\n"
" fsd fs7, 0x74(a0)\n"
" fsd fs8, 0x7c(a0)\n"
" fsd fs9, 0x84(a0)\n"
" fsd fs10, 0x8c(a0)\n"
" fsd fs11, 0x94(a0)\n"
" fld fs0, 0x3c(a1)\n"
" fld fs1, 0x44(a1)\n"
" fld fs2, 0x4c(a1)\n"
" fld fs3, 0x54(a1)\n"
" fld fs4, 0x5c(a1)\n"
" fld fs5, 0x64(a1)\n"
" fld fs6, 0x6c(a1)\n"
" fld fs7, 0x74(a1)\n"
" fld fs8, 0x7c(a1)\n"
" fld fs9, 0x84(a1)\n"
" fld fs10, 0x8c(a1)\n"
" fld fs11, 0x94(a1)\n"
#elif __riscv_flen == 32
" fsw fs0, 0x3c(a0)\n"
" fsw fs1, 0x40(a0)\n"
" fsw fs2, 0x44(a0)\n"
" fsw fs3, 0x48(a0)\n"
" fsw fs4, 0x4c(a0)\n"
" fsw fs5, 0x50(a0)\n"
" fsw fs6, 0x54(a0)\n"
" fsw fs7, 0x58(a0)\n"
" fsw fs8, 0x5c(a0)\n"
" fsw fs9, 0x60(a0)\n"
" fsw fs10, 0x64(a0)\n"
" fsw fs11, 0x68(a0)\n"
" flw fs0, 0x3c(a1)\n"
" flw fs1, 0x40(a1)\n"
" flw fs2, 0x44(a1)\n"
" flw fs3, 0x48(a1)\n"
" flw fs4, 0x4c(a1)\n"
" flw fs5, 0x50(a1)\n"
" flw fs6, 0x54(a1)\n"
" flw fs7, 0x58(a1)\n"
" flw fs8, 0x5c(a1)\n"
" flw fs9, 0x60(a1)\n"
" flw fs10, 0x64(a1)\n"
" flw fs11, 0x68(a1)\n"
Implementation of RV32I, RV32F, and RV32D Support for Tina Coroutine
This section presents the implementation of coroutine context switching for the RV32I, RV32F, and RV32D architectures in Tina. Each implementation saves and restores the CPU state, including general-purpose and (if applicable) floating-point registers, during coroutine switches.
RV32I (No Floating-Point Extensions)
This implementation provides support for the RV32I architecture in Tina, which involves saving and restoring general-purpose registers during coroutine context switching. It is specifically designed for RV32I without floating-point extensions.
Key Details
- Stack Space Allocation: 56 bytes (0x38) are allocated to save general-purpose registers.
- Registers Saved:
  - General-purpose registers: ra, s0 to s11.
- Alignment: The new coroutine's stack pointer is aligned down to a 16-byte boundary using andi.
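The frame layout described above can be written down as a C struct to double-check the offsets. This is an illustrative sketch only: demo_rv32i_frame and demo_s_offset are hypothetical names, and the struct is not part of tina.h. Registers are stored in descending order, so ra sits at the top of the frame (0x34), s0 to s11 follow below it, and the word at offset 0x00 is left unused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the 0x38-byte frame, lowest address first. */
typedef struct demo_rv32i_frame {
    uint32_t unused;    /* 0x00: not written by the assembly */
    uint32_t s_rev[12]; /* s11 at 0x04 ... s0 at 0x30 */
    uint32_t ra;        /* 0x34 */
} demo_rv32i_frame;

/* Offset of register sN (N = 0..11) inside the frame. */
static size_t demo_s_offset(unsigned n) {
    return offsetof(demo_rv32i_frame, s_rev) + (11u - n) * sizeof(uint32_t);
}
```

For example, demo_s_offset(0) evaluates to 0x30 and demo_s_offset(11) to 0x04, matching the sw and lw offsets in the assembly.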
Code Implementation
#define TINA_ABI_riscv32i (__riscv && __riscv_xlen == 32 && !__riscv_flen)
#elif TINA_ABI_riscv32i
asm("_tina_init_stack:");
asm(" addi sp, sp, -0x38");
asm(" sw sp, (a1)");
asm(" sw ra, 0x34(sp)");
asm(" sw s0, 0x30(sp)");
asm(" sw s1, 0x2C(sp)");
asm(" sw s2, 0x28(sp)");
asm(" sw s3, 0x24(sp)");
asm(" sw s4, 0x20(sp)");
asm(" sw s5, 0x1C(sp)");
asm(" sw s6, 0x18(sp)");
asm(" sw s7, 0x14(sp)");
asm(" sw s8, 0x10(sp)");
asm(" sw s9, 0x0C(sp)");
asm(" sw s10, 0x08(sp)");
asm(" sw s11, 0x04(sp)");
asm(" andi a2, a2, ~0xF");
asm(" mv sp, a2");
asm(" mv ra, x0");
asm(" tail _tina_start");
asm("_tina_swap:");
asm(" addi sp, sp, -0x38");
asm(" sw sp, (a0)");
asm(" sw ra, 0x34(sp)");
asm(" sw s0, 0x30(sp)");
asm(" sw s1, 0x2C(sp)");
asm(" sw s2, 0x28(sp)");
asm(" sw s3, 0x24(sp)");
asm(" sw s4, 0x20(sp)");
asm(" sw s5, 0x1C(sp)");
asm(" sw s6, 0x18(sp)");
asm(" sw s7, 0x14(sp)");
asm(" sw s8, 0x10(sp)");
asm(" sw s9, 0x0C(sp)");
asm(" sw s10, 0x08(sp)");
asm(" sw s11, 0x04(sp)");
asm(" lw sp, (a1)");
asm(" lw ra, 0x34(sp)");
asm(" lw s0, 0x30(sp)");
asm(" lw s1, 0x2C(sp)");
asm(" lw s2, 0x28(sp)");
asm(" lw s3, 0x24(sp)");
asm(" lw s4, 0x20(sp)");
asm(" lw s5, 0x1C(sp)");
asm(" lw s6, 0x18(sp)");
asm(" lw s7, 0x14(sp)");
asm(" lw s8, 0x10(sp)");
asm(" lw s9, 0x0C(sp)");
asm(" lw s10, 0x08(sp)");
asm(" lw s11, 0x04(sp)");
asm(" addi sp, sp, 0x38");
asm(" mv a0, a2");
asm(" ret");
RV32F
This implementation provides support for the RV32F architecture in Tina, which includes saving and restoring general-purpose registers and single-precision floating-point registers during coroutine context switching.
Key Details
- Stack Space Allocation: 104 bytes (0x68) are allocated to save general-purpose and floating-point registers.
- Registers Saved:
  - General-purpose registers: ra, s0 to s11.
  - Floating-point registers: fs0 to fs11.
- Alignment: The new coroutine's stack pointer is aligned down to a 16-byte boundary using andi.
Code Implementation
#define TINA_ABI_riscv32f (__riscv && __riscv_xlen == 32 && __riscv_flen == 32)
#elif TINA_ABI_riscv32f
asm("_tina_init_stack:");
asm(" addi sp, sp, -0x68");
asm(" sw sp, (a1)");
asm(" sw ra, 0x64(sp)");
asm(" sw s0, 0x60(sp)");
asm(" sw s1, 0x5C(sp)");
asm(" sw s2, 0x58(sp)");
asm(" sw s3, 0x54(sp)");
asm(" sw s4, 0x50(sp)");
asm(" sw s5, 0x4C(sp)");
asm(" sw s6, 0x48(sp)");
asm(" sw s7, 0x44(sp)");
asm(" sw s8, 0x40(sp)");
asm(" sw s9, 0x3C(sp)");
asm(" sw s10, 0x38(sp)");
asm(" sw s11, 0x34(sp)");
asm(" fsw fs0, 0x30(sp)");
asm(" fsw fs1, 0x2C(sp)");
asm(" fsw fs2, 0x28(sp)");
asm(" fsw fs3, 0x24(sp)");
asm(" fsw fs4, 0x20(sp)");
asm(" fsw fs5, 0x1C(sp)");
asm(" fsw fs6, 0x18(sp)");
asm(" fsw fs7, 0x14(sp)");
asm(" fsw fs8, 0x10(sp)");
asm(" fsw fs9, 0x0C(sp)");
asm(" fsw fs10, 0x08(sp)");
asm(" fsw fs11, 0x04(sp)");
asm(" andi a2, a2, ~0xF");
asm(" mv sp, a2");
asm(" mv ra, x0");
asm(" tail _tina_start");
asm("_tina_swap:");
asm(" addi sp, sp, -0x68");
asm(" sw sp, (a0)");
asm(" sw ra, 0x64(sp)");
asm(" sw s0, 0x60(sp)");
asm(" sw s1, 0x5C(sp)");
asm(" sw s2, 0x58(sp)");
asm(" sw s3, 0x54(sp)");
asm(" sw s4, 0x50(sp)");
asm(" sw s5, 0x4C(sp)");
asm(" sw s6, 0x48(sp)");
asm(" sw s7, 0x44(sp)");
asm(" sw s8, 0x40(sp)");
asm(" sw s9, 0x3C(sp)");
asm(" sw s10, 0x38(sp)");
asm(" sw s11, 0x34(sp)");
asm(" fsw fs0, 0x30(sp)");
asm(" fsw fs1, 0x2C(sp)");
asm(" fsw fs2, 0x28(sp)");
asm(" fsw fs3, 0x24(sp)");
asm(" fsw fs4, 0x20(sp)");
asm(" fsw fs5, 0x1C(sp)");
asm(" fsw fs6, 0x18(sp)");
asm(" fsw fs7, 0x14(sp)");
asm(" fsw fs8, 0x10(sp)");
asm(" fsw fs9, 0x0C(sp)");
asm(" fsw fs10, 0x08(sp)");
asm(" fsw fs11, 0x04(sp)");
asm(" lw sp, (a1)");
asm(" lw ra, 0x64(sp)");
asm(" lw s0, 0x60(sp)");
asm(" lw s1, 0x5C(sp)");
asm(" lw s2, 0x58(sp)");
asm(" lw s3, 0x54(sp)");
asm(" lw s4, 0x50(sp)");
asm(" lw s5, 0x4C(sp)");
asm(" lw s6, 0x48(sp)");
asm(" lw s7, 0x44(sp)");
asm(" lw s8, 0x40(sp)");
asm(" lw s9, 0x3C(sp)");
asm(" lw s10, 0x38(sp)");
asm(" lw s11, 0x34(sp)");
asm(" flw fs0, 0x30(sp)");
asm(" flw fs1, 0x2C(sp)");
asm(" flw fs2, 0x28(sp)");
asm(" flw fs3, 0x24(sp)");
asm(" flw fs4, 0x20(sp)");
asm(" flw fs5, 0x1C(sp)");
asm(" flw fs6, 0x18(sp)");
asm(" flw fs7, 0x14(sp)");
asm(" flw fs8, 0x10(sp)");
asm(" flw fs9, 0x0C(sp)");
asm(" flw fs10, 0x08(sp)");
asm(" flw fs11, 0x04(sp)");
asm(" addi sp, sp, 0x68");
asm(" mv a0, a2");
asm(" ret");
RV32D
This implementation adds support for the RV32D architecture in Tina, which includes saving and restoring general-purpose registers and double-precision floating-point registers during coroutine context switching.
Key Details
- Stack Space Allocation: 156 bytes (0x9C) are allocated to save general-purpose and double-precision floating-point registers.
- Registers Saved:
  - General-purpose registers: ra, s0 to s11 (4 bytes each).
  - Double-precision floating-point registers: fs0 to fs11 (8 bytes each).
- Alignment: The new coroutine's stack pointer is aligned down to a 16-byte boundary using andi.
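One detail worth checking in the RV32D layout is the alignment of the 8-byte fsd/fld slots. The sketch below is only an arithmetic check, assuming sp is 16-byte aligned when the switch routine is entered (as the standard RISC-V calling convention specifies); the function name and the example stack address are hypothetical. With a frame of 0x9C bytes the double slots land on addresses that are 4 modulo 8, so the code relies on the target tolerating misaligned fld/fsd. A frame size that is a multiple of 8 (0xA0 is used here purely as an illustration) would keep those slots naturally aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder of a slot address modulo its access size (0 means aligned). */
static uint32_t demo_slot_misalignment(uint32_t sp_on_entry, uint32_t frame,
                                       uint32_t slot_off, uint32_t access_size) {
    uint32_t addr = sp_on_entry - frame + slot_off; /* address the store hits */
    return addr % access_size;
}
```

Calling demo_slot_misalignment(0x80000000u, 0x9Cu, 0x08u, 8u) returns 4 for the fs11 slot, while the same call with a frame of 0xA0 returns 0.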
Code Implementation
#define TINA_ABI_riscv32d (__riscv && __riscv_xlen == 32 && __riscv_flen == 64)
#elif TINA_ABI_riscv32d
asm("_tina_init_stack:");
asm(" addi sp, sp, -0x9C");
asm(" sw sp, (a1)");
asm(" sw ra, 0x98(sp)");
asm(" sw s0, 0x94(sp)");
asm(" sw s1, 0x90(sp)");
asm(" sw s2, 0x8C(sp)");
asm(" sw s3, 0x88(sp)");
asm(" sw s4, 0x84(sp)");
asm(" sw s5, 0x80(sp)");
asm(" sw s6, 0x7C(sp)");
asm(" sw s7, 0x78(sp)");
asm(" sw s8, 0x74(sp)");
asm(" sw s9, 0x70(sp)");
asm(" sw s10, 0x6C(sp)");
asm(" sw s11, 0x68(sp)");
asm(" fsd fs0, 0x60(sp)");
asm(" fsd fs1, 0x58(sp)");
asm(" fsd fs2, 0x50(sp)");
asm(" fsd fs3, 0x48(sp)");
asm(" fsd fs4, 0x40(sp)");
asm(" fsd fs5, 0x38(sp)");
asm(" fsd fs6, 0x30(sp)");
asm(" fsd fs7, 0x28(sp)");
asm(" fsd fs8, 0x20(sp)");
asm(" fsd fs9, 0x18(sp)");
asm(" fsd fs10, 0x10(sp)");
asm(" fsd fs11, 0x08(sp)");
asm(" andi a2, a2, ~0xF");
asm(" mv sp, a2");
asm(" mv ra, x0");
asm(" tail _tina_start");
asm("_tina_swap:");
asm(" addi sp, sp, -0x9C");
asm(" sw sp, (a0)");
asm(" sw ra, 0x98(sp)");
asm(" sw s0, 0x94(sp)");
asm(" sw s1, 0x90(sp)");
asm(" sw s2, 0x8C(sp)");
asm(" sw s3, 0x88(sp)");
asm(" sw s4, 0x84(sp)");
asm(" sw s5, 0x80(sp)");
asm(" sw s6, 0x7C(sp)");
asm(" sw s7, 0x78(sp)");
asm(" sw s8, 0x74(sp)");
asm(" sw s9, 0x70(sp)");
asm(" sw s10, 0x6C(sp)");
asm(" sw s11, 0x68(sp)");
asm(" fsd fs0, 0x60(sp)");
asm(" fsd fs1, 0x58(sp)");
asm(" fsd fs2, 0x50(sp)");
asm(" fsd fs3, 0x48(sp)");
asm(" fsd fs4, 0x40(sp)");
asm(" fsd fs5, 0x38(sp)");
asm(" fsd fs6, 0x30(sp)");
asm(" fsd fs7, 0x28(sp)");
asm(" fsd fs8, 0x20(sp)");
asm(" fsd fs9, 0x18(sp)");
asm(" fsd fs10, 0x10(sp)");
asm(" fsd fs11, 0x08(sp)");
asm(" lw sp, (a1)");
asm(" lw ra, 0x98(sp)");
asm(" lw s0, 0x94(sp)");
asm(" lw s1, 0x90(sp)");
asm(" lw s2, 0x8C(sp)");
asm(" lw s3, 0x88(sp)");
asm(" lw s4, 0x84(sp)");
asm(" lw s5, 0x80(sp)");
asm(" lw s6, 0x7C(sp)");
asm(" lw s7, 0x78(sp)");
asm(" lw s8, 0x74(sp)");
asm(" lw s9, 0x70(sp)");
asm(" lw s10, 0x6C(sp)");
asm(" lw s11, 0x68(sp)");
asm(" fld fs0, 0x60(sp)");
asm(" fld fs1, 0x58(sp)");
asm(" fld fs2, 0x50(sp)");
asm(" fld fs3, 0x48(sp)");
asm(" fld fs4, 0x40(sp)");
asm(" fld fs5, 0x38(sp)");
asm(" fld fs6, 0x30(sp)");
asm(" fld fs7, 0x28(sp)");
asm(" fld fs8, 0x20(sp)");
asm(" fld fs9, 0x18(sp)");
asm(" fld fs10, 0x10(sp)");
asm(" fld fs11, 0x08(sp)");
asm(" addi sp, sp, 0x9C");
asm(" mv a0, a2");
asm(" ret");
Preliminary work
https://github.com/riscv-collab/riscv-gnu-toolchain.git
Setup
rv32emu
https://github.com/sysprog21/rv32emu/tree/master
Setup
Verify
Verify RV32I (without floating-point extensions)
Always write comments in English!
rv32i.c
Compilation
Use the following command to compile the program:
Output
Verify RV32F
rv32f.c
Compilation
Use the following command to compile the program:
Output
Verify RV32D
rv32d.c
Compilation
Use the following command to compile the program:
Output
Contributing Back to Tina


Merged!
Reference