RV32 support for Tina coroutine

黃守維

Task Description

My task is to add RV32I support to the Tina project, implementing the context switch in inline assembly. Additionally, I need to:

- Consider preserving the RV32F/RV32D (floating-point) registers across context switches.
- Validate the implementation on rv32emu or the Spike emulator.
- Write my own test programs to confirm that coroutines work correctly.
- Contribute the code back to the Tina project.

Tina

Symmetric Coroutines:

Symmetric coroutines are built on tina_swap(),
which switches directly between any two fibers, including the main program.

Asymmetric Coroutines:

Asymmetric coroutines use:
- tina_resume() to enter a sub-coroutine.
- tina_yield() to return control to the caller.

Assembly Architecture in Tina

In Tina, the tina.h file uses preprocessor conditionals to distinguish between platforms such as:

- x86
- ARM
- RISC-V (RV64)

Context switching is implemented using:

- Inline assembly
- Dedicated assembly blocks for platform-specific functionality

Analysis of RISC-V RV32I + F/D Registers

General Registers (Integer Registers)

RV32I defines 32 integer registers, x0 to x31, each 32 bits wide (x0 is hard-wired to zero).
Which of them a coroutine switch must preserve depends on the calling convention (ABI) rather than on the ISA itself.
Because the switch is entered through an ordinary function call, only the callee-saved registers (sp and s0 to s11) plus the return address ra have to be saved and restored; the caller already treats the temporary and argument registers as clobbered.
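
The register classification can be written down in a few lines of portable C. This is only a sketch based on the standard RISC-V calling convention; the function names are hypothetical and do not come from Tina.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not part of Tina): returns true when integer
 * register x<n> is callee-saved under the standard RISC-V calling
 * convention, i.e. sp (x2), s0-s1 (x8-x9), and s2-s11 (x18-x27). */
bool is_callee_saved(int n) {
    assert(n >= 0 && n < 32);
    return n == 2 || n == 8 || n == 9 || (n >= 18 && n <= 27);
}

/* Counts the callee-saved registers among x0..x31. */
int count_callee_saved(void) {
    int count = 0;
    for (int n = 0; n < 32; n++) {
        if (is_callee_saved(n)) count++;
    }
    return count;
}
```

count_callee_saved() yields 13: sp plus s0 to s11. A context switch additionally stores ra, because it holds the address at which the suspended coroutine resumes.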

Floating-Point Registers (FPU Registers)

If RV32F or RV32D is supported, there are an additional 32 floating-point registers: f0 to f31.
- Each register is 32 bits wide with RV32F alone, or 64 bits wide with RV32D.

Considerations for Coroutine Switching:

To avoid corrupting floating-point state across coroutine switches:
- The callee-saved floating-point registers (fs0 to fs11) must be saved as part of the coroutine context.
- These registers must be restored when switching back to the coroutine.
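
To make the cost of the floating-point state concrete, the sketch below computes how many bytes a switch has to store when it saves ra, s0 to s11 and, if an FPU is present, fs0 to fs11 (the register set used by the implementations in this document). The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper (not part of Tina): bytes of register state to
 * store when saving ra, s0-s11 and, with an FPU, fs0-fs11.
 * xlen_bits is 32 or 64; flen_bits is 0 (no FPU), 32 (F), or 64 (D). */
unsigned saved_state_bytes(unsigned xlen_bits, unsigned flen_bits) {
    assert(xlen_bits == 32 || xlen_bits == 64);
    assert(flen_bits == 0 || flen_bits == 32 || flen_bits == 64);
    unsigned int_regs = 13;                /* ra + s0..s11 */
    unsigned fp_regs = flen_bits ? 12 : 0; /* fs0..fs11 */
    return int_regs * (xlen_bits / 8) + fp_regs * (flen_bits / 8);
}
```

For RV32I, RV32F, and RV32D this gives 52, 100, and 148 bytes. The frame sizes used later in this document (0x38, 0x68, 0x9C) are slightly larger because the lowest slot is left unused, mirroring the RV64 layout (200 bytes of state in a 0xD0 frame).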

Referencing the Tina RV64 and Minicoro RV32I Approaches

Referencing the Tina RV64 Approach

The original RV64 approach uses assembly blocks to:

- Initialize the stack when starting a new coroutine.
- Save and restore the callee-saved general-purpose and floating-point registers.

Below is an example of how the RV64 code handles this process:

#define TINA_ABI_aarch32 (__ARM_EABI__ && __GNUC__) 
#define TINA_ABI_aarch64 (__aarch64__ && __GNUC__)
#define TINA_ABI_i386 ((__i386__ && __GNUC__) || (_M_IX86 && _MSC_VER))
#define TINA_ABI_SysV_AMD64 (__amd64__ && __GNUC__ && (__unix__ || __APPLE__ || __HAIKU__))
#define TINA_ABI_WIN64 ((__WIN64__ && __GNUC__) || (_M_AMD64 && _MSC_VER))
#define TINA_ABI_riscv64gc (__riscv && __riscv_xlen == 64 && __riscv_flen == 64)
// ...

// 64bit riscv w/ 64 bit floats
	// push s0-s11, fs0-fs11
	asm("_tina_init_stack:");
	asm("  addi sp, sp, -0xD0");
	asm("  sd  sp, (a1)");
	asm("  sd  ra,   0xC8(sp)");
	asm("  sd  s0,   0xC0(sp)");
	asm("  sd  s1,   0xB8(sp)");
	asm("  sd  s2,   0xB0(sp)");
	asm("  sd  s3,   0xA8(sp)");
	asm("  sd  s4,   0xA0(sp)");
	asm("  sd  s5,   0x98(sp)");
	asm("  sd  s6,   0x90(sp)");
	asm("  sd  s7,   0x88(sp)");
	asm("  sd  s8,   0x80(sp)");
	asm("  sd  s9,   0x78(sp)");
	asm("  sd  s10,  0x70(sp)");
	asm("  sd  s11,  0x68(sp)");
	asm("  fsd fs0,  0x60(sp)");
	asm("  fsd fs1,  0x58(sp)");
	asm("  fsd fs2,  0x50(sp)");
	asm("  fsd fs3,  0x48(sp)");
	asm("  fsd fs4,  0x40(sp)");
	asm("  fsd fs5,  0x38(sp)");
	asm("  fsd fs6,  0x30(sp)");
	asm("  fsd fs7,  0x28(sp)");
	asm("  fsd fs8,  0x20(sp)");
	asm("  fsd fs9,  0x18(sp)");
	asm("  fsd fs10, 0x10(sp)");
	asm("  fsd fs11, 0x08(sp)");
	asm("  andi a2, a2, ~0xF");
	asm("  mv sp, a2");
	asm("  mv ra, x0");
	asm("  tail _tina_start");
	
	asm("_tina_swap:");
	asm("  addi sp, sp, -0xD0");
	asm("  sd sp, (a0)");
	asm("  sd  ra,   0xC8(sp)");
	asm("  sd  s0,   0xC0(sp)");
	asm("  sd  s1,   0xB8(sp)");
	asm("  sd  s2,   0xB0(sp)");
	asm("  sd  s3,   0xA8(sp)");
	asm("  sd  s4,   0xA0(sp)");
	asm("  sd  s5,   0x98(sp)");
	asm("  sd  s6,   0x90(sp)");
	asm("  sd  s7,   0x88(sp)");
	asm("  sd  s8,   0x80(sp)");
	asm("  sd  s9,   0x78(sp)");
	asm("  sd  s10,  0x70(sp)");
	asm("  sd  s11,  0x68(sp)");
	asm("  fsd fs0,  0x60(sp)");
	asm("  fsd fs1,  0x58(sp)");
	asm("  fsd fs2,  0x50(sp)");
	asm("  fsd fs3,  0x48(sp)");
	asm("  fsd fs4,  0x40(sp)");
	asm("  fsd fs5,  0x38(sp)");
	asm("  fsd fs6,  0x30(sp)");
	asm("  fsd fs7,  0x28(sp)");
	asm("  fsd fs8,  0x20(sp)");
	asm("  fsd fs9,  0x18(sp)");
	asm("  fsd fs10, 0x10(sp)");
	asm("  fsd fs11, 0x08(sp)");

	asm("  ld sp, (a1)");
	asm("  ld  ra,   0xC8(sp)");
	asm("  ld  s0,   0xC0(sp)");
	asm("  ld  s1,   0xB8(sp)");
	asm("  ld  s2,   0xB0(sp)");
	asm("  ld  s3,   0xA8(sp)");
	asm("  ld  s4,   0xA0(sp)");
	asm("  ld  s5,   0x98(sp)");
	asm("  ld  s6,   0x90(sp)");
	asm("  ld  s7,   0x88(sp)");
	asm("  ld  s8,   0x80(sp)");
	asm("  ld  s9,   0x78(sp)");
	asm("  ld  s10,  0x70(sp)");
	asm("  ld  s11,  0x68(sp)");
	asm("  fld fs0,  0x60(sp)");
	asm("  fld fs1,  0x58(sp)");
	asm("  fld fs2,  0x50(sp)");
	asm("  fld fs3,  0x48(sp)");
	asm("  fld fs4,  0x40(sp)");
	asm("  fld fs5,  0x38(sp)");
	asm("  fld fs6,  0x30(sp)");
	asm("  fld fs7,  0x28(sp)");
	asm("  fld fs8,  0x20(sp)");
	asm("  fld fs9,  0x18(sp)");
	asm("  fld fs10, 0x10(sp)");
	asm("  fld fs11, 0x08(sp)");
	asm("  addi sp, sp, 0xD0");
	asm("  mv a0, a2");
	asm("  ret");

Referencing the Minicoro RV32I Approach

Minicoro provides a lightweight implementation of coroutine context switching that includes support for 32-bit RISC-V. It uses inline assembly to save and restore both general-purpose and floating-point registers, and selects the floating-point path at compile time.

Key Features of Minicoro's Approach:

General-Purpose Register Management:
- Saves and restores s0 through s11, ra, and sp to maintain the coroutine's execution context.
Floating-Point Register Support:
- Includes conditional handling for RV32F and RV32D extensions, preserving floating-point computation results across coroutine switches.
Scalable Design:
- Uses preprocessor directives like __riscv_xlen and __riscv_flen to adapt the implementation to different configurations (e.g., RV32I, RV32F, RV32D).

Below is an excerpt of the RV32 code path, showing the integer register saves and the floating-point save and restore:

#elif __riscv_xlen == 32
    "  sw s0, 0x00(a0)\n"
    "  sw s1, 0x04(a0)\n"
    "  sw s2, 0x08(a0)\n"
    "  sw s3, 0x0c(a0)\n"
    "  sw s4, 0x10(a0)\n"
    "  sw s5, 0x14(a0)\n"
    "  sw s6, 0x18(a0)\n"
    "  sw s7, 0x1c(a0)\n"
    "  sw s8, 0x20(a0)\n"
    "  sw s9, 0x24(a0)\n"
    "  sw s10, 0x28(a0)\n"
    "  sw s11, 0x2c(a0)\n"
    "  sw ra, 0x30(a0)\n"
    "  sw ra, 0x34(a0)\n" /* pc */
    "  sw sp, 0x38(a0)\n"
    #ifdef __riscv_flen
    #if __riscv_flen == 64
    "  fsd fs0, 0x3c(a0)\n"
    "  fsd fs1, 0x44(a0)\n"
    "  fsd fs2, 0x4c(a0)\n"
    "  fsd fs3, 0x54(a0)\n"
    "  fsd fs4, 0x5c(a0)\n"
    "  fsd fs5, 0x64(a0)\n"
    "  fsd fs6, 0x6c(a0)\n"
    "  fsd fs7, 0x74(a0)\n"
    "  fsd fs8, 0x7c(a0)\n"
    "  fsd fs9, 0x84(a0)\n"
    "  fsd fs10, 0x8c(a0)\n"
    "  fsd fs11, 0x94(a0)\n"
    "  fld fs0, 0x3c(a1)\n"
    "  fld fs1, 0x44(a1)\n"
    "  fld fs2, 0x4c(a1)\n"
    "  fld fs3, 0x54(a1)\n"
    "  fld fs4, 0x5c(a1)\n"
    "  fld fs5, 0x64(a1)\n"
    "  fld fs6, 0x6c(a1)\n"
    "  fld fs7, 0x74(a1)\n"
    "  fld fs8, 0x7c(a1)\n"
    "  fld fs9, 0x84(a1)\n"
    "  fld fs10, 0x8c(a1)\n"
    "  fld fs11, 0x94(a1)\n"
    #elif __riscv_flen == 32
    "  fsw fs0, 0x3c(a0)\n"
    "  fsw fs1, 0x40(a0)\n"
    "  fsw fs2, 0x44(a0)\n"
    "  fsw fs3, 0x48(a0)\n"
    "  fsw fs4, 0x4c(a0)\n"
    "  fsw fs5, 0x50(a0)\n"
    "  fsw fs6, 0x54(a0)\n"
    "  fsw fs7, 0x58(a0)\n"
    "  fsw fs8, 0x5c(a0)\n"
    "  fsw fs9, 0x60(a0)\n"
    "  fsw fs10, 0x64(a0)\n"
    "  fsw fs11, 0x68(a0)\n"
    "  flw fs0, 0x3c(a1)\n"
    "  flw fs1, 0x40(a1)\n"
    "  flw fs2, 0x44(a1)\n"
    "  flw fs3, 0x48(a1)\n"
    "  flw fs4, 0x4c(a1)\n"
    "  flw fs5, 0x50(a1)\n"
    "  flw fs6, 0x54(a1)\n"
    "  flw fs7, 0x58(a1)\n"
    "  flw fs8, 0x5c(a1)\n"
    "  flw fs9, 0x60(a1)\n"
    "  flw fs10, 0x64(a1)\n"
    "  flw fs11, 0x68(a1)\n"

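The excerpt addresses every slot relative to a0 (when saving) and a1 (when restoring), so the context lives in a buffer rather than on the stack. One way to read the single-precision offsets is as a C struct; the type and field names below are assumptions made for illustration, not Minicoro's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct (names invented here) whose field offsets match
 * the RV32 + single-precision excerpt: s0-s11, ra, pc, sp, fs0-fs11. */
typedef struct {
    uint32_t s[12];  /* 0x00-0x2c: s0..s11 */
    uint32_t ra;     /* 0x30 */
    uint32_t pc;     /* 0x34 */
    uint32_t sp;     /* 0x38 */
    float    fs[12]; /* 0x3c-0x68: fs0..fs11 (flen == 32) */
} rv32f_ctx_sketch;

/* Compile-time checks against the offsets used in the excerpt. */
static_assert(offsetof(rv32f_ctx_sketch, ra) == 0x30, "ra slot");
static_assert(offsetof(rv32f_ctx_sketch, pc) == 0x34, "pc slot");
static_assert(offsetof(rv32f_ctx_sketch, sp) == 0x38, "sp slot");
static_assert(offsetof(rv32f_ctx_sketch, fs) == 0x3c, "first FP slot");
```

Tina's RV64 code, by contrast, pushes the registers onto the outgoing coroutine's own stack and stores only the resulting stack pointer.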
Implementation of RV32I, RV32F, and RV32D Support for Tina Coroutine

This section describes the coroutine context-switching implementations for RV32I, RV32F, and RV32D in Tina. Each one saves and restores the callee-saved CPU state, including floating-point registers where applicable, during coroutine switches.
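
The three variants are told apart at compile time by the compiler's predefined __riscv_xlen and __riscv_flen macros. The sketch below mirrors those tests in a small function that can be compiled on any host; the function names are hypothetical and not part of Tina.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of Tina): names the backend that the
 * TINA_ABI_* style tests quoted in this document would select.
 * Undefined macros evaluate to 0 inside #if, as in the originals. */
const char *selected_backend(void) {
#if defined(__riscv) && __riscv_xlen == 32 && !defined(__riscv_flen)
    return "riscv32i";
#elif defined(__riscv) && __riscv_xlen == 32 && __riscv_flen == 32
    return "riscv32f";
#elif defined(__riscv) && __riscv_xlen == 32 && __riscv_flen == 64
    return "riscv32d";
#elif defined(__riscv) && __riscv_xlen == 64 && __riscv_flen == 64
    return "riscv64gc";
#else
    return "other";
#endif
}

/* Returns nonzero when one of the RISC-V backends was selected. */
int backend_is_riscv(void) {
    const char *name = selected_backend();
    assert(name != NULL);
    return strncmp(name, "riscv", 5) == 0;
}
```

Built with -march=rv32i this reports riscv32i; adding the F or D extension to -march switches it to riscv32f or riscv32d.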

RV32I (No Floating-Point Extensions)

This implementation provides support for the RV32I architecture in Tina, which involves saving and restoring general-purpose registers during coroutine context switching. It is specifically designed for RV32I without floating-point extensions.

Key Details

Stack Space Allocation: 56 bytes (0x38) are allocated: 52 bytes for the 13 saved registers plus one unused 4-byte slot at offset 0x00.
Registers Saved:
- General-purpose registers: ra, s0 to s11.
Alignment: The new coroutine's stack pointer (a2) is rounded down to a 16-byte boundary using andi.
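
The frame layout can be checked on any host by describing it as a C struct. This is a sketch only; the type and function names are invented for illustration, and the offsets are the ones used by the sw/lw instructions in this section's assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the 0x38-byte RV32I frame (type name invented).
 * The code stores s11 lowest and ra highest, so saved[i] holds s(11-i). */
typedef struct {
    uint32_t unused;    /* 0x00: never written */
    uint32_t saved[12]; /* 0x04-0x30: s11 down to s0 */
    uint32_t ra;        /* 0x34 */
} rv32i_frame_sketch;

static_assert(sizeof(rv32i_frame_sketch) == 0x38, "frame size");
static_assert(offsetof(rv32i_frame_sketch, ra) == 0x34, "ra slot");

/* Byte offset of s<n> from sp inside the frame. */
unsigned rv32i_s_offset(unsigned n) {
    assert(n <= 11);
    return 0x30 - 4 * n;
}
```

For example, rv32i_s_offset(0) is 0x30 and rv32i_s_offset(11) is 0x04, matching the first and last register stores.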

Code Implementation

// RV32I without floating-point extensions
#define TINA_ABI_riscv32i (__riscv && __riscv_xlen == 32 && !__riscv_flen)

#elif TINA_ABI_riscv32i
    asm("_tina_init_stack:");
    asm("  addi sp, sp, -0x38");         // Allocate stack space
    asm("  sw sp, (a1)");                // Save stack pointer
    asm("  sw ra,   0x34(sp)");          // Save return address
    asm("  sw s0,   0x30(sp)");          // Save s0
    asm("  sw s1,   0x2C(sp)");          // Save s1
    asm("  sw s2,   0x28(sp)");          // Save s2
    asm("  sw s3,   0x24(sp)");          // Save s3
    asm("  sw s4,   0x20(sp)");          // Save s4
    asm("  sw s5,   0x1C(sp)");          // Save s5
    asm("  sw s6,   0x18(sp)");          // Save s6
    asm("  sw s7,   0x14(sp)");          // Save s7
    asm("  sw s8,   0x10(sp)");          // Save s8
    asm("  sw s9,   0x0C(sp)");          // Save s9
    asm("  sw s10,  0x08(sp)");          // Save s10
    asm("  sw s11,  0x04(sp)");          // Save s11
    asm("  andi a2, a2, ~0xF");          // Align stack
    asm("  mv sp, a2");
    asm("  mv ra, x0");
    asm("  tail _tina_start");

    asm("_tina_swap:");
    asm("  addi sp, sp, -0x38");         // Allocate stack space
    asm("  sw sp, (a0)");                // Save stack pointer
    asm("  sw ra,   0x34(sp)");          // Save return address
    asm("  sw s0,   0x30(sp)");          // Save s0
    asm("  sw s1,   0x2C(sp)");          // Save s1
    asm("  sw s2,   0x28(sp)");          // Save s2
    asm("  sw s3,   0x24(sp)");          // Save s3
    asm("  sw s4,   0x20(sp)");          // Save s4
    asm("  sw s5,   0x1C(sp)");          // Save s5
    asm("  sw s6,   0x18(sp)");          // Save s6
    asm("  sw s7,   0x14(sp)");          // Save s7
    asm("  sw s8,   0x10(sp)");          // Save s8
    asm("  sw s9,   0x0C(sp)");          // Save s9
    asm("  sw s10,  0x08(sp)");          // Save s10
    asm("  sw s11,  0x04(sp)");          // Save s11
    asm("  lw sp, (a1)");                // Restore stack pointer
    asm("  lw ra,   0x34(sp)");          // Restore return address
    asm("  lw s0,   0x30(sp)");          // Restore s0
    asm("  lw s1,   0x2C(sp)");          // Restore s1
    asm("  lw s2,   0x28(sp)");          // Restore s2
    asm("  lw s3,   0x24(sp)");          // Restore s3
    asm("  lw s4,   0x20(sp)");          // Restore s4
    asm("  lw s5,   0x1C(sp)");          // Restore s5
    asm("  lw s6,   0x18(sp)");          // Restore s6
    asm("  lw s7,   0x14(sp)");          // Restore s7
    asm("  lw s8,   0x10(sp)");          // Restore s8
    asm("  lw s9,   0x0C(sp)");          // Restore s9
    asm("  lw s10,  0x08(sp)");          // Restore s10
    asm("  lw s11,  0x04(sp)");          // Restore s11
    asm("  addi sp, sp, 0x38");          // Deallocate stack space
    asm("  mv a0, a2");                  // Set return value to a2
    asm("  ret");                        // Return

RV32F

This implementation provides support for the RV32F architecture in Tina, which includes saving and restoring general-purpose registers and single-precision floating-point registers during coroutine context switching.

Key Details

Stack Space Allocation: 104 bytes (0x68) are allocated: 52 bytes for the 13 general-purpose registers, 48 bytes for the 12 single-precision registers, and one unused 4-byte slot at offset 0x00.
Registers Saved:
- General-purpose registers: ra, s0 to s11.
- Floating-point registers: fs0 to fs11.
Alignment: The new coroutine's stack pointer (a2) is rounded down to a 16-byte boundary using andi.
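
The alignment step is a single bit mask. The sketch below shows the C equivalent of the andi instruction; the second function is illustrative only (one way a caller might derive an initial stack pointer from a buffer, given that the RISC-V stack grows downward) and is not taken from Tina's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C equivalent of "andi a2, a2, ~0xF": clear the low four bits so the
 * address is rounded down to a multiple of 16. */
uintptr_t align_down_16(uintptr_t addr) {
    return addr & ~(uintptr_t)0xF;
}

/* Illustrative only: highest 16-byte-aligned address inside a buffer
 * that starts at `base` and is `size` bytes long. */
uintptr_t aligned_stack_top(uintptr_t base, size_t size) {
    assert(size >= 16);
    return align_down_16(base + size);
}
```

Rounding down rather than up keeps the resulting stack pointer inside the buffer.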

Code Implementation

#define TINA_ABI_riscv32f (__riscv && __riscv_xlen == 32 && __riscv_flen == 32)

#elif TINA_ABI_riscv32f
    // 32-bit CPU + Single-Precision FPU (RV32F)
    asm("_tina_init_stack:");
    asm("  addi sp, sp, -0x68");       // Allocate stack space
    asm("  sw  sp, (a1)");            // Save stack pointer

    // Save general-purpose registers (ra, s0-s11)
    asm("  sw   ra,   0x64(sp)");   
    asm("  sw   s0,   0x60(sp)");    
    asm("  sw   s1,   0x5C(sp)");  
    asm("  sw   s2,   0x58(sp)");     
    asm("  sw   s3,   0x54(sp)");   
    asm("  sw   s4,   0x50(sp)");    
    asm("  sw   s5,   0x4C(sp)");     
    asm("  sw   s6,   0x48(sp)");   
    asm("  sw   s7,   0x44(sp)");    
    asm("  sw   s8,   0x40(sp)");  
    asm("  sw   s9,   0x3C(sp)");     
    asm("  sw   s10,  0x38(sp)");    
    asm("  sw   s11,  0x34(sp)");    

    // Save single-precision floating-point registers (fs0-fs11)
    asm("  fsw  fs0,  0x30(sp)");
    asm("  fsw  fs1,  0x2C(sp)");
    asm("  fsw  fs2,  0x28(sp)");
    asm("  fsw  fs3,  0x24(sp)");
    asm("  fsw  fs4,  0x20(sp)");
    asm("  fsw  fs5,  0x1C(sp)");
    asm("  fsw  fs6,  0x18(sp)");
    asm("  fsw  fs7,  0x14(sp)");
    asm("  fsw  fs8,  0x10(sp)");
    asm("  fsw  fs9,  0x0C(sp)");
    asm("  fsw  fs10, 0x08(sp)");
    asm("  fsw  fs11, 0x04(sp)");

    asm("  andi a2, a2, ~0xF");       // Align stack
    asm("  mv sp, a2");               // Set stack pointer
    asm("  mv ra, x0");               // Clear return address
    asm("  tail _tina_start");        // Jump to coroutine start

    asm("_tina_swap:");
    asm("  addi sp, sp, -0x68");      // Allocate stack space
    asm("  sw sp, (a0)");             // Save stack pointer

    // Save general-purpose registers
    asm("  sw   ra,   0x64(sp)");
    asm("  sw   s0,   0x60(sp)");
    asm("  sw   s1,   0x5C(sp)");
    asm("  sw   s2,   0x58(sp)");
    asm("  sw   s3,   0x54(sp)");
    asm("  sw   s4,   0x50(sp)");
    asm("  sw   s5,   0x4C(sp)");
    asm("  sw   s6,   0x48(sp)");
    asm("  sw   s7,   0x44(sp)");
    asm("  sw   s8,   0x40(sp)");
    asm("  sw   s9,   0x3C(sp)");
    asm("  sw   s10,  0x38(sp)");
    asm("  sw   s11,  0x34(sp)");

    // Save single-precision floating-point registers
    asm("  fsw  fs0,  0x30(sp)");
    asm("  fsw  fs1,  0x2C(sp)");
    asm("  fsw  fs2,  0x28(sp)");
    asm("  fsw  fs3,  0x24(sp)");
    asm("  fsw  fs4,  0x20(sp)");
    asm("  fsw  fs5,  0x1C(sp)");
    asm("  fsw  fs6,  0x18(sp)");
    asm("  fsw  fs7,  0x14(sp)");
    asm("  fsw  fs8,  0x10(sp)");
    asm("  fsw  fs9,  0x0C(sp)");
    asm("  fsw  fs10, 0x08(sp)");
    asm("  fsw  fs11, 0x04(sp)");

    asm("  lw sp, (a1)");             // Restore stack pointer

    // Restore general-purpose registers
    asm("  lw   ra,   0x64(sp)");
    asm("  lw   s0,   0x60(sp)");
    asm("  lw   s1,   0x5C(sp)");
    asm("  lw   s2,   0x58(sp)");
    asm("  lw   s3,   0x54(sp)");
    asm("  lw   s4,   0x50(sp)");
    asm("  lw   s5,   0x4C(sp)");
    asm("  lw   s6,   0x48(sp)");
    asm("  lw   s7,   0x44(sp)");
    asm("  lw   s8,   0x40(sp)");
    asm("  lw   s9,   0x3C(sp)");
    asm("  lw   s10,  0x38(sp)");
    asm("  lw   s11,  0x34(sp)");

    // Restore single-precision floating-point registers
    asm("  flw  fs0,  0x30(sp)");
    asm("  flw  fs1,  0x2C(sp)");
    asm("  flw  fs2,  0x28(sp)");
    asm("  flw  fs3,  0x24(sp)");
    asm("  flw  fs4,  0x20(sp)");
    asm("  flw  fs5,  0x1C(sp)");
    asm("  flw  fs6,  0x18(sp)");
    asm("  flw  fs7,  0x14(sp)");
    asm("  flw  fs8,  0x10(sp)");
    asm("  flw  fs9,  0x0C(sp)");
    asm("  flw  fs10, 0x08(sp)");
    asm("  flw  fs11, 0x04(sp)");

    asm("  addi sp, sp, 0x68");       // Deallocate stack space
    asm("  mv a0, a2");               // Set return value
    asm("  ret");                     // Return to caller

RV32D

This implementation adds support for the RV32D architecture in Tina, which includes saving and restoring general-purpose registers and double-precision floating-point registers during coroutine context switching.

Key Details

Stack Space Allocation: 156 bytes (0x9C) are allocated: 52 bytes for the 13 general-purpose registers, 96 bytes for the 12 double-precision registers, and 8 unused bytes at offsets 0x00 to 0x07.
Registers Saved:
- General-purpose registers: ra, s0 to s11 (4 bytes each).
- Double-precision floating-point registers: fs0 to fs11 (8 bytes each).
Alignment: The new coroutine's stack pointer (a2) is rounded down to a 16-byte boundary using andi.
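
Because this frame mixes 4-byte and 8-byte slots, it is worth checking where the doubles land. The sketch below reproduces the offsets from this section's assembly and tests natural alignment, assuming sp is 16-byte aligned on entry as the standard ilp32d calling convention requires. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets of fs<n> and s<n> from sp in the RV32D frame. */
unsigned rv32d_fs_offset(unsigned n) { assert(n <= 11); return 0x60 - 8 * n; }
unsigned rv32d_s_offset(unsigned n)  { assert(n <= 11); return 0x94 - 4 * n; }

/* True when a slot of `size` bytes at `offset` is naturally aligned,
 * given the sp value on entry and the frame size subtracted from it. */
bool slot_is_aligned(uint32_t entry_sp, unsigned frame,
                     unsigned offset, unsigned size) {
    return ((entry_sp - frame + offset) % size) == 0;
}
```

With an entry sp of 0x1000, slot_is_aligned(0x1000, 0x9C, 0x08, 8) is false, because 0x9C is not a multiple of 8, whereas a hypothetical 0xA0 frame with the same offsets would make it true. Whether a misaligned fsd or fld is tolerated depends on the execution environment.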

Code Implementation

#define TINA_ABI_riscv32d (__riscv && __riscv_xlen == 32 && __riscv_flen == 64)

#elif TINA_ABI_riscv32d
    // 32-bit CPU + Double-Precision FPU (RV32D)
    asm("_tina_init_stack:");
    asm("  addi sp, sp, -0x9C");         // Allocate stack space
    asm("  sw  sp, (a1)");               // Save stack pointer

    // Save general-purpose registers (ra, s0-s11)
    asm("  sw  ra,   0x98(sp)");         
    asm("  sw  s0,   0x94(sp)");         
    asm("  sw  s1,   0x90(sp)");
    asm("  sw  s2,   0x8C(sp)");       
    asm("  sw  s3,   0x88(sp)");      
    asm("  sw  s4,   0x84(sp)");        
    asm("  sw  s5,   0x80(sp)");       
    asm("  sw  s6,   0x7C(sp)");        
    asm("  sw  s7,   0x78(sp)");      
    asm("  sw  s8,   0x74(sp)");       
    asm("  sw  s9,   0x70(sp)");      
    asm("  sw  s10,  0x6C(sp)");        
    asm("  sw  s11,  0x68(sp)");        

    // Save double-precision floating-point registers (fs0-fs11)
    asm("  fsd fs0,  0x60(sp)");       
    asm("  fsd fs1,  0x58(sp)");     
    asm("  fsd fs2,  0x50(sp)");     
    asm("  fsd fs3,  0x48(sp)");      
    asm("  fsd fs4,  0x40(sp)");      
    asm("  fsd fs5,  0x38(sp)");       
    asm("  fsd fs6,  0x30(sp)");      
    asm("  fsd fs7,  0x28(sp)");       
    asm("  fsd fs8,  0x20(sp)");       
    asm("  fsd fs9,  0x18(sp)");        
    asm("  fsd fs10, 0x10(sp)");      
    asm("  fsd fs11, 0x08(sp)");       

    asm("  andi a2, a2, ~0xF");          // Align stack
    asm("  mv sp, a2");                  // Set stack pointer
    asm("  mv ra, x0");                  // Clear return address
    asm("  tail _tina_start");           // Jump to coroutine start

    asm("_tina_swap:");
    asm("  addi sp, sp, -0x9C");         // Allocate stack space
    asm("  sw sp, (a0)");                // Save stack pointer

    // Save general-purpose registers
    asm("  sw  ra,   0x98(sp)");
    asm("  sw  s0,   0x94(sp)");
    asm("  sw  s1,   0x90(sp)");
    asm("  sw  s2,   0x8C(sp)");
    asm("  sw  s3,   0x88(sp)");
    asm("  sw  s4,   0x84(sp)");
    asm("  sw  s5,   0x80(sp)");
    asm("  sw  s6,   0x7C(sp)");
    asm("  sw  s7,   0x78(sp)");
    asm("  sw  s8,   0x74(sp)");
    asm("  sw  s9,   0x70(sp)");
    asm("  sw  s10,  0x6C(sp)");
    asm("  sw  s11,  0x68(sp)");

    // Save double-precision floating-point registers
    asm("  fsd fs0,  0x60(sp)");
    asm("  fsd fs1,  0x58(sp)");
    asm("  fsd fs2,  0x50(sp)");
    asm("  fsd fs3,  0x48(sp)");
    asm("  fsd fs4,  0x40(sp)");
    asm("  fsd fs5,  0x38(sp)");
    asm("  fsd fs6,  0x30(sp)");
    asm("  fsd fs7,  0x28(sp)");
    asm("  fsd fs8,  0x20(sp)");
    asm("  fsd fs9,  0x18(sp)");
    asm("  fsd fs10, 0x10(sp)");
    asm("  fsd fs11, 0x08(sp)");

    asm("  lw sp, (a1)");                // Restore stack pointer

    // Restore general-purpose registers
    asm("  lw  ra,   0x98(sp)");
    asm("  lw  s0,   0x94(sp)");
    asm("  lw  s1,   0x90(sp)");
    asm("  lw  s2,   0x8C(sp)");
    asm("  lw  s3,   0x88(sp)");
    asm("  lw  s4,   0x84(sp)");
    asm("  lw  s5,   0x80(sp)");
    asm("  lw  s6,   0x7C(sp)");
    asm("  lw  s7,   0x78(sp)");
    asm("  lw  s8,   0x74(sp)");
    asm("  lw  s9,   0x70(sp)");
    asm("  lw  s10,  0x6C(sp)");
    asm("  lw  s11,  0x68(sp)");

    // Restore double-precision floating-point registers
    asm("  fld fs0,  0x60(sp)");
    asm("  fld fs1,  0x58(sp)");
    asm("  fld fs2,  0x50(sp)");
    asm("  fld fs3,  0x48(sp)");
    asm("  fld fs4,  0x40(sp)");
    asm("  fld fs5,  0x38(sp)");
    asm("  fld fs6,  0x30(sp)");
    asm("  fld fs7,  0x28(sp)");
    asm("  fld fs8,  0x20(sp)");
    asm("  fld fs9,  0x18(sp)");
    asm("  fld fs10, 0x10(sp)");
    asm("  fld fs11, 0x08(sp)");

    asm("  addi sp, sp, 0x9C");          // Deallocate stack space
    asm("  mv a0, a2");                  // Set return value
    asm("  ret");                        // Return to caller

Preliminary work

RISC-V GNU Compiler Toolchain

https://github.com/riscv-collab/riscv-gnu-toolchain.git

Setup

$ git clone https://github.com/riscv-collab/riscv-gnu-toolchain

$ sudo apt-get install autoconf automake autotools-dev curl python3 python3-pip python3-tomli libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build git cmake libglib2.0-dev libslirp-dev

$ cd riscv-gnu-toolchain

$ ./configure --prefix=/opt/riscv --enable-multilib --with-multilib-generator="rv32i-ilp32--"

$ sudo make -j$(nproc)

rv32emu

https://github.com/sysprog21/rv32emu/tree/master

Setup

$ git clone https://github.com/sysprog21/rv32emu.git

$ sudo apt install libsdl2-dev libsdl2-mixer-dev

$ cd rv32emu

$ make

Verification

Verify RV32I (without floating-point extensions)

rv32i.c

#define TINA_IMPLEMENTATION
#include "tina.h"

#include <stdio.h>
#include <stdint.h>

// Simple coroutine function that tests switching and integer arithmetic
static void* fiberA(tina* coro, void* val) {
    printf("Fiber A start.\n");

    int32_t x = 10;
    x += 20; // x = 30
    printf("Fiber A integer x = %d\n", x);

    // Suspend this coroutine with tina_yield()
    val = tina_yield(coro, (void*)1);

    x *= 2; // x = 60
    printf("Fiber A after yield, x = %d\n", x);
    printf("Fiber A done.\n");

    return val; // Returned to the final tina_resume() call
}

int main() {
    // Reserve a stack buffer for the coroutine
    static char stackA[64 * 1024];

    // Initialize the fiber
    tina* coroA = tina_init(stackA, sizeof(stackA), fiberA, NULL);

    // Enter fiber A for the first time
    printf("Main: resume fiber A.\n");
    void* ret = tina_resume(coroA, NULL);
    printf("Main: fiber A yield. ret = %ld\n", (long)ret);

    // Enter fiber A again
    printf("Main: resume fiber A again.\n");
    ret = tina_resume(coroA, NULL);
    printf("Main: fiber A ended. ret = %ld\n", (long)ret);

    return 0;
}

Compilation

Use the following command to compile the program:

$ riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -O2 rv32i.c -o rv32i

$ rv32emu ./rv32i

Output

Main: resume fiber A.
Fiber A start.
Fiber A integer x = 30
Main: fiber A yield. ret = 1
Main: resume fiber A again.
Fiber A after yield, x = 60
Fiber A done.
Main: fiber A ended. ret = 0
inferior exit code 0

Verify RV32F

rv32f.c

#define TINA_IMPLEMENTATION
#include "tina.h"

#include <stdio.h>

// Simple coroutine function that tests switching and floating-point arithmetic
static void* fiberA(tina* coro, void* val){
    printf("Fiber A start.\n");

    float x = 12.59;
    x += 2.71828; // x = 15.308280
    printf("Fiber A double x = %f\n", x);

    // Suspend this coroutine with tina_yield()
    val = tina_yield(coro, (void*)1);

    x *= 3.0; // x = 45.924839
    printf("Fiber A after yield, x = %f\n", x);
    printf("Fiber A done.\n");
    return val; // Returned to the final tina_resume() call
}

int main(){
    // Reserve a stack buffer for the coroutine
    static char stackA[64 * 1024];

    // Initialize the fiber
    tina* coroA = tina_init(stackA, sizeof(stackA), fiberA, NULL);

    // Enter fiber A for the first time
    printf("Main: resume fiber A.\n");
    void* ret = tina_resume(coroA, NULL);
    printf("Main: fiber A yield. ret = %ld\n", (long)ret);

    // Enter fiber A again
    printf("Main: resume fiber A again.\n");
    ret = tina_resume(coroA, NULL);
    printf("Main: fiber A ended. ret = %ld\n", (long)ret);

    return 0;
}

Compilation

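Use the following command to compile the program. Note that -march=rv32i -mabi=ilp32 leaves __riscv_flen undefined, so this build selects the TINA_ABI_riscv32i path and performs the float arithmetic in software; exercising the TINA_ABI_riscv32f path requires an F-enabled target (for example -march=rv32if -mabi=ilp32f) and a toolchain built with the matching multilib: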

$ riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -O2 rv32f.c -o rv32f

$ rv32emu ./rv32f

Output

Main: resume fiber A.
Fiber A start.
Fiber A double x = 15.308280
Main: fiber A yield. ret = 1
Main: resume fiber A again.
Fiber A after yield, x = 45.924839
Fiber A done.
Main: fiber A ended. ret = 0
inferior exit code 0

Verify RV32D

rv32d.c

#define TINA_IMPLEMENTATION
#include "tina.h"

#include <stdio.h>

// Simple coroutine function that tests switching and floating-point arithmetic
static void* fiberA(tina* coro, void* val){
    printf("Fiber A start.\n");

    double x = 3.14;
    x += 2.71828; // x = 5.85828
    printf("Fiber A double x = %f\n", x);

    // Suspend this coroutine with tina_yield()
    val = tina_yield(coro, (void*)1);

    x *= 2.0; // x = 11.71656
    printf("Fiber A after yield, x = %f\n", x);
    printf("Fiber A done.\n");
    return val; // Returned to the final tina_resume() call
}

int main(){
    // Reserve a stack buffer for the coroutine
    static char stackA[64 * 1024];

    // Initialize the fiber
    tina* coroA = tina_init(stackA, sizeof(stackA), fiberA, NULL);

    // Enter fiber A for the first time
    printf("Main: resume fiber A.\n");
    void* ret = tina_resume(coroA, NULL);
    printf("Main: fiber A yield. ret = %ld\n", (long)ret);

    // Enter fiber A again
    printf("Main: resume fiber A again.\n");
    ret = tina_resume(coroA, NULL);
    printf("Main: fiber A ended. ret = %ld\n", (long)ret);

    return 0;
}

Compilation

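Use the following command to compile the program. Note that -march=rv32i -mabi=ilp32 leaves __riscv_flen undefined, so this build selects the TINA_ABI_riscv32i path and performs the double arithmetic in software; exercising the TINA_ABI_riscv32d path requires a D-enabled target (for example -march=rv32ifd -mabi=ilp32d) and a toolchain built with the matching multilib: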

$ riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -O2 rv32d.c -o rv32d

$ rv32emu ./rv32d

Output

Main: resume fiber A.
Fiber A start.
Fiber A double x = 5.858280
Main: fiber A yield. ret = 1
Main: resume fiber A again.
Fiber A after yield, x = 11.716560
Fiber A done.
Main: fiber A ended. ret = 0
inferior exit code 0

Contributing Back to Tina

The RV32 support described above has been merged into the upstream Tina project.