# RV32 support for Tina coroutine
> 黃守維
[GitHub](https://github.com/28530367/Tina/tree/feature/rv32i-support)
## Task Description
My task is to add **RV32I support** to the [Tina](https://github.com/slembcke/Tina) coroutine library, implementing the context switch in **inline assembly**. Additionally, I need to:
- **Consider preserving the RV32F/RV32D (floating-point) registers** across context switches.
- Validate the implementation on **rv32emu** or the **Spike emulator**.
- Develop test programs (self-written) to ensure the proper functionality of coroutines.
- Finally, contribute the code back to the **Tina** project.
## Tina
### Symmetric Coroutines
- Symmetric coroutines are built on `tina_swap()`.
- It switches directly between any two fibers, including the main program.
### Asymmetric Coroutines
- Asymmetric coroutines use:
  - `tina_resume()` to enter a sub-coroutine.
  - `tina_yield()` to return control to the caller.
### Assembly Architecture in Tina
In **Tina**, the `tina.h` file uses multiple `#ifdef` directives to distinguish between different platforms, such as:
- **x86**
- **ARM**
- **RISC-V (RV64)**
Context switching is implemented using:
- **Inline assembly**
- **Dedicated assembly blocks** for platform-specific functionality.
## Analysis of RISC-V RV32I + F/D Registers
### General Registers (Integer Registers)
- RV32I has **32 general-purpose registers**, `x0` to `x31`, each 32 bits wide.
- Which of them are **saved registers** (callee-saved) and which are **temporary registers** (caller-saved) is defined by the calling convention (ABI), not by the ISA.
- Because a coroutine switch looks like an ordinary function call to the compiler, the switch routine only has to save and restore the callee-saved registers (`sp`, `s0` to `s11`) plus the return address `ra`. The compiler already assumes that every other register may be clobbered by the call.
### Floating-Point Registers (FPU Registers)
- If **RV32F** or **RV32D** is supported, there are an additional **32 floating-point registers**: `f0` to `f31`.
- Each register is **32 bits** wide (RV32F) or **64 bits** wide (RV32D).
### Considerations for Coroutine Switching
- To avoid corrupting floating-point state across a coroutine switch:
  - The callee-saved floating-point registers (`fs0` to `fs11`) must be **saved** as part of the coroutine context.
  - They must be **restored** when switching back to the coroutine.
## Reference Implementations: Tina RV64 and Minicoro RV32I
### Referencing the Tina RV64 Approach
The original RV64 approach uses assembly blocks to:
1. **Initialize the stack** when starting a new coroutine.
2. **Save and restore** both general-purpose and floating-point registers.
Below is an example of how the RV64 code handles this process:
```c
#define TINA_ABI_aarch32 (__ARM_EABI__ && __GNUC__)
#define TINA_ABI_aarch64 (__aarch64__ && __GNUC__)
#define TINA_ABI_i386 ((__i386__ && __GNUC__) || (_M_IX86 && _MSC_VER))
#define TINA_ABI_SysV_AMD64 (__amd64__ && __GNUC__ && (__unix__ || __APPLE__ || __HAIKU__))
#define TINA_ABI_WIN64 ((__WIN64__ && __GNUC__) || (_M_AMD64 && _MSC_VER))
#define TINA_ABI_riscv64gc (__riscv && __riscv_xlen == 64 && __riscv_flen == 64)
// ...
// 64bit riscv w/ 64 bit floats
// push s0-s11, fs0-fs11
asm("_tina_init_stack:");
asm(" addi sp, sp, -0xD0");
asm(" sd sp, (a1)");
asm(" sd ra, 0xC8(sp)");
asm(" sd s0, 0xC0(sp)");
asm(" sd s1, 0xB8(sp)");
asm(" sd s2, 0xB0(sp)");
asm(" sd s3, 0xA8(sp)");
asm(" sd s4, 0xA0(sp)");
asm(" sd s5, 0x98(sp)");
asm(" sd s6, 0x90(sp)");
asm(" sd s7, 0x88(sp)");
asm(" sd s8, 0x80(sp)");
asm(" sd s9, 0x78(sp)");
asm(" sd s10, 0x70(sp)");
asm(" sd s11, 0x68(sp)");
asm(" fsd fs0, 0x60(sp)");
asm(" fsd fs1, 0x58(sp)");
asm(" fsd fs2, 0x50(sp)");
asm(" fsd fs3, 0x48(sp)");
asm(" fsd fs4, 0x40(sp)");
asm(" fsd fs5, 0x38(sp)");
asm(" fsd fs6, 0x30(sp)");
asm(" fsd fs7, 0x28(sp)");
asm(" fsd fs8, 0x20(sp)");
asm(" fsd fs9, 0x18(sp)");
asm(" fsd fs10, 0x10(sp)");
asm(" fsd fs11, 0x08(sp)");
asm(" andi a2, a2, ~0xF");
asm(" mv sp, a2");
asm(" mv ra, x0");
asm(" tail _tina_start");
asm("_tina_swap:");
asm(" addi sp, sp, -0xD0");
asm(" sd sp, (a0)");
asm(" sd ra, 0xC8(sp)");
asm(" sd s0, 0xC0(sp)");
asm(" sd s1, 0xB8(sp)");
asm(" sd s2, 0xB0(sp)");
asm(" sd s3, 0xA8(sp)");
asm(" sd s4, 0xA0(sp)");
asm(" sd s5, 0x98(sp)");
asm(" sd s6, 0x90(sp)");
asm(" sd s7, 0x88(sp)");
asm(" sd s8, 0x80(sp)");
asm(" sd s9, 0x78(sp)");
asm(" sd s10, 0x70(sp)");
asm(" sd s11, 0x68(sp)");
asm(" fsd fs0, 0x60(sp)");
asm(" fsd fs1, 0x58(sp)");
asm(" fsd fs2, 0x50(sp)");
asm(" fsd fs3, 0x48(sp)");
asm(" fsd fs4, 0x40(sp)");
asm(" fsd fs5, 0x38(sp)");
asm(" fsd fs6, 0x30(sp)");
asm(" fsd fs7, 0x28(sp)");
asm(" fsd fs8, 0x20(sp)");
asm(" fsd fs9, 0x18(sp)");
asm(" fsd fs10, 0x10(sp)");
asm(" fsd fs11, 0x08(sp)");
asm(" ld sp, (a1)");
asm(" ld ra, 0xC8(sp)");
asm(" ld s0, 0xC0(sp)");
asm(" ld s1, 0xB8(sp)");
asm(" ld s2, 0xB0(sp)");
asm(" ld s3, 0xA8(sp)");
asm(" ld s4, 0xA0(sp)");
asm(" ld s5, 0x98(sp)");
asm(" ld s6, 0x90(sp)");
asm(" ld s7, 0x88(sp)");
asm(" ld s8, 0x80(sp)");
asm(" ld s9, 0x78(sp)");
asm(" ld s10, 0x70(sp)");
asm(" ld s11, 0x68(sp)");
asm(" fld fs0, 0x60(sp)");
asm(" fld fs1, 0x58(sp)");
asm(" fld fs2, 0x50(sp)");
asm(" fld fs3, 0x48(sp)");
asm(" fld fs4, 0x40(sp)");
asm(" fld fs5, 0x38(sp)");
asm(" fld fs6, 0x30(sp)");
asm(" fld fs7, 0x28(sp)");
asm(" fld fs8, 0x20(sp)");
asm(" fld fs9, 0x18(sp)");
asm(" fld fs10, 0x10(sp)");
asm(" fld fs11, 0x08(sp)");
asm(" addi sp, sp, 0xD0");
asm(" mv a0, a2");
asm(" ret");
```
### Referencing the Minicoro RV32I Approach
Minicoro provides a lightweight implementation of coroutine context switching that includes support for 32-bit **RISC-V**. It uses inline assembly to save and restore the callee-saved general-purpose and floating-point registers into a context buffer.
#### Key Features of Minicoro's Approach
1. **General-Purpose Register Management**:
   - Saves and restores `s0` through `s11`, `ra`, and `sp` to maintain the coroutine's execution context.
2. **Floating-Point Register Support**:
   - Conditionally handles the `RV32F` and `RV32D` extensions, preserving floating-point state across coroutine switches.
3. **Scalable Design**:
   - Uses the predefined macros `__riscv_xlen` and `__riscv_flen` to adapt the implementation to different configurations (e.g., RV32I, RV32F, RV32D).
Below is an example of how the RV32I code handles this process:
```c
#elif __riscv_xlen == 32
" sw s0, 0x00(a0)\n"
" sw s1, 0x04(a0)\n"
" sw s2, 0x08(a0)\n"
" sw s3, 0x0c(a0)\n"
" sw s4, 0x10(a0)\n"
" sw s5, 0x14(a0)\n"
" sw s6, 0x18(a0)\n"
" sw s7, 0x1c(a0)\n"
" sw s8, 0x20(a0)\n"
" sw s9, 0x24(a0)\n"
" sw s10, 0x28(a0)\n"
" sw s11, 0x2c(a0)\n"
" sw ra, 0x30(a0)\n"
" sw ra, 0x34(a0)\n" /* pc */
" sw sp, 0x38(a0)\n"
#ifdef __riscv_flen
#if __riscv_flen == 64
" fsd fs0, 0x3c(a0)\n"
" fsd fs1, 0x44(a0)\n"
" fsd fs2, 0x4c(a0)\n"
" fsd fs3, 0x54(a0)\n"
" fsd fs4, 0x5c(a0)\n"
" fsd fs5, 0x64(a0)\n"
" fsd fs6, 0x6c(a0)\n"
" fsd fs7, 0x74(a0)\n"
" fsd fs8, 0x7c(a0)\n"
" fsd fs9, 0x84(a0)\n"
" fsd fs10, 0x8c(a0)\n"
" fsd fs11, 0x94(a0)\n"
" fld fs0, 0x3c(a1)\n"
" fld fs1, 0x44(a1)\n"
" fld fs2, 0x4c(a1)\n"
" fld fs3, 0x54(a1)\n"
" fld fs4, 0x5c(a1)\n"
" fld fs5, 0x64(a1)\n"
" fld fs6, 0x6c(a1)\n"
" fld fs7, 0x74(a1)\n"
" fld fs8, 0x7c(a1)\n"
" fld fs9, 0x84(a1)\n"
" fld fs10, 0x8c(a1)\n"
" fld fs11, 0x94(a1)\n"
#elif __riscv_flen == 32
" fsw fs0, 0x3c(a0)\n"
" fsw fs1, 0x40(a0)\n"
" fsw fs2, 0x44(a0)\n"
" fsw fs3, 0x48(a0)\n"
" fsw fs4, 0x4c(a0)\n"
" fsw fs5, 0x50(a0)\n"
" fsw fs6, 0x54(a0)\n"
" fsw fs7, 0x58(a0)\n"
" fsw fs8, 0x5c(a0)\n"
" fsw fs9, 0x60(a0)\n"
" fsw fs10, 0x64(a0)\n"
" fsw fs11, 0x68(a0)\n"
" flw fs0, 0x3c(a1)\n"
" flw fs1, 0x40(a1)\n"
" flw fs2, 0x44(a1)\n"
" flw fs3, 0x48(a1)\n"
" flw fs4, 0x4c(a1)\n"
" flw fs5, 0x50(a1)\n"
" flw fs6, 0x54(a1)\n"
" flw fs7, 0x58(a1)\n"
" flw fs8, 0x5c(a1)\n"
" flw fs9, 0x60(a1)\n"
" flw fs10, 0x64(a1)\n"
" flw fs11, 0x68(a1)\n"
```
## Implementation of RV32I, RV32F, and RV32D Support for Tina Coroutine
This section describes the implementation of coroutine context switching for the **RV32I**, **RV32F**, and **RV32D** configurations in Tina. Each variant saves and restores the callee-saved CPU state during a coroutine switch: the general-purpose registers and, where applicable, the floating-point registers.
### RV32I (No Floating-Point Extensions)
This implementation provides support for the **RV32I architecture** in Tina, which involves saving and restoring **general-purpose registers** during coroutine context switching. It is specifically designed for RV32I without floating-point extensions.
#### Key Details
- **Stack Space Allocation**: 56 bytes (`0x38`) are allocated for the saved general-purpose registers (13 four-byte slots, 52 bytes in total).
- **Registers Saved**:
  - General-purpose registers: `ra`, `s0` to `s11`.
- **Alignment**: The new coroutine's initial stack pointer is aligned down to a 16-byte boundary using `andi`.
#### Code Implementation
```c
// RV32I without floating-point extensions
#define TINA_ABI_riscv32i (__riscv && __riscv_xlen == 32 && !__riscv_flen)
// ...
#elif TINA_ABI_riscv32i
asm("_tina_init_stack:");
asm(" addi sp, sp, -0x38"); // Allocate stack space
asm(" sw sp, (a1)"); // Save stack pointer
asm(" sw ra, 0x34(sp)"); // Save return address
asm(" sw s0, 0x30(sp)"); // Save s0
asm(" sw s1, 0x2C(sp)"); // Save s1
asm(" sw s2, 0x28(sp)"); // Save s2
asm(" sw s3, 0x24(sp)"); // Save s3
asm(" sw s4, 0x20(sp)"); // Save s4
asm(" sw s5, 0x1C(sp)"); // Save s5
asm(" sw s6, 0x18(sp)"); // Save s6
asm(" sw s7, 0x14(sp)"); // Save s7
asm(" sw s8, 0x10(sp)"); // Save s8
asm(" sw s9, 0x0C(sp)"); // Save s9
asm(" sw s10, 0x08(sp)"); // Save s10
asm(" sw s11, 0x04(sp)"); // Save s11
asm(" andi a2, a2, ~0xF"); // Align stack
asm(" mv sp, a2");
asm(" mv ra, x0");
asm(" tail _tina_start");
asm("_tina_swap:");
asm(" addi sp, sp, -0x38"); // Allocate stack space
asm(" sw sp, (a0)"); // Save stack pointer
asm(" sw ra, 0x34(sp)"); // Save return address
asm(" sw s0, 0x30(sp)"); // Save s0
asm(" sw s1, 0x2C(sp)"); // Save s1
asm(" sw s2, 0x28(sp)"); // Save s2
asm(" sw s3, 0x24(sp)"); // Save s3
asm(" sw s4, 0x20(sp)"); // Save s4
asm(" sw s5, 0x1C(sp)"); // Save s5
asm(" sw s6, 0x18(sp)"); // Save s6
asm(" sw s7, 0x14(sp)"); // Save s7
asm(" sw s8, 0x10(sp)"); // Save s8
asm(" sw s9, 0x0C(sp)"); // Save s9
asm(" sw s10, 0x08(sp)"); // Save s10
asm(" sw s11, 0x04(sp)"); // Save s11
asm(" lw sp, (a1)"); // Restore stack pointer
asm(" lw ra, 0x34(sp)"); // Restore return address
asm(" lw s0, 0x30(sp)"); // Restore s0
asm(" lw s1, 0x2C(sp)"); // Restore s1
asm(" lw s2, 0x28(sp)"); // Restore s2
asm(" lw s3, 0x24(sp)"); // Restore s3
asm(" lw s4, 0x20(sp)"); // Restore s4
asm(" lw s5, 0x1C(sp)"); // Restore s5
asm(" lw s6, 0x18(sp)"); // Restore s6
asm(" lw s7, 0x14(sp)"); // Restore s7
asm(" lw s8, 0x10(sp)"); // Restore s8
asm(" lw s9, 0x0C(sp)"); // Restore s9
asm(" lw s10, 0x08(sp)"); // Restore s10
asm(" lw s11, 0x04(sp)"); // Restore s11
asm(" addi sp, sp, 0x38"); // Deallocate stack space
asm(" mv a0, a2"); // Set return value to a2
asm(" ret"); // Return
```
### RV32F
This implementation provides support for the **RV32F architecture** in Tina, which includes saving and restoring **general-purpose registers** and **single-precision floating-point registers** during coroutine context switching.
#### Key Details
- **Stack Space Allocation**: 104 bytes (`0x68`) are allocated for the saved general-purpose and floating-point registers (25 four-byte slots, 100 bytes in total).
- **Registers Saved**:
  - General-purpose registers: `ra`, `s0` to `s11`.
  - Floating-point registers: `fs0` to `fs11`.
- **Alignment**: The new coroutine's initial stack pointer is aligned down to a 16-byte boundary using `andi`.
#### Code Implementation
```c
#define TINA_ABI_riscv32f (__riscv && __riscv_xlen == 32 && __riscv_flen == 32)
// ...
#elif TINA_ABI_riscv32f
// 32-bit CPU + Single-Precision FPU (RV32F)
asm("_tina_init_stack:");
asm(" addi sp, sp, -0x68"); // Allocate stack space
asm(" sw sp, (a1)"); // Save stack pointer
// Save general-purpose registers (ra, s0-s11)
asm(" sw ra, 0x64(sp)");
asm(" sw s0, 0x60(sp)");
asm(" sw s1, 0x5C(sp)");
asm(" sw s2, 0x58(sp)");
asm(" sw s3, 0x54(sp)");
asm(" sw s4, 0x50(sp)");
asm(" sw s5, 0x4C(sp)");
asm(" sw s6, 0x48(sp)");
asm(" sw s7, 0x44(sp)");
asm(" sw s8, 0x40(sp)");
asm(" sw s9, 0x3C(sp)");
asm(" sw s10, 0x38(sp)");
asm(" sw s11, 0x34(sp)");
// Save single-precision floating-point registers (fs0-fs11)
asm(" fsw fs0, 0x30(sp)");
asm(" fsw fs1, 0x2C(sp)");
asm(" fsw fs2, 0x28(sp)");
asm(" fsw fs3, 0x24(sp)");
asm(" fsw fs4, 0x20(sp)");
asm(" fsw fs5, 0x1C(sp)");
asm(" fsw fs6, 0x18(sp)");
asm(" fsw fs7, 0x14(sp)");
asm(" fsw fs8, 0x10(sp)");
asm(" fsw fs9, 0x0C(sp)");
asm(" fsw fs10, 0x08(sp)");
asm(" fsw fs11, 0x04(sp)");
asm(" andi a2, a2, ~0xF"); // Align stack
asm(" mv sp, a2"); // Set stack pointer
asm(" mv ra, x0"); // Clear return address
asm(" tail _tina_start"); // Jump to coroutine start
asm("_tina_swap:");
asm(" addi sp, sp, -0x68"); // Allocate stack space
asm(" sw sp, (a0)"); // Save stack pointer
// Save general-purpose registers
asm(" sw ra, 0x64(sp)");
asm(" sw s0, 0x60(sp)");
asm(" sw s1, 0x5C(sp)");
asm(" sw s2, 0x58(sp)");
asm(" sw s3, 0x54(sp)");
asm(" sw s4, 0x50(sp)");
asm(" sw s5, 0x4C(sp)");
asm(" sw s6, 0x48(sp)");
asm(" sw s7, 0x44(sp)");
asm(" sw s8, 0x40(sp)");
asm(" sw s9, 0x3C(sp)");
asm(" sw s10, 0x38(sp)");
asm(" sw s11, 0x34(sp)");
// Save single-precision floating-point registers
asm(" fsw fs0, 0x30(sp)");
asm(" fsw fs1, 0x2C(sp)");
asm(" fsw fs2, 0x28(sp)");
asm(" fsw fs3, 0x24(sp)");
asm(" fsw fs4, 0x20(sp)");
asm(" fsw fs5, 0x1C(sp)");
asm(" fsw fs6, 0x18(sp)");
asm(" fsw fs7, 0x14(sp)");
asm(" fsw fs8, 0x10(sp)");
asm(" fsw fs9, 0x0C(sp)");
asm(" fsw fs10, 0x08(sp)");
asm(" fsw fs11, 0x04(sp)");
asm(" lw sp, (a1)"); // Restore stack pointer
// Restore general-purpose registers
asm(" lw ra, 0x64(sp)");
asm(" lw s0, 0x60(sp)");
asm(" lw s1, 0x5C(sp)");
asm(" lw s2, 0x58(sp)");
asm(" lw s3, 0x54(sp)");
asm(" lw s4, 0x50(sp)");
asm(" lw s5, 0x4C(sp)");
asm(" lw s6, 0x48(sp)");
asm(" lw s7, 0x44(sp)");
asm(" lw s8, 0x40(sp)");
asm(" lw s9, 0x3C(sp)");
asm(" lw s10, 0x38(sp)");
asm(" lw s11, 0x34(sp)");
// Restore single-precision floating-point registers
asm(" flw fs0, 0x30(sp)");
asm(" flw fs1, 0x2C(sp)");
asm(" flw fs2, 0x28(sp)");
asm(" flw fs3, 0x24(sp)");
asm(" flw fs4, 0x20(sp)");
asm(" flw fs5, 0x1C(sp)");
asm(" flw fs6, 0x18(sp)");
asm(" flw fs7, 0x14(sp)");
asm(" flw fs8, 0x10(sp)");
asm(" flw fs9, 0x0C(sp)");
asm(" flw fs10, 0x08(sp)");
asm(" flw fs11, 0x04(sp)");
asm(" addi sp, sp, 0x68"); // Deallocate stack space
asm(" mv a0, a2"); // Set return value
asm(" ret"); // Return to caller
```
### RV32D
This implementation adds support for the **RV32D architecture** in Tina, which includes saving and restoring **general-purpose registers** and **double-precision floating-point registers** during coroutine context switching.
#### Key Details
- **Stack Space Allocation**: 156 bytes (`0x9C`) are allocated for the saved general-purpose and double-precision floating-point registers (13 × 4 + 12 × 8 = 148 bytes in use).
- **Registers Saved**:
  - General-purpose registers: `ra`, `s0` to `s11` (4 bytes each).
  - Double-precision floating-point registers: `fs0` to `fs11` (8 bytes each).
- **Alignment**: The new coroutine's initial stack pointer is aligned down to a 16-byte boundary using `andi`.
#### Code Implementation
```c
#define TINA_ABI_riscv32d (__riscv && __riscv_xlen == 32 && __riscv_flen == 64)
// ...
#elif TINA_ABI_riscv32d
// 32-bit CPU + Double-Precision FPU (RV32D)
asm("_tina_init_stack:");
asm(" addi sp, sp, -0x9C"); // Allocate stack space
asm(" sw sp, (a1)"); // Save stack pointer
// Save general-purpose registers (ra, s0-s11)
asm(" sw ra, 0x98(sp)");
asm(" sw s0, 0x94(sp)");
asm(" sw s1, 0x90(sp)");
asm(" sw s2, 0x8C(sp)");
asm(" sw s3, 0x88(sp)");
asm(" sw s4, 0x84(sp)");
asm(" sw s5, 0x80(sp)");
asm(" sw s6, 0x7C(sp)");
asm(" sw s7, 0x78(sp)");
asm(" sw s8, 0x74(sp)");
asm(" sw s9, 0x70(sp)");
asm(" sw s10, 0x6C(sp)");
asm(" sw s11, 0x68(sp)");
// Save double-precision floating-point registers (fs0-fs11)
asm(" fsd fs0, 0x60(sp)");
asm(" fsd fs1, 0x58(sp)");
asm(" fsd fs2, 0x50(sp)");
asm(" fsd fs3, 0x48(sp)");
asm(" fsd fs4, 0x40(sp)");
asm(" fsd fs5, 0x38(sp)");
asm(" fsd fs6, 0x30(sp)");
asm(" fsd fs7, 0x28(sp)");
asm(" fsd fs8, 0x20(sp)");
asm(" fsd fs9, 0x18(sp)");
asm(" fsd fs10, 0x10(sp)");
asm(" fsd fs11, 0x08(sp)");
asm(" andi a2, a2, ~0xF"); // Align stack
asm(" mv sp, a2"); // Set stack pointer
asm(" mv ra, x0"); // Clear return address
asm(" tail _tina_start"); // Jump to coroutine start
asm("_tina_swap:");
asm(" addi sp, sp, -0x9C"); // Allocate stack space
asm(" sw sp, (a0)"); // Save stack pointer
// Save general-purpose registers
asm(" sw ra, 0x98(sp)");
asm(" sw s0, 0x94(sp)");
asm(" sw s1, 0x90(sp)");
asm(" sw s2, 0x8C(sp)");
asm(" sw s3, 0x88(sp)");
asm(" sw s4, 0x84(sp)");
asm(" sw s5, 0x80(sp)");
asm(" sw s6, 0x7C(sp)");
asm(" sw s7, 0x78(sp)");
asm(" sw s8, 0x74(sp)");
asm(" sw s9, 0x70(sp)");
asm(" sw s10, 0x6C(sp)");
asm(" sw s11, 0x68(sp)");
// Save double-precision floating-point registers
asm(" fsd fs0, 0x60(sp)");
asm(" fsd fs1, 0x58(sp)");
asm(" fsd fs2, 0x50(sp)");
asm(" fsd fs3, 0x48(sp)");
asm(" fsd fs4, 0x40(sp)");
asm(" fsd fs5, 0x38(sp)");
asm(" fsd fs6, 0x30(sp)");
asm(" fsd fs7, 0x28(sp)");
asm(" fsd fs8, 0x20(sp)");
asm(" fsd fs9, 0x18(sp)");
asm(" fsd fs10, 0x10(sp)");
asm(" fsd fs11, 0x08(sp)");
asm(" lw sp, (a1)"); // Restore stack pointer
// Restore general-purpose registers
asm(" lw ra, 0x98(sp)");
asm(" lw s0, 0x94(sp)");
asm(" lw s1, 0x90(sp)");
asm(" lw s2, 0x8C(sp)");
asm(" lw s3, 0x88(sp)");
asm(" lw s4, 0x84(sp)");
asm(" lw s5, 0x80(sp)");
asm(" lw s6, 0x7C(sp)");
asm(" lw s7, 0x78(sp)");
asm(" lw s8, 0x74(sp)");
asm(" lw s9, 0x70(sp)");
asm(" lw s10, 0x6C(sp)");
asm(" lw s11, 0x68(sp)");
// Restore double-precision floating-point registers
asm(" fld fs0, 0x60(sp)");
asm(" fld fs1, 0x58(sp)");
asm(" fld fs2, 0x50(sp)");
asm(" fld fs3, 0x48(sp)");
asm(" fld fs4, 0x40(sp)");
asm(" fld fs5, 0x38(sp)");
asm(" fld fs6, 0x30(sp)");
asm(" fld fs7, 0x28(sp)");
asm(" fld fs8, 0x20(sp)");
asm(" fld fs9, 0x18(sp)");
asm(" fld fs10, 0x10(sp)");
asm(" fld fs11, 0x08(sp)");
asm(" addi sp, sp, 0x9C"); // Deallocate stack space
asm(" mv a0, a2"); // Set return value
asm(" ret"); // Return to caller
```
## Preliminary Work
### RISC-V GNU Compiler Toolchain
> [https://github.com/riscv-collab/riscv-gnu-toolchain.git](https://github.com/riscv-collab/riscv-gnu-toolchain.git)
#### Setup
```
$ git clone https://github.com/riscv-collab/riscv-gnu-toolchain
$ cd riscv-gnu-toolchain
$ sudo apt-get install autoconf automake autotools-dev curl python3 python3-pip python3-tomli libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build git cmake libglib2.0-dev libslirp-dev
$ ./configure --prefix=/opt/riscv --enable-multilib --with-multilib-generator="rv32i-ilp32--"
$ sudo make -j$(nproc)
```
### rv32emu
> [https://github.com/sysprog21/rv32emu/tree/master](https://github.com/sysprog21/rv32emu/tree/master)
#### Setup
```
$ git clone https://github.com/sysprog21/rv32emu.git
$ cd rv32emu
$ sudo apt install libsdl2-dev libsdl2-mixer-dev
$ make
```
## Verification
### Verifying RV32I (without floating-point extensions)
#### rv32i.c
```c
#define TINA_IMPLEMENTATION
#include "tina.h"
#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

// Simple coroutine body that tests switching and integer arithmetic
static void* fiberA(tina* coro, void* val) {
    printf("Fiber A start.\n");

    int32_t x = 10;
    x += 20; // x = 30
    printf("Fiber A integer x = %" PRId32 "\n", x);

    // Suspend this coroutine with tina_yield()
    val = tina_yield(coro, (void*)1);

    x *= 2; // x = 60
    printf("Fiber A after yield, x = %" PRId32 "\n", x);

    printf("Fiber A done.\n");
    return val; // Returned to the final tina_resume()
}

int main(void) {
    // Stack buffer for the coroutine
    static char stackA[64 * 1024];

    // Initialize the coroutine
    tina* coroA = tina_init(stackA, sizeof(stackA), fiberA, NULL);

    // Enter A for the first time
    printf("Main: resume fiber A.\n");
    void* ret = tina_resume(coroA, NULL);
    printf("Main: fiber A yield. ret = %ld\n", (long)ret);

    // Enter A again
    printf("Main: resume fiber A again.\n");
    ret = tina_resume(coroA, NULL);
    printf("Main: fiber A ended. ret = %ld\n", (long)ret);
    return 0;
}
```
#### Compilation
> Use the following commands to compile and run the program:
```bash
$ riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -O2 rv32i.c -o rv32i
$ rv32emu ./rv32i
```
#### Output
```
Main: resume fiber A.
Fiber A start.
Fiber A integer x = 30
Main: fiber A yield. ret = 1
Main: resume fiber A again.
Fiber A after yield, x = 60
Fiber A done.
Main: fiber A ended. ret = 0
inferior exit code 0
```
### Verifying RV32F
#### rv32f.c
```c
#define TINA_IMPLEMENTATION
#include "tina.h"
#include <stdio.h>

// Simple coroutine body that tests switching and single-precision arithmetic
static void* fiberA(tina* coro, void* val) {
    printf("Fiber A start.\n");

    float x = 12.59;
    x += 2.71828; // x = 15.308280
    printf("Fiber A double x = %f\n", x);

    // Suspend this coroutine with tina_yield()
    val = tina_yield(coro, (void*)1);

    x *= 3.0; // x = 45.924839
    printf("Fiber A after yield, x = %f\n", x);

    printf("Fiber A done.\n");
    return val; // Returned to the final tina_resume()
}

int main(void) {
    // Stack buffer for the coroutine
    static char stackA[64 * 1024];

    // Initialize the coroutine
    tina* coroA = tina_init(stackA, sizeof(stackA), fiberA, NULL);

    // Enter A for the first time
    printf("Main: resume fiber A.\n");
    void* ret = tina_resume(coroA, NULL);
    printf("Main: fiber A yield. ret = %ld\n", (long)ret);

    // Enter A again
    printf("Main: resume fiber A again.\n");
    ret = tina_resume(coroA, NULL);
    printf("Main: fiber A ended. ret = %ld\n", (long)ret);
    return 0;
}
```
#### Compilation
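> Use the following commands to compile and run the program. Note that with `-march=rv32i` the compiler does not define `__riscv_flen`, so this build selects the `TINA_ABI_riscv32i` code path and performs the `float` arithmetic in software. The `TINA_ABI_riscv32f` path is selected only when the `-march` string includes the F extension.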
```bash
$ riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -O2 rv32f.c -o rv32f
$ rv32emu ./rv32f
```
#### Output
```
Main: resume fiber A.
Fiber A start.
Fiber A double x = 15.308280
Main: fiber A yield. ret = 1
Main: resume fiber A again.
Fiber A after yield, x = 45.924839
Fiber A done.
Main: fiber A ended. ret = 0
inferior exit code 0
```
### Verifying RV32D
#### rv32d.c
```c
#define TINA_IMPLEMENTATION
#include "tina.h"
#include <stdio.h>

// Simple coroutine body that tests switching and double-precision arithmetic
static void* fiberA(tina* coro, void* val) {
    printf("Fiber A start.\n");

    double x = 3.14;
    x += 2.71828; // x = 5.85828
    printf("Fiber A double x = %f\n", x);

    // Suspend this coroutine with tina_yield()
    val = tina_yield(coro, (void*)1);

    x *= 2.0; // x = 11.71656
    printf("Fiber A after yield, x = %f\n", x);

    printf("Fiber A done.\n");
    return val; // Returned to the final tina_resume()
}

int main(void) {
    // Stack buffer for the coroutine
    static char stackA[64 * 1024];

    // Initialize the coroutine
    tina* coroA = tina_init(stackA, sizeof(stackA), fiberA, NULL);

    // Enter A for the first time
    printf("Main: resume fiber A.\n");
    void* ret = tina_resume(coroA, NULL);
    printf("Main: fiber A yield. ret = %ld\n", (long)ret);

    // Enter A again
    printf("Main: resume fiber A again.\n");
    ret = tina_resume(coroA, NULL);
    printf("Main: fiber A ended. ret = %ld\n", (long)ret);
    return 0;
}
```
#### Compilation
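> Use the following commands to compile and run the program. Note that with `-march=rv32i` the compiler does not define `__riscv_flen`, so this build selects the `TINA_ABI_riscv32i` code path and performs the `double` arithmetic in software. The `TINA_ABI_riscv32d` path is selected only when the `-march` string includes the D extension.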
```bash
$ riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -O2 rv32d.c -o rv32d
$ rv32emu ./rv32d
```
#### Output
```
Main: resume fiber A.
Fiber A start.
Fiber A double x = 5.858280
Main: fiber A yield. ret = 1
Main: resume fiber A again.
Fiber A after yield, x = 11.716560
Fiber A done.
Main: fiber A ended. ret = 0
inferior exit code 0
```
## Contributing Back to Tina


Merged!
## References
- [https://github.com/edubart/minicoro](https://github.com/edubart/minicoro)
- [https://github.com/sysprog21/rv32emu/tree/master](https://github.com/sysprog21/rv32emu/tree/master)
- [https://github.com/riscv-collab/riscv-gnu-toolchain.git](https://github.com/riscv-collab/riscv-gnu-toolchain.git)
- [https://github.com/slembcke/Tina](https://github.com/slembcke/Tina)