RV32I Interpreter

# RV32I Interpreter > [source code](https://github.com/jin11109/rv32i-interpreter) ## Introduction This discussion focuses on [CA Quiz 3 Problem B](https://hackmd.io/@sysprog/arch2025-quiz3-sol#Problem-B), which involves implementing a simple **RV32I interpreter** written in C. The interpreter can load a binary file and sequentially **fetch**, **decode**, and **execute** the 32-bit RISC-V instructions it contains. Currently, it supports the following instruction: - **I-type** - Integer computation: `addi`, `xori`, `ori` - Load: `lw` - **U-type**: `auipc` - **S-type**: `sw` - **Other**: `ecall` Our task is to analyze and correct the issues in the original code, discuss the `jal` and `jalr` instructions based on the RISC-V specification, and finally add support for `jal`. ## Interpreter Overview ### 1. Initialization Before entering the main execution loop, the interpreter must initialize all necessary components such as the **CPU**, **register file**, and **RAM**. --- #### `cpu_new()`: This function allocates and initializes a new `cpu_t` instance. Several registers require special initialization: - **`x0` register** — Always holds the constant value `0`. - **`x2` (stack pointer, `sp`)** — Initialized to the **top of the RAM**. - **`pc` (program counter)** — Points to the **start of the RAM**, where the binary code begins. ```c cpu_t* cpu_new(uint8_t* code, size_t len) { cpu_t* cpu = malloc(sizeof(cpu_t)); memset(cpu->regs, 0, sizeof(cpu->regs)); cpu->regs[0] = 0; /* Stack pointer */ cpu->regs[2] = RAM_BASE + RAM_SIZE; cpu->pc = RAM_BASE; cpu->ram = ram_new(code, len); return cpu; } ``` --- #### `ram_new()`: This function allocates **2 MiB** of memory and copies the binary content into the beginning of the memory region. ```c ram_t* ram_new(uint8_t* code, size_t len) { ram_t* ram = malloc(sizeof(ram_t)); ram->mem = malloc(RAM_SIZE); // 2 MiB memset(ram->mem, 0, RAM_SIZE); memcpy(ram->mem, code, len); return ram; } ``` --- First, the interpreter reads the binary file and performs the initialization process. It then enters a loop that continuously fetches and executes instructions from memory. Each iteration of this loop follows the standard fetch–decode–execute cycle, which defines the fundamental behavior of an interpreter. #### Main Loop ```c while (1) { uint32_t raw = cpu_fetch(cpu); cpu->pc += 4; if (cpu->pc > code_size) break; /* cpu_decode() is called within this function */ cpu_execute(cpu, raw); if (!cpu->pc) break; } ``` --- ### 2. Fetch Stage Recall that the binary instructions were loaded into the **beginning of the RAM**. Therefore, `cpu_fetch()` simply calls `ram_load32()` to read a 32-bit instruction from memory: ```c static uint32_t ram_load32(ram_t* mem, uint32_t addr) { size_t index = addr - RAM_BASE; return ((uint32_t)mem->mem[index] | ((uint32_t)mem->mem[index + 1] << 8) | ((uint32_t)mem->mem[index + 2] << 16) | ((uint32_t)mem->mem[index + 3] << 24)); } ``` Since the RAM is **byte-addressable**, the function reads four bytes individually and combines them in **little-endian** order to form a 32-bit instruction. --- ### 3. Decode Stage All RISC-V instruction formats share consistent field positions. Regardless of the instruction type, fields like `opcode`, `rd`, `rs1`, `rs2`, and `funct3` always occupy the same bit positions. Therefore, we can decode these common fields directly, while leaving the **immediate value** (`imm`) for later stages, as its layout differs by instruction type. ![RISC-V instruction format](https://hackmd.io/_uploads/SJ_xSGT0gl.png) ```c void cpu_decode(uint32_t raw, insn_t* inst) { uint8_t opcode = raw & OPCODE_MASK; uint8_t rd = (raw >> RD_SHIFT) & REG_ADDR_MASK; uint8_t rs1 = (raw >> RS1_SHIFT) & REG_ADDR_MASK; uint8_t rs2 = (raw >> RS2_SHIFT) & REG_ADDR_MASK; uint8_t funct3 = (raw >> FUNCT3_SHIFT) & FUNCT3_MASK; inst->opcode = opcode; inst->rd = rd; inst->rs1 = rs1; inst->rs2 = rs2; inst->funct3 = funct3; } ``` --- ### 4. Execute Stage The **execution stage** uses a `switch` statement to select the appropriate behavior based on the decoded `opcode`. Each case performs the corresponding operation in C. ```c void cpu_execute(cpu_t* cpu, uint32_t raw) { cpu_decode(raw, &inst); cpu->regs[0] = 0; switch (inst.opcode) { case INTEGER_COMP_RI: { int32_t imm = i_imm(raw); switch (inst.funct3) { case XORI: cpu->regs[inst.rd] = cpu->regs[inst.rs1] ^ imm; break; case ORI: cpu->regs[inst.rd] = cpu->regs[inst.rs1] | imm; break; case ADDI: cpu->regs[inst.rd] = cpu->regs[inst.rs1] + imm; break; default: fatal("Unknown FUNCT3 0x%x for opcode 0x%x at PC 0x%08lx\n", inst.funct3, inst.opcode, cpu->pc); break; } break; } case LOAD: { int32_t imm = i_imm(raw); uint32_t addr = cpu->regs[inst.rs1] + imm; if (inst.funct3 != LW) fatal("Unknown FUNCT3 for LOAD: 0x%x\n", inst.funct3); cpu->regs[inst.rd] = cpu_load(cpu, addr); break; } case STORE: { int32_t imm = s_imm(raw); uint32_t addr = cpu->regs[inst.rs1] + imm; if (inst.funct3 != SW) fatal("Unknown FUNCT3 for STORE: 0x%x\n", inst.funct3); cpu_store(cpu, addr, cpu->regs[inst.rs2]); break; } case JALR: { int32_t imm = i_imm(raw); uint32_t target = (cpu->regs[inst.rs1] + imm) & ~1; cpu->regs[inst.rd] = cpu->pc; cpu->pc = target; return; /* Do not increment PC */ } case AUIPC: { int32_t imm = u_imm(raw); cpu->regs[inst.rd] = cpu->pc + imm; break; } case ECALL: if (raw == 0x100073) break; ecall_handler(cpu); break; default: fatal("Illegal instruction: opcode 0x%x\n", inst.opcode); } } ``` --- ### Missing Immediate Decoder In the original Problem B source code, the **I-type immediate decoder** was not implemented. To fix this, we added the following helper function: ```c static inline int32_t i_imm(uint32_t raw) { /* imm[11:0] = inst[31:20] */ int32_t imm = (raw >> 20) & 0xfff; /* If immediate is negative, perform sign extension */ if (imm & 0x800) imm |= 0xfffff000; return imm; } ``` This ensures that I-type instructions such as `addi`, `lw`, and `jalr` can correctly extract and sign-extend their 12-bit immediate values. ## Unconditional Jumps This section discusses the implementation of unconditional jump instructions and cross-references the RISC-V specification [**RV32I Base Integer Instruction Set, Version 2.1 — §6.1 Unconditional Jumps**](https://docs.riscv.org/reference/isa/unpriv/rv32.html#unconditional-jumps). --- ### `JALR` Instruction #### Original Implementation The original source code handling the `JALR` instruction is as follows: ```c case JALR: { int32_t imm = i_imm(raw); uint32_t target = (cpu->regs[inst.rs1] + imm) & ~1; cpu->regs[inst.rd] = cpu->pc; cpu->pc = target; return; /* Don't increment PC */ } ``` --- #### Target Address Calculation In this implementation, the target address is calculated by directly adding the sign-extended 12-bit immediate to the register operand (`rs1`), then clearing the least-significant bit (LSB): > *The target address is obtained by adding the sign-extended 12-bit I-immediate to register rs1, then setting the least-significant bit of the result to zero.* Although this method simplifies hardware and conforms to the specification, it omits an explicit alignment check before clearing the LSB. Conceptually, one might expect an alignment check to ensure the target address is word-aligned (four-byte boundary). However, the RISC-V specification explicitly allows this simplification, emphasizing its design rationale: > *Clearing the least-significant bit when calculating the JALR target address both simplifies the hardware slightly and allows the low bit of function pointers to be used to store auxiliary information.* Even so, an implementation may still wish to verify that the final target address is legal or within valid memory bounds: ```c check_addr(target); ``` --- #### Misalignment Exception Ambiguity :::warning I am confuse that in the specifications says >*The JAL and JALR instructions will generate an instruction-address-misaligned exception if the target address is not aligned to a four-byte boundary.* But, JALR clear the least-significant bit of target address, it is impossible for this misalignment exception to occur. ::: --- #### Register Assignment According to the RISC-V specification: > *JAL stores the address of the instruction following the jump (**PC + 4**) into register rd.* In our implementation, the program counter (`pc`) has already advanced by 4 bytes during the main fetch-decode-execute loop. Therefore, we directly store the current `pc` value (which already points to the next instruction) in `rd`, without incrementing it again. If the instruction pipeline requires the *previous* PC, it must explicitly subtract 4 bytes to obtain the address of the jump instruction. --- ### Comparison with `JAL` #### Target Address Calculation The `JAL` instruction uses a different immediate encoding format (20 bits instead of 12). We decode the immediate field according to the specification: ```c static inline int32_t j_imm(uint32_t raw) { /** * imm[20|10:5] = inst[31:25], * imm[4:1,11] = inst[24:20], * imm[19:12] = inst[19:12] */ uint32_t imm_10_5 = (raw >> 25) & 0x3f; uint32_t imm_20 = (raw >> 31) & 0x1; uint32_t imm_4_1 = (raw >> 21) & 0xf; uint32_t imm_11 = (raw >> 20) & 0x1; uint32_t imm_19_12 = (raw >> 12) & 0xff; int32_t imm = (imm_20 << 20) | (imm_19_12 << 12) | (imm_11 << 11) | (imm_10_5 << 5) | (imm_4_1 << 1); if (imm & 0x80000) imm |= 0xfff00000; /* sign extension */ return imm; } ``` > *The offset is sign-extended and added to the address of the jump instruction to form the jump target address. Jumps can therefore target a ±1 MiB range.* As discussed in the previous section, we adjust the `pc` value because it has already been incremented in the main loop: ```c int32_t imm = j_imm(raw); uint32_t target = (cpu->pc - 4 + imm); ``` Unlike `JALR`, the `JAL` instruction **does not clear the least-significant bit**. Therefore, the implementation must explicitly check whether the computed target address is aligned to a 4-byte boundary: ```c if (target & 0x1u) fatal("instruction-address-misaligned\n"); ``` Additionally, we perform a legality check to ensure the target address is within valid memory: ```c check_addr(target); ``` --- #### Register Assignment > *The address of the instruction following the jump (PC + 4) is written to register rd.* Thus, our implementation becomes: ```c cpu->regs[inst.rd] = cpu->pc; cpu->pc = target; ``` --- ### Summary of Key Differences | Feature | `JAL` | `JALR` | | ------------------- | ------------------------------------- | --------------------------------------------- | | Immediate format | 20-bit (J-type) | 12-bit (I-type) | | Base address | PC | Register `rs1` | | LSB handling | Must check 4-byte alignment | Clears LSB automatically | | Target range | ±1 MiB | 32-bit absolute range (via `auipc + jalr`) | | Exception condition | Misaligned target may raise exception | Theoretically impossible (LSB always cleared) | ## Test To verify the instructions, we build a simple test program that prints `ok\n` on the screen. ```asm .data msg_o: .word 111 msg_k: .word 107 msg_change_line: .word 10 .text main: jal ra, print_msg # Make the system call to exit the program addi a0, x0, 0 # exit(0) addi a7, x0, 10 ecall print_msg: # We don't have system call to print the string # Make the system call to print every character "ok\n" addi a7, x0, 11 lui s0, %hi(msg_o) addi s0, s0, %lo(msg_o) lw a0, 0(s0) ecall addi s0, s0, 4 lw a0, 0(s0) ecall addi s0, s0, 4 lw a0, 0(s0) ecall jalr x0, ra, 0 ``` **Linker Script:** Since our simulator starts execution from the first line of the binary, we manually place the `.data` section after the `.text` section to ensure the instruction area is located before the data area. ```linker SECTIONS { .text : { *(.text*) } .data : { *(.data*) } } ``` **Result:** After building and running the test program on our simulator: ```bash $ make check gcc -Wall -O2 -o q3-b q3-b.c riscv-none-elf-as -o test.o test.s riscv-none-elf-ld -T linker.ld -o test.elf test.o riscv-none-elf-ld: warning: test.elf has a LOAD segment with RWX permissions riscv-none-elf-objcopy -O binary test.elf test ./q3-b ./test ok ```