# Term Project: Implement A (atomic) extension for [srv32](https://github.com/kuopinghsu/srv32)^MIT^

contributed by < [geniuseric](https://github.com/geniuseric) >

## “A” Standard Extension

**1. Usage of Atomic Instructions**
- The detailed information is described in [riscv-isa-manual](https://five-embeddev.com/riscv-isa-manual/latest/a.html).
- “A” extension contains instructions that atomically read-modify-write memory to support synchronization between multiple RISC-V harts running in the same memory space.
- The two forms of atomic instruction provided are load-reserved/store-conditional instructions and atomic fetch-and-op memory instructions.

**2. Load-Reserved/Store-Conditional Instructions**

![](https://i.imgur.com/d0QyTK3.png)

- ***LR.W*** loads a word from the address in **rs1**, places the sign-extended value in **rd**, and registers a reservation set, a set of bytes that subsumes the bytes in the addressed word.
- ***SC.W*** conditionally writes a word in **rs2** to the address in **rs1**: the ***SC.W*** succeeds only if the reservation is still valid and the reservation set contains the bytes being written.
- If the ***SC.W*** succeeds, the instruction writes the word in **rs2** to memory, and it writes zero to **rd**. If the ***SC.W*** fails, the instruction does not write to memory, and it writes a nonzero value to **rd**.
- Regardless of success or failure, executing an ***SC.W*** instruction invalidates any reservation held by this hart.

**3. Atomic Memory Operations**

![](https://i.imgur.com/eoMKboY.png)

- The atomic memory operation (***AMO***) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an **R-type** instruction format.
- These ***AMO*** instructions atomically load a data value from the address in **rs1**, place the value into register **rd**, apply a binary operator to the loaded value and the original value in **rs2**, then store the result back to the address in **rs1**.
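The reservation and read-modify-write rules above can be sketched in C as follows. This is a minimal single-hart model, assuming the reservation is tracked by word address and memory is a plain `int32_t` array. It does not model stores from other harts invalidating a reservation. The names `hart_t`, `lr_w`, `sc_w`, `amo_w`, `op_add`, and `op_swap` are hypothetical and are not srv32 code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-hart state: at most one reservation, tracked by address. */
typedef struct {
    int      reserved;       /* 1 while a reservation is held */
    uint32_t reserved_addr;  /* byte address of the reserved word */
} hart_t;

/* LR.W: return the word at addr and register a reservation on it. */
static int32_t lr_w(hart_t *h, const int32_t *mem, uint32_t addr)
{
    assert((addr & 3u) == 0);        /* the address must be word-aligned */
    h->reserved = 1;
    h->reserved_addr = addr;
    return mem[addr / 4];
}

/* SC.W: write value only if the reservation is valid and covers addr.
 * Returns 0 on success and 1 on failure; the reservation is always cleared. */
static int32_t sc_w(hart_t *h, int32_t *mem, uint32_t addr, int32_t value)
{
    assert((addr & 3u) == 0);
    int ok = h->reserved && h->reserved_addr == addr;
    if (ok)
        mem[addr / 4] = value;
    h->reserved = 0;
    return ok ? 0 : 1;
}

/* Binary operator applied by an AMO: op(loaded value, rs2 value). */
typedef int32_t (*amo_op_t)(int32_t loaded, int32_t rs2);

static int32_t op_add(int32_t loaded, int32_t rs2)
{
    /* add in unsigned arithmetic so overflow wraps like the hardware does */
    return (int32_t)((uint32_t)loaded + (uint32_t)rs2);
}

static int32_t op_swap(int32_t loaded, int32_t rs2)
{
    (void)loaded;
    return rs2;
}

/* AMO: load the old value, store op(old, rs2), and return the old value (rd). */
static int32_t amo_w(int32_t *mem, uint32_t addr, int32_t rs2, amo_op_t op)
{
    assert((addr & 3u) == 0);
    int32_t old = mem[addr / 4];
    mem[addr / 4] = op(old, rs2);
    return old;
}
```

With `mem = {2, 5}`, calling `lr_w` on address 0 and then `sc_w` with the sum of both words stores 7 and returns 0. A second `sc_w` without a new `lr_w` returns a nonzero value and leaves memory unchanged.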
## Implementation

**1. RISC-V Card**
- [RISC-V Card](https://github.com/jameslzhu/riscv-card/blob/master/riscv-card.pdf) is an unofficial reference sheet for RISC-V.
- The RV32A Atomic Extension is listed below.

![](https://i.imgur.com/YynEFWw.png)

**2. Modification**
- The code is available in [Term_Project](https://github.com/geniuseric/Computer_Architecture/tree/master/Term_Project).
- I implement the **A** extension only in the instruction set simulator (**ISS**), not in the register-transfer level (**RTL**) design. Running the same program on both therefore lets me compare the results of the **ISS** against the unmodified **RTL**.
- Modify code in sw/common/Makefile.common: ```-march=rv32ima```

```makefile
ifeq ($(rv32c), 1)
ARCH := -march=rv32imac -mabi=ilp32
else
ARCH := -march=rv32ima -mabi=ilp32
endif
```

- Add code in tools/opcode.h

(1) Add **opcode** of ***AMO***: ```OP_AMO = 0x2F```

```c
enum {
    OP_AUIPC   = 0x17,  // U-type
    OP_LUI     = 0x37,  // U-type
    OP_JAL     = 0x6f,  // J-type
    OP_JALR    = 0x67,  // I-type
    OP_BRANCH  = 0x63,  // B-type
    OP_LOAD    = 0x03,  // I-type
    OP_STORE   = 0x23,  // S-type
    OP_ARITHI  = 0x13,  // I-type
    OP_ARITHR  = 0x33,  // R-type
    OP_FENCE   = 0x0f,
    OP_SYSTEM  = 0x73,
    OP_AMO     = 0x2F   // R-type (A extension)
};
```

(2) Add **funct5** of ***AMO***

```c
enum {
    OP_LR      = 0x2,   // lr.w
    OP_SC      = 0x3,   // sc.w
    OP_AMOSWAP = 0x1,
    OP_AMOADD  = 0x0,
    OP_AMOAND  = 0xC,
    OP_AMOOR   = 0x8,   // funct5 = 01000
    OP_AMOXOR  = 0x4,
    OP_AMOMAX  = 0x14,  // signed maximum
    OP_AMOMIN  = 0x10   // signed minimum
};
```

- Add code in tools/rvsim.c

(1) Add **reservation** of ***AMO***

```c
int reserve_valid = 0;      // 1 while a reservation from lr.w is held
unsigned int reserve_set;   // value recorded by the last lr.w
```

(2) Add **operation** of ***AMO***

```c
case OP_AMO: { // R-Type
    TIME_LOG; TRACE_LOG "%08x %08x", pc, inst.inst TRACE_END;

    int address = regs[inst.r.rs1];             // address = R[rs1]
    // Skip the operation when the address is outside data memory
    if (address < DMEM_BASE || address >= DMEM_BASE+DMEM_SIZE)
        break;

    address = DVA2PA(address);
    int memdata = dmem[address/4];              // memdata = M[R[rs1]]
    int regdata = regs[inst.r.rs2];             // regdata = R[rs2]
    int res;

    switch(inst.r.func7 >> 2) {                 // funct5 (drops the aq/rl bits)
        case OP_LR:
            if (singleram) CYCLE_ADD(1);
            regs[inst.r.rd] = memdata;          // R[rd] = M[R[rs1]]
            reserve_set = memdata;              // set = M[R[rs1]] (the loaded value, not the address)
            reserve_valid = 1;                  // valid = 1
            break;
        case OP_SC:
            if (singleram) CYCLE_ADD(1);
            if (reserve_valid == 1 && reserve_set == memdata) {
                dmem[address/4] = regdata;      // M[R[rs1]] = R[rs2]
                regs[inst.r.rd] = 0;            // R[rd] = 0 (success)
            } else {
                regs[inst.r.rd] = 1;            // R[rd] = 1 (failure)
            }
            reserve_valid = 0;                  // valid = 0
            break;
        case OP_AMOSWAP:
            if (singleram) CYCLE_ADD(1);
            regs[inst.r.rd] = memdata;          // R[rd] = M[R[rs1]]
            dmem[address/4] = regdata;          // M[R[rs1]] = R[rs2]
            break;
        case OP_AMOADD:
            if (singleram) CYCLE_ADD(1);
            regs[inst.r.rd] = memdata;          // R[rd] = M[R[rs1]]
            dmem[address/4] = memdata + regdata; // M[R[rs1]] = M[R[rs1]] + R[rs2]
            break;
        case OP_AMOAND:
            if (singleram) CYCLE_ADD(1);
            regs[inst.r.rd] = memdata;          // R[rd] = M[R[rs1]]
            dmem[address/4] = memdata & regdata; // M[R[rs1]] = M[R[rs1]] & R[rs2]
            break;
        case OP_AMOOR:
            if (singleram) CYCLE_ADD(1);
            regs[inst.r.rd] = memdata;          // R[rd] = M[R[rs1]]
            dmem[address/4] = memdata | regdata; // M[R[rs1]] = M[R[rs1]] | R[rs2]
            break;
        case OP_AMOXOR:
            if (singleram) CYCLE_ADD(1);
            regs[inst.r.rd] = memdata;          // R[rd] = M[R[rs1]]
            dmem[address/4] = memdata ^ regdata; // M[R[rs1]] = M[R[rs1]] ^ R[rs2]
            break;
        case OP_AMOMAX:
            if (singleram) CYCLE_ADD(1);
            regs[inst.r.rd] = memdata;          // R[rd] = M[R[rs1]]
            res = memdata > regdata ? memdata : regdata; // result = max(M[R[rs1]], R[rs2])
            dmem[address/4] = res;              // M[R[rs1]] = result
            break;
        case OP_AMOMIN:
            if (singleram) CYCLE_ADD(1);
            regs[inst.r.rd] = memdata;          // R[rd] = M[R[rs1]]
            res = memdata < regdata ? memdata : regdata; // result = min(M[R[rs1]], R[rs2])
            dmem[address/4] = res;              // M[R[rs1]] = result
            break;
        default:
            printf("Unknown instruction at PC 0x%08x\n", pc);
            TRAP(TRAP_INST_ILL, inst.inst);
            continue;
    }
    break;
}
```

## Verification

**1. Show Result**
- STORE Instruction

  To verify the simulator's results, the test programs must print to the terminal. They do so with a ```STORE``` instruction whose target address is ```MMIO_PUTC```.
  The simulator prints the stored character through ```putchar()```. The relevant code in tools/rvsim.c is shown below.

```c
#define MMIO_PUTC 0x9000001c

case OP_STORE: { // S-Type
    ...
    switch(address) {
        case MMIO_PUTC:
            putchar((char)data);
            fflush(stdout);
            break;
        ...
```

- ASCII Code

  The argument of ```putchar()``` is an integer interpreted as an ```ASCII``` code. The table below shows the relationship between Decimal, Hex, and Char.

![](https://i.imgur.com/n5o6wPg.png)

**2. Load-Reserved/Store-Conditional Instructions**
- Assembly Code

  This code is in sw/atomic1. It tests the ```lr.w``` and ```sc.w``` instructions. The expected result is ```Result 2 + 5 = 7```.

```asm=
.data
array:  .word 2, 5
string: .word 82, 101, 115, 117, 108, 116, 32, 50, 32, 43, 32, 53, 32, 61, 32
size:   .word 15
base:   .word 48

.text
.global main
main:
    la   s0, array          # load address of array
    lr.w t0, (s0)           # load first word and reserve it
    lw   t1, 4(s0)          # load second word
    add  t0, t0, t1         # add two words
    sc.w t1, t0, (s0)       # store result; t1 = 0 on success
    bne  t1, x0, end        # on failure, jump to end
    lui  s0, 0x90000        # s0 = 0x90000000 (MMIO base)
    la   s1, string         # load address of string
    lw   t1, size           # load word of string size
print_str:
    lw   t2, 0(s1)          # load word
    sw   t2, 28(s0)         # print character (store to MMIO_PUTC)
    addi t1, t1, -1         # size = size - 1
    beq  t1, x0, print_res  # jump to print_res
    addi s1, s1, 4          # address = address + 4
    j    print_str          # jump to print_str
print_res:
    lw   t1, base           # load word of base (ASCII '0')
    add  t1, t0, t1         # add result and base
    sw   t1, 28(s0)         # print the result digit
    li   t1, 10             # load immediate of newline
    sw   t1, 28(s0)         # print newline
    ret                     # exit
end:
    ret                     # exit
```

- Console Output

  (1) The RTL does not implement the ```lr.w``` and ```sc.w``` instructions. It reports both as illegal instructions and does not print the expected result.
  (2) The ISS successfully executes the ```lr.w``` and ```sc.w``` instructions. It gives the expected result.

```c=
airobots@airobots-System-Product-Name:~/srv32$ make atomic1
make[1]: Entering directory '/home/airobots/srv32/sw'
make -C common
make[2]: Entering directory '/home/airobots/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/airobots/srv32/sw/common'
make[2]: Entering directory '/home/airobots/srv32/sw/atomic1'
riscv-none-embed-gcc -O3 -Wall -march=rv32ima -mabi=ilp32 -nostartfiles -nostdlib -L../common -o atomic1.elf atomic1.s -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary atomic1.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary atomic1.elf dmem.bin
riscv-none-embed-objcopy -O binary atomic1.elf memory.bin
riscv-none-embed-objdump -d atomic1.elf > atomic1.dis
riscv-none-embed-readelf -a atomic1.elf > atomic1.symbol
make[2]: Leaving directory '/home/airobots/srv32/sw/atomic1'
make[1]: Leaving directory '/home/airobots/srv32/sw'
make[1]: Entering directory '/home/airobots/srv32/sim'
Illegal instruction at PC 0x00000044
Illegal instruction at PC 0x00000050
Excuting 114 instructions, 138 cycles, 1.210 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.007 s
Simulation cycles: 149
Simulation speed : 0.0212857 MHz
make[1]: Leaving directory '/home/airobots/srv32/sim'
make[1]: Entering directory '/home/airobots/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/atomic1/atomic1.elf
Result 2 + 5 = 7
Excuting 125 instructions, 161 cycles, 1.288 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.000 s
Simulation cycles: 161
Simulation speed : 2.300 MHz
make[1]: Leaving directory '/home/airobots/srv32/tools'
```

**3. Atomic Memory Operations**
- Assembly Code

  This code is in sw/atomic2. It tests the ```amoswap.w.aq``` and ```amoswap.w.rl``` instructions. The expected result is ```Critical section```.
```asm=
.data
lock:   .word 0
string: .word 67, 114, 105, 116, 105, 99, 97, 108, 32, 115, 101, 99, 116, 105, 111, 110, 10
size:   .word 17

.text
.global main
main:
    la   s0, lock               # load address of lock
    li   t0, 1                  # load swap value
again:
    lw   t1, 0(s0)              # load word of lock
    bne  t1, x0, again          # lock is held, jump to again
    amoswap.w.aq t1, t0, (s0)   # acquire lock; t1 = previous lock value
    bne  t1, x0, again          # lock was taken meanwhile, jump to again
    lui  s1, 0x90000            # s1 = 0x90000000 (MMIO base)
    la   s2, string             # load address of string
    lw   t1, size               # load word of string size
critical:
    lw   t0, 0(s2)              # load word
    sw   t0, 28(s1)             # print character (store to MMIO_PUTC)
    addi t1, t1, -1             # size = size - 1
    beq  t1, x0, end            # jump to end
    addi s2, s2, 4              # address = address + 4
    j    critical               # jump to critical
end:
    amoswap.w.rl x0, x0, (s0)   # release lock
    ret                         # exit
```

- Console Output

  (1) The RTL does not implement the ```amoswap.w.aq``` and ```amoswap.w.rl``` instructions and reports both as illegal instructions. Because the lock value is initially 0, the program still enters the critical section, but the lock is never set to 1. The output matches the expected result, yet the locking itself did not take place.
  (2) The ISS successfully executes the ```amoswap.w.aq``` and ```amoswap.w.rl``` instructions. It gives the expected result.

```c=
airobots@airobots-System-Product-Name:~/srv32$ make atomic2
make[1]: Entering directory '/home/airobots/srv32/sw'
make -C common
make[2]: Entering directory '/home/airobots/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/airobots/srv32/sw/common'
make[2]: Entering directory '/home/airobots/srv32/sw/atomic2'
riscv-none-embed-gcc -O3 -Wall -march=rv32ima -mabi=ilp32 -nostartfiles -nostdlib -L../common -o atomic2.elf atomic2.s -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary atomic2.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary atomic2.elf dmem.bin
riscv-none-embed-objcopy -O binary atomic2.elf memory.bin
riscv-none-embed-objdump -d atomic2.elf > atomic2.dis
riscv-none-embed-readelf -a atomic2.elf > atomic2.symbol
make[2]: Leaving directory '/home/airobots/srv32/sw/atomic2'
make[1]: Leaving directory '/home/airobots/srv32/sw'
make[1]: Entering directory '/home/airobots/srv32/sim'
Illegal instruction at PC 0x00000050
Critical section
Illegal instruction at PC 0x00000084
Excuting 220 instructions, 276 cycles, 1.254 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.012 s
Simulation cycles: 287
Simulation speed : 0.0239167 MHz
make[1]: Leaving directory '/home/airobots/srv32/sim'
make[1]: Entering directory '/home/airobots/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/atomic2/atomic2.elf
Critical section
Excuting 132 instructions, 172 cycles, 1.303 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.000 s
Simulation cycles: 172
Simulation speed : 2.098 MHz
make[1]: Leaving directory '/home/airobots/srv32/tools'
```
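
Both tests exercise the same decode path in the ISS: ```inst.r.func7 >> 2``` selects the operation, and the two low bits of funct7 that the shift discards are the ```aq``` and ```rl``` ordering bits used by ```amoswap.w.aq``` and ```amoswap.w.rl```. The following is a minimal sketch of that field extraction, written as standalone C rather than with srv32's own instruction union. The names `amo_fields_t`, `decode_amo`, and `amo_selftest` are hypothetical, and the two instruction words are an assumption: they were encoded by hand from the register numbers used in sw/atomic2 (t0 = x5, t1 = x6, s0 = x8), not taken from the generated atomic2.dis.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the fields of an R-type A-extension instruction. */
typedef struct {
    uint32_t opcode;  /* bits  6:0, 0x2F for the A extension */
    uint32_t rd;      /* bits 11:7  */
    uint32_t funct3;  /* bits 14:12, 2 for the .w variants */
    uint32_t rs1;     /* bits 19:15 */
    uint32_t rs2;     /* bits 24:20 */
    uint32_t rl;      /* bit  25, release ordering */
    uint32_t aq;      /* bit  26, acquire ordering */
    uint32_t funct5;  /* bits 31:27, selects LR/SC/AMO operation */
} amo_fields_t;

/* Split a 32-bit instruction word into its A-extension fields. */
static amo_fields_t decode_amo(uint32_t inst)
{
    amo_fields_t f;
    f.opcode = inst & 0x7f;
    f.rd     = (inst >> 7)  & 0x1f;
    f.funct3 = (inst >> 12) & 0x7;
    f.rs1    = (inst >> 15) & 0x1f;
    f.rs2    = (inst >> 20) & 0x1f;
    f.rl     = (inst >> 25) & 0x1;
    f.aq     = (inst >> 26) & 0x1;
    f.funct5 = (inst >> 27) & 0x1f;   /* same value as funct7 >> 2 */
    return f;
}

/* Check the two hand-encoded instructions from the lock example. */
static void amo_selftest(void)
{
    /* amoswap.w.aq t1, t0, (s0) */
    amo_fields_t acq = decode_amo(0x0C54232Fu);
    assert(acq.opcode == 0x2F && acq.funct3 == 2 && acq.funct5 == 0x1);
    assert(acq.aq == 1 && acq.rl == 0);
    assert(acq.rd == 6 && acq.rs1 == 8 && acq.rs2 == 5);

    /* amoswap.w.rl x0, x0, (s0) */
    amo_fields_t rel = decode_amo(0x0A04202Fu);
    assert(rel.opcode == 0x2F && rel.funct3 == 2 && rel.funct5 == 0x1);
    assert(rel.aq == 0 && rel.rl == 1);
    assert(rel.rd == 0 && rel.rs1 == 8 && rel.rs2 == 0);
}
```

Both words decode to the same funct5 value, so a simulator that ignores ```aq``` and ```rl``` can handle the two variants with a single `case`.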