# Term Project: Implement A (atomic) extension for [srv32](https://github.com/kuopinghsu/srv32)^MIT^
contributed by < [geniuseric](https://github.com/geniuseric) >
## “A” Standard Extension
**1. Usage of Atomic Instructions**
- Details are given in the [riscv-isa-manual](https://five-embeddev.com/riscv-isa-manual/latest/a.html).
- “A” extension contains instructions that atomically read-modify-write memory to support synchronization between multiple RISC-V harts running in the same memory space.
- The two forms of atomic instruction provided are load-reserved/store-conditional instructions and atomic fetch-and-op memory instructions.
**2. Load-Reserved/Store-Conditional Instructions**

- ***LR.W*** loads a word from the address in **rs1**, places the sign-extended value in **rd**, and registers a reservation set—a set of bytes that subsumes the bytes in the addressed word.
- ***SC.W*** conditionally writes a word in **rs2** to the address in **rs1**: the ***SC.W*** succeeds only if the reservation is still valid and the reservation set contains the bytes being written.
- If the ***SC.W*** succeeds, the instruction writes the word in **rs2** to memory, and it writes zero to **rd**. If the ***SC.W*** fails, the instruction does not write to memory, and it writes a nonzero value to **rd**.
- Regardless of success or failure, executing an ***SC.W*** instruction invalidates any reservation held by this hart.
**3. Atomic Memory Operations**

- The atomic memory operation (***AMO***) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an **R-type** instruction format.
- These ***AMO*** instructions atomically load a data value from the address in **rs1**, place the value into register **rd**, apply a binary operator to the loaded value and the original value in **rs2**, then store the result back to the address in **rs1**.
## Implementation
**1. RISC-V Card**
- [RISC-V Card](https://github.com/jameslzhu/riscv-card/blob/master/riscv-card.pdf) is an unofficial reference sheet for RISC-V.
- The RV32A atomic extension instructions are listed below.

**2. Modification**
- The code is available in [Term_Project](https://github.com/geniuseric/Computer_Architecture/tree/master/Term_Project).
- I only implement the **A** extension in the instruction set simulator (**ISS**); the register-transfer level (**RTL**) design is left unchanged, so I can compare the results of the **ISS** and the **RTL**.
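- In sw/common/Makefile.common, add ```a``` to the ```-march``` strings (```-march=rv32ima```, or ```-march=rv32imac``` when ```rv32c=1```) so the toolchain accepts the atomic instructions:
```makefile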
ifeq ($(rv32c), 1)
ARCH := -march=rv32imac -mabi=ilp32
else
ARCH := -march=rv32ima -mabi=ilp32
endif
```
- Add code in tools/opcode.h
(1) Add **opcode** of ***AMO***: ```OP_AMO = 0x2F```
```c
enum {
OP_AUIPC = 0x17, // U-type
OP_LUI = 0x37, // U-type
OP_JAL = 0x6f, // J-type
OP_JALR = 0x67, // I-type
OP_BRANCH = 0x63, // B-type
OP_LOAD = 0x03, // I-type
OP_STORE = 0x23, // S-type
OP_ARITHI = 0x13, // I-type
OP_ARITHR = 0x33, // R-type
OP_FENCE = 0x0f,
OP_SYSTEM = 0x73,
OP_AMO = 0x2F
};
```
(2) Add **funct5** of ***AMO***
```c
enum{
OP_LR = 0x2,      // LR.W
OP_CL = 0x3,      // SC.W (store-conditional)
OP_AMOSWAP = 0x1,
OP_AMOADD = 0x0,
OP_AMOAND = 0xC,
OP_AMOOR = 0x8,   // funct5 01000; 0xA is not a valid AMO encoding
OP_AMOXOR = 0x4,
OP_AMOMAX = 0x14,
OP_AMOMIN = 0x10
};
```
- Add code in tools/rvsim.c
(1) Add **reservation** of ***AMO***
```c
int reserve_valid = 0;
unsigned int reserve_set;
```
(2) Add **operation** of ***AMO***
```c
case OP_AMO: { // R-Type
    TIME_LOG; TRACE_LOG "%08x %08x", pc, inst.inst TRACE_END;
    int address;
    address = regs[inst.r.rs1];                     // address = R[rs1]
    if (address < DMEM_BASE || address >= DMEM_BASE+DMEM_SIZE)
        break;                                      // outside data memory
    int memdata;
    address = DVA2PA(address);
    memdata = dmem[address/4];                      // memdata = M[R[rs1]]
    int regdata;
    regdata = regs[inst.r.rs2];                     // regdata = R[rs2]
    int res;
    // func7 = {funct5, aq, rl}; aq/rl need no extra work in this in-order, single-hart ISS
    switch(inst.r.func7 >> 2) {
    case OP_LR:
        if (singleram) CYCLE_ADD(1);
        regs[inst.r.rd] = memdata;                  // R[rd] = M[R[rs1]]
        reserve_set = address;                      // reserve the addressed word
        reserve_valid = 1;                          // valid = 1
        break;
    case OP_CL:                                     // SC.W
        if (singleram) CYCLE_ADD(1);
        if (reserve_valid == 1 && reserve_set == (unsigned int)address) {
            dmem[address/4] = regdata;              // M[R[rs1]] = R[rs2]
            regs[inst.r.rd] = 0;                    // R[rd] = 0 (success)
        } else {
            regs[inst.r.rd] = 1;                    // R[rd] = 1 (failure)
        }
        reserve_valid = 0;                          // SC always clears the reservation
        break;
    case OP_AMOSWAP:
        if (singleram) CYCLE_ADD(1);
        regs[inst.r.rd] = memdata;                  // R[rd] = M[R[rs1]]
        dmem[address/4] = regdata;                  // M[R[rs1]] = R[rs2]
        break;
    case OP_AMOADD:
        if (singleram) CYCLE_ADD(1);
        regs[inst.r.rd] = memdata;                  // R[rd] = M[R[rs1]]
        dmem[address/4] = memdata + regdata;        // M[R[rs1]] = M[R[rs1]] + R[rs2]
        break;
    case OP_AMOAND:
        if (singleram) CYCLE_ADD(1);
        regs[inst.r.rd] = memdata;                  // R[rd] = M[R[rs1]]
        dmem[address/4] = memdata & regdata;        // M[R[rs1]] = M[R[rs1]] & R[rs2]
        break;
    case OP_AMOOR:
        if (singleram) CYCLE_ADD(1);
        regs[inst.r.rd] = memdata;                  // R[rd] = M[R[rs1]]
        dmem[address/4] = memdata | regdata;        // M[R[rs1]] = M[R[rs1]] | R[rs2]
        break;
    case OP_AMOXOR:
        if (singleram) CYCLE_ADD(1);
        regs[inst.r.rd] = memdata;                  // R[rd] = M[R[rs1]]
        dmem[address/4] = memdata ^ regdata;        // M[R[rs1]] = M[R[rs1]] ^ R[rs2]
        break;
    case OP_AMOMAX:
        if (singleram) CYCLE_ADD(1);
        regs[inst.r.rd] = memdata;                  // R[rd] = M[R[rs1]]
        res = memdata > regdata ? memdata : regdata; // result = max(M[R[rs1]], R[rs2]), signed
        dmem[address/4] = res;                      // M[R[rs1]] = result
        break;
    case OP_AMOMIN:
        if (singleram) CYCLE_ADD(1);
        regs[inst.r.rd] = memdata;                  // R[rd] = M[R[rs1]]
        res = memdata < regdata ? memdata : regdata; // result = min(M[R[rs1]], R[rs2]), signed
        dmem[address/4] = res;                      // M[R[rs1]] = result
        break;
    default:
        printf("Unknown instruction at PC 0x%08x\n", pc);
        TRAP(TRAP_INST_ILL, inst.inst);
        continue;
    }
    break;
}
```
## Verification
**1. Show Result**
- STORE Instruction
To check the simulator's results, the test programs print to the terminal. Printing uses a ```STORE``` instruction whose target address is ```MMIO_PUTC```: the simulator passes the stored value to ```putchar()```, which prints one character. The code below is from tools/rvsim.c.
```c
#define MMIO_PUTC 0x9000001c
case OP_STORE: { // S-Type
...
switch(address) {
case MMIO_PUTC:
putchar((char)data);
fflush(stdout);
break;
...
```
- ASCII Code
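```putchar()``` takes an integer and prints the character with that ```ASCII``` code, so the test programs store ASCII values (for example, 48 is ```'0'``` and 10 is a newline). The table below shows the relationship between decimal, hex, and character values.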

**2. Load-Reserved/Store-Conditional Instructions**
- Assembly Code
This code is in sw/atomic1 and tests the ```lr.w``` and ```sc.w``` instructions. The expected output is ```Result 2 + 5 = 7```.
```asm=
.data
array: .word 2, 5
string: .word 82, 101, 115, 117, 108, 116, 32, 50, 32, 43, 32, 53, 32, 61, 32
size: .word 15
base: .word 48
.text
.global main
main:
la s0, array # load address of array
lr.w t0, (s0) # load first word
lw t1, 4(s0) # load second word
add t0, t0, t1 # add two words
sc.w t1, t0, (s0) # store result if the reservation is still valid
bne t1, x0, end # sc.w failed (t1 != 0): skip printing
lui s0, 0x90000 # s0 = 0x90000000 (MMIO base)
la s1, string # load address of string
lw t1, size # load word of string size
print_str:
lw t2, 0(s1) # load word
sw t2, 28(s0) # print word
addi t1, t1, -1 # size = size - 1
beq t1, x0, print_res # string done: jump to print_res
addi s1, s1, 4 # address = address + 4
j print_str # jump to print_str
print_res:
lw t1, base # load word of base
add t1, t0, t1 # add result and base
sw t1, 28(s0) # print a word
li t1, 10 # load immediate of next line
sw t1, 28(s0) # print next line
ret # exit
end:
ret # exit
```
- Console Output
(1) The RTL was not modified, so it does not implement ```lr.w``` and ```sc.w```: it reports two illegal instructions (PC 0x44 and 0x50) and prints no result.
(2) The ISS executes ```lr.w``` and ```sc.w``` and prints the expected result.
```c=
airobots@airobots-System-Product-Name:~/srv32$ make atomic1
make[1]: Entering directory '/home/airobots/srv32/sw'
make -C common
make[2]: Entering directory '/home/airobots/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/airobots/srv32/sw/common'
make[2]: Entering directory '/home/airobots/srv32/sw/atomic1'
riscv-none-embed-gcc -O3 -Wall -march=rv32ima -mabi=ilp32 -nostartfiles -nostdlib -L../common -o atomic1.elf atomic1.s -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary atomic1.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary atomic1.elf dmem.bin
riscv-none-embed-objcopy -O binary atomic1.elf memory.bin
riscv-none-embed-objdump -d atomic1.elf > atomic1.dis
riscv-none-embed-readelf -a atomic1.elf > atomic1.symbol
make[2]: Leaving directory '/home/airobots/srv32/sw/atomic1'
make[1]: Leaving directory '/home/airobots/srv32/sw'
make[1]: Entering directory '/home/airobots/srv32/sim'
Illegal instruction at PC 0x00000044
Illegal instruction at PC 0x00000050
Excuting 114 instructions, 138 cycles, 1.210 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.007 s
Simulation cycles: 149
Simulation speed : 0.0212857 MHz
make[1]: Leaving directory '/home/airobots/srv32/sim'
make[1]: Entering directory '/home/airobots/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/atomic1/atomic1.elf
Result 2 + 5 = 7
Excuting 125 instructions, 161 cycles, 1.288 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.000 s
Simulation cycles: 161
Simulation speed : 2.300 MHz
make[1]: Leaving directory '/home/airobots/srv32/tools'
```
**3. Atomic Memory Operations**
- Assembly Code
This code is in sw/atomic2. It implements a simple spinlock with ```amoswap.w.aq``` (acquire) and ```amoswap.w.rl``` (release). The expected output is ```Critical section```.
```asm=
.data
lock: .word 0
string: .word 67, 114, 105, 116, 105, 99, 97, 108, 32, 115, 101, 99, 116, 105, 111, 110, 10
size: .word 17
.text
.global main
main:
la s0, lock # load address of lock
li t0, 1 # load swap value
again:
lw t1, 0(s0) # load word of lock
bne t1, x0, again # lock held: keep spinning
amoswap.w.aq t1, t0, (s0) # acquire lock: t1 = old value, lock = 1
bne t1, x0, again # lock was already taken: retry
lui s1, 0x90000 # load immediate
la s2, string # load address of string
lw t1, size # load word of string size
critical:
lw t0, 0(s2) # load word
sw t0, 28(s1) # print word
addi t1, t1, -1 # size = size - 1
beq t1, x0, end # jump to end
addi s2, s2, 4 # address = address + 4
j critical # jump to critical
end:
amoswap.w.rl x0, x0, (s0) # release lock
ret # exit
```
- Console Output
(1) The RTL does not implement ```amoswap.w.aq``` and ```amoswap.w.rl``` and reports both as illegal instructions (PC 0x50 and 0x84). Because ```lock``` is initially 0, ```t1``` still holds the 0 loaded by ```lw```, so the program enters the critical section anyway, but ```lock``` is never set to 1. The output is as expected, but the lock was never actually acquired.
(2) The ISS executes ```amoswap.w.aq``` and ```amoswap.w.rl``` and prints the expected result.
```c=
airobots@airobots-System-Product-Name:~/srv32$ make atomic2
make[1]: Entering directory '/home/airobots/srv32/sw'
make -C common
make[2]: Entering directory '/home/airobots/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/airobots/srv32/sw/common'
make[2]: Entering directory '/home/airobots/srv32/sw/atomic2'
riscv-none-embed-gcc -O3 -Wall -march=rv32ima -mabi=ilp32 -nostartfiles -nostdlib -L../common -o atomic2.elf atomic2.s -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary atomic2.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary atomic2.elf dmem.bin
riscv-none-embed-objcopy -O binary atomic2.elf memory.bin
riscv-none-embed-objdump -d atomic2.elf > atomic2.dis
riscv-none-embed-readelf -a atomic2.elf > atomic2.symbol
make[2]: Leaving directory '/home/airobots/srv32/sw/atomic2'
make[1]: Leaving directory '/home/airobots/srv32/sw'
make[1]: Entering directory '/home/airobots/srv32/sim'
Illegal instruction at PC 0x00000050
Critical section
Illegal instruction at PC 0x00000084
Excuting 220 instructions, 276 cycles, 1.254 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.012 s
Simulation cycles: 287
Simulation speed : 0.0239167 MHz
make[1]: Leaving directory '/home/airobots/srv32/sim'
make[1]: Entering directory '/home/airobots/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/atomic2/atomic2.elf
Critical section
Excuting 132 instructions, 172 cycles, 1.303 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.000 s
Simulation cycles: 172
Simulation speed : 2.098 MHz
make[1]: Leaving directory '/home/airobots/srv32/tools'
```