[Toc]
# RISC-V edition
Reduced Instruction Set Computer ([RISC](https://en.wikipedia.org/wiki/Reduced_instruction_set_computer)): MIPS, ARM, RISC-V
Instruction Set Architecture ([ISA](https://en.wikipedia.org/wiki/Instruction_set_architecture)), the repertorie of instructions of a computer. Different computer have different instruction set but with many aspects in common.
Another two popular instrction set: MIPS and ARM.
__four principle of computer instruction__
1. simplicity favors regularity 簡單有利於規律性
2. smaller is faster 越小越快
3. good design demands compromise 綜合考慮
4. make the common fast 常用指令速度快
## resister
| Number | Name | Usage |
| ------ | ------- | --------------------------------- |
| 0 | x0 | the constant value 0 |
| 1 | x1(ra) | return address(link register) |
| 2 | x2(sp) | stack pointer |
| 3 | x3(gp) | global pointer |
| 4 | x4(tp) | thread pointer(local) |
| 5-7 | x5-x7 | temporary registers |
| 8-9 | x8-x9 | saved registers |
| 10-17 | x10-x17 | function auguments / return value |
| 18-27 | x18-x27 | saved registers |
| 28-31 | x28-x31 | temporary registers |
comparison with MIPS
| Number | Name | Usage |
| ------ | --------- | -------------------------------------------- |
| 0 | $zero | the constant value 0 |
| 1 | $at | reserved for assembler |
| 2-3 | $v0 - $v1 | first and second return values, respectively |
| 4-7 | $a0 - $a3 | first four arguments to functions |
| 8-15 | $t0 - $t7 | temporary registers |
| 16-23 | $s0 - $s7 | saved registers |
| 24-25 | $t8, $t9 | temporary registers |
| 26-27 | $k0, $k1 | reserved for kernel (operating system) |
| 28 | $gp | global pointer |
| 29 | $sp | stack pointer |
| 30 | $fp | frame pointer |
| 31 | $ra | return address |
## instruction formats
32-bits for an instruction

:::success
* ==__R-type__== funct(7)+rs2(5)+rs1(5)+funct(3)+rd(5)+opcode(7)
for arithemetic instruction format
* ==__I-type__== immediate(12)+rs1(5)+funct(3)+rd(5)+opcode(7)
for loads and immediate arithemetic
* ==__S-type__== immediate(7)+rs2(5)+rs1(5)+funct(3)+immediate(5)+opcode(7)
for stores and conditional branches
* ==__U-type__== immedtate(20)+rd(5)+opconde(7)
for upper immediate format
:::
_funct:_ the additional opcode field.
_rs1:_ the first register source operand.
_rs2:_ the second register source operand.
_rd:_ the register destination operand. It gets the results of the _operaton.
_opcode:_ the basic operation of the instruction.
## arithmetic instruction
* ```add $rd, $rs1, $rs2```
$rd = $rs1 + $rs2
* ```addu $rd, $rs1, $rs2``` (unsigned number)
$rd = $rs1 + $rs2
* ```addi $rd, $rs1, immed```
$rd = $rs1 + immed
* ```sub $rd, $rs1, $rs2```
$rd = $rs1 - $rs2
* ```subu $rd, $rs1, $rs2``` (unsigned number)
$rd = $rs1 - $rs2
* ```subi $rd, $rs1, immed```
$rd = $rs1 - immed
==EXAMPLE==
the C code of ```f = (g + h) + (i + j)```, and the variables of ```f, g, h, i, j``` are assigned to the register x19, x20, ... , x23 respectively.
```
add x5, x20, x21
add x6, x22, x23
sub x19, x5, x6
```
_*note: x5 and x6 are temporary registers._
---
1. unsigned binary intergers
range: 0 to +2<sup>n</sup> -1, using 32 bits, 0 to +4,294,967,295
EXAPMLE: 0000 0000 0000 0000 0000 0000 0000 1011<sub>2</sub>
= 11<sub>10</sub>
2. __2's complement__ signed integers
bit 31 is __sign bit__: 1 for negative numbers, 0 for non-negative numbers
range: -2<sup>n-1</sup> to +2<sup>n-1</sup> -1, using 32 bits, -2,147,483,648 to +2,147,483,647
EXAMPLE:
1111 1111 1111 1111 1111 1111 1111 1100<sub>2</sub>
= - 0000 0000 0000 0000 0000 0000 0000 0011<sub>2</sub> + 1<sub>10</sub>
= -(3<sub>10</sub> + 1<sub>10</sub>) = -4~10~
_most-negative_: 1000 0000 ... 0000
_most-positive_: 0111 1111 ... 1111
---
__hexadecimal__

## memory
__register operands__
RISC-V has 32x32-bit (RISCV-32), 32x64-bit (RISCV-64) register file variants.
| OS | pointers | int | long int | long long int |
| ----------------- | -------- | ------- | -------- | ------------- |
| Microsoft Windows | 64 bits | 32 bits | 32 bits | 64 bits |
| Linux, most Unix | 64 bits | 32 bits | 64 bits | 64 bits |
> 32-bit data is called a =="word"==(or 4 bytes).
> 64-bit data is called a "double word".
> __"byte address"__ means that the index points to a byte of memory.
>
> 32 general-purpose temporary registers, each temporary register has 32 bits or 4 bytes called a word.
> 32個通用型暫存器,每個暫存器有32bits稱一個word
>
> 2<sup>32</sup> bytes with addressed for 0 to 2<sup>32</sup> -1
> words are aligned in memory, so address must be a multiple of 4
* x5-x7, x28-x31 for temporary values
* x8-x9, x18-x27 for saved variables
| | register | memory |
| ------------ | -------- | ------ |
| capacity | small | large |
| access speed | fast | slow |
1. Since register have stored in ==processor==, the access speed is faster than memory.
2. Registers provide compiler some temporary registers when the compiler turn the high-level language into the low-level language, which could reduce the information communciation between processor and memory.
3. Register can improve code density.
---
* ```lw $rd = offset($rs1)```
_load word, memory to register_, $rd = $rs1[offset], the unit of offset is byte
* ```sw $rd = offset($rs1)```
_store word, register to memory_, $rd = $rs1[offset], the unit of offset is byte
__endian__
1. ==Big Endian (MIPS)==: most significant byte at least addredd of a word
最高位位元組放在最低記憶體位置上
2. ==Little Endian (RISC-V)==: least significant byte at least addredd of a word
最低位位元組放在最低記憶體位置上
3. ARM supports both endians
| | big endian | little endian |
|:---:|:----------------- | ----------------- |
| 0 | most significant | least significant |
| 1 | ... | ... |
| 2 | | |
| 3 | | |
| 4 | least significant | most significant |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
```c
#include <stdio.h>
typedef union
{
unsigned long l;
unsigned char c[4];
} x;
int main()
{
x y;
y.l = 0x12345678;
if (y.c[0] == 0x78 && y.c[1] == 0x56 && y.c[2] == 0x34 && y.c[3] == 0x12)
printf("little endian\n");
else if (y.c[0] == 0x12 && y.c[1] == 0x34 && y.c[2] == 0x56 && y.c[3] == 0x78)
printf("big endian\n");
else
printf("unknown\n");
printf("address - 0x%lx:\n", y.l);
for (int i = 0; i < 4; i++)
printf("%p: 0x%02X\n", &y.c[i], y.c[i]);
}
```
RESULT
> CPU: Apple M1
> OS: MacOS Sonoma 14.0
> Compiler: gcc-13 (Homebrew GCC 13.2.0) 13.2.0
```zsh
$ gcc -std=c11 endian.c -o endian
little endian
address - 0x12345678:
0x16bacf270: 0x78
0x16bacf271: 0x56
0x16bacf272: 0x34
0x16bacf273: 0x12
```
==EXAMPLE1==
the C code of ```g = h + A[8]```, and the variables of ```g``` is assigned to the register of x20, ```h``` is assigned to the x21 and the base address of the array of int, ```A```, is in x22.
```
lw x9, 32(x22) # a word is 4 bytes(8x4=32)
add x20, x21, x9
```
==EXAMPLE2==
the C code of ```A[12] = h + A[8]```, and the variable of ```h``` is assigned to the register of x21 and base the address of the array of int, ```A```, is in x22
```
lw x9, 32(x22)
add x9 x21, x9
sw x9, 48(x22)
```
==EXAMPLE3== swap function
turn the C code segment
```c
swap(int v[], int k){
int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
```
into RISC-V code.
array ```v[]``` is assigned in x10, ```k``` in x11, ```temp``` in x5
```
slli x6, x11, 2
add x6, x10, x6
lw x5, 0(x6)
lw x7, 4(x6)
sw x7, 0(x6)
sw x5, 4(x6)
jalr x0, 0(x1)
```
---
* ```lwu $rd = offset($rs1)``` (unsigned number)
$rd = $rs1[offset]
* ```swu $rd = offset($rs1)``` (unsigned number)
$rd = $rs1[offset]
## representing instruction in the computer
==EXAMPLE1==
turn ```add x9, x20, x21``` into machine code
:::info
_*note: the instruction format of ==R-type== is funct(7)+rs2(5)+rs1(5)+funct(3)+rd(5)+opcode(7)_
:::
the opcode of ```add``` is 51, the number of ```x9``` is 9, ```x20``` is 20 and ```x21``` is 21. and they're corresponding to op, rd, rs1, rs2.
| funct7 | rs2 | rs1 | funct3 | rd | opcode |
| --------- | ------- | ------- | ------ | ------- | --------- |
| 0b0000000 | 0b10101 | 0b10100 | 0b000 | 0b01001 | 0b0110011 |
which is 0x015A04B3.
==EXAMPLE2==
turn the C code segment```A[30] = h + A[30] +1``` into RISC-V code and machine code
if ```x10``` has the base of the array ```A``` and x21 corresponds to ```h```
```
lw x9, 120(x10) # offset = 30 x 4 bytes
add x9, x21, x9
addi x9, x9, 1
sw x9, 120(x10)
```
:::info
_*note1: the instruction format of ==I-type== is immediate(12)+rs1(5)+funct(3)+rd(5)+opcode(7)_
_*note2: immediate for constant operand, of offset added to base address_
_*note3: the instruction format of ==S-type== is immediate(7)+rs2(5)+rs1(5)+funct(3)+immediate(5)+opcode(7)_
:::
the machine code of ```lw``` is (I-type)
| immediate | rs1 | funct3 | rd | opcode |
| ------------------ | ---------- | ------- | --------- | ----------- |
| 120=0b000011110000 | 10=0b01010 | 2=0b010 | 9=0b01001 | 3=0b0000011 |
the machine code of ```add``` is (R-type)
| funct7 | rs2 | rs1 | funct3 | rd | opcode |
| ----------- | --------- | ---------- | ------- | --------- | ------------ |
| 0=0b0000000 | 9=0b01001 | 21=0b10101 | 0=0b000 | 9=0b01001 | 51=0b0110011 |
the machine code of ```addi``` is (I-type)
| immediate | rs1 | funct3 | rd | opcode |
| ---------------- | --------- | ------- | --------- | ------------ |
| 1=0b000000000001 | 9=0b01001 | 0=0b000 | 9=0b01001 | 19=0b0010011 |
the machine code of ```sw``` is (S-type)
| immediate | rs2 | rs1 | funct3 | immediate | opcode |
| ----------- | --------- | ---------- | ------- | ---------- | ------------ |
| 3=0b0000011 | 9=0b01001 | 10=0b01010 | 2=0b010 | 24=0b11000 | 35=0b0100011 |
---
:::danger
* __How to access an array element with displacement > 2^15^-1?__

:::
## bitwise / logical instruction
* ```sll $rd, $rs1, $rs2```
$rd = $rs1 << $rs2
* ```slli $rd, $rs1, immed```
slli by i bits multiple 2^i^
* ```srl $rd, $rs1, $rs2```
$rd = $rs1 >> $rs2
* ```srli $rd, $rs1, immed```
* ```sra $rd, $rs1, $rs2```
shift right arithemetic, $rd = $rs1 >> $rs2
* ```srai $rd, $rs1, immed```
* ```and $rd, $rs1, $rs2```
* ```andi $rd, $rs1, immed```
* ```or $rd, $rs1, $rs2```
* ```ori $rd, $rs1, $immed```
* ```xor $rd, $rs1, $rs2```
* ```xori $rd, $rs1, $immed```
replaced NOT operation used in other ISAs, set some bits to 1, leave others unchanged
==EXAMPLE==
```xor x9, x10, x11```
| name | machine code |
| ---- | --------------------------------------- |
| x9 | ```1111 1111 1111 1111 1111 0010 0011 1111``` |
| x10 | ```0000 0000 0000 0000 0000 1101 1100 0000``` |
| x11 | ```1111 1111 1111 1111 1111 1111 1111 1111``` |
## conditional branch
* ```beq $rs1, $rs2, $L1```
branch if equal, if($rs1 == $rs2) goto $L1
* ```bnq $rs1, $rs2, $L1```
branch if not equal, if($rs1 != $rs2) goto $L1
* ```blt $rs1, $rs2, $L1```
branch if less than, if($rs1 < $rs2) goto $L1
* ```bltu $rs1, $rs2, $L1```
branch if less than, unsigned, if($rs1 < $rs2) goto $L1
* ```bge $rs1, $rs2, $L1```
branch if greater or equal, if($rs1 >= $rs2) goto $L1
* ```bgeu $rs1, $rs2, $L1```
branch if greater or equal, unsigned, if($rs1 >= $rs2) goto $L1
==EXAMPLE==
turn the C code segment
```c
if (i == j) f = g + h;
else f = g - h;
```
into RISC-V code.
the variables ```f, g, h, i, j``` corresponding to x19, x20, ... , x23
```
bne x22, x23, Else # not(i == j), jump to Else
add x19, x20, x21
beq x0, x0, Exit # unconditional
Else: sub x19, x20, x21
Exit:
```
_*note: when the C code use "==", than the RISC-V code use "bne(!=)", vice versa._
==EXAMPLE==
turn the C code segment```if (a >= b) a += 1;``` into RISC-V code. ```a``` in x22, and ```b``` in x23.
```
blt x23, x22, Exit
addi x22, x22, 1
Exit:
```
:::info
Notice the signed and unsigned symbol
for example:
x22 = ```1111 1111 1111 1111 1111 1111 1111 1111```
x23 = ```0000 0000 0000 0000 0000 0000 0000 0001```
then
1. ```blt x22, x23, L1 # signed```
x22 < 23, since -1~10~ < +1~10~
2. ```bltu x22, x23, L1 #unsigned```
x22 > x23, since +4,294,957,295~10~ > +1~10~
:::
## loop
==EXAMPLE==
turn the C code segment
```c
while(save[i] == k)
i += 1;
```
into RISC-V code.
the variables ```i``` and ```k``` in x22 and x24, address of ```save``` in x25
```
Loop: slli x10, x22, 2 # word to byte address
add x10, x10, x25
lw x9, 0(x10)
bne x9, x24, Exit
addi x22, x22, 1
beq x0, x0, Loop
Exit:
```
| block | address |
| --------- | ------- |
| ... | ... |
| save[i+2] | x25+8 |
| save[i+1] | x25+4 |
| save[i] | x25 |
_*note: shift left 1 byte(int), 1 byte = 4 bits = 2^2^ bits, ```slli $rs, $rd, 2```_
## function call
in the execution of a procedure, the program must follow these step:
1. Put parameters in a place where the procedure can access them.
2. Transfer control to the procedure.
3. Acquire the storage resources needed for the procedure.
4. Perform the desired task.
5. Put the result value in a place where the calling program can access it.
6. Return control to the point of origin, since a procedure can be called from several points in a program.
RISC-V provides some register for function call
* ```x10-17``` eight parameter registers in which to pass parameters or return
values.
* ```x1```: one return address register
> ```x5-x7, x28-x31```: temporary register
> ```x8-x9, x18-x27```: saved register
---
* ```jal $x1, $L1```
procedure call, address of following instruction put in x1, and jump and link to the L1
* ```jalr x0, 0(x1)```
procedure return, jump to 0 + address in x1
use x0 as rd, which could not be changed
==EXAMPLE==
turn the C code segment
```c
int leaf_example(int g, int h, int i, int j){
int f;
f = (g + h) - (i + j);
return f;
}
```
into RISC-V code.
the arguments in ```g, h, i ,j``` in x10, x11, x12, x12 respectively, the ```f``` in x20(saved register), and the x5 and x6 for temporary register
```
leaf_example:
addi sp, sp, -12 # move down the stack pointer
sw x5, 8(sp) # push
sw x6, 4(sp) # push
sw x20, 0(sp) # push
add x5, x10, x11
add x6, x12, x13
sub x20, x5, x6
addi x10, x20, 0 # x10 is the return value of f
lw x20, 0(sp) # pop
lw x6, 4(sp) # pop
lw x5, 8(sp) # pop
addi sp, sp, 12
jalr x0, 0(x1)
```

## recursive
non-leaf procedure which call other procedure
the C code segment
```c
int fact(int n){
if (n < 1)
return 1;
else
return n * fact(n-1);
}
```
argument ```n``` in x10 and result in x10
the RISC-V code
```
fact:
addi sp, sp -8
sw x1, 4(sp)
sw x10, 0(sp)
addi x5, x10, -1
bge x5, x0, L1 # if (n - 1) >= 0, goto L1
addi x10, x10, 1 # else, return 1
addi sp, sp, 8 # pop
jalr x0, 0(x1) # return to caller
L1:
addi x10, x10, -1 # n-1
jal x1, fact # call n-1
addi x6, x10, 0 # move result of fact(n - 1) to x6
lw x10, 0(sp)
lw x1, 4(sp)
addi sp, sp, 8
mul x10, x10, x6 # return n * fact(n - 1)
jalr x0, 0(x1)
```


## memory layout

_stack_: automatic storgae for procedure calls
_caller_: the program provides the necessary parameter values.
_callee_: a procedure that executes a series of intructions based on parameters provided by the caller and then returns the control to the caller.
```c
#include <stdio.h>
const int global_x = 1; // data (read-only area)
int global_y = 1; // data (read-write area)
int global_z; // bss
int main()
{
const static int x = 1; // data (read-only area)
static int y = 1; // data (read-write area)
static int z; // bss
int w = 1; // stack
char *buf = (char *)malloc(sizeof(char) * 100); // heap
// ...
free(buf);
return 0;
}
```
## dealing with ASCII character
a character of ASCII is 8-bits length, we could not just use ```lw``` ans ```sw```. RISC-V provides some instruction below:
* ```lb $rd = offset($rs1)```
load byte, $rd = $rs1[offset]
* ```lbu $rd = offset($rs1)``` (unsigned number)
* ```sb $rd = offset($rs1)```
store byte, $rd = $rs1[offset]
==EXAMPLE==
turn the C code segment
```c
void strcpy (char x[], char y[]){
size_t i;
i = 0;
while ((x[i] = y[i]) != ‘\0’) /* copy & test byte */
i += 1;
}
```
into RISC-V code.
the base addressed for array ```x``` and ```y``` are in x10 and x11, while ```i``` is in x19.
```
strcpy:
addi sp, sp, -4
sw x19, 0(sp)
add x19, x0, x0 # i = 0 + 0
L1:
add x5, x19, x11 # address of y[i] in x5
lbu x6, 0(x5)
add x7, x19, x10
sb x6, 0(x7)
beq x6, x0, L2
addi x19, x19, 1
jal x0, L1
L2:
lw x19, 0(sp)
addi sp, sp , 4
jalr x0, 0(x1)
```
## dealing with unicode character
by defalut unicode, a character takes 16-bits which is a half-word.
* ```lh $rd = offset($rs1)```
load half word, $rd = $rs1[offset]
* ```lhu $rd = offset($rs1)``` (unsigned number)
* ```sh $rd = offset($rs1)```
:::info
__COMPARISON__: load and store(unsighed)
* load / store / load unsigned byte: ```lb/sb/lbu rd, offset($rs1)```
* load / store / load unsigned byte halfword: ```lh/sh/lhu rd, offset($rs1)```
* load / store / load unsigned byte word: ```lw/sw/lwu rd, offset($rs1)```
:::
## for the occasional 32-bits constant
* ```lui $rd, immed``` which load the upper __20-bits__
copy 20 bits constant to bits [31:12] of $rd
clear right 12-bits [11:0] of $rd to 0
:::info
_*note: using a new instuction format for upper immediate format,
==U-type== immedtate(20)+rd(5)+opconde(7)_
:::
for example:
how to load ```0000 0000 0011 1101 0000 0101 0000 0000``` in x19?
```
lui x19, 976 #0b00000000001111010000 = 976
addi x19, x19, 1280 #0b010100000000 = 1280
```
The final value in register x19 is the desired value:
```00000000 00111101 00000101 00000000```
## branch addressing
:::info
_*note:_
__S-type__: immediate(7)+rs2(5)+rs1(5)+funct(3)+immediate(5)+opcode(7)
_for stores and conditional branches_
__U-type__: immedtate(20)+rd(5)+opconde(7)
_for upper immediate format and jump addressing_
:::
* ==__SB-type__== immed[12]+immed[10:5]+rs2(5)+rs1(5)+funct(3)+immed[4:1]+immed[11]+opcode(7)
The address uses an unusual encoding, which simplifies datapath design but complicates assembly.
most branch targets are near branch
__PC-relative addressing = PC + immed x 2__
* ==__UJ-type__== immed[20]+immed[10:1]+immed[11]+immed[19:12]+rd(5)+opcode(7)
for long jump, e.g., to 32-bits absolute address
:::success
lui: load address [31:12] to temp register
jalr: add address [11:0] and jump to address
:::
==EXAMPLE==
the C code segment
```c
while(save[i] == k)
i += 1;
```
the RISC-V code
the variables ```i``` and ```k``` in x22 and x24, address of ```save``` in x25
```
Loop: slli x10, x22, 2 # word to byte address
add x10, x10, x25
lw x9, 0(x10)
bne x9, x24, Exit
addi x22, x22, 1
beq x0, x0, Loop
Exit:
```
assuming the loop at location 80000
the machine code will be:
| address | immed7 | rs2 | rs1 | funct3 | rd | opcode |
| ------- | ------- | ----------- | ----------- | ------ | ----------- | ------- |
| 80000 | 0000000 | 00010=2~2~ | 10110=22~2~ | 001 | 01010=10~2~ | 0010011 |
| 80004 | 0000000 | 11001=25~2~ | 01010=10~2~ | 000 | 01010=10~2~ | 0110011 |
| 80008 | 0000000 | 00000 | 01010=10~2~ | 011 | 01001=9~2~ | 0000011 |
| 80012 | 0000000 | 11000=24~2~ | 01001=9~2~ | 001 | 01100=12~2~ | 1100011 |
| 80016 | 0000000 | 00001=1~2~ | 10110=22~2~ | 000 | 10110=22~2~ | 0010011 |
| 80020 | 0000000 | 00000 | 00000 | 000 | 01101=13~2~ | 1100011 |
| 80024 | | | | | | |
The ```bne``` instruction on the fourth line adds 3 words or 12 bytes to the address of the instruction, specifying the branch destination relative to the branch instruction (12 + 80012) and not using the full destination address (80024).
The branch instruction on the last line does a similar calculation for a backwards branch (−20 + 80020), corresponding to the label Loop.

## RISCV addressing mode summary
1. _immediate addressing_: where the operand is a constant within the instruction itself
2. _register addressig_: where the operand is a register
3. _base on displacement addressing_: where the operand is at the memory location whose address is the sum of a register and a constant in the instruction
4. _**PC-relative** addressing_: where the branch address is the sum of the PC and a constant in the instruction

==EXAMPLE==
given a branch in register ```x10``` being equal to zero
```beq x10, x0, L1```
replace it by a pair of instructions that offers a much greater branching distance. these instruction replace the short-address conditional branch:
```
bne x10, x0, L2
jal x0, L1
L2:
```

* decoding machine code
what is the ```0x00578833``` corresponding to the machine instruction?
```0b0000 0000 0101 0111 1000 1000 0011 0011```
| funct7 | rs2 | rs1 | funct3 | rd | opcode |
| ------- | ----- | ----- | ------ | ----- | ------- |
| 0000000 | 00101 | 01111 | 000 | 10000 | 0110011 |
hence, this is ```add x16, x16. x5```

## parallelism and synchronization
* Data Race: two memory accesses from a data race if they are from different threads to the same location, at least one is a write, and they occur one after another.
資料為非同步時,資料可能被同時執行的process錯誤覆寫。
* RISC-V provides```lr.w rd, (rs1)``` and ```sc.w rd, (rs1), (rs2)```, called _load-reserve word_ and _store-conditional word_
:::info
if the contents of the memory location specified by the load-reserved are changed before the store-conditional to the same address occurs, then the store-conditional fails and does not write the value to memory.
The store-conditional is defined to both store the value of a (presumably different) register in memory and to change the value of another register to a 0 if it succeeds and to a nonzero value if it fails.
Thus, sc.w specifies three registers: one to hold the address, one to indicate whether the atomic operation failed or succeeded, and one to hold the value to be stored in memory if it succeeded.
Since the load-reserved returns the initial value, and the store-conditional returns 0 only if it succeeds, the following sequence implements an atomic exchange on the memory location specified by the contents of x20:
:::
==EXAMPLE==
```
again: lr.w x10, (x20) // load reserve
sc.w x11, x23, (x20) // store conditional
bne x11, x0, again // branch if store fails (0)
addi x23, x10, 0 // put loaded value in x23
```
## c compiling process

### compiler and assembler (object file)
the compiler transforms the C program into an _assembly language program_
object file is a combination of machine code, instructions, data, and information needed to place instructions properly in memory.
to produce the binary version of each instruction in the assembly language program, the assemblr must determine the addressed corresponding to all lebels.
Assemblers keep track of labels used in branches and data transfer instruvtions an a ==symbol table==.
the object file for UNIX system typically contains six distinct pieces:
* the _header_ describes the size and position of the other pieces of the object file.
* the _text segment_ contains the machine code.
* the _static data segment_ contans data allocated for the life of the program.
* the _relocation information_ identifies instrution and data words that depend on absolute address when the program is loaded into memory.
* the _symbol table_ the remaining labels that are not defined, such as external references.
* the _debugging information_ contains a concise description of how the modules were compiled so that a debugger can associate machine instructions with C source files and make data structure readable.
### linker
which takes all the independently assembled machine language program and stutches them together.
three steps for linker:
1. place code and data modules symbolically in memory
2. determine the address of data and instruction labels
3. patch both the internal and externel references
the linker uses the relocation information and symbol table in each object module to resolve all undefined labels.
such references occur in branch instructions, jump instructions, and data addresses, so the job of this program is much like that of an editor: it finds the old addresses and replaces them with the new addresses.
the reason a linker is useful is that it is much faster to patch code than it is to recompile and reassemble.
the linker produces an executable file that can be run.
typically, executable file has the same format as an object file, except that it contains no unresolved reference.
---
==EXAMPLE==
link the two object file below. the example will show how a linker work.
there're instructions that refer to the addresses of procedures A and B and the instructions that refer to the addresses of data words X and Y.
_*note: we show the instructions in assembly language just to make the example understandable; in reality, the instructions would be numbers._
| ==object file header== | | | |
| -------------------------- | --------------------------- | ----------------------------- | --------------------------- |
| | name | Procedure A | |
| | test size | 100~hex~ | |
| | data size | 20~hex~ | |
| ==test segment== | address | instruction | |
| | 0 | ```lw x10, 0(x3)``` | |
| | 4 | ```jal x1, 0``` | |
| | ... | ... | |
| | ... | ... | |
| ==data segment== | <font color=orange>0</font> | (<font color=orange>X</font>) | |
| | ... | ... | |
| ==relocation information== | address | instruction type | dependency |
| | 0 | ```lw``` | <font color=orange>X</font> |
| | 4 | ```jal``` | <font color=orange>B</font> |
| ==symbol table== | label | address | |
| | <font color=orange>X</font> | - | |
| | <font color=orange>B</font> | - | |
| ==object file header== | | | |
| -------------------------- | --------------------------- | ----------------------------- | --------------------------- |
| | name | Procedure B | |
| | test size | 200~hex~ | |
| | data size | 30~hex~ | |
| ==test segment== | address | instruction | |
| | 0 | ```sw x11, 0(x3)``` | |
| | 4 | ```jal x1, 0``` | |
| | ... | ... | |
| | ... | ... | |
| ==data segment== | <font color=orange>0</font> | (<font color=orange>Y</font>) | |
| | ... | ... | |
| ==relocation information== | address | instruction type | dependency |
| | 0 | ```sw``` | <font color=orange>Y</font> |
| | 4 | ```jal``` | <font color=orange>A</font> |
| ==symbol table== | label | address | |
| | <font color=orange>Y</font> | - | |
| | <font color=orange>A</font> | - | |
AFTER LINK
==executable file header==
| text size | data size |
| --------- | --------- |
| 300~hex~ | 50~hex~ |
==text segment==
| address | instruction |
| ------------------------ | -------------------- |
| 0000 0000 0040 0000~hex~ | ```lw x0 , 0(x3)``` |
| 0000 0000 0040 0004~hex~ | ```jal x1, 252``` |
| ... | ... |
| 0000 0000 0040 0100~hex~ | ```sw x11, 32(x3)``` |
| 0000 0000 0040 0104~hex~ | ```jal x1, -260``` |
| ... | |
==data segment==
| address | |
| ------------------------ | --------------------------- |
| 0000 0000 0000 1000~hex~ | <font color=orange>X</font> |
| ... | ... |
| 0000 0000 0040 1020~hex~ | <font color=orange>Y</font> |
| ... | ... |
1. The jump and link instructions use PC-relative addressing. Thus, for the `jal` at address `40 0004`~hex~ to go to `40 0100`~hex~ (the address of procedure `B`), it must put (`40 0100`~hex~ – `40 0004`~hex~) or `252`~ten~ in its address field. Similarly, since `40 0000`~hex~ is the address of procedure `A`, the `jal` at `40 0104`~hex~ gets the negative number `-260`~ten~ (`40 0000`~hex~ – `40 0104`~hex~) in its address field.
2. The load addresses are harder because they are relative to a base register. This example uses `x3` as the base register, assuming it is initialized to `0000 0000 1000 0000`~hex~. To get the address `0000 0000 1000 0000`~hex~ (the address of word `X`), we place 0ten in the address field of `lw` at address `40 0000`~hex~. Similarly, we place `20`~hex~ in the address field of `sw` at address `40 0100`~hex~ to get the address `0000 0000 1000 0020`~hex~ (the address of doubleword `Y`).
3. Store addresses are handled just like load addresses, except that their S-type instruction format represents immediates differently than loads’ I-type format. We place `32`~ten~ in the address field of `sw `at address `40 0100`~hex~ to get the address `0000 0000 1000 0020`~hex~ (the address of word `Y`).
### loader
The loader follows these steps in UNIX systems:
1. Reads the executable file header to determine size of the text and data segments.
2. Creates an address space large enough for the text and data.
3. Copies the instructions and data from the executable file into memory.
4. Copies the parameters (if any) to the main program onto the stack.
5. Initializes the processor registers and sets the stack pointer to the first free location.
6. Branches to a start-up routine that copies the parameters into the argument registers and calls the main routine of the program. When the main routine returns, the start-up routine terminates the program with an exit system call.
## a c sort example
turn the C code segment
```c
void swap(int v[], int k)
{
int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
void sort(int v[], int n) // v[1, n]
{
int i, j;
for (i = 0; i < n; ++i)
for (j = i - 1; j >= 0 && v[j] > v[j + 1]; --j)
swap(v, j);
}
```
into RISC-V code.
swap: v in `x10`, k in `x11`, temp in `x5`
sort: v in `x10`, n in `x11`, i in`x19`, j in `x20`
```
swap:
slli x6, x11, 2 // reg x6 = k * 4
add x6, x10, x6 // reg x6 = v + (k * 4)
lw x5, 0(x6) // reg x5 (temp) = v[k]
lw x7, 4(x6) // reg x7 = v[k + 1]
sw x7, 0(x6) // v[k] = reg x7
sw x5, 4(x6) // v[k+1] = reg x5 (temp)
jalr x0, 0(x1) // return to calling routine
```
```
sort:
addi sp, sp, -20 // make room on stack for 5 regs
sw x1, 16(sp) // save x1 on stack
sw x22, 12(sp) // save x22 on stack
sw x21, 8(sp) // save x21 on stack
sw x20, 4(sp) // save x20 on stack
sw x19, 0(sp) // save x19 on stack
addi x21, x10, 0 // copy parameter x10 into x21
addi x22, x11, 0 // copy parameter x11 into x22
addi x19, x0, 0 // i = 0
for1tst:
bge x19, x22, exit1 // go to exit1 if i >= n
addi x19, x22, -1 // j = i - 1
for2tst:
blt x20, x0, exit2 // go to exit2 if j < 0
slli x5, x20, 2 // x5 = j * 4
add x5, x21, x5 // x5 = v + (j * 4)
lw x6, 0(x5) // x6 = v[j]
lw x7, 4(x5) // x7 = v[j+1]
ble x6, x7, exit2 // go to exit2 if x6 < x7
addi x10, x21, 0 // first swap parameter is v
addi x11, x20, 0 // second swap parameter is j
jal x1, swap // call swap
addi x20, x20, -1 // j for2tst
jal x0, for2tst // go to for2tst
exit2:
addi x19, x19, 1 // i += 1
jal x0, for1tst // go to for1tst
exit1:
lw x19, 0(sp) // restore x19 from stack
lw x20, 4(sp) // restore x20 from stack
lw x21, 8(sp) // restore x21 from stack
lw x22, 12(sp) // restore x22 from stack
lw x1, 16(sp) // restore return address from stack
addi sp, sp, 20 // restore stack pointer
jalr x0, 0(x1) // return to calling routine
```

## array vs pointer

## fallacies and pitfalls
* fallacies:
1. powerful instrucion leads higher performance
* fewer instructions required
* but complex instructions are hard to implement
* compilers are good at making fast code from simple instruction
2. use assembly code for high performance
* but modern compilers are better at dealing with modern processors
* more lines of code may cause more errors and less productivity
* pitfalls:
1. sequential **words** are not at sequential addresses
* increment by **4**, not 1
2. keeping a pointer to an automatic variable after procedure returns
