Assignment1 RV32I Simulator

# Assignment1 RV32I Simulator ###### tags: `Computer Architecture` **Table of Contents** [TOC] About convolution --- In mathematics, convolution is a mathematical operation on two functions (f and g) that produces a third function expressing how the shape of one is modified by the other. It is also an important concept in the field of discrete-time signal processing. An example of LTI system is that an output signal y[n] can be calculated as a input signal x[n] convolve with an impulse response of the system h[n] : ![](https://i.imgur.com/A9Ws7SO.png) And it can also be abbreviated as follows : ![](https://i.imgur.com/unTS9e3.png) In this assignment, I'm going to implement a function of convolution sum which can convolve two finite-length integer sequences. To simplify the function, I restrict two integer sequences to be equal length. ![link text](https://i.imgur.com/pBNQ07R.png) Where A[], B[] : input integer array C[] : output integer array N+1 : length of A[], B[], C[] C code implementation --- There are two loops and one conditional branch(if-else) in this implementation. ```clike= #include<stdio.h> int main() { int A[5] = {10,12,31,17,25}; int B[5] = {3,8,78,91,71}; int N = 4; int C[5]; for(int i=0; i<N+1; i++) { for(int j=0; j<i+1; j++){ if(!j){ C[i] = A[j]*B[i-j]; }else{ C[i] = C[i]+(A[j]*B[i-j]); } } printf("C[%d] = %d\n", i, C[i]); } return 0; } ``` Assembly code implementation --- I convert C code to assembly code manually. The assembly code contains four functional parts. **1. Store arrays to memory** Initialize arrays and variable. **2. Loops** Indices i and j decide which values in arrays are going to participate the calculation. **3. else and then** j is 0 means that the multiply operation only has one term, so execute "C[i] = A[0]*B[i]". Otherwise, execute "C[i] = C[i]+A[0]*B[i]" to accumlate multiply operation to the result. **4. printResult** Environment calls are defined by Ripes simulator. Put the correspond value into a0 and a7 registers to activate particular system calls you want. ```clike= .data str1: .string "C[" str2: .string "]=" str3: .string "\n" # s0:address of A[] # s1:address of B[] # s2:address of C[] # t0:i # t1:j # t2:terminate value of loopi # t3:terminate value of loopj .text main: add s0, zero, zero # initialize the base address of A[] addi s1, s0, 20 # set the base address of B[] addi s2, s1, 20 # set the base address of C[] li t0, 10 # A[] = {10,12,31,17,25} sw t0, 0(s0) li t0, 12 sw t0, 4(s0) li t0, 31 sw t0, 8(s0) li t0, 17 sw t0, 12(s0) li t0, 25 sw t0, 16(s0) li t0, 3 # B[] = {3,8,78,91,71} sw t0, 0(s1) li t0, 8 sw t0, 4(s1) li t0, 78 sw t0, 8(s1) li t0, 91 sw t0, 12(s1) li t0, 71 sw t0, 16(s1) addi t0, zero, 0 # i=0 li t2, 4 addi t2, t2, 1 # t2=N+1 loopi: bge t0, t2, exit # if i>=N+1, then exit loop i addi t1, zero, 0 # j=0 loopj: addi t3, t0, 1 # t3=i+1 bge t1, t3, endi # if j>=i+1, then exit loop j beq t1, zero, then # if j==0,then C[i] = A[j]*B[i-j] else: slli t4, t1, 2 # t4=j*4 slli t5, t0, 2 # t5=i*4 sub t6, t5, t4 # t6=(i-j)*4 add s3, s2, t5 # s3=address of C[i] add s4, s1, t6 # s4=address of B[i-j] add s5, s0, t4 # s5=address of A[j] lw s6, 0(s5) # s6=A[i] lw s7, 0(s4) # s7=B[i-j] lw s9, 0(s3) # s9=C[i] mul t4, s6, s7 # t4=A[i]*B[i-j] add s9, s9, t4 # t5=C[i]+A[i]*B[i-j] sw s9, 0(s3) # C[i]=C[i]+A[i]*B[i-j] j endj # end of j loop then: slli t4, t1, 2 # t4=j*4 slli t5, t0, 2 # t5=i*4 sub t6, t5, t4 # t6=(i-j)*4 add s3, s2, t5 # s3=address of C[i] add s4, s1, t6 # s4=address of B[i-j] add s5, s0, t4 # s5=address of A[j] lw s6, 0(s5) # s6=A[i] lw s7, 0(s4) # s7=B[i-j] mul t4, s6, s7 # t4=A[i]*B[i-j] sw t4, 0(s3) # C[i]=A[i]*B[i-j] endj: addi t1, t1, 1 # j = j+1 j loopj endi: la a0, str1 # print string 'C[' li a7, 4 ecall mv a0, t0 # print integer i li a7, 1 ecall la a0, str2 # print string ']=' li a7, 4 ecall lw s8, 0(s3) # print integer C[i] mv a0, s8 li a7, 1 ecall la a0, str3 # print string '\n' li a7, 4 ecall addi t0, t0, 1 # i = i+1 j loopi exit: li a7, 10 # Halts the simulator ecall ``` Results --- Using Visual studio to execute the C code, the convolution sum of arrays A[] = {10,12,31,17,25} and B[] = {3,8,78,91,71} is array C[]. Each element in C[] is as follows: ![](https://i.imgur.com/Fe3Eeg3.png) Result of executing assembly code on Ripes is same as C code!:smile: ![](https://i.imgur.com/XyFqQ1A.png) How RSIC-V assembly code works on Ripes --- - **Store elements of array to memory** For example, the first integer to store to memory is 10. So the instruction "sw x5 0(s0)" store value 10 to the A[0] (register s0 stores the base address of A[]). ![](https://i.imgur.com/GXMvLzP.png) *** When instruction "sw x5 0(s0)" is at memory access stage, memory write will be enable by setting signal line "Wr en" to 1 (as the green dot in data memory below). ![](https://i.imgur.com/TeU7FWL.png) *** And the value of the first block of memory is 0x0000000a, which is 10 in decimal representation. ![](https://i.imgur.com/4OGthxC.png) *** After setup the A[] and B[] in memory, the memory looks like picture below, which A[] = {10,12,31,17,25} and B[] = {3,8,78,91,71} ![](https://i.imgur.com/w7IccHN.png) *** - **Loops** In the first i loop, when instruction "bge x5, x7, exit" is at the execution(EX) stage, the decision of whether to branch or not is made by ALU. ![](https://i.imgur.com/vDEFMRi.png) *** And the Ripes simulator automatically flushs the previous two stages to avoid branch harzard. ![](https://i.imgur.com/ImuYbcD.png) *** In next cycle, since value of register x5 is smaller than x7, branch will not be taken in this case, so the next instruction "bge x6, x28, endi" will come into pipeline. ![](https://i.imgur.com/MjTyf9G.png) ![](https://i.imgur.com/mIhFuOD.png) *** - **else and then** When the instruction "beq x6, x0, then" is at the execution stage(EX), the if-else branch decision in the loop j is made. ![](https://i.imgur.com/OYbSmOa.png) *** In this case, register x6 is equal to zero. Therefore, The program counter(PC) will be set to the address of the label "then" at the same time(as the green dot in the PC block). ![](https://i.imgur.com/NHCdy5p.png) *** In the next cycle, the next insruction will be fetched is "slli x29, x6, 2". That means when "if" condition is true, "then" condition is going to execute "slli x29, x6, 2" and its following instructions. ![](https://i.imgur.com/2umgV39.png) *** In the previous cycle, notice that there were two instructions already at the instrcution fetch(IF) and instruction decode(ID) stage respectively. They are "slli x29, x6, 2" and "slli x30, x5, 2". Now they must be flushed since the branch is taken. ![](https://i.imgur.com/blRC3mD.png) *** - **printResult** The instruction "ecall" will evoke OS to do the I/O stuff. In Ripes, you put the corresponding value into register a0 and a7 before "ecall" can help. In my code, I tried to print out integers and strings to the console. ![](https://i.imgur.com/OBjROWt.png) *** x86-64 vs RISC-V calling convention --- Here is an example reference to CS:App 3th edition: ```clike= //This is an example from CS:App 3th edition Fig 3.29 #include <stdio.h> #include <stdlib.h> int main(int argc, char *argv[]) { call_proc(); return 0; } long call_proc(){ long x1 = 1; int x2 = 2; short x3 = 3; char x4 = 4; proc(x1, &x1, x2, &x2, x3, &x3, x4, &x4); return (x1+x2)*(x3-x4); } void proc(long a1, long *a1p, int a2, int *a2p, short a3, short *a3p, char a4, char *a4p) { *a1p += a1; *a2p += a2; *a3p += a3; *a4p += a4; } ``` ### x86-64 calling convention We could generate an executable program as follows: ```clike= $ gcc -Og -o prog proc_call.c ``` We can disassemble the file prog: ```clike= $ objdump -d prog ``` The disassembler will extract various code sequences, including the following: ```clike= Disassembly of section .text: 0000000000001169 <proc>: 1169: f3 0f 1e fa endbr64 116d: 48 8b 44 24 10 mov 0x10(%rsp),%rax ## Load &x4 from stack to register %rax 1172: 48 01 3e add %rdi,(%rsi) ## Perform *x1p += x1; 1175: 01 11 add %edx,(%rcx) ## Perform *x2p += x2; 1177: 66 45 01 01 add %r8w,(%r9) ## Perform *x3p += x3; 117b: 8b 54 24 08 mov 0x8(%rsp),%edx ## Load x4 from stack to register %edx 117f: 00 10 add %dl,(%rax) ## Perform *x4p += x4; 1181: c3 retq 0000000000001182 <call_proc>: 1182: f3 0f 1e fa endbr64 1186: 53 push %rbx 1187: 48 83 ec 20 sub $0x20,%rsp ## Allocate 20 bytes stack frame 118b: bb 28 00 00 00 mov $0x28,%ebx 1190: 64 48 8b 03 mov %fs:(%rbx),%rax ## Rerieve canary 1194: 48 89 44 24 18 mov %rax,0x18(%rsp) ## Store on stack 1199: 31 c0 xor %eax,%eax ## Zero out register 119b: 48 c7 44 24 10 01 00 movq $0x1,0x10(%rsp) ## Save long x1 to stack for getting &x1 11a2: 00 00 11a4: c7 44 24 0c 02 00 00 movl $0x2,0xc(%rsp) ## Save int x2 to stack for getting &x2 11ab: 00 11ac: 66 c7 44 24 0a 03 00 movw $0x3,0xa(%rsp) ## Save short x3 to stack for getting &x3 11b3: c6 44 24 09 04 movb $0x4,0x9(%rsp) ## Save char x4 to stack for getting &x4 11b8: 48 8d 4c 24 0c lea 0xc(%rsp),%rcx ## Let register %rcx hold address &x2 11bd: 48 8d 74 24 10 lea 0x10(%rsp),%rsi ## Let register %rsi hold address &x1 11c2: 48 8d 44 24 09 lea 0x9(%rsp),%rax ## Let register %rax hold address &x4 11c7: 50 push %rax ## Spilling register argument &x4 need to be store on stack 11c8: 6a 04 pushq $0x4 ## Spilling register argument x4 need to be store on stack 11ca: 4c 8d 4c 24 1a lea 0x1a(%rsp),%r9 ## Let register %r9 hold address &x4 11cf: 41 b8 03 00 00 00 mov $0x3,%r8d ## Pass 3 as argument 11d5: ba 02 00 00 00 mov $0x2,%edx ## Pass 2 as argument 11da: bf 01 00 00 00 mov $0x1,%edi ## Pass 1 as argument 11df: e8 85 ff ff ff callq 1169 <proc> ## Call procedure proc() 11e4: 48 63 4c 24 1c movslq 0x1c(%rsp),%rcx 11e9: 48 03 4c 24 20 add 0x20(%rsp),%rcx 11ee: 0f bf 54 24 1a movswl 0x1a(%rsp),%edx 11f3: 0f be 44 24 19 movsbl 0x19(%rsp),%eax 11f8: 29 c2 sub %eax,%edx 11fa: 48 63 c2 movslq %edx,%rax 11fd: 48 0f af c1 imul %rcx,%rax 1201: 48 83 c4 10 add $0x10,%rsp 1205: 48 8b 7c 24 18 mov 0x18(%rsp),%rdi 120a: 64 48 33 3b xor %fs:(%rbx),%rdi 120e: 75 06 jne 1216 <call_proc+0x94> 1210: 48 83 c4 20 add $0x20,%rsp 1214: 5b pop %rbx 1215: c3 retq 1216: e8 45 fe ff ff callq 1060 <__stack_chk_fail@plt> 000000000000121b <main>: 121b: f3 0f 1e fa endbr64 121f: 48 83 ec 08 sub $0x8,%rsp 1223: b8 00 00 00 00 mov $0x0,%eax 1228: e8 55 ff ff ff callq 1182 <call_proc> 122d: 48 89 c2 mov %rax,%rdx 1230: 48 8d 35 cd 0d 00 00 lea 0xdcd(%rip),%rsi # 2004 <_IO_stdin_used+0x4> 1237: bf 01 00 00 00 mov $0x1,%edi 123c: b8 00 00 00 00 mov $0x0,%eax 1241: e8 2a fe ff ff callq 1070 <__printf_chk@plt> 1246: b8 00 00 00 00 mov $0x0,%eax 124b: 48 83 c4 08 add $0x8,%rsp 124f: c3 retq ``` ### RISC-V calling convention Since we are not using standard library for this case, the program should be modified as follow: ```clike= // Modified to fit RISC-V compiler // Some operators are changed for the purpose of convenience long call_proc(); void proc( long, long *, int, int*, short, short*, char, char *); void _start() { volatile char* tx = (volatile char*) 0x40002000; const char *str1 = "The result is:"; long *p, result = call_proc(); //printf("The result is: %ld\n", result); p = &result; *tx = *p + '0'; } long call_proc(){ long x1 = 1; int x2 = 2; short x3 = 3; char x4 = 4; proc(x1, &x1, x2, &x2, x3, &x3, x4, &x4); return (x1+x2)+(x3-x4); } void proc(long a1, long *a1p, int a2, int *a2p, short a3, short *a3p, char a4, char *a4p) { *a1p += a1; *a2p += a2; *a3p += a3; *a4p += a4; } ``` We could generate an executable program as follows: ``` $ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -Og -nostdlib proc_call.c -o proc_call ``` We can disassemble the file prog: ``` $ riscv-none-embed-objdump -d proc_call ``` The disassembler will extract various code sequences, including the following: ```clike= 00010074 <proc>: 10074: 0005a303 lw t1,0(a1) 10078: 00a30333 add t1,t1,a0 1007c: 0065a023 sw t1,0(a1) ## *a1p += a1; 10080: 0006a583 lw a1,0(a3) 10084: 00c585b3 add a1,a1,a2 10088: 00b6a023 sw a1,0(a3) ## *a2p += a2; 1008c: 0007d683 lhu a3,0(a5) 10090: 00d70733 add a4,a4,a3 10094: 00e79023 sh a4,0(a5) ## *a3p += a3; 10098: 0008c783 lbu a5,0(a7) 1009c: 00f80833 add a6,a6,a5 100a0: 01088023 sb a6,0(a7) ## *a4p += a4; 100a4: 00008067 ret 000100a8 <call_proc>: 100a8: fe010113 addi sp,sp,-32 ## Allocate 32 byte stack frame 100ac: 00112e23 sw ra,28(sp) ## Store return address 100b0: 00100793 li a5,1 100b4: 00f12623 sw a5,12(sp) ## Immediates must be put on stack first 100b8: 00200793 li a5,2 100bc: 00f12423 sw a5,8(sp) 100c0: 00300793 li a5,3 100c4: 00f11323 sh a5,6(sp) 100c8: 00400793 li a5,4 100cc: 00f102a3 sb a5,5(sp) 100d0: 00510893 addi a7,sp,5 ## Load the address of immediates &x1, &x2, &x3, &x4 to registers a7, a5, a3, a1 respectively 100d4: 00400813 li a6,4 ## Load the immediate to registers a0, a2, a4, a6 100d8: 00610793 addi a5,sp,6 100dc: 00300713 li a4,3 100e0: 00810693 addi a3,sp,8 100e4: 00200613 li a2,2 100e8: 00c10593 addi a1,sp,12 100ec: 00100513 li a0,1 100f0: f85ff0ef jal ra,10074 <proc> ## Call proc() 100f4: 00c12503 lw a0,12(sp) 100f8: 00812783 lw a5,8(sp) 100fc: 00f50533 add a0,a0,a5 10100: 00611783 lh a5,6(sp) 10104: 00514703 lbu a4,5(sp) 10108: 40e787b3 sub a5,a5,a4 1010c: 00f50533 add a0,a0,a5 10110: 01c12083 lw ra,28(sp) 10114: 02010113 addi sp,sp,32 10118: 00008067 ret 0001011c <_start>: 1011c: ff010113 addi sp,sp,-16 10120: 00112623 sw ra,12(sp) 10124: f85ff0ef jal ra,100a8 <call_proc> 10128: 03050513 addi a0,a0,48 1012c: 0ff57513 zext.b a0,a0 10130: 400027b7 lui a5,0x40002 10134: 00a78023 sb a0,0(a5) # 40002000 <__global_pointer$+0x3fff06b0> 10138: 00c12083 lw ra,12(sp) 1013c: 01010113 addi sp,sp,16 10140: 00008067 ret ``` In terms of RISC-V assembly, we can see that there aren't any spilling register in this example. The arguments can be put in a0-a7 and then call procedure `proc()`. But the number of lines in RISC-V assembly is more than that of x86-64. One of the reason is that the arithemtic and logical opertations in RISC-V only use register operands but not memory operands.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.