--- tags: Computer Architecture 2022 --- # Assignment2: RISC-V Toolchain ## Pick a Question The following question is picked from the Assignment 1 > 王彥翔 Convert Binary Number in a Linked List to Integer(LeetCode 1290) > Given head which is a reference node to a singly-linked list. The value of each node in the linked list is either 0 or 1. The linked list holds the binary representation of a number. > Return the decimal value of the number in the linked list. > The most significant bit is at the head of the linked list. [Source code](https://hackmd.io/@JHfVFxIbS6umD24Uy4CqvA/BkKU8MFzo) The original implementation of the question is as follows. ```clike= /** * Definition for singly-linked list. * struct ListNode { * int val; * struct ListNode *next; * }; */ int getDecimalValue(struct ListNode* head){ int ans = 0; while(head){ ans <<= 1; ans |= head->val; head = head->next; } return ans; } ``` The reason I choose this question is that I am curious about how to solve a question using linked list by Risc V assembly. And also contrast the handwritten and compiler-optimized assembly. Handwritten assembly is as follow. ```clike= .data list: .word 1, 0, 1 list_length: .word 3 .text main: jal ra, create_list mv t1, s1 # store list head jal ra, print_list mv t1, s1 jal ra getDecimalValue exit: addi a7, x0 10 # exit ecall create_list: addi sp, sp, 8 li s2, 0 # i = 0 la t1, list # list head address la t0, list_length # to store list length lw t0, 0(t0) bne s2, t0, create_node jr ra create_node: li s0, 0x0fff0000 # address of list head sw ra, 0(sp) # store return address sw s0, 4(sp) # store address of list head jal ra malloc lw s1, 0(t1) # load word list[i] addi t1, t1, 4 # next element sw s1, 0(s0) # node->value = list[i] sw zero, 4(s0) # node->next = NULL mv t2, s0 # prev = node addi s2, s2, 1 # i++ addi s0, s0, 8 # one node for 8 byte beq s2, t0, ret # chekc end condition in for loop loop: jal ra, malloc # get memory for next node lw s1, 0(t1) # load word list[i] addi t1, t1, 4 # next element sw s1, 0(s0) # node-j>value = array[i] sw zero, 4(s0) # node->next = NULL sw s0, 4(t2) # prev->next = node mv t2, s0 addi s2, s2, 1 addi s0, s0, 8 bne s2, t0, loop ret: lw ra, 0(sp) lw s1, 4(sp) addi sp, sp, 8 jr ra malloc: addi a0, s0, 8 li a7 214 ecall jr ra print_list: beq t1, x0, end lw a0, 0(t1) addi a7, x0, 1 ecall lw t1, 4(t1) j print_list end: addi a0, x0, 10 # '\n' ascii addi a7, x0, 11 # print char ecall jr ra getDecimalValue: addi sp, sp, -4 sw ra, 0(sp) # store return address addi a0, x0, 0 # ans = 0 jal ra while # traverse linked list node lw ra, 0(sp) # load return address addi sp, sp, 4 jr ra while: beq t1, x0, end_while # if head is NULL, branch end while lw t0, 0(t1) # t0 is node->value lw t1, 4(t1) # load next node address slli a0, a0, 1 # shift left 1 bit or a0, a0, t0 # ans = ans + node->value j while end_while: li a7, 1 # print ans ecall jr ra # return address ``` ## Compiling using RISC-V gcc To achieve this, I rewrite c code by adding main function as follow. ```clike= #include<stdio.h> #include<stdlib.h> struct ListNode { int val; struct ListNode *next; }; int getDecimalValue(struct ListNode* head){ int ans = 0; while(head){ ans <<= 1; ans |= head->val; head = head->next; } return ans; } int main(){ struct ListNode *head = (struct ListNode *)malloc(sizeof(struct ListNode)); int ans; head->next = NULL; head->val = 1; struct ListNode *current = head; for(int i = 0 ; i < 2 ; i++){ current->next = (struct ListNode *)malloc(sizeof(struct ListNode)); current = current->next; current->val = i & 1; current->next = NULL; } ans = getDecimalValue(head); printf("ans is %d\n", ans); free(head); free(current); return 0; } ``` Then, compile with the riscv-gcc, instruction as follow. ``` $ riscv-none-elf-gcc leetcode.c ``` It is executable using rv32emu. ``` $ ~/rv32emu/build/rv32emu a.out ``` ![](https://i.imgur.com/61vAY2T.png) Display the ELF file header by the following instruction. ``` $ riscv-none-elf-readelf -h a.out ``` ![](https://i.imgur.com/ABqwxAM.png) List the section size. ``` $ riscv-none-elf-size a.out ``` ![](https://i.imgur.com/iQlfurX.png) Meanwhile, I found out that this simple c code that doesn't exceed 50 lines has a massive size of RISC-V assembly. Disassemble using following instruction. ``` $ riscv-none-elf-objdump -d a.out ``` I am curious about this situation, and discuss with classmate. The assembly code is to much to display on console, so I write it in an text file. The instruction is as follow. ``` $ riscv-none-elf-objdump -d a.out > leetcode.txt ``` Here is the main function in assembly code. ```clike= 00010188 <main>: 10188: 1101 addi sp,sp,-32 1018a: ce06 sw ra,28(sp) 1018c: cc22 sw s0,24(sp) 1018e: 1000 addi s0,sp,32 10190: 4521 li a0,8 10192: 2291 jal 102d6 <malloc> 10194: 87aa mv a5,a0 10196: fef42223 sw a5,-28(s0) 1019a: fe442783 lw a5,-28(s0) 1019e: 0007a223 sw zero,4(a5) 101a2: fe442783 lw a5,-28(s0) 101a6: 4705 li a4,1 101a8: c398 sw a4,0(a5) 101aa: fe442783 lw a5,-28(s0) 101ae: fef42623 sw a5,-20(s0) 101b2: fe042423 sw zero,-24(s0) 101b6: a82d j 101f0 <main+0x68> 101b8: 4521 li a0,8 101ba: 2a31 jal 102d6 <malloc> 101bc: 87aa mv a5,a0 101be: 873e mv a4,a5 101c0: fec42783 lw a5,-20(s0) 101c4: c3d8 sw a4,4(a5) 101c6: fec42783 lw a5,-20(s0) 101ca: 43dc lw a5,4(a5) 101cc: fef42623 sw a5,-20(s0) 101d0: fe842783 lw a5,-24(s0) 101d4: 0017f713 andi a4,a5,1 101d8: fec42783 lw a5,-20(s0) 101dc: c398 sw a4,0(a5) 101de: fec42783 lw a5,-20(s0) 101e2: 0007a223 sw zero,4(a5) 101e6: fe842783 lw a5,-24(s0) 101ea: 0785 addi a5,a5,1 101ec: fef42423 sw a5,-24(s0) 101f0: fe842703 lw a4,-24(s0) 101f4: 4789 li a5,2 101f6: fce7d1e3 bge a5,a4,101b8 <main+0x30> 101fa: fe442503 lw a0,-28(s0) 101fe: 3791 jal 10142 <getDecimalValue> 10200: fea42023 sw a0,-32(s0) 10204: fe042583 lw a1,-32(s0) 10208: 67f1 lui a5,0x1c 1020a: 91078513 addi a0,a5,-1776 # 1b910 <__clzsi2+0x70> 1020e: 2f05 jal 1093e <printf> 10210: fe442503 lw a0,-28(s0) 10214: 20e9 jal 102de <free> 10216: fec42503 lw a0,-20(s0) 1021a: 20d1 jal 102de <free> 1021c: 4781 li a5,0 1021e: 853e mv a0,a5 10220: 40f2 lw ra,28(sp) 10222: 4462 lw s0,24(sp) 10224: 6105 addi sp,sp,32 10226: 8082 ret ``` The reason for all that 50,000 lines of assembly code is the following include. ``` include<stdio.h> include<stdlib.h> ``` It tells the linker to link these two library to our code, and that why the assembly code exceed 50,000 lines. ## Analysis Assembly Here I find something really interesting about the assembly above. Line 36 and 37 show as follow. ``` 101ec: fef42423 sw a5,-24(s0) 101f0: fe842703 lw a4,-24(s0) ``` The value in register a5 do not assign to a4 immediately. Instead, a5 first store its value to memory and assign a4 the value from memory. In my opinion, there are two possibility for this. First, the code is not compiled in optimize way. Second, compiler may want to avoid interrupt issue. So I compile with optimize first. ### -O1 We can see optimized code use fewer lines. ![](https://i.imgur.com/WiOPqCa.png) After optimizing, the situation above does not exist anymore. ```clike= 00010156 <main>: 10156: 1101 addi sp,sp,-32 10158: ce06 sw ra,28(sp) 1015a: cc22 sw s0,24(sp) 1015c: ca26 sw s1,20(sp) 1015e: c84a sw s2,16(sp) 10160: c64e sw s3,12(sp) 10162: c452 sw s4,8(sp) 10164: 4521 li a0,8 10166: 2211 jal 1026a <malloc> 10168: 892a mv s2,a0 1016a: 00052223 sw zero,4(a0) 1016e: 4785 li a5,1 10170: c11c sw a5,0(a0) 10172: 842a mv s0,a0 10174: 4481 li s1,0 10176: 4a0d li s4,3 10178: 89a2 mv s3,s0 1017a: 4521 li a0,8 1017c: 20fd jal 1026a <malloc> 1017e: 842a mv s0,a0 10180: 00a9a223 sw a0,4(s3) 10184: 0014f793 andi a5,s1,1 10188: c11c sw a5,0(a0) 1018a: 00052223 sw zero,4(a0) 1018e: 0485 addi s1,s1,1 10190: ff4494e3 bne s1,s4,10178 <main+0x22> 10194: 854a mv a0,s2 10196: 376d jal 10140 <getDecimalValue> 10198: 85aa mv a1,a0 1019a: 6571 lui a0,0x1c 1019c: 8a850513 addi a0,a0,-1880 # 1b8a8 <__clzsi2+0x72> 101a0: 2f0d jal 108d2 <printf> 101a2: 854a mv a0,s2 101a4: 20f9 jal 10272 <free> 101a6: 8522 mv a0,s0 101a8: 20e9 jal 10272 <free> 101aa: 4501 li a0,0 101ac: 40f2 lw ra,28(sp) 101ae: 4462 lw s0,24(sp) 101b0: 44d2 lw s1,20(sp) 101b2: 4942 lw s2,16(sp) 101b4: 49b2 lw s3,12(sp) 101b6: 4a22 lw s4,8(sp) 101b8: 6105 addi sp,sp,32 101ba: 8082 ret ``` ## Rewrite C code ![](https://i.imgur.com/iN8LGZ5.png) I rewrite my C code by using pointer to pointer to create linked list. ```clike= #include <stdlib.h> #include <stdio.h> const int array[3] = {1, 0, 1}; struct ListNode { int val; struct ListNode *next; }; void create_list(struct ListNode **cur){ for(int i = 0 ; i < 3 ; i++){ struct ListNode *new_node = (struct ListNode *)malloc(sizeof(struct ListNode)); new_node->val = array[i]; new_node->next = NULL; *cur = new_node; cur = &((*cur)->next); } } int getDecimalValue(struct ListNode* head){ int ans = 0; while(head){ ans <<= 1; ans |= head->val; head = head->next; } return ans; } int main(){ struct ListNode *head = NULL; int ans; create_list(&head); ans = getDecimalValue(head); if(ans == 5){ printf("correct!\n"); }else{ printf("wrong\n"); } return 0; } ``` Compile through riscv toolchain ``` $ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 code.c ``` Then dump out the header of ELF and check the size of my code again. ``` $ riscv-none-elf-readelf -h a.out ``` ![](https://i.imgur.com/6TS1hya.png) ``` $ riscv-none-elf-size a.out ``` ![](https://i.imgur.com/JBLbZtG.png) I was shocked by what I saw. A nice and clean C code really matters the size of the ELF file. Here I show the assembly code by disassemble. ```clike= 00010184 <create_list>: 10184: fd010113 addi sp,sp,-48 10188: 02112623 sw ra,44(sp) 1018c: 02812423 sw s0,40(sp) 10190: 03010413 addi s0,sp,48 10194: fca42e23 sw a0,-36(s0) 10198: fe042623 sw zero,-20(s0) 1019c: 0640006f j 10200 <create_list+0x7c> 101a0: 00800513 li a0,8 101a4: 254000ef jal ra,103f8 <malloc> 101a8: 00050793 mv a5,a0 101ac: fef42423 sw a5,-24(s0) 101b0: 000147b7 lui a5,0x14 101b4: a7478713 addi a4,a5,-1420 # 13a74 <array> 101b8: fec42783 lw a5,-20(s0) 101bc: 00279793 slli a5,a5,0x2 101c0: 00f707b3 add a5,a4,a5 101c4: 0007a703 lw a4,0(a5) 101c8: fe842783 lw a5,-24(s0) 101cc: 00e7a023 sw a4,0(a5) 101d0: fe842783 lw a5,-24(s0) 101d4: 0007a223 sw zero,4(a5) 101d8: fdc42783 lw a5,-36(s0) 101dc: fe842703 lw a4,-24(s0) 101e0: 00e7a023 sw a4,0(a5) 101e4: fdc42783 lw a5,-36(s0) 101e8: 0007a783 lw a5,0(a5) 101ec: 00478793 addi a5,a5,4 101f0: fcf42e23 sw a5,-36(s0) 101f4: fec42783 lw a5,-20(s0) 101f8: 00178793 addi a5,a5,1 101fc: fef42623 sw a5,-20(s0) 10200: fec42703 lw a4,-20(s0) 10204: 00200793 li a5,2 10208: f8e7dce3 bge a5,a4,101a0 <create_list+0x1c> 1020c: 00000013 nop 10210: 00000013 nop 10214: 02c12083 lw ra,44(sp) 10218: 02812403 lw s0,40(sp) 1021c: 03010113 addi sp,sp,48 10220: 00008067 ret 00010224 <getDecimalValue>: 10224: fd010113 addi sp,sp,-48 10228: 02812623 sw s0,44(sp) 1022c: 03010413 addi s0,sp,48 10230: fca42e23 sw a0,-36(s0) 10234: fe042623 sw zero,-20(s0) 10238: 0300006f j 10268 <getDecimalValue+0x44> 1023c: fec42783 lw a5,-20(s0) 10240: 00179793 slli a5,a5,0x1 10244: fef42623 sw a5,-20(s0) 10248: fdc42783 lw a5,-36(s0) 1024c: 0007a783 lw a5,0(a5) 10250: fec42703 lw a4,-20(s0) 10254: 00f767b3 or a5,a4,a5 10258: fef42623 sw a5,-20(s0) 1025c: fdc42783 lw a5,-36(s0) 10260: 0047a783 lw a5,4(a5) 10264: fcf42e23 sw a5,-36(s0) 10268: fdc42783 lw a5,-36(s0) 1026c: fc0798e3 bnez a5,1023c <getDecimalValue+0x18> 10270: fec42783 lw a5,-20(s0) 10274: 00078513 mv a0,a5 10278: 02c12403 lw s0,44(sp) 1027c: 03010113 addi sp,sp,48 10280: 00008067 ret 00010284 <main>: 10284: fe010113 addi sp,sp,-32 10288: 00112e23 sw ra,28(sp) 1028c: 00812c23 sw s0,24(sp) 10290: 02010413 addi s0,sp,32 10294: fe042423 sw zero,-24(s0) 10298: fe840793 addi a5,s0,-24 1029c: 00078513 mv a0,a5 102a0: ee5ff0ef jal ra,10184 <create_list> 102a4: fe842783 lw a5,-24(s0) 102a8: 00078513 mv a0,a5 102ac: f79ff0ef jal ra,10224 <getDecimalValue> 102b0: fea42623 sw a0,-20(s0) 102b4: fec42703 lw a4,-20(s0) 102b8: 00500793 li a5,5 102bc: 00f71a63 bne a4,a5,102d0 <main+0x4c> 102c0: 000147b7 lui a5,0x14 102c4: a8078513 addi a0,a5,-1408 # 13a80 <array+0xc> 102c8: 2e1000ef jal ra,10da8 <puts> 102cc: 0100006f j 102dc <main+0x58> 102d0: 000147b7 lui a5,0x14 102d4: a8c78513 addi a0,a5,-1396 # 13a8c <array+0x18> 102d8: 2d1000ef jal ra,10da8 <puts> 102dc: 00000793 li a5,0 102e0: 00078513 mv a0,a5 102e4: 01c12083 lw ra,28(sp) 102e8: 01812403 lw s0,24(sp) 102ec: 02010113 addi sp,sp,32 102f0: 00008067 ret ``` ## Make ELF When I try to make an ELF from the handwritten assembly, an error occured. Error message in several lines show illegal operands. ![](https://i.imgur.com/GAH59UQ.png) I figure out that the operands need seperated by comma symbol, then I modified the assembly. At last, it works. We get an executable file call 'leetcode.elf'. ![](https://i.imgur.com/96kUUDj.png) But, it doesn't execute properly. Showing the error as follow. ![](https://i.imgur.com/j0LAAYO.png) The code return -1 that means there's maybe a bug in handwritten assembly code. The handwritten assembly can run correctly on 'Ripes'. After dicussing with classmate, we have a conclusion that 'Ripes' won't detect overflow. The code that may cause overflow issue in line 20 is as follow. ``` addi sp, sp, 8 ``` So I decide to rewrite the assembly code as well. Here's my assembly code. ```clike= .global _start .set SYSEXIT, 93 .set SYSWRITE, 64 .data array: .word 1 .word 0 .word 1 answer: .word 4 correct: .ascii "correct!\n" .set correct_size, .-correct wrong: .ascii "wrong answer\n" .set wrong_size, .-wrong .text _start: main: addi sp, sp, -4 # head pointer sw x0, 0(sp) # head = NULL la s0, array # s0 = array[1,0,1] la s1, answer # s1 = answer address lw s1, 0(s1) # s1 = 5 add a0, sp, x0 # head pointer's address jal ra, createList lw a0, 0(sp) jal ra, getDecimalValue bne a0, s1, wrongAnswer li a7, SYSWRITE li a0, 1 la a1, correct li a2, correct_size ecall exit: li a7, SYSEXIT li a0, 0 li a1, 0 li a2, 0 addi sp, sp, 4 ecall wrongAnswer: li a7, SYSWRITE li a0, 1 la a1, wrong li a2, wrong_size ecall j exit createList: addi sp, sp, -12 # store ra, s0, s1 (callee save) sw ra, 0(sp) sw s0, 4(sp) sw s1, 8(sp) mv s0, a0 # s0 = head pointer address li a0, 8 # allocate 8 byte for structure ListNode li a7, 214 # system call malloc ecall sw a0, 0(s0) # head pointer points to first node li t0, 1 # value = 1 sw t0, 0(a0) mv t1, a0 # t1 = first node address li a0, 8 ecall # malloc second node sw a0, 4(t1) # first node points to second node li t0, 0 # value = 0 sw t0, 0(a0) mv t1, a0 # t1 = second node address li a0, 8 ecall sw a0, 4(t1) # second node points to third node li t0, 1 # value = 1 sw t0, 0(a0) li t0, 1 # value = 1 sw t0, 0(a0) sw x0, 4(a0) # third node's next pointer point to NULL lw ra, 0(sp) # restore register value in stack lw s0, 4(sp) lw s1, 8(sp) addi sp, sp, 12 jr ra getDecimalValue: addi sp, sp, -8 # store ra, s0(for a new pointer) sw ra, 0(sp) sw s0, 4(sp) mv s0, a0 li a0, 0 # a0 = answer loop: beq s0, x0, exitLoop slli a0, a0, 1 # answer shift left 1 lw t0, 0(s0) # t0 = current node value or a0, a0, t0 # answer = answer | current node value lw s0, 4(s0) # head = head->next j loop exitLoop: lw ra, 0(sp) lw s0, 4(sp) addi sp, sp, 8 jr ra ``` I found out that it is difficult to print an integer from RV32, because of the system call in newlib. And also Jserv told us to do verification automatically without printing the array value. According to the above, I write the answer of the input in data section to compare. But when I execute by RV32, it outputs "wrong answer". ![](https://i.imgur.com/vf0CEcY.png) I did use the GDB remote debugging, but error. ``` $ riscv32-none-elf-gdb (gdb) file hello.elf (gdb) target remote :1234 ``` ![](https://i.imgur.com/O4O4opO.png) ## update I finally find the bug in my code. I accidentally write the answer as 4. It soppose to be 5! I fix it in my [assignment3](https://hackmd.io/@Fo7UsdePRsKPVV4CPYGbpA/rydAP0OSo).