Lab1: RV32I Simulator

--- tags: Computer Architecture (2022 Fall) --- # Lab1: RV32I Simulator ###### tags: `Computer Architecture 2022` ## probelm introduction Given an integer rowIndex, return the rowIndexth (0-indexed) row of the Pascal's triangle. In Pascal's triangle, each number is the sum of the two numbers directly above it as shown: ![](https://i.imgur.com/QqJslqq.gif) ## my thinking ex. 1 layer 0 / \ 1 (1) layer 1 / \ / \ 1 2 (1) layer 2 /\ /\ /\ 1 3 (3) (1) layer 3 () represents copy * I calculate the result layer by layer, and just rewrite to the same array. Because I just have to get values of specified layer. * I only caculate the left half elements, from middle to leftmost. * Then I copy the left value to right side. The () indicates the copy value. * By doing so, an half computing complexity can be saved. ## my leetcode solution ```c= int* getRow(int rowIndex, int* returnSize){ int *ret_arr = malloc(sizeof(int)* (rowIndex + 1)); *returnSize = (rowIndex + 1); int layer = 0; while(rowIndex>=layer){// check whether the required layer is reached int i=0; for(i=layer/2;i>=0;i--){//only calculate the left half. by the way, calculate from the middle to leftmost to prevent wrong answers from rewriting array if(i==0) ret_arr[i] = 1; else ret_arr[i] = ret_arr[i] + ret_arr[i-1]; } for(i=0;i<=layer/2;i++){// copy left to right ret_arr[layer-i] = ret_arr[i]; } layer++; } return ret_arr; } ``` ## rewrite to main function by myself ``` c= #include<stdio.h> #include<stdlib.h> int *getRow(int ,int*); int main(){ int *ans = NULL; int ans_size ; int i; int rowIndex = 0; ans = getRow(rowIndex, &ans_size); for(i=0;i<ans_size;i++){ printf("%d ",ans[i]); } printf("\n"); rowIndex = 10; ans = getRow(rowIndex, &ans_size); for(i=0;i<ans_size;i++){ printf("%d ",ans[i]); } printf("\n"); rowIndex = 33; ans = getRow(rowIndex, &ans_size); for(i=0;i<ans_size;i++){ printf("%d ",ans[i]); } printf("\n"); return 0; } int* getRow(int rowIndex, int* returnSize){ int *ret_arr = malloc(sizeof(int)* (rowIndex + 1)); *returnSize = (rowIndex + 1); int layer = 0; int middle = layer >> 1; while(rowIndex>=layer){ int i=0; for(i=layer/2;i>=0;i--){ if(i==0) ret_arr[i] = 1; else ret_arr[i] = ret_arr[i] + ret_arr[i-1]; } for(i=0;i<=layer/2;i++){ ret_arr[layer-i] = ret_arr[i]; } layer++; } return ret_arr; } ``` ## test data ### test1 input: rowIndex = 0 output: returnSize = 1 array = 1 ### test2 rowIndex = 33 returnSize = 34 array = 1 33 528 5456 40920 237336 1107568 4272048 13884156 38567100 92561040 193536720 354817320 573166440 818809200 1037158320 1166803110 1166803110 1037158320 818809200 573166440 354817320 193536720 92561040 38567100 13884156 4272048 1107568 237336 40920 5456 528 33 1 ### test3 rowIndex = 10 returnSize = 11 array = 1 10 45 120 210 252 210 120 45 10 1 ## assembly code (handwrite by myself) ```s= #s0-11 :save registers #a0-7 :argument registers #t0-6 :temp registers .data #ret_arr: .word 0,0,0,0,0,0,0,0,0,0 str1: .string " " newline: .string "\n" .bss ret_arr: .word 0 .text main: addi a3,x0,0 #a3 = rowIndex la s0,ret_arr # la s1,str1 jal getRow addi t1,x0,0 #t1 = i jal print_loop addi a3,x0,10 #a3 = rowIndex la s0,ret_arr # la s1,str1 jal getRow addi t1,x0,0 #t1 = i jal print_loop addi a3,x0,33 #a3 = rowIndex la s0,ret_arr # la s1,str1 jal getRow addi t1,x0,0 #t1 = i jal print_loop li a7, 10 # Halts the simulator ecall getRow: addi sp,sp,-4 sw ra,0(sp) addi a4,a3,1 #a4 = returnSize addi a5,x0,0 #a5 = layer while_loop: # to determine whether the target layer has been calculated srli a6,a5,1 #a6 = middle, to ack the boundary of keft side addi t1,a6,0 #t1 = i la s0,ret_arr #s0 = base addr of ret_arr jal for_loop1 #bge t1,x0, for_loop1 addi t1,x0,0 #t1 = i jal for_loop2 addi a5,a5,1 bge a3,a5,while_loop # rowIndex>=layer? if true, while loop continue becaue while loop hasn't reached the target layer while_end: lw ra,0(sp) addi sp,sp,4 jr ra for_loop1: # to calculate left half elements of a layer blt t1,x0,for_loop1_end # determine whether loop continue bne t1,x0,else1 # determine i==0? determine if-else statement addi t3,x0,1 #t3 = 1 slli t2,t1,2 #t2 = i*4, word addr to byte addr add t2,t2,s0 #t2 =offset+base addr sw t3,0(t2) #store 1 to array[0] addi t1,t1,-1 # update iteration variable j for_loop1 for_loop1_end: jr ra # return to caller else1: addi t2,t1,-1 #t2 = i-1 slli t2,t2,2 #to byte address add t2,s0,t2 lw t3,0(t2) # load ret_arr[i-1] add t4,t3,x0 # t4 = ret_arr[i-1] addi t2,t1,0 #t2 = i slli t2,t2,2 #to byte address add t2,s0,t2 #base addr + i*4, to update the address of storing lw t3,0(t2) # load ret_arr[i] add t4,t3,t4 #t4+=ret_arr[i] sw t4,0(t2) addi t1,t1,-1 j for_loop1 for_loop2: bge a6,t1,for_loop2_itr jr ra for_loop2_itr: # for copy left to right sub t2,a5,t1 # t2=layer-i slli t2,t2,2 slli t3,t1,2 add t3,s0,t3 add t2,s0,t2 lw t4,0(t3) sw t4,0(t2) addi t1,t1,1 j for_loop2 print_loop: # to show the result bge t1,a4,print_loop_end lw a0, 0(s0) # a0 used for return print number li a7,1 #to print int ecall la a0,str1 li a7,4 #to print string ecall addi t1,t1,1 addi s0,s0,4 j print_loop print_loop_end: la a0,newline li a7,4 #to print string ecall jr ra ``` ## simulation result by Ripes the results are indentical to the execution results of c program ![](https://i.imgur.com/kmzuSJF.png) ## analysis use 5-stage processor ![](https://i.imgur.com/IOgTC4O.png) each type has different opcode. we can even only use some(not entire) opcode bit to judge the type. ### trace instructions ![](https://i.imgur.com/pNiKnCL.png) instrction fetch pc=0 inst_mem[0]=0x693 ![](https://i.imgur.com/wfSupFT.png) instrction decode & r/w register file I type, imm=0 ![](https://i.imgur.com/3mT0y14.png) alu execution alu_out = 0+0 = 0 ![](https://i.imgur.com/cH6Rtrj.png) data memory r/w ![](https://i.imgur.com/8Uy9XHM.png) select which signals (PC, alu_out and mem_read_out) should write back or forwarding wr_idx = 13 wr_data = 0 ![](https://i.imgur.com/GD4zM9B.png) # hazard Becasue the cpu is pipeline architecure, the data dependency problem induce absolutely. => data hazard normal read-write hazard should use forwaring, i.e. MEM to EXE, WB to EXE ![](https://i.imgur.com/WqhEkQd.png) load-use hazard must insert nop, then probably use forwarding ![](https://i.imgur.com/OgtiEq8.png) ![](https://i.imgur.com/rbXP90N.png) Besides, because some decisions only can be made by the specified stage(ex. Ripes cpu fails to get the destination of jumping address, until the instruction arrive EXE stage ), meanwhile some instructions had entered(fetched or decoded), the cpu has to flush or invalid those instructions => control hazard. ![](https://i.imgur.com/qnx80sG.png) ![](https://i.imgur.com/utjFTTr.png)