Lab1: RV32I Simulator

# Lab1: RV32I Simulator ###### tags: `Computer Architecture 2022` [TOC] ## Problem Description This problem is based on [LeetCode 13. Roman to Integer](https://leetcode.com/problems/roman-to-integer/). Roman numerals are represented by seven different symbols: I, V, X, L, C, D and M. ``` Symbol Value I 1 V 5 X 10 L 50 C 100 D 500 M 1000 ``` Roman numerals are usually written largest to smallest from left to right. However, the numeral for four is not IIII. Instead, the number four is written as IV. ## Code ```c int romanToInt(char* s) { int value[100]; value['I'] = 1; value['V'] = 5; value['X'] = 10; value['L'] = 50; value['C'] = 100; value['D'] = 500; value['M'] = 1000; value['\0'] = 0; int sum = 0; for(int i = 0; s[i] !='\0'; i++){ if(value[s[i]] < value[s[i + 1]]) sum = sum - value[s[i]]; else sum += value[s[i]]; } return sum; } ``` ```assembly romanToInt: addi sp,sp,-448 sw s0,444(sp) # store the value of saver register addi s0,sp,448 sw a0,-436(s0) #addr of s li a5,1 sw a5,-132(s0) #value['I'] li a5,5 sw a5,-80(s0) #value['V'] li a5,10 sw a5,-72(s0) #value['X'] li a5,50 sw a5,-120(s0) #value['L'] li a5,100 sw a5,-156(s0) #value['C'] li a5,500 sw a5,-152(s0) #value['D'] #152 + 4*68(decimal value of 'D') = 424 li a5,1000 sw a5,-116(s0) #value['M'] sw zero,-424(s0) #value['\0'] = value[0] = 0 sw zero,-20(s0) # sum = 0 sw zero,-24(s0) # i = 0 j LOOP_END_CONDITION #s[i] != '\0' FOR_LOOP: lw a5,-24(s0) #a5 = i lw a4,-436(s0) #a4 = s add a5,a4,a5 #a5 = &s[i] lbu a5,0(a5) #a5 = s[i] slli a5,a5,2 #a5 = s[i]*4 addi a5,a5,-16 ## add a5,a5,s0 # lw a4,-408(a5) ##a4 = value[0 + s[i]] = *(-424(s0) + s[i]*4) lw a6,-24(s0) #a6 = i addi a6,a6,1 #a6 = i + 1 lw a3,-436(s0) #a3 = &s add a6,a3,a6 #a6 = &s[i + 1] lbu a6,0(a6) #a6 = s[i + 1] slli a6,a6,2 #a6 = s[i + 1]*2 addi a6,a6,-16 ## add a6,a6,s0 # lw a6,-408(a6) ##a6 = value[0 + s[i + 1]] bge a4,a6,ELSE # value[s[i]] >= value[s[i+1]] lw a5,-20(s0) #a5 = sum sub a5,a5,a4 sw a5,-20(s0) j INCRE_I ELSE: lw a5,-20(s0) #a5 = sum add a5,a4,a5 #a4 = value[s[i]], a5 = sum + value[s[i]] sw a5,-20(s0) #sum = a5 INCRE_I: lw a5,-24(s0) #a5 = i addi a5,a5,1 #a5 = i + 1 sw a5,-24(s0) #i = i + 1 LOOP_END_CONDITION: lw a5,-24(s0) #a5 = i lw a4,-436(s0) #a4 = &s add a5,a4,a5 #a5 = &s[i] lbu a5,0(a5) #a5 = s[i] bne a5,zero, FOR_LOOP #if(s[i]!='\0') goto for_loop lw a5,-20(s0) #a5 = sum mv a0,a5 #a0 = sum lw s0,444(sp) #s0 restore addi sp,sp,448 #sp restore jr ra ``` Following is the allocation of each variable. The details have been illustrated behind each instruction within the above code. <img src="https://i.imgur.com/pUHmjQJ.png" width="50%" /> ## Analysis via Ripes ### IF <img src="https://i.imgur.com/IZZxlBa.png" width="50%" /> Notice that there are three paths out of PC. 1. The one connected to the adder is for getting the next instruction by adding the current one with 4. 2. The middle one is for the branch instruction(e.g. bge, we would see in the IE stage for more details!). 3. The lowest one is for general instruction fetch.(Notice that the value comes out from PC could be viewed as a "address", and the Instr. memory block could give you what the real instruction looks like! In verilog design, we use a "block memory generator" to generate an instruction memory) ### ID & IE ![](https://i.imgur.com/PfCglPs.png) The instruction would conduct decoding(to get the rd, rs1, rs2, funct7, etc.) and relative execution in these two stage, respectively. The thing I want to emphasize is behavior of the branch instruction. Notice that when bge is in EX stage and the jump condition holds, the clear signal in IFID and IDEX pipelines light up, which means the two instructions following bge would be flushed and the next pc value would return after adding the immediate. ### MEM & WB The data in register to/from memory would store/load in the MEM stage; likewise, you can implement in verilog while generate a RAM for the data memory(compared with ROM for the instruction memory). The WB stage is for data to write back, if any. In these two stage, we want to take a closer look at data hazard problem. Since the implementation of forwarding(as the following graphs show), we could only consider the load instruction, which need to stall a cycle(not flush, that is, the following data wouldn't be thrown out, compared to the branch instruction!) <img src="https://i.imgur.com/4IMdRBI.png" width="75%" > ![](https://i.imgur.com/yVrLfmY.png) ### ecall ![](https://i.imgur.com/HA3se8U.png) After the ecall instruction(exception occurs), the cpu would stall 2 cycles in order to deal with the exception handling. There are some special registers that need to be modified; mtvec, mcause, mepc, mtval(or mbadaddr). Notice that the front character "m" means machine mode. In general, there are three modes in a complete CPU; machine mode, user mode, and supervisor mode. ## Problems Unsolved The assembly code is revised the generated code via the the tool **riscv32-unknown-elf-gcc**; however, there are several things I can't figure out the purposes * Why the offset of the stack pointer is 448? I know besides the value array we need to space 400 bytes for it, there are totally five variables we need to store. And the input character array it remain 12 bytes for it to store the value, why it remains 12 bytes between the bottom of the stack frame and the location of s array? I need to do more experiment to figure it out. * The code as shown in the following, why does the compiler not combine the first and third instruction into the one? Is there any optimization technique I don't understand? ``` addi a5,a5,-16 ## add a5,a5,s0 # lw a4,-408(a5) ##a4 = value[0 + s[i]] = *(-424(s0) + s[i]*4)` ``` * How does the ecall instruction work exactly? Is there any coprocessor we need to implement to fulfill the exception function? * Last but not least, how could I read integer array from the terminal? It seems that the system call **read** would always read the input from the terminal as a character. I need to study more for those questions.