# Assignment 3: SoftCPU

## Setup

Install [embedded gcc](https://github.com/xpack-dev-tools/riscv-none-embed-gcc-xpack/releases):

```bash=
wget https://github.com/xpack-dev-tools/riscv-none-embed-gcc-xpack/releases/download/v10.2.0-1.1/xpack-riscv-none-embed-gcc-10.2.0-1.1-linux-x64.tar.gz
tar zxvf xpack-riscv-none-embed-gcc-10.2.0-1.1-linux-x64.tar.gz
mv xpack-riscv-none-embed-gcc-10.2.0-1.1 $HOME/riscv-none-embed-gcc
export PATH=$PATH:$HOME/riscv-none-embed-gcc/bin
```

Install [srv32](https://github.com/sysprog21/srv32):

```bash=
git clone https://github.com/sysprog21/srv32
```

Install gtkwave:

```bash=
sudo apt install gtkwave
```

Put your file in srv32/sw and build it with make:

```bash=
# assumes srv32 was cloned into $HOME
mkdir -p $HOME/srv32/sw/hw3
mv hw3.c $HOME/srv32/sw/hw3
cd $HOME/srv32
make hw3
```

## First requirement

[First Assignment](https://hackmd.io/@IVUP6rGJQAmUMyQytt73_g/Hkr7EhUBK)

Original code:

```c=
#include <stdio.h>

int main(void) {
    int nums[4] = {2, 7, 11, 15}, target = 9, len = 4;
    /* Brute force: try every pair (i, j) with i < j. */
    for (int i = 0; i < len; i++) {
        int complement = target - nums[i];
        for (int j = i + 1; j < len; j++) {
            if (nums[j] == complement) {
                printf("%d %d\n", i, j);
            }
        }
    }
    return 0;
}
```

### Build twosum with make

![](https://i.imgur.com/kVAzrpx.png)

## Second requirement

### GTKWave

From the waveform, it looks like a 4-stage pipeline to me. The 4 stages are FETCH, ID, EX, and MEM/WB.

![](https://i.imgur.com/Am0jFj4.png)

The diagram is shown below.

![](https://i.imgur.com/DrEZnVo.png)

### Data Hazard

There are 3 kinds of data hazard: RAW, WAW, and WAR. In srv32, only RAW hazards can occur. However, srv32 supports full forwarding, so RAW hazards are resolved as well.

### Load-use data hazard

In a 5-stage pipeline, registers must be read at the ID stage. A load followed immediately by an instruction that uses the loaded value causes a stall, because the value can only be forwarded to the ID stage from the WB stage. In srv32, however, the register is read at the WB stage, which resolves the load-use data hazard.

### Control hazard

The GTKWave trace and the assembly code are shown below. There is a branch at 00000028 that jumps to 00000020.
The flush happens 3 cycles later, because the branch is not resolved until after the EX stage. The WB flush for 0000002C then happens one cycle later. Two instructions are killed, and the flush is triggered for two cycles.

![](https://i.imgur.com/JN1CkF3.png)

![](https://i.imgur.com/EzKKVOb.png)

## Third requirement

Optimized code:

```c=
#include <stdio.h>

int main(void) {
    int nums[4] = {2, 7, 11, 15}, target = 9, len = 4;
    /* table[v] holds (index of v) + 1; 0 means v has not been seen yet.
       Assumes every value and complement is in the range 0..99. */
    int table[100] = {0};
    for (int i = 0; i < len; i++) {
        int complement = target - nums[i];
        if (table[complement] != 0) {
            printf("%d %d\n", table[complement] - 1, i);
            break;
        } else {
            table[nums[i]] = i + 1;
        }
    }
    return 0;
}
```

The benchmark result is shown below. The optimized code replaces the nested loop with a single pass over a lookup table, so it executes fewer instructions and takes fewer cycles.

![](https://i.imgur.com/nxSHZWg.png)

## Fourth requirement

The source code is compiled with the embedded gcc, which produces the bin and elf files. These files are then run on the CPU simulator, which produces the benchmark result and the wave.fst file. The signature records the values of the operations carried out in the test. Verilator is the simulator that simulates the execution of the RISC-V core.
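
Going back to the lookup-table version from the third requirement: it indexes `table` directly with `complement` and `nums[i]`, which works for the sample input but would access memory out of bounds if a complement were negative or a value were 100 or larger. Below is a minimal sketch of a guarded variant, only as an illustration and not the code that was benchmarked. The names `two_sum_table`, `in_range`, and `TABLE_SIZE` are hypothetical, and the 0..99 value range is an assumption carried over from `table[100]`.

```c
#define TABLE_SIZE 100 /* assumed value range: 0 .. TABLE_SIZE - 1 */

/* Hypothetical helper: 1 if v is a valid index into the table, else 0. */
static int in_range(int v) {
    return v >= 0 && v < TABLE_SIZE;
}

/* Hypothetical function: looks for two indices whose values sum to target.
   Returns 1 and stores the indices in *first and *second if a pair is found,
   otherwise returns 0 and leaves them untouched. */
int two_sum_table(const int *nums, int len, int target, int *first, int *second) {
    int table[TABLE_SIZE] = {0}; /* table[v] = (index of v) + 1; 0 = unseen */

    for (int i = 0; i < len; i++) {
        int complement = target - nums[i];
        /* Only look up the complement when it is a valid table index. */
        if (in_range(complement) && table[complement] != 0) {
            *first = table[complement] - 1;
            *second = i;
            return 1;
        }
        /* Only record values that fit in the table. */
        if (in_range(nums[i])) {
            table[nums[i]] = i + 1;
        }
    }
    return 0;
}
```

For the sample input, `two_sum_table(nums, 4, 9, &i, &j)` returns 1 with `i == 0` and `j == 1`. The extra range checks add work to each iteration, so the instruction and cycle counts would probably differ a little from the benchmark above.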