# Computer-Architecture term_project riscv SOC

# pre-work

On the afternoon of the day I chose this subject, I dropped two of my classes, because I knew that I wouldn't have much fun in the next month and that I needed a lot of time to complete this term_project.

## sample

[從零開始的RISC-V SoC架構設計與Linux核心運行 - 硬體篇](https://hackmd.io/@w4K9apQGS8-NFtsnFXutfg/B1Re5uGa5) (RISC-V SoC architecture design and running the Linux kernel from scratch, hardware part)

## VIVADO

[guide](https://www.youtube.com/watch?v=fBFn32Al0yw)

Install VIVADO (please use the ML version; you do not need a license). **Please use a version released after 2020.** The 2018 version kept giving me corrupted files on download (personal experience, I had to redo it three times).

## VSCODE

I use VSCODE to edit my code.
(notepad is super bad)

## install make on windows

[follow this](https://blog.csdn.net/weixin_45903371/article/details/113886121)

## fpga ARTY_A7_100T

Be sure to use the vivado version mentioned above.

Specification req:

![](https://i.imgur.com/8cERv7j.png)

[board file](https://digilent.com/reference/programmable-logic/guides/installing-vivado-and-sdk) (the page includes guidance)

## References

[ithome](https://ithelp.ithome.com.tw/users/20141480/ironman/4772)

[bilibili](https://www.bilibili.com/video/BV1Ve411x75W/?spm_id_from=333.337.search-card.all.click&vd_source=03de631ca969e8af97eafc9d8d816f56) Well, I know learning how to write verilog on this web site sounds funny, but I'M VERY BAD IN english :(

[datasheet](https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf) It is very useful; you don't have to memorize the instruction types when this document is on your computer.

[exp from github](https://github.com/ucb-bar/riscv-mini)

[book1](https://www.books.com.tw/products/0010933946) (this book is very useful for learning some basic knowledge) **MUST READ CH1-CH3 VERY CAREFULLY**

![](https://i.imgur.com/sROhDA6.jpg)

[book2](https://www.books.com.tw/products/CN11542677)

[github exp](https://github.com/yutongshen/RISC-V_SoC)

[stackoverflow](https://stackoverflow.com/)

**A 5-STAGE_cpu code from my friend.(x86)**

**A NTU EE MASTER friend**

**A lot of money (fpga board is very expensive QQ).**

[handshakes's RTL](https://www.cnblogs.com/mikewolf2002/p/11345401.html)

## first of all

I made two versions on the CPU side (yes, you read that right). Since I didn't know how to write verilog, I trained myself from scratch. At the beginning, I wrote a 3-stage pipeline CPU that can execute some RV32I instructions (everything except the load and store types). Then I tried to write a three-stage pipeline CPU, tried the SOC, and successfully executed the RV32I tests on ModelSim.
Then came the final version. I tried to add division and multiplication instructions and added some SOC peripherals. It is simulated through two top-level files: one integrates all the components of the CPU, and the other connects the SOC and the CPU. The instruction tests run successfully on windows, and the bitstream is successfully produced in vivado.

BUT!!!!!! **My board hasn't arrived yet**

![](https://i.imgur.com/SvBNHKD.jpg)

Alright, a little too much nonsense. Just like the teacher said: ***"talk is meaningless, show me the code!"***

# The first version

## Purpose: Pass the instruction set test of risc-v RV32I

### generated (take addi_test as example)

We turn this rv32i code into a binary file and use it to test the instruction fetch, decode, exe, and mem_op abilities of our RTL code.

```
generated/rv32ui-p-addi:     file format elf32-littleriscv

Disassembly of section .text.init:

00000000 <_start>:
   0:   00000d13    li    s10,0
   4:   00000d93    li    s11,0

00000008 <test_2>:
   8:   00000093    li    ra,0
   c:   00008f13    mv    t5,ra
  10:   00000e93    li    t4,0
  14:   00200193    li    gp,2
  18:   27df1c63    bne   t5,t4,290 <fail>

0000001c <test_3>:
  1c:   00100093    li    ra,1
  20:   00108f13    addi  t5,ra,1
  24:   00200e93    li    t4,2
  28:   00300193    li    gp,3
  2c:   27df1263    bne   t5,t4,290 <fail>
...
```

## RTL introduction (cpu)

The address of the next instruction is determined by pc_reg.

```
module pc_reg(
    input  wire       clk,
    input  wire       rst,
    input  wire[31:0] jump_addr_i,
    input  wire       jump_en,
    output reg[31:0]  pc_o
);

    always @(posedge clk) begin
        if (rst == 1'b0)              // reset is active low
            pc_o <= 32'b0;
        else if (jump_en)
            pc_o <= jump_addr_i;
        else
            pc_o <= pc_o + 32'd4;     // next sequential instruction
    end

endmodule
```

## if_id

```
module if_id(
    input  wire        clk,
    input  wire        rst,
    input  wire [31:0] inst_i,
    input  wire        hold_flag_i,
    input  wire [31:0] inst_addr_i,
    output wire[31:0]  inst_addr_o,
    output wire[31:0]  inst_o
);

    reg rom_flag;

    always @(posedge clk) begin
        if (!rst | hold_flag_i)
            rom_flag <= 1'b0;
        else
            rom_flag <= 1'b1;
    end

    // if flag == 1, pass the rom output through; otherwise insert a NOP
    assign inst_o = rom_flag ? inst_i : `INST_NOP;

    dff_set #(32) dff2(clk, rst, hold_flag_i, 32'b0, inst_addr_i, inst_addr_o);

endmodule
```

## id

The opcode determines the instruction type. The function codes (func3, func7) then select the specific instruction, and the corresponding signals are generated for the next stage.

```
`include "defines.v"

module id(
    // from if_id
    input  wire[31:0] inst_i,
    input  wire[31:0] inst_addr_i,
    // to regs
    output reg[4:0]   rs1_addr_o,
    output reg[4:0]   rs2_addr_o,
    // from regs
    input  wire[31:0] rs1_data_i,
    input  wire[31:0] rs2_data_i,
    // to id_ex
    output reg[31:0]  inst_o,
    output reg[31:0]  inst_addr_o,
    output reg[31:0]  op1_o,
    output reg[31:0]  op2_o,
    output reg[4:0]   rd_addr_o,
    output reg        reg_wen,
    output reg[31:0]  base_addr_o,
    output reg[31:0]  addr_offset_o,
    // to mem read
    output reg        mem_rd_req_o,
    output reg[31:0]  mem_rd_addr_o
);

    wire[6:0]  opcode;
    wire[4:0]  rd;
    wire[2:0]  func3;
    wire[4:0]  rs1;
    wire[4:0]  rs2;
    wire[6:0]  func7;
    wire[11:0] imm;
    wire[4:0]  shamt;

    assign opcode = inst_i[6:0];
    assign rd     = inst_i[11:7];
    assign func3  = inst_i[14:12];
    assign rs1    = inst_i[19:15];
    assign rs2    = inst_i[24:20];
    assign func7  = inst_i[31:25];
    assign imm    = inst_i[31:20];
    assign shamt  = inst_i[24:20];

    always @(*) begin
        inst_o      = inst_i;
        inst_addr_o = inst_addr_i;
        case (opcode)
            `INST_TYPE_I: begin
                mem_rd_req_o =
                    1'b0;
                mem_rd_addr_o = 32'b0;
                base_addr_o   = 32'b0;
                addr_offset_o = 32'b0;
                case (func3)
                    `INST_ADDI, `INST_SLTI, `INST_SLTIU, `INST_XORI, `INST_ORI, `INST_ANDI: begin
                        rs1_addr_o = rs1;
                        rs2_addr_o = 5'b0;
                        op1_o      = rs1_data_i;
                        op2_o      = {{20{imm[11]}}, imm};
                        rd_addr_o  = rd;
                        reg_wen    = 1'b1;
                    end
                    `INST_SLLI, `INST_SRI: begin
                        rs1_addr_o = rs1;
                        rs2_addr_o = 5'b0;
                        op1_o      = rs1_data_i;
                        op2_o      = {27'b0, shamt};
                        rd_addr_o  = rd;
                        reg_wen    = 1'b1;
                    end
                    default: begin
                        rs1_addr_o = 5'b0;
                        rs2_addr_o = 5'b0;
                        op1_o      = 32'b0;
                        op2_o      = 32'b0;
                        rd_addr_o  = 5'b0;
                        reg_wen    = 1'b0;
                    end
                endcase
            end
            `INST_TYPE_R_M: begin
                mem_rd_req_o  = 1'b0;
                mem_rd_addr_o = 32'b0;
                base_addr_o   = 32'b0;
                addr_offset_o = 32'b0;
                case (func3)
                    `INST_ADD_SUB, `INST_SLT, `INST_SLTU, `INST_XOR, `INST_OR, `INST_AND: begin
                        rs1_addr_o = rs1;
                        rs2_addr_o = rs2;
                        op1_o      = rs1_data_i;
                        op2_o      = rs2_data_i;
                        rd_addr_o  = rd;
                        reg_wen    = 1'b1;
                    end
                    `INST_SLL, `INST_SR: begin
                        rs1_addr_o = rs1;
                        rs2_addr_o = rs2;
                        op1_o      = rs1_data_i;
                        op2_o      = {27'b0, rs2_data_i[4:0]};
                        rd_addr_o  = rd;
                        reg_wen    = 1'b1;
                    end
                    default: begin
                        rs1_addr_o = 5'b0;
                        rs2_addr_o = 5'b0;
                        op1_o      = 32'b0;
                        op2_o      = 32'b0;
                        rd_addr_o  = 5'b0;
                        reg_wen    = 1'b0;
                    end
                endcase
            end
            `INST_TYPE_B: begin
                mem_rd_req_o  = 1'b0;
                mem_rd_addr_o = 32'b0;
                case (func3)
                    `INST_BNE, `INST_BEQ, `INST_BLT, `INST_BGE, `INST_BLTU, `INST_BGEU: begin
                        rs1_addr_o    = rs1;
                        rs2_addr_o    = rs2;
                        op1_o         = rs1_data_i;
                        op2_o         = rs2_data_i;
                        rd_addr_o     = 5'b0;
                        reg_wen       = 1'b0;
                        base_addr_o   = inst_addr_i;
                        addr_offset_o = {{19{inst_i[31]}}, inst_i[31], inst_i[7], inst_i[30:25], inst_i[11:8], 1'b0};
                    end
                    default: begin
                        rs1_addr_o    = 5'b0;
                        rs2_addr_o    = 5'b0;
                        op1_o         = 32'b0;
                        op2_o         = 32'b0;
                        rd_addr_o     = 5'b0;
                        reg_wen       = 1'b0;
                        base_addr_o   = 32'b0;
                        addr_offset_o = 32'b0;
                    end
                endcase
            end
            `INST_TYPE_L: begin
                case (func3)
                    `INST_LW, `INST_LH, `INST_LB, `INST_LHU, `INST_LBU: begin
                        mem_rd_req_o  = 1'b1;
                        mem_rd_addr_o = rs1_data_i + {{20{imm[11]}}, imm};
                        rs1_addr_o    = rs1;
                        rs2_addr_o    = 5'b0;
                        op1_o         = 32'b0;
                        op2_o         = 32'b0;
                        rd_addr_o     = rd;
                        reg_wen       = 1'b1;
                        base_addr_o   = rs1_data_i;
                        addr_offset_o =
                            {{20{imm[11]}}, imm};
                    end
                    default: begin
                        mem_rd_req_o  = 1'b0;
                        mem_rd_addr_o = 32'b0;
                        rs1_addr_o    = 5'b0;
                        rs2_addr_o    = 5'b0;
                        op1_o         = 32'b0;
                        op2_o         = 32'b0;
                        rd_addr_o     = 5'b0;
                        reg_wen       = 1'b0;
                        // also drive these two here so that no latch is inferred
                        base_addr_o   = 32'b0;
                        addr_offset_o = 32'b0;
                    end
                endcase
            end
            `INST_TYPE_S: begin
                case (func3)
                    `INST_SW, `INST_SH, `INST_SB: begin
                        mem_rd_req_o  = 1'b0;
                        mem_rd_addr_o = 32'b0;
                        rs1_addr_o    = rs1;
                        rs2_addr_o    = rs2;
                        op1_o         = 32'b0;
                        op2_o         = rs2_data_i;
                        rd_addr_o     = 5'b0;
                        reg_wen       = 1'b0;
                        base_addr_o   = rs1_data_i;
                        addr_offset_o = {{20{inst_i[31]}}, inst_i[31:25], inst_i[11:7]};
                    end
                    default: begin
                        mem_rd_req_o  = 1'b0;
                        mem_rd_addr_o = 32'b0;
                        rs1_addr_o    = 5'b0;
                        rs2_addr_o    = 5'b0;
                        op1_o         = 32'b0;
                        op2_o         = 32'b0;
                        rd_addr_o     = 5'b0;
                        reg_wen       = 1'b0;
                        base_addr_o   = 32'b0;
                        addr_offset_o = 32'b0;
                    end
                endcase
            end
            `INST_JAL: begin
                mem_rd_req_o  = 1'b0;
                mem_rd_addr_o = 32'b0;
                rs1_addr_o    = 5'b0;
                rs2_addr_o    = 5'b0;
                op1_o         = inst_addr_i;
                op2_o         = 32'h4;
                rd_addr_o     = rd;
                reg_wen       = 1'b1;
                base_addr_o   = inst_addr_i;
                addr_offset_o = {{12{inst_i[31]}}, inst_i[19:12], inst_i[20], inst_i[30:21], 1'b0};
            end
            `INST_LUI: begin
                mem_rd_req_o  = 1'b0;
                mem_rd_addr_o = 32'b0;
                rs1_addr_o    = 5'b0;
                rs2_addr_o    = 5'b0;
                op1_o         = {inst_i[31:12], 12'b0};
                op2_o         = 32'b0;
                rd_addr_o     = rd;
                reg_wen       = 1'b1;
                base_addr_o   = 32'b0;
                addr_offset_o = 32'b0;
            end
            `INST_JALR: begin
                mem_rd_req_o  = 1'b0;
                mem_rd_addr_o = 32'b0;
                rs1_addr_o    = rs1;
                rs2_addr_o    = 5'b0;
                op1_o         = inst_addr_i;
                op2_o         = 32'h4;
                rd_addr_o     = rd;
                reg_wen       = 1'b1;
                base_addr_o   = rs1_data_i;
                addr_offset_o = {{20{imm[11]}}, imm};
            end
            `INST_AUIPC: begin
                mem_rd_req_o  = 1'b0;
                mem_rd_addr_o = 32'b0;
                rs1_addr_o    = 5'b0;
                rs2_addr_o    = 5'b0;
                op1_o         = {inst_i[31:12], 12'b0};
                op2_o         = inst_addr_i;
                rd_addr_o     = rd;
                reg_wen       = 1'b1;
                base_addr_o   = 32'b0;
                addr_offset_o = 32'b0;
            end
            default: begin
                mem_rd_req_o  = 1'b0;
                mem_rd_addr_o = 32'b0;
                rs1_addr_o    = 5'b0;
                rs2_addr_o    = 5'b0;
                op1_o         = 32'b0;
                op2_o         = 32'b0;
                rd_addr_o     = 5'b0;
                reg_wen       = 1'b0;
                base_addr_o   = 32'b0;
                addr_offset_o = 32'b0;
            end
        endcase
    end

endmodule
```

id_ex

```
module id_ex(
    input wire
                      clk,
    input  wire       rst,
    // from id
    input  wire[31:0] inst_i,
    input  wire[31:0] inst_addr_i,
    input  wire[31:0] op1_i,
    input  wire[31:0] op2_i,
    input  wire[4:0]  rd_addr_i,
    input  wire       reg_wen_i,
    input  wire[31:0] base_addr_i,
    input  wire[31:0] addr_offset_i,
    // from ctrl
    input  wire       hold_flag_i,
    // to ex
    output wire[31:0] inst_o,
    output wire[31:0] inst_addr_o,
    output wire[31:0] op1_o,
    output wire[31:0] op2_o,
    output wire[4:0]  rd_addr_o,
    output wire[31:0] base_addr_o,
    output wire       reg_wen_o,
    output wire[31:0] addr_offset_o
);

    dff_set #(32) dff1(clk, rst, hold_flag_i, `INST_NOP, inst_i, inst_o);
    dff_set #(32) dff2(clk, rst, hold_flag_i, 32'b0, inst_addr_i, inst_addr_o);
    dff_set #(32) dff3(clk, rst, hold_flag_i, 32'b0, op1_i, op1_o);
    dff_set #(32) dff4(clk, rst, hold_flag_i, 32'b0, op2_i, op2_o);
    dff_set #(5)  dff5(clk, rst, hold_flag_i, 5'b0, rd_addr_i, rd_addr_o);
    dff_set #(1)  dff6(clk, rst, hold_flag_i, 1'b0, reg_wen_i, reg_wen_o);
    dff_set #(32) dff7(clk, rst, hold_flag_i, 32'b0, base_addr_i, base_addr_o);
    dff_set #(32) dff8(clk, rst, hold_flag_i, 32'b0, addr_offset_i, addr_offset_o);

endmodule
```

## ex

At first I wrote the calculations directly into the logic without declaring them beforehand, and then my friends said I was stupid XD. I later verified that doing it that way slows down the overall CPU performance.

```
module ex(
    // from id_ex
    input  wire[31:0] inst_i,
    input  wire[31:0] inst_addr_i,
    input  wire[31:0] op1_i,
    input  wire[31:0] op2_i,
    input  wire[4:0]  rd_addr_i,
    input  wire       rd_wen_i,
    input  wire[31:0] base_addr_i,
    input  wire[31:0] addr_offset_i,
    // to regs
    output reg[4:0]   rd_addr_o,
    output reg[31:0]  rd_data_o,
    output reg        rd_wen_o,
    // to ctrl
    output reg[31:0]  jump_addr_o,
    output reg        jump_en_o,
    output reg        hold_flag_o,
    // to mem write
    output reg        mem_wr_req_o,
    output reg[3:0]   mem_wr_sel_o,
    output reg[31:0]  mem_wr_addr_o,
    output reg[31:0]  mem_wr_data_o,
    // from mem read
    input  wire[31:0] mem_rd_data_i
);

    wire[6:0]  opcode;
    wire[4:0]  rd;
    wire[2:0]  func3;
    wire[4:0]  rs1;
    wire[4:0]  rs2;
    wire[6:0]  func7;
    wire[11:0] imm;
    wire[4:0]  shamt;

    assign opcode = inst_i[6:0];
    assign rd     = inst_i[11:7];
    assign func3  = inst_i[14:12];
    assign func7  = inst_i[31:25];
    assign rs1    = inst_i[19:15];
    assign rs2    = inst_i[24:20];
    assign imm    = inst_i[31:20];
    assign shamt  = inst_i[24:20];

    // branch
    //wire[31:0] jump_imm={{19{inst_i[31]}},inst_i[31],inst_i[7],inst_i[30:25],inst_i[11:8],1'b0};
    wire op1_i_equal_op2_i;
    wire op1_i_less_op2_i_signed;
    wire op1_i_less_op2_i_unsigned;

    assign op1_i_less_op2_i_signed   = ($signed(op1_i) < $signed(op2_i)) ? 1'b1 : 1'b0;
    assign op1_i_less_op2_i_unsigned = (op1_i < op2_i) ? 1'b1 : 1'b0;
    assign op1_i_equal_op2_i         = (op1_i == op2_i) ? 1'b1 : 1'b0;

    // logic units
    wire[31:0] op1_i_add_op2_i;
    wire[31:0] op1_i_and_op2_i;
    wire[31:0] op1_i_xor_op2_i;
    wire[31:0] op1_i_or_op2_i;
    wire[31:0] op1_i_shift_left_op2_i;
    wire[31:0] op1_i_shift_right_op2_i;
    wire[31:0] base_addr_add_addr_offset;

    assign op1_i_add_op2_i           = op1_i + op2_i;
    assign op1_i_and_op2_i           = op1_i & op2_i;
    assign op1_i_xor_op2_i           = op1_i ^ op2_i;
    assign op1_i_or_op2_i            = op1_i | op2_i;
    assign op1_i_shift_left_op2_i    = op1_i << op2_i;
    assign op1_i_shift_right_op2_i   = op1_i >> op2_i;
    assign base_addr_add_addr_offset = base_addr_i + addr_offset_i;

    // type I
    wire[31:0] SRA_mask;
    assign SRA_mask = (32'hffff_ffff) >> op2_i[4:0];

    wire[1:0] store_index = base_addr_add_addr_offset[1:0];
    wire[1:0] load_index  = base_addr_add_addr_offset[1:0];
```

```
            `INST_TYPE_I: begin
                jump_addr_o   = 32'b0;   // write wen
                jump_en_o     = 1'b0;
                hold_flag_o   = 1'b0;
                mem_wr_req_o  = 1'b0;
                mem_wr_sel_o  = 4'b0;
                mem_wr_addr_o = 32'b0;
                mem_wr_data_o = 32'b0;
                case (func3)
                    `INST_ADDI: begin   // same instruction structure
                        rd_data_o = op1_i_add_op2_i;
                        rd_addr_o = rd_addr_i;
                        rd_wen_o  = 1'b1;
                    end
                    `INST_SLTI: begin
                        // 31 zero bits + 1 result bit = 32 bits
                        rd_data_o = {31'b0, op1_i_less_op2_i_signed};
                        rd_addr_o = rd_addr_i;
                        rd_wen_o  = 1'b1;
                    end
                    `INST_SLTIU: begin
                        rd_data_o = {31'b0, op1_i_less_op2_i_unsigned};
                        rd_addr_o = rd_addr_i;
                        rd_wen_o  = 1'b1;
                    end
                    `INST_XORI: begin
                        rd_data_o = op1_i_xor_op2_i;
                        rd_addr_o = rd_addr_i;
                        rd_wen_o  = 1'b1;
                    end
                    `INST_ORI: begin
                        rd_data_o = op1_i_or_op2_i;
                        rd_addr_o = rd_addr_i;
                        rd_wen_o  = 1'b1;
                    end
                    `INST_ANDI: begin
                        rd_data_o = op1_i_and_op2_i;
                        rd_addr_o = rd_addr_i;
                        rd_wen_o  = 1'b1;
                    end
                    `INST_SLLI: begin
                        rd_data_o = op1_i_shift_left_op2_i;
                        rd_addr_o = rd_addr_i;
                        rd_wen_o  = 1'b1;
                    end
                    `INST_SRI: begin
                        if (func7[5] == 1'b1) begin   // SRAI (only 5 bit limit)
                            rd_data_o = ((op1_i_shift_right_op2_i) & SRA_mask) | ({32{op1_i[31]}} & (~SRA_mask));
                            rd_addr_o = rd_addr_i;
                            rd_wen_o  = 1'b1;
                        end
                        else begin                    // SRLI
                            rd_data_o = op1_i_shift_right_op2_i;
                            rd_addr_o = rd_addr_i;
                            rd_wen_o  = 1'b1;
                        end
                    end
                    default: begin
                        rd_data_o = 32'b0;
                        rd_addr_o = 5'b0;
                        rd_wen_o  = 1'b0;
                    end
                endcase
            end
```

ram

Used as an external device in the first version.

```
module ram(
    input  wire          clk,
    input  wire          rst,
    input  wire [3:0]    wen,
    input  wire [32-1:0] w_addr_i,
    input  wire [32-1:0] w_data_i,
    input  wire          ren,
    input  wire [32-1:0] r_addr_i,
    output wire [32-1:0] r_data_o
);

    wire[11:0] w_addr = w_addr_i[13:2];
    wire[11:0] r_addr = r_addr_i[13:2];

    dual_ram #(
        .DW(8),
        .AW(12),
        .MEM_NUM(4096)
    ) ram_byte0 (
        .clk      (clk),
        .rst      (rst),
        .wen      (wen[0]),
        .w_addr_i (w_addr),
        .w_data_i (w_data_i[7:0]),
        .ren      (ren),
        .r_addr_i (r_addr),
        .r_data_o (r_data_o[7:0])
    );

    dual_ram #(
        .DW(8),
        .AW(12),
        .MEM_NUM(4096)
    ) ram_byte1 (
        .clk      (clk),
        .rst      (rst),
        .wen      (wen[1]),
        .w_addr_i (w_addr),
        .w_data_i (w_data_i[15:8]),
        .ren      (ren),
        .r_addr_i (r_addr),
        .r_data_o (r_data_o[15:8])
    );

    dual_ram #(
        .DW(8),
        .AW(12),
        .MEM_NUM(4096)
    ) ram_byte2 (
        .clk      (clk),
        .rst      (rst),
        .wen      (wen[2]),
        .w_addr_i (w_addr),
        .w_data_i (w_data_i[23:16]),
        .ren      (ren),
        .r_addr_i (r_addr),
        .r_data_o (r_data_o[23:16])
    );

    dual_ram #(
        .DW(8),
        .AW(12),
        .MEM_NUM(4096)
    ) ram_byte3 (
        .clk      (clk),
        .rst      (rst),
        .wen      (wen[3]),
        .w_addr_i (w_addr),
        .w_data_i (w_data_i[31:24]),
        .ren      (ren),
        .r_addr_i (r_addr),
        .r_data_o (r_data_o[31:24])
    );

endmodule
```

rom

Used as an external device in the first version.
```
module rom(
    input  wire         clk,
    input  wire         rst,
    input  wire         wen,
    input  wire[32-1:0] w_addr_i,
    input  wire[32-1:0] w_data_i,
    input  wire         ren,
    input  wire[32-1:0] r_addr_i,
    output wire[32-1:0] r_data_o
);

    wire[11:0] w_addr = w_addr_i[13:2];
    wire[11:0] r_addr = r_addr_i[13:2];

    dual_ram #(
        .DW(32),
        .AW(12),
        .MEM_NUM(4096)
    ) rom_32bit (
        .clk      (clk),
        .rst      (rst),
        .wen      (wen),
        .w_addr_i (w_addr),
        .w_data_i (w_data_i),
        .ren      (ren),
        .r_addr_i (r_addr),
        .r_data_o (r_data_o)
    );

endmodule
```

ctrl

Controls the jump of B_TYPE instructions.

```
module ctrl (
    input  wire[31:0] jump_addr_i,
    input  wire       jump_en_i,
    input  wire       hold_flag_ex_i,
    output reg[31:0]  jump_addr_o,
    output reg        jump_en_o,
    output reg        hold_flag_o
);

    always @(*) begin
        jump_addr_o = jump_addr_i;
        jump_en_o   = jump_en_i;
        if (jump_en_i || hold_flag_ex_i) begin
            hold_flag_o = 1'b1;
        end
        else begin
            hold_flag_o = 1'b0;
        end
    end

endmodule
```

### risc-v top

Connects each module.

```
module open_risc_v(
    input  wire        clk,
    input  wire        rst,
    // inst
    input  wire [31:0] inst_i,
    output wire [31:0] inst_addr_o,
    // read mem
    output wire        mem_rd_req_o,
    output wire [31:0] mem_rd_addr_o,
    input  wire [31:0] mem_rd_data_i,
    // write mem
    output wire        mem_wr_req_o,
    output wire [3:0]  mem_wr_sel_o,
    output wire [31:0] mem_wr_addr_o,
    output wire [31:0] mem_wr_data_o
);

    // pc to rom
    wire[31:0] pc_reg_pc_o;
    assign inst_addr_o = pc_reg_pc_o;

    // if to if_id
    wire[31:0] if_inst_addr_o;
    wire[31:0] if_inst_o;

    // if_id to id
    wire[31:0] if_id_inst_addr_o;
    wire[31:0] if_id_inst_o;

    // ex to regs
    wire[4:0]  ex_rd_addr_o;
    wire[31:0] ex_rd_data_o;
    wire       ex_reg_wen_o;

    // id to regs
    wire[4:0]  id_rs1_addr_o;
    wire[4:0]  id_rs2_addr_o;

    // id to id_ex
    wire[31:0] id_inst_o;
    wire[31:0] id_inst_addr_o;
    wire[31:0] id_op1_o;
    wire[31:0] id_op2_o;
    wire[4:0]  id_rd_addr_o;
    wire       id_reg_wen;
    wire[31:0] id_base_addr_o;
    wire[31:0] id_addr_offset_o;

    // regs to id
    wire[31:0] regs_reg1_rdata_o;
    wire[31:0] regs_reg2_rdata_o;

    // id_ex to ex
    wire[31:0] id_ex_inst_o;
    wire[31:0] id_ex_inst_addr_o;
    wire[31:0] id_ex_op1_o;
    wire[31:0] id_ex_op2_o;
    wire[4:0]  id_ex_rd_addr_o;
    wire       id_ex_reg_wen;
    wire[31:0] id_ex_base_addr_o;
    wire[31:0] id_ex_addr_offset_o;

    // ex to ctrl
    wire[31:0] ex_jump_addr_o;
    wire       ex_jump_en_o;
    wire       ex_hold_flag_o;

    // ctrl to pc_reg
    wire[31:0] ctrl_jump_addr_o;
    wire       ctrl_jump_en_o;

    // ctrl to if_id id_ex
    wire       ctrl_hold_flag_o;

    pc_reg pc_reg_inst(
        .clk         (clk),
        .rst         (rst),
        .jump_addr_i (ctrl_jump_addr_o),
        .jump_en     (ctrl_jump_en_o),
        .pc_o        (pc_reg_pc_o)
    );

    if_id if_id_inst(
        .clk         (clk),
        .rst         (rst),
        .hold_flag_i (ctrl_hold_flag_o),
        .inst_i      (inst_i),
        .inst_addr_i (pc_reg_pc_o),
        .inst_addr_o (if_id_inst_addr_o),
        .inst_o      (if_id_inst_o)
    );

    // id to rom
    id id_inst(
        .inst_i        (if_id_inst_o),
        .inst_addr_i   (if_id_inst_addr_o),
        .rs1_addr_o    (id_rs1_addr_o),
        .rs2_addr_o    (id_rs2_addr_o),
        .rs1_data_i    (regs_reg1_rdata_o),
        .rs2_data_i    (regs_reg2_rdata_o),
        .inst_o        (id_inst_o),
        .inst_addr_o   (id_inst_addr_o),
        .op1_o         (id_op1_o),
        .op2_o         (id_op2_o),
        .rd_addr_o     (id_rd_addr_o),
        .reg_wen       (id_reg_wen),
        .base_addr_o   (id_base_addr_o),
        .addr_offset_o (id_addr_offset_o),
        .mem_rd_req_o  (mem_rd_req_o),
        .mem_rd_addr_o (mem_rd_addr_o)
    );

    regs regs_inst(
        .clk          (clk),
        .rst          (rst),
        .reg1_raddr_i (id_rs1_addr_o),
        .reg2_raddr_i (id_rs2_addr_o),
        .reg1_rdata_o (regs_reg1_rdata_o),
        .reg2_rdata_o (regs_reg2_rdata_o),
        .reg_waddr_i  (ex_rd_addr_o),
        .reg_wdata_i  (ex_rd_data_o),
        .reg_wen      (ex_reg_wen_o)
    );

    id_ex id_ex_inst(
        .clk           (clk),
        .rst           (rst),
        .hold_flag_i   (ctrl_hold_flag_o),
        .inst_i        (id_inst_o),
        .inst_addr_i   (id_inst_addr_o),
        .op1_i         (id_op1_o),
        .op2_i         (id_op2_o),
        .rd_addr_i     (id_rd_addr_o),
        .reg_wen_i     (id_reg_wen),
        .base_addr_i   (id_base_addr_o),
        .addr_offset_i (id_addr_offset_o),
        .inst_o        (id_ex_inst_o),
        .inst_addr_o   (id_ex_inst_addr_o),
        .op1_o         (id_ex_op1_o),
        .op2_o         (id_ex_op2_o),
        .rd_addr_o     (id_ex_rd_addr_o),
        .reg_wen_o     (id_ex_reg_wen),
        .base_addr_o   (id_ex_base_addr_o),
        .addr_offset_o (id_ex_addr_offset_o)
    );

    ex ex_inst(
        .inst_i        (id_ex_inst_o),
        .inst_addr_i
                       (id_ex_inst_addr_o),
        .op1_i         (id_ex_op1_o),
        .op2_i         (id_ex_op2_o),
        .rd_addr_i     (id_ex_rd_addr_o),
        .rd_wen_i      (id_ex_reg_wen),
        .base_addr_i   (id_ex_base_addr_o),
        .addr_offset_i (id_ex_addr_offset_o),
        .rd_addr_o     (ex_rd_addr_o),
        .rd_data_o     (ex_rd_data_o),
        .rd_wen_o      (ex_reg_wen_o),
        .jump_addr_o   (ex_jump_addr_o),
        .jump_en_o     (ex_jump_en_o),
        .hold_flag_o   (ex_hold_flag_o),
        .mem_wr_req_o  (mem_wr_req_o),
        .mem_wr_sel_o  (mem_wr_sel_o),
        .mem_wr_addr_o (mem_wr_addr_o),
        .mem_wr_data_o (mem_wr_data_o),
        .mem_rd_data_i (mem_rd_data_i)
    );

    ctrl ctrl_inst(
        .jump_addr_i    (ex_jump_addr_o),
        .jump_en_i      (ex_jump_en_o),
        .hold_flag_ex_i (ex_hold_flag_o),
        .jump_addr_o    (ctrl_jump_addr_o),
        .jump_en_o      (ctrl_jump_en_o),
        .hold_flag_o    (ctrl_hold_flag_o)
    );

endmodule
```

### dual_ram

```
module dual_ram #(
    parameter DW      = 32,
    parameter AW      = 12,
    parameter MEM_NUM = 4096
)
(
    input  wire         clk,
    input  wire         rst,
    input  wire         wen,
    input  wire[AW-1:0] w_addr_i,
    input  wire[DW-1:0] w_data_i,
    input  wire         ren,
    input  wire[AW-1:0] r_addr_i,
    output wire[DW-1:0] r_data_o
);

    wire[DW-1:0] r_data_wire;
    reg          rd_equ_wr_flag;
    reg[DW-1:0]  w_data_reg;

    assign r_data_o = (rd_equ_wr_flag) ?
                      w_data_reg : r_data_wire;

    always @(posedge clk) begin
        if (!rst)
            w_data_reg <= 'b0;
        else
            w_data_reg <= w_data_i;
    end

    // switch: forward the written data when reading and writing the same address
    always @(posedge clk) begin
        if (rst && wen && ren && w_addr_i == r_addr_i)
            rd_equ_wr_flag <= 1'b1;
        else if (rst && ren)
            rd_equ_wr_flag <= 1'b0;
    end

    dual_ram_template #(
        .DW      (DW),
        .AW      (AW),
        .MEM_NUM (MEM_NUM)
    ) dual_ram_template_isnt (
        .clk      (clk),
        .rst      (rst),
        .wen      (wen),
        .w_addr_i (w_addr_i),
        .w_data_i (w_data_i),
        .ren      (ren),
        .r_addr_i (r_addr_i),
        .r_data_o (r_data_wire)
    );

endmodule

module dual_ram_template #(
    parameter DW      = 32,
    parameter AW      = 12,
    parameter MEM_NUM = 4096
)
(
    input  wire         clk,
    input  wire         rst,
    input  wire         wen,
    input  wire[AW-1:0] w_addr_i,
    input  wire[DW-1:0] w_data_i,
    input  wire         ren,
    input  wire[AW-1:0] r_addr_i,
    output reg[DW-1:0]  r_data_o
);

    reg[DW-1:0] memory[0:MEM_NUM-1];

    always @(posedge clk) begin
        if (rst && ren)
            r_data_o <= memory[r_addr_i];
    end

    always @(posedge clk) begin
        if (rst && wen)
            memory[w_addr_i] <= w_data_i;
    end

endmodule
```

### top_soc

Connects each module of the SOC.
```
module open_risc_v_soc(
    input  wire clk,
    input  wire rst,
    input  wire uart_rxd,
    input  wire debug_button,
    output wire led_debug,
    output wire led2
);

    // open_risc_v to rom
    wire[31:0] open_risc_v_inst_addr_o;

    // rom to open_risc_v
    wire[31:0] rom_inst_o;

    // open_risc_v to ram
    wire       open_risc_v_mem_wr_req_o;
    wire[3:0]  open_risc_v_mem_wr_sel_o;
    wire[31:0] open_risc_v_mem_wr_addr_o;
    wire[31:0] open_risc_v_mem_wr_data_o;
    wire       open_risc_v_mem_rd_req_o;
    wire[31:0] open_risc_v_mem_rd_addr_o;

    // ram to open_risc_v
    wire[31:0] ram_rd_data_o;

    // uart_debug to rom
    wire       uart_debug_ce;
    wire       uart_debug_wen;
    wire[31:0] uart_debug_addr_o;
    wire[31:0] uart_debug_data_o;

    // debug_button_debounce to debug
    wire debug;

    debug_button_debounce debug_button_debounce_inst(
        .clk          (clk),
        .rst          (rst),
        .debug_button (debug_button),
        .debug        (debug),
        .led_debug    (led_debug)
    );

    open_risc_v open_risc_v_inst(
        .clk           (clk),
        .rst           (rst),
        .inst_i        (rom_inst_o),
        .inst_addr_o   (open_risc_v_inst_addr_o),
        .mem_rd_req_o  (open_risc_v_mem_rd_req_o),
        .mem_rd_addr_o (open_risc_v_mem_rd_addr_o),
        .mem_rd_data_i (ram_rd_data_o),
        .mem_wr_req_o  (open_risc_v_mem_wr_req_o),
        .mem_wr_sel_o  (open_risc_v_mem_wr_sel_o),
        .mem_wr_addr_o (open_risc_v_mem_wr_addr_o),
        .mem_wr_data_o (open_risc_v_mem_wr_data_o)
    );

    assign led2 = open_risc_v_mem_wr_data_o[2];

    ram ram_inst(
        .clk      (clk),
        .rst      (rst),
        .wen      (open_risc_v_mem_wr_sel_o),
        .w_addr_i (open_risc_v_mem_wr_addr_o),
        .w_data_i (open_risc_v_mem_wr_data_o),
        .ren      (open_risc_v_mem_rd_req_o),
        .r_addr_i (open_risc_v_mem_rd_addr_o),
        .r_data_o (ram_rd_data_o)
    );

    rom rom_inst(
        .clk      (clk),
        .rst      (debug),
        .wen      (uart_debug_wen),   // ins_write
        .w_addr_i (uart_debug_addr_o),
        .w_data_i (uart_debug_data_o),
        .ren      (1'b1),             // ins_read
        .r_addr_i (open_risc_v_inst_addr_o),
        .r_data_o (rom_inst_o)
    );

    uart_debug uart_debug_inst(
        .clk      (clk),
        .debug    (debug),
        .uart_rxd (uart_rxd),
        .ce       (uart_debug_ce),
        .wen      (uart_debug_wen),
        .addr_o   (uart_debug_addr_o),
        .data_o   (uart_debug_data_o)
    );

endmodule
```

### testing insts

```
module tb;

    reg clk;
    reg rst;

    // 32-bit probe, so that the failing test number is not truncated to one bit
    wire[31:0] x3 =
        tb.open_risc_v_soc_inst.open_risc_v_inst.regs_inst.regs[3];
    wire[31:0] x26 = tb.open_risc_v_soc_inst.open_risc_v_inst.regs_inst.regs[26];
    wire[31:0] x27 = tb.open_risc_v_soc_inst.open_risc_v_inst.regs_inst.regs[27];

    always #10 clk = ~clk;

    initial begin
        clk <= 1'b1;
        rst <= 1'b0;
        #30;
        rst <= 1'b1;
    end

    // rom start_value
    initial begin
        $readmemh("./generated/rv32ui-p-lhu.txt", tb.open_risc_v_soc_inst.rom_inst.rom_32bit.dual_ram_template_isnt.memory);
    end

    // get wave
    initial begin
        $dumpfile("tb.vcd");
        $dumpvars(0, tb);
    end

    integer r;

    initial begin
        wait(x26 == 32'b1);
        #200;
        if (x27 == 32'b1) begin
            $display("############################");
            $display("########  pass  !!!#########");
            $display("############################");
        end
        else begin
            $display("############################");
            $display("########  fail  !!!#########");
            $display("############################");
            $display("fail testnum = %2d", x3);
            // dump all 32 registers, x0 to x31
            for (r = 0; r < 32; r = r + 1) begin
                $display("x%2d register value is %d", r, tb.open_risc_v_soc_inst.open_risc_v_inst.regs_inst.regs[r]);
            end
        end
        $finish;
    end

    open_risc_v_soc open_risc_v_soc_inst(
        .clk (clk),
        .rst (rst)
    );

endmodule
```

## SIM

### compile and sim

```
import os
import subprocess
import sys


def list_binfiles(path):
    # collect every .bin file below path
    files = []
    list_dir = os.walk(path)
    for maindir, subdir, all_file in list_dir:
        for filename in all_file:
            apath = os.path.join(maindir, filename)
            if apath.endswith('.bin'):
                files.append(apath)
    return files


def bin_to_mem(infile, outfile):
    # turn a little-endian binary into one 32-bit hex word per line
    binfile = open(infile, 'rb')
    binfile_content = binfile.read(os.path.getsize(infile))
    datafile = open(outfile, 'w')

    index = 0
    b0 = 0
    b1 = 0
    b2 = 0
    b3 = 0

    for b in binfile_content:
        if index == 0:
            b0 = b
            index = index + 1
        elif index == 1:
            b1 = b
            index = index + 1
        elif index == 2:
            b2 = b
            index = index + 1
        elif index == 3:
            b3 = b
            index = 0
            array = []
            array.append(b3)
            array.append(b2)
            array.append(b1)
            array.append(b0)
            datafile.write(bytearray(array).hex() + '\n')

    binfile.close()
    datafile.close()


def compile():
    rtl_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))

    iverilog_cmd = ['iverilog']
    iverilog_cmd += ['-o', r'out.vvp']
    iverilog_cmd += ['-I', rtl_dir + r'/rtl']
    iverilog_cmd.append(rtl_dir + r'/tb/tb.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/defines.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/pc_reg.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/if_id.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/id.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/id_ex.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/ex.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/regs.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/ctrl.v')
    # iverilog_cmd.append(rtl_dir + r'/rtl/ram.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/rom.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/ifetch.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/open_risc_v.v')
    iverilog_cmd.append(rtl_dir + r'/utils/dff_set.v')
    # iverilog_cmd.append(rtl_dir + r'/utils/dual_ram.v')
    iverilog_cmd.append(rtl_dir + r'/tb/open_risc_v_soc.v')

    process = subprocess.Popen(iverilog_cmd)
    process.wait(timeout=5)


def sim():
    compile()
    vvp_cmd = [r'vvp']
    vvp_cmd.append(r'out.vvp')
    process = subprocess.Popen(vvp_cmd)
    try:
        process.wait(timeout=10)
    except subprocess.TimeoutExpired:
        print('!!!Fail, vvp exec timeout!!!')


def run(test_binfile):
    rtl_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
    out_mem = rtl_dir + r'/sim/generated/inst_data.txt'
    bin_to_mem(test_binfile, out_mem)
    sim()


if __name__ == '__main__':
    sys.exit(run(sys.argv[1]))
```

### test_all

```
import os
import subprocess
import sys

from compile_and_sim import compile
from compile_and_sim import list_binfiles
from compile_and_sim import sim
from compile_and_sim import bin_to_mem


def main():
    rtl_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
    all_bin_files = list_binfiles(rtl_dir + r'/sim/generated/')

    for file_bin in all_bin_files:
        cmd = r'python compile_and_sim.py' + ' ' + file_bin
        f = os.popen(cmd)
        r = f.read()
        index = file_bin.index('-p-')
        print_name = file_bin[index + 3:-4]
        if r.find('pass') != -1:
            print(' ins ' +
                  print_name.ljust(10, ' ') + ' PASS')
        else:
            print('ins ' + print_name.ljust(10, ' ') + ' !!!FAIL!!!')
        f.close()


if __name__ == '__main__':
    main()
```

![](https://i.imgur.com/5lKB87Y.png)

### test_one_inst

```
import os   # needed for os.path below
import sys

from compile_and_sim import compile
from compile_and_sim import list_binfiles
from compile_and_sim import sim
from compile_and_sim import bin_to_mem


def main(name='addi'):
    rtl_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
    all_bin_files = list_binfiles(rtl_dir + r'/sim/generated/')

    for file in all_bin_files:
        if file.find(name) != -1 and file.find('.bin') != -1:
            test_binfile = file

    out_mem = rtl_dir + r'/sim/generated/inst_data.txt'
    # bin2mem
    bin_to_mem(test_binfile, out_mem)
    sim()
    # get wave
    # gtkwave_cmd = [r'gtkwave']
    # gtkwave_cmd.append(r'tb.vcd')
    # process = subprocess.Popen(gtkwave_cmd)


if __name__ == '__main__':
    sys.exit(main(sys.argv[1]))
```

# second version

## Purpose1: Complete mul, div, CSR and other instructions, and add more SOC peripherals

## Purpose2: Generate bit stream to load into arty-a7-100t

Building on the first version, I added JTAG, a bus, UART, a TIMER, and DIV to implement the csr, div, mul, and other instructions (but I haven't overcome the OVERFLOW problem). I have posted some snippets of code to record my implementation process (I haven't slept well for a month...).

## STRUCTURE

![](https://i.imgur.com/XuBwv3T.png)

## RTL

## CORE

### pc_regs

![](https://i.imgur.com/nSI2cuP.png)

The main functions are reset, jump, pause, and address increment on the address signal of the instruction memory. In other words, this module produces the value of the PC register, which is used as the instruction memory address to read the instruction content from the rom.
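The priority between reset, jump, pause, and increment can be checked against a tiny software model. The following is only a sketch in Python, not part of the project: the function name `next_pc` and its argument names are made up for illustration and only loosely mirror the RTL signals.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc, rst_n, jtag_reset, jump, jump_addr, hold_flag):
    """Return the PC value after one rising clock edge (hypothetical model)."""
    if rst_n == 0 or jtag_reset:   # active-low reset or JTAG reset wins
        return 0
    if jump:                       # a taken jump beats a hold request
        return jump_addr & MASK32
    if hold_flag >= 0b001:         # any hold level freezes the PC
        return pc
    return (pc + 4) & MASK32       # otherwise fetch the next word


if __name__ == "__main__":
    pc = next_pc(0x10, 1, 0, 0, 0, 0)
    print(hex(pc))                 # sequential fetch from 0x10
```

Feeding the same stimulus to this model and to the simulation makes it easy to spot a wrong priority order, for example a hold that accidentally blocks a jump.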
```
    always @ (posedge clk) begin
        if (rst == 1'b0 || jtag_reset_flag_i == 1'b1) begin
            pc_o <= 32'h0;
        end else if (jump_flag_i == 1'b1) begin
            pc_o <= jump_addr_i;
        end else if (hold_flag_i >= 3'b001) begin
            pc_o <= pc_o;
        end else begin
            pc_o <= pc_o + 4'h4;
        end
    end
```

### rom

![](https://i.imgur.com/bOKbq94.png)

The main function is to store the programmed instruction codes and to output the instruction code selected by the value of the PC register.

A 32*4096 two-dimensional array is defined as the storage space. Each entry holds one 32-bit instruction code, up to 4096 instruction codes can be stored, and the index into the 4096 entries is the address of the instruction code.

**When actually porting to an FPGA, pay attention to the resource capacity of the FPGA used and adjust the size accordingly.**

```
module rom(
    input  wire       clk,
    input  wire       rst,
    input  wire       we_i,     // write enable
    input  wire[31:0] addr_i,   // addr
    input  wire[31:0] data_i,
    output reg[31:0]  data_o    // read data
);

    reg[31:0] _rom[0:4095];     // rom total-1

    always @ (posedge clk) begin
        if (we_i == 1'b1) begin
            _rom[addr_i[31:2]] <= data_i;
        end
    end

    always @ (*) begin
        if (rst == 1'b0) begin
            data_o = 32'h0;
        end else begin
            data_o = _rom[addr_i[31:2]];
        end
    end

endmodule
```

### ex

![](https://i.imgur.com/oLbRve1.png)

1. Execute the operation that corresponds to the current instruction (addition, subtraction, multiplication, division, shift, etc.). For the add instruction, for example, the value of register 1 is added to the value of register 2.
2. If it is a jump instruction, issue a jump signal.
3. If it is a memory load instruction, read the memory data at the corresponding address.
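As a rough illustration of step 1, the arithmetic and shift operations can be written as a small 32-bit reference model. This is a sketch in Python under my own naming (`alu`, `to_signed`, and the operation strings are hypothetical and do not exist in the RTL); it only shows the wrap-around and sign behaviour that the hardware result should match.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """Hypothetical reference ALU for a few RV32I operations."""
    a &= MASK32
    b &= MASK32
    sh = b & 0x1F                        # only the low 5 bits give the shift amount
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "sll":
        return (a << sh) & MASK32
    if op == "srl":
        return a >> sh                   # logical: zeros shifted in
    if op == "sra":
        return (to_signed(a) >> sh) & MASK32   # arithmetic: sign bit shifted in
    if op == "slt":
        return int(to_signed(a) < to_signed(b))
    if op == "sltu":
        return int(a < b)
    raise ValueError("unknown op: " + op)


if __name__ == "__main__":
    print(hex(alu("sra", 0x80000000, 4)))
```

Comparing `alu("sra", ...)` with `alu("srl", ...)` on a negative operand shows the difference that the SRA mask in the first version has to produce.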
```
    always @ (*) begin   // deal div inst
        div_dividend_o  = reg1_rdata_i;
        div_divisor_o   = reg2_rdata_i;
        div_op_o        = fun3;
        div_reg_waddr_o = reg_waddr_i;
        if ((opcode == 7'b0110011) && (fun7 == 7'b0000001)) begin
            div_wenable = 1'b0;
            div_wdata   = 32'h0;
            div_waddr   = 32'h0;
            case (fun3)
                3'b100, 3'b101, 3'b110, 3'b111: begin
                    div_start  = 1'b1;
                    div_j_flag = 1'b1;
                    div_h_flag = 1'b1;
                    div_j_addr = op1_j_add_op2_j_res;
                end
                default: begin
                    div_start  = 1'b0;
                    div_j_flag = 1'b0;
                    div_h_flag = 1'b0;
                    div_j_addr = 32'h0;
                end
            endcase
        end else begin
            div_j_flag = 1'b0;
            div_j_addr = 32'h0;
            if (div_busy_i == 1'b1) begin
                div_start   = 1'b1;
                div_wenable = 1'b0;
                div_wdata   = 32'h0;
                div_waddr   = 32'h0;
                div_h_flag  = 1'b1;
            end else begin
                div_start  = 1'b0;
                div_h_flag = 1'b0;
                if (div_ready_i == 1'b1) begin
                    div_wdata   = div_result_i;
                    div_waddr   = div_reg_waddr_i;
                    div_wenable = 1'b1;
                end else begin
                    div_wenable = 1'b0;
                    div_wdata   = 32'h0;
                    div_waddr   = 32'h0;
                end
            end
        end
    end
```

### Division (this took me several nights...)

![](https://i.imgur.com/XCOFbjh.png)

![](https://i.imgur.com/5UN5Ry9.jpg)

I implemented this state machine in RTL and added it to the core. The biggest trouble I ran into along the way was the control of interrupts, which involves the BUS, exe, and ctrl, because my model needs multiple cycles to run through the entire finite state machine. Each division operation requires at least 39 clock cycles.

**important:** For signed operations, a negative operand (which is stored in two's complement) is inverted and has one added to it. The purpose is simple: every negative number is converted into a positive number before the calculation, because the two's complement form of a negative number carries a sign bit and cannot be fed into the calculation directly. The result of the calculation itself is therefore also a positive number.
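The sign handling can be written down as a short reference model. The sketch below is Python and uses names of my own (`divide` and the operation strings are hypothetical); it divides the magnitudes with a plain shift-and-subtract loop, then puts the sign back. The divide-by-zero results follow the RISC-V specification (quotient all ones, remainder equal to the dividend). It is a model to compare against, not a description of what the RTL currently does.

```python
MASK32 = 0xFFFFFFFF


def divide(op, dividend, divisor):
    """Hypothetical model of div/divu/rem/remu on 32-bit patterns."""
    dividend &= MASK32
    divisor &= MASK32
    signed = op in ("div", "rem")
    want_quotient = op in ("div", "divu")

    if divisor == 0:                       # division by zero per the RISC-V spec
        return MASK32 if want_quotient else dividend

    neg_a = signed and bool(dividend >> 31)
    neg_b = signed and bool(divisor >> 31)
    a = (-dividend) & MASK32 if neg_a else dividend   # invert and add one
    b = (-divisor) & MASK32 if neg_b else divisor

    quotient = 0
    remainder = 0
    for i in range(31, -1, -1):            # one dividend bit per iteration
        remainder = (remainder << 1) | ((a >> i) & 1)
        quotient <<= 1
        if remainder >= b:
            remainder -= b
            quotient |= 1

    if want_quotient:
        result = -quotient if neg_a != neg_b else quotient
    else:
        result = -remainder if neg_a else remainder   # sign follows the dividend
    return result & MASK32


if __name__ == "__main__":
    print(hex(divide("div", -7 & MASK32, 2)))
```

In this model the overflow case (the most negative number divided by -1) needs no special branch: the magnitude 0x80000000 is kept as an unsigned 32-bit value, so the quotient comes out as 0x80000000 and the remainder as 0, which is what the specification asks for.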
Finally, according to the sign of the divisor and the dividend, the quotient is operated (that is, whether to invert and add one) ``` case(fsm_st) FSM_IDEL:begin if (start_i == 1'b1) begin op_r <= op_i; dividend <= dividend_i; divisor <= divisor_i; reg_waddr_o <=reg_waddr_o; fsm_st <= FSM_START; busy_o <= 1'b1; end else begin op_r<=3'h0; reg_waddr_o <=32'h0; dividend <=32'h0; divisor <=32'h0; ready_o <= 1'b0; result_o <=32'h0; busy_o <= 1'b0; end end FSM_START:begin if (start_i==1'b1) begin if (divisor==32'h0) begin if (div_op|divu_op) begin result_o<=32'hffffffff; end else begin result_o<=dividend; end ready_o <=1'b1 ; fsm_st <=FSM_IDEL; busy_o <=1'b0 ; end else begin busy_o <=1'h1 ; ct <=32'h40000000 ; fsm_st <=FSM_CALC ; div_result <=32'h0 ; div_remain <=32'h0 ; if (div_op|rem_op) begin if (dividend[31]==1'b1) begin dividend<=dividend_neg; minen<=dividend_neg[31]; end else begin minen<=dividend[31]; end if (divisor[31]==1'b1) begin divisor<=divisor_neg; end end else begin minen<=dividend[31]; end if ((div_op && (dividend[31] ^ divisor[31] == 1'b1))||(rem_op && (dividend[31] == 1'b1))) begin invert_result<=1'b1; end else begin invert_result<=1'b0; end end end else begin fsm_st<=FSM_IDEL; result_o<=32'h0; ready_o<=1'b0; busy_o<=1'b0; end end FSM_CALC:begin if (start_i==1'b1) begin dividend<={dividend[30:0], 1'b0}; div_result<=div_result_temp; ct<={1'b0,ct[31:1]}; if (|ct) begin minen<= {minen_temp[30:0], dividend[30]}; end else begin fsm_st<=FSM_END; if (minen_divisor) begin div_remain<=minen_sub_res; end else begin div_remain<=minen; end end end else begin fsm_st <= FSM_IDEL; result_o<= 32'h0; ready_o <= 1'b0; busy_o <= 1'b0; end end FSM_END:begin if (start_i==1'b1) begin ready_o<=1'b1; fsm_st<=FSM_IDEL; busy_o<=1'b0; if (div_op|divu_op) begin if (invert_result) begin result_o<=(-div_result); end else begin result_o<=div_result; end end else begin if (invert_result) begin result_o<=(-div_remain); end else begin result_o<=div_remain; end end end else begin 
fsm_st<=FSM_IDEL; result_o<=32'h0; ready_o<=1'b0; busy_o<=1'b0; end end endcase end end ``` ### Interrupt Interrupt type: External interrupts: interrupts generated by peripherals, interrupts that occur outside the processing core. Timer interrupt (one of the external interrupts): controlled by the mtie field in the mie register. Software interrupt: an interrupt triggered by the software (software language such as C language) itself. Debug Interrupt: Interrupt when Debugging. Interrupt masking: through the MIE register, to control different types of interrupt enable and mask (external interrupt, timer interrupt, software interrupt). ### ctrl ![](https://i.imgur.com/Wt9bnSL.png) A jump is to change the value of the PC register. And because whether to jump or not needs to be known at the execution stage, when a jump is required, the pipeline needs to be suspended ``` //hold_flag[7:0] module ctrl( input wire rst, // from ex input wire jump_flag_i, input wire[31:0] jump_addr_i, input wire hold_flag_ex_i, // from rib input wire hold_flag_rib_i, // from jtag input wire jtag_halt_flag_i, // from clint input wire hold_flag_clint_i, output reg[2:0] hold_flag_o, // to pc_reg output reg jump_flag_o, output reg[31:0] jump_addr_o ); always @ (*) begin jump_addr_o = jump_addr_i; jump_flag_o = jump_flag_i; hold_flag_o = 3'b000; if (jump_flag_i == 1'b1 || hold_flag_ex_i == 1'b1 || hold_flag_clint_i == 1'b1) begin hold_flag_o = 3'b011; end else if (hold_flag_rib_i == 1'b1) begin hold_flag_o = 3'b001; end else if (jtag_halt_flag_i == 1'b1) begin hold_flag_o = 3'b011; end else begin hold_flag_o = 3'b000; end end endmodule ``` ### csr_reg ![](https://i.imgur.com/hzkQhtJ.png) Machine Trap Vector=t_vec Machine Exception Cause Machine Exception PC Machine Status Write register: According to the last 12 bits of the write register address, store the data in the ex or clint module in the control and status register (CSR register). 
```
always @(posedge clk) begin // select which CSR to write
    if (rst==1'b0) begin // reset is active low, as in the other modules
        t_vec<=32'h0;
        cause<=32'h0;
        epc<=32'h0;
        mei<=32'h0;
        status<=32'h0;
        scratch<=32'h0;
    end else begin
        if (we_i==1'b1) begin // write from ex
            case (waddr_i[11:0])
                12'h305:begin t_vec<=data_i; end   // mtvec
                12'h342:begin cause<=data_i; end   // mcause
                12'h341:begin epc<=data_i; end     // mepc
                12'h304:begin mei<=data_i; end     // mie
                12'h300:begin status<=data_i; end  // mstatus
                12'h340:begin scratch<=data_i; end // mscratch
                default:begin end
            endcase
        end else if(clint_we_i == 1'b1) begin // write from clint
            case (clint_waddr_i[11:0])
                12'h305:begin t_vec<=clint_data_i; end
                12'h342:begin cause<=clint_data_i; end
                12'h341:begin epc<=clint_data_i; end
                12'h304:begin mei<=clint_data_i; end
                12'h300:begin status<=clint_data_i; end
                12'h340:begin scratch<=clint_data_i; end
                default:begin end
            endcase
        end
    end
end
```
Read register (combinational logic):
The read address comes either from the decode (id) module or from the interrupt (clint) module. In both cases the register is selected by the last 12 bits of the read address, and the data read is sent back to the module that asked for it.
```
assign global_int_en_o=(status[3]==1'b1)?1'b1:1'b0;
assign clint_csr_mtvec = t_vec ;
assign clint_csr_mepc = epc ;
assign clint_csr_mstatus = status ;
```
### clint
![](https://i.imgur.com/68uC8hF.png)
RISC-V interrupts are divided into two types: synchronous interrupts, generated by instructions such as ECALL and EBREAK, and asynchronous interrupts, generated by peripherals such as GPIO and UART.
When an interrupt (or interrupt return) signal is detected, first suspend the entire pipeline and set the jump address to the interrupt entry address. Then read and write the necessary CSR registers (mstatus, mepc, mcause, etc.). Once these CSR accesses are finished, the pipeline suspension is cancelled, so that the processor can fetch instructions from the interrupt entry address and enter the interrupt service routine.
1. Priority
Synchronous interrupt > asynchronous interrupt > interrupt return
2. CSR register state machine
Record the interrupt return address and the code that caused the interrupt, and step through the CSR write states.
3. Write CSR registers (status, epc, cause)
First write the interrupt return address to mepc.
Then write mstatus to turn off the global interrupt enable.
Then write the interrupt exception code to the mcause register.
On interrupt return, the global interrupt bit is restored at the same time (status[3]=status[7]).
4. Send interrupt signal to ex
int_assert_o: interrupt valid signal; when it is 1, the core starts to run the interrupt handler.
int_flag_i: the interrupt flag input, for example the flag of the timer interrupt.
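The two mstatus writes in step 3 are easy to get wrong by one bit, so here is a minimal Python sketch of just that bit manipulation. It follows the simplified behaviour described above (clear bit 3 on entry, copy bit 7 into bit 3 on return); the full privileged specification additionally saves the old bit 3 into bit 7 on trap entry, which this sketch deliberately leaves out. The names `mstatus_on_trap` and `mstatus_on_mret` are hypothetical.

```python
# Sketch only: models the mstatus values written on interrupt entry and return.
MIE_BIT = 3    # mstatus[3], global interrupt enable
MPIE_BIT = 7   # mstatus[7], previous interrupt enable
MASK32 = 0xFFFFFFFF


def mstatus_on_trap(mstatus):
    """Entering the handler: turn off the global interrupt enable."""
    return mstatus & ~(1 << MIE_BIT) & MASK32


def mstatus_on_mret(mstatus):
    """Returning from the handler: status[3] = status[7], other bits unchanged."""
    mpie = (mstatus >> MPIE_BIT) & 1
    cleared = mstatus & ~(1 << MIE_BIT) & MASK32
    return cleared | (mpie << MIE_BIT)


if __name__ == "__main__":
    entry = mstatus_on_trap(0x88)
    print(hex(entry), hex(mstatus_on_mret(entry)))
```

Comparing these values with the data written to CSR address 0x300 in the waveform is a quick way to confirm that the interrupt enable bit is really restored on return.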
```
// interrupt FSM state definitions
localparam INT_IDLE  = 4'b0001;
localparam INT_SYNC  = 4'b0010;
localparam INT_ASYNC = 4'b0100;
localparam INT_MERT  = 4'b1000;
// CSR write FSM state definitions
localparam CSR_IDEL = 5'b00001;
localparam CSR_STAT = 5'b00010;
localparam CSR_MEPC = 5'b00100;
localparam CSR_STMT = 5'b01000;
localparam CSR_CAUS = 5'b10000;
```
```
always @(*) begin // interrupt arbitration
    if(rst==1'b0)begin
        int_st=INT_IDLE;
    end else begin
        // 32'h73 = ECALL, 32'h00100073 = EBREAK
        if(inst_i==32'h73||inst_i==32'h00100073)begin
            if(div_started_i==1'b0)begin
                int_st=INT_SYNC;
            end else begin
                int_st=INT_IDLE;
            end
        end else if(int_flag_i!=8'h0&&global_int_en_i==1'b1)begin
            int_st=INT_ASYNC;
        end else if(inst_i==32'h30200073)begin // MRET
            int_st=INT_MERT;
        end else begin
            int_st=INT_IDLE;
        end
    end
end
```
```
always @(posedge clk) begin // CSR FSM control
    if (rst==1'b0) begin
        csr_st <=CSR_IDEL;
        cause <= 32'h0;
        int_addr <= 32'h0;
    end else begin
        case(csr_st)
            CSR_IDEL:begin
                if (int_st==INT_SYNC) begin
                    csr_st<=CSR_MEPC;
                    if (jump_flag_i==1'b1) begin
                        int_addr<=jump_addr_i-4'h4;
                    end else begin
                        int_addr<=inst_addr_i;
                    end
                    case(inst_i)
                        32'h73:begin cause<=32'd11; end        // ECALL
                        32'h00100073:begin cause<=32'd3; end   // EBREAK
                        default:begin cause<=32'd10; end
                    endcase
                end else if (int_st==INT_ASYNC) begin
                    cause<=32'h80000004;
                    csr_st<=CSR_MEPC;
                    if (jump_flag_i==1'b1) begin
                        int_addr<=jump_addr_i;
                    end else if(div_started_i==1'b1) begin
                        int_addr<=inst_addr_i-4'h4;
                    end else begin
                        int_addr<=inst_addr_i;
                    end
                end else if (int_st==INT_MERT) begin
                    csr_st<=CSR_STMT;
                end
            end
            // write order as described above: mepc, then mstatus, then mcause
            CSR_MEPC:begin csr_st<=CSR_STAT; end
            CSR_STAT:begin csr_st<=CSR_CAUS; end
            CSR_CAUS:begin csr_st<=CSR_IDEL; end
            CSR_STMT:begin csr_st<=CSR_IDEL; end
            default:begin csr_st<=CSR_IDEL; end
        endcase
    end
end
```
```
// synchronous & asynchronous interrupts: write the CSR registers
always @ (posedge clk) begin
    if (rst == 1'b0) begin
        we_o <= 1'b0;
        waddr_o <= 32'h0;
        data_o <= 32'h0;
    end else begin
        case(csr_st) // the interrupt return address needs IMM_PC_ADDR+4
            CSR_MEPC:begin
                we_o<=1'b1;
                waddr_o<={20'h0,12'h341};
                data_o<=int_addr;
            end
            CSR_CAUS:begin
                we_o<=1'b1;
                waddr_o<={20'h0,12'h342};
                data_o<=cause;
            end
            CSR_STAT:begin // turn off the global interrupt enable
                we_o<=1'b1;
                waddr_o<={20'h0,12'h300};
                data_o<={csr_mstatus[31:4],1'b0,csr_mstatus[2:0]};
            end
            CSR_STMT:begin // interrupt return: status[3]=status[7]
                we_o<=1'b1;
                waddr_o<={20'h0,12'h300};
                data_o<={csr_mstatus[31:4],csr_mstatus[7],csr_mstatus[2:0]};
            end
            default:begin
                we_o<=1'b0;
                waddr_o<=32'h0;
                data_o<=32'h0;
            end
        endcase
    end
end

// send the interrupt signal to the EX module
always @ (posedge clk) begin
    if (rst == 1'b0) begin
        int_assert_o<=1'b0;
        int_addr_o<=32'h0;
    end else begin
        case(csr_st)
            CSR_CAUS:begin
                int_assert_o<=1'b1;
                int_addr_o<=csr_mtvec;
            end
            CSR_STMT:begin
                int_assert_o<=1'b1;
                int_addr_o<=csr_mepc;
            end
            default:begin
                int_assert_o<=1'b0;
                int_addr_o<=32'h0;
            end
        endcase
    end
end
```
### registers regs
![](https://i.imgur.com/Nqk2en6.png)
Temporary data storage for decoding and execution.
A register file regs with a width of 32 bits and a depth of 32 entries is defined in the program.
```
reg[31:0] regs[0:32 - 1];
```
1. Write register: store the data from ex or jtag in the register file regs
```
always @ (posedge clk) begin
    if (rst == 1'b1) begin // not in reset (rst is active low)
        // x0 is never written
        if ((we_i == 1'b1) && (waddr_i != 5'h0)) begin
            regs[waddr_i] <= wdata_i;
        end else if ((jtag_we_i == 1'b1) && (jtag_addr_i != 5'h0)) begin
            regs[jtag_addr_i] <= jtag_data_i;
        end
    end
end
```
2. Read register (combinational logic):
The read address comes from the decode (id) module, and the data read from regs is sent back to the id module.
For the jtag read port, the read address comes from the jtag module, and the data read from regs is sent back to the jtag module.
```
always @ (*) begin // we == write_enable
    if (raddr1_i == 5'h0) begin
        r_data1_o = 32'h0;
    end else if (raddr1_i == waddr_i && we_i == 1'b1) begin
        r_data1_o = wdata_i; // forward the value being written
    end else begin
        r_data1_o = regs[raddr1_i];
    end
end

always @ (*) begin
    if (raddr2_i == 5'h0) begin
        r_data2_o = 32'h0;
    end else if (raddr2_i == waddr_i && we_i == 1'b1) begin
        r_data2_o = wdata_i; // forward the value being written
    end else begin
        r_data2_o = regs[raddr2_i];
    end
end

always @ (*) begin
    if (jtag_addr_i == 5'h0) begin
        jtag_data_o = 32'h0;
    end else begin
        jtag_data_o = regs[jtag_addr_i];
    end
end
```
---
Because of the pipeline, when the current instruction is in the execution stage, the next instruction is in the decoding stage.
The register is not written during the execution stage; the write happens when the next clock edge arrives. If the instruction in the decoding stage needs the result of the previous instruction, the register value it reads at that moment is stale. For example, take the two instructions add x1, x2, x3 followed by add x4, x1, x5: the second instruction depends on the result of the first. To solve this problem, if the register being read is the same as the register being written, the value about to be written is returned directly to the read operation.
### databus
Suppose each peripheral had its own address bus and data bus. With N peripherals in total, the processor core would need N address buses and N data buses, and every additional peripheral would require modifying the core code (and the change is not small). With a shared bus, the processor core only needs one address bus and one data bus, which greatly simplifies the connection between the processor core and the peripherals.
1. Bus arbitration mechanism
First, each master sends an access request req to the bus. The bus then grants access to one of them according to a fixed priority.
```
Select the master device according to the order of priority through if_else to perform the corresponding access operation. For the arbitration of the master device, the priority order of the master device is: uart serial port download, ex.v execution module, jtag module, pc_reg fetch module.
```
Why this order?
2. uart program download
Since the program is about to be replaced, it does not matter which step the running program has reached. There is no need to consider requests from other modules: download directly, then re-run the new program (the pipeline needs to be suspended).
3. ex.v execution module (memory read and write requests)
Unless a new program is being downloaded, the program is unchanged, so the current instruction must run to completion to make sure that subsequent operations do not go wrong (the pipeline needs to be suspended).
4. jtag module
Once the previous instruction has finished, the jtag debugging module can modify the debugging parameters and control the execution of the program, including instruction fetch, because setting a breakpoint during debugging may suspend the fetch (the pipeline needs to be suspended).
5. pc_reg instruction fetch module
Instruction fetch is the first step for all the master modules above and serves them, so it has the lowest priority.
### riscv_bus
![](https://i.imgur.com/Egm3i8h.png)
Select the slave device to be accessed through the case statement, and then pass the write_enable of the master to that slave.
The bus supports multi-master and multi-slave connections, but only one master and one slave can communicate at any one time. A fixed-priority arbitration mechanism is used between the master devices on the RIB bus. The highest 4 bits of the bus address determine which slave device is accessed, so up to 16 slave devices are supported.
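The fixed-priority arbitration and the address decoding described above can be sketched in a few lines of Python. This is only a behavioural model for reasoning about the bus, not the RTL: the master names are taken from the priority list in the text, while `arbitrate`, `decode_address` and the choice of returning `None` when nobody requests the bus are assumptions made for this illustration.

```python
# Sketch only: behavioural model of fixed-priority arbitration and slave selection.
# Priority order from the text, highest first.
MASTER_PRIORITY = ("uart", "ex", "jtag", "pc")


def arbitrate(requests):
    """Return the name of the granted master, or None if nobody requests the bus.

    `requests` maps a master name to True/False (its req signal).
    """
    for name in MASTER_PRIORITY:
        if requests.get(name, False):
            return name
    return None


def decode_address(addr):
    """Split a 32-bit bus address into (slave index, offset inside the slave).

    The top 4 bits pick one of up to 16 slaves; the low 28 bits are passed on.
    """
    slave = (addr >> 28) & 0xF
    offset = addr & 0x0FFFFFFF
    return slave, offset


if __name__ == "__main__":
    print(arbitrate({"pc": True, "ex": True}))
    print(decode_address(0x30000004))
```

Because the priority is a plain ordered tuple, changing the arbitration order in the model is a one-line edit, which makes it easy to compare against the if_else chain in the RTL.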
``` master_addr_i[31:28] ``` ``` case (slave_needed) grant0: begin case (m0_addr_i[31:28]) slave_0: begin s0_we_o = m0_we_i; s0_addr_o = {{4'h0}, {m0_addr_i[27:0]}}; s0_data_o = m0_data_i; m0_data_o = s0_data_i; end slave_1: begin s1_we_o = m0_we_i; s1_addr_o = {{4'h0}, {m0_addr_i[27:0]}}; s1_data_o = m0_data_i; m0_data_o = s1_data_i; end slave_2: begin s2_we_o = m0_we_i; s2_addr_o = {{4'h0}, {m0_addr_i[27:0]}}; s2_data_o = m0_data_i; m0_data_o = s2_data_i; end slave_3: begin s3_we_o = m0_we_i; s3_addr_o = {{4'h0}, {m0_addr_i[27:0]}}; s3_data_o = m0_data_i; m0_data_o = s3_data_i; end slave_4: begin s4_we_o = m0_we_i; s4_addr_o = {{4'h0}, {m0_addr_i[27:0]}}; s4_data_o = m0_data_i; m0_data_o = s4_data_i; end slave_5: begin s5_we_o = m0_we_i; s5_addr_o = {{4'h0}, {m0_addr_i[27:0]}}; s5_data_o = m0_data_i; m0_data_o = s5_data_i; end default: begin end endcase end ``` ## perips ### [GPIO](http://wiki.csie.ncku.edu.tw/embedded/GPIO) ![](https://i.imgur.com/E1bTj5W.png) Every 2 bits control 1 IO mode, supporting up to 16 IOs 0: high impedance, 1: output, 2: input Step1: First design two registers: gpio_ctrl (control GPIO input and output mode); gpio_data (store GPIO input or output data). ``` reg[31:0] gpio_ctrl; reg[31:0] gpio_data; ``` Step2: Plan addresses for these two registers. ``` localparam CTRL = 4'h0; localparam DATA = 4'h4; ``` Step3: Through register addressing, write to the two registers defined above, and realize the input and output of GPIO by configuring the gpio_ctrl register. 
```
// register write
always @ (posedge clk) begin
    if (rst == 1'b0) begin
        gpio_data <= 32'h0;
        gpio_ctrl <= 32'h0;
    end else begin
        if (we_i == 1'b1) begin
            case (addr_i[3:0])
                CTRL: begin gpio_ctrl <= data_i; end
                DATA: begin gpio_data <= data_i; end
            endcase
        end else begin
            // input mode: sample the pin into the data register
            if (gpio_ctrl[1:0] == 2'b10) begin
                gpio_data[0] <= iopin_i[0];
            end
            if (gpio_ctrl[3:2] == 2'b10) begin
                gpio_data[1] <= iopin_i[1];
            end
        end
    end
end

// register read
always @ (*) begin
    if (rst == 1'b0) begin
        data_o = 32'h0;
    end else begin
        case (addr_i[3:0])
            CTRL: begin data_o = gpio_ctrl; end
            DATA: begin data_o = gpio_data; end
            default: begin data_o = 32'h0; end
        endcase
    end
end
```
Keep the following in mind when simulating TOP:
```
assign gpio[0] = (gpio_ctrl[1:0] == 2'b01)? gpio_data[0]: 1'bz;
```
When the configuration field gpio_ctrl[1:0] is 1, the GPIO is in output mode and gpio_data[0] is driven onto the corresponding IO pin. If gpio_ctrl[1:0] is not 1, it is 0 or 2, which correspond to the high-impedance and input modes. Both of them set the GPIO pin to the high-impedance state, for the following reason: high impedance is a common term in digital circuits for an output state that is neither high nor low. Its effect on the rest of the circuit is the same as if the pin were not connected. If you measure it with a multimeter it may read high or low, depending on what is connected to it.
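The 2-bits-per-pin layout of gpio_ctrl is easy to check with a short Python model. This is a sketch of the decoding only, written for illustration: the mode values (0 high impedance, 1 output, 2 input) come from the description above, while the helper names `pin_mode` and `pin_drive` and the use of `None` to stand for the 'z' state are assumptions.

```python
# Sketch only: software model of the gpio_ctrl mode field and the pad driver.
MODE_HIGH_Z = 0
MODE_OUTPUT = 1
MODE_INPUT = 2


def pin_mode(gpio_ctrl, pin):
    """Return the 2-bit mode field of `pin` (pins 0..15) from gpio_ctrl."""
    return (gpio_ctrl >> (2 * pin)) & 0b11


def pin_drive(gpio_ctrl, gpio_data, pin):
    """Value driven onto the pad, or None when the pad is left at high impedance."""
    if pin_mode(gpio_ctrl, pin) == MODE_OUTPUT:
        return (gpio_data >> pin) & 1
    return None


if __name__ == "__main__":
    ctrl = 0b1001  # pin 0 = output (01), pin 1 = input (10)
    data = 0b01
    print(pin_drive(ctrl, data, 0), pin_drive(ctrl, data, 1))
```

In a testbench, the `None` case corresponds to the external stimulus being free to drive the pin, which is exactly the situation in input mode.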
### SPI [wiki](https://en.wikipedia.org/wiki/Serial_Peripheral_Interface) [youtube](https://www.youtube.com/watch?v=TR0Pw89EuGk) ![](https://i.imgur.com/sv4zSrv.png) The SPI protocol specifies 4 logical signal interfaces: SCLK (Serial Clock, will be issued by the master) MOSI (Master Out, Slave In) MISO (Master In, Slave Out) CS (Chip Select, because a master can communicate with several slaves, so CS is needed to select the slave to communicate with, and usually CS is enabled at low potential) ![](https://i.imgur.com/0PUKJd6.png) step1:set write_enable(we)always work ``` always @ (posedge clk) begin if (rst == 1'b0) begin en <= 1'b0; end else begin if (spi_ctrl[0] == 1'b1) begin en <= 1'b1; end else if (done == 1'b1) begin en <= 1'b0; end else begin en <= en; end end end ``` step2:cut_clk count the clk ``` always @ (posedge clk) begin if (rst == 1'b0) begin clk_cnt <= 9'h0; end else if (en == 1'b1) begin if (clk_cnt == div_cnt) begin clk_cnt <= 9'h0; end else begin clk_cnt <= clk_cnt + 1'b1; end end else begin clk_cnt <= 9'h0; end end ``` step3:count SPI_CLK ``` always @ (posedge clk) begin if (rst == 1'b0) begin spi_clk_cnt <= 5'h0; spi_clk_level <= 1'b0; end else if (en == 1'b1) begin if (clk_cnt == div_cnt) begin if (spi_clk_cnt == 5'd17) begin spi_clk_cnt <= 5'h0; spi_clk_level <= 1'b0; end else begin spi_clk_cnt <= spi_clk_cnt + 1'b1; spi_clk_level <= 1'b1; end end else begin spi_clk_level <= 1'b0; end end else begin spi_clk_cnt <= 5'h0; spi_clk_level <= 1'b0; end end ``` step4:write regs ``` always @ (posedge clk) begin if (rst == 1'b0) begin spi_ctrl <= 32'h0; spi_data <= 32'h0; spi_status <= 32'h0; end else begin spi_status[0] <= en; if (we_i == 1'b1) begin case (addr_i[3:0]) SPI_CTRL: begin spi_ctrl <= data_i; end SPI_DATA: begin spi_data <= data_i; end default: begin end endcase end else begin spi_ctrl[0] <= 1'b0; if (done == 1'b1) begin spi_data <= {24'h0, rdata}; end end end end ``` ### timer [youtube](https://www.youtube.com/watch?v=qQdZrY5mhkU) 
![](https://i.imgur.com/nlAbdAo.png) Step1: Define three registers 1. Control register: CTRL=4'h0 2. Counting threshold register: VALUE=4'h4 3. Current count value register (readonly): COUNT=4'h8 Step2:regs read&write start ``` // counter always @ (posedge clk) begin if (rst == 1'b0) begin t_ct <= 32'h0; end else begin if (t_ctrl[0] == 1'b1) begin t_ct <= t_ct + 1'b1; if (t_ct >= t_val) begin t_ct <= 32'h0; end end else begin t_ct <= 32'h0; end end end ``` R&W ``` always @ (*) begin if (rst == 1'b0) begin data_o = 32'h0; end else begin case (addr_i[3:0]) VALUE: begin data_o = t_val; end CTRL: begin data_o = t_ctrl; end CT: begin data_o = t_ct; end default: begin data_o = 32'h0; end endcase end end ``` ``` always @ (posedge clk) begin if (rst == 1'b0) begin t_ctrl <= 32'h0; t_val <= 32'h0; end else begin if (we_i == 1'b1) begin case (addr_i[3:0]) CTRL: begin t_ctrl <= {data_i[31:3], (t_ctrl[2] & (~data_i[2])), data_i[1:0]}; end VALUE: begin t_val <= data_i; end endcase end else begin if ((t_ctrl[0] == 1'b1) && (t_ct >= t_val)) begin t_ctrl[0] <= 1'b0; t_ctrl[2] <= 1'b1; end end end end ``` ### uart ![](https://i.imgur.com/X74Nlt8.png) ![](https://i.imgur.com/AkzkzGV.png) [EXP](https://nandland.com/uart-serial-port-module/) 1. UART stands for Universal Asynchronous Receiver Transmitter. 2. Synchronous serial communication requires both communication parties to transmit data synchronously under the control of the same clock; asynchronous serial communication means that both communication parties use their own clocks to control the sending and receiving process of data. 3. A frame of data in the sending or receiving process of UART consists of 4 parts, start bit, data bit, parity bit and stop bit The rate of serial port communication is represented by baud rate, which represents the number of bits of binary data transmitted per second, and the unit is bps. 
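As a quick worked example of the baud rate and the frame layout, here is a minimal Python sketch. It assumes a frame with 8 data bits and no parity bit, sent least significant bit first; the 50 MHz clock and 115200 bps in the usage lines are example numbers only, not values taken from this project, and `baud_divider` and `uart_frame` are hypothetical helper names.

```python
# Sketch only: baud-rate divider arithmetic and an 8N1 frame, for illustration.

def baud_divider(clk_hz, baud):
    """Number of system clock cycles per UART bit, rounded to the nearest integer."""
    return (clk_hz + baud // 2) // baud


def uart_frame(byte):
    """Bits on the line for one byte: start bit 0, 8 data bits LSB first, stop bit 1."""
    bits = [0]                                   # start bit
    bits += [(byte >> i) & 1 for i in range(8)]  # data bits, low bit first
    bits.append(1)                               # stop bit
    return bits


if __name__ == "__main__":
    print(baud_divider(50_000_000, 115200))  # assumed clock and baud rate
    print(uart_frame(0x55))
```

The list returned by `uart_frame` can be compared bit by bit against the tx pin in the waveform, one entry per divider period.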
then...TX sending ``` always @ (posedge clk) begin if (rst == 1'b0) begin FSM_ <= FSM_IDLE; cycle_count <= 16'd0; tx_reg <= 1'b0; bit_count <= 4'd0; tx_data_ready <= 1'b0; end else begin if (FSM_ == FSM_IDLE) begin tx_reg <= 1'b1; tx_data_ready <= 1'b0; if (tx_data_valid == 1'b1) begin FSM_ <= FSM_START; cycle_count <= 16'd0; bit_count <= 4'd0; tx_reg <= 1'b0; end end else begin cycle_count <= cycle_count + 16'd1; if (cycle_count == uart_baud[15:0]) begin cycle_count <= 16'd0; case (FSM_) FSM_START: begin tx_reg <= tx_data[bit_count]; FSM_ <= FSM_SEND_BYTE; bit_count <= bit_count + 4'd1; end FSM_SEND_BYTE: begin bit_count <= bit_count + 4'd1; if (bit_count == 4'd8) begin FSM_ <= FSM_STOP; tx_reg <= 1'b1; end else begin tx_reg <= tx_data[bit_count]; end end FSM_STOP: begin tx_reg <= 1'b1; FSM_ <= FSM_IDLE; tx_data_ready <= 1'b1; end endcase end end end end ``` RX reception (partial) ``` assign rx_neg_edge = rx_q1 && ~rx_q0; always @ (posedge clk) begin if (rst == 1'b0) begin rx_q0 <= 1'b0; rx_q1 <= 1'b0; end else begin rx_q0 <= rx_pin; rx_q1 <= rx_q0; end end always @ (posedge clk) begin if (rst == 1'b0) begin rx_start <= 1'b0; end else begin if (uart_ctrl[1]) begin if (rx_neg_edge) begin rx_start <= 1'b1; end else if (rx_clk_count == 4'd9) begin rx_start <= 1'b0; end end else begin rx_start <= 1'b0; end end end ``` Specific process: a. When sending idle (that is, not sending data), (according to the protocol) keep the sending end set to 1; when sending data is valid (C language writes the data to be sent to the register UART_TXDATA), the sending end sends the start bit 0 (a counting cycle) b. 
Control the counting threshold of the clock-divider counter according to the agreed sending rate (baud rate) and send the data, low bit first and high bit last. After the data has been sent, set the sending end to 1, which corresponds to the stop bit in the sequence, and update the corresponding bit of the status register: UART_STATUS[0] <= 0;
c. Wait for the next send (that is, the next send-data-valid signal).

tips (from my friend)
The input and output pins of the FPGA serial port use TTL levels: 3.3V represents logic "1" and 0V represents logic "0". The computer serial port uses RS-232 levels, which are negative logic: -15V~-5V represents logic "1", and +5V~+15V represents logic "0". Therefore, when the computer communicates with the FPGA, a level conversion chip is needed.
## SIM
![](https://i.imgur.com/bS4d420.png)
### test_all_inst.py
find all bin files
```
import os
import subprocess
import sys


def list_binfiles(path):
    # walk the directory tree and collect every *.bin file
    files = []
    list_dir = os.walk(path)
    for maindir, subdir, all_file in list_dir:
        for filename in all_file:
            apath = os.path.join(maindir, filename)
            if apath.endswith('.bin'):
                files.append(apath)
    return files
```
test all bin files
```
def main():
    bin_files = list_binfiles(r'../tests/isa/generated')
    anyfail = False
    for file in bin_files:
        # run one simulation per bin file and capture its output
        cmd = r'python new_nw.py' + ' ' + file + ' ' + 'inst.data'
        f = os.popen(cmd)
        r = f.read()
        f.close()
        if (r.find('TEST_PASS') != -1):
            print(file + ' nlnlsofun')
        else:
            print(file + '!!!Locked up in bear jail, because you failed!!!')
            anyfail = True
            break
    if (anyfail == False):
        print('Hate you, bear, you are padding your hours again, All PASS...')


if __name__ == '__main__':
    sys.exit(main())
```
### step2 new_nw.py
turn bin files to mem files
```
cmd = r'python ../tools/Bin2Mem_CLI.py' + ' ' + sys.argv[1] + ' ' + sys.argv[2]
f = os.popen(cmd)
f.close()
```
compile rtl files
### step3 use Iverilog
```
def main():
    rtl_dir = sys.argv[1]
    if rtl_dir != r'..':
        tb_file = r'/tb/compliance_test/cwwppb_soc_tb.v'
    else:
        tb_file = r'/tb/cwwppb_soc_tb.v'
    # iverilog process
    iverilog_cmd = ['iverilog']
    ...
    ...
    ...
```
## most of the trouble
### 1.define is not necessarily very convenient, the risc_v manual is your good friend
When you need many different "types" of names whose VALUE is the same, defines become very inconvenient while coding. You have to keep clicking through the editor prompt to see what each one means, and it will drive you crazy.
I also kept asking myself why wire and reg are designed the way they are.
### 2.[latch](https://zh.wikipedia.org/zh-hant/%E9%94%81%E5%AD%98%E5%99%A8)
As a novice, I had never learned logic design. I remember being stuck for 14 hours on the fourth day because of one wrong judgment.
EXP
```
always @(al or b) begin
    if(al) q <= b;
end
```
1. In this "always" block, the if statement ensures that q takes the value of b only when al = 1. The code does not say what to assign when al = 0, so what happens then? The variable q keeps its original value, which means a latch is generated.
2. Improper use of case statements (where I was stuck)
A latch is also generated when the default item is missing in a case statement. The function of the case statement is to assign different values to one signal (q in this example) when another signal (sel in this example) takes different values. In the example, sel=00 gives q the value of a, and sel=11 gives q the value of b. What the example does not say is which value q gets when sel takes a value other than 00 and 11. In Verilog HDL the default behavior is to keep the original value of q, which automatically generates a latch.
### 3. **vivado is super invincibly difficult to use**, modelsim is good enough for normal testing.
### 4. Do not directly apply the board file as your project mode
I use the board format starting with xc7a100... to write my xdc file
![](https://i.imgur.com/S8KBm2a.png)
### 5. When designing a finite state machine, be sure to search for information from multiple sources
When I was writing the division module, my logic had a big problem, because I was following a strange guide from a Chinese website. I kept at it until my NTU EE friend told me that I would not finish it in 20,000 years that way (I was trying to solve it with multiples of the divisor... super dumb).
### 6.J_TAG VS UART
[here is the answer, read it properly](https://www.quora.com/What-is-the-difference-between-JTAG-and-UART)
### 7.If you want to know something about a board, go read the datasheet!
When generating the bitstream, I always thought I had enough IO ports, until I read the datasheet...
[arty a7 100T](https://digilent.com/reference/programmable-logic/arty-a7/reference-manual)
## TEST C code
### GCC toolchains
compare and try: [how to use them](https://hackmd.io/ADNQPiEFSPC_daP2GJ_sSQ)
I took **riscv64-unknown-elf-gcc** as my tool first, but I found that the bin file was too big for our SOC. So I tried -Os as the compile option and reduced the size after linking using [**strip**](https://zh.wikipedia.org/zh-tw/Strip_(Unix)), but sadly they all failed. Then I tried the toolchain for MCUs ([riscv-none-embed-](https://xpack.github.io/riscv-none-elf-gcc/)), which means I have to give up some system calls in my C code to fit the toolchain :(
### set your toolchain
put your toolchain in tools
[download](https://gnu-mcu-eclipse.github.io/blog/2019/05/21/riscv-none-gcc-v8-2-0-2-2-20190521-released/)
I took my [homework 1's](https://hackmd.io/h-6-OkMpRxOAIeN3ebF4Jg) C code as test code
### how to test it
1.test_all_isa
go to the sim folder and run this command
```
python test_all_isa.py
```
![](https://i.imgur.com/WCD1Wzo.png)
![](https://i.imgur.com/Xex3A3m.png)
2.test C code
go to the sim folder and run this command
```
python sim.py ..\tests\example\simple\C_test.bin inst.data
```
Because I use riscv-none-embed-gcc as my tool on Windows, it
means I have no need to use "newlib", and I gave up some system calls like printf(). I have to say that riscv-none-embed-gcc can deal with "newlib"; this is just my personal choice.
![](https://i.imgur.com/l4itbnE.png)
If it succeeds you can see this on your computer:
![](https://i.imgur.com/qH7KwBC.png)
```
#include<stdio.h>
#include"../lib/utils.h"

/* declared before main() so the call below has a prototype */
int trap(int* height, int heightSize);

int main(){
    int arr[]={20,1,0,2,1,16,1,3,2,1,2,17};
    int height=12;
    int ans=trap(arr,height);
    if (ans == 141)
        set_test_pass();
    else
        set_test_fail();
    return 0;
    /*printf("%d\n",ans);*/
}

int trap(int* height, int heightSize){
    int maxh=0,maxhi;
    if(heightSize==0||heightSize==1) return 0;
    /* find the highest bar and its index */
    for(int i=0;i<heightSize;i++){
        if(height[i]>maxh){
            maxh=height[i];
            maxhi=i;
        }
    }
    int water_l=0;
    int rain=0;
    /* left side: scan towards the highest bar */
    for(int i=0;i<maxhi;i++){
        if(height[i]>water_l){
            water_l=height[i];
        }
        rain+=water_l-height[i];
    }
    water_l=0;
    /* right side: scan back towards the highest bar */
    for(int i=heightSize-1;i>maxhi;i--){
        if(height[i]>water_l){
            water_l=height[i];
        }
        rain+=water_l-height[i];
    }
    return rain;
}
```
clips from C_test.c's dump file (Os as CFLAGS)
```
000001d8 <trap>:
 1d8: 00100793 li a5,1
 1dc: 08b7fe63 bgeu a5,a1,278 <trap+0xa0>
 1e0: 00000793 li a5,0
 1e4: 00000693 li a3,0
 1e8: 02b7c463 blt a5,a1,210 <trap+0x38>
 1ec: 00000693 li a3,0
 1f0: 00000793 li a5,0
 1f4: 00000613 li a2,0
 1f8: 0306cc63 blt a3,a6,230 <trap+0x58>
 1fc: fff58593 addi a1,a1,-1
 200: 00000693 li a3,0
 204: 04b84863 blt a6,a1,254 <trap+0x7c>
 208: 00078513 mv a0,a5
 20c: 00008067 ret
 210: 00279713 slli a4,a5,0x2
 214: 00e50733 add a4,a0,a4
 218: 00072703 lw a4,0(a4)
 21c: 00e6d663 bge a3,a4,228 <trap+0x50>
 220: 00078813 mv a6,a5
 224: 00070693 mv a3,a4
 228: 00178793 addi a5,a5,1
 22c: fbdff06f j 1e8 <trap+0x10>
 230: 00269713 slli a4,a3,0x2
 234: 00e50733 add a4,a0,a4
 238: 00072703 lw a4,0(a4)
 23c: 00e65463 bge a2,a4,244 <trap+0x6c>
 240: 00070613 mv a2,a4
 244: 40e60733 sub a4,a2,a4
 248: 00e787b3 add a5,a5,a4
 24c: 00168693 addi a3,a3,1
 250: fa9ff06f j 1f8 <trap+0x20>
 254: 00259713 slli a4,a1,0x2
 258: 00e50733 add a4,a0,a4
 25c: 00072703 lw a4,0(a4)
 260: 00e6d463 bge a3,a4,268 <trap+0x90>
 264: 00070693 mv a3,a4
 268: 40e68733 sub a4,a3,a4
 26c: 00e787b3 add a5,a5,a4
 270: fff58593 addi a1,a1,-1
 274: f91ff06f j 204 <trap+0x2c>
 278: 00000793 li a5,0
 27c: f8dff06f j 208 <trap+0x30>
```
### perips testing
working...
## GITHUB
I'm still working on my project...
here is [Version 2.00](https://github.com/Chiwawachiwawa/cwwppb_RISCV_SOC)
[video](https://youtu.be/3COX5TDpetI)
## arty-a7-100t testing(still working)
1.[what is xdc???](https://digilent.com/reference/programmable-logic/guides/vivado-xdc-file)
2.[reference: how do I write a xdc file](https://github.com/Digilent/digilent-xdc)
We succeeded in generating the bitstream... It was not in vain that I slept less than six hours almost every day this month, and even dropped two courses QAQ
After 14 hours I finally dealt with the last problem...
![](https://i.imgur.com/Dm4m6Go.png)
### non-os booting
Someone asked me how to boot a non-os machine; it is a great question.
In risc-v official datasheet
# boot Linux (draft, work in progress)(20230201)
### add eth_ip
### cwwppb_v1.02 bitstream
### PetaLinux
During installation, be sure to read the datasheet instead of following strange tutorials from the internet. Match the versions, and install the required libraries.
# Reflections:
If there are students who want to improve their own skills and are willing to spend the time, I strongly recommend this class. You never know how far the teacher can push your limit. The teacher is very serious in class and the teaching materials he prepares are also very good. Learning the material is only part of it; some of the teacher's values are also worth learning. The teacher once scolded me with a very good line: "**Are you talking like an engineer? How can an engineer use "should" and "probably" to describe their ideas?**" In short, I think taking this course is a sure way to learn everything you want to learn!