
Computer-Architecture term_project riscv SOC

pre-work

On the afternoon of the day I chose my topic, I dropped two classes, because I knew I wouldn't have much fun in the next month and would need a lot of time to complete this term_project.

sample

從零開始的RISC-V SoC架構設計與Linux核心運行 - 硬體篇 (RISC-V SoC Architecture Design and Running the Linux Kernel from Scratch: Hardware Part)

VIVADO

guide
Install VIVADO (please use the ML version; it does not need a license).
Please use a version released after 2020.
The 2018 version had a file corruption problem when downloading (personal experience: I had to re-run it three times).

VSCODE

I use VSCODE to edit my code. (notepad is super bad)

install make on windows

follow this

fpga

ARTY_A7_100T
Be sure to use the above vivado version
Specification req:


Board file (it comes with a guide)

References

ithome
bilibili: well, I know that learning how to write Verilog on this website sounds funny, but I'm very bad at English :(
datasheet: it is very useful; you don't have to memorize the instruction types when this book is on your computer

exp from github

book1 (this book is very useful for learning some basic knowledge). MUST READ CH1-CH3 VERY CAREFULLY


book2
github exp
stackoverflow
5-stage CPU code from my friend (x86).
A friend who is an NTU EE master's student.
A lot of money (the FPGA board is very expensive QQ).
handshakes's RTL

first of all

I made two versions on the CPU side (yes, you read that right).
Since I didn't know how to write Verilog, I trained myself from scratch.
At the beginning, I wrote a 3-stage pipeline CPU that can execute some RV32I instructions (except the load and store types).

Then I wrote another three-stage pipeline CPU, tried the SoC, and successfully ran the RV32I tests on ModelSim.

Then came the final version: I tried to add division and multiplication instructions and some SoC peripherals. It is simulated with two top-level files: one integrates all the components of the CPU, and the other connects the SoC and the CPU.

The instruction tests run successfully on Windows, and the bitstream is generated successfully in Vivado.
BUT!!!

My board hasn't arrived yet


Alright, that's a little too much nonsense. Just like the teacher said:
"Talk is meaningless, show me the code!"

The first version

Purpose:

Pass the RISC-V RV32I instruction set tests.

generated (taking addi_test as an example)

We convert this RV32I code into a binary file and use it to test whether our RTL code can fetch, decode, and execute instructions and perform memory operations.

generated/rv32ui-p-addi:     file format elf32-littleriscv


Disassembly of section .text.init:

00000000 <_start>:
   0:	00000d13          	li	s10,0
   4:	00000d93          	li	s11,0

00000008 <test_2>:
   8:	00000093          	li	ra,0
   c:	00008f13          	mv	t5,ra
  10:	00000e93          	li	t4,0
  14:	00200193          	li	gp,2
  18:	27df1c63          	bne	t5,t4,290 <fail>

0000001c <test_3>:
  1c:	00100093          	li	ra,1
  20:	00108f13          	addi	t5,ra,1
  24:	00200e93          	li	t4,2
  28:	00300193          	li	gp,3
  2c:	27df1263          	bne	t5,t4,290 <fail>

...
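
To read a line of this dump by hand, it helps to split the 32-bit word into the fixed RV32I bit fields. The following is a minimal Python sketch of that split; the helper name `decode_fields` is hypothetical and not part of the project.

```python
def decode_fields(word):
    """Split a 32-bit RV32I instruction word into its fixed bit fields."""
    imm = (word >> 20) & 0xFFF
    if imm & 0x800:                     # sign-extend the 12-bit I-type immediate
        imm -= 0x1000
    return {
        "opcode": word & 0x7F,          # inst[6:0]
        "rd":     (word >> 7) & 0x1F,   # inst[11:7]
        "funct3": (word >> 12) & 0x7,   # inst[14:12]
        "rs1":    (word >> 15) & 0x1F,  # inst[19:15]
        "imm":    imm,                  # inst[31:20], signed
    }


# 0x00108f13 is the `addi t5,ra,1` line of test_3 in the dump
print(decode_fields(0x00108F13))
```

The result has `rd` = 30 (t5 is x30), `rs1` = 1 (ra is x1) and `imm` = 1, which agrees with the disassembled text on that line.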

RTL introduction (CPU)

Determines the address of the instruction.
pc_reg

module pc_reg(
    input wire      clk,
    input wire      rst,
    input wire[31:0] jump_addr_i,
    input wire jump_en,
    output reg[31:0] pc_o
);
always @(posedge clk) begin
    if (rst==1'b0) // reset is active low
        pc_o<=32'b0;
    else if(jump_en)
        pc_o<=jump_addr_i;
    else
        pc_o<=pc_o+3'd4;
        
    end
endmodule
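
As a quick check of the priority order in `pc_reg` (reset first, then jump, then increment), here is a small Python model of what `pc_o` becomes after one rising clock edge. It is only a sketch for reasoning about the logic; the function name `next_pc` is hypothetical.

```python
def next_pc(pc, rst, jump_en, jump_addr):
    """Value pc_o takes after one rising clock edge (rst is active low)."""
    if rst == 0:
        return 0                          # reset wins over everything
    if jump_en:
        return jump_addr & 0xFFFFFFFF     # taken jump or branch
    return (pc + 4) & 0xFFFFFFFF          # next sequential instruction


# Trace: one reset cycle, three sequential fetches, then a jump to 0x290
pc = 0xDEADBEEF
for rst, jump_en, jump_addr in [(0, 0, 0), (1, 0, 0), (1, 0, 0), (1, 0, 0), (1, 1, 0x290)]:
    pc = next_pc(pc, rst, jump_en, jump_addr)
    print(hex(pc))
```

The 32-bit mask mirrors the fact that `pc_o` is a 32-bit register, so the increment wraps around instead of growing.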

if_id

module if_id(
	input wire clk,
	input wire rst,
	input wire [31:0]  inst_i, 
	input wire hold_flag_i,
	input wire [31:0]  inst_addr_i,
	output wire[31:0]  inst_addr_o, 
	output wire[31:0]  inst_o

);	
reg rom_flag;

always @(posedge clk) begin
	if(!rst|hold_flag_i)
		rom_flag<=1'b0;
	else
		rom_flag<=1'b1;
end
assign inst_o=rom_flag?inst_i:`INST_NOP;//if flag==1 go rom
dff_set #(32)dff2(clk,rst,hold_flag_i,32'b0,inst_addr_i,inst_addr_o);
endmodule

id

Uses the opcode to determine the instruction type.
Uses the function codes (func3, func7) to determine the exact instruction and generates the corresponding signals for the next stage.

`include "defines.v"

module id(
	//from if_id
	input wire[31:0] inst_i		   ,
	input wire[31:0] inst_addr_i   ,
		
	// to regs 
	output reg[4:0] rs1_addr_o	   ,
	output reg[4:0] rs2_addr_o	   ,
	// from regs
	input wire[31:0] rs1_data_i	   ,
	input wire[31:0] rs2_data_i	   ,
	
	//to id_ex
	output reg[31:0] inst_o		   ,
	output reg[31:0] inst_addr_o   ,
	output reg[31:0] op1_o		   ,
	output reg[31:0] op2_o		   ,
	output reg[4:0]  rd_addr_o	   ,
	output reg 		 reg_wen	   ,
	output reg[31:0] base_addr_o   ,
	output reg[31:0] addr_offset_o ,	
	
	//to mem read
	output reg       mem_rd_req_o  ,
	output reg[31:0] mem_rd_addr_o
);

	wire[6:0] opcode; 
	wire[4:0] rd	; 
	wire[2:0] func3	; 
	wire[4:0] rs1	;
	wire[4:0] rs2	;
	wire[6:0] func7	;
	wire[11:0]imm	;
	wire[4:0] shamt	;
	
	assign opcode = inst_i[6:0];
	assign rd 	  = inst_i[11:7];
	assign func3  = inst_i[14:12];
	assign rs1 	  = inst_i[19:15];
	assign rs2 	  = inst_i[24:20];
	assign func7  = inst_i[31:25];
	assign imm    = inst_i[31:20];
	assign shamt  = inst_i[24:20];

	always @(*)begin
		inst_o  	= inst_i;
		inst_addr_o = inst_addr_i;  
		
		case(opcode)
			`INST_TYPE_I:begin
				mem_rd_req_o  = 1'b0 ;
				mem_rd_addr_o = 32'b0;
				base_addr_o   = 32'b0;
				addr_offset_o = 32'b0;				
				case(func3)
					`INST_ADDI,`INST_SLTI,`INST_SLTIU,`INST_XORI,`INST_ORI,`INST_ANDI:begin
						rs1_addr_o = rs1;
						rs2_addr_o = 5'b0;
						op1_o 	   = rs1_data_i;
						op2_o      = {{20{imm[11]}},imm};
						rd_addr_o  = rd;
						reg_wen    = 1'b1;
					end
					`INST_SLLI,`INST_SRI:begin
						rs1_addr_o = rs1;
						rs2_addr_o = 5'b0;
						op1_o 	   = rs1_data_i;
						op2_o      = {27'b0,shamt};
						rd_addr_o  = rd;
						reg_wen    = 1'b1;					
					end
					default:begin
						rs1_addr_o = 5'b0;
						rs2_addr_o = 5'b0;
						op1_o 	   = 32'b0;
						op2_o      = 32'b0;
						rd_addr_o  = 5'b0;
						reg_wen    = 1'b0;						
					end
				endcase	
			end
			`INST_TYPE_R_M:begin
				mem_rd_req_o  = 1'b0 ;
				mem_rd_addr_o = 32'b0;
				base_addr_o   = 32'b0;
				addr_offset_o = 32'b0;				
				case(func3)
					`INST_ADD_SUB,`INST_SLT,`INST_SLTU,`INST_XOR,`INST_OR,`INST_AND:begin
						rs1_addr_o = rs1;
						rs2_addr_o = rs2;
						op1_o 	   = rs1_data_i;
						op2_o      = rs2_data_i;
						rd_addr_o  = rd;
						reg_wen    = 1'b1;
					end	
					`INST_SLL,`INST_SR:begin
						rs1_addr_o = rs1;
						rs2_addr_o = rs2;
						op1_o 	   = rs1_data_i;
						op2_o      = {27'b0,rs2_data_i[4:0]};
						rd_addr_o  = rd;
						reg_wen    = 1'b1;
					end	
					default:begin
						rs1_addr_o = 5'b0;
						rs2_addr_o = 5'b0;
						op1_o 	   = 32'b0;
						op2_o      = 32'b0;
						rd_addr_o  = 5'b0;
						reg_wen    = 1'b0;						
					end
				endcase				
			end
			`INST_TYPE_B:begin
				mem_rd_req_o  = 1'b0 ;
				mem_rd_addr_o = 32'b0;
				case(func3)
					`INST_BNE,`INST_BEQ,`INST_BLT,`INST_BGE,`INST_BLTU,`INST_BGEU:begin
						rs1_addr_o = rs1;
						rs2_addr_o = rs2;
						op1_o 	   = rs1_data_i;
						op2_o      = rs2_data_i;
						rd_addr_o  = 5'b0;
						reg_wen    = 1'b0;
						base_addr_o   = inst_addr_i;
						addr_offset_o = {{19{inst_i[31]}},inst_i[31],inst_i[7],inst_i[30:25],inst_i[11:8],1'b0};						
					end	
					default:begin
						rs1_addr_o = 5'b0;
						rs2_addr_o = 5'b0;
						op1_o 	   = 32'b0;
						op2_o      = 32'b0;
						rd_addr_o  = 5'b0;
						reg_wen    = 1'b0;	
						base_addr_o   = 32'b0;
						addr_offset_o = 32'b0;							
					end
				endcase
			end
			`INST_TYPE_L:begin	
				case(func3)
					`INST_LW,`INST_LH,`INST_LB,`INST_LHU,`INST_LBU:begin
						mem_rd_req_o	= 1'b1 ;
						mem_rd_addr_o 	= rs1_data_i + {{20{imm[11]}},imm};
						rs1_addr_o  	= rs1;
						rs2_addr_o  	= 5'b0;
						op1_o 	    	= 32'b0;
						op2_o       	= 32'b0;
						rd_addr_o   	= rd;
						reg_wen     	= 1'b1;	
						base_addr_o   	= rs1_data_i;
						addr_offset_o 	= {{20{imm[11]}},imm};						
					end
					default:begin
						mem_rd_req_o	= 1'b0  ;
						mem_rd_addr_o 	= 32'b0 ;
						rs1_addr_o  	= 5'b0	;
						rs2_addr_o  	= 5'b0	;
						op1_o 	    	= 32'b0	;
						op2_o       	= 32'b0	;
						rd_addr_o   	= 5'b0	;
						reg_wen     	= 1'b0	;					
					end
				endcase
			end
			`INST_TYPE_S:begin
				case(func3)
					`INST_SW,`INST_SH,`INST_SB:begin
						mem_rd_req_o	= 1'b0  		;
						mem_rd_addr_o 	= 32'b0 		;
						rs1_addr_o  	= rs1			;
						rs2_addr_o  	= rs2			;
						op1_o 	    	= 32'b0			;
						op2_o       	= rs2_data_i	;
						rd_addr_o   	= 5'b0			;
						reg_wen     	= 1'b0			;
						base_addr_o     = rs1_data_i	;
						addr_offset_o   = {{20{inst_i[31]}},inst_i[31:25],inst_i[11:7]};						
					end
					default:begin
						mem_rd_req_o	= 1'b0  ;
						mem_rd_addr_o 	= 32'b0 ;
						rs1_addr_o  	= 5'b0	;
						rs2_addr_o  	= 5'b0	;
						op1_o 	    	= 32'b0	;
						op2_o       	= 32'b0	;
						rd_addr_o   	= 5'b0	;
						reg_wen     	= 1'b0	;
						base_addr_o     = 32'b0;
						addr_offset_o   = 32'b0;						
					end
				endcase
			end
			`INST_JAL:begin
				mem_rd_req_o  = 1'b0 ;
				mem_rd_addr_o = 32'b0;			
				rs1_addr_o 	  = 5'b0;
				rs2_addr_o 	  = 5'b0;
				op1_o 	   	  = inst_addr_i;
				op2_o      	  = 32'h4;
				rd_addr_o  	  = rd;
				reg_wen    	  = 1'b1;
				base_addr_o     = inst_addr_i;
				addr_offset_o   = {{12{inst_i[31]}}, inst_i[19:12], inst_i[20], inst_i[30:21], 1'b0};				
			end
			`INST_LUI:begin
				mem_rd_req_o	= 1'b0 ;
				mem_rd_addr_o 	= 32'b0;			
				rs1_addr_o 		= 5'b0;
				rs2_addr_o 		= 5'b0;
				op1_o 	   		= {inst_i[31:12],12'b0};
				op2_o      		= 32'b0;
				rd_addr_o  		= rd;
				reg_wen    		= 1'b1;
				base_addr_o     = 32'b0;
				addr_offset_o   = 32'b0;			
			end	
			`INST_JALR:begin
				mem_rd_req_o	= 1'b0 ;
				mem_rd_addr_o 	= 32'b0;			
				rs1_addr_o 		= rs1;
				rs2_addr_o 		= 5'b0;
				op1_o 	   		= inst_addr_i;
				op2_o      		= 32'h4;
				rd_addr_o  		= rd;
				reg_wen    		= 1'b1;
				base_addr_o     = rs1_data_i;
				addr_offset_o   = {{20{imm[11]}},imm};				
			end
			`INST_AUIPC:begin
				mem_rd_req_o	= 1'b0 ;
				mem_rd_addr_o 	= 32'b0;			
				rs1_addr_o 		= 5'b0;
				rs2_addr_o 		= 5'b0;
				op1_o 	   		= {inst_i[31:12],12'b0};
				op2_o      		= inst_addr_i;
				rd_addr_o  		= rd;
				reg_wen    		= 1'b1;
				base_addr_o     = 32'b0;
				addr_offset_o   = 32'b0;				
			end
			default:begin
				mem_rd_req_o	= 1'b0 ;
				mem_rd_addr_o	= 32'b0;			
				rs1_addr_o 		= 5'b0;
				rs2_addr_o 		= 5'b0;
				op1_o 	   		= 32'b0;
				op2_o      		= 32'b0;
				rd_addr_o  		= 5'b0;
				reg_wen    		= 1'b0;
				base_addr_o     = 32'b0;
				addr_offset_o   = 32'b0;				
			end
		endcase
	end
endmodule
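
The B-type `addr_offset_o` concatenation in `id` is easy to get wrong, so it is worth checking against the dump. The Python sketch below rebuilds the offset with the same bit order and applies it to the `bne` at address 0x18 from the addi test; the helper name `b_type_offset` is hypothetical.

```python
def b_type_offset(inst):
    """Rebuild the signed branch offset in the same bit order `id` uses."""
    imm = (
        (((inst >> 31) & 0x1) << 12)    # inst[31]    -> offset[12]
        | (((inst >> 7) & 0x1) << 11)   # inst[7]     -> offset[11]
        | (((inst >> 25) & 0x3F) << 5)  # inst[30:25] -> offset[10:5]
        | (((inst >> 8) & 0xF) << 1)    # inst[11:8]  -> offset[4:1], offset[0] is 0
    )
    if imm & 0x1000:                    # sign-extend from bit 12
        imm -= 0x2000
    return imm


pc = 0x18                # address of `bne t5,t4,290 <fail>` in test_2
inst = 0x27DF1C63        # its encoding, taken from the dump
print(hex(pc + b_type_offset(inst)))
```

This prints `0x290`, the address of `<fail>` in the disassembly, which is the same sum that `base_addr_o + addr_offset_o` forms in the execute stage.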

id_ex

module id_ex(
	input wire clk,
	input wire rst,
	//from id
	input wire[31:0] inst_i,
	input wire[31:0] inst_addr_i,
	input wire[31:0] op1_i,	
	input wire[31:0] op2_i,
	input wire[4:0] rd_addr_i,	
	input wire 		reg_wen_i,
	input wire[31:0] base_addr_i,
	input wire[31:0] addr_offset_i,
	//from ctrl
	input wire hold_flag_i,
	//to ex
	output wire[31:0] inst_o,
	output wire[31:0] inst_addr_o,
	output wire[31:0] op1_o,	
	output wire[31:0] op2_o,
	output wire[4:0]  rd_addr_o,
	output wire[31:0] base_addr_o,	
	output wire 	  reg_wen_o,
	output wire[31:0] addr_offset_o	
);

dff_set #(32) dff1(clk,rst,hold_flag_i,`INST_NOP,inst_i,inst_o);
dff_set #(32) dff2(clk,rst,hold_flag_i,32'b0,inst_addr_i,inst_addr_o);
dff_set #(32) dff3(clk,rst,hold_flag_i,32'b0,op1_i,op1_o);
dff_set #(32) dff4(clk,rst,hold_flag_i,32'b0,op2_i,op2_o);
dff_set #(5) dff5(clk,rst,hold_flag_i,5'b0,rd_addr_i,rd_addr_o);
dff_set #(1) dff6(clk,rst,hold_flag_i,1'b0,reg_wen_i,reg_wen_o);
dff_set #(32) dff7(clk,rst,hold_flag_i,32'b0,base_addr_i,base_addr_o);
dff_set #(32) dff8(clk,rst,hold_flag_i,32'b0,addr_offset_i,addr_offset_o);

endmodule

ex

At first I wrote each calculation directly inside the case logic, without declaring the shared result wires in advance, and my friends said I was stupid XD

I later verified that this slows down the overall CPU performance.

module ex(
	//from id_ex
	input wire[31:0] inst_i,	
	input wire[31:0] inst_addr_i,
	input wire[31:0] op1_i,
	input wire[31:0] op2_i,
	input wire[4:0]  rd_addr_i,
	input wire 	rd_wen_i,
	input wire[31:0] base_addr_i,
	input wire[31:0] addr_offset_i,	
	//to regs
	output reg[4:0] rd_addr_o,
	output reg[31:0]rd_data_o,
	output reg rd_wen_o,
	//to ctrl
	output reg[31:0]jump_addr_o,
	output reg  jump_en_o,
	output reg  hold_flag_o,
	//to mem write
	output reg mem_wr_req_o,
	output reg[3:0] mem_wr_sel_o,
	output reg[31:0]mem_wr_addr_o,
	output reg[31:0]mem_wr_data_o,
	//from memread
	input wire[31:0]mem_rd_data_i
);
    wire[6:0] opcode;
	wire[4:0] rd;
	wire[2:0] func3;
	wire[4:0] rs1;
    wire[4:0] rs2;
    wire[6:0] func7;
	wire[11:0] imm;
	wire[4:0] shamt;

	assign opcode=inst_i[6:0];
	assign rd=inst_i[11:7];
	assign func3 =inst_i[14:12];
    assign func7 =inst_i[31:25];
	assign rs1=inst_i[19:15];
    assign rs2=inst_i[24:20];
	assign imm=inst_i[31:20];
	assign shamt=inst_i[24:20];
    //branch
    //wire[31:0] jump_imm={{19{inst_i[31]}},inst_i[31],inst_i[7],inst_i[30:25],inst_i[11:8],1'b0};
    wire op1_i_equal_op2_i;
	wire op1_i_less_op2_i_signed;
	wire op1_i_less_op2_i_unsigned;

	assign	   op1_i_less_op2_i_signed = ($signed(op1_i) < $signed(op2_i))?1'b1:1'b0;
	assign	   op1_i_less_op2_i_unsigned = (op1_i < op2_i)?1'b1:1'b0;
	assign	   op1_i_equal_op2_i = (op1_i == op2_i)?1'b1:1'b0;
	// logic units
	wire[31:0] op1_i_add_op2_i;
	wire[31:0] op1_i_and_op2_i;
	wire[31:0] op1_i_xor_op2_i;
	wire[31:0] op1_i_or_op2_i;
	wire[31:0] op1_i_shift_left_op2_i;
	wire[31:0] op1_i_shift_right_op2_i;
	wire[31:0] base_addr_add_addr_offset;

	assign op1_i_add_op2_i=op1_i+op2_i;
	assign op1_i_and_op2_i=op1_i&op2_i;
	assign op1_i_xor_op2_i=op1_i^op2_i;
	assign op1_i_or_op2_i=op1_i|op2_i;
	assign op1_i_shift_left_op2_i=op1_i<<op2_i;
	assign op1_i_shift_right_op2_i=op1_i>>op2_i;
	assign base_addr_add_addr_offset=base_addr_i+addr_offset_i;
	// type I
	wire[31:0] SRA_mask;
	assign 	   SRA_mask = (32'hffff_ffff) >> op2_i[4:0];
	wire[1:0]store_index = base_addr_add_addr_offset[1:0];
	wire[1:0]load_index = base_addr_add_addr_offset[1:0];
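	// Excerpt: the enclosing always block, the opcode case statement and the other instruction types are omitted here.
`INST_TYPE_I:begin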
                        jump_addr_o=32'b0;//write wen
                        jump_en_o=1'b0;
                        hold_flag_o=1'b0;
						mem_wr_req_o=1'b0;
						mem_wr_sel_o=4'b0;
						mem_wr_addr_o=32'b0;
						mem_wr_data_o=32'b0;
                    case(func3)
                        `INST_ADDI:begin//same instruction structure
                            rd_data_o=op1_i_add_op2_i;
                            rd_addr_o=rd_addr_i;
                            rd_wen_o=1'b1;
                        end
						`INST_SLTI:begin
							rd_data_o={31'b0,op1_i_less_op2_i_signed};
                            rd_addr_o=rd_addr_i;
                            rd_wen_o=1'b1;
						end
						`INST_SLTIU:begin
							rd_data_o={31'b0,op1_i_less_op2_i_unsigned};
                            rd_addr_o=rd_addr_i;
                            rd_wen_o=1'b1;
						end
						`INST_XORI:begin
							rd_data_o=op1_i_xor_op2_i;
                            rd_addr_o=rd_addr_i;
                            rd_wen_o=1'b1;
						end
						`INST_ORI:begin
							rd_data_o=op1_i_or_op2_i;
                            rd_addr_o=rd_addr_i;
                            rd_wen_o=1'b1;
						end
						`INST_ANDI:begin
							rd_data_o= op1_i_and_op2_i;
                            rd_addr_o=rd_addr_i;
                            rd_wen_o=1'b1;
						end
						`INST_SLLI:begin
							rd_data_o=op1_i_shift_left_op2_i;
                            rd_addr_o=rd_addr_i;
                            rd_wen_o=1'b1;
						end
						`INST_SRI:begin
							if (func7[5]==1'b1) begin //SRAI (only 5 bit limit)
								rd_data_o=((op1_i_shift_right_op2_i) & SRA_mask) | ({32{op1_i[31]}} & (~SRA_mask));
                            	rd_addr_o=rd_addr_i;
                            	rd_wen_o=1'b1;
							end
							else begin//SRLI
								rd_data_o=op1_i_shift_right_op2_i;
                            	rd_addr_o=rd_addr_i;
                            	rd_wen_o=1'b1;
							end
						end

                    default:begin
                        rd_data_o=32'b0;
                        rd_addr_o=5'b0;
                        rd_wen_o=1'b0;
                    end 
                    endcase
                end
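
The SRAI branch above builds an arithmetic shift from a logical shift and `SRA_mask`. The Python sketch below repeats the same expression on 32-bit values to show why it works; the function name `srai_via_mask` is hypothetical and this is a model for checking the idea, not the RTL itself.

```python
MASK32 = 0xFFFFFFFF


def srai_via_mask(op1, shamt):
    """Arithmetic right shift built from a logical shift plus a sign mask."""
    shamt &= 0x1F
    logical = (op1 & MASK32) >> shamt               # op1_i >> op2_i (zeros shifted in)
    sra_mask = MASK32 >> shamt                      # ones over the bits that came from op1
    sign_fill = MASK32 if (op1 >> 31) & 1 else 0    # {32{op1_i[31]}}
    # keep the shifted data, fill the vacated top bits with the sign bit
    return (logical & sra_mask) | (sign_fill & ~sra_mask & MASK32)


print(hex(srai_via_mask(0x80000000, 4)))
print(hex(srai_via_mask(0x7FFFFFFF, 4)))
```

For a negative operand the top `shamt` bits become ones, and for a non-negative operand the result is the same as the logical shift.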

ram
Serves as an external device in the first version.

module ram(
       input  wire clk,
	   input  wire rst,
       input  wire [3:0] wen,
	   input  wire [32-1:0]w_addr_i,
	   input  wire [32-1:0]w_data_i,
	   input  wire ren,
	   input  wire [32-1:0]r_addr_i,
       output wire [32-1:0]r_data_o	
);
	wire[11:0] w_addr = w_addr_i[13:2];
	wire[11:0] r_addr = r_addr_i[13:2];

	dual_ram #(
		 .DW(8),
		 .AW(12),
		 .MEM_NUM(4096)
	)
	ram_byte0
	(
       .clk (clk ),
	   .rst (rst ),
       .wen (wen[0] ),
	   .w_addr_i (w_addr ),
	   .w_data_i (w_data_i[7:0] ),
	   .ren (ren ),
	   .r_addr_i (r_addr ),
	   .r_data_o (r_data_o[7:0] )
	);
	dual_ram #(
		 .DW(8),
		 .AW(12),
		 .MEM_NUM(4096)
	)
	ram_byte1
	(
       .clk		 	(clk        	),
	   .rst		 	(rst        	),
       .wen		 	(wen[1]			),
	   .w_addr_i 	(w_addr 		),
	   .w_data_i 	(w_data_i[15:8]	),
	   .ren		 	(ren			),
	   .r_addr_i 	(r_addr 		),
	   .r_data_o 	(r_data_o[15:8]	)
	);	
	dual_ram #(
		 .DW(8),
		 .AW(12),
		 .MEM_NUM(4096)
	)
	ram_byte2
	(
       .clk		 	(clk        	),
	   .rst		 	(rst        	),
       .wen		 	(wen[2]			),
	   .w_addr_i 	(w_addr 		),
	   .w_data_i 	(w_data_i[23:16]),
	   .ren		 	(ren			),
	   .r_addr_i 	(r_addr 		),
	   .r_data_o 	(r_data_o[23:16])
	);
	dual_ram #(
		 .DW(8),
		 .AW(12),
		 .MEM_NUM(4096)
	)
	ram_byte3
	(
       .clk		 	(clk        	),
	   .rst		 	(rst        	),
       .wen		 	(wen[3]			),
	   .w_addr_i 	(w_addr 		),
	   .w_data_i 	(w_data_i[31:24]),
	   .ren		 	(ren			),
	   .r_addr_i 	(r_addr 		),
	   .r_data_o 	(r_data_o[31:24])
	);	
	
endmodule
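
The `ram` module splits each 32-bit word over four byte-wide memories, so each bit of `wen` enables one byte lane, and the word index comes from address bits [13:2]. The Python sketch below models only that data layout and ignores clocking (the real memories have a synchronous read); the class name `ByteLaneRam` is hypothetical.

```python
class ByteLaneRam:
    """Behavioral model of four byte-wide banks sharing one word address."""

    def __init__(self, words=4096):
        self.lanes = [[0] * words for _ in range(4)]   # ram_byte0 .. ram_byte3

    @staticmethod
    def word_index(addr):
        return (addr >> 2) & 0xFFF                     # addr[13:2]

    def write(self, addr, data, wen):
        idx = self.word_index(addr)
        for lane in range(4):
            if (wen >> lane) & 1:                      # wen[lane] enables one byte
                self.lanes[lane][idx] = (data >> (8 * lane)) & 0xFF

    def read(self, addr):
        idx = self.word_index(addr)
        return sum(self.lanes[lane][idx] << (8 * lane) for lane in range(4))


ram = ByteLaneRam()
ram.write(0x10, 0x11223344, 0b1111)   # full word store
ram.write(0x10, 0xAABBCCDD, 0b0010)   # only byte lane 1 is updated
print(hex(ram.read(0x10)))
```

The second write changes only bits [15:8], which is how a byte or halfword store can reuse the same 32-bit data bus.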

rom
Serves as an external device in the first version.

module rom(
	input wire clk,
	input wire rst,
	input wire wen,
	input wire[32-1:0]	w_addr_i,
	input wire[32-1:0]  w_data_i,
	input wire ren,
	input wire[32-1:0]	r_addr_i,
	output wire[32-1:0]  r_data_o	
);
wire[11:0] w_addr = w_addr_i[13:2];
wire[11:0] r_addr = r_addr_i[13:2];
dual_ram#( 
	.DW(32),
	.AW(12),
	.MEM_NUM(4096)
)
rom_32bit(
	.clk(clk),
	.rst(rst),
	.wen(wen),
	.w_addr_i(w_addr),
	.w_data_i(w_data_i),
	.ren(ren),
	.r_addr_i(r_addr),
	.r_data_o(r_data_o)	
);
endmodule

ctrl
Controls the jumps of B_TYPE instructions.

module ctrl (
    input wire[31:0]jump_addr_i,
    input wire jump_en_i,
    input wire hold_flag_ex_i,
    output reg[31:0]jump_addr_o,
    output reg jump_en_o,
    output reg hold_flag_o

);
	always @(*)begin
		jump_addr_o = jump_addr_i;
		jump_en_o   = jump_en_i; 
		if( jump_en_i || hold_flag_ex_i)begin 
			hold_flag_o = 1'b1;
		end
		else begin
			hold_flag_o = 1'b0;
		end
	end
endmodule
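
To see how `ctrl` and the instruction path of `if_id` work together on a taken jump, here is a small Python model of both. It is a sketch under one assumption: `INST_NOP` is taken to be the canonical RISC-V NOP (`addi x0, x0, 0`), because `defines.v` is not shown here. The function names are hypothetical.

```python
INST_NOP = 0x00000013  # assumption: canonical RISC-V NOP; defines.v is not shown


def ctrl(jump_addr_i, jump_en_i, hold_flag_ex_i):
    """Combinational model of `ctrl`: pass the jump through, hold on jump or ex hold."""
    hold_flag_o = 1 if (jump_en_i or hold_flag_ex_i) else 0
    return jump_addr_i, jump_en_i, hold_flag_o


def if_id_step(rom_flag, rst, hold_flag_i, inst_i):
    """One cycle of the instruction path in `if_id`: returns (inst_o, next rom_flag)."""
    inst_o = inst_i if rom_flag else INST_NOP            # assign inst_o = rom_flag ? ...
    next_rom_flag = 0 if (rst == 0 or hold_flag_i) else 1
    return inst_o, next_rom_flag


# Cycle 1: ex reports a taken jump, so ctrl raises the hold flag.
_, _, hold = ctrl(0x290, 1, 0)
inst_o, flag = if_id_step(rom_flag=1, rst=1, hold_flag_i=hold, inst_i=0x00108F13)
print(hex(inst_o), flag)
# Cycle 2: the word fetched behind the jump is replaced by a NOP.
inst_o, flag = if_id_step(flag, 1, 0, 0x00200E93)
print(hex(inst_o), flag)
```

In the second cycle `inst_o` is the NOP value, so the instruction that was fetched from the wrong path never reaches `id`.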

risc-v top

Connects all the modules.

module open_risc_v(
	input   wire 		  clk   		 	,
	input   wire 		  rst		     	, 
	//inst
	input   wire [31:0]   inst_i		 	,
	output  wire [31:0]   inst_addr_o	 	,
	//read  mem
	output  wire 	   	  mem_rd_req_o		,
	output  wire [31:0]   mem_rd_addr_o		,
	input	wire [31:0]	  mem_rd_data_i		,	
	//write mem
	output  wire	 	  mem_wr_req_o		,
	output  wire  [3:0]   mem_wr_sel_o		,
	output  wire  [31:0]  mem_wr_addr_o		,
	output  wire  [31:0]  mem_wr_data_o		
);
	//pc to rom
	wire[31:0] pc_reg_pc_o;
	assign inst_addr_o = pc_reg_pc_o;
	
	//if to if_id
	wire[31:0] if_inst_addr_o;
	wire[31:0] if_inst_o;	
	
	// if_id to id
	wire[31:0] if_id_inst_addr_o;
	wire[31:0] if_id_inst_o;	
	
	//ex to regs
	wire[4:0]  ex_rd_addr_o;
	wire[31:0] ex_rd_data_o;
	wire       ex_reg_wen_o;

	//id to regs
	wire[4:0] id_rs1_addr_o;
	wire[4:0] id_rs2_addr_o;
	
	//id to id_ex
	wire[31:0] id_inst_o;
	wire[31:0] id_inst_addr_o;
	wire[31:0] id_op1_o;
	wire[31:0] id_op2_o;
	wire[4:0]  id_rd_addr_o;
	wire       id_reg_wen;
	wire[31:0] id_base_addr_o	;
	wire[31:0] id_addr_offset_o	;

	//regs to id
	wire[31:0] regs_reg1_rdata_o;
	wire[31:0] regs_reg2_rdata_o;
	
	
	//id_ex to ex
	wire[31:0] id_ex_inst_o;
	wire[31:0] id_ex_inst_addr_o;
	wire[31:0] id_ex_op1_o;
	wire[31:0] id_ex_op2_o;
	wire[4:0]  id_ex_rd_addr_o;
	wire       id_ex_reg_wen;
	wire[31:0] id_ex_base_addr_o  ;
	wire[31:0] id_ex_addr_offset_o;	
	
	//ex  to ctrl
	wire[31:0] ex_jump_addr_o;
	wire  	   ex_jump_en_o;
	wire 	   ex_hold_flag_o;
	//ctrl to pc_reg
	wire[31:0] ctrl_jump_addr_o;
	wire  	   ctrl_jump_en_o;
	//ctrl to if_id id_ex
	wire 	   ctrl_hold_flag_o;		
	
	
	pc_reg pc_reg_inst(
		.clk			(clk			 	),
		.rst			(rst			 	),
		.jump_addr_i	(ctrl_jump_addr_o	), 
		.jump_en		(ctrl_jump_en_o  	),		
		.pc_o   		(pc_reg_pc_o     	)
	);
	
	

	if_id if_id_inst(
		.clk			(clk		      	),
		.rst			(rst		      	),
		.hold_flag_i	(ctrl_hold_flag_o 	),
		.inst_i			(inst_i        	  	),  
		.inst_addr_i	(pc_reg_pc_o   	  	),  
		.inst_addr_o	(if_id_inst_addr_o	), 
		.inst_o         (if_id_inst_o	  	)
	);
	
	//id to rom

	
	id id_inst(
		.inst_i			(if_id_inst_o		),
		.inst_addr_i	(if_id_inst_addr_o	),
		.rs1_addr_o		(id_rs1_addr_o		),
		.rs2_addr_o		(id_rs2_addr_o		),
		.rs1_data_i		(regs_reg1_rdata_o	),
		.rs2_data_i		(regs_reg2_rdata_o	),
		.inst_o			(id_inst_o			),
		.inst_addr_o	(id_inst_addr_o		),	
		.op1_o			(id_op1_o			),	
		.op2_o			(id_op2_o			),
		.rd_addr_o		(id_rd_addr_o		),	
		.reg_wen        (id_reg_wen			),
		.base_addr_o    (id_base_addr_o		),
		.addr_offset_o  (id_addr_offset_o	),		
		.mem_rd_req_o	(mem_rd_req_o		),
		.mem_rd_addr_o	(mem_rd_addr_o		)	
		);


	
	regs regs_inst(
		.clk			(clk				),
		.rst			(rst				),
		.reg1_raddr_i	(id_rs1_addr_o		),
		.reg2_raddr_i	(id_rs2_addr_o		), 
		.reg1_rdata_o	(regs_reg1_rdata_o	),
		.reg2_rdata_o	(regs_reg2_rdata_o	),
		.reg_waddr_i	(ex_rd_addr_o		),
		.reg_wdata_i	(ex_rd_data_o		),
		.reg_wen        (ex_reg_wen_o		)
	);
	

	id_ex id_ex_inst(
		.clk			(clk				),
		.rst			(rst				),
		.hold_flag_i	(ctrl_hold_flag_o 	),
		.inst_i			(id_inst_o			),
		.inst_addr_i	(id_inst_addr_o		),
		.op1_i			(id_op1_o			),	
		.op2_i			(id_op2_o			),
		.rd_addr_i		(id_rd_addr_o		),	
		.reg_wen_i		(id_reg_wen			),
		.base_addr_i    (id_base_addr_o		),
		.addr_offset_i  (id_addr_offset_o	),
		.inst_o			(id_ex_inst_o		),
		.inst_addr_o    (id_ex_inst_addr_o	),
		.op1_o			(id_ex_op1_o		),		
		.op2_o			(id_ex_op2_o		),
		.rd_addr_o		(id_ex_rd_addr_o	),	
		.reg_wen_o		(id_ex_reg_wen		),
		.base_addr_o   	(id_ex_base_addr_o	),
		.addr_offset_o 	(id_ex_addr_offset_o)
		);
	


	ex ex_inst(
		.inst_i			(id_ex_inst_o		),	
		.inst_addr_i	(id_ex_inst_addr_o	),
		.op1_i			(id_ex_op1_o		),
		.op2_i			(id_ex_op2_o		),
		.rd_addr_i		(id_ex_rd_addr_o	),
		.rd_wen_i		(id_ex_reg_wen		),
		.base_addr_i	(id_ex_base_addr_o  ),
		.addr_offset_i	(id_ex_addr_offset_o),		
		.rd_addr_o		(ex_rd_addr_o		),
		.rd_data_o		(ex_rd_data_o		),	
		.rd_wen_o       (ex_reg_wen_o		),
		.jump_addr_o	(ex_jump_addr_o		),		
		.jump_en_o		(ex_jump_en_o		),		
		.hold_flag_o	(ex_hold_flag_o		),		
		.mem_wr_req_o	(mem_wr_req_o		),
		.mem_wr_sel_o	(mem_wr_sel_o		),
		.mem_wr_addr_o	(mem_wr_addr_o		),
		.mem_wr_data_o	(mem_wr_data_o		),	
		.mem_rd_data_i	(mem_rd_data_i		)	
	);
	
	
	
	
	ctrl ctrl_inst(
		.jump_addr_i	(ex_jump_addr_o		),
		.jump_en_i		(ex_jump_en_o		),
		.hold_flag_ex_i	(ex_hold_flag_o		),
		.jump_addr_o	(ctrl_jump_addr_o	),
		.jump_en_o		(ctrl_jump_en_o		),
		.hold_flag_o	(ctrl_hold_flag_o	)	
	);

endmodule

dual_ram

module dual_ram #( 
	parameter DW = 32,
	parameter AW = 12,
	parameter MEM_NUM = 4096
)
(
	input wire clk,
	input wire rst,
	input wire wen,
	input wire[AW-1:0]	w_addr_i,
	input wire[DW-1:0]  w_data_i,
	input wire ren,
	input wire[AW-1:0]	r_addr_i,
	output wire[DW-1:0]  r_data_o	
);	
	
	
	wire[DW-1:0] r_data_wire	;	
	reg 		 rd_equ_wr_flag	;
	reg[DW-1:0]	 w_data_reg		;
	
	assign r_data_o = (rd_equ_wr_flag) ? w_data_reg : r_data_wire;
	
	always @(posedge clk)begin
		if(!rst)
			w_data_reg <= 'b0;
		else
			w_data_reg <= w_data_i;
	end
	
	//switch
	always @(posedge clk)begin
		if(rst && wen && ren && w_addr_i == r_addr_i )
			rd_equ_wr_flag <= 1'b1;
		else if(rst && ren)
			rd_equ_wr_flag <= 1'b0;
	end
		

	dual_ram_template #(
		.DW (DW),
		.AW (AW),
		.MEM_NUM (MEM_NUM)
	)dual_ram_template_isnt
	(
		.clk(clk),
		.rst(rst),
		.wen(wen),
		.w_addr_i(w_addr_i	),
		.w_data_i(w_data_i	),
		.ren(ren),
		.r_addr_i(r_addr_i	),
		.r_data_o(r_data_wire)
	);

endmodule




module dual_ram_template #(
	parameter DW = 32,
	parameter AW = 12,
	parameter MEM_NUM = 4096
)
(
	input wire clk,
	input wire rst,
	input wire wen,
	input wire[AW-1:0] w_addr_i,
	input wire[DW-1:0] w_data_i,
	input wire ren,
	input wire[AW-1:0]	r_addr_i,
	output reg[DW-1:0]  r_data_o
);
	reg[DW-1:0] memory[0:MEM_NUM-1];
	
	
	
	always @(posedge clk)begin
		if(rst && ren)
			r_data_o <= memory[r_addr_i];
	end
	
	always @(posedge clk)begin
		if(rst && wen)
			memory[w_addr_i] <= w_data_i;
	end
endmodule
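
The wrapper `dual_ram` adds a bypass around `dual_ram_template`: when a read and a write hit the same address on the same clock edge, the output comes from `w_data_reg` instead of the stale memory read. The Python sketch below models one clock edge at a time to make that visible; the class name `DualRamModel` is hypothetical.

```python
class DualRamModel:
    """Cycle-level model of dual_ram wrapped around dual_ram_template."""

    def __init__(self, depth=4096):
        self.memory = [0] * depth
        self.r_data_wire = 0       # registered read port of the template
        self.w_data_reg = 0        # copy of the last write data
        self.rd_equ_wr_flag = 0    # set when read and write addresses collide

    @property
    def r_data_o(self):
        return self.w_data_reg if self.rd_equ_wr_flag else self.r_data_wire

    def clock(self, wen, w_addr, w_data, ren, r_addr, rst=1):
        """Apply one rising edge; every register samples the pre-edge state."""
        old_read = self.memory[r_addr]          # read sees the value before the write
        if rst and ren:
            self.r_data_wire = old_read
        if rst and wen and ren and w_addr == r_addr:
            self.rd_equ_wr_flag = 1
        elif rst and ren:
            self.rd_equ_wr_flag = 0
        self.w_data_reg = w_data if rst else 0
        if rst and wen:
            self.memory[w_addr] = w_data
        return self.r_data_o


ram = DualRamModel(depth=16)
print(ram.clock(wen=1, w_addr=3, w_data=5, ren=1, r_addr=3))  # bypassed write data
print(ram.r_data_wire)                                        # stale value from the template
print(ram.clock(wen=0, w_addr=0, w_data=0, ren=1, r_addr=3))  # normal read afterwards
```

Without the flag, the first read would return the old content of address 3, because the memory read and the memory write happen on the same edge.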

top_soc

Connects all the modules of the SoC.

module open_risc_v_soc(
	input wire clk,
	input wire rst,
	input wire uart_rxd,
	input wire debug_button,
	output wire led_debug,
	output wire led2
);

	// open_risc_v to rom 
	wire[31:0] open_risc_v_inst_addr_o;
	//rom to open_risc_v
	wire[31:0] rom_inst_o;
	
	// open_risc_v to ram
	wire       open_risc_v_mem_wr_req_o ;
	wire[3:0]  open_risc_v_mem_wr_sel_o ;
	wire[31:0] open_risc_v_mem_wr_addr_o;
	wire[31:0] open_risc_v_mem_wr_data_o;
	wire 	   open_risc_v_mem_rd_req_o ;
	wire[31:0] open_risc_v_mem_rd_addr_o;
	//ram to open_risc_v
	wire[31:0] ram_rd_data_o;
	//uart_debug to rom
	
	wire 		uart_debug_ce;
	wire 		uart_debug_wen;
	wire[31:0]  uart_debug_addr_o;
	wire[31:0]  uart_debug_data_o;	
	
	//debug_button_debounce to debug
	wire debug;
		
	debug_button_debounce debug_button_debounce_inst(
		.clk(clk),
		.rst(rst),
		.debug_button(debug_button),
		.debug(debug),
		.led_debug(led_debug)
		
	);
	
	
	open_risc_v open_risc_v_inst(
		.clk(clk),
		.rst(rst),
		.inst_i(rom_inst_o),
		.inst_addr_o(open_risc_v_inst_addr_o),
		.mem_rd_req_o(open_risc_v_mem_rd_req_o),
		.mem_rd_addr_o(open_risc_v_mem_rd_addr_o),
		.mem_rd_data_i(ram_rd_data_o),
		.mem_wr_req_o(open_risc_v_mem_wr_req_o),
		.mem_wr_sel_o(open_risc_v_mem_wr_sel_o),
		.mem_wr_addr_o(open_risc_v_mem_wr_addr_o),
		.mem_wr_data_o(open_risc_v_mem_wr_data_o)	
    );

	assign led2 = open_risc_v_mem_wr_data_o[2];
	
	ram ram_inst(
		.clk(clk),
		.rst(rst ),
		.wen(open_risc_v_mem_wr_sel_o),
		.w_addr_i(open_risc_v_mem_wr_addr_o),
		.w_data_i(open_risc_v_mem_wr_data_o),
		.ren(open_risc_v_mem_rd_req_o),
		.r_addr_i(open_risc_v_mem_rd_addr_o),
		.r_data_o(ram_rd_data_o)
	);
	
	
	

	rom rom_inst(
		.clk(clk),
		.rst(debug),	
		.wen(uart_debug_wen),//ins_write
		.w_addr_i(uart_debug_addr_o),
		.w_data_i(uart_debug_data_o),
		.ren(1'b1),//ins_read
		.r_addr_i(open_risc_v_inst_addr_o ),
		.r_data_o(rom_inst_o				 )
	);

	uart_debug uart_debug_inst(
		.clk(clk),
		.debug(debug),
		.uart_rxd(uart_rxd), 	
		.ce(uart_debug_ce),
		.wen(uart_debug_wen),
		.addr_o(uart_debug_addr_o),
		.data_o(uart_debug_data_o)

	);

endmodule

testing insts

module tb;

	reg clk;
	reg rst;
	
	wire[31:0] x3 = tb.open_risc_v_soc_inst.open_risc_v_inst.regs_inst.regs[3];
	wire[31:0] x26 = tb.open_risc_v_soc_inst.open_risc_v_inst.regs_inst.regs[26];
	wire[31:0] x27 = tb.open_risc_v_soc_inst.open_risc_v_inst.regs_inst.regs[27];
	
	
	always #10 clk = ~clk;
	
	initial begin
		clk <= 1'b1;
		rst <= 1'b0;
		#30;
		rst <= 1'b1;	
	end
	
	//rom start_value
	initial begin
		$readmemh("./generated/rv32ui-p-lhu.txt",tb.open_risc_v_soc_inst.rom_inst.rom_32bit.dual_ram_template_isnt.memory);
	end
	//get wave
     initial begin
         $dumpfile("tb.vcd");
         $dumpvars(0, tb);
     end
	
	integer r;
	initial begin
		wait(x26 == 32'b1);
		
		#200;
		if(x27 == 32'b1) begin
			$display("############################");
			$display("########  pass  !!!#########");
			$display("############################");
		end
		else begin
			$display("############################");
			$display("########  fail  !!!#########");
			$display("############################");
			$display("fail testnum = %2d", x3);
			for(r = 0;r < 32; r = r + 1)begin
				$display("x%2d register value is %d",r,tb.open_risc_v_soc_inst.open_risc_v_inst.regs_inst.regs[r]);	
			end	
		end
		
		$finish;
	end
	
	open_risc_v_soc open_risc_v_soc_inst(
		.clk   		(clk),
		.rst 		(rst)
	);


	
endmodule

SIM

compile and sim

import os
import subprocess
import sys


def list_binfiles(path):
    files = []
    list_dir = os.walk(path)
    for maindir, subdir, all_file in list_dir:
        for filename in all_file:
            apath = os.path.join(maindir, filename)
            if apath.endswith('.bin'):
                files.append(apath)

    return files
def bin_to_mem(infile, outfile):
    binfile = open(infile, 'rb')
    binfile_content = binfile.read(os.path.getsize(infile))
    datafile = open(outfile, 'w')

    index = 0
    b0 = 0
    b1 = 0
    b2 = 0
    b3 = 0

    for b in binfile_content:
        if index == 0:
            b0 = b
            index = index + 1
        elif index == 1:
            b1 = b
            index = index + 1
        elif index == 2:
            b2 = b
            index = index + 1
        elif index == 3:
            b3 = b
            index = 0
            array = []
            array.append(b3)
            array.append(b2)
            array.append(b1)
            array.append(b0)
            datafile.write(bytearray(array).hex() + '\n')

    binfile.close()
    datafile.close()


def compile():
    rtl_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
    iverilog_cmd = ['iverilog']
    iverilog_cmd += ['-o', r'out.vvp']
    iverilog_cmd += ['-I', rtl_dir + r'/rtl']
    iverilog_cmd.append(rtl_dir + r'/tb/tb.v')


    iverilog_cmd.append(rtl_dir + r'/rtl/defines.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/pc_reg.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/if_id.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/id.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/id_ex.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/ex.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/regs.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/ctrl.v')
    # iverilog_cmd.append(rtl_dir + r'/rtl/ram.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/rom.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/ifetch.v')
    iverilog_cmd.append(rtl_dir + r'/rtl/open_risc_v.v')

    iverilog_cmd.append(rtl_dir + r'/utils/dff_set.v')
    # iverilog_cmd.append(rtl_dir + r'/utils/dual_ram.v')


    iverilog_cmd.append(rtl_dir + r'/tb/open_risc_v_soc.v')


    process = subprocess.Popen(iverilog_cmd)
    process.wait(timeout=5)


def sim():
    compile()
    vvp_cmd = [r'vvp']
    vvp_cmd.append(r'out.vvp')
    process = subprocess.Popen(vvp_cmd)
    try:
        process.wait(timeout=10)
    except subprocess.TimeoutExpired:
        print('!!!Fail, vvp exec timeout!!!')


def run(test_binfile):
    rtl_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
    out_mem = rtl_dir + r'/sim/generated/inst_data.txt'
    bin_to_mem(test_binfile, out_mem)
    sim()


if __name__ == '__main__':
    sys.exit(run(sys.argv[1]))
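
The `bin_to_mem` step in the script above reverses every group of four bytes, because the `.bin` file is little-endian while `$readmemh` expects one hex word per line. Here is a compact Python sketch of the same conversion on in-memory bytes; the helper name `words_from_bytes` is hypothetical and the sample bytes are the first two instructions of the addi test dump.

```python
def words_from_bytes(raw):
    """Yield one 8-digit hex string per complete 4-byte little-endian word."""
    usable = len(raw) - len(raw) % 4          # trailing partial word is dropped
    for offset in range(0, usable, 4):
        word = int.from_bytes(raw[offset:offset + 4], "little")
        yield f"{word:08x}"


# First two words of rv32ui-p-addi as they are stored in the .bin file
raw = bytes([0x13, 0x0D, 0x00, 0x00, 0x93, 0x0D, 0x00, 0x00])
print(list(words_from_bytes(raw)))
```

The output lines `00000d13` and `00000d93` match the encodings of `li s10,0` and `li s11,0` in the disassembly.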

test_all

import os
import subprocess
import sys

from compile_and_sim import compile
from compile_and_sim import list_binfiles
from compile_and_sim import sim
from compile_and_sim import bin_to_mem


def main():
    rtl_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
    all_bin_files = list_binfiles(rtl_dir + r'/sim/generated/')
    for file_bin in all_bin_files:
        cmd = r'python compile_and_sim.py' + ' ' + file_bin
        f = os.popen(cmd)
        r = f.read()
        index = file_bin.index('-p-')
        print_name = file_bin[index + 3:-4]

        if r.find('pass') != -1:
            print('ins  ' + print_name.ljust(10, ' ') + '    PASS')
        else:
            print('ins  ' + print_name.ljust(10, ' ') + '    !!!FAIL!!!')
        f.close()


if __name__ == '__main__':
    main()

test_one_inst

import os  # needed by os.path / os.getcwd in main()
import sys

from compile_and_sim import compile
from compile_and_sim import list_binfiles
from compile_and_sim import sim
from compile_and_sim import bin_to_mem

def main(name='addi'):
    rtl_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))

    all_bin_files = list_binfiles(rtl_dir + r'/sim/generated/')

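    test_binfile = None
    for file in all_bin_files:
        if file.find(name) != -1 and file.find('.bin') != -1:
            test_binfile = file
    if test_binfile is None:
        # stop early instead of failing later with an unbound variable
        sys.exit('no .bin test found for instruction: ' + name)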
    out_mem = rtl_dir + r'/sim/generated/inst_data.txt'
    # bin2mem
    bin_to_mem(test_binfile, out_mem)
    sim()
    # get wave
    # gtkwave_cmd = [r'gtkwave']
    # gtkwave_cmd.append(r'tb.vcd')
    # process = subprocess.Popen(gtkwave_cmd)


if __name__ == '__main__':
    # fall back to the default test ('addi') when no name is given
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else 'addi'))

second version

Purpose 1: Complete the mul, div, CSR and other instructions, and add more SoC peripherals.

Purpose 2: Generate a bitstream to load onto the Arty A7-100T.

Building on the first version, I added a JTAG module, a bus, a UART, a timer and a divider to implement the CSR, div, mul and other instructions (but I haven't solved the overflow problem yet).
I have released some snippets of code to record my implementation process (I haven't slept well for a month).

STRUCTURE

RTL

CORE

pc_regs

pc_reg generates the value of the PC register, which is used as the address signal of the instruction memory to read instructions from the rom. It handles, in priority order: reset, jump, hold (pause) and the normal address increment by 4.

    always @ (posedge clk) begin
        
        if (rst == 1'b0 || jtag_reset_flag_i == 1'b1) begin
            pc_o <= 32'h0;
        
        end else if (jump_flag_i == 1'b1) begin
            pc_o <= jump_addr_i;
        
        end else if (hold_flag_i >= 3'b001) begin
            pc_o <= pc_o;
        
        end else begin
            pc_o <= pc_o + 4'h4;
        end
    end
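
The priority of the branches above can be checked quickly with a small Python model. This is only a sketch for reasoning about the RTL, not part of the design; the function name next_pc and its argument names are made up for illustration.

```python
def next_pc(pc, rst, jtag_reset, jump, jump_addr, hold_flag):
    """Return the PC value after one clock edge (32-bit wrap-around)."""
    if rst == 0 or jtag_reset:       # reset is active-low in the RTL
        return 0
    if jump:                         # a taken jump wins over a hold
        return jump_addr & 0xFFFFFFFF
    if hold_flag >= 0b001:           # any hold level freezes the PC
        return pc
    return (pc + 4) & 0xFFFFFFFF     # default: next sequential instruction


if __name__ == '__main__':
    print(hex(next_pc(0x10, 1, False, True, 0x80, 0b011)))
```

Note that a jump and a hold can be active in the same cycle (ctrl raises the hold flag when it sees a jump), and in that case the jump address is still loaded.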

rom

Main function: store the program's instruction codes and output the instruction selected by the value of the PC register.
The storage is defined as a two-dimensional array of 4096 words, each 32 bits wide.
Each entry holds one 32-bit instruction, so up to 4096 instructions can be stored; the index into the 4096 entries is the word address of the instruction (the byte address with its two lowest bits dropped, addr_i[31:2]).
When porting to an FPGA, check the resource capacity of the target device and adjust the size accordingly.

module rom(

    input wire clk,
    input wire rst,

    input wire we_i,// write enable
    input wire[31:0] addr_i,// addr
    input wire[31:0] data_i,

    output reg[31:0] data_o// read data

    );

    reg[31:0] _rom[0:4095];//rom total-1


    always @ (posedge clk) begin
        if (we_i == 1'b1) begin
            _rom[addr_i[31:2]] <= data_i;
        end
    end

    always @ (*) begin
        if (rst == 1'b0) begin
            data_o = 32'h0;
        end 
        else begin
            data_o = _rom[addr_i[31:2]];
        end
    end

endmodule
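
The word addressing used by addr_i[31:2] can be sketched in Python as follows. The names ROM_WORDS and rom_index are hypothetical, and the range check is an addition for the model only (the RTL snippet has no such check).

```python
ROM_WORDS = 4096  # same depth as the _rom array


def rom_index(byte_addr):
    """Map a byte address to a word index, like addr_i[31:2]."""
    index = (byte_addr & 0xFFFFFFFF) >> 2   # drop the two byte-offset bits
    if index >= ROM_WORDS:
        raise IndexError('address 0x%08x is outside the rom' % byte_addr)
    return index


if __name__ == '__main__':
    for addr in (0x0, 0x4, 0x7, 0x3FFC):
        print(hex(addr), '->', rom_index(addr))
```

Since the PC advances by 4 each cycle, consecutive instructions land in consecutive entries of the array.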

ex

  1. Execute the operation required by the current instruction (addition, subtraction, multiplication, division, shift, etc.). For example, an add instruction adds the value of register 1 to the value of register 2.
  2. If it is a jump instruction, issue a jump signal.
  3. If it is a memory load instruction, read the memory data at the corresponding address.
    always @ (*) begin//deal div inst
        div_dividend_o = reg1_rdata_i;
        div_divisor_o = reg2_rdata_i;
        div_op_o = fun3;
        div_reg_waddr_o = reg_waddr_i;
        if ((opcode == 7'b0110011) && (fun7 == 7'b0000001)) begin
            div_wenable = 1'b0;
            div_wdata = 32'h0;
            div_waddr = 32'h0;
            case (fun3)
                3'b100, 3'b101, 3'b110, 3'b111: begin
                    div_start = 1'b1;
                    div_j_flag = 1'b1;
                    div_h_flag = 1'b1;
                    div_j_addr = op1_j_add_op2_j_res;
                end
                default: begin
                    div_start = 1'b0;
                    div_j_flag = 1'b0;
                    div_h_flag = 1'b0;
                    div_j_addr = 32'h0;
                end
            endcase
        end 
        else begin
            div_j_flag = 1'b0;
            div_j_addr = 32'h0;
            if (div_busy_i == 1'b1) begin
                div_start = 1'b1;
                div_wenable = 1'b0;
                div_wdata = 32'h0;
                div_waddr = 32'h0;
                div_h_flag = 1'b1;
            end else begin
                div_start = 1'b0;
                div_h_flag = 1'b0;
                if (div_ready_i == 1'b1) begin
                    div_wdata = div_result_i;
                    div_waddr = div_reg_waddr_i;
                    div_wenable = 1'b1;
                end else begin
                    div_wenable = 1'b0;
                    div_wdata = 32'h0;
                    div_waddr = 32'h0;
                end
            end
        end
    end

Division (this took me several nights)


I implemented this state machine in RTL and added it to the core. The biggest trouble I ran into was interrupt handling, which involves the bus, ex and ctrl, because my divider needs multiple cycles to run through the whole finite state machine.
Each division operation requires at least 39 clock cycles.
Important:
For signed operations, a negative operand is first inverted and incremented by one (two's complement negation). The purpose is to convert every negative number into a positive one before calculating, because the two's complement form of a negative number carries a sign bit and cannot be fed directly into the shift-and-subtract loop. The raw result is therefore always positive. Finally, the signs of the dividend and the divisor decide whether the result is inverted and incremented by one again: the quotient is negated when the two signs differ, and the remainder is negated when the dividend is negative.

            case(fsm_st)
                FSM_IDEL:begin
                    if (start_i == 1'b1) begin
                        op_r        <=       op_i;
                        dividend    <= dividend_i;
                        divisor     <=  divisor_i;
                        reg_waddr_o <=reg_waddr_i; // latch the destination register from the input
                        fsm_st      <=  FSM_START;
                        busy_o      <=       1'b1;
                    end
                    else begin
                        op_r<=3'h0;
                        reg_waddr_o <=32'h0;
                        dividend    <=32'h0;
                        divisor     <=32'h0;
                        ready_o     <= 1'b0;
                        result_o    <=32'h0;
                        busy_o      <= 1'b0;
                    end
                end
                FSM_START:begin
                    if (start_i==1'b1) begin
                        if (divisor==32'h0) begin
                            if (div_op|divu_op) begin
                                result_o<=32'hffffffff;
                            end
                            else begin
                                result_o<=dividend;
                            end
                            ready_o   <=1'b1    ;
                            fsm_st        <=FSM_IDEL;
                            busy_o    <=1'b0    ;
                        end
                        else begin
                            busy_o      <=1'h1         ;
                            ct          <=32'h40000000 ;
                            fsm_st      <=FSM_CALC     ;
                            div_result  <=32'h0        ;
                            div_remain  <=32'h0        ;
                            if (div_op|rem_op) begin
                                if (dividend[31]==1'b1) begin
                                    dividend<=dividend_neg;
                                    minen<=dividend_neg[31];
                                end
                                else begin
                                    minen<=dividend[31];
                                end
                                if (divisor[31]==1'b1) begin
                                    divisor<=divisor_neg;
                                end
                            end
                            else begin
                                minen<=dividend[31];
                            end
                            if ((div_op && (dividend[31] ^ divisor[31] == 1'b1))||(rem_op && (dividend[31] == 1'b1))) begin
                                invert_result<=1'b1;
                            end
                            else begin
                                invert_result<=1'b0;
                            end
                        end
                    end
                    else begin
                        fsm_st<=FSM_IDEL;
                        result_o<=32'h0;
                        ready_o<=1'b0;
                        busy_o<=1'b0;
                    end
                end
                FSM_CALC:begin
                    if (start_i==1'b1) begin
                        dividend<={dividend[30:0], 1'b0};
                        div_result<=div_result_temp;
                        ct<={1'b0,ct[31:1]};
                        if (|ct) begin
                            minen<= {minen_temp[30:0], dividend[30]};
                        end
                        else begin
                            fsm_st<=FSM_END;
                            if (minen_divisor) begin
                                div_remain<=minen_sub_res;
                            end
                            else begin
                                div_remain<=minen;
                            end
                        end
                    end
                    else begin
                        fsm_st      <=  FSM_IDEL;
                        result_o<=     32'h0;
                        ready_o <=      1'b0;
                        busy_o  <=      1'b0;
                    end
                end
                FSM_END:begin
                    if (start_i==1'b1) begin
                        ready_o<=1'b1;
                        fsm_st<=FSM_IDEL;
                        busy_o<=1'b0;
                        if (div_op|divu_op) begin
                            if (invert_result) begin
                                result_o<=(-div_result);
                            end
                            else begin
                                result_o<=div_result;
                            end
                        end
                        else begin
                            if (invert_result) begin
                                result_o<=(-div_remain);
                            end
                            else begin
                                result_o<=div_remain;
                            end
                        end
                    end
                    else begin
                        fsm_st<=FSM_IDEL;
                        result_o<=32'h0;
                        ready_o<=1'b0;
                        busy_o<=1'b0;
                    end
                end
            endcase
        end
    end
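
The arithmetic behind the state machine (negate the operands, divide the magnitudes one bit at a time, fix the sign at the end) can be modelled in a few lines of Python. This is a sketch under my own assumptions: it is a plain restoring divider written for clarity, not a cycle-accurate copy of the RTL, and the names divide and neg32 are hypothetical.

```python
MASK = 0xFFFFFFFF


def neg32(x):
    """Two's complement negation in 32 bits (invert and add one)."""
    return (-x) & MASK


def divide(op, dividend, divisor):
    """op is 'div', 'divu', 'rem' or 'remu'; operands are 32-bit patterns."""
    dividend &= MASK
    divisor &= MASK
    if divisor == 0:
        # division by zero: quotient is all ones, remainder is the dividend
        return MASK if op in ('div', 'divu') else dividend
    signed = op in ('div', 'rem')
    a_neg = signed and bool(dividend >> 31)
    b_neg = signed and bool(divisor >> 31)
    a = neg32(dividend) if a_neg else dividend   # work on magnitudes
    b = neg32(divisor) if b_neg else divisor
    quotient = 0
    remainder = 0
    for i in range(31, -1, -1):                  # one quotient bit per step
        remainder = (remainder << 1) | ((a >> i) & 1)
        quotient <<= 1
        if remainder >= b:
            remainder -= b
            quotient |= 1
    if op in ('div', 'divu'):
        result, invert = quotient, a_neg != b_neg
    else:
        result, invert = remainder, a_neg
    return neg32(result) if invert else result & MASK


if __name__ == '__main__':
    print(hex(divide('div', 0xFFFFFFF9, 2)))   # -7 / 2
    print(hex(divide('rem', 0xFFFFFFF9, 2)))   # -7 % 2
```

In this model the overflow case (the most negative number divided by -1) returns 0x80000000 for div and 0 for rem, which are the values the RISC-V specification asks for. Whether the RTL above produces the same values is something I have not verified here.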

Interrupt

Interrupt types:
External interrupt: generated by a peripheral, i.e. it occurs outside the processor core.
Timer interrupt (one of the external interrupts): controlled by the mtie field in the mie register.
Software interrupt: triggered by the software itself (for example, code written in C).
Debug interrupt: raised while debugging.
Interrupt masking: the mie register enables or masks each interrupt type (external interrupt, timer interrupt, software interrupt).
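
As a rough illustration of the masking rule, the sketch below combines the global enable bit in mstatus with the per-source bits in mie. The bit positions follow the RISC-V privileged specification; the function name interrupt_taken and the pending argument are hypothetical and do not correspond to signals in my RTL.

```python
MSTATUS_MIE = 1 << 3    # global machine interrupt enable (status[3])
MIE_MSIE = 1 << 3       # machine software interrupt enable
MIE_MTIE = 1 << 7       # machine timer interrupt enable
MIE_MEIE = 1 << 11      # machine external interrupt enable


def interrupt_taken(mstatus, mie, pending):
    """pending uses the same bit positions as mie."""
    if not (mstatus & MSTATUS_MIE):   # everything is masked globally
        return False
    return bool(mie & pending)        # at least one enabled source is pending


if __name__ == '__main__':
    print(interrupt_taken(MSTATUS_MIE, MIE_MTIE, MIE_MTIE))
```

A timer interrupt is therefore only taken when both status[3] and the mtie bit are set.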

ctrl


A jump changes the value of the PC register.
Whether to jump is only known in the execution stage, so when a jump is taken the pipeline has to be held: the instructions that were already fetched after the jump must not be executed.

// hold_flag_o[2:0]
module ctrl(
    input wire rst,
    // from ex
    input wire        jump_flag_i,
    input wire[31:0]  jump_addr_i,
    input wire        hold_flag_ex_i,
    // from rib
    input wire hold_flag_rib_i,
    // from jtag
    input wire jtag_halt_flag_i,
    // from clint
    input wire hold_flag_clint_i,
    output reg[2:0] hold_flag_o,
    // to pc_reg
    output reg jump_flag_o,
    output reg[31:0] jump_addr_o
    );
    always @ (*) begin
        jump_addr_o = jump_addr_i;
        jump_flag_o = jump_flag_i;
        hold_flag_o = 3'b000;
        if (jump_flag_i == 1'b1 || hold_flag_ex_i == 1'b1 || hold_flag_clint_i == 1'b1) begin
            hold_flag_o = 3'b011;
        end else if (hold_flag_rib_i == 1'b1) begin
            hold_flag_o = 3'b001;
        end else if (jtag_halt_flag_i == 1'b1) begin
            hold_flag_o = 3'b011;
        end else begin
            hold_flag_o = 3'b000;
        end
    end

endmodule
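
The if/else chain in ctrl can be mirrored in Python to see which request wins. This is a sketch only; the function name hold_flag is hypothetical. From the pc_reg snippet I only know that any value >= 3'b001 freezes the PC, so the reading that 3'b011 also holds the later stages is an assumption.

```python
def hold_flag(jump, hold_ex, hold_clint, hold_rib, jtag_halt):
    """Return the 3-bit hold code chosen by ctrl for one cycle."""
    if jump or hold_ex or hold_clint:
        return 0b011      # assumed: hold the whole pipeline
    if hold_rib:
        return 0b001      # bus busy: only the PC has to wait
    if jtag_halt:
        return 0b011
    return 0b000          # no hold


if __name__ == '__main__':
    print(bin(hold_flag(False, False, False, True, True)))
```

When the bus and jtag both ask for a hold, the bus request is checked first and the result is 3'b001.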

csr_reg


Machine Trap Vector (mtvec, 0x305) = t_vec
Machine Exception Cause (mcause, 0x342) = cause
Machine Exception PC (mepc, 0x341) = epc
Machine Status (mstatus, 0x300) = status
Write register: based on the lowest 12 bits of the write address, store the data coming from the ex or clint module into the corresponding control and status register (CSR).

always @(posedge clk) begin//what kind of reg we need to write
    if (rst==1'b0) begin // reset is active-low, as in the other modules
        t_vec<=32'h0;
        cause<=32'h0;
        epc<=32'h0;
        mei<=32'h0;
        status<=32'h0;
        scratch<=32'h0;
    end
    else begin
        if (we_i==1'b1) begin
            case (waddr_i[11:0])
                12'h305:begin
                    t_vec<=data_i;
                end 
                12'h342:begin
                    cause<=data_i;
                end
                12'h341:begin
                    epc<=data_i;
                end
                12'h304:begin
                    mei<=data_i;
                end
                12'h300:begin
                    status<=data_i;
                end
                12'h340:begin
                    scratch<=data_i;
                end
                default:begin
                    
                end 
            endcase    
        end
        else if(clint_we_i == 1'b1) begin
            case (clint_waddr_i[11:0])
                12'h305:begin
                    t_vec<=clint_data_i;
                end 
                12'h342:begin
                    cause<=clint_data_i;
                end
                12'h341:begin
                    epc<=clint_data_i;
                end
                12'h304:begin
                    mei<=clint_data_i;
                end
                12'h300:begin
                    status<=clint_data_i;
                end
                12'h340:begin
                    scratch<=clint_data_i;
                end
                default:begin
                    
                end 
            endcase
        end
    end
end

Read register (combinational logic): the read address comes from the decode (id) module, and the data read is sent back to the id module, selected by the lowest 12 bits of the read address.

A second read path serves the interrupt (clint) module: the read address comes from clint and the data read is sent back to clint, again selected by the lowest 12 bits of the address.

assign global_int_en_o=(status[3]==1'b1)?1'b1:1'b0;
assign clint_csr_mtvec      = t_vec  ;
assign clint_csr_mepc       = epc    ;
assign clint_csr_mstatus    = status ;

clint

RISC-V interrupts are divided into two types here. Synchronous interrupts are raised by instructions such as ECALL and EBREAK; asynchronous interrupts are raised by peripherals such as GPIO and UART.

When an interrupt (or an interrupt return) is detected, the whole pipeline is first held and the jump address is set to the interrupt entry address. The necessary CSR registers (mstatus, mepc, mcause, etc.) are then read and written. Once these accesses are finished, the hold is released so that the processor can fetch from the interrupt entry address and enter the interrupt service routine.

  1. Priority
    Synchronous interrupt > asynchronous interrupt > interrupt return
  2. CSR register state machine
    Capture the interrupt return address and the cause code of the interrupt, and step through the CSR write states.
  3. Write the CSR registers (status, epc, cause)
    First write the interrupt return address to mepc
    Then write mstatus to turn off the global interrupt enable
    Then write the interrupt cause code to the mcause register
    On interrupt return, restore the global interrupt bit at the same time (status[3]=status[7])
  4. Send the interrupt signal to ex
    int_flag_i: the interrupt flag signal from the peripherals (for example, the timer interrupt).
    int_assert_o: interrupt valid signal; when it is 1, the core starts running the interrupt handler.
    // interrupt state definitions (finite state machine)
    localparam INT_IDLE  = 4'b0001;
    localparam INT_SYNC  = 4'b0010;
    localparam INT_ASYNC = 4'b0100;
    localparam INT_MERT  = 4'b1000;
    // CSR write state definitions
    localparam CSR_IDEL  =5'b00001;
    localparam CSR_STAT  =5'b00010;
    localparam CSR_MEPC  =5'b00100;
    localparam CSR_STMT  =5'b01000;
    localparam CSR_CAUS  =5'b10000;
    always @(*) begin // interrupt arbitration
        if(rst==1'b0)begin
            int_st=INT_IDLE;
        end
        else begin
            if(inst_i==32'h73||inst_i==32'h00100073)begin // ECALL or EBREAK
                if(div_started_i==1'b0)begin
                    int_st=INT_SYNC;
                end
                else begin
                    int_st=INT_IDLE;
                end
            end
            else if(int_flag_i!=8'h0&&global_int_en_i==1'b1)begin
                int_st=INT_ASYNC;
            end
            else if(inst_i==32'h30200073)begin // MRET
                int_st=INT_MERT;
            end
            else begin
                int_st=INT_IDLE;
            end
        end
    end
    always @(posedge clk) begin // CSR write state machine
        if (rst==1'b0) begin
            csr_st     <=CSR_IDEL;
            cause      <=  32'h0;
            int_addr   <=  32'h0;
        end
        else begin
            case(csr_st)
            CSR_IDEL:begin
                if (int_st==INT_SYNC) begin
                    csr_st<=CSR_MEPC;
                    if (jump_flag_i==1'b1) begin
                        int_addr<=jump_addr_i-4'h4;
                    end
                    else begin
                         int_addr<=inst_addr_i;
                    end
                    case(inst_i)
                        32'h73:begin
                            cause<=32'd11;
                        end
                        32'h00100073:begin
                            cause<=32'd3;
                        end
                        default:begin
                            cause<=32'd10;
                        end
                    endcase
                end
                else if (int_st==INT_ASYNC) begin
                    cause<=32'h80000004;
                    csr_st<=CSR_MEPC;
                    if (jump_flag_i==1'b1) begin
                        int_addr<=jump_addr_i;
                    end
                    else if(div_started_i==1'b1) begin
                        int_addr<=inst_addr_i-4'h4;
                    end
                    else begin
                        int_addr<=inst_addr_i;
                    end
                end
                else if (int_st==INT_MERT) begin
                    csr_st<=CSR_STMT;
                end
            end
            CSR_STAT:begin // after mstatus, write mcause
                csr_st<=CSR_CAUS;
            end
            CSR_MEPC:begin // after mepc, write mstatus to turn off the global interrupt
                csr_st<=CSR_STAT;
            end
            CSR_STMT:begin
                csr_st<=CSR_IDEL;
            end
            CSR_CAUS:begin
                csr_st<=CSR_IDEL;
            end
            default:begin
                csr_st<=CSR_IDEL;
            end
            endcase
        end
    end
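    // synchronous and asynchronous interrupts: write the CSRs, then signal the ex module
    always @ (posedge clk) begin // assumed wrapper, the snippet starts inside this block
        if (rst == 1'b0) begin
            we_o <= 1'b0;
            waddr_o <= 32'h0;
            data_o <= 32'h0;
        end
        else begin
            case(csr_st) // the interrupt return address needs IMM_PC_ADDR+4
                CSR_MEPC:begin // save the return address in mepc
                    we_o<=1'b1;
                    waddr_o<={20'h0,12'h341};
                    data_o<=int_addr;
                end
                CSR_CAUS:begin // record the cause in mcause
                    we_o<=1'b1;
                    waddr_o<={20'h0,12'h342};
                    data_o<=cause;
                end
                CSR_STAT:begin // clear status[3] to turn off the global interrupt
                    we_o<=1'b1;
                    waddr_o<={20'h0,12'h300};
                    data_o<={csr_mstatus[31:4],1'b0,csr_mstatus[2:0]};
                end
                CSR_STMT:begin // interrupt return: status[3]=status[7]
                    we_o<=1'b1;
                    waddr_o<={20'h0,12'h300};
                    data_o<={csr_mstatus[31:4],csr_mstatus[7],csr_mstatus[2:0]};
                end
                default:begin // assumed default, not in the original snippet: no CSR write
                    we_o<=1'b0;
                    waddr_o<=32'h0;
                    data_o<=32'h0;
                end
            endcase
        end
    end

    // the reset branch of this block is omitted in the snippet
    always @ (posedge clk) begin // assumed wrapper
        case(csr_st)
            CSR_CAUS:begin // interrupt entry: jump to the trap vector
                int_assert_o<=1'b1;
                int_addr_o<=csr_mtvec;
            end
            CSR_STMT:begin // interrupt return: jump back to mepc
                int_assert_o<=1'b1;
                int_addr_o<=csr_mepc;
            end
            default:begin // assumed default, not in the original snippet
                int_assert_o<=1'b0;
                int_addr_o<=32'h0;
            end
        endcase
    end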
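
The CSR write sequence described in the steps above can be summarised with a small Python model. It is a sketch that follows the snippets in this section, with the CSR file modelled as a dict keyed by CSR address; the function names trap_enter and trap_return are hypothetical.

```python
MSTATUS, MTVEC, MEPC, MCAUSE = 0x300, 0x305, 0x341, 0x342
MIE_BIT, MPIE_BIT = 3, 7
MASK = 0xFFFFFFFF


def trap_enter(csr, return_addr, cause):
    """Write mepc, mstatus and mcause, then return the handler address."""
    csr[MEPC] = return_addr & MASK
    csr[MSTATUS] &= ~(1 << MIE_BIT) & MASK      # turn off the global interrupt
    csr[MCAUSE] = cause & MASK
    return csr[MTVEC]


def trap_return(csr):
    """Restore status[3] from status[7], then return the saved PC."""
    mpie = (csr[MSTATUS] >> MPIE_BIT) & 1
    csr[MSTATUS] = (csr[MSTATUS] & ~(1 << MIE_BIT) & MASK) | (mpie << MIE_BIT)
    return csr[MEPC]


if __name__ == '__main__':
    csr = {MSTATUS: 0x88, MTVEC: 0x100, MEPC: 0, MCAUSE: 0}
    print(hex(trap_enter(csr, 0x40, 0x80000004)), hex(csr[MSTATUS]))
    print(hex(trap_return(csr)), hex(csr[MSTATUS]))
```

The model mirrors the snippet, which only clears status[3] on entry. A core that follows the privileged specification completely would also copy status[3] into status[7] on entry, so that the return can restore the value that was active before the interrupt.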

registers

regs


Temporary data storage for the decode and execute stages.
The program defines a register file regs with 32 entries, each 32 bits wide.

reg[31:0] regs[0:32 - 1];

1. Write register: store the data coming from ex or jtag in the register file regs

always @ (posedge clk) begin
        if (rst == 1'b1) begin
            if ((we_i == 1'b1) && (waddr_i != 5'h0)) begin
                regs[waddr_i] <= wdata_i;
            end else if ((jtag_we_i == 1'b1) && (jtag_addr_i != 5'h0)) begin
                regs[jtag_addr_i] <= jtag_data_i;
            end
        end
    end

2. Read register (combinational logic):
The two main read ports serve the decode (id) module: the read addresses come from id, and the data read from regs is sent back to id.
A separate read port serves the jtag module: the read address comes from jtag, and the data read is sent back to jtag.

always @ (*) begin//we==write_enable
        if (raddr1_i == 5'h0) begin
            r_data1_o = 32'h0;
        end 
        else if (raddr1_i == waddr_i && we_i == 1'b1) begin
            r_data1_o = wdata_i;
        end 
        else begin
            r_data1_o = regs[raddr1_i];
        end
    end

    always @ (*) begin
        if (raddr2_i == 5'h0) begin
            r_data2_o = 32'h0;
        end else if (raddr2_i == waddr_i && we_i == 1'b1) begin
            r_data2_o = wdata_i;
        end 
        else begin
            r_data2_o = regs[raddr2_i];
        end
    end

    always @ (*) begin
        if (jtag_addr_i == 5'h0) begin
            jtag_data_o = 32'h0;
        end 
        else begin
            jtag_data_o = regs[jtag_addr_i];
        end
    end

Because of the pipeline, when the current instruction is in the execution stage, the next instruction is already in the decode stage. The execution stage does not write the register immediately; the write only happens when the next clock edge arrives.

If the instruction in the decode stage needs the result of the previous instruction, the value it reads from the register at this point is stale.

For example, take the two instructions add x1, x2, x3 and add x4, x1, x5: the second depends on the result of the first. To solve this, when the read address equals the write address (and the write enable is set), the value about to be written is returned directly to the read port.
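
The bypass can be tried out with a tiny Python model of the register file. This is a sketch for illustration only; the class name RegFile and its method names are hypothetical.

```python
MASK = 0xFFFFFFFF


class RegFile:
    def __init__(self):
        self.regs = [0] * 32

    def read(self, raddr, we=False, waddr=0, wdata=0):
        """Combinational read with write-to-read bypass."""
        if raddr == 0:
            return 0                      # x0 always reads as zero
        if we and raddr == waddr:
            return wdata & MASK           # forward the value being written
        return self.regs[raddr]

    def clock(self, we, waddr, wdata):
        """Clock edge: the pending write actually lands in regs."""
        if we and waddr != 0:
            self.regs[waddr] = wdata & MASK


if __name__ == '__main__':
    rf = RegFile()
    # add x1, x2, x3 is in ex and is about to write 3 into x1
    print(rf.read(1), rf.read(1, we=True, waddr=1, wdata=3))
```

Before the clock edge the plain read of x1 still returns the old value, while the bypassed read already returns the result of the first add, which is what the second add needs.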

databus

Without a bus, every peripheral needs its own address bus and data bus. With N peripherals the processor core would need N address buses and N data buses, and each additional peripheral would require a (not small) change to the core code.

With the bus, the processor core only needs one address bus and one data bus, which greatly simplifies the connection between the processor core and the peripherals.

  1. Bus arbitration mechanism
    Each master first sends an access request (req) to the bus. The masters are given fixed access priorities according to how urgent their requests are.
    An if/else chain then selects the master in priority order and performs its access.

The priority order of the masters is: uart serial download, ex.v execution module, jtag module, pc_reg fetch module.

Why this order?

  1. uart program download
    The program is about to be replaced, so it does not matter which step the current program has reached.
    There is no need to consider requests from other modules: download directly, then run the new program (the pipeline needs to be held).
  2. ex.v execution module (memory read and write requests)
    Unless new program code is being downloaded, the program is unchanged, so the current instruction must run to completion to make sure later operations do not go wrong (the pipeline needs to be held).
  3. jtag module
    Once the previous instruction has finished, the jtag debug module can modify debug parameters and control the execution of the program, including instruction fetch: setting a breakpoint during debugging may suspend the fetch (the pipeline needs to be held).
  4. pc_reg instruction fetch module
    Fetching is the first step for all the masters above and serves them, so it gets the lowest priority.
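
The fixed-priority selection described above can be sketched in Python. The master names in the list and the functions grant and needs_hold are hypothetical; the rule that every master except pc_reg holds the pipeline is taken from the notes above.

```python
# highest priority first
MASTERS = ['uart_download', 'ex', 'jtag', 'pc_reg']


def grant(requests):
    """Return the name of the master that wins the bus, or None."""
    for name in MASTERS:              # same idea as the if/else chain
        if requests.get(name):
            return name
    return None


def needs_hold(granted):
    """Every master except the instruction fetch holds the pipeline."""
    return granted is not None and granted != 'pc_reg'


if __name__ == '__main__':
    winner = grant({'ex': True, 'pc_reg': True})
    print(winner, needs_hold(winner))
```

When ex and pc_reg request the bus in the same cycle, ex wins and the fetch has to wait, which is why the pipeline is held during memory accesses.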

riscv_bus

The slave device to operate on is selected by a case statement, and the master's write_enable is then passed to the selected slave.

The bus supports multi-master, multi-slave connections, but only one master can talk to one slave at a time. The masters on the RIB bus are arbitrated with a fixed-priority mechanism.

The highest 4 bits of the bus address, master_addr_i[31:28], select the slave device, so up to 16 slave devices are supported.

        case (slave_needed)
            grant0: begin
                case (m0_addr_i[31:28])
                    slave_0: begin
                        s0_we_o = m0_we_i;
                        s0_addr_o = {{4'h0}, {m0_addr_i[27:0]}};
                        s0_data_o = m0_data_i;
                        m0_data_o = s0_data_i;
                    end
                    slave_1: begin
                        s1_we_o = m0_we_i;
                        s1_addr_o = {{4'h0}, {m0_addr_i[27:0]}};
                        s1_data_o = m0_data_i;
                        m0_data_o = s1_data_i;
                    end
                    slave_2: begin
                        s2_we_o = m0_we_i;
                        s2_addr_o = {{4'h0}, {m0_addr_i[27:0]}};
                        s2_data_o = m0_data_i;
                        m0_data_o = s2_data_i;
                    end
                    slave_3: begin
                        s3_we_o = m0_we_i;
                        s3_addr_o = {{4'h0}, {m0_addr_i[27:0]}};
                        s3_data_o = m0_data_i;
                        m0_data_o = s3_data_i;
                    end
                    slave_4: begin
                        s4_we_o = m0_we_i;
                        s4_addr_o = {{4'h0}, {m0_addr_i[27:0]}};
                        s4_data_o = m0_data_i;
                        m0_data_o = s4_data_i;
                    end
                    slave_5: begin
                        s5_we_o = m0_we_i;
                        s5_addr_o = {{4'h0}, {m0_addr_i[27:0]}};
                        s5_data_o = m0_data_i;
                        m0_data_o = s5_data_i;
                    end
                    default: begin

                    end
                endcase
            end
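
The address split used in the case statement above (top 4 bits select the slave, the lower 28 bits are passed on as the offset) looks like this in Python. The function name decode is hypothetical.

```python
def decode(addr):
    """Split a 32-bit bus address into (slave index, offset inside the slave)."""
    addr &= 0xFFFFFFFF
    slave = addr >> 28                # m0_addr_i[31:28]
    offset = addr & 0x0FFFFFFF        # {4'h0, m0_addr_i[27:0]}
    return slave, offset


if __name__ == '__main__':
    slave, offset = decode(0x30000004)
    print(slave, hex(offset))
```

Each slave therefore sees addresses that start at 0, no matter where it sits in the memory map.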

perips

GPIO


Every 2 bits of gpio_ctrl control the mode of one IO, so up to 16 IOs are supported.
Mode values: 0: high impedance, 1: output, 2: input
Step1: First design two registers: gpio_ctrl (control GPIO input and output mode); gpio_data (store GPIO input or output data).

    reg[31:0] gpio_ctrl;
    reg[31:0] gpio_data;

Step2: Plan addresses for these two registers.

localparam CTRL = 4'h0;  
localparam DATA = 4'h4;  

Step3: Using register addressing, write to the two registers defined above; the gpio_ctrl register configures each GPIO as input or output.

    always @ (posedge clk) begin
        if (rst == 1'b0) begin
            gpio_data <= 32'h0;
            gpio_ctrl <= 32'h0;
        end else begin
            if (we_i == 1'b1) begin
                case (addr_i[3:0])
                    CTRL: begin
                        gpio_ctrl <= data_i;
                    end
                    DATA: begin
                        gpio_data <= data_i;
                    end
                endcase
            end else begin
                if (gpio_ctrl[1:0] == 2'b10) begin
                    gpio_data[0] <= iopin_i[0];
                end
                if (gpio_ctrl[3:2] == 2'b10) begin
                    gpio_data[1] <= iopin_i[1];
                end
            end
        end
    end

    always @ (*) begin
        if (rst == 1'b0) begin
            data_o = 32'h0;
        end else begin
            case (addr_i[3:0])
                CTRL: begin
                    data_o = gpio_ctrl;
                end
                DATA: begin
                    data_o = gpio_data;
                end
                default: begin
                    data_o = 32'h0;
                end
            endcase
        end
    end

Keep the following in mind when simulating the top level:

gpio[0] = (gpio_ctrl[1:0] == 2'b01)? gpio_data[0]: 1'bz;

When gpio_ctrl[1:0] is 1, the GPIO is in output mode and gpio_data[0] is driven onto the corresponding IO pin. When gpio_ctrl[1:0] is 0 or 2 (high-impedance mode or input mode), the pin is set to the high-impedance state, for the following reason:

High impedance is a common term in digital circuits. It is an output state that is neither high nor low; as far as the rest of the circuit is concerned, the output behaves as if it were not connected. If you measure it with a multimeter it may read high or low, depending on what else is connected to the pin.
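
The 2-bits-per-pin encoding and the tri-state rule can be checked with a short Python sketch. The function names gpio_mode and pin_drive are hypothetical, and the string 'z' simply stands in for the Verilog 1'bz value.

```python
MODE_HIGH_Z, MODE_OUTPUT, MODE_INPUT = 0, 1, 2


def gpio_mode(gpio_ctrl, pin):
    """Extract the 2-bit mode field of one pin from gpio_ctrl."""
    return (gpio_ctrl >> (2 * pin)) & 0b11


def pin_drive(gpio_ctrl, gpio_data, pin):
    """Return 0 or 1 when the pin is driven, or 'z' when it is released."""
    if gpio_mode(gpio_ctrl, pin) == MODE_OUTPUT:
        return (gpio_data >> pin) & 1
    return 'z'


if __name__ == '__main__':
    ctrl = 0b1001      # pin 0: output, pin 1: input
    print(pin_drive(ctrl, 0b01, 0), pin_drive(ctrl, 0b01, 1))
```

Only a pin in output mode drives the pad; an input pin has to release it so that the external signal can be read back through iopin_i.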

SPI

wiki
youtube

The SPI protocol specifies 4 logical signal interfaces:

SCLK (Serial Clock, driven by the master)
MOSI (Master Out, Slave In)
MISO (Master In, Slave Out)
CS (Chip Select: a master can communicate with several slaves, so CS selects which slave to talk to; CS is usually active low)


step1: generate the transfer enable signal en (set when spi_ctrl[0] is 1, cleared when the transfer is done)

    always @ (posedge clk) begin
        if (rst == 1'b0) begin
            en <= 1'b0;
        end 
        else begin
            if (spi_ctrl[0] == 1'b1) begin
                en <= 1'b1;
            end else if (done == 1'b1) begin
                en <= 1'b0;
            end else begin
                en <= en;
            end
        end
    end

step2: clock divider, counting system clock cycles (clk_cnt)

    always @ (posedge clk) begin
        if (rst == 1'b0) begin
            clk_cnt <= 9'h0;
        end 
        else if (en == 1'b1) begin
            if (clk_cnt == div_cnt) begin
                clk_cnt <= 9'h0;
            end 
            else begin
                clk_cnt <= clk_cnt + 1'b1;
            end
        end 
        else begin
            clk_cnt <= 9'h0;
        end
    end

step3: count SPI_CLK

    always @ (posedge clk) begin
        if (rst == 1'b0) begin
            spi_clk_cnt <= 5'h0;
            spi_clk_level <= 1'b0;
        end 
        else if (en == 1'b1) begin
            if (clk_cnt == div_cnt) begin
                if (spi_clk_cnt == 5'd17) begin
                    spi_clk_cnt <= 5'h0;
                    spi_clk_level <= 1'b0;
                end 
                else begin
                    spi_clk_cnt <= spi_clk_cnt + 1'b1;
                    spi_clk_level <= 1'b1;
                end
            end 
            else begin
                spi_clk_level <= 1'b0;
            end
        end 
        else begin
            spi_clk_cnt <= 5'h0;
            spi_clk_level <= 1'b0;
        end
    end
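Steps 2 and 3 can be checked with a small cycle-level model in Python. It is a sketch that assumes `en` stays at 1 for the whole run; `spi_strobes` is a hypothetical name, and the returned list holds the clock edges after which `spi_clk_level` is 1.

```python
def spi_strobes(div_cnt: int, cycles: int):
    """Return (edges where spi_clk_level becomes 1, final spi_clk_cnt)."""
    clk_cnt = 0
    spi_clk_cnt = 0
    strobe_edges = []
    for edge in range(1, cycles + 1):
        if clk_cnt == div_cnt:           # divider reached its threshold
            clk_cnt = 0
            if spi_clk_cnt == 17:
                spi_clk_cnt = 0          # wrap around, no strobe on this edge
            else:
                spi_clk_cnt += 1
                strobe_edges.append(edge)
        else:
            clk_cnt += 1
    return strobe_edges, spi_clk_cnt


print(spi_strobes(3, 20))                # strobe every div_cnt + 1 = 4 clock edges
```

The model shows that the strobe period is `div_cnt + 1` system clocks, and that `spi_clk_cnt` produces 17 strobes before it wraps.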

step4: write regs

    always @ (posedge clk) begin
        if (rst == 1'b0) begin
            spi_ctrl <= 32'h0;
            spi_data <= 32'h0;
            spi_status <= 32'h0;
        end else begin
            spi_status[0] <= en;
            if (we_i == 1'b1) begin
                case (addr_i[3:0])
                    SPI_CTRL: begin
                        spi_ctrl <= data_i;
                    end
                    SPI_DATA: begin
                        spi_data <= data_i;
                    end
                    default: begin

                    end
                endcase
            end 
            else begin
                spi_ctrl[0] <= 1'b0;
                if (done == 1'b1) begin
                    spi_data <= {24'h0, rdata};
                end
            end
        end
    end

timer

youtube


Step1: Define three registers

  1. Control register: CTRL=4'h0
  2. Counting threshold register: VALUE=4'h4
  3. Current count value register (read-only): COUNT=4'h8

Step2: regs read & write

start

    // counter
    always @ (posedge clk) begin
        if (rst == 1'b0) begin
            t_ct <= 32'h0;
        end 
        else begin
            if (t_ctrl[0] == 1'b1) begin
                t_ct <= t_ct + 1'b1;
                if (t_ct >= t_val) begin
                    t_ct <= 32'h0;
                end
            end 
            else begin
                t_ct <= 32'h0;
            end
        end
    end

R&W

    always @ (*) begin
        if (rst == 1'b0) begin
            data_o = 32'h0;
        end 
        else begin
            case (addr_i[3:0])
                VALUE: begin
                    data_o = t_val;
                end
                CTRL: begin
                    data_o = t_ctrl;
                end
                CT: begin
                    data_o = t_ct;
                end
                default: begin
                    data_o = 32'h0;
                end
            endcase
        end
    end

    always @ (posedge clk) begin
        if (rst == 1'b0) begin
            t_ctrl <= 32'h0;
            t_val <= 32'h0;
        end 
        else begin
            if (we_i == 1'b1) begin
                case (addr_i[3:0])
                    CTRL: begin
                        t_ctrl <= {data_i[31:3], (t_ctrl[2] & (~data_i[2])), data_i[1:0]};
                    end
                    VALUE: begin
                        t_val <= data_i;
                    end
                endcase
            end 
            else begin
                if ((t_ctrl[0] == 1'b1) && (t_ct >= t_val)) begin
                    t_ctrl[0] <= 1'b0;
                    t_ctrl[2] <= 1'b1;
                end
            end
        end
    end
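The interaction between the counter and the control register is easier to follow in a cycle-level Python model. This is a sketch based on my reading of the RTL above: bit 0 of CTRL is treated as the enable, bit 2 as an "expired" flag that is cleared by writing 1 to it, and `TimerModel` is a hypothetical name.

```python
CTRL, VALUE, COUNT = 0x0, 0x4, 0x8       # register offsets from the list above
MASK32 = 0xFFFFFFFF


class TimerModel:
    def __init__(self):
        self.ctrl = self.value = self.count = 0

    def tick(self, write=None):
        """Advance one clock edge; write is None or an (addr, data) tuple."""
        ctrl, value, count = self.ctrl, self.value, self.count   # values before the edge
        enabled = bool(ctrl & 1)
        expired = enabled and count >= value
        # counter: runs while enabled, restarts from 0 when the threshold is reached
        self.count = (count + 1) & MASK32 if enabled and not expired else 0
        if write is not None:
            addr, data = write
            if addr == CTRL:
                # bit 2 survives only if it was set and the write does not carry a 1 there
                keep_flag = (ctrl >> 2) & 1 & ~(data >> 2)
                self.ctrl = ((data & ~0b100) | (keep_flag << 2)) & MASK32
            elif addr == VALUE:
                self.value = data & MASK32
        elif expired:
            self.ctrl = (ctrl & ~0b001) | 0b100   # stop the timer, raise the flag


timer = TimerModel()
timer.tick((VALUE, 3))                   # threshold = 3
timer.tick((CTRL, 0b001))                # enable
ticks = 0
while not timer.ctrl & 0b100:
    timer.tick()
    ticks += 1
print(ticks, bin(timer.ctrl))
```

One consequence visible in the model: the timer is one-shot, so software has to write the enable bit again after each expiry.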

uart


EXP

  1. UART stands for Universal Asynchronous Receiver Transmitter.

  2. Synchronous serial communication requires both communication parties to transmit data synchronously under the control of the same clock; asynchronous serial communication means that both communication parties use their own clocks to control the sending and receiving process of data.

  3. A UART frame, whether sent or received, consists of 4 parts: start bit, data bits, parity bit and stop bit.

  4. The speed of serial communication is expressed as the baud rate, the number of binary bits transmitted per second; the unit is bps.
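As a small worked example of the frame layout and the baud rate, the Python sketch below builds the bit sequence for one byte and computes how many system clocks one bit lasts. The 8N1 layout (8 data bits, no parity, one stop bit), the 50 MHz clock, and the helper names are assumptions for illustration only.

```python
def baud_divider(clk_hz: int, baud: int) -> int:
    """Number of system clock cycles per UART bit, rounded to the nearest integer."""
    return round(clk_hz / baud)


def uart_frame(byte: int) -> list:
    """8N1 frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


print(baud_divider(50_000_000, 115200))  # 434
print(uart_frame(0x41))                  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```

Both sides only need to agree on the baud rate and the frame layout; no clock line is shared, which is what makes the link asynchronous.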

then TX sending

    always @ (posedge clk) begin
        if (rst == 1'b0) begin
            FSM_ <= FSM_IDLE;
            cycle_count <= 16'd0;
            tx_reg <= 1'b0;
            bit_count <= 4'd0;
            tx_data_ready <= 1'b0;
        end 
        else begin
            if (FSM_ == FSM_IDLE) begin
                tx_reg <= 1'b1;
                tx_data_ready <= 1'b0;
                if (tx_data_valid == 1'b1) begin
                    FSM_ <= FSM_START;
                    cycle_count <= 16'd0;
                    bit_count <= 4'd0;
                    tx_reg <= 1'b0;
                end
            end 
            else begin
                cycle_count <= cycle_count + 16'd1;
                if (cycle_count == uart_baud[15:0]) begin
                    cycle_count <= 16'd0;
                    case (FSM_)
                        FSM_START: begin
                            tx_reg <= tx_data[bit_count];
                            FSM_ <= FSM_SEND_BYTE;
                            bit_count <= bit_count + 4'd1;
                        end
                        FSM_SEND_BYTE: begin
                            bit_count <= bit_count + 4'd1;
                            if (bit_count == 4'd8) begin
                                FSM_ <= FSM_STOP;
                                tx_reg <= 1'b1;
                            end
                            else begin                
                                tx_reg <= tx_data[bit_count];
                            end
                        end
                        FSM_STOP: begin
                            tx_reg <= 1'b1;
                            FSM_ <= FSM_IDLE;
                            tx_data_ready <= 1'b1;
                        end
                    endcase
                end
            end
        end
    end

RX reception (partial)

    assign rx_neg_edge = rx_q1 && ~rx_q0;

    always @ (posedge clk) begin
        if (rst == 1'b0) begin
            rx_q0 <= 1'b0;
            rx_q1 <= 1'b0;	
        end 
        else begin
            rx_q0 <= rx_pin;
            rx_q1 <= rx_q0;
        end
    end
    always @ (posedge clk) begin
        if (rst == 1'b0) begin
            rx_start <= 1'b0;
        end 
        else begin
            if (uart_ctrl[1]) begin
                if (rx_neg_edge) begin
                    rx_start <= 1'b1;
                end 
                else if (rx_clk_count == 4'd9) begin
                    rx_start <= 1'b0;
                end
            end else begin
                rx_start <= 1'b0;
            end
        end
    end
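The start-bit detection above relies on two flip-flops that hold the current and the previous sample of rx_pin. A minimal Python sketch of that falling-edge detector follows; `falling_edges` is a hypothetical name and the input is one rx_pin sample per clock.

```python
def falling_edges(rx_samples):
    """Return the clock indices at which rx_neg_edge would be 1."""
    rx_q0 = rx_q1 = 0                        # reset values of the two flip-flops
    edges = []
    for cycle, rx_pin in enumerate(rx_samples):
        rx_q1, rx_q0 = rx_q0, rx_pin         # both registers update on the same clock edge
        if rx_q1 == 1 and rx_q0 == 0:        # previous sample high, current sample low
            edges.append(cycle)
    return edges


# idle high, then a start bit at index 3, later another falling edge at index 6
print(falling_edges([1, 1, 1, 0, 0, 1, 0]))  # [3, 6]
```

Because the reset values are 0, the first high sample after reset does not produce a false edge.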

Specific process (TX):
a. While the transmitter is idle (not sending data), the TX line is held at 1, as the protocol requires. When the data to send becomes valid (the C code writes the byte to the UART_TXDATA register), the transmitter sends the start bit 0 for one bit period (one counting cycle).
b. The threshold of the clock-division counter is set according to the agreed baud rate. The data bits are sent low bit first, high bit last. After the data, the TX line is set to 1, which is the stop bit in the sequence, and the corresponding bit of the status register is updated: UART_STATUS[0] <= 0.
c. Wait for the next transmission (that is, the next data-valid signal).

tips(from my friend)

The input and output pins of the FPGA serial port use TTL levels: 3.3V represents logic "1" and 0V represents logic "0". A computer serial port uses RS-232 levels, which are negative logic: -15V~-5V represents logic "1" and +5V~+15V represents logic "0".
Therefore, when the computer communicates with the FPGA through an RS-232 port, a level conversion chip has to be added.

SIM

test_all_inst.py

find all bin files

import os
import subprocess
import sys

def list_binfiles(path):
    files = []
    list_dir = os.walk(path)
    for maindir, subdir, all_file in list_dir:
        for filename in all_file:
            apath = os.path.join(maindir, filename)
            if apath.endswith('.bin'):
                files.append(apath)
    return files

test all bin files

    
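def main():
    bin_files = list_binfiles(r'../tests/isa/generated')
    anyfail = False
    for file in bin_files:
        # run the single-test script on this bin file and capture its stdout
        cmd = ['python', 'new_nw.py', file, 'inst.data']
        r = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True).stdout
        if 'TEST_PASS' in r:
            print(file + '    nlnlsofun')
        else:
            print(file + '!!! Off to bear jail, because you failed !!!')
            anyfail = True
            break  # stop at the first failing test
    if not anyfail:
        print('Bear, you are padding your hours again, All PASS...')
    # return a non-zero exit code on failure so callers can detect it
    return 1 if anyfail else 0

if __name__ == '__main__':
    sys.exit(main())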

step2 new_nw.py

turn bin files to mem files

    cmd = r'python ../tools/Bin2Mem_CLI.py' + ' ' + sys.argv[1] + ' ' + sys.argv[2]
    f = os.popen(cmd)
    f.close()
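The contents of Bin2Mem_CLI.py are not shown here, so the following is only a sketch of what such a conversion typically does, under the assumption that the memory file holds one 32-bit little-endian word per line as 8 hex digits (a layout that Verilog's `$readmemh` can load). `bin_to_mem_lines` is a hypothetical name.

```python
import struct


def bin_to_mem_lines(data: bytes) -> list:
    """Convert raw bytes to one 8-digit hex word per line (little-endian words)."""
    padded = data + b'\x00' * (-len(data) % 4)           # pad to a whole number of words
    words = struct.unpack('<%dI' % (len(padded) // 4), padded)
    return ['%08x' % w for w in words]


# the four bytes of the instruction word 0x00100793, as stored in a .bin file
print(bin_to_mem_lines(bytes([0x93, 0x07, 0x10, 0x00])))  # ['00100793']
```

Writing the returned lines to a text file, one per line, gives a file in the assumed layout.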

compile rtl files

step3 use Iverilog

def main():
    rtl_dir = sys.argv[1]

    if rtl_dir != r'..':
        tb_file = r'/tb/compliance_test/cwwppb_soc_tb.v'
    else:
        tb_file = r'/tb/cwwppb_soc_tb.v'

    # iverilog process
    iverilog_cmd = ['iverilog']
    ...
    ...
    ...

most of the trouble

1. define is not always convenient; the RISC-V manual is your good friend

When many differently named macros ("types") share the same VALUE, coding becomes very inconvenient. You have to keep hovering over the editor prompt to see what a name expands to, it drives you crazy, and you end up wondering why a wire or reg was defined that way.

2.latch

As a novice who had never studied logic design, I remember being stuck for 14 hours on the fourth day because of one wrong condition.
EXP

always @(al or b)
begin
    if(al) q <= b;

end
  1. In this "always" block, the if statement makes q take the value of b only when al = 1. The code does not say what to do when al = 0, so what happens then? The variable q keeps its original value, which means a latch has to be inferred.
  2. Improper use of case statements (where I got stuck)
    A latch is also generated when a case statement is missing its default item.

A case statement assigns different values to one signal (q in this example) depending on the value of another signal (sel in this example). In the example on the left side of the figure below, when sel=00, q takes the value of a, and when sel=11, q takes the value of b.

What this example leaves unclear is which value q gets when sel takes a value other than 00 or 11. In the example on the left, Verilog HDL's rule is that q keeps its original value by default, so a latch is generated automatically.
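The difference between the two cases can be illustrated with a behavioural Python sketch. It is an analogy rather than synthesizable code, and both function names are hypothetical: the version without a default needs the previous value of q as an input, which is exactly the memory that a latch provides.

```python
def case_without_default(sel: int, a: int, b: int, q_prev: int) -> int:
    if sel == 0b00:
        return a
    if sel == 0b11:
        return b
    return q_prev        # nothing assigned: q must remember its old value (latch)


def case_with_default(sel: int, a: int, b: int) -> int:
    if sel == 0b00:
        return a
    if sel == 0b11:
        return b
    return 0             # default branch: the output depends on the inputs only


q = 0
for sel in (0b00, 0b01):
    q = case_without_default(sel, a=1, b=0, q_prev=q)
print(q)                                   # 1, held from the sel == 00 step
print(case_with_default(0b01, a=1, b=0))   # 0, no memory involved
```

Adding a default item (or assigning q a value before the case statement) removes the dependence on the old value, so pure combinational logic is generated instead.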

3. Vivado is extremely difficult to use; ModelSim works well for ordinary simulation.

4. Do not directly apply the board file as your project mode

I use the part format starting with xc7a100 to write my xdc file

5. When designing a finite state machine, be sure to look for information from multiple sources

When I was writing the division module, my logic had a big problem because I was following a guide from an odd Chinese website, until my NTU EE friend told me that at this rate I would not finish it in 20,000 years (I was trying to solve it by repeatedly multiplying the divisor, super dumb)

6. JTAG vs UART

The answer is here; read it carefully

7. If you want to know something about a board, go read the datasheet!

When generating the bitstream, I always thought I had enough IO ports, until I read the datasheet
arty a7 100T

TEST C code

Comparing and trying GCC toolchains:

how to use them
I first used riscv64-unknown-elf-gcc, but the resulting bin file was too big for our SOC. I then tried compiling with -Os and shrinking the linked output with strip, but sadly both attempts failed.
So I switched to the toolchain for MCUs (riscv-none-embed-), which means I had to give up some system calls in my C code to fit the toolchain :(

set your toolchain

Put your toolchain in the tools folder (download)
I used the C code from my homework 1 as the test code

how to test it

1. test_all_isa
Go to the sim folder and run this command

python test_all_isa.py



2. test C code
Go to the sim folder and run this command

python sim.py ..\tests\example\simple\C_test.bin inst.data

Because I use riscv-none-embed-gcc on Windows, I do not need "newlib", and I gave up some library calls such as printf(). To be fair, riscv-none-embed-gcc can work with "newlib"; leaving it out is just my personal choice.


If it succeeds, you will see this on your computer:

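#include<stdio.h>
#include"..\lib\utils.h"

/* forward declaration: trap() is defined after main() */
int trap(int* height, int heightSize);

int main(){
    int arr[]={20,1,0,2,1,16,1,3,2,1,2,17}; 
    int height=12;  /* number of elements in arr */
    int ans=trap(arr,height);
    /*printf("%d\n",ans);*/
    if (ans == 141)
        set_test_pass();
    else
        set_test_fail();

    return 0;
}
int trap(int* height, int heightSize){
    int maxh=0,maxhi=0;  /* maxhi initialised so it is never read uninitialised */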
    if(heightSize==0||heightSize==1)
        return 0;
    for(int i=0;i<heightSize;i++){
        if(height[i]>maxh){
            maxh=height[i];
            maxhi=i;
        }
    }
    int water_l=0;
    int rain=0;
    for(int i=0;i<maxhi;i++){
        if(height[i]>water_l){
            water_l=height[i];
        }
        rain+=water_l-height[i];
    }
    water_l=0;
    for(int i=heightSize-1;i>maxhi;i--){
        if(height[i]>water_l){
            water_l=height[i];
        }
        rain+=water_l-height[i];
    }
    return rain;
}
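The expected value 141 used in main() can be cross-checked on the host with a short Python port of the same algorithm (find the tallest bar, then sweep towards it from both sides). This is a sketch for verification only and is not part of the test suite.

```python
def trap(height):
    """Python port of the C trap(): water trapped between the bars."""
    if len(height) <= 1:
        return 0
    peak = height.index(max(height))         # first position of the tallest bar
    rain = 0
    level = 0
    for h in height[:peak]:                  # sweep from the left edge to the peak
        level = max(level, h)
        rain += level - h
    level = 0
    for h in reversed(height[peak + 1:]):    # sweep from the right edge to the peak
        level = max(level, h)
        rain += level - h
    return rain


print(trap([20, 1, 0, 2, 1, 16, 1, 3, 2, 1, 2, 17]))  # 141
```

Running the check on a PC first makes it clear that a failing test on the SoC points at the hardware or the toolchain, not at the expected value.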

Excerpt from the dump file of C_test.c (-Os as CFLAGS)

000001d8 <trap>:
 1d8:	00100793          	li	a5,1
 1dc:	08b7fe63          	bgeu	a5,a1,278 <trap+0xa0>
 1e0:	00000793          	li	a5,0
 1e4:	00000693          	li	a3,0
 1e8:	02b7c463          	blt	a5,a1,210 <trap+0x38>
 1ec:	00000693          	li	a3,0
 1f0:	00000793          	li	a5,0
 1f4:	00000613          	li	a2,0
 1f8:	0306cc63          	blt	a3,a6,230 <trap+0x58>
 1fc:	fff58593          	addi	a1,a1,-1
 200:	00000693          	li	a3,0
 204:	04b84863          	blt	a6,a1,254 <trap+0x7c>
 208:	00078513          	mv	a0,a5
 20c:	00008067          	ret
 210:	00279713          	slli	a4,a5,0x2
 214:	00e50733          	add	a4,a0,a4
 218:	00072703          	lw	a4,0(a4)
 21c:	00e6d663          	bge	a3,a4,228 <trap+0x50>
 220:	00078813          	mv	a6,a5
 224:	00070693          	mv	a3,a4
 228:	00178793          	addi	a5,a5,1
 22c:	fbdff06f          	j	1e8 <trap+0x10>
 230:	00269713          	slli	a4,a3,0x2
 234:	00e50733          	add	a4,a0,a4
 238:	00072703          	lw	a4,0(a4)
 23c:	00e65463          	bge	a2,a4,244 <trap+0x6c>
 240:	00070613          	mv	a2,a4
 244:	40e60733          	sub	a4,a2,a4
 248:	00e787b3          	add	a5,a5,a4
 24c:	00168693          	addi	a3,a3,1
 250:	fa9ff06f          	j	1f8 <trap+0x20>
 254:	00259713          	slli	a4,a1,0x2
 258:	00e50733          	add	a4,a0,a4
 25c:	00072703          	lw	a4,0(a4)
 260:	00e6d463          	bge	a3,a4,268 <trap+0x90>
 264:	00070693          	mv	a3,a4
 268:	40e68733          	sub	a4,a3,a4
 26c:	00e787b3          	add	a5,a5,a4
 270:	fff58593          	addi	a1,a1,-1
 274:	f91ff06f          	j	204 <trap+0x2c>
 278:	00000793          	li	a5,0
 27c:	f8dff06f          	j	208 <trap+0x30>

perips testing

working

GITHUB

I'm still working on my project
here is Version 2.00
video

arty-a7-100t testing(still working)

1. What is an xdc file?
2. Reference: how do I write an xdc file
We succeeded in generating the bitstream.
It was not in vain that I slept less than six hours almost every day this month and even dropped two courses QAQ
After 14 hours I finally solved the last problem.

non-os booting

Someone asked me how to boot a machine without an OS,
which is a great question.
The answer is in the official RISC-V datasheet.

boot Linux (tentatively written in Chinese; I will translate it into English once it is finished) (20230201)

add eth_ip

cwwppb_v1.02 bitstream

PetaLinux

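During installation, be sure to read the datasheet instead of following odd tutorials from the internet; match the versions and install the required libraries.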

Reflections:

If you want to improve your own skills and are willing to spend the time, I strongly recommend this class. You never know how far the teacher can push your limits. The teacher is very serious in class and the teaching materials are very well prepared. Learning the material is only part of it; some of the teacher's values are also worth learning. One sentence the teacher scolded me with stuck with me: "Are you talking like an engineer? How can an engineer use 'should' and 'probably' to describe his thinking?" In short, I think this course is well worth taking if you want to learn everything you came to learn!