On the afternoon of the day I chose my topic, I dropped two of my classes, because I knew I wouldn't have much fun over the next month and I needed a lot of time to complete this term project.
RISC-V SoC Architecture Design and Running the Linux Kernel from Scratch - Hardware Edition
guide
Install Vivado (please use the ML Edition; you do not need a license).
Please use a version from 2020 or later.
The 2018 version had file-corruption problems during download (personal experience: I had to re-run it three times).
I use VS Code to edit my code (Notepad is super bad).
ARTY_A7_100T
Be sure to use the Vivado version above.
Specifications and references:
board file (it includes a guide)
ithome
bilibili: well, I know learning how to write Verilog on this website sounds funny, but I'M VERY BAD AT English :(
datasheet: very useful; with it on your computer you don't have to memorize the instruction types
book1 (very useful for learning the basics): MUST READ CH1-CH3 VERY CAREFULLY
book2
GitHub exp
Stack Overflow
5-stage CPU code from a friend (x86)
A friend doing a master's degree in NTU EE
A lot of money (FPGA boards are very expensive QQ).
handshakes's RTL
I made two versions of the CPU (yes, you read that right).
Since I didn't know how to write Verilog, I taught myself from scratch.
At the beginning, I wrote a 3-stage pipeline CPU that could execute some RV32I instructions (all except the load and store types).
Then I extended the three-stage pipeline CPU into an SoC and successfully ran the RV32I test on ModelSim.
Then came the final version: I added division and multiplication instructions and some SoC peripherals, and simulated it with two top-level files, one that integrates all the CPU components and another that connects the SoC and the CPU.
The instruction test passes on Windows, and Vivado successfully generates the bitstream.
BUT!!!
My board hasn't arrived yet.
Alright, that's enough nonsense. As the teacher said:
"Talk is meaningless, show me the code!"
Passing the RISC-V RV32I instruction set tests
We compiled the RV32I test code into a binary file and used it to test the instruction fetch, decode, execute, and memory-access stages of our RTL code.
Determines the address of the instruction.
pc_reg
Determines the instruction type from the opcode.
Uses the function codes (funct3, funct7) to identify the specific instruction and generate the corresponding control signals for the next stage.
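To make the decode step concrete, here is a rough C reference model (my own sketch, not the project's RTL). It classifies an RV32I word by its opcode and uses funct3/funct7 to pick the R-type operation. The Op labels are hypothetical names for illustration, and only a subset of instructions is covered.

```c
#include <stdint.h>

/* Illustrative operation labels (hypothetical, not the RTL's signal names). */
typedef enum {
    OP_UNKNOWN, OP_ADD, OP_SUB, OP_SLL, OP_XOR, OP_SRL, OP_SRA, OP_OR, OP_AND,
    OP_ADDI, OP_BRANCH, OP_LOAD, OP_STORE, OP_JAL, OP_JALR, OP_LUI, OP_AUIPC,
    OP_SYSTEM
} Op;

/* R-type: funct3 picks the operation, funct7 separates ADD/SUB and SRL/SRA. */
static Op decode_rtype(uint32_t funct3, uint32_t funct7) {
    if (funct7 == 0x01) return OP_UNKNOWN;      /* M extension (MUL/DIV) not modeled */
    switch (funct3) {
    case 0x0: return funct7 == 0x20 ? OP_SUB : OP_ADD;
    case 0x1: return OP_SLL;
    case 0x4: return OP_XOR;
    case 0x5: return funct7 == 0x20 ? OP_SRA : OP_SRL;
    case 0x6: return OP_OR;
    case 0x7: return OP_AND;
    default:  return OP_UNKNOWN;                /* SLT/SLTU omitted for brevity */
    }
}

Op decode(uint32_t inst) {
    uint32_t opcode = inst & 0x7F;              /* inst[6:0]   */
    uint32_t funct3 = (inst >> 12) & 0x7;       /* inst[14:12] */
    uint32_t funct7 = (inst >> 25) & 0x7F;      /* inst[31:25] */

    switch (opcode) {
    case 0x33: return decode_rtype(funct3, funct7);
    case 0x13: return funct3 == 0x0 ? OP_ADDI : OP_UNKNOWN; /* other I-type ALU ops omitted */
    case 0x63: return OP_BRANCH;
    case 0x03: return OP_LOAD;
    case 0x23: return OP_STORE;
    case 0x6F: return OP_JAL;
    case 0x67: return OP_JALR;
    case 0x37: return OP_LUI;
    case 0x17: return OP_AUIPC;
    case 0x73: return OP_SYSTEM;                /* ECALL, EBREAK, MRET, CSR ops */
    default:   return OP_UNKNOWN;
    }
}
```

For example, decode(0x003100B3) (add x1, x2, x3) gives OP_ADD, while the same word with funct7 = 0x20 (0x403100B3) gives OP_SUB.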
id_ex
At first I put calculations into this logic without declaring them in advance, and my friends said I was stupid XD.
I later verified that this slows down the overall CPU performance.
ram
Used as an external device in the first version.
rom
Used as an external device in the first version.
ctrl
Controls jumps for B-type instructions.
Connects each module.
Connects each module of the SoC.
Building on the first version, I added JTAG, a bus, UART, a timer, and a divider to implement CSR, DIV, MUL, and other instructions (but I haven't overcome the OVERFLOW problem yet).
I have released some snippets of code to record my implementation process (I haven't slept well for a month…).
pc_reg: its main job is to operate on the address signal of the instruction memory (reset, jump, pause, address increment, and so on). In other words, it processes the instruction address to produce the value of the PC register, which serves as the instruction memory's address signal for reading instruction content from the ROM.
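A minimal C sketch of the next-PC logic, assuming the priority follows the order listed (reset, then jump, then pause, then increment) and assuming a reset address of 0. The struct and field names are hypothetical; the real pc_reg ports may differ.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical control inputs; the real pc_reg ports may be named differently. */
typedef struct {
    bool     rst_n;       /* active-low reset, like rst==1'b0 in the RTL */
    bool     jump;        /* jump/branch taken in the execute stage      */
    uint32_t jump_addr;
    bool     hold;        /* pipeline paused                             */
} PcCtrl;

#define RESET_PC 0x00000000u  /* assumed reset address */

/* Returns the value pc_reg would latch on the next clock edge. */
uint32_t pc_next(uint32_t pc, PcCtrl c) {
    if (!c.rst_n) return RESET_PC;   /* reset has the highest priority      */
    if (c.jump)   return c.jump_addr;
    if (c.hold)   return pc;         /* pause: keep the current address     */
    return pc + 4;                   /* every RV32I instruction is 4 bytes  */
}
```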
rom: its main job is to store the program's instruction codes and output the instruction code selected by the value of the PC register.
It defines a 32 × 4096 two-dimensional array as the storage space.
That is, it stores 32-bit instruction codes, up to 4096 of them, and the index along the 4096 dimension is the address of each instruction code.
When actually porting to the FPGA, pay attention to the resource capacity of the FPGA you use and adjust the size accordingly.
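A small C model of this ROM: 4096 entries × 32 bits, i.e. 16 KiB. It assumes the PC is a byte address, so the word index is PC >> 2, and that upper address bits are simply ignored, as a 12-bit index would do in hardware. Both are my assumptions about how the RTL indexes the array; rom_read and rom_load are hypothetical helper names.

```c
#include <stdint.h>
#include <stddef.h>

#define ROM_DEPTH 4096u   /* 4096 entries x 32 bits = 16 KiB */

static uint32_t rom[ROM_DEPTH];

/* Byte address -> word index (drop the low 2 bits); keep only 12 index bits. */
uint32_t rom_read(uint32_t pc) {
    return rom[(pc >> 2) % ROM_DEPTH];
}

/* Copy a program into the ROM, truncating anything beyond its capacity. */
void rom_load(const uint32_t *words, size_t n) {
    for (size_t i = 0; i < n && i < ROM_DEPTH; i++)
        rom[i] = words[i];
}
```

Shrinking ROM_DEPTH is the software equivalent of the resource adjustment mentioned above.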
I implemented this state machine in RTL and added it to the core. The biggest trouble I ran into along the way was interrupt control, which involves the bus, ex, and ctrl modules, because my divider takes multiple cycles to complete the whole finite state machine.
Each division operation requires at least 39 clock cycles.
Important:
When operating on signed data, each negative number in two's complement form is inverted and incremented by one. The purpose is to convert all negative operands into positive numbers before calculating (a negative number in two's complement carries a sign bit, so it cannot be divided directly), so the computed result is also positive. Finally, the quotient's sign is fixed according to the signs of the divisor and the dividend (that is, whether to invert and add one).
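Here is a C sketch of the same idea, using a restoring divider that produces one quotient bit per iteration (32 iterations), the way a multi-cycle divider produces one bit per clock. The function names are mine. The special cases follow the RISC-V M-extension rules: dividing by zero gives quotient -1 and remainder = dividend, and the signed overflow case INT32_MIN / -1 gives quotient INT32_MIN and remainder 0. That last case falls out of the invert-and-add-one approach without any extra code, which may help with the OVERFLOW problem mentioned earlier.

```c
#include <stdint.h>

/* Restoring division: bring down one dividend bit per step, subtract if it fits. */
void divu32(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r) {
    uint64_t rem = 0;                        /* 64 bits so the shift cannot overflow */
    uint32_t quo = 0;
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((n >> i) & 1u);
        if (rem >= d) {
            rem -= d;
            quo |= 1u << i;
        }
    }
    *q = quo;                                /* d == 0 yields all ones, as RISC-V requires */
    *r = (uint32_t)rem;                      /* d == 0 yields the dividend */
}

/* Signed DIV/REM: invert-and-add-one on negative operands, divide the
   magnitudes, then fix the signs of the results. */
void div32(int32_t a, int32_t b, int32_t *q, int32_t *r) {
    if (b == 0) { *q = -1; *r = a; return; } /* RISC-V divide-by-zero result */
    uint32_t ua = a < 0 ? ~(uint32_t)a + 1u : (uint32_t)a;
    uint32_t ub = b < 0 ? ~(uint32_t)b + 1u : (uint32_t)b;
    uint32_t uq, ur;
    divu32(ua, ub, &uq, &ur);
    if ((a < 0) != (b < 0)) uq = ~uq + 1u;   /* quotient is negative if signs differ */
    if (a < 0)              ur = ~ur + 1u;   /* remainder takes the dividend's sign  */
    *q = (int32_t)uq;
    *r = (int32_t)ur;
}
```

For example, div32(-7, 2, ...) gives quotient -3 and remainder -1, the same as C's / and % operators.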
Interrupt types:
External interrupts: interrupts generated by peripherals, i.e., interrupts that originate outside the processor core.
Timer interrupt (treated here as one of the external interrupts): enabled by the MTIE field of the mie register.
Software interrupt: an interrupt triggered by the software itself (e.g., a program written in C).
Debug interrupt: an interrupt taken while debugging.
Interrupt masking: the mie register enables or masks each type of interrupt (external, timer, software).
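A small C sketch of the masking rule. The bit positions come from the RISC-V privileged spec: software, timer, and external are bits 3, 7, and 11 of mip/mie, and the global enable MIE is bit 3 of mstatus. How my core splits this check across modules is not shown here; interrupt_pending is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit positions from the RISC-V privileged spec. */
#define MIP_MSIP     (1u << 3)   /* machine software interrupt        */
#define MIP_MTIP     (1u << 7)   /* machine timer interrupt           */
#define MIP_MEIP     (1u << 11)  /* machine external interrupt        */
#define MSTATUS_MIE  (1u << 3)   /* global machine interrupt enable   */

/* An interrupt is taken only if it is pending (mip), individually enabled
   (mie), and interrupts are globally enabled (mstatus.MIE). */
bool interrupt_pending(uint32_t mip, uint32_t mie, uint32_t mstatus) {
    return (mstatus & MSTATUS_MIE) && (mip & mie) != 0;
}
```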
A jump changes the value of the PC register.
Because whether to jump is only known in the execute stage, the pipeline must be suspended when a jump is taken.
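The decision that only becomes available in the execute stage can be sketched in C like this: funct3 selects the comparison for each B-type instruction, and the 13-bit branch offset is reassembled from its scattered immediate fields. The target would then be pc + btype_imm(inst). Function names are mine; the encodings are the standard RV32I ones.

```c
#include <stdint.h>
#include <stdbool.h>

/* Is the B-type branch taken? funct3 selects the comparison. */
bool branch_taken(uint32_t funct3, uint32_t rs1, uint32_t rs2) {
    switch (funct3) {
    case 0x0: return rs1 == rs2;                     /* BEQ  */
    case 0x1: return rs1 != rs2;                     /* BNE  */
    case 0x4: return (int32_t)rs1 <  (int32_t)rs2;   /* BLT  */
    case 0x5: return (int32_t)rs1 >= (int32_t)rs2;   /* BGE  */
    case 0x6: return rs1 <  rs2;                     /* BLTU */
    case 0x7: return rs1 >= rs2;                     /* BGEU */
    default:  return false;                          /* reserved encodings */
    }
}

/* B-type immediate: imm[12|10:5] in inst[31:25], imm[4:1|11] in inst[11:7]. */
int32_t btype_imm(uint32_t inst) {
    uint32_t imm = ((inst >> 31) & 0x1)  << 12
                 | ((inst >> 7)  & 0x1)  << 11
                 | ((inst >> 25) & 0x3F) << 5
                 | ((inst >> 8)  & 0xF)  << 1;
    if (imm & (1u << 12)) imm |= 0xFFFFE000u;        /* sign-extend from bit 12 */
    return (int32_t)imm;
}
```

For example, 0xFE000EE3 is beq x0, x0, -4, so btype_imm returns -4 and the branch is always taken.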
mtvec: Machine Trap Vector (t_vec)
mcause: Machine Exception Cause
mepc: Machine Exception PC
mstatus: Machine Status
Write register: stores data from the ex or clint module into the control and status register (CSR) selected by the low 12 bits of the write address.
Read register (combinational logic): the read address comes from the decode (id) module, and the data read from the CSR (selected by the low 12 bits of the read address) is sent back to the id module.
A second read port takes its address from the interrupt (clint) module and sends the data it reads (again selected by the low 12 bits of the address) back to the clint module.
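A C sketch of a CSR file selected by the low 12 bits of the address. The addresses are the standard machine-mode ones from the RISC-V privileged spec. Ignoring writes to unimplemented CSRs and reading them as zero is a simplification of mine. The id and clint read ports would simply be two separate calls to csr_read.

```c
#include <stdint.h>

/* Standard machine-mode CSR addresses (RISC-V privileged spec). */
enum {
    CSR_MSTATUS = 0x300, CSR_MIE = 0x304, CSR_MTVEC = 0x305,
    CSR_MEPC    = 0x341, CSR_MCAUSE = 0x342
};

typedef struct { uint32_t mstatus, mie, mtvec, mepc, mcause; } CsrFile;

/* Only the low 12 bits of the address select the register. */
void csr_write(CsrFile *c, uint32_t addr, uint32_t data) {
    switch (addr & 0xFFF) {
    case CSR_MSTATUS: c->mstatus = data; break;
    case CSR_MIE:     c->mie     = data; break;
    case CSR_MTVEC:   c->mtvec   = data; break;
    case CSR_MEPC:    c->mepc    = data; break;
    case CSR_MCAUSE:  c->mcause  = data; break;
    default:          break;   /* unimplemented CSRs ignore writes in this sketch */
    }
}

/* Combinational read port. */
uint32_t csr_read(const CsrFile *c, uint32_t addr) {
    switch (addr & 0xFFF) {
    case CSR_MSTATUS: return c->mstatus;
    case CSR_MIE:     return c->mie;
    case CSR_MTVEC:   return c->mtvec;
    case CSR_MEPC:    return c->mepc;
    case CSR_MCAUSE:  return c->mcause;
    default:          return 0;   /* unimplemented CSRs read as zero in this sketch */
    }
}
```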
RISC-V interrupts fall into two types: synchronous interrupts, generated by instructions such as ECALL and EBREAK, and asynchronous interrupts, generated by peripherals such as GPIO and UART.
When an interrupt (or interrupt-return) signal is detected, the core first suspends the entire pipeline and sets the jump address to the interrupt entry address. It then reads and writes the necessary CSRs (mstatus, mepc, mcause, etc.). Once these CSR accesses are complete, the pipeline suspension is released, so the processor fetches instructions from the interrupt entry address and enters the interrupt service routine.
always @(*) begin // interrupt control
    if (rst == 1'b0) begin
        int_st = INT_IDLE;
    end
    else begin
        if (inst_i == 32'h73 || inst_i == 32'h00100073) begin  // ECALL or EBREAK
            if (div_started_i == 1'b0) begin  // take the trap only when no division is in progress
                int_st = INT_SYNC;
            end
            else begin
                int_st = INT_IDLE;
            end
        end
        else if (int_flag_i != 8'h0 && global_int_en_i == 1'b1) begin  // pending peripheral interrupt
            int_st = INT_ASYNC;
        end
        else if (inst_i == 32'h30200073) begin  // MRET
            int_st = INT_MERT;
        end
        else begin
            int_st = INT_IDLE;
        end
    end
end
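The CSR updates that follow once a trap is selected can be modeled in C as below, using the mstatus MIE (bit 3) and MPIE (bit 7) bits from the RISC-V privileged spec and assuming mtvec is in direct mode. Hart, trap_enter, and trap_return are hypothetical names. The caller passes the return PC: for ECALL, mepc is the address of the ECALL itself; for an asynchronous interrupt, it is the next instruction that has not executed yet.

```c
#include <stdint.h>
#include <stdbool.h>

#define MSTATUS_MIE  (1u << 3)   /* global interrupt enable      */
#define MSTATUS_MPIE (1u << 7)   /* MIE value before the trap    */

typedef struct { uint32_t pc, mstatus, mepc, mcause, mtvec; } Hart;

/* Trap entry: save return PC and cause, stash MIE in MPIE, disable
   interrupts, jump to the trap vector (direct mode assumed). */
void trap_enter(Hart *h, uint32_t return_pc, uint32_t cause, bool is_interrupt) {
    h->mepc   = return_pc;
    h->mcause = (is_interrupt ? 0x80000000u : 0u) | cause;  /* bit 31 marks interrupts */
    if (h->mstatus & MSTATUS_MIE) h->mstatus |= MSTATUS_MPIE;
    else                          h->mstatus &= ~MSTATUS_MPIE;
    h->mstatus &= ~MSTATUS_MIE;
    h->pc = h->mtvec & ~0x3u;    /* low 2 bits of mtvec hold the MODE field */
}

/* mret: restore MIE from MPIE, set MPIE, return to mepc. */
void trap_return(Hart *h) {
    if (h->mstatus & MSTATUS_MPIE) h->mstatus |= MSTATUS_MIE;
    else                           h->mstatus &= ~MSTATUS_MIE;
    h->mstatus |= MSTATUS_MPIE;
    h->pc = h->mepc;
}
```

As a usage example, a machine timer interrupt is cause 7 with is_interrupt = true, and an ECALL from machine mode is exception cause 11.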
regs
Temporary data storage for the decode and execute stages.
The module defines a register array regs that is 32 bits wide and 32 entries deep.
1. Write register: stores data from ex or jtag into regs.
2. Read register (combinational logic):
The read address comes from the decode (id) module, and the data read from regs is sent back to the id module.
The read address comes from the jtag module, and the data read is sent back to the jtag module (jtag register read).
Because of the pipeline, when the current instruction is in the execute stage, the next instruction is already in the decode stage. The register is not written during the execute stage; the write only happens when the next clock edge arrives.
If the instruction in the decode stage needs the result of the previous instruction, the register value it reads at that moment is stale.
For example, in the two instructions add x1, x2, x3 and add x4, x1, x5, the second depends on the result of the first. To solve this, if the read address equals the write address, the value being written is returned directly to the read port.
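A C sketch of this bypass: writes land on the clock edge, but a read in the same cycle whose address matches the pending write returns the value being written. Keeping x0 hardwired to zero is required by the ISA; whether the original RTL checks it in exactly this place is my assumption. RegFile, rf_write, and rf_read are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t regs[32]; } RegFile;

/* Clock-edge write; x0 is hardwired to zero and never written. */
void rf_write(RegFile *rf, bool we, uint32_t waddr, uint32_t wdata) {
    waddr &= 31;
    if (we && waddr != 0) rf->regs[waddr] = wdata;
}

/* Combinational read with bypass: if this cycle is writing the register
   being read, forward the write data instead of the stale stored value. */
uint32_t rf_read(const RegFile *rf, uint32_t raddr,
                 bool we, uint32_t waddr, uint32_t wdata) {
    raddr &= 31;
    if (raddr == 0) return 0;
    if (we && (waddr & 31) == raddr) return wdata;
    return rf->regs[raddr];
}
```

With x2 = 10 and x3 = 20, the decode stage of add x4, x1, x5 reads x1 as 30 through the bypass, even though regs[1] is still 0 until the next clock edge.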
Suppose each peripheral has its own address bus and data bus. With N peripherals, the processor core needs N address buses and N data buses, and every additional peripheral requires modifying the core code (and the changes are not small).
With a bus, the processor core needs only one address bus and one data bus, which greatly simplifies the connection between the core and its peripherals.
and why???
because
2. UART program download.
Since the program is being replaced, it does not matter which step the current program is executing.
Requests from other modules can be ignored: download directly, then re-run the new program (the pipeline must be paused).
3. ex.v, the execution module (memory read and write requests), unless new program code is being re-downloaded.
As long as the program stays unchanged, the current instruction must run to completion so that subsequent operations do not go wrong (the pipeline must be suspended).
A case statement selects the slave device to operate on, and the master's write_enable is passed to the selected slave.
The bus supports multiple masters and multiple slaves, but only one master can communicate with one slave at a time. Arbitration among the masters on the RIB bus uses a fixed-priority scheme.
The highest 4 bits of the bus address select which slave device to access, so up to 16 slave devices are supported.
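A C sketch of the two bus decisions: fixed-priority arbitration and slave selection by the top 4 address bits. The master numbering (lower index = higher priority, e.g. JTAG, then UART download, then ex) is my assumption based on the numbered list above. Passing only the low 28 bits to the slave is also an assumption. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_MASTERS 3   /* assumed: 0 = JTAG, 1 = UART download, 2 = ex */

/* Fixed priority: the lowest-numbered requesting master wins; -1 if none. */
int arbitrate(const bool req[NUM_MASTERS]) {
    for (int m = 0; m < NUM_MASTERS; m++)
        if (req[m]) return m;
    return -1;
}

/* The top 4 address bits select one of up to 16 slaves. */
unsigned slave_index(uint32_t addr) {
    return addr >> 28;
}

/* Assumed: each slave sees only the offset within its 256 MiB window. */
uint32_t slave_offset(uint32_t addr) {
    return addr & 0x0FFFFFFFu;
}
```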
Every 2 bits control the mode of one IO, supporting up to 16 IOs:
0: high impedance, 1: output, 2: input
Step1: First design two registers: gpio_ctrl (control GPIO input and output mode); gpio_data (store GPIO input or output data).
Step2: Assign addresses to these two registers.
Step3: Write to these two registers through their addresses, and realize GPIO input and output by configuring the gpio_ctrl register.
Keep the following in mind when simulating the top module:
When gpio_ctrl[1:0] is 1, the GPIO is in output mode and gpio_data[0] is driven onto the corresponding IO port. If gpio_ctrl[1:0] is not 1 (i.e., it is 0 or 2, corresponding to high-impedance and input modes), the GPIO output is set to high impedance, for the following reason:
High impedance is a common term in digital circuits. It refers to an output state that is neither high nor low; its effect is the same as being disconnected. If you measure it with a multimeter, it may read high or low, depending on what is connected to it.
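A C model of what the FPGA drives onto each IO. It generalizes the gpio_ctrl[1:0] / gpio_data[0] rule above by assuming IO n uses gpio_ctrl[2n+1:2n] and gpio_data[n]. That mapping and the helper names are my assumptions.

```c
#include <stdint.h>

typedef enum { PIN_LOW = 0, PIN_HIGH = 1, PIN_HIGH_Z = 2 } PinLevel;

/* Mode of IO n: 0 = high impedance, 1 = output, 2 = input. */
unsigned gpio_mode(uint32_t gpio_ctrl, unsigned n) {
    return (gpio_ctrl >> (2 * n)) & 0x3;
}

/* Only output mode drives the pin; input and high-Z modes both release it. */
PinLevel gpio_drive(uint32_t gpio_ctrl, uint32_t gpio_data, unsigned n) {
    if (gpio_mode(gpio_ctrl, n) == 1)
        return ((gpio_data >> n) & 1) ? PIN_HIGH : PIN_LOW;
    return PIN_HIGH_Z;
}
```

For example, gpio_ctrl = 0x9 makes IO0 an output and IO1 an input, so only IO0 follows gpio_data.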
The SPI protocol specifies 4 logical signal interfaces:
SCLK (Serial Clock, will be issued by the master)
MOSI (Master Out, Slave In)
MISO (Master In, Slave Out)
CS (Chip Select: because a master can communicate with several slaves, CS selects which slave to talk to; CS is usually active low)
Step1: keep write_enable (we) always enabled.
Step2: use cut_clk to count system clock cycles (clock division).
Step3: count SPI_CLK.
Step4: write the registers.
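A cycle-level C sketch of one SPI master transfer. It assumes SPI mode 0 (CPOL = 0, CPHA = 0) and MSB-first order, since the text does not say which mode my design uses, and it wires MISO back to MOSI so the result can be checked. clk_div plays the role of the cut_clk counter: the number of system-clock ticks per SCLK half-period. The function name is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool sclk, mosi, cs_n; } SpiPins;

/* Mode-0, MSB-first transfer of one byte with MISO looped back to MOSI.
   clk_div (>= 1) = system ticks per SCLK half-period. */
uint8_t spi_transfer_loopback(uint8_t tx, unsigned clk_div, unsigned *rising_edges) {
    SpiPins p = { .sclk = false, .mosi = false, .cs_n = true };
    uint8_t shift_out = tx, shift_in = 0;
    unsigned cnt = 0, half_periods = 0;

    *rising_edges = 0;
    p.cs_n = false;                          /* select the slave (active low)          */
    p.mosi = (shift_out >> 7) & 1;           /* first bit valid before the first edge  */

    while (half_periods < 16) {              /* 8 bits = 16 SCLK half-periods          */
        if (++cnt < clk_div) continue;       /* clock divider                          */
        cnt = 0;
        p.sclk = !p.sclk;
        half_periods++;
        if (p.sclk) {                        /* rising edge: sample MISO               */
            bool miso = p.mosi;              /* loopback wiring                        */
            shift_in = (uint8_t)((shift_in << 1) | miso);
            (*rising_edges)++;
        } else {                             /* falling edge: drive the next bit       */
            shift_out = (uint8_t)(shift_out << 1);
            p.mosi = (shift_out >> 7) & 1;
        }
    }
    p.cs_n = true;                           /* deselect */
    return shift_in;
}
```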
youtube
Step1: Define three registers
R&W
UART stands for Universal Asynchronous Receiver Transmitter.
Synchronous serial communication requires both communication parties to transmit data synchronously under the control of the same clock; asynchronous serial communication means that both communication parties use their own clocks to control the sending and receiving process of data.
A UART frame, whether sent or received, consists of four parts: a start bit, data bits, a parity bit, and a stop bit.
The serial communication rate is expressed as the baud rate, the number of binary bits transmitted per second, in bps.
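The baud rate turns into a clock-division threshold with simple arithmetic. As an example, this assumes a 100 MHz system clock (the Arty A7's onboard oscillator; the SoC may actually run at a different frequency). Then 100,000,000 / 115,200 ≈ 868 cycles per bit, and the rounding leaves a rate error of about 64 ppm. The helper names are hypothetical.

```c
#include <stdint.h>

/* System-clock cycles per UART bit, rounded to nearest. */
uint32_t baud_divider(uint32_t clk_hz, uint32_t baud) {
    return (clk_hz + baud / 2) / baud;
}

/* Relative error of the resulting bit rate, in parts per million (truncated). */
int32_t baud_error_ppm(uint32_t clk_hz, uint32_t baud) {
    uint32_t div = baud_divider(clk_hz, baud);
    double actual = (double)clk_hz / div;
    return (int32_t)((actual - baud) / baud * 1e6);
}
```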
then…TX sending
RX reception (partial)
Specific process:
a. When idle (not sending data), keep the TX line at 1, as the protocol requires. When there is valid data to send (the C program writes it to the UART_TXDATA register), send the start bit 0 for one bit period.
b. Set the threshold of the clock-division counter according to the agreed baud rate and send the data, low bit first and high bit last. After the data, drive the TX line to 1 as the stop bit, and update the corresponding status bit: UART_STATUS[0] <= 0;
c. Wait for the next sending (that is, the next sending data valid signal)
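Steps a to c can be sketched as a tick-by-tick C model. It assumes 8 data bits, no parity, and one stop bit (8N1), since the steps above do not mention parity. The busy flag stands in for UART_STATUS[0] (1 while sending, cleared when the frame ends). UartTx and its functions are hypothetical names, not the RTL's.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t div;      /* clock cycles per bit (from the baud rate)       */
    uint32_t cnt;      /* clock-division counter                          */
    unsigned bit_idx;  /* 0 = start bit, 1..8 = data bits, 9 = stop bit   */
    uint8_t  data;
    bool     busy;     /* stands in for UART_STATUS[0] in this sketch     */
    bool     tx;       /* TX line level; idles at 1                       */
} UartTx;

void uart_tx_init(UartTx *u, uint32_t div) {
    *u = (UartTx){ .div = div, .tx = true };
}

/* Like software writing UART_TXDATA: start a frame if the line is idle. */
bool uart_tx_send(UartTx *u, uint8_t byte) {
    if (u->busy) return false;
    u->data = byte; u->busy = true; u->bit_idx = 0; u->cnt = 0;
    u->tx = false;                            /* step a: start bit 0 */
    return true;
}

/* One system-clock tick. */
void uart_tx_tick(UartTx *u) {
    if (!u->busy) { u->tx = true; return; }   /* idle: hold the line at 1 */
    if (++u->cnt < u->div) return;            /* wait out one bit period  */
    u->cnt = 0;
    u->bit_idx++;
    if (u->bit_idx <= 8)
        u->tx = (u->data >> (u->bit_idx - 1)) & 1;  /* step b: LSB first */
    else if (u->bit_idx == 9)
        u->tx = true;                               /* stop bit          */
    else
        u->busy = false;                            /* step c: wait for the next send */
}
```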
Tips (from my friend)
The FPGA's serial-port I/O pins use TTL levels: 3.3 V represents logic "1" and 0 V represents logic "0". A computer's serial port, by contrast, uses RS-232 levels, which are negative logic.
That is, -15 V to -5 V represents logic "1" and +5 V to +15 V represents logic "0". Therefore, when the computer communicates with the FPGA, a level-conversion chip is needed.
find all bin files
test all bin files
turn bin files to mem files
compile rtl files
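The "turn bin files to mem files" step could look like the C sketch below. It assumes the mem format is what Verilog's $readmemh reads: one 32-bit word per line as 8 hex digits, with words assembled little-endian (RISC-V byte order) and a trailing partial word zero-padded. My actual script and its exact format may differ; bin_to_mem is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Convert a raw little-endian RV32 binary into $readmemh-style text.
   Returns the number of characters written (excluding NUL), or -1 if
   out is too small. */
int bin_to_mem(const uint8_t *bin, size_t len, char *out, size_t out_size) {
    size_t pos = 0;
    if (out_size) out[0] = '\0';
    for (size_t i = 0; i < len; i += 4) {
        uint32_t word = 0;
        for (size_t b = 0; b < 4 && i + b < len; b++)
            word |= (uint32_t)bin[i + b] << (8 * b);   /* little-endian bytes */
        int n = snprintf(out + pos, out_size - pos, "%08x\n", (unsigned)word);
        if (n < 0 || (size_t)n >= out_size - pos) return -1;
        pos += (size_t)n;
    }
    return (int)pos;
}
```

For the bytes 13 00 00 00 73 00 10 00, this produces the lines 00000013 (addi x0, x0, 0) and 00100073 (EBREAK).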
When you need many values of different "types" that hold the same VALUE, coding becomes very inconvenient: you keep clicking through the prompts until you go crazy. Why are wire & reg designed like this?
As a novice who had never learned logic design, I remember being stuck for 14 hours on the fourth day because of one wrong judgment.
EXP
The case statement assigns different values to another signal (q in this example) depending on the value of a signal (sel in this example). Look at the example on the left side of the figure below: when sel=00, q takes the value of a, and when sel=11, q takes the value of b.
What this example leaves unspecified is what value q gets when sel takes values other than 00 and 11. In the example on the left below, written in Verilog HDL, the default is that q keeps its previous value, so a latch is automatically inferred.
I used the board part whose name starts with xc7a100… to write my xdc file.
When I was writing the division module, my logic had a big problem because I was following a strange guide from a Chinese website, until my NTU EE friend told me I couldn't finish it that way in 20,000 years (I was trying to solve it by multiplying divisors… super dumb).
Here is the answer; look at it carefully.
When generating the bitstream, I always thought I had enough IO ports, until I read the datasheet…
arty a7 100T
how to use them
At first I used riscv64-unknown-elf-gcc as my toolchain, but the resulting bin file was too big for our SoC. So I tried compiling with -Os and stripping the output to reduce its size, but sadly all of that failed. So I switched to the MCU toolchain (riscv-none-embed-), which meant giving up some system calls in my C code to fit the toolchain :(
Put your toolchain in the tools folder (download).
I used my Homework 1 C code as the test code.
1. test_all_isa
Go to the sim folder and run this command:
2. Test the C code
Go to the sim folder and run this command:
Because I use riscv-none-embed-gcc on Windows, I don't need "newlib", and I gave up some system calls such as printf(). That said, riscv-none-embed-gcc can handle "newlib"; not using it is just my personal choice.
If it succeeds, you will see this on your computer:
Excerpt from C_test.c's dump file (-Os in CFLAGS)
working…
I'm still working on my project…
Here is Version 2.00
video
1. What is xdc???
2. Reference: how to write an xdc file
We successfully generated the bitstream…
It was not in vain: I slept less than six hours almost every day this month, and even dropped two courses QAQ
After 14 hours I finally dealt with the last problem…
Someone asked me how to boot a machine without an OS;
it's a great question.
In the official RISC-V datasheet
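During installation, be sure to "read the datasheet" instead of following strange tutorials online; match the versions and install the required libraries.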
If you want to improve your skills and are willing to spend the time, I strongly recommend this class. You never know how far the teacher can push your limits. The teacher is very serious in class and the teaching materials are very well prepared; learning the material is only half of it, and some of the teacher's values are also worth learning. I was once scolded by the teacher with a memorable line: "Are you talking like an engineer? How can an engineer use 'should' and 'probably' to describe their thoughts?" In short, I think this course is a must-take if you want to make sure you learn everything you want to learn!