[Computer Architecture 2022] Final Project
Contributed by < Peter
>
Code can be found here: <GitHub
>
Project Overview
In this project, I am going to build a RISC-V 32-bit which support R32I ISA.
There will be 2 version of this CPU:
- non-pipelined
- 5-state pipelined
In order to evaluate the performance of both 2 versions, I write total 3 test:
- Prog0: Evaluate implemented instruction
- Prog1: Sort signed 4-byte integers algorithm
- Prog2: Multiplication signed 4-byte integers
But first, take a look at the architecture.
Architecture
Top testbench overview
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
System block diagram
Non-pipelined version
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
CPU Block diagram non-pipelined version
5-state pipelined
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
CPU Block diagram with 5 stages
DMEM_controller
In charge of control address, data, and control signal for Data memory (DMEM). This unit oversees masking bit with 0 or 1 depending on what instruction is used. For details:
- Read/Write address
A[13:0]
: connect to ALU_output[15:2]
(word-based address)
WEB[3:0]
is write byte enable. It can be inferred from funct3
field of instruction
along with the byte offset which is the last 2 LSB of ALU_output
.
Register_File
Contains of 32 32-bit registers from x0 to x31. Register x0 is connected to ground (value 0). It can be used to read up to 2 different registers and write 1 register in the same clock.
CSR_File
At this moment, it only contains 2 64-bit register csr_cycle and csr_instret. Ability for read and write 1 register at the same time. Read ability support bit masking technique which allows user to hide sensitive information from being read without permission.
Supported CSR:
instret
: number of instruction has been procceed and,
cyclecnt
: number of clock cycle
Imm_Gen
Generating immediate value with 0 or signed extend for ALU; Generated immediate is based on ImmSel signal from Control_Unit.
Value of Imm is based on instruction which is shown in the below table:
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Control_Unit
Using instruction to generate control signal for the pipeline. Details:
- ImmSel
- 3’b0: Imm Type I
- 3’b1: Imm Type S
- 3’b2: Imm Type J
- 3’b3: Imm Type U
- 3’b4: Imm for CSR instruction (shamt)
- BrUn: 1 if the current Branch instruction is Unsigned.
- ASel: Selector for ALU_DataA Mux:
- 2’b00: Data RS1
- 2’b01: PC
- 2’b10: Immediate
- 2’b11: 0
- BSel: Selector for ALU_DataB Mux:
- 2’b00: Data RS2
- 2’b01: Immediate
- 2’b10: CSR read data
- 2’b 11: 0
- MemRW: Control signal for DMEM read(0)/write(1) operation
- MemEn: High active Enable signal for Read/Write operation
- RegWEn: High active Enable signal for Register_File write operation.
- WBSel: Selector for WBData Mux:
- 2’b00: DMEM read data
- 2’b01: ALU result
- 2’b10: PC+4
- 2’b 11: 0
- ALUSel: Selector for ALU_Unit:
- 4’d0: Addition
- 4’d1: Subtraction
- 4’d2: Shift left logically
- 4’d3: Smaller than comparison (Unsigned)
- 4’d4: Smaller than comparison (Signed)
- 4’d5: XOR operation
- 4’d6: Shift right logically
- 4’d7: Shift right arithmetically
- 4’d8: OR operation
- 4’d9: AND operation
- 4’d10: Shift left logically using Immediate
- 4’d11: Shift right logically using Immediate
- 4’d12: Shift right arithmetically using Immediate
- 4’d13: AND operation with inverted value
- csr_ren/csr_wen: CSR read/ write Active high Enable
Forwarding_Unit
This unit collect information about Data Hazard with R-Type instruction. The operation for detect Data hazard is described in below picture. ForwardA/ForwardB
selector bit then be transferred to DataRS1/RS2 Mux
respectively.
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Branch_Comparator
To deal with the Branch of current PC when B-Type instruction is detected, I create this Unit to compare value of RS1 and RS2 in EX stage. Depended on which B-Type instruction is running, BrEn
signal will be active or not. If BrEn
is 1 then pc_next
will be ALU result data.
Hazard handling process
Structural hazard
- Description: This type of hazard come from 2 or more instructions in the pipeline compete for access to a single physical resource.
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Structural RegFile is used in ID and WB [2]
- Solution: Separate read and write port in Register_file. Now each instruction can read upto 2 register in ID stage and write 1 value in WB stage
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Register_File unit with 2 read port and 1 write port [2]
Data hazard- R-type instruction
- Description: Happened when data in previous instruction is stored in Register_File but current instruction already require its value for calculation. For example:
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Data hazard when t0 register is used for many sequential instructions.
- Solution: Data forwarding from MEM or (and) WB stage to EX stage. This technique require more hardware (Forwarding)Unit) to detect wherether data is forwarded or not.
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Data in MEM and WB stage is forwarded to EX stage in my CPU
- Example with sequence of 5 add t0,t1,t0 instruction
- id_rs1 = 5, id_rs = 6, id_rd= 5 for 5 next cycle
- ForwardASel/BSel: selection bit for MUX Data A and B
- 0: No hazard
- 1: Hazard in 1 next instruction, forward data from EX (cpu_out)
- 2: Hazard in 2 next instruction, forward data from MEM
- 3: Hazard in 2 next instructions, forward from MEM (mem_read_data)
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Control hazard
- Description: Branch (beq, bne,…) and Jump (jal, jalr) instructions determine flow of control.
- Solution: Using the
Static Branch Prediction
(always not taking the Branch). In case of Branch instruction, I add a Branch_Comparision that will decide to take the branch or not in EX stage. But in case of Jump instruction, I add just let pipeline to load the 2-next instruction to IF and ID stage. After that, when the next PC is calculated in EX stage, I will flush the pipeline with STALL instruction.

Pipeline before flush (JAL in EX stage) [3]

Pipeline after flush (JAL in MEM stage)[3]
- Example with instruction BLE: func3= 3’b000; opcode=7’h63
- DataRS1= DataRS2 = 0xFFFFF000
- BrUn = 1 means this instruction is signed comparison
- BrEn = 1 which mean PC jump to PC+ Imm (ALU_out)
- Meanwhile, flush ID and IF so that next 2 instruction go to EX stage become NoP (instruction = 0x13)

Makefile explaination
Verification
Prog0: 48 RV32I instructions
prog0 to perform verification for the functionality of instructions. The code is provided by VSLI System Design course, EE department, NCKU and be writen in Assembly.
You can check it under the name main.S
in prog0
folder.
Prog1: Sorting algorithm
Requirement: The number of sorting elements is stored at the address named array_size
. The first element is stored at the address named array_addr
, others are stored at adjacent addresses. The maximum number of elements is 64. All elements are signed 4-byte integers and you should sort them in ascending order. Rearranged data should be stored at the address named _test_start
.
Since the implementation focus on RV32I, the multiplication and division is not necessary but we can still having the same function using C code.
Requirement: The multiplicand is stored at the address named mul1
. The multiplier is stored at the address named mul2
. Their product is signed 8-byte integers and should be stored at the address named _test_start
.
Result
Prog0
- Non-pipeline version
make prog0
- 5-stage pipelined version
make prog0
Prog1
- Non-pipelined version
make prog1
- 5-stage pipelined version
make prog1
Prog2
- Non-pipelined version
make prog2
- 5-stage pipelined version
make prog2
Hardware Usage
In this project, I implement both 2 version into Xilinx kintex-7 FPGA series.
Remember that the Instruction Memory and Data Memory are excluded from this report.
- Non-pipelined version

- 5-stage pipelined version

- Comparision:
- Number of Register increase for
48.24%
.
- Number of LUTs increase for
18.36%
(mostly from increasement of Register).
Timming Report
- Non-pipelined version

- 5-stage pipelined version

- Comparision:
WNS(5-stage) > WNS(non-pp)
which resulted in better maximum clock speed.