蔡承遠, 郭晏愷
The F extension adds 32 floating-point registers, f0–f31, each 32 bits wide, and a floating-point control and status register, fcsr, which contains the operating mode and exception status of the floating-point unit.
Regs | ABI Name | Description |
---|---|---|
f0-f7 | ft0-ft7 | temporaries |
f8-f9 | fs0-fs1 | saved registers |
f10-f11 | fa0-fa1 | arguments / return values |
f12-f17 | fa2-fa7 | arguments |
f18-f27 | fs2-fs11 | saved registers |
f28-f31 | ft8-ft11 | temporaries |
The 2-bit floating-point format field fmt, located in bits 26 and 25 of the instruction, is encoded as shown in the table below. It is set to S (00) for all instructions in the F extension.
fmt | Mnemonic | Meaning |
---|---|---|
00 | S | 32-bit single-precision |
01 | D | 64-bit double-precision |
10 | - | reserved |
11 | Q | 128-bit quad-precision |
Some example instructions:
Inst. | Name | Description |
---|---|---|
fmadd.s | mul-add | rd = rs1 * rs2 + rs3 |
fadd.s | add | rd = rs1 + rs2 |
fmul.s | mul | rd = rs1 * rs2 |
fdiv.s | div | rd = rs1 / rs2 |
fsqrt.s | square root | rd = sqrt(rs1) |
flt.s | less than | rd = (rs1 < rs2) ? 1 : 0 |
fcvt.s |
The rm field (bits 14-12, the funct3 position) of an F instruction selects the rounding mode:
Rounding Mode | meaning | Mnemonic |
---|---|---|
0d0 | Round to Nearest, ties to Even. | RNE |
0d1 | Round towards Zero | RTZ |
0d2 | Round Down (towards −∞) | RDN |
0d3 | Round Up (towards +∞) | RUP |
0d4 | Round to Nearest, ties to Max Magnitude | RMM |
0d5 | Invalid. | |
0d6 | Invalid. | |
0d7 | In an instruction's rm field, selects the dynamic rounding mode; in the Rounding Mode register, invalid. | |
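
As a small illustration of the two field tables above, the following Python sketch extracts fmt (bits 26-25) and rm (bits 14-12) from a 32-bit instruction word. The function name and the hand-assembled example words are assumptions made for this example and are not part of the project.

```python
# Field names follow the fmt and rounding-mode tables above.
FMT_NAMES = {0b00: "S", 0b01: "D", 0b11: "Q"}
RM_NAMES = {0: "RNE", 1: "RTZ", 2: "RDN", 3: "RUP", 4: "RMM", 7: "dynamic"}

def decode_fmt_rm(instr):
    """Return (fmt name, rounding-mode name) for a 32-bit F instruction word."""
    fmt = (instr >> 25) & 0b11    # bits 26-25
    rm = (instr >> 12) & 0b111    # bits 14-12 (funct3 position)
    return FMT_NAMES.get(fmt, "reserved"), RM_NAMES.get(rm, "invalid")

# fadd.s f1, f2, f3 assembled by hand: rm = 000 (RNE) and rm = 111 (dynamic)
FADD_S_RNE = 0x003100D3
FADD_S_DYN = 0x003170D3

print(decode_fmt_rm(FADD_S_RNE))  # ('S', 'RNE')
print(decode_fmt_rm(FADD_S_DYN))  # ('S', 'dynamic')
```
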
Meaning of each bit of rd written by the classify instruction (fclass.s); exactly one bit is set:
bit | meaning |
---|---|
0d0 | rs1 is −∞. |
0d1 | rs1 is a negative normal number. |
0d2 | rs1 is a negative subnormal number. |
0d3 | rs1 is −0. |
0d4 | rs1 is +0. |
0d5 | rs1 is a positive subnormal number. |
0d6 | rs1 is a positive normal number. |
0d7 | rs1 is +∞. |
0d8 | rs1 is a signaling NaN. |
0d9 | rs1 is a quiet NaN. |
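
The classification table above can be modeled in a few lines of Python. This is a behavioral sketch only, with a hypothetical function name; it takes the raw 32-bit pattern of rs1 and returns the one-hot rd value.

```python
def fclass_s(bits):
    """Return the one-hot class mask for a 32-bit single-precision bit pattern."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:
        if frac == 0:
            idx = 0 if sign else 7                # -inf / +inf
        else:
            idx = 9 if frac & 0x400000 else 8     # quiet NaN has the fraction MSB set
    elif exp == 0:
        if frac == 0:
            idx = 3 if sign else 4                # -0 / +0
        else:
            idx = 2 if sign else 5                # subnormal
    else:
        idx = 1 if sign else 6                    # normal
    return 1 << idx

print(bin(fclass_s(0x3F800000)))  # 1.0 is a positive normal number: bit 6
```
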
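For a normal 32-bit single-precision floating-point number, value = (−1)^sign × 2^(exponent − 127) × (1 + fraction / 2^23). For a subnormal number (exponent field all zero), value = (−1)^sign × 2^(−126) × (fraction / 2^23).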
item | sign | exponent | fraction |
---|---|---|---|
bit number | 31 | 30-23 | 22-0 |
length | 1 | 8 | 23 |
special values | | | |
0 | 0 | 00000000 | all zero |
-0 | 1 | 00000000 | all zero |
1 | 0 | 01111111 | all zero |
-1 | 1 | 01111111 | all zero |
min. sub-normal number | * | 00000000 | 000 0000 0000 0000 0000 0001 |
max. sub-normal number | * | 00000000 | all one |
min. normal number | * | 00000001 | all zero |
max. normal number | * | 11111110 | all one |
-∞ | 1 | 11111111 | all zero |
∞ | 0 | 11111111 | all zero |
NaN | * | 11111111 | not all zero |
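
To make the encoding table concrete, here is a minimal Python sketch that decodes a 32-bit pattern into its value using the sign, exponent, and fraction fields. The function name is an assumption for this example; the bit patterns in the usage lines come from the special-values rows above.

```python
def decode_f32(bits):
    """Decode a 32-bit single-precision bit pattern into a Python float."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:                     # all-ones exponent: infinity or NaN
        if frac:
            return float("nan")
        return float("-inf") if sign else float("inf")
    if exp == 0:                        # zero or subnormal: no implicit leading 1
        mag = (frac / 2**23) * 2.0**-126
    else:                               # normal: implicit leading 1, bias 127
        mag = (1 + frac / 2**23) * 2.0**(exp - 127)
    return -mag if sign else mag

print(decode_f32(0x3F800000))  # 1.0  (sign 0, exponent 01111111, fraction 0)
print(decode_f32(0xBF800000))  # -1.0 (sign 1, exponent 01111111, fraction 0)
```
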
This design is based on the 5-Stage-RV32I project by kinzafatim. RV32F is implemented by adding a floating-point unit (FPU) in parallel with the arithmetic logic unit (ALU). An FPU control unit and an FPU register file are also needed; these three units mirror the integer-side ALU, ALU control, and register file.
The figure below is a quick sketch drawn in MATLAB Simulink to show the overall structure. Because Simulink does not support some features, such as a mux selector, those parts are shown as red dotted lines in the picture.
src/main/scala/Pipeline/UNits/Control.scala
To decide whether data is read and written in the FPU or the ALU, and to simplify the structure so that some wires (e.g., the data wire between memory and the register file) can be shared outside the ALU and FPU, we add an FPU enable port `fpu_en` that decides whether an operation uses the ALU or the FPU. The `fpu_operation` port carries the F instruction type decoded from the opcode. Because RV32F R4-type instructions have `rs3`, an `operand_C` port is also needed for operand C source selection.
src/main/scala/Pipeline/UNits/FPU_Control.scala
On receiving the enable signal `fpu_enable`, this module reads the instruction type from the `fpu_op` port and encodes the operation code in a way that depends on the instruction type. The encoded operation code is then sent to the FPU through the `fpu_out` port.

- `fpu_op` reads the instruction type decoded by the control module.
- `fpu_funct3` reads the funct3 field of the instruction.
- `fpu_funct7` reads the funct7 field of the instruction.
- `fpu_op5` reads `opcode(6, 2)`, i.e. bits 6 down to 2 of the opcode.

Below is the lookup table of encoded operation codes, which is built into the FPU:
Operation code | Instruction | Sub-instruction selector in FPU |
---|---|---|
00 | fadd | |
01 | fsub | |
02 | fmul | |
03 | fdiv | |
04 | fsgn | rm(0/1) |
05 | fmin/fmax | rm(0/1) |
11 | fsqrt | |
16 | fmadd | |
17 | fmsub | |
18 | fnmsub | |
19 | fnmadd | |
20 | feq/flt/fle | rm(0/1/2) |
24 | fcvt.w.s/fcvt.wu.s | rs2(0/1) |
26 | fcvt.s.w/fcvt.s.wu | rs2(0/1) |
28 | fmv/fclass | rm(0/1) |
30 | fmv.s.x | |
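
The codes in this table coincide with bits 6..2 of funct7 for OP-FP instructions (opcode 1010011) and with bits 6..2 of the opcode for the fused multiply-add group. The Python sketch below models that mapping under the assumption that FPU_Control derives the codes this way; the function name and the hand-assembled instruction words are hypothetical.

```python
OP_FP = 0b1010011
FUSED_OPCODES = {
    0b1000011: "fmadd",
    0b1000111: "fmsub",
    0b1001011: "fnmsub",
    0b1001111: "fnmadd",
}

def encode_fpu_op(instr):
    """Return the operation code from the table, or None for a non-F instruction."""
    opcode = instr & 0x7F
    funct7 = (instr >> 25) & 0x7F
    if opcode in FUSED_OPCODES:
        return opcode >> 2      # R4-type: opcode(6, 2) gives 16..19
    if opcode == OP_FP:
        return funct7 >> 2      # R-type: funct7(6, 2) gives 0, 1, 2, 3, 4, 5, 11, ...
    return None

print(encode_fpu_op(0x003100D3))  # fadd.s  f1, f2, f3      -> 0
print(encode_fpu_op(0x580100D3))  # fsqrt.s f1, f2          -> 11
print(encode_fpu_op(0x203100C3))  # fmadd.s f1, f2, f3, f4  -> 16
```
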
src/main/scala/Pipeline/UNits/FPU.scala
`fmt` and `rm` are evaluated only in this module. It also embeds the comparator modules `FP_COMP`, which are defined in src/main/scala/Pipeline/UNits/FP_COMP.scala.
The implemented floating-point operations include addition, subtraction, multiplication, division, square root, fused multiply-add, less-than comparison, and conversions between floating-point and integer types.
Different `io.fpu_Op` values select and execute specific floating-point operations in our FPU module. Each operation corresponds to a distinct block of logic within the `switch` statement, which selects the behavior based on the provided operation code.

- `A_data_in`, `B_data_in`, `C_data_in` are the three inputs from outside the module.
- `COMP` is used for comparison between two values.
- `carry` is a register indicating whether a carry is generated during mantissa operations.
- `exp_diff` is an 8-bit register storing the difference between the exponents of two input floating-point numbers, used for mantissa alignment.
- `A_sign`, `B_sign`, `C_sign` are three registers holding the sign bits of `A_data_in`, `B_data_in`, and `C_data_in`, respectively.

The adder (FADD) is designed to perform precise addition operations between two single-precision floating-point numbers.
`carry` to ensure accuracy.

The subtractor (FSUB) operates similarly to the adder (FADD); the key difference is how the sign bit of input B is handled. The aligned mantissas are either added or subtracted based on the XOR of the sign bits, so we changed the `Temp_Mantissa` code from an adder to a subtractor.
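
The align, add-or-subtract, and normalize flow described above can be sketched as a Python behavioral model. This is not the project's Chisel code: it assumes normal, non-zero inputs, truncates instead of rounding, and ignores NaN, infinity, and exponent overflow or underflow. The function names are hypothetical; `exp_diff` and `carry` are named after the signals listed earlier.

```python
def unpack(bits):
    """Split a bit pattern into sign, exponent, and 24-bit mantissa (implicit 1 restored)."""
    return bits >> 31, (bits >> 23) & 0xFF, (bits & 0x7FFFFF) | 0x800000

def fadd_model(a_bits, b_bits, subtract=False):
    a_sign, a_exp, a_man = unpack(a_bits)
    b_sign, b_exp, b_man = unpack(b_bits)
    if subtract:
        b_sign ^= 1                        # FSUB: flip the sign of input B
    if (b_exp, b_man) > (a_exp, a_man):    # keep the larger magnitude in A
        a_sign, a_exp, a_man, b_sign, b_exp, b_man = \
            b_sign, b_exp, b_man, a_sign, a_exp, a_man
    exp_diff = a_exp - b_exp
    b_man >>= exp_diff                     # align the smaller mantissa (truncating)
    # XOR of the sign bits decides between subtraction and addition
    temp = a_man - b_man if a_sign ^ b_sign else a_man + b_man
    if temp == 0:
        return 0                           # exact cancellation gives +0
    exp = a_exp
    carry = temp >> 24                     # carry out of the 24-bit mantissa
    if carry:
        temp >>= 1
        exp += 1
    while temp < 0x800000:                 # renormalize after cancellation
        temp <<= 1
        exp -= 1
    return (a_sign << 31) | (exp << 23) | (temp & 0x7FFFFF)

# 1.5 + 2.25 = 3.75 and 1.0 - 0.25 = 0.75, written as raw bit patterns
print(hex(fadd_model(0x3FC00000, 0x40100000)))                 # 0x40700000
print(hex(fadd_model(0x3F800000, 0x3E800000, subtract=True)))  # 0x3f400000
```
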
For multiplication, if the MSB of `Mantissa_product` is 1 (indicating overflow), the mantissa must be right-shifted and the exponent incremented to maintain normalization. If the MSB of `Mantissa_product` is 0, the exponent remains unchanged.
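
The normalization rule above can be illustrated with a small Python model of a single-precision multiply. It is a sketch under stated assumptions: normal inputs only, truncation instead of rounding, and no overflow or underflow handling. The function name is hypothetical and the 48-bit `product` stands in for `Mantissa_product`.

```python
def fmul_model(a_bits, b_bits):
    sign = (a_bits >> 31) ^ (b_bits >> 31)
    a_exp = (a_bits >> 23) & 0xFF
    b_exp = (b_bits >> 23) & 0xFF
    a_man = (a_bits & 0x7FFFFF) | 0x800000   # restore the implicit leading 1
    b_man = (b_bits & 0x7FFFFF) | 0x800000
    exp = a_exp + b_exp - 127                # remove one copy of the bias
    product = a_man * b_man                  # 48-bit mantissa product
    if product >> 47:                        # MSB set: product is in [2, 4)
        frac = (product >> 24) & 0x7FFFFF    # shift right one extra position
        exp += 1
    else:                                    # MSB clear: product is in [1, 2)
        frac = (product >> 23) & 0x7FFFFF
    return (sign << 31) | (exp << 23) | frac

print(hex(fmul_model(0x3FC00000, 0x3FC00000)))  # 1.5 * 1.5 = 2.25 -> 0x40100000
print(hex(fmul_model(0x40000000, 0x40400000)))  # 2.0 * 3.0 = 6.0  -> 0x40c00000
```
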
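For the square root, the target is to solve $f(x) = x^2 - N = 0$, where $N$ is the input number.
Newton's method uses the formula:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

For $f(x) = x^2 - N$, the derivative is $f'(x) = 2x$. Substituting these:

$$x_{n+1} = x_n - \frac{x_n^2 - N}{2x_n} = \frac{1}{2}\left(x_n + \frac{N}{x_n}\right)$$

Start with an initial guess $x_0$, and repeat the iteration formula three times to converge to $\sqrt{N}$.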
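
The three-step iteration described above can be tried out in Python. The sketch uses double-precision arithmetic and takes the initial guess as a parameter, since the guess used by the hardware is not stated here; the function and parameter names are assumptions for this example.

```python
def newton_sqrt(n, x0, iterations=3):
    """Approximate sqrt(n) with Newton's method: x <- (x + n / x) / 2."""
    x = x0
    for _ in range(iterations):
        x = 0.5 * (x + n / x)
    return x

# With N = 2 and x0 = 1 the iterates are 1.5, 1.41666..., 1.41421...
approx = newton_sqrt(2.0, x0=1.0)
print(approx, abs(approx - 2.0 ** 0.5))
```

The closer the initial guess is to the true root, the fewer iterations are needed, which is why a fixed count of three only works with a reasonable starting value.
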
Fused multiply-add (FMADD) combines multiplication and addition into a single operation to improve efficiency and minimize precision loss.

The comparison operations use the comparator modules (COMP_20 and COMP_21) to perform numerical comparisons, with `io.rm` selecting the operation based on the mode.

src/main/scala/Pipeline/UNits/F_Reg.scala
An independent register file for the FPU, with a structure similar to the original register file in src/main/scala/Pipeline/UNits/RegisterFile.scala. Because RV32F R4-type instructions need an rs3 operand, we add the `F_rs3` and `F_rdata3` ports.
src/main/scala/Pipeline/Main.scala
Compared to the original project, we place two muxes in parallel and use `ID_EX_.io.ctrl_FPU_en_out` to decide whether the ALU or the FPU reads the data. The `ID_EX_.io.ctrl_OpA_out === "b01".U` case is for UJ-type and JALR instructions and needs no special handling.
Forwarding A (rs1):
The rs2 value is selected by `ID_EX_.io.ctrl_FPU_en_out` and `Forwarding.io.forward_b`, which together choose the wire to read. The selected value also connects to `EX_MEM_M.io.IDEX_rs2`. Because the rs2 path also carries the immediate for both the ALU and the FPU, `ID_EX_.io.ctrl_OpB_out` selects whether the ALU/FPU rs2 data input reads the immediate or the rs2 value.
Forwarding B (rs2):
Forwarding C (rs3):
Because R4-type instructions have rs3, a set of I/O ports for rs3 hazard handling is needed.
src/main/scala/Pipeline/Hazard Units/Forwarding.scala
The EX_HAZARD and MEM_HAZARD checks work the same way for rs1, rs2, and rs3.
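
A Python sketch of this "same check for each source register" idea follows. It uses the textbook forwarding encoding (10 for an EX hazard, 01 for a MEM hazard, 00 for none), which is an assumption and may differ from the values in Forwarding.scala. All function and parameter names are hypothetical, and the model ignores whether producer and consumer use the same register file.

```python
def forward_select(rs, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Return 0b10 for an EX hazard, 0b01 for a MEM hazard, 0b00 for none."""
    # An integer pipeline would also skip rd == x0; f0 is an ordinary register,
    # so that check is left out of this floating-point sketch.
    if ex_mem_reg_write and ex_mem_rd == rs:
        return 0b10          # newest value is in the EX/MEM register
    if mem_wb_reg_write and mem_wb_rd == rs:
        return 0b01          # otherwise take it from the MEM/WB register
    return 0b00              # no hazard: use the register file output

def forward_all(rs1, rs2, rs3, **stage):
    """Apply the identical check to rs1, rs2, and rs3 (forward A, B, and C)."""
    return tuple(forward_select(rs, **stage) for rs in (rs1, rs2, rs3))

print(forward_all(1, 2, 3,
                  ex_mem_rd=2, ex_mem_reg_write=True,
                  mem_wb_rd=3, mem_wb_reg_write=True))  # (0, 2, 1)
```
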
src/main/scala/Pipeline/Hazard Units/HazardDetection.scala
To keep rs3 from raising a spurious hazard, rs3 is pre-processed here: if the instruction is not R4-type, rs3 hazard detection is bypassed.
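
The reason for the bypass is that in non-R4 instructions bits 31-27 belong to funct7 or an immediate, so they can match a destination register by accident. A minimal Python sketch of a load-use check with this pre-processing, using hypothetical names and the usual "stall when a load's rd matches a source" rule as an assumption:

```python
def load_use_stall(id_ex_mem_read, id_ex_rd, rs1, rs2, rs3, is_r4_type):
    """Return True when the instruction in decode must stall behind a load."""
    if not is_r4_type:
        rs3 = None           # pre-process: bits 31-27 are not a register index here
    if not id_ex_mem_read:
        return False         # the instruction in EX is not a load
    return id_ex_rd in (rs1, rs2, rs3)

# Bits 31-27 happen to equal the load's rd (5), but the instruction is not R4-type.
print(load_use_stall(True, 5, rs1=1, rs2=2, rs3=5, is_r4_type=False))  # False
print(load_use_stall(True, 5, rs1=1, rs2=2, rs3=5, is_r4_type=True))   # True
```
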
src/main/scala/Pipeline/Hazard Units/StructuralHazard.scala
Same method for rs1-rs3.
src/main/scala/Pipeline/Pipelines/IF_ID.scala
Same as the original structure. No modification is needed.
The ID_EX module connects the control module and the register file, so ports for the new control signals must be added. Ports for `rs3` are also added for R4-type instructions (including operand C, OpC). Some ports decoded from the instruction, such as `fmt`, `op5`, and `rm`, feed the FPU and FPU_Control modules.
In src/main/scala/Pipeline/Main.scala, the FPU enable signal decides whether the ID-EX register output goes to the ALU or the FPU, and also which wire some inputs are read from.
src/main/scala/Pipeline/Pipelines/EX_MEM.scala
Because some wires, such as rd and rs2, are merged in the previous stage and controlled by `IDEX_fpu_en`, we add an `fpu_out` port to pass the FPU result and an `EXMEM_fp_en` port to pass on the `IDEX_fpu_en` signal.
src/main/scala/Pipeline/Pipelines/MEM_WB.scala
Same as the EX-MEM module: we add an `fpu_out` port to pass the data and a `MEMWB_fp_en` port to pass on the `EXMEM_fp_en` signal.
No addition or modification is needed in the module itself, but we modify the address input for ALU/FPU memory operations. Because the data input from `io.EXMEM_rs2_out` was already selected in the Forwarding B section, it needs no modification.
In src/main/scala/Pipeline/Main.scala:
The write-back data passes through wire `d` to the register files and to the rs1~rs3 data selectors.
In src/main/scala/Pipeline/Main.scala:
The integer register file and the F register file share a wire from `MEMWB_reg_w_out` to `reg_write`/`F_reg_write`, and a wire from `MEMWB_rd_out` to `w_reg`/`F_w_reg`. Both connections are controlled by `MEM_WB_M.io.MEMWB_fp_en`.
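
The shared write-back wiring can be pictured with a small Python model, assuming the FP enable simply gates which register file sees the write enable. The function name and the dictionary layout are hypothetical; the port names are taken from the text above.

```python
def route_writeback(reg_w, rd, fp_en):
    """Steer one write-enable/destination pair to the integer or the F register file."""
    int_port = {"reg_write": bool(reg_w and not fp_en), "w_reg": rd}
    fp_port = {"F_reg_write": bool(reg_w and fp_en), "F_w_reg": rd}
    return int_port, fp_port

# A floating-point result for register 3 enables only the F register file.
print(route_writeback(reg_w=True, rd=3, fp_en=True))
```

Only one of the two write enables can be active for a given instruction, so the destination index can be shared safely by both register files.
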
In src/main/scala/Pipeline/Main.scala:
https://people.eecs.berkeley.edu/~krste/papers/riscv-spec-2.0.pdf
https://github.com/nozomioshi/ChiselRiscV
https://github.com/chadyuu/riscv-chisel-book
https://github.com/kinzafatim/5-Stage-RV32I