林志懋
Make SoomRV support RISC-V rv64 execution
Frontend:
Execution:
Since the implementation of the fetch stage is tightly coupled with the memory subsystem, I will skip that part for now. The decode stage mainly produces two pieces of information: the decoded internal representation, D_UOp, and a separate output for special instructions that change the control flow, like wfi, ecall, … Adding rv64 instructions won't change the latter. The changes are in this commit.
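To make the decode-side change concrete, the Python sketch below classifies the instructions that RV64I and RV64M add on top of RV32: the new OP-IMM-32 and OP-32 opcodes, plus LD, LWU and SD. The opcode, funct3 and funct7 values come from the RISC-V unprivileged spec. The function name decode_rv64_only is hypothetical. It only models which encodings the decoder must newly recognize, not how SoomRV builds a D_UOp.

```python
# Major opcodes from the RISC-V unprivileged spec.
OP_LOAD, OP_IMM_32, OP_STORE, OP_32 = 0x03, 0x1B, 0x23, 0x3B

def decode_rv64_only(insn: int):
    """Return the mnemonic if insn is an RV64I/RV64M-only instruction, else None."""
    opcode = insn & 0x7F
    funct3 = (insn >> 12) & 0x7
    funct7 = (insn >> 25) & 0x7F
    if opcode == OP_LOAD:
        return {0b011: "LD", 0b110: "LWU"}.get(funct3)
    if opcode == OP_STORE:
        return "SD" if funct3 == 0b011 else None
    if opcode == OP_IMM_32:
        if funct3 == 0b000:
            return "ADDIW"
        return {(0b001, 0b0000000): "SLLIW", (0b101, 0b0000000): "SRLIW",
                (0b101, 0b0100000): "SRAIW"}.get((funct3, funct7))
    if opcode == OP_32:
        return {(0b000, 0b0000000): "ADDW", (0b000, 0b0100000): "SUBW",
                (0b001, 0b0000000): "SLLW", (0b101, 0b0000000): "SRLW",
                (0b101, 0b0100000): "SRAW",
                # RV64M W-variants use funct7 = 0000001
                (0b000, 0b0000001): "MULW", (0b100, 0b0000001): "DIVW",
                (0b101, 0b0000001): "DIVUW", (0b110, 0b0000001): "REMW",
                (0b111, 0b0000001): "REMUW"}.get((funct3, funct7))
    return None
```

Not shown: in RV64I the existing SLLI/SRLI/SRAI also take a 6-bit shift amount, so their funct field shrinks to 6 bits.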
For every valid decoded uop (D_UOp), the rename module allocates a ROB entry, identified by the sequence number SqN. At the same time, the RenameTable, i.e. the register alias table (RAT), checks whether each source operand is already valid. Sources whose values are already present in the physical register file are marked valid in the output R_UOp (availA, availB, availC), and the index into the ROB is stored in the uop as well.
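Here is a minimal behavioral sketch of the rename step above. It assumes a simple FIFO free list and a RAT entry of (physical tag, available bit). RenameModel, rename and writeback are hypothetical names, not SoomRV's interfaces. The sketch never frees physical registers, which would normally happen at commit.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RenamedUop:
    sqn: int                 # ROB entry / program-order sequence number
    tags: List[int]          # physical tags of the source operands
    avail: List[bool]        # plays the role of availA/availB/availC
    dst_tag: Optional[int]   # newly allocated destination tag (None for x0)

class RenameModel:
    def __init__(self, num_arch: int = 32, num_phys: int = 64):
        # Start with arch reg i mapped to phys reg i, all values available.
        self.rat = {i: (i, True) for i in range(num_arch)}
        self.free = list(range(num_arch, num_phys))
        self.next_sqn = 0

    def rename(self, srcs: List[int], dst: int) -> RenamedUop:
        tags = [self.rat[s][0] for s in srcs]
        avail = [self.rat[s][1] for s in srcs]
        sqn = self.next_sqn                    # allocate the next ROB entry
        self.next_sqn += 1
        dst_tag = None
        if dst != 0:                           # x0 is hardwired to zero, never renamed
            dst_tag = self.free.pop(0)
            self.rat[dst] = (dst_tag, False)   # value not produced yet
        return RenamedUop(sqn, tags, avail, dst_tag)

    def writeback(self, tag: int) -> None:
        # A producer finished: mark every RAT entry mapped to `tag` available.
        for arch, (t, _) in self.rat.items():
            if t == tag:
                self.rat[arch] = (t, True)
```

For example, after renaming `x3 = x1 op x2`, a following uop that reads x3 gets the new tag with its availability bit cleared until writeback of that tag.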
The rename module then enqueues the renamed R_UOps into one of the IssueQueues.
R_UOps that reside in a queue update their source availability every cycle. Once an instruction is ready, it is issued to its target functional unit.
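The per-cycle wakeup and issue can be modeled roughly as follows. This sketch assumes that producers broadcast their destination tags when results become available, and that the oldest ready entry (lowest SqN) is selected; SqN wrap-around is ignored. IQEntry, wakeup and select are hypothetical names, and SoomRV's actual select policy may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class IQEntry:
    sqn: int            # program order (lower = older)
    tags: List[int]     # physical source tags
    avail: List[bool]   # per-source availability, updated every cycle
    fu: str             # target functional unit

def wakeup(queue: List[IQEntry], result_tags: Set[int]) -> None:
    """Mark sources available whose producer broadcast its tag this cycle."""
    for e in queue:
        for i, t in enumerate(e.tags):
            if t in result_tags:
                e.avail[i] = True

def select(queue: List[IQEntry], fu: str) -> Optional[IQEntry]:
    """Issue the oldest entry for `fu` whose sources are all available."""
    ready = [e for e in queue if e.fu == fu and all(e.avail)]
    if not ready:
        return None
    oldest = min(ready, key=lambda e: e.sqn)
    queue.remove(oldest)
    return oldest
```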
To support rv64, immediates need 64 bits. While widening the immediates in IssueQueue, I came across the following section and couldn't figure out the special encoding used for these three opcodes. I will need to look into the branch unit, where these immediates are decoded and used.
// Special handling for jalr
if (HasFU(FU_BRANCH) && enqCandidates[i].fu == FU_BRANCH &&
    (enqCandidates[i].opcode == BR_V_JALR || enqCandidates[i].opcode == BR_V_JR ||
     enqCandidates[i].opcode == BR_V_RET)) begin
    assert(IMM_BITS == 36);
    assert(NUM_OPERANDS == 2);
    // Use {imm[0], tags[1]} to encode 8 bits of imm12
    temp.tags[NUM_OPERANDS-1] = Tag'(enqCandidates[i].imm12[6:0]);
    temp.imm[0] = enqCandidates[i].imm12[7];
    // rest goes into upper 4 bits of 36 (!) immediate bits
    temp.imm[IMM_BITS-1-:4] = enqCandidates[i].imm12[11:8];
    // tags[1] is not used for register encoding, thus is always valid
    temp.avail[NUM_OPERANDS-1] = 1;
end
So this is just to save registers. The jalr predicted address has to be sent down the pipeline, so the immediate is stored in otherwise unused fields (tags[1], imm[0] and the top 4 bits of imm) and restored when the uop is dequeued.
typedef struct packed
{
    logic[IMM_BITS-1:0] imm;
    logic[NUM_OPERANDS-1:0] avail;
    Tag[NUM_OPERANDS-1:0] tags;
    ...
} R_ST_UOp;
We need to extend IMM_BITS, since the bitwidth of tags won't change.
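The jalr packing trick can be checked with a small Python model that mirrors the SystemVerilog snippet: imm12[6:0] goes into tags[1], imm12[7] into imm[0], and imm12[11:8] into the top 4 bits of imm. pack and unpack are hypothetical helpers. imm_bits is a parameter so the same round trip can be exercised at a wider width; the 68 in the test is an arbitrary example, not the width SoomRV ends up using. The model overwrites imm[0] and the top 4 bits, so whatever else travels in imm must not rely on them.

```python
TAG_BITS = 7  # assumption: tags[1] holds exactly imm12[6:0]

def pack(imm12: int, imm: int, imm_bits: int = 36):
    """Stash a 12-bit jalr immediate into tags[1], imm[0] and the top 4 bits of imm."""
    tag1 = imm12 & ((1 << TAG_BITS) - 1)                          # imm12[6:0] -> tags[1]
    imm = (imm & ~1) | ((imm12 >> 7) & 1)                         # imm12[7]   -> imm[0]
    top = imm_bits - 4
    imm = (imm & ~(0xF << top)) | (((imm12 >> 8) & 0xF) << top)   # imm12[11:8] -> imm[top+3:top]
    return tag1, imm

def unpack(tag1: int, imm: int, imm_bits: int = 36) -> int:
    """Recover imm12 when the uop is dequeued."""
    top = imm_bits - 4
    return (tag1 & ((1 << TAG_BITS) - 1)) | ((imm & 1) << 7) | (((imm >> top) & 0xF) << 8)
```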
Commit
Since the uops in the IssueQueue store only register file tags, not the actual register values, the Load stage is responsible for reading the actual physical register values. Operand validity is already resolved in the IssueQueue, so Load is guaranteed to read the up-to-date value.
Commit
The ALU now operates on 64-bit instead of 32-bit values.
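To illustrate what widening the ALU involves, here is a Python reference model of a few integer ops on 64-bit values. It covers the RV64I W-variants (ADDW, SUBW, SLLW, SRLW, SRAW), which compute on the low 32 bits and sign-extend the result, as well as the 6-bit shift amounts of the 64-bit shifts. The semantics follow the RISC-V unprivileged spec. The alu function and its string op names are hypothetical, and the model is meant as a golden reference for tests rather than a description of SoomRV's ALU.

```python
MASK64 = (1 << 64) - 1

def to_signed(x: int, bits: int = 64) -> int:
    """Interpret the low `bits` bits of x as a two's-complement number."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def sext32(x: int) -> int:
    """Sign-extend the low 32 bits of x to a 64-bit pattern."""
    return to_signed(x, 32) & MASK64

def alu(op: str, a: int, b: int) -> int:
    """a and b are 64-bit register values given as unsigned bit patterns."""
    if op == "ADD":  return (a + b) & MASK64
    if op == "SUB":  return (a - b) & MASK64
    if op == "SLL":  return (a << (b & 0x3F)) & MASK64            # 6-bit shamt
    if op == "SRL":  return (a & MASK64) >> (b & 0x3F)
    if op == "SRA":  return (to_signed(a) >> (b & 0x3F)) & MASK64
    # W-variants: 32-bit operation, result sign-extended to 64 bits
    if op == "ADDW": return sext32(a + b)
    if op == "SUBW": return sext32(a - b)
    if op == "SLLW": return sext32(a << (b & 0x1F))               # 5-bit shamt
    if op == "SRLW": return sext32((a & 0xFFFFFFFF) >> (b & 0x1F))
    if op == "SRAW": return sext32(to_signed(a, 32) >> (b & 0x1F))
    raise ValueError(f"unsupported op {op}")
```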
Commit
rv64 adds a new instruction, MULW, which multiplies the lower 32 bits of its source registers and produces the 32-bit result sign-extended to 64 bits. All other MUL* instructions are 64-bit.
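A Python golden model of these semantics follows. It assumes operands are passed as 64-bit unsigned bit patterns. The op names follow the ISA mnemonics; the mul function itself is hypothetical.

```python
MASK64 = (1 << 64) - 1

def to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's-complement number."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def mul(op: str, a: int, b: int) -> int:
    """a and b are 64-bit register values given as unsigned bit patterns."""
    if op == "MUL":     # low 64 bits (identical for signed and unsigned)
        return (a * b) & MASK64
    if op == "MULH":    # high 64 bits of signed x signed
        return ((to_signed(a, 64) * to_signed(b, 64)) >> 64) & MASK64
    if op == "MULHSU":  # high 64 bits of signed x unsigned
        return ((to_signed(a, 64) * (b & MASK64)) >> 64) & MASK64
    if op == "MULHU":   # high 64 bits of unsigned x unsigned
        return (((a & MASK64) * (b & MASK64)) >> 64) & MASK64
    if op == "MULW":    # low 32 bits of rs1[31:0] * rs2[31:0], sign-extended
        return to_signed((a & 0xFFFFFFFF) * (b & 0xFFFFFFFF), 32) & MASK64
    raise ValueError(f"unsupported op {op}")
```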
Commit
As with Multiply, divide gains several new instructions that operate on 32-bit values: DIVW, DIVUW, REMW, and REMUW.
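The divide unit also has to reproduce the corner cases that RISC-V defines instead of trapping. Division by zero returns all ones as the quotient and the dividend as the remainder. Signed overflow (the most negative value divided by -1) returns the dividend as the quotient and 0 as the remainder. The W-variants apply these rules to the low 32 bits and sign-extend the 32-bit result, DIVUW and REMUW included. Below is a Python reference model; the div function is hypothetical.

```python
MASK64 = (1 << 64) - 1

def to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's-complement number."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def _divrem(a: int, b: int, signed: bool, bits: int):
    """Quotient and remainder with RISC-V semantics on `bits`-wide operands."""
    if signed:
        a, b = to_signed(a, bits), to_signed(b, bits)
    else:
        a, b = a & ((1 << bits) - 1), b & ((1 << bits) - 1)
    if b == 0:
        return -1, a                  # quotient all ones, remainder = dividend
    if signed and a == -(1 << (bits - 1)) and b == -1:
        return a, 0                   # overflow: quotient = dividend, remainder 0
    q = abs(a) // abs(b)              # round toward zero, not toward -inf
    if (a < 0) != (b < 0):
        q = -q
    return q, a - q * b               # remainder takes the dividend's sign

def div(op: str, a: int, b: int) -> int:
    """op is DIV, DIVU, REM, REMU or a W-variant; returns a 64-bit pattern."""
    is_w = op.endswith("W")
    base = op[:-1] if is_w else op
    bits = 32 if is_w else 64
    q, r = _divrem(a, b, base in ("DIV", "REM"), bits)
    result = q if base.startswith("DIV") else r
    # W-variants sign-extend their 32-bit result (DIVUW/REMUW too)
    return to_signed(result, bits) & MASK64
```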
Commit
Because rv64 operates on 64-bit values directly, there is no need to split some CSRs into two halves. mstatus, mcycle, and other CSRs no longer require a separate read of the *h version (such as mstatush or mcycleh) to form the full CSR value.
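For comparison, rv32 software needs a read sequence like the one below for a 64-bit counter. It reads the high half, then the low half, then the high half again, and retries if the high half changed. The RISC-V unprivileged spec gives a sample loop of this form for cycle/cycleh. The Counter class is a hypothetical model in which the counter ticks on every CSR access; on rv64 a single read suffices.

```python
MASK64 = (1 << 64) - 1

class Counter:
    """Hypothetical model: a 64-bit mcycle that advances on every CSR access."""
    def __init__(self, start: int):
        self.value = start

    def _tick(self) -> None:
        self.value = (self.value + 1) & MASK64

    def read_low(self, xlen: int) -> int:    # csrr mcycle
        self._tick()
        return self.value & ((1 << xlen) - 1)

    def read_high(self) -> int:              # csrr mcycleh (rv32 only)
        self._tick()
        return self.value >> 32

def read_cycle_rv32(c: Counter) -> int:
    # Retry if the high half changed between the two reads of it.
    while True:
        hi = c.read_high()
        lo = c.read_low(32)
        if c.read_high() == hi:
            return (hi << 32) | lo

def read_cycle_rv64(c: Counter) -> int:
    return c.read_low(64)                    # one access returns the full value
```

Starting just below a 32-bit rollover, the first rv32 attempt sees the high half change and retries, while the rv64 read needs no such loop.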
Commit
I have left some CSRs unchanged because I'm not interested in them for now.
Unlike arithmetic instructions, load/store instructions require address translation when virtual memory is involved. Starting from the address generation unit (AGU), the virtual address is translated into a physical address either by a TLB lookup or by a page table walk performed by a hardware PageTableWalker. SoomRV employs a VIPT (virtually indexed, physically tagged) cache, which is indexed by the virtual address, so an early load signal, eldUOp, is used to start the cache access before translation finishes.
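Here is a sketch of the address-side arithmetic. It assumes the rv64 port uses the Sv39 scheme, which is the usual choice for rv64 but is not confirmed here. Under Sv39, a 39-bit virtual address splits into three 9-bit VPN fields for the page table walk plus a 12-bit page offset, and bits 63:39 must equal bit 38. The second helper shows why VIPT works. If line size times number of sets is at most the 4 KiB page, the set index comes entirely from the page offset, which translation does not change. Both functions and the cache geometry are hypothetical.

```python
PAGE_OFFSET_BITS = 12
VPN_BITS = 9
VA_BITS = 39

def sv39_split(va: int):
    """Split a 64-bit Sv39 virtual address into (VPN[2], VPN[1], VPN[0], offset)."""
    upper = va >> (VA_BITS - 1)        # bits 63:38 must be all 0s or all 1s
    if upper not in (0, (1 << (64 - VA_BITS + 1)) - 1):
        raise ValueError("non-canonical Sv39 address (page fault)")
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = [(va >> (PAGE_OFFSET_BITS + VPN_BITS * i)) & ((1 << VPN_BITS) - 1)
           for i in range(3)]
    return vpn[2], vpn[1], vpn[0], offset

def vipt_set_index(va: int, line_bytes: int, num_sets: int) -> int:
    """Set index taken from the virtual address, available before translation."""
    # Assumed constraint: the index bits must lie inside the untranslated page offset.
    assert line_bytes * num_sets <= 1 << PAGE_OFFSET_BITS
    return (va // line_bytes) % num_sets
```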
Commit
After obtaining the physical address, the operation is enqueued into either the load queue or the store queue. The store queue keeps memory consistent with the committed instructions: only after a store instruction commits can other components see its change to memory. The load queue enables load bypassing, i.e. checking whether the load value can be forwarded from the store queue.
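Here is a simplified store-to-load forwarding check. It assumes byte-addressed physical addresses, little-endian store data, and SqN ordering with wrap-around ignored. The load looks at the youngest older store that overlaps it. Full coverage forwards the data, partial overlap makes the load wait, and no overlap sends it to the cache. StoreEntry and forward are hypothetical names, and real designs differ in how they handle partial overlaps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class StoreEntry:
    sqn: int    # program-order sequence number
    addr: int   # physical byte address
    size: int   # bytes written
    data: int   # store data, little-endian

def forward(store_queue: List[StoreEntry], load_sqn: int,
            load_addr: int, load_size: int) -> Tuple[str, Optional[int]]:
    """Return ('forward', value), ('stall', None) or ('cache', None)."""
    older = [s for s in store_queue if s.sqn < load_sqn]
    for s in sorted(older, key=lambda s: s.sqn, reverse=True):   # youngest first
        lo = max(s.addr, load_addr)
        hi = min(s.addr + s.size, load_addr + load_size)
        if lo >= hi:
            continue                                  # no byte overlap
        if s.addr <= load_addr and load_addr + load_size <= s.addr + s.size:
            shift = 8 * (load_addr - s.addr)          # store fully covers the load
            return "forward", (s.data >> shift) & ((1 << (8 * load_size)) - 1)
        return "stall", None                          # partial overlap: wait
    return "cache", None
```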