Superscalar out-of-order RISC-V capable of booting Linux

林志懋

Goal: make SoomRV support RISC-V rv64 execution.

Todo

  • uOPs definition

Frontend:

  • Inst fetch
  • Decoder
  • Rename
  • Branch prediction logic

Execution:

  • Issue Queue
  • Operand collect
  • ALU
  • Branch
  • CSR

Decode

Since the implementation of the fetch stage is tightly coupled with the memory subsystem, I will skip that part for now. The decode stage mainly produces two pieces of information: the decoded internal representation, D_UOp, and a signal for special instructions that change the control flow, such as wfi and ecall. Adding rv64 instructions won't change the latter. The changes are in this commit.

Rename

For every valid decoded uop (D_UOp), the rename module allocates a ROB entry, identified by the sequence number SqN. At the same time, the availability of the source operands is checked in the RenameTable, also known as the register alias table (RAT). Sources whose values are already present in the physical register file are marked available in the output R_UOp (availA, availB, availC), and the index into the ROB is stored in the uop as well.
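The rename step described above can be sketched as a small behavioral model. This is Python, not SoomRV's RTL, and it is only an illustration: the names `RenameModel`, `rat`, `ready` and `free_list` are hypothetical, the register counts are arbitrary, and x0 handling, ROB capacity and multi-uop rename are left out.

```python
from dataclasses import dataclass


@dataclass
class RenamedUop:
    sqn: int         # sequence number, i.e. the ROB entry allocated to this uop
    src_tags: tuple  # physical register tags of the source operands
    avail: tuple     # per source: is the value already in the register file?
    dst_tag: int     # newly allocated physical destination register


class RenameModel:
    """Toy rename stage (hypothetical model, not the SoomRV implementation)."""

    def __init__(self, num_arch=32, num_phys=64):
        # Identity mapping at reset; every physical register starts out valid.
        self.rat = list(range(num_arch))
        self.ready = [True] * num_phys
        self.free_list = list(range(num_arch, num_phys))
        self.next_sqn = 0

    def rename(self, rd, srcs):
        # Look up the sources before the destination mapping is overwritten.
        src_tags = tuple(self.rat[s] for s in srcs)
        avail = tuple(self.ready[t] for t in src_tags)
        # Allocate a fresh physical register; its value is not produced yet.
        dst = self.free_list.pop(0)
        self.ready[dst] = False
        self.rat[rd] = dst
        uop = RenamedUop(self.next_sqn, src_tags, avail, dst)
        self.next_sqn += 1
        return uop

    def writeback(self, tag):
        # A functional unit produced the value for this physical register.
        self.ready[tag] = True
```

A uop that reads a register written by an older, still in-flight uop gets `avail = False` for that source and has to wait in an issue queue until the producer writes back.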

IssueQueue

The rename module enqueues R_UOps into one of the IssueQueues.
R_UOps that reside in a queue update their source availability every cycle. When an instruction is ready, it is issued to its target functional unit.
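The wakeup and issue behavior can be modeled in a few lines of Python. This is a sketch under assumptions, not the SoomRV logic: `IssueQueueModel` and its methods are hypothetical names, the queue issues at most one uop per cycle, it picks the oldest ready entry, and a source woken up in a cycle may issue in that same cycle.

```python
class IssueQueueModel:
    """Toy issue queue: wake up sources, then issue the oldest ready uop."""

    def __init__(self):
        self.entries = []  # oldest first

    def enqueue(self, name, tags, avail):
        self.entries.append(
            {"name": name, "tags": list(tags), "avail": list(avail)}
        )

    def cycle(self, result_tags):
        """Simulate one clock cycle.

        result_tags is the set of physical register tags whose values
        become available in this cycle.
        """
        # Wakeup: mark every source that matches a produced tag as available.
        for entry in self.entries:
            for i, tag in enumerate(entry["tags"]):
                if tag in result_tags:
                    entry["avail"][i] = True
        # Select: issue the oldest entry whose sources are all available.
        for idx, entry in enumerate(self.entries):
            if all(entry["avail"]):
                return self.entries.pop(idx)["name"]
        return None
```

Because selection only looks at readiness, a younger uop can issue ahead of an older one that is still waiting for an operand, which is what makes the execution out of order.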

To support rv64, the immediate field needs to be widened to 64 bits. While modifying the bit width of the immediates in IssueQueue, I came across the following section and couldn't figure out the special encoding used for these three opcodes. I will need to look into the branch unit, where these immediates are decoded and used.

  // Special handling for jalr
  if (HasFU(FU_BRANCH) && enqCandidates[i].fu == FU_BRANCH &&
  (enqCandidates[i].opcode == BR_V_JALR || enqCandidates[i].opcode == BR_V_JR ||
  enqCandidates[i].opcode == BR_V_RET)) begin
    assert(IMM_BITS == 36);
    assert(NUM_OPERANDS == 2);

    // Use {imm[0], tags[1]} to encode 8 bits of imm12
    temp.tags[NUM_OPERANDS-1] = Tag'(enqCandidates[i].imm12[6:0]);
    temp.imm[0] = enqCandidates[i].imm12[7];

    // rest goes into upper 4 bits of 36 (!) immediate bits
    temp.imm[IMM_BITS-1-:4] = enqCandidates[i].imm12[11:8];

    // tags[1] is not used for register encoding, thus is always valid
    temp.avail[NUM_OPERANDS-1] = 1;
  end

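So this is just a trick to save storage. The predicted jalr address has to be sent down the pipeline as well, so the 12-bit immediate is packed into otherwise unused fields (the second operand tag, imm[0], and the upper four bits of imm) and restored when the uop is dequeued.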
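To check my reading of the snippet above, here is a Python model of the bit packing. The pack side mirrors the three assignments in the snippet; the unpack side is my assumption of what the dequeue logic has to do, since that code is not shown here. The function names are hypothetical, and Tag is assumed to be at least 7 bits wide.

```python
IMM_BITS = 36  # value asserted in the snippet above


def pack_jalr_imm12(imm12, imm=0):
    """Scatter a 12-bit immediate across tags[1], imm[0] and imm[35:32].

    imm is the current content of the immediate field; bits [31:1] are
    left untouched. Returns (tags[1], new imm).
    """
    assert 0 <= imm12 < (1 << 12)
    tag1 = imm12 & 0x7F                     # imm12[6:0]  -> tags[1]
    imm = (imm & ~1) | ((imm12 >> 7) & 1)   # imm12[7]    -> imm[0]
    top = IMM_BITS - 4
    imm &= (1 << top) - 1                   # clear imm[35:32]
    imm |= ((imm12 >> 8) & 0xF) << top      # imm12[11:8] -> imm[35:32]
    return tag1, imm


def unpack_jalr_imm12(tag1, imm):
    """Assumed inverse of pack_jalr_imm12, applied at dequeue."""
    hi = (imm >> (IMM_BITS - 4)) & 0xF
    mid = imm & 1
    return (hi << 8) | (mid << 7) | (tag1 & 0x7F)
```

For example, `pack_jalr_imm12(0xABC, 0xFFFFFFFE)` yields the tag `0x3C` and the immediate `0xAFFFFFFFF`, and unpacking those two values gives `0xABC` back.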

typedef struct packed
{
    logic[IMM_BITS-1:0] imm;

    logic[NUM_OPERANDS-1:0] avail;
    Tag[NUM_OPERANDS-1:0] tags;
...
} R_ST_UOp;

We need to extend IMM_BITS, since the bit width of the tags won't change.
Commit

Load

Since the uops sent to the IssueQueue carry only register file tags, not the actual register values, the Load stage is responsible for reading the values from the physical register file. Operand availability is already resolved in the IssueQueue, so Load is guaranteed to read up-to-date values.
Commit

IntALU

ALU will operate on 64-bit instead of 32-bit values.
Commit

Multiplier

rv64 adds a new instruction, MULW, which multiplies the lower 32 bits of the source registers and produces the 32-bit result sign-extended to 64 bits. All other MUL* instructions operate on 64-bit values.
Commit
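As a reference for the expected results, here is a small Python model of MUL and MULW on 64-bit register values. It models the architectural semantics only, not the multiplier pipeline; the helper names are mine, and register values are represented as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul(rs1, rs2):
    """MUL: low 64 bits of the product."""
    return (rs1 * rs2) & MASK64


def mulw(rs1, rs2):
    """MULW: low 32 bits of the product, sign-extended to 64 bits."""
    return sext(rs1 * rs2, 32) & MASK64
```

The two differ as soon as bit 31 of the product is set: `mul(0x10000, 0x8000)` is `0x80000000`, while `mulw(0x10000, 0x8000)` is `0xFFFFFFFF80000000`.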

Divider

As with the multiplier, rv64 adds several new divide instructions that operate on 32-bit values: DIVW, DIVUW, REMW, and REMUW.
Commit
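The corner cases are easy to get wrong, so here is a Python reference model of the four instructions as I understand them from the RISC-V specification: results are sign-extended from 32 bits, division by zero returns all ones as the quotient and the dividend as the remainder, and the signed overflow case returns the dividend as the quotient and zero as the remainder. It models semantics only, not the divider hardware, and the helper names are mine.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def _s32(x):
    """Low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def _out(x):
    """Sign-extend the low 32 bits of x to a 64-bit register value."""
    return _s32(x) & MASK64


def divw(rs1, rs2):
    a, b = _s32(rs1), _s32(rs2)
    if b == 0:
        return MASK64                  # quotient is all ones
    q = abs(a) // abs(b)               # round toward zero
    if (a < 0) != (b < 0):
        q = -q
    return _out(q)                     # -2**31 / -1 wraps back to -2**31


def remw(rs1, rs2):
    a, b = _s32(rs1), _s32(rs2)
    if b == 0:
        return _out(a)                 # remainder is the dividend
    r = abs(a) % abs(b)
    return _out(-r if a < 0 else r)    # sign follows the dividend


def divuw(rs1, rs2):
    a, b = rs1 & MASK32, rs2 & MASK32
    return MASK64 if b == 0 else _out(a // b)


def remuw(rs1, rs2):
    a, b = rs1 & MASK32, rs2 & MASK32
    return _out(a) if b == 0 else _out(a % b)
```

Note that even the unsigned variants sign-extend their 32-bit result, so `remuw(0x80000001, 0)` returns `0xFFFFFFFF80000001`.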

CSR

Because rv64 can operate on 64-bit values directly, there is no need to split CSRs into two halves. mstatus, mcycle, and other CSRs no longer require a separate read of the *h version to form the full CSR value.
Commit
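From the software side the difference looks as follows. This Python toy model is only an illustration: `Counter` is a hypothetical stand-in for a free-running 64-bit counter such as mcycle that advances on every access. On rv32 the usual approach is to read the high half, the low half, and the high half again, retrying if the high half changed in between; on rv64 a single read is enough.

```python
class Counter:
    """Toy 64-bit counter that advances by `step` on every CSR access."""

    def __init__(self, start, step=1):
        self.value = start
        self.step = step

    def _tick(self):
        current = self.value
        self.value = (self.value + self.step) & ((1 << 64) - 1)
        return current

    def read_lo32(self):   # rv32: csrr on mcycle
        return self._tick() & 0xFFFFFFFF

    def read_hi32(self):   # rv32: csrr on mcycleh
        return self._tick() >> 32

    def read64(self):      # rv64: csrr on mcycle
        return self._tick()


def read_cycle_rv32(counter):
    """Read hi, lo, hi and retry if the low half wrapped in between."""
    while True:
        hi = counter.read_hi32()
        lo = counter.read_lo32()
        if counter.read_hi32() == hi:
            return (hi << 32) | lo


def read_cycle_rv64(counter):
    return counter.read64()
```

Starting the counter at `0xFFFFFFFF` forces the rv32 path through one retry, because the low half wraps between the two reads of the high half.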

I have left some CSRs unchanged because I'm not interested in them for now.

LoadStore system

Unlike arithmetic instructions, load/store instructions require address translation when virtual memory is involved. Starting at the address generation unit (AGU), the virtual address is translated into a physical address either by looking up the TLB or by a page table walk, performed by the hardware PageTableWalker. SoomRV uses a VIPT cache, which is indexed by the virtual address, so an early load signal, eldUOp, is used to signal the cache.
Commit
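Why the cache can be indexed before translation finishes is easiest to see with numbers. The Python sketch below uses made-up parameters (4 KiB pages, 64-byte lines, 64 sets); these are assumptions for illustration, not SoomRV's actual cache geometry. As long as the line offset and index bits fit inside the page offset, the virtual and physical address select the same set, and only the tag comparison needs the translated address.

```python
PAGE_OFFSET_BITS = 12  # 4 KiB pages
LINE_OFFSET_BITS = 6   # hypothetical 64-byte cache lines
INDEX_BITS = 6         # hypothetical 64 sets

# Index and line offset must fit inside the untranslated page offset.
assert LINE_OFFSET_BITS + INDEX_BITS <= PAGE_OFFSET_BITS


def cache_index(addr):
    """Set index; can be computed from the virtual address."""
    return (addr >> LINE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)


def cache_tag(paddr):
    """Tag; needs the physical address, i.e. the TLB result."""
    return paddr >> (LINE_OFFSET_BITS + INDEX_BITS)
```

Any virtual and physical address pair that shares the low 12 bits, for instance `0x7F001234ABC` and `0x80000ABC`, yields the same `cache_index`, so the set can be read while the TLB lookup is still in flight.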

After the physical address is obtained, the operation is enqueued into either the load queue or the store queue. The store queue makes sure that memory is consistent with the committed instructions: only when a store instruction is committed can other components see its change to memory. The load queue enables load bypass, which checks if the load value can be forwarded from the store queue.
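The interaction between commit and forwarding can be sketched with a toy model. This is Python, not the SoomRV load/store queues: `StoreQueueModel` is a hypothetical name, addresses match only on exact equality (no partial overlaps or byte masks), and memory is a plain dictionary.

```python
class StoreQueueModel:
    """Toy store queue with commit-gated drain and store-to-load forwarding."""

    def __init__(self):
        self.memory = {}    # what other components can observe
        self.entries = []   # stores in program order, oldest first

    def store(self, addr, value):
        self.entries.append(
            {"addr": addr, "value": value, "committed": False}
        )

    def commit_oldest(self):
        # Mark the oldest not-yet-committed store as committed.
        for entry in self.entries:
            if not entry["committed"]:
                entry["committed"] = True
                return

    def drain(self):
        # Only committed stores, in order, may update memory.
        while self.entries and self.entries[0]["committed"]:
            entry = self.entries.pop(0)
            self.memory[entry["addr"]] = entry["value"]

    def load(self, addr):
        # Forward from the youngest matching store, else read memory.
        for entry in reversed(self.entries):
            if entry["addr"] == addr:
                return entry["value"]
        return self.memory.get(addr, 0)
```

A load from the same core sees the youngest pending store right away, while `memory` stays unchanged until that store has been committed and drained.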

Commit
Commit