# Superscalar out-of-order RISC-V capable of booting Linux
> 林志懋
Make `soomrv` support RISC-V rv64 execution.
## Todo
- [ ] uOPs definition
Frontend:
- [ ] Inst fetch
- [ ] Decoder
- [ ] Rename
- [ ] Branch prediction logic
Execution:
- [ ] Issue Queue
- [ ] Operand collect
- [ ] ALU
- [ ] Branch
- [ ] CSR
## Decode
Since the implementation of the fetch stage is tightly coupled with the memory subsystem, I will skip that part for now. The decode stage mainly produces two pieces of information: the decoded internal representation, `D_UOp`, and a separate output for special instructions that change the control flow, such as `wfi` and `ecall`. Adding rv64 instructions doesn't change the latter. The changes are in this [commit](https://github.com/millaker/SoomRV/commit/337226a89d8edd46c446785fe5f9ff571db10b1b).
## Rename
For every valid decoded uop (`D_UOp`), the rename module allocates a ROB entry, identified by the sequence number `SqN`. At the same time, the availability of each source operand is looked up in the `RenameTable`, i.e. the register alias table (RAT). Sources whose values are already present in the physical register file are marked valid in the output `R_UOp` (`availA`, `availB`, `availC`), and the index into the ROB is also stored in the uop.
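
The rename step described above can be sketched as a small Python behavioral model. This is not the SoomRV RTL: the class and field names (`RenameTable`, `RenamedUop`, `tag_a`, ...), the 64-entry physical register file, and the restriction to two sources are assumptions made for illustration, and the free list is never recycled.

```python
from dataclasses import dataclass

NUM_ARCH_REGS = 32


@dataclass
class RenamedUop:
    sqn: int       # sequence number of the allocated ROB entry
    tag_a: int     # physical register holding source A
    tag_b: int     # physical register holding source B
    avail_a: bool  # True if source A has already been written back
    avail_b: bool  # True if source B has already been written back
    tag_dst: int   # newly allocated destination register, -1 if none


class RenameTable:
    """Hypothetical RAT model: architectural register -> physical tag."""

    def __init__(self, num_phys_regs=64):
        # Identity mapping at reset, every mapped value is available.
        self.map = list(range(NUM_ARCH_REGS))
        self.ready = [True] * num_phys_regs
        self.free = list(range(NUM_ARCH_REGS, num_phys_regs))
        self.next_sqn = 0

    def rename(self, rs1, rs2, rd):
        # Sources are looked up BEFORE the destination mapping changes,
        # so `add x1, x1, x2` still reads the old x1.
        tag_a, tag_b = self.map[rs1], self.map[rs2]
        avail_a, avail_b = self.ready[tag_a], self.ready[tag_b]
        tag_dst = -1
        if rd != 0:  # x0 never needs a new physical register
            tag_dst = self.free.pop(0)
            self.ready[tag_dst] = False
            self.map[rd] = tag_dst
        uop = RenamedUop(self.next_sqn, tag_a, tag_b, avail_a, avail_b, tag_dst)
        self.next_sqn += 1
        return uop

    def writeback(self, tag):
        # A functional unit produced the value for this physical register.
        self.ready[tag] = True
```

Renaming `add x3, x1, x2` followed by `add x4, x3, x1` with this model yields a second uop whose source A is not yet available, because it depends on the first uop's destination tag.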
## IssueQueue
The rename module enqueues each renamed `R_UOp` into one of the `IssueQueue`s.
`R_UOp`s that reside in a queue update their source availability every cycle. Once an instruction is ready, it is issued to its target functional unit.
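
A minimal Python sketch of this wakeup-and-issue behavior follows. It is a software model only: the `IssueQueue` class below, its dictionary-based entries, and the oldest-ready-first selection policy are assumptions for illustration and are not taken from the SoomRV source.

```python
class IssueQueue:
    """Hypothetical issue queue model; entries are kept oldest first."""

    def __init__(self, size=8):
        self.size = size
        self.entries = []

    def enqueue(self, name, tags, avail):
        # Refuse the uop when the queue is full (rename would have to stall).
        if len(self.entries) >= self.size:
            return False
        self.entries.append({"name": name, "tags": list(tags), "avail": list(avail)})
        return True

    def wakeup(self, result_tags):
        # Called every cycle with the tags written back in that cycle.
        for entry in self.entries:
            for i, tag in enumerate(entry["tags"]):
                if tag in result_tags:
                    entry["avail"][i] = True

    def issue(self):
        # Pick the oldest entry whose sources are all available.
        for i, entry in enumerate(self.entries):
            if all(entry["avail"]):
                return self.entries.pop(i)["name"]
        return None
```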
To support rv64, immediates will need 64 bits. While modifying the bit width of the immediates in `IssueQueue`, I came across the following section and couldn't figure out the special encoding used for these three opcodes. I will need to look into the branch unit, where these immediates are decoded and used.
```verilog
// Special handling for jalr
if (HasFU(FU_BRANCH) && enqCandidates[i].fu == FU_BRANCH &&
    (enqCandidates[i].opcode == BR_V_JALR || enqCandidates[i].opcode == BR_V_JR ||
     enqCandidates[i].opcode == BR_V_RET)) begin
    assert(IMM_BITS == 36);
    assert(NUM_OPERANDS == 2);
    // Use {imm[0], tags[1]} to encode 8 bits of imm12
    temp.tags[NUM_OPERANDS-1] = Tag'(enqCandidates[i].imm12[6:0]);
    temp.imm[0] = enqCandidates[i].imm12[7];
    // rest goes into upper 4 bits of 36 (!) immediate bits
    temp.imm[IMM_BITS-1-:4] = enqCandidates[i].imm12[11:8];
    // tags[1] is not used for register encoding, thus is always valid
    temp.avail[NUM_OPERANDS-1] = 1;
end
```
So this is just a trick to save register bits. The `jalr` predicted address has to be sent down the pipeline, so otherwise unused fields are used to store the immediate, which is restored when the uop is dequeued.
```verilog
typedef struct packed
{
logic[IMM_BITS-1:0] imm;
logic[NUM_OPERANDS-1:0] avail;
Tag[NUM_OPERANDS-1:0] tags;
...
} R_ST_UOp;
```
We need to extend `IMM_BITS` since the bit width of `tags` won't change.
[Commit](https://github.com/millaker/SoomRV/commit/6608361c213829c5ab6a4d3c4e4abd8c9bb2dc4f)
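
To check my reading of the `jalr` packing shown earlier, here is a round-trip model in Python. It mirrors the pre-rv64 layout from the snippet (`IMM_BITS == 36`, `imm12[6:0]` in `tags[1]`, `imm12[7]` in `imm[0]`, `imm12[11:8]` in the top four bits of `imm`). The 7-bit tag width and the function names `pack_imm12` / `unpack_imm12` are assumptions for illustration; the actual unpacking lives in the branch unit and may look different.

```python
TAG_BITS = 7    # assumed width of Tag, inferred from Tag'(imm12[6:0])
IMM_BITS = 36   # value asserted in the original (pre-rv64) code


def pack_imm12(imm, imm12):
    """Spread imm12 over tags[1], imm[0] and imm[35:32]; imm[31:1] is untouched."""
    imm12 &= 0xFFF
    tag1 = imm12 & ((1 << TAG_BITS) - 1)            # imm12[6:0] -> tags[1]
    imm = (imm & ~1) | ((imm12 >> 7) & 1)           # imm12[7]   -> imm[0]
    top = IMM_BITS - 4
    imm = (imm & ~(0xF << top)) | ((imm12 >> 8) << top)  # imm12[11:8] -> imm[35:32]
    return imm & ((1 << IMM_BITS) - 1), tag1


def unpack_imm12(imm, tag1):
    """Reassemble imm12 from the three places it was stored."""
    hi = (imm >> (IMM_BITS - 4)) & 0xF
    mid = imm & 1
    lo = tag1 & ((1 << TAG_BITS) - 1)
    return (hi << 8) | (mid << 7) | lo
```

Every 12-bit value survives the round trip, and the middle bits `imm[31:1]` pass through unchanged, which is what leaves room for the other payload carried in `imm`.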
## Load
Since uops sent to the `IssueQueue` carry only register file tags, not the actual register values, the `Load` stage is responsible for reading the actual physical register values. Operand validity is already resolved in `IssueQueue`, so `Load` is guaranteed to read up-to-date values.
[Commit](https://github.com/millaker/SoomRV/commit/11f4a59e98b095e1d4f6b08ae877b6b3e6ff7ca7)
## IntALU
The ALU now operates on 64-bit instead of 32-bit values.
[Commit](https://github.com/millaker/SoomRV/commit/32dd8e0f382eb89ad2e7f3b687961305b307c244)
## Multiplier
rv64 adds a new instruction, `MULW`, which multiplies the lower 32 bits of the source registers and produces the 32-bit result sign-extended to 64 bits. All other `MUL*` instructions operate on 64-bit values.
[Commit](https://github.com/millaker/SoomRV/commit/310aaf05c8c1f71279438d6d31ede0acf82e4f1a)
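
The difference between `MUL` and `MULW` can be captured in a few lines of Python as a reference model, following the RV64M semantics. Register values are modeled as unsigned 64-bit integers; the helper names are my own and not part of SoomRV.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sext32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x & (1 << 31) else x


def mul(a, b):
    """MUL on rv64: low 64 bits of the 64x64 product."""
    return (a * b) & MASK64


def mulw(a, b):
    """MULW: multiply the low 32 bits, sign-extend the 32-bit result to 64 bits."""
    return sext32((a & MASK32) * (b & MASK32)) & MASK64
```

For example, `mulw(0x7FFFFFFF, 2)` returns `0xFFFFFFFFFFFFFFFE` because bit 31 of the 32-bit result is set, while `mul(0x7FFFFFFF, 2)` returns `0xFFFFFFFE`.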
## Divider
As with `Multiply`, the divider gains several new instructions that operate on 32-bit values: `DIVW`, `DIVUW`, `REMW`, and `REMUW`.
[Commit](https://github.com/millaker/SoomRV/commit/91a60ff2e292fa64d0beadc782e70660e71c3970)
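
A Python reference model of the four `*W` divide instructions is useful for checking the corner cases (division by zero and signed overflow), which follow the RV64M rules. Register values are unsigned 64-bit integers; the function names are illustrative and do not correspond to SoomRV signals.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
INT32_MIN = -(1 << 31)


def sext32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x & (1 << 31) else x


def divw(a, b):
    x, y = sext32(a), sext32(b)
    if y == 0:
        return MASK64                      # division by zero: all ones
    if x == INT32_MIN and y == -1:
        return x & MASK64                  # overflow: result is the dividend
    q = abs(x) // abs(y)                   # round toward zero
    return sext32(-q if (x < 0) != (y < 0) else q) & MASK64


def remw(a, b):
    x, y = sext32(a), sext32(b)
    if y == 0:
        return x & MASK64                  # division by zero: the dividend
    if x == INT32_MIN and y == -1:
        return 0                           # overflow: remainder is zero
    r = abs(x) % abs(y)
    return (-r if x < 0 else r) & MASK64   # sign follows the dividend


def divuw(a, b):
    x, y = a & MASK32, b & MASK32
    if y == 0:
        return MASK64
    return sext32(x // y) & MASK64         # 32-bit result is still sign-extended


def remuw(a, b):
    x, y = a & MASK32, b & MASK32
    return sext32(x if y == 0 else x % y) & MASK64
```

Note that even the unsigned variants sign-extend their 32-bit result, so `divuw(0xFFFFFFFF, 1)` returns all ones in the 64-bit register.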
## CSR
Because rv64 can operate on 64-bit values directly, there is no need to split some CSRs into two halves. `mstatus`, `mcycle`, and other such CSRs no longer require a separate read of the `*h` version to form the full CSR value.
[Commit](https://github.com/millaker/SoomRV/commit/261c204ceef00c6045894d97ea41d7b782884e5f)
I have left some CSRs unchanged because I'm not interested in them for now.
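
To illustrate what the merged CSRs save, the sketch below models a 64-bit counter such as `mcycle` and compares the rv32 access pattern (read high, read low, re-read high, retry on mismatch) with the single rv64 read. The `Counter64` class and the `ticks_per_read` knob, which advances the counter between the two half reads to simulate time passing, are hypothetical helpers for this note only.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


class Counter64:
    """Hypothetical 64-bit counter CSR (think mcycle)."""

    def __init__(self, value=0):
        self.value = value & MASK64

    def tick(self, n=1):
        self.value = (self.value + n) & MASK64

    def read_lo(self):    # rv32: the base CSR returns bits 31:0
        return self.value & MASK32

    def read_hi(self):    # rv32: the *h CSR returns bits 63:32
        return self.value >> 32

    def read_full(self):  # rv64: one read returns all 64 bits
        return self.value


def read_counter_rv32(csr, ticks_per_read=0):
    """Two-half read with retry, needed when the halves can change in between."""
    while True:
        hi = csr.read_hi()
        csr.tick(ticks_per_read)   # time passes between the two CSR reads
        lo = csr.read_lo()
        if csr.read_hi() == hi:    # no carry into the high half: halves match
            return (hi << 32) | lo
```

Starting the counter at `0xFFFFFFFF` with `ticks_per_read=1` forces one retry in the rv32 sequence, whereas `read_full()` needs no such loop.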
## LoadStore system
Unlike arithmetic instructions, load/store instructions require address translation when virtual memory is involved. Starting from the address generation unit (`AGU`), the virtual address is translated into a physical address, either by a TLB lookup or by a page table walk performed by the hardware `PageTableWalker`. `SoomRV` uses a VIPT (virtually indexed, physically tagged) cache, so an early load signal, `eldUOp`, is used to start the cache access with the virtual address.
[Commit](https://github.com/millaker/SoomRV/commit/8c879158f26b129fe536e8571ec354cf27f4dc8f)
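
The VIPT idea can be sketched in Python as follows. The geometry (4 KiB pages, 64-byte lines, 64 sets) and the dictionary-based TLB are assumptions chosen so that the index bits fit inside the page offset; they are not SoomRV's actual cache parameters.

```python
PAGE_BITS = 12   # assumed 4 KiB pages
LINE_BITS = 6    # assumed 64-byte cache lines
SET_BITS = 6     # assumed 64 sets

# The index must come entirely from the page offset, which translation
# leaves unchanged; otherwise virtual and physical indices could differ.
assert LINE_BITS + SET_BITS <= PAGE_BITS


def set_index(addr):
    """Cache set index, computable from the virtual address alone."""
    return (addr >> LINE_BITS) & ((1 << SET_BITS) - 1)


def translate(tlb, vaddr):
    """Toy TLB: maps virtual page number to physical page number."""
    ppn = tlb[vaddr >> PAGE_BITS]
    return (ppn << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))


def vipt_lookup(tlb, vaddr):
    index = set_index(vaddr)            # early: available before translation
    paddr = translate(tlb, vaddr)       # in hardware this runs in parallel
    tag = paddr >> (LINE_BITS + SET_BITS)
    return index, tag                   # tag is compared against the set's ways
```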
After the physical address is obtained, the operation is enqueued into either the load queue or the store queue. The store queue keeps memory consistent with the committed instructions: only once a store instruction has committed can other components see its change to memory. The load queue enables load bypassing, which checks whether the load value can be forwarded from the store queue.
[Commit](https://github.com/millaker/SoomRV/commit/7712c5b55f82f14de9f399d9566e72c9848da5de)
[Commit](https://github.com/millaker/SoomRV/commit/fffde980d32bb4f3db3f0a448825d17a4f696098)
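
A minimal Python model of the store queue behavior described above: stores become visible in memory only at commit, while a younger load can still pick up the pending value by forwarding. The class and method names are hypothetical, and the model assumes every access is a full, aligned word at an exactly matching address, which sidesteps the partial-overlap cases the real load/store queues have to handle.

```python
class StoreQueue:
    """Hypothetical store queue; pending stores are kept oldest first."""

    def __init__(self):
        self.entries = []

    def push(self, addr, data):
        # A store has executed but is not yet committed.
        self.entries.append((addr, data))

    def commit_oldest(self, memory):
        # Only at commit does the store become visible in memory.
        addr, data = self.entries.pop(0)
        memory[addr] = data

    def forward(self, addr):
        # Search youngest to oldest so the most recent store wins.
        for st_addr, st_data in reversed(self.entries):
            if st_addr == addr:
                return st_data
        return None


def load(addr, store_queue, memory):
    """Prefer a forwarded store value, otherwise read memory."""
    forwarded = store_queue.forward(addr)
    return forwarded if forwarded is not None else memory.get(addr, 0)
```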