# FPGA Project Checkpoint 1 Questions

1. How many stages is the datapath you've drawn? (i.e. How many cycles does it take to execute 1 instruction?)
   - Three stages: it takes three cycles to execute 1 instruction. We split the datapath into IF+ID, EX+MEM, and WB stages because of the dependencies on synchronous elements (e.g. IMEM, DMEM, and the RegFile write, but not the RegFile read).

2. How do you handle ALU → ALU hazards?
   ```
   addi x1, x2, 100
   addi x2, x1, 100
   ```
   - We can extend the select of the mux that feeds the ALU to two bits so that we can forward s1_alu_res directly into one of the operands of the next instruction.

3. How do you handle ALU → MEM hazards?
   ```
   addi x1, x2, 100
   sw x1, 0(x3)
   ```
   - We can route the output of the pipeline register right after the ALU back as an input to DMEM. We also add two muxes, one before addr and one before din of DMEM, each selecting between the forwarded result and the original input.

4. How do you handle MEM → ALU hazards?
   ```
   lw x1, 0(x3)
   addi x1, x1, 100
   ```
   - We can route the read data from DMEM back into one of the inputs of the muxes that feed the ALU.

5. How do you handle MEM → MEM hazards?
   ```
   lw x1, 0(x2)
   sw x1, 4(x2)
   ```
   also consider:
   ```
   lw x1, 0(x2)
   sw x3, 0(x1)
   ```
   - We add another select bit to the muxes feeding addr and din of DMEM so that s1_dataR can be routed into either input, which covers both cases above.

6. Do you need special handling for 2 cycle apart hazards?
   ```
   addi x1, x2, 100
   nop
   addi x1, x1, 100
   ```
   - Yes. We need to add a mux right after the RegFile read to route in s3_wb when there is a data dependency.

7. How do you handle branch control hazards? (What is the mispredict latency, what prediction scheme are you using, are you just injecting NOPs until the branch is resolved, what about data hazards in the branch?)
   - Mispredict latency for branches is 1 cycle, as we resolve the branch comparison in Stage 2.
     Since the logic between the Comparator and the PCSelect signal is entirely combinational, we can simply overwrite the instruction being fetched on a mispredict. The newly fetched instruction will be 1 stage apart from the branch instruction, as the branch instruction is by then in stage 3 (WB) and the newly fetched instruction is in stage 1 (IF+ID).
   - Given that, we have added a mux in the decode stage: if the previous instruction was a branch, we insert 1 NOP to stall. We also added another input to the PCSel mux for the case where we don't take the branch and just want to execute the PC+4 instruction after stalling. We do so by adding another adder that subtracts 4 from the current PC+4, so that the PC stops advancing for 1 cycle.

8. How do you handle jump control hazards? Consider jal and jalr separately. What optimizations can be made to special-case handle jal?
   - We don't need to inject any NOPs for jal, and we need to inject 1 NOP for jalr.
   - We add an additional ALU in stage 1 (IF+ID) to compute the jal target from the offset. The result is forwarded immediately back as the input to IMEM, so we don't need to stall.
   - For jalr, since we have to wait for the RegFile's output, we stall for 1 cycle, similar to how we handle branches.

9. What is the most likely critical path in your design?
   - We think the following path is the most likely critical path: DMEM output -> data memory mux -> load extender -> WB mux -> (ALU to ALU) data forwarding mux -> branch comparator -> control logic -> PCSel

10. Where do the UART modules, instruction, and cycle counters go? How are you going to drive `uart_tx_data_in_valid` and `uart_rx_data_out_ready` (give logic expressions)?
    - The UART modules and the instruction and cycle counters go in the cpu module. Specifically, they sit between the EX+MEM and WB stages.
      The UART outputs connect directly to the WB mux, and the inputs are driven as follows:
      - `uart_tx_data_in_valid = (instruction is store) && addr == 32'h80000004`
      - `uart_rx_data_out_ready = (instruction is load) && addr == 32'h80000008`

11. What is the role of the CSR register? Where does it go?
    - The CSR (control and status register) is a state element that is stored independently of the RegFile and the memory. CSRs are used to track and monitor the CPU's operating status. It goes in the MEM stage, in parallel with DMEM, because it uses the RegFile output as the data_in for the CSR. In other words, the CSR is located in Stage 2 of our 3-stage pipeline.

12. When do we read from BIOS for instructions? When do we read from IMem for instructions? How do we switch from BIOS address space to IMem address space? In which case can we write to IMem, and why do we need to write to IMem? How do we know if a memory instruction is intended for DMem or any IO device?
    - We read instructions from BIOS when the FPGA first starts up or when the reset button is pressed. We read instructions from IMEM after we receive a command over UART to jump to an address in IMEM. We need to write to IMEM because the user program is not initially loaded (only the BIOS is), so we load the user program by writing it to IMEM. We know whether a memory instruction is intended for DMEM or an IO device by looking at bits [31:28] of its address and matching them against the address space partitioning table.
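
As a rough illustration of the address check described in question 12, the following Python sketch models the decode on `addr[31:28]`. It is a behavioral model, not our RTL. The function names and the nibble values for DMEM and BIOS are placeholders assumed for illustration; only the I/O nibble (`4'b1000`) is consistent with the `32'h8000000x` UART addresses in question 10. The actual values come from the address space partitioning table.

```python
# Behavioral sketch only. The nibble-to-region table is a PLACEHOLDER,
# not the project's actual address space partitioning table.
ASSUMED_REGIONS = {
    0b0001: "DMEM",  # assumed value, for illustration
    0b0100: "BIOS",  # assumed value, for illustration
    0b1000: "IO",    # matches the top nibble of the UART addresses in question 10
}


def top_nibble(addr: int) -> int:
    """Return addr[31:28] of a 32-bit address."""
    return (addr >> 28) & 0xF


def decode_region(addr: int) -> str:
    """Map an address to a region name, or 'UNMAPPED' if the nibble is unknown."""
    return ASSUMED_REGIONS.get(top_nibble(addr), "UNMAPPED")


def memory_enables(addr: int, is_load: bool, is_store: bool) -> dict:
    """Decide whether a load/store targets DMEM or an I/O device.

    An enable is asserted only when the instruction is a memory operation
    and its address falls in the corresponding region.
    """
    region = decode_region(addr)
    is_mem_op = is_load or is_store
    return {
        "dmem_en": is_mem_op and region == "DMEM",
        "io_en": is_mem_op and region == "IO",
    }
```

For example, `memory_enables(0x80000004, is_load=False, is_store=True)` asserts only `io_en`, since the top nibble of the address is `1000`. In hardware this corresponds to a small comparator on the upper four address bits gating the DMEM and I/O enables.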