# Assignment3: Your Own RISC-V CPU

## Chisel Bootcamp

Hardware designs are better expressed as generators, not instances.

Scala has `var` (mutable variables) and `val` (immutable variables). Prefer `val`: reusing a variable makes errors more likely and the program harder to read.

Scala doesn't require `;` at the end of a statement; it is only needed when you want to fit multiple statements onto one line.

Function parameters are separated by commas (`,`), and their names and types must be specified.

By convention, argument-less functions without side effects (they don't change anything and simply return a value) do not use parentheses. Functions with side effects should require parentheses.

The underscore `_` in `import chisel3._` is treated as a wildcard character (it imports all classes and methods in the package).

Scala is an object-oriented language. Values declared with `var` and `val` are objects, literal values like `10` and `"hello"` are also objects, and functions themselves are also objects. **An object is an instance of a class**. In Scala's object-oriented model, "object" and "instance" mean the same thing in almost every way that matters.
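The conventions above can be tried out in a few lines of plain Scala. This is only a sketch for illustration; the object `ScalaBasics` and all of its member names are made up for this note and do not come from the bootcamp.

```scala
// Hypothetical names, for illustration only.
object ScalaBasics {
  // `val` is immutable: reassigning `width` would be a compile error.
  val width: Int = 4

  // Parameters are separated by commas, each with a name and a type.
  def add(a: Int, b: Int): Int = a + b

  // No arguments and no side effects: declared and called without parentheses.
  def answer: Int = 42

  // Has a side effect (changes internal state): declared and called with parentheses.
  private var calls = 0
  def bump(): Int = { calls += 1; calls }

  // Functions are objects too, so they can be stored in a `val`.
  val increment: Int => Int = _ + 1

  // Literals are objects: methods can be called on them directly.
  val tenAsText: String = 10.toString
  val helloLength: Int = "hello".length
}
```

Calling `ScalaBasics.answer` reads like a field access, while `ScalaBasics.bump()` signals at the call site that something changes.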
When defining a class, you must specify:
* Data associated with the class: defined using `val` or `var`
* Operations that class instances can perform, called methods or functions

Inheritance and traits
* A class can extend another class
    * The class being extended → superclass
    * The extending class → subclass
* A subclass inherits the data and methods of its superclass
* A class can extend or override inherited members in many useful and controlled ways
* A class can inherit from traits
    * Traits can be viewed as lightweight classes that allow inheritance from multiple superclasses in specific and limited ways

Singleton objects
* An `object` is a special kind of Scala class
* This is different from the objects (instances) mentioned above

```scala
class RepeatString(s: String) {
  val repeatedString = s + s
}

// Create an instance of RepeatString
val myStringInstance = new RepeatString("HI")
// Access the repeatedString field of the instance
val finalResult = myStringInstance.repeatedString
// Print the result
println(finalResult)
```

`Unit` is equivalent to `void` in C or `None` in Python: it indicates no return value. Such functions typically exist to perform side effects, e.g., writing data to a file, printing messages, or changing the internal state of an object or class.
```scala
def myMethod(count: Int, wrap: Boolean, wrapValue: Int = 24): Unit = { ... }
```

### 2.1 Chisel is a DSL (Domain Specific Language) in Scala

```scala
import chisel3._

// Chisel Code: Declare a new module definition
class Passthrough extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(4.W))
    val out = Output(UInt(4.W))
  })
  io.out := io.in // combinationally connects 'in' to 'out', so 'in' drives 'out'
}
```

`Passthrough` is the name of the Chisel module, with one 4-bit input `in` and one 4-bit output `out`.

The input and output ports must be declared in a `val` named `io`.
When converting to Verilog, the fields of `io` are converted into the port list of the Verilog module.

`:=` is a Chisel operator (a directional operator) indicating that the right-hand signal drives the left-hand signal.

* elaboration: the process in which Scala runs the Chisel code to translate a Chisel module into a Verilog module

### 2.2

**Never use `var` for a hardware construct, because the construct itself never changes once defined.** The only thing that changes is its value (the voltage state on the signal) when the hardware is running.

`Wire`s may be used for parameterized types. A `Wire` is a signal in Chisel that can be assigned multiple times (such as in multiple conditional branches).

```scala
val utwo = 1.U + 1.U
// The sum of two Chisel UInt hardware literals is a hardware node within Chisel.
// This node represents a 1-bit adder in the circuit.
val twotwo = 1.U + 1
// "type mismatch" will occur because 1.U is a hardware node of value 1, while the latter is a Scala Int of value 1
```

Verilog literal notation, e.g.
* 3'h4 : represents the 3-bit hexadecimal value 4.
* 2'd0 : represents the 2-bit decimal value 0.

```scala
io.out_add := 1.U + 4.U
io.out_sub := 2.U - 1.U
io.out_mul := 4.U * 2.U
```

```verilog
// 1.U + 4.U is computed directly as 1+4=5; the output is 4 bits (4'h5)
wire [1:0] _T_3 = 2'h2 - 2'h1;
// 2.U requires a minimum of two bits to represent (10), so 2'h2
// 1.U requires at least 1 bit, but since it is aligned with 2.U for the operation, it also uses 2 bits. Therefore, 2'h1
assign io_out_sub = {{2'd0}, _T_3};
// _T_3 is a 2-bit value equal to 1, but the output requires 4 bits, so two leading zeros are added (2'd0) => 0001
wire [4:0] _T_4 = 3'h4 * 2'h2;
// 4.U requires at least 3 bits to represent (100), so 3'h4
// 2.U is the same as before, 2'h2
// The result of the multiplication requires (3+2) 5 bits, so wire [4:0]
// However, the output requires 4 bits, so _T_4[3:0] is used to extract the low 4 bits.
```

Chisel/Verilog bit width calculation
* Addition : `max(a_width, b_width)`
* Subtraction : `max(a_width, b_width)`
* Multiplication : `a_width + b_width`

So the narrower operand is aligned to the wider bit width before addition and subtraction.

`Mux` is like the ternary (3-component) operator, with argument order (select, value if true, value if false).

Prefer using `true.B` and `false.B` to create a Chisel `Bool`.

`Cat` (concatenation) takes the MSB part first and the LSB part last. `Cat` is variadic, so more than two values can be concatenated in a single call.

The bit width rule of `Mux` is `max(a_width, b_width)`, and the rule for `Cat` is the sum of the widths of all the signals.

The output bit width of a normal addition is equal to the maximum width of the two inputs, e.g., 4. The `+&` operator performs addition with carry; the output width is the larger of the two input widths plus 1, e.g. 4+1 = 5.

## 2.3 Control Flow

If a component has multiple connect statements, only the last statement takes effect.

```scala
io.out := 1.U
io.out := 2.U
io.out := 3.U
io.out := 4.U // last connect wins: io.out is driven by 4.U
```

### Conditional logic

`when`, `elsewhen`, and `otherwise` are similar to if, else if, and else. They need to be used in order, but either of the latter two can be omitted.

Differences :
* The `when` clause describes conditional control flow in hardware. Its body can be a complex block of code, and it can contain nested `when` clauses and related structures.
* In Scala, if/else is simply an expression that evaluates and returns a value, so it can be assigned to a value, like this
```scala
val scala_result = if (conditional) {valueA} else {valueB}
```
`when` is not an expression, and a `when` block does not return a value, so `when` cannot replace `if` here.

When writing `when(condition) {...}`, what we actually do is call a method named `when`. This method returns a special intermediate component, and this intermediate has methods called `elsewhen` and `otherwise`.
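The "method that returns an intermediate object" pattern can be imitated in plain Scala. The sketch below is only a software model of the chaining: the names `WhenChain`, `WhenModel`, and `classify` are hypothetical, and it is not how Chisel implements `when`. In real Chisel the conditions are hardware `Bool`s and every branch is elaborated into multiplexers, whereas here ordinary `Boolean`s are evaluated immediately.

```scala
// Software-only model of the chaining pattern; NOT Chisel's implementation.
final class WhenChain(alreadyTaken: Boolean) {
  // Runs `block` only if no earlier branch ran and `cond` holds.
  def elsewhen(cond: => Boolean)(block: => Unit): WhenChain =
    if (!alreadyTaken && cond) { block; new WhenChain(true) }
    else this

  // Runs `block` only if no earlier branch ran. Returns Unit, so the chain ends here.
  def otherwise(block: => Unit): Unit =
    if (!alreadyTaken) block
}

object WhenModel {
  // `when` is an ordinary method with two parameter lists: condition, then body.
  def when(cond: => Boolean)(block: => Unit): WhenChain =
    if (cond) { block; new WhenChain(true) } else new WhenChain(false)

  // Because the chain does not return a value, the result is written from inside the blocks.
  def classify(x: Int): String = {
    var result = ""
    when(x < 0) { result = "negative" }
      .elsewhen(x == 0) { result = "zero" }
      .otherwise { result = "positive" }
    result
  }
}
```

Since `otherwise` returns `Unit`, nothing can be chained after it, which mirrors the rule that `otherwise` must come last.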
To call the methods on this component, we chain them with the `.` operator, like this
```scala
when(someBooleanCondition) {
  // things to do when true
}.elsewhen(someOtherBooleanCondition) {
  // things to do on this condition
}.otherwise {
  // things to do if none of the boolean conditions are true
}
```

### `Wire` construct

A `Wire` has a single driver, but there can be multiple receiving ends.

`Enum(4)` generates four consecutive `UInt` values, `0.U, 1.U, 2.U, 3.U`.

`... :: ... :: Nil` is a way to construct or destructure a List. `::` is called the cons operator; elements are listed from left to right, and `Nil` represents the empty list.
```scala
val idle :: coding :: writing :: grad :: Nil = Enum(4)
```

## 2.4 Sequential logic

A register, written `Reg`, holds its current output value. The output does not change until the rising edge of the clock, at which point the register captures its current input value.

### Implicit clock

By default, every Chisel `Module` has an implicit clock (an input for clock and one for reset). This eliminates the need to repeatedly specify the same clock and reset in the code.

### Registers

Created by calling `Reg(tpe)`, where `tpe` is a variable that encodes the type of register we want.
```scala
val register = Reg(UInt(12.W)) // means 12-bit
```

`step(n)` tells the testing framework to simulate `n` rising clock edges, which causes the register to pass its input to its output. `step()` is only needed in tests when sequential logic is involved.

```verilog
`ifdef RANDOMIZE
// initializes the register to some random value before simulation starts
```
`register` is updated on `posedge clock`.

```scala
val myReg = Reg(UInt(2.W))
// Correct because Reg() requires a data type as a model.
// val myReg = Reg(2.U)
// An error will occur because an existing hardware node cannot be used as a model.
```

### `RegNext`

Equivalent to a `Reg`: it passes its input value to its output in the next clock cycle.
When using `RegNext`, you don't need to explicitly specify the register's bit width as with `Reg(UInt(W))`.
```scala
io.out := RegNext(io.in + 1.U)
// This indicates the new value that the register should hold in the next clock cycle.
// The bit width of the register is inferred from io.out.
```

### `RegInit`

Creates a register that resets to a given value.

There are two versions:
```scala
val myReg = RegInit(UInt(12.W), 0.U) // version 1: type and reset value
val myReg = RegInit(0.U(12.W))       // version 2: a literal carrying both
```
First argument : a type node that specifies the datatype and its width.
Second argument : a hardware node that specifies the reset value, 0 in this case (0 is the usual choice).

The generated Verilog will include an `if (reset)` block that resets the register to 0. (It lives in the `always @(posedge clock)` block.)

Before the reset signal is asserted, the initial value of the register is random noise.

The `PeekPokeTester` test skeleton automatically calls the reset function before running the test code. `reset(n)` means that the reset signal stays high for n clock cycles.

**Reset is synchronous**, so we need at least one clock cycle to ensure the register is reset successfully.

If we declare the `Reg` as
```scala
val reg: UInt = Reg(UInt(4.W))
// This means that `reg` is of type `UInt` and we can do anything with it that `UInt` can do.
```

`Int` & `BigInt`

The former has a fixed range (32 bits), while the latter is a class whose range is limited only by system memory, so the number of bits can vary.
```scala
val mask = BigInt("1" * 65, 2)
// Generates a mask of arbitrary size (there are 65 1's in this mask)
```

### Explicit clock and reset

Sometimes it's necessary to override the implicit clock and reset signals provided by the Chisel module (because all registers created within the module automatically use them).

For example, there might be a black box component that generates its own clock or reset signal.
Or, in a multi-clock design, different parts might require different clocks.
```scala
// So Chisel provides these structures to deal with it.
withClock(){ ... }
withReset(){ ... }
withClockAndReset(){ ... } // Both are overridden
```
* A clock signal has its own type in Chisel, `Clock`, and should be declared as `Clock`
* A `Bool` signal can be converted to the `Clock` type by calling the method `.asClock()`
```scala
val signal = Wire(Bool())
val my_clock = signal.asClock() // Tell the Chisel compiler to treat this Boolean signal as a clock.
// Then we can use it in a withClock structure:
withClock(my_clock) {
  // Registers created here use `signal` as their clock
}
```

## Module 2.6 chiseltest

* poke (write)
    * `c.io.in1.poke(6.U)`
* peek (read)
    * `c.io.in1.peek()`
* expect (verify)
    * `c.io.out1.expect(6.U)`
* step (clock)
    * `c.clock.step(1)`
* instantiate
    * `test(...) { c => ... }`

Note that poked and expected values are Chisel literals (like `6.U`, `true.B`).

### Decoupled

`Decoupled` takes a Chisel data type and wraps it with two handshake signals.
* `ready` indicates whether the receiver is ready to receive data.
* `valid` indicates whether the data provided by the source is valid.
* `bits` : any Chisel data can be wrapped in a `DecoupledIO` (used as the `bits` field)

e.g.
```scala
val myChiselData = UInt(8.W)
val myDecoupled = Decoupled(myChiselData)
```
Then the `bits` field is `Output(UInt(8.W))`.

### Queue Module

This module passes through data whose type is determined by `ioType`, and there are `entries` state elements inside it. That means it can hold that many elements before it exerts backpressure.
```scala
import chisel3._
import chisel3.util._

class QueueModule[T <: Data](ioType: T, entries: Int) extends MultiIOModule {
  val in = IO(Flipped(Decoupled(ioType))) // input port: this module receives data here
  val out = IO(Decoupled(ioType))         // output port: this module is the source (transmitter) here
  out <> Queue(in, entries)
}
```
`Decoupled(ioType)` carries the data together with the ready/valid signals.
`Flipped` reverses the direction of every field, so the interface becomes the receiving (sink) side: `valid` and `bits` are inputs and `ready` is an output.

### Fork & Join

When using fork and join like this, `enqueueSeq` keeps adding elements until the sequence is exhausted, and the second fork block runs `expectDequeueSeq` on each cycle in which data is available.
```scala
val testVector = Seq.tabulate(300){ i => i.U }
fork {
  c.in.enqueueSeq(testVector)
}.fork {
  c.out.expectDequeueSeq(testVector)
}.join()
```

### `.Lit` Method

The Scala `Int` x is converted into a Chisel `UInt` literal.

`_.` in front of the field name is necessary to bind the name and value to the bundle internals.

## Module 3.1 Generators: Parameters

Add these `require` calls to the module to make sure the inputs are sensible.
```scala
require(in0Width >= 0)
require(in1Width >= 0)
require(sumWidth >= 0)
```

### Map

`Map("a" -> 1)` defines a key/value pair.

If we write `val b = map("b")`, a `key not found` exception occurs, because `map("b")` is the same as calling `map.apply("b")` and there is no key called "b".

But if we write `val b = map.get("b")`, the result is `None`.

### A class called `Option`

`Option` lets the user represent a value that may or may not exist. In many languages, a function that cannot produce a result returns `null`; with `Option`, it returns `Some` or `None` to show whether the value was found.

With `getOrElse`, if the value is present the actual value is returned; if not, the default passed to `getOrElse` (here `2`) is returned.
```scala
val some = Some(1)
val none = None
println(some.get)          // Returns 1
// println(none.get)       // Errors!
println(some.getOrElse(2)) // Returns 1
println(none.getOrElse(2)) // Returns 2
```

### Match/Case Statements

`match` is like `switch` in C.
```scala
val sequence = Seq("a", 1, 0.0)
sequence.foreach { x =>
  // take each element in turn and find which case in `match {}` matches its type
  x match {
    case s: Int    => println(s"$x is an Int")
    case s: String => println(s"$x is a String")
    case s: Double => println(s"$x is a Double")
    case _         => println(s"$x is an unknown type!")
  }
}
```

### Implicit Arguments

Define a singleton object called `CatDog`, and inside it an implicit val called `numberOfCats` of type `Int` with value 4.

Then define a method called `tooManyCats`: its first parameter list is `nDogs`, and its second is an implicit parameter called `nCats`.

When `tooManyCats` is called without a value for the second parameter list (the implicit `nCats`), the Scala compiler automatically looks for a value of the same type that is marked `implicit` and passes it.
```scala
object CatDog {
  implicit val numberOfCats: Int = 4
  //implicit val numberOfDogs: Int = 5

  def tooManyCats(nDogs: Int)(implicit nCats: Int): Boolean = nCats > nDogs

  val imp = tooManyCats(9)    // Argument passed implicitly (an implicit value of the same type is found and passed)
  val exp = tooManyCats(2)(1) // Argument passed explicitly!
}
CatDog.imp
CatDog.exp
```

If there are two or more implicit values of a given type in scope, compilation fails.

`sealed trait` means all subtypes must be defined in the same file, so `Silent` and `Verbose` are the only two possible objects of `Verbosity`.
```scala
sealed trait Verbosity
implicit case object Silent extends Verbosity
case object Verbose extends Verbosity
```

### Implicit Conversions

The compiler automatically finds an implicit conversion method in scope that fits the situation and applies it.

So even though class `Human` doesn't have a `species` field, we can call `species` on it by implementing an **implicit conversion**.
```scala
class Animal(val name: String, val species: String)
class Human(val name: String)
implicit def human2animal(h: Human): Animal = new Animal(h.name, "Homo sapiens")
val me = new Human("Adam")
println(me.species)
```

### .litValue

`.litValue` turns the result into a pure number (`BigInt`). Otherwise, more debugging information is shown.
### Misc Function Blocks

* PopCount : Returns the number of high (1) bits in the input as a `UInt`.
* Reverse : Returns the bit-reversed input.
* OneHot encoding utilities :
    * `UIntToOH` : Turns a number `n` into a one-hot encoding in which the nth bit is 1 (counting from the right, starting from 0)
    * `OHToUInt` : Turns a one-hot encoding into a number; the output is the index of the bit that is 1
* Muxes
    * PriorityMux : Outputs the value associated with the lowest-index asserted select signal
    * Mux1H : Assumes exactly one select signal is high and outputs the associated value. If more than one select signal is high, the result is not meaningful.
* Counter : It's not a module. It's a counter that can be incremented once every cycle, up to some specified limit. Its value is accessible.
```scala
new Module {
  val io = IO(new Bundle {
    val count = Input(Bool())
    val out = Output(UInt(2.W))
  })
  val counter = Counter(3) // 3-count Counter (outputs range [0...2])
  when(io.count) {
    counter.inc()
  }
  io.out := counter.value
}
```

### `zipWithIndex`

`List.zipWithIndex` has the type signature `zipWithIndex: List[(A, Int)]`.

It returns a list in which each element is a tuple of the original element and its index.
```scala
println(List("a", "b", "c", "d").zipWithIndex)
```
```markdown
List((a,0), (b,1), (c,2), (d,3))
```

### Reduce

```scala
println(List(1, 2, 3, 4).reduce((a, b) => a + b)) // returns the sum of all the elements
// Output : 10
println(List(1, 2, 3, 4).reduce(_ * _)) // returns the product of all the elements
// Output : 24
println(List(1, 2, 3, 4).map(_ + 1).reduce(_ + _)) // you can chain reduce onto the result of a map
// Output : 14
```
```scala
println(List[Int]().reduce(_ * _))
// `reduce` will fail with an empty list
```

### Fold

The argument inside the `()` of `fold()` is the initial value; otherwise the operation is similar to `reduce`.
For example :
```scala
println(List(1, 2, 3, 4).fold(1)(_ + _))
// like above, but accumulation starts at 1
```
```scala
println(List(1, 2, 3, 4).fold(2)(_ * _))
// The result is 2*(1 * 2 * 3 * 4)
```

## 1-single-cycle

### Instruction Fetch

1. Control Flow

   When `jump_flag_id` is true :
    * PC should jump to the address specified by `jump_address_id`
    * This covers instructions like `jal, jalr, beq, bne, blt, bge, bltu, bgeu`
2. Sequential Execution

   When `jump_flag_id` is false :
    * PC should be incremented by 4, since every RV32I instruction has a fixed length of 4 bytes

So I use `Mux` to finish the implementation :
```scala
pc := Mux(io.jump_flag_id, io.jump_address_id, pc + 4.U)
```
If memory is not ready, we hold the PC and insert a `NOP` to prevent an illegal instruction from being executed; this lets the pipeline keep running.

Finally, output the current PC value.

### Instruction Decode

First, I have to determine when to write back from memory and when to write back PC+4.

The default write-back data source is `ALUResult`, which covers most of the arithmetic/logic instructions.

When the instruction is a LOAD, it reads data from memory and writes it back to the register.

The last case is `jal` and `jalr`: these two instructions write back the address of the next instruction (they save PC+4 to the register).
```scala
val wbSource = WireDefault(RegWriteSource.ALUResult)
when(isLoad) {
  wbSource := RegWriteSource.Memory
}.elsewhen(isJal || isJalr) {
  wbSource := RegWriteSource.NextInstructionAddress
}
```
The second part is to determine when to use the PC as the first operand. To fill in the blank here, I have to consider which instructions need PC-relative addressing or need the PC value as an operand.

For branches, JAL, and AUIPC, the PC has to be the first operand in order to calculate the target address (JALR uses `rs1` instead).
```scala
val aluOp1Sel = WireDefault(ALUOp1Source.Register)
when(isBranch || isJal || isAuipc) {
  aluOp1Sel := ALUOp1Source.InstructionAddress
}
```
The following instruction types use the immediate as the second operand, so the condition is :
```scala
val needsImmediate = isLoad || isStore || isOpImm || isBranch || isLui || isAuipc || isJal || isJalr
val aluOp2Sel = WireDefault(ALUOp2Source.Register)
when(needsImmediate) {
  aluOp2Sel := ALUOp2Source.Immediate
}
```
Here the S-type immediate extension is completed :
```scala
val immS = Cat(
  Fill(Parameters.DataBits - 12, instruction(31)), // Sign extension
  instruction(31, 25),                             // High 7 bits, which means imm[11:5]
  instruction(11, 7)                               // Low 5 bits, which means imm[4:0]
)
```
Also the B-type immediate extension

![image](https://hackmd.io/_uploads/HkxuvVUBfWx.png)

And the last one is the J-type immediate extension

### Execution

This stage performs the ALU computation and determines whether a branch is taken.

To calculate the target address of a conditional branch, we need the PC-relative immediate => Target = PC + immediate offset
```scala
val branchTarget = io.instruction_address + io.immediate
```
The calculation of the JAL target address is given by the instructor as below
```scala
val jalTarget = branchTarget
```
It is the same as the branch target calculation.

To calculate the target address of a `jalr` instruction, we sum the value of the `rs1` register and the immediate. The second step is to clear the LSB: the JALR target must have its least-significant bit set to zero, which keeps the address even (aligned to a 2-byte boundary). That is why we use `& ~1` to set the LSB to 0.
```scala
val jalrSum = io.reg1_data + io.immediate
val jalrTarget = jalrSum & (~1.U(Parameters.DataBits.W))
```

### Memory Access

Sign-extension is for signed integers: it preserves the sign of the numerical value.
Zero-extension is for unsigned integers: the upper bits are filled with zeros, so the value stays non-negative.
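The difference between the two extensions is easy to check with a small software model. This is a plain-Scala sketch, not the Chisel code of the assignment; the object `LoadExtend` and its method names are hypothetical, and it works on 32-bit `Int` values only.

```scala
// Hypothetical software model of load extension on 32-bit values.
object LoadExtend {
  // Sign-extend the low `bits` bits of `value`: the arithmetic shift `>>` copies the sign bit.
  def signExtend(value: Int, bits: Int): Int = {
    val shift = 32 - bits
    (value << shift) >> shift
  }

  // Zero-extend the low `bits` bits of `value`: the logical shift `>>>` fills with zeros.
  def zeroExtend(value: Int, bits: Int): Int = {
    val shift = 32 - bits
    (value << shift) >>> shift
  }

  def lb(raw: Int): Int  = signExtend(raw, 8)  // LB : signed byte
  def lbu(raw: Int): Int = zeroExtend(raw, 8)  // LBU: unsigned byte
  def lh(raw: Int): Int  = signExtend(raw, 16) // LH : signed halfword
  def lhu(raw: Int): Int = zeroExtend(raw, 16) // LHU: unsigned halfword
}
```

For the byte `0x80`, `LoadExtend.lb` yields -128 (`0xFFFFFF80`) while `LoadExtend.lbu` yields 128 (`0x00000080`): the same stored bits, interpreted as signed versus unsigned.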
LH and LHU are handled in essentially the same way as LB and LBU, except that they operate on 16-bit data.

LW doesn't need extension since the data is already 32 bits, so we can use it directly.

The following part finishes the store data alignment and byte strobes.

RISC-V allows a program to store data as a byte or a halfword, so we need to shift the data left to align it to the right position.

The logic then generates four strobes to tell the memory precisely which byte, which two bytes, or all four bytes of the 32-bit word are going to be written.

Before implementing the store logic, I had to understand what `Byte Strobe` means.

`strobeInit` is a vector that includes 4 Boolean values, all initialized to false.

`writeStrobes` is a vector of length 4.

So a `Byte Strobe` is a control signal in the memory system that indicates which bytes should be written and which bytes should stay unchanged.

If `mem_address_index` = 0, we enable the update of Byte 0 and write one byte.
If `mem_address_index` = 1, we enable the update of Byte 1, and the data is shifted left by 1 × 2^3 = 8 bits, that is, into Byte 1, and written.
The remaining cases follow the same logic.

### Write Back

In this stage, the data read from memory or the computed result is written into the registers. Here we use `MuxLookup` for the implementation.

But here comes the question :
* The write enable signal is generated in the ID stage, but the correct data will not be ready until the WB stage.
* Will incorrect data be written to the register file before the data is ready?

After my reflection and research, I think the answer is a solid NO, since there are 3 key assurances :
* We have pipeline registers, which isolate the stages from each other. Each stage has its own registers storing its state, so a signal won't jump ahead to a later stage.
* Control signals and data are transmitted synchronously: they all pass through the same pipeline (data starts from IF).
  So when they all arrive at the WB stage, they are aligned perfectly.
* The last reason is that data is only written into the register file in the WB stage, not in the ID, EX, or MEM stage.

## 2-mmio-trap

The second task is to implement a RISC-V CPU with MMIO peripherals and trap handling.

Here are two new modules for handling traps.

### CSR

CSR means Control and Status Registers.

It's a special set of registers used for controlling the operating mode of the processor, handling interrupts and exceptions, storing status information, and providing privilege management.

1. Machine Mode CSRs
    * `mstatus` is Machine Status, for controlling the global status of the processor.
    * `mie` is Machine Interrupt Enable, for controlling which interrupts are enabled.
    * `mip` is Machine Interrupt Pending, showing which interrupts are pending.
    * `mtvec` is Machine Trap Vector, which holds the entry address of the trap handler.
    * `mepc` is Machine Exception PC, for storing the value of the PC when an exception occurs.
    * `mcause` is Machine Cause, indicating the reason for the interrupt/exception.
    * `mtval` is Machine Trap Value, providing additional information about the exception.
    * `mscratch` is Machine Scratch, a general-purpose temporary value, usually used for storing context.

And there are some extension instructions :
1. CSRRW (CSR read and write)
2. CSRRS (CSR read and set)
3. CSRRC (CSR read and clear)

Two common pseudo-instructions are built on them:
* `csrr rd, csr` : CSR -> register (expands to `csrrs rd, csr, x0`)
* `csrw csr, rs` : register -> CSR (expands to `csrrw x0, csr, rs`)

### CLINT

CLINT means Core-Local Interruptor.

## 3-pipeline

`Forwarding.scala` feeds the results from the MEM or WB stage back to EX. This is also called **bypassing**. It's an optimization technique for fixing data hazards, especially RAW (Read After Write).

We need to feed back the results from those two stages because a later instruction needs the result of an earlier instruction.
Those results have already been calculated but haven't yet been written back to the register file.

For the example here :
```
add x1, x2, x3
sub x4, x1, x5
```
The traditional approach has to wait until `add` finishes the WB stage and writes its result back to the register before `sub` can read it. This forces `sub` to **stall** for two cycles.

With forwarding, the result is kept in the pipeline registers. At the same time, the hardware creates a shortcut that feeds the value that hasn't been written back yet into the instruction in the EX stage.

==Demand Side==
`rs1_ex`/`rs2_ex` denote the numbers of the two source registers required by the instruction currently in the EX stage. The forwarding unit compares these two numbers against the later stages, checking whether an instruction there is about to write one of these two registers.

==Provider Side==
`rd_mem`/`rd_wb` are the numbers of the destination registers, i.e. the registers that the earlier instructions are going to write.
`reg_write_enable` indicates whether the earlier instruction will actually write.

### Early branch resolution

There is also a risk of data hazards in the ID stage because we pursue an efficient CPU design: we want to know the **result of the branch** as early as possible. This reduces the branch penalty.

The branch decision is made in the ID stage, but the required value may have just been calculated by the previous instruction, which is now in the MEM stage. To decide the branch early, that freshly calculated result has to be bypassed to the ID stage.

But if the instruction before the branch is `LW`, the result is only available when the MEM stage has finished. So forwarding alone doesn't help, and the CPU would stall for two cycles.
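The demand-side/provider-side comparison described above can be sketched as a plain-Scala software model. This is not the code of `Forwarding.scala`: the object `ForwardingModel`, the `Source` names, and the parameter names are hypothetical, and the `x0` check is my assumption (register `x0` is hard-wired to zero in RISC-V, so forwarding a "write" to it would be wrong). The same kind of comparison could serve the ID-stage bypass for early branches, with the ID-stage source registers on the demand side.

```scala
// Hypothetical software model of one forwarding decision.
object ForwardingModel {
  sealed trait Source
  case object FromRegisterFile extends Source // no hazard: use the value read in ID
  case object FromMem extends Source          // bypass the result held in the MEM stage
  case object FromWb extends Source           // bypass the result held in the WB stage

  // rs         : source register number required on the demand side
  // rdMem/rdWb : destination register numbers of the instructions in MEM / WB
  // memWrites/wbWrites : their register write enables
  def select(rs: Int, rdMem: Int, memWrites: Boolean, rdWb: Int, wbWrites: Boolean): Source =
    if (memWrites && rdMem != 0 && rdMem == rs) FromMem // MEM holds the newer result, so it wins
    else if (wbWrites && rdWb != 0 && rdWb == rs) FromWb
    else FromRegisterFile
}
```

For the `add x1, x2, x3` / `sub x4, x1, x5` pair: when `sub` is in EX and `add` is in MEM, `select(1, 1, true, 0, false)` picks `FromMem` for `rs1`, while `rs2` (`x5`) still comes from the register file.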
* Conclusion: a branch will still stall for at least one cycle even with forwarding (when the two instructions are adjacent)
    * Arithmetic instruction + branch instruction : stalls at least one cycle
    * Load word instruction + branch instruction : stalls at least one cycle, possibly two

### Hazard Detection

`Control.scala` deals with three conditions :
1. Load-use Hazard : Data from `LW` is produced only after the MEM stage has finished, so we have to stall for one cycle.
2. Jump-related Hazard : Since we moved the branch/jump decision to the ID stage, the operands are needed earlier. In exchange, this technique reduces the penalty of a wrong guess.
3. Control Hazard : Occurs when the ID stage decides to jump, which means `jump_flag` is true.

There are three crucial signals :
* `io.id_flush` empties the ID/EX register. This discards the instruction that was about to enter the EX stage and substitutes a **NOP** (no operation).
* `io.if_stall` freezes the IF/ID register. This makes the instruction in the ID stage stall for one cycle, i.e. stay in the same place.
* `io.pc_stall` freezes the PC register. The IF unit stays at the same address and does not fetch a new instruction in the next cycle.

## Waveform Analysis

After testing my `q1_uf8 decode/encode` and `fast_rsqrt` on the final five-stage pipeline, I went on to examine their waveforms.

### Instruction Flow

First, to confirm my code is running as expected, I trace the path `inst_fetch`->`if2id`->`id`->`ex`.

#### IF/IF2ID

We can see that the PC increases correctly, and that the content of `io_id_instruction` is the instruction that will enter the ID stage, so the same signals are also visible in `if2id`. More importantly, we can observe that when the `instruction_io_flush` signal is 1, the pipeline is cleared because a branch or jump has occurred.
![image](https://hackmd.io/_uploads/BJBQRr_7Ze.png)

#### Register Read & Write

This step verifies that every instruction updates the registers correctly.

The values in `registers_4`, `registers_6` and `registers_9` change when `io_write_enable = 1` and `io_write_data` carries a value to write. In other words, when `io_write_enable` is 1 and there is data to be written, the value in the target register changes at the positive clock edge.

![image](https://hackmd.io/_uploads/HJz09r_mZx.png)

So it can be confirmed that the register read and write functionality operates properly.

#### Branch/Jump

`ex` : Analyze the results of the branch decision and the target address calculation.

`ctrl` : Take a look at the pipeline flush signals.

Here we can see that the instruction is `00008067`, which is `ret` in RISC-V, an unconditional jump instruction. So the `io_Flush` and `io_JumpFlag` signals are set to 1 to clear the instructions in the pipeline at the next positive clock edge.

![image](https://hackmd.io/_uploads/S198RlYmZe.png)
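To double-check that `00008067` really is `ret`, the word can be split into its I-type fields by hand or with a few lines of plain Scala. This is only a sketch: the object `DecodeRet` and the case class `IType` are hypothetical names for this note. `ret` is the pseudo-instruction for `jalr x0, 0(x1)`, so we expect opcode `0x67` (JALR), `rd = 0`, `funct3 = 0`, `rs1 = 1`, and an immediate of 0.

```scala
// Hypothetical helper that splits a 32-bit instruction word into I-type fields.
object DecodeRet {
  final case class IType(opcode: Int, rd: Int, funct3: Int, rs1: Int, imm: Int)

  def decode(inst: Int): IType = IType(
    opcode = inst & 0x7f,          // bits [6:0]
    rd     = (inst >>> 7) & 0x1f,  // bits [11:7]
    funct3 = (inst >>> 12) & 0x7,  // bits [14:12]
    rs1    = (inst >>> 15) & 0x1f, // bits [19:15]
    imm    = inst >> 20            // bits [31:20]; the arithmetic shift sign-extends imm[11:0]
  )

  // `ret` is `jalr x0, 0(x1)`: jump to the return address in x1 and discard the link.
  def isRet(inst: Int): Boolean = {
    val d = decode(inst)
    d.opcode == 0x67 && d.funct3 == 0 && d.rd == 0 && d.rs1 == 1 && d.imm == 0
  }
}
```

`DecodeRet.decode(0x00008067)` gives `IType(0x67, 0, 0, 1, 0)`, which matches `jalr x0, 0(x1)` and is consistent with the jump and flush signals seen in the waveform.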