---
title: 'Project documentation template'
disqus: hackmd
---

Build your own RISC-V Core
===

## Table of Contents

[TOC]

## What is RISC-V

### RISC-V:
* Open-source instruction set architecture (ISA).
* Designed to be modular and customizable.
* Gaining popularity for its open nature and flexibility.

### MIPS:
* A proprietary instruction set architecture.
* Historically used in various applications, including early gaming consoles.
* RISC-based like RISC-V, but not open source.

### Comparison:
* Openness: RISC-V is open source, while MIPS is proprietary.
* Flexibility: RISC-V is highly customizable, allowing for diverse implementations. MIPS (like other ISAs such as ARM and x86) has a more fixed architecture.
* Popularity: RISC-V is gaining momentum, especially in academia and open-source projects.

RISC-V follows a structure similar to MIPS, but its instruction formats are different.

![image](https://hackmd.io/_uploads/H16QzGy8a.png)

## What are HDLs

HDLs, or Hardware Description Languages, are specialized programming languages used for the design and description of electronic circuits and systems. HDLs enable engineers and designers to model the behavior and structure of digital circuits at various levels of abstraction, from high-level functionality to low-level hardware details. The two most common HDLs are Verilog and VHDL.

### Abstraction Levels:
HDLs allow designers to work at different levels of abstraction, from high-level descriptions of system behavior down to low-level details of the electronic components.

### Simulation:
HDLs support simulation, enabling designers to test and verify the functionality of a digital circuit before it is physically implemented. This helps catch errors and refine designs early in the development process.

### Synthesis:
HDL code can be synthesized into netlists or hardware descriptions that define the actual electronic components and their interconnections.
This synthesis process is a crucial step in converting high-level design descriptions into a form that can be implemented in hardware.

Common HDLs: Verilog and VHDL are the most widely used HDLs. Verilog is prevalent in the industry, especially in the United States, while VHDL is commonly used in Europe. Both languages serve similar purposes, but they have different syntax and styles. Engineers and digital designers use HDLs to describe the intended behavior of digital systems, and these descriptions can be used for simulation, synthesis, and ultimately, the fabrication of electronic circuits.

### TL-Verilog
* Introduces a simpler syntax than other HDLs such as SystemVerilog or Verilog, which reduces the number of lines of code and, with it, the number of bugs.
* Is more flexible, so it is easier to optimize your logic without introducing bugs.
* Is "timing abstract" for pipelines, which makes retiming easy and safe.
* Knows when signals are valid, which provides easier debug, cleaner design, better error checking, and automated clock gating.
* Visual Debug (VIZ) is an additional feature of the Makerchip platform that makes debugging much easier. So, we dig into waveforms only when needed!

TL-Verilog:

![Screenshot_2023-12-07_17-52-15](https://hackmd.io/_uploads/SJTeUMJI6.png)

Makerchip
---

You can code, compile, simulate, and debug Verilog designs, all from your browser. Your code, block diagrams, waveforms, and novel visualization capabilities are tightly integrated for a seamless design experience. Makerchip introduces ground-breaking capabilities for advanced Verilog design, and it also makes circuit design easy and fun!

This is the website we will be using to write our code: https://makerchip.com/

### How to use

Week 1
---

### Digital Logic Basics

In digital circuits, wires stabilize to one of two voltages: a high voltage (VDD) or a low voltage (VSS or ground).
So, a wire carries a boolean value, where high and low voltages can be viewed as 1/0, true/false, asserted/deasserted, on/off, etc. This provides an important abstraction for composing higher-order logic functions with predictable behavior.

Logic gates are the basic building blocks for implementing logic functions. The table below shows basic logic gates. Their function is defined by their "truth tables", which show, for each combination of input values (A & B), what the output value (X) will be. Be sure to understand the behavior of each gate.

![image](https://hackmd.io/_uploads/BkAgOGkIp.png)
![image](https://hackmd.io/_uploads/BJ0BdGJ8p.png)

You can use parentheses to group expressions to form more complex logic functions. If a statement is extended to multiple lines, these lines must have greater indentation than the first line. Statements must always end with a semicolon. Always have a space before and after the "=". For example:

```
$foo = ( $val1 &&   $val2) ||
       (! $val1 && ! $val2);
```

## Lab 1

Throughout this course we will be using TL-Verilog inside the [Makerchip website](https://makerchip.com/sandbox/#). Your design is saved on the page you are working on. This means you should always save your progress by clicking the cloud icon in the corner and then *save the link* of the page you are currently working on, so that you can return to it later.

### TL-Verilog Syntax

#### Indentation
In TL-Verilog (within \TLV code blocks), indentation and whitespace are meaningful. Tabs (which have no consistently-defined behavior) are not permitted. Each level of indentation is 3 spaces (and the Makerchip editor helps with this).

#### Signal Names
As long as you stick with the suggested signal names throughout this course, you won’t have any trouble, but, for those who might wish to veer off from the script a bit, TL-Verilog is picky about signal names too. While languages typically leave choices like camel-case vs.
underscore delimitation up to coding conventions, TL-Verilog enforces these choices.

Naming restrictions serve several purposes:

* They enforce consistency.
* They distinguish types.
* As TL-Verilog is processed into Verilog, auto-generated logic can use Verilog signal names that cannot conflict with those named by the coder.

Specifically, TL-Verilog signal names:

* Are prefixed with "$".
* Are composed of tokens delimited by underscores, where each token is a string of lower-case characters followed by zero or more digits.
* Begin with at least two alphabetic (lower-case) characters.

So, for example, these are legal signal names:

```
$my_sig
$val1
```

And these are not:

```
$a
$Sig     (this is actually a "state signal", which we will not use in this course)
$val_1
```

### Inverter

Go to [makerchip](https://www.makerchip.com/sandbox#) and in place of "//...", type ```$out = ! $in1;```. Be sure to preserve the 3 spaces of indentation, similar to the surrounding expressions. This is an inverter.

**Results**

![image](https://hackmd.io/_uploads/BJWbg_lLp.png)
![image](https://hackmd.io/_uploads/HJUMxOgUT.png)

### Full Adder

A full adder is a digital circuit that performs addition on three binary digits. The three input bits are usually labeled A, B, and Cin (carry-in), and the outputs are the sum (S) and the carry-out (Cout). The full adder takes into account not only the two bits to be added (A and B) but also a carry bit from the previous stage of addition (Cin).

#### Truth Table:

![image](https://hackmd.io/_uploads/r1PG-_gIT.png)

- [ ] As you did with the inverter, try other single-gate logic expressions.
- [ ] If you are new to hardware description languages (HDLs), try coding the full adder circuit as depicted above.
  Try it first with each logic gate as a separate statement, then try combining the three gates producing $carry_out into a single statement with parentheses to group subexpressions.

![image](https://hackmd.io/_uploads/SkBKgugUT.png)

### More Syntax

In TL-Verilog, the most common data types are booleans (as you used in the previous lab) and bit vectors. A vector is declared by providing a bit range in its assignment, like so: ```$vect[7:0] = ....;```

Bit ranges are generally not required on the right-hand side of an expression. When they are used, they extract a subrange of bits from a vector signal.

In Verilog and TL-Verilog, arithmetic operators, like +, -, *, /, and % (modulo), can be used on vectors. Without these operators, an adder circuit would have to be constructed by replicating the full adder circuit we looked at earlier for each bit position in the adder to create the "ripple-carry adder" circuit depicted below.

![image](https://hackmd.io/_uploads/SyLZfOlU6.png)

### Full Adder Vectorized

```$out[7:0] = $in1[6:0] + $in2[6:0];```

![image](https://hackmd.io/_uploads/rJKvG_lI6.png)

### Muxes

One of the most important logic functions is a multiplexer (or MUX), depicted below.

![image](https://hackmd.io/_uploads/SkhtGueIp.png)

A multiplexer selects between two or more inputs (which can be binary values, vectors, or any other data type). The select line(s) identify the input to drive to the output. Most often, the select will be either a binary-encoded input index or a "one-hot" vector in which each bit of the vector corresponds to an input. One and only one of the bits will be asserted to select the corresponding input value.

The MUX depicted in the "Two-way single-bit multiplexer" graphic above can be constructed from basic logic gates, as seen below. We might read this implementation as "assert the output if X1 is asserted and selected (by S == 1) OR X2 is asserted and selected (by S == 0)".

TL-Verilog favors the use of the ternary (?
:) operator, and we will stick with this throughout the course. In its simplest form, the ternary operator is:

```$out = $sel ? $in1 : $in0;```

This can be read, "$out is: if $sel then $in1 otherwise $in0."

The ternary operator can be chained to implement multiplexers with more than two input values from which to select. And these inputs can be vectors. We will use very specific code formatting for consistency, illustrated below for a four-way, 8-bit wide multiplexer with a one-hot select. (Here, $in0 through $in3 must be 8-bit vectors.)

```
$out[7:0] =
   $sel[3] ? $in3 :
   $sel[2] ? $in2 :
   $sel[1] ? $in1 :
   //default
             $in0;
```

![image](https://hackmd.io/_uploads/SJrGQOlLp.png)

### Calculator

This circuit implements a calculator that can perform +, -, *, or / on two input values. Provide an expression for each of the signals named below, and be sure to use the exact names shown and the select encodings shown on the MUX inputs. This will be important later.

![image](https://hackmd.io/_uploads/S1Mjm_g8a.png)

Results

![image](https://hackmd.io/_uploads/H1wrv_eLp.png)

```
$sum[31:0] = $val1[31:0] + $val2[31:0];
$sub[31:0] = $val1[31:0] - $val2[31:0];
$mul[31:0] = $val1[31:0] * $val2[31:0];
$div[31:0] = $val1[31:0] / $val2[31:0];
$out[31:0] = $op[1:0] == 2'b00 ? $sum :
             $op == 2'b01 ? $sub :
             $op == 2'b10 ? $mul :
                            $div ;
```

### Even more syntax

#### Literals
If you are familiar with Verilog expression syntax, you may safely skip this "Concept and Syntax" section.

This expression: ```$foo[7:0] = 6;``` defines $foo to hold a constant value of 6. In this case, the 6 is coerced to eight bits by the assignment.

Often, it is necessary to be explicit about the width of a literal: ```$foo[7:0] = 8'd6;``` explicitly assigns $foo to an 8-bit decimal ("d") value of 6. (To be clear, the `'` is the single-quote character.)
Equivalently, we could have written: ```$foo[7:0] = 8'b110; // 8-bit binary six``` or ```$foo[7:0] = 8'h6; // 8-bit hexadecimal```

#### Concatenation
Concatenation of bit vectors is simply the combining of two bit vectors one after the other to form a wider bit vector. The syntax is clear from this example: ```$word[15:0] = {$upper_byte, $lower_byte};```

### How do we know if it works?

We can use concatenation in our previous code to limit the inputs so that the results depend on only 4 random bits per operand rather than 32, which makes them easy to check by hand:

```
$val1[31:0] = {28'b0,$val1_rand[3:0]};
$val2[31:0] = {28'b0,$val2_rand[3:0]};
$sum[31:0] = $val1[31:0] + $val2[31:0];
$sub[31:0] = $val1 - $val2;
$mul[31:0] = $val1 * $val2;
$div[31:0] = $val1 / $val2;
$out[31:0] = $op[1:0] == 2'b00 ? $sum :
             $op == 2'b01 ? $sub :
             $op == 2'b10 ? $mul :
                            $div ;
```

### Adding some visuals

- [ ] Paste this single line below the "m4_makerchip_module" line to include the visualization library: `m4_include_lib(['https://raw.githubusercontent.com/stevehoover/LF-Building-a-RISC-V-CPU-Core/main/lib/calc_viz.tlv'])`. It may be necessary to correct the single-quote characters by retyping them after cut-and-pasting.
- [ ] Add this line as the last line in the \TLV region: "m4+calc_viz()" to instantiate the visualization. Press Ctrl-Enter.

Result:

![image](https://hackmd.io/_uploads/ByEGpuxIT.png)

### Sequential Logic

Sequential logic introduces a clock signal.

![image](https://hackmd.io/_uploads/B1gTp_xIT.png)

The clock is driven throughout the circuit to "flip-flops" which sequence the logic. Flip-flops come in various flavors, but the simplest and most common type of flip-flop, and the only one we will concern ourselves with, is called a "positive-edge-triggered D-type flip-flop". These drive the value at their input to their output, but only when the clock rises. They hold their output value until the next rising edge of their clock input.
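
In TL-Verilog you do not instantiate a flip-flop explicitly. Referring to the value a signal had one cycle earlier with `>>1` implies one. The fragment below is a minimal sketch, assuming it is placed inside the \TLV region with 3 spaces of indentation; `$din` and `$dout` are hypothetical names chosen only for illustration, and `$din` is left unassigned so that Makerchip supplies random stimulus for it, as with `$in1` in the inverter lab.

```verilog
   // Hypothetical names: $din is the flip-flop input, $dout is its output.
   // >>1$din is the value $din had one clock cycle earlier, which is what a
   // positive-edge-triggered D flip-flop presents at its output.
   $dout = >>1$din;
```

In the waveform viewer, `$dout` should follow `$din` delayed by one cycle.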
![image](https://hackmd.io/_uploads/H1hT6de86.png)

Before getting too theoretical about sequential logic, let’s look at an example: a circuit that computes the Fibonacci sequence. Each number in the Fibonacci sequence is the sum of the previous two numbers: 1, 1, 2, 3, 5, 8, 13, … This circuit will perpetually compute the next number in the sequence:

![image](https://hackmd.io/_uploads/HkAWA_x8T.png)

### Fibonacci

![image](https://hackmd.io/_uploads/H1cVAdxI6.png)

```$num[31:0] = $reset ? 1 : (>>1$num + >>2$num);```

![image](https://hackmd.io/_uploads/rkPLCugUT.png)

### Counter

![image](https://hackmd.io/_uploads/BJcuCux8a.png)

```
$reset = *reset;
$cnt[15:0] = $reset ? 16'b0 : >>1$cnt + 1;
```

### Sequential Calculator

Remember how old calculators work? We are going to modify our previous calculator to work like one of those.

A real (old-school) calculator displays the result of each calculation. It holds onto this result value and uses it as the first operand in the next computation. If you enter "+ 3" in the calculator, it adds three to the previous result.

Let’s update our calculator to act like this. Each cycle, we’ll perform a new calculation, based on the previous result. This previous result is state. And wherever we have state, we must have a $reset that will set that state to a known value. As in a real calculator, we will reset the value to zero. To recirculate the result ($out), and reset it to zero, we would have:

![image](https://hackmd.io/_uploads/HyQFWKxUp.png)

- [ ] Return to your combinational calculator project.
- [ ] Assign $val1[31:0] to the previous value of $out (replacing its old assignment).
- [ ] Add a $reset signal and a new (highest priority) MUX input to reset $out to zero.

```
$reset = *reset;
$val1[31:0] = >>1$out;
$val2[31:0] = {28'b0,$val2_rand[3:0]};
$sum[31:0] = $val1[31:0] + $val2[31:0];
$sub[31:0] = $val1 - $val2;
$mul[31:0] = $val1 * $val2;
$div[31:0] = $val1 / $val2;
$out[31:0] = $reset ? 32'b0 :
             $op[1:0] == 2'b00 ? $sum :
             $op == 2'b01 ? $sub :
             $op == 2'b10 ? $mul :
                            $div ;
```

## Lab 2

Likely, you have experience writing programs in languages like Python, JavaScript, Java, C++, etc. These languages are portable and can run on just about any CPU hardware. CPUs do not execute these languages directly. They execute raw machine instructions that have been encoded into bits as defined by an instruction set architecture (ISA). Popular ISAs include x86, ARM, MIPS, RISC-V, etc.

A compiler does the job of translating a program’s source code into a binary file or executable containing machine instructions for a particular ISA. An operating system (and perhaps a runtime environment) does the job of loading the binary file into memory for execution by the CPU hardware that understands the given ISA.

![image](https://hackmd.io/_uploads/ryHIMYxIa.png)

### RISC-V

![image](https://hackmd.io/_uploads/H1l0fYe86.png)

RISC-V is popular for its simplicity and extensibility, which makes it a great choice for this course. "RISC", in fact, stands for "reduced instruction set computing" and contrasts with "complex instruction set computing" (CISC). RISC-V (pronounced "risk five") is the fifth in a series of RISC ISAs from UC Berkeley. You will implement the core instructions of the base RISC-V instruction set (RV32I), which contains just 47 instructions. Of these, you will implement 31. (Of the remaining 16, 10 have to do with the surrounding system, and 6 provide support for storing and loading small values to and from memory.)

Like other RISC ISAs, RISC-V is a load-store architecture. It contains a register file capable of storing up to 32 values (well, actually 31, because register x0 always reads as zero). Most instructions read from and write back to the register file. Load and store instructions transfer values between memory and the register file.
RISC-V instructions may provide the following fields:

* opcode: Provides a general classification of the instruction and determines which of the remaining fields are needed, and how they are laid out, or encoded, in the remaining instruction bits.
* function field (funct3/funct7): Specifies the exact function performed by the instruction, if not fully specified by the opcode.
* rs1/rs2: The indices (0-31) identifying the register(s) in the register file containing the source operand values on which the instruction operates.
* rd: The index (0-31) of the register into which the instruction’s result is written.
* immediate: A value contained within the instruction bits themselves. This value may provide an offset for indexing into memory or a value upon which to operate (in place of the register value indexed by rs2).

All instructions are 32 bits. The R-type encoding provides a general layout of the instruction fields used by all instruction types. R-type instructions have no immediate value. Other instruction types use a subset of the R-type fields and provide an immediate value in the remaining bits.

![image](https://hackmd.io/_uploads/SJsJXYlLa.png)

### Starting point:

https://makerchip.com/sandbox?code_url=https:%2F%2Fraw.githubusercontent.com%2Fstevehoover%2FLF-Building-a-RISC-V-CPU-Core%2Fmaster%2Frisc-v_shell.tlv

### Layout

![image](https://hackmd.io/_uploads/rJvzNFg8T.png)

1. PC Logic
   This logic is responsible for the program counter (PC). The PC identifies the instruction our CPU will execute next. Most instructions execute sequentially, meaning the default behavior of the PC is to increment to the following instruction each clock cycle. Branch and jump instructions, however, are non-sequential. They specify a target instruction to execute next, and the PC logic must update the PC accordingly.
2. Fetch
   The instruction memory (IMem) holds the instructions to execute. To read the IMem, or "fetch", we simply pull out the instruction pointed to by the PC.
3. Decode Logic
   Now that we have an instruction to execute, we must interpret, or decode, it. We must break it into fields based on its type. These fields tell us which registers to read, which operation to perform, etc.
4. Register File Read
   The register file is a small local storage of values the program is actively working with. We decoded the instruction to determine which registers we need to operate on. Now, we need to read those registers from the register file.
5. Arithmetic Logic Unit (ALU)
   Now that we have the register values, it’s time to operate on them. This is the job of the ALU. It will add, subtract, multiply, shift, etc., based on the operation specified in the instruction.
6. Register File Write
   Now the result value from the ALU can be written back to the destination register specified in the instruction.
7. DMem
   Our test program executes entirely out of the register file and does not require a data memory (DMem). But no CPU is complete without one. The DMem is written to by store instructions and read from by load instructions.

In this course, we are focused on the CPU core only. We are ignoring all of the logic that would be necessary to interface with the surrounding system, such as input/output (I/O) controllers, interrupt logic, system timers, etc.

Notably, we are making simplifying assumptions about memory. A general-purpose CPU would typically have a large memory holding both instructions and data. At any reasonable clock speed, it would take many clock cycles to access memory. Caches would be used to hold recently-accessed memory data close to the CPU core. We are ignoring all of these sources of complexity. We are choosing to implement separate, and very small, instruction and data memories. It is typical to implement separate, single-cycle instruction and data caches, and our IMem and DMem are not unlike such caches.
### PC Logic

![image](https://hackmd.io/_uploads/B1V5VKlLT.png)

Initially, we will implement only sequential fetching, so the PC update will be, for now, simply a counter. Note that:

* The PC is a byte address, meaning it references the first byte of an instruction in the IMem. Instructions are 4 bytes long, so, although the PC increment is depicted as "+1" (instruction), the actual increment must be by 4 (bytes). The lowest two PC bits must always be zero in normal operation.
* Instruction fetching should start from address zero, so the first $pc value with $reset deasserted should be zero, as is implemented in the logic diagram below.
* Unlike our earlier counter circuit, for readability, we use unique names for $pc and $next_pc, by assigning $pc to the previous $next_pc.

![image](https://hackmd.io/_uploads/B122EYlUa.png)

```
$reset = *reset;
$pc[31:0] = >>1$next_pc;
$next_pc[31:0] = $reset ? 32'b0 : $pc[31:0] + 32'd4;
```

### IMEM

![image](https://hackmd.io/_uploads/BJwDLYl8a.png)

We will implement our IMem by instantiating a Verilog macro. This macro accepts a byte address as input, and produces the 32-bit read data as output. The macro can be instantiated, for example, as:

```
`READONLY_MEM($addr, $$read_data[31:0])
```

Verilog macro instantiation is preceded by a back-tick (not to be confused with a single quote).

- [ ] Instantiate the READONLY_MEM macro after your PC logic, providing $pc as the address and $$instr[31:0] as the output. Be sure to align this with other statements, always using three spaces of indentation.

![image](https://hackmd.io/_uploads/Bkz_IKxIa.png)

### Decode Logic

![image](https://hackmd.io/_uploads/Hkb2LFxUT.png)

Now that we have an instruction, let’s figure out what it is.
Remember, RISC-V defines various instruction types that define the layout of the fields of the instruction, according to this table from the RISC-V specifications:

![image](https://hackmd.io/_uploads/By1NwteIT.png)

Before we can interpret the instruction, we must know its type. This is determined by its opcode, in $instr[6:0]. In fact, $instr[1:0] must be 2'b11 for valid RV32I instructions. We will assume all instructions to be valid, so we can simply ignore these two bits. The ISA defines the instruction type to be determined as follows.

![image](https://hackmd.io/_uploads/Sk34vKgUa.png)

You'll assign a boolean signal for each instruction type that indicates whether the instruction is of that type. For example, we could decode U-type as:

```
$is_u_instr = $instr[6:2] == 5'b00101 ||
              $instr[6:2] == 5'b01101;
```

- [ ] Add this assignment statement to your code and write the remaining 5 statements for I, R, S, B, and J instruction types. (Gray cells can be ignored as these are not used in RV32I.)

### Decode Logic: Instruction Fields

Now, based on the instruction type, we can extract the instruction fields. Most fields always come from the same bits regardless of the instruction type but only have meaning for certain instruction types. The imm field, an "immediate" value embedded in the instruction itself, is the exception. It is constructed from different bits depending on the instruction type.

![image](https://hackmd.io/_uploads/Hksh-JZ86.png)

Let’s start with the simpler, non-immediate fields: $funct3, $rs1, $rs2, $rd, $opcode. We will not use $funct7, so you can skip this field.

Check the box for each completed step, to ensure none is skipped.

- [ ] Extract these fields, for example: ```$rs2[4:0] = $instr[24:20];```
- [ ] Determine when these fields are valid (excluding $opcode, which is always valid).
  For example:

```
$rs2_valid = $is_r_instr || $is_s_instr || $is_b_instr;
```

Provide $imm_valid as well, asserting for all types but R, even though we haven’t determined $imm yet.

#### Immediate

The immediate value is a bit more complicated. It is composed of bits from different fields depending on the type.

![image](https://hackmd.io/_uploads/rkyUfkZ8a.png)

The immediate value for I-type instructions, for example, is formed from 21 copies of instruction bit 31, followed by $instr[30:20] (which is broken into three fields above for consistency with other formats).

The immediate field can be formed, based on this table, using a logic expression like the following. It uses a combination of bit extraction (e.g. $instr[30:20]), bit replication (e.g. {21{...}}), and bit concatenation (e.g. {..., ...}):

```
$imm[31:0] = $is_i_instr ? { {21{$instr[31]}}, $instr[30:20] } :
             $is_s_instr ? {...} :
             ...
                           32'b0;  // Default
```

### Decode Logic: Instruction

![image](https://hackmd.io/_uploads/S1XI41bIa.png)

For convenience, concatenate the relevant fields into a single bit vector signal, as: ```$dec_bits[10:0] = {$instr[30],$funct3,$opcode};```

- [ ] For each of the instructions circled in red (we’ll come back and do the rest later), determine if $dec_bits identifies this instruction. For example: ```$is_beq = $dec_bits ==? 11'bx_000_1100011;```

Note that underscores here are optional and help delimit fields. Also note, we’re using "x" as a don’t-care for the $instr[30] bit, which is not used by BEQ (or any other instruction in the left column).

### Register File Read

![image](https://hackmd.io/_uploads/SJjZIybUa.png)

Like our mini IMem, the register file is a pretty typical array structure, so we can find a library component for it. This time, rather than using a Verilog module or macro as we did for IMem, we will use a TL-Verilog array definition, expanded by the M4 macro preprocessor.
```
//m4+rf(32, 32, $reset, $wr_en, $wr_index[4:0], $wr_data[31:0], $rd1_en, $rd1_index[4:0], $rd1_data, $rd2_en, $rd2_index[4:0], $rd2_data)
```

This would instantiate a 32-entry, 32-bit-wide register file connected to the given input and output signals, as depicted below. Each of the two read ports requires an index to read from and an enable signal that must assert when a read is required, and it produces read data as output (on the same cycle).

![image](https://hackmd.io/_uploads/S1aN8kbIp.png)

Please delete the commented-out m4+rf() macro and replace it with:

```
m4+rf(32, 32, $reset, $rd_valid, $rd[4:0], $result_write_rf[31:0], $rs1_valid, $rs1, $src1_value, $rs2_valid, $rs2, $src2_value)
```

Make sure your signal names match the ones presented here, or adjust them accordingly.

### Arithmetic Logic Unit

![image](https://hackmd.io/_uploads/S176UJWIa.png)

Now, you have source values to operate on, so let’s create the ALU. The ALU is much like our initial calculator circuit. It computes, for each possible instruction, the result that it would produce. It then selects, based on the actual instruction, which of those results is the correct one.

At this point, we are only going to implement support for the instructions in our test program. Since branch instructions do not produce a result value, we only need to support ADDI (which adds the immediate value to source register 1) and ADD (which adds the two source register values).

Check the box once you have completed this step.

- [ ] Use a structure like the following to assign the ALU $result in a single assignment expression for ADDI and ADD instructions:

```
$result[31:0] =
   $is_addi ? $src1_value + $imm :
   ...
              32'b0;
```

### Week 1 Code

```
\m4_TLV_version 1d: tl-x.org
\SV
   m4_include_lib(['https://raw.githubusercontent.com/stevehoover/warp-v_includes/1d1023ccf8e7b0a8cf8e8fc4f0a823ebb61008e3/risc-v_defs.tlv'])
   m4_include_lib(['https://raw.githubusercontent.com/stevehoover/LF-Building-a-RISC-V-CPU-Core/main/lib/risc-v_shell_lib.tlv'])

   m4_test_prog()

\SV
   m4_makerchip_module   // (Expanded in Nav-TLV pane.)
   /* verilator lint_on WIDTH */
\TLV

   $reset = *reset;

   // Program counter: reset to 0, redirect on taken branches and jumps, otherwise +4
   $pc[31:0] = >>1$next_pc;
   $next_pc[31:0] = $reset    ? 32'b0 :
                    $taken_br ? $br_tgt_pc :
                    $is_jal   ? $br_tgt_pc :
                    $is_jalr  ? $jalr_tgt_pc :
                                ($pc[31:0] + 32'd4);

   // Macro instantiation for instruction fetch
   `READONLY_MEM($pc, $$instr[31:0]);

   // Instruction type classification
   $is_u_instr = $instr[6:2] ==? 5'b0x101;
   $is_i_instr = $instr[6:2] ==? 5'b0000x || $instr[6:2] ==? 5'b001x0 || $instr[6:2] == 5'b11001;
   $is_r_instr = $instr[6:2] ==? 5'b011x0 || $instr[6:2] == 5'b01011 || $instr[6:2] == 5'b10100;
   $is_s_instr = $instr[6:2] ==? 5'b0100x;
   $is_b_instr = $instr[6:2] == 5'b11000;
   $is_j_instr = $instr[6:2] == 5'b11011;

   // Loads are identified by opcode only (0000011); stores (0100011) must not match
   $is_load = ($opcode == 7'b0000011);

   // Extracting fields from the instruction
   $rs2[4:0]    = $instr[24:20];
   $funct7[6:0] = $instr[31:25];
   $rs1[4:0]    = $instr[19:15];
   $funct3[2:0] = $instr[14:12];
   $rd[4:0]     = $instr[11:7];
   $opcode[6:0] = $instr[6:0];

   // Validity of these fields (writes to x0 are suppressed)
   $rd_valid  = ~($is_s_instr || $is_b_instr || $instr[11:7] == 5'b0);
   $imm_valid = ~$is_r_instr;
   $rs1_valid = ~($is_u_instr || $is_j_instr);
   $rs2_valid = ($is_r_instr || $is_s_instr || $is_b_instr);
   `BOGUS_USE($rd $rd_valid $rs1 $rs1_valid $rs2 $rs2_valid $funct3 $funct7 $imm $imm_valid);

   // Extracting the sign-extended immediate for each instruction type
   $imm[31:0] = $is_i_instr ? {{21{$instr[31]}}, $instr[30:20]} :
                $is_s_instr ? {{21{$instr[31]}}, $instr[30:25], $instr[11:8], $instr[7]} :
                $is_b_instr ? {{20{$instr[31]}}, $instr[7], $instr[30:25], $instr[11:8], 1'b0} :
                $is_u_instr ? {$instr[31], $instr[30:20], $instr[19:12], 12'b0} :
                $is_j_instr ? {{12{$instr[31]}}, $instr[19:12], $instr[20], $instr[30:21], 1'b0} :
                              32'b0 ;

   // Decoding the instruction
   $dec_bits[10:0] = {$funct7[5], $funct3, $opcode};
   $is_beq  = $dec_bits ==? 11'bx0001100011;
   $is_bne  = $dec_bits ==? 11'bx0011100011;
   $is_blt  = $dec_bits ==? 11'bx1001100011;
   $is_bge  = $dec_bits ==? 11'bx1011100011;
   $is_bltu = $dec_bits ==? 11'bx1101100011;
   $is_bgeu = $dec_bits ==? 11'bx1111100011;
   $is_addi = $dec_bits ==? 11'bx0000010011;
   $is_add  = $dec_bits ==? 11'b00000110011;
   // Jumps, needed by the $next_pc selection above
   $is_jal  = $dec_bits ==? 11'bxxxxx1101111;
   $is_jalr = $dec_bits ==? 11'bx0001100111;

   $result[31:0] = $is_addi ? $src1_value + $imm :
                   $is_add  ? $src1_value[31:0] + $src2_value[31:0] :
                              32'b0;

   // Branch condition MUX
   $taken_br = $is_beq  ? ($src1_value == $src2_value ? 1'b1 : 1'b0) :
               $is_bne  ? ($src1_value != $src2_value ? 1'b1 : 1'b0) :
               $is_blt  ? (($src1_value < $src2_value) ^ ($src1_value[31] != $src2_value[31]) ? 1'b1 : 1'b0) :
               $is_bge  ? (($src1_value >= $src2_value) ^ ($src1_value[31] != $src2_value[31]) ? 1'b1 : 1'b0) :
               $is_bltu ? ($src1_value < $src2_value ? 1'b1 : 1'b0) :
               $is_bgeu ? ($src1_value >= $src2_value ? 1'b1 : 1'b0) :
                          1'b0 ;

   // Branch and jump targets, needed by the $next_pc selection above
   $br_tgt_pc[31:0]   = $pc + $imm;
   $jalr_tgt_pc[31:0] = $src1_value + $imm;

   // Assert these to end simulation (before Makerchip cycle limit).
   m4+tb()
   *failed = *cyc_cnt > M4_MAX_CYC;

   m4+rf(32, 32, $reset, $rd_valid, $rd[4:0], $result[31:0], $rs1_valid, $rs1, $src1_value, $rs2_valid, $rs2, $src2_value)
   m4+cpu_viz()
\SV
   endmodule
```

## Lab 3

### Branching

![image](https://hackmd.io/_uploads/BkC5WGQva.png)

A conditional branch instruction will branch to a target PC if its condition is true. Conditions are a comparison of the two source register values. Implementing conditional branch instructions will require:

* Determining whether the instruction is a branch that is taken (`$taken_br`).
* Computing the branch target (`$br_tgt_pc`).
* Updating the PC (`$pc`) accordingly.
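
The second and third requirements can be sketched as follows. This is a minimal sketch rather than the course solution: it assumes `$pc`, `$imm`, `$reset`, and `$taken_br` are already assigned elsewhere in the \TLV region, and it relies on the RISC-V rule that a branch target is the branch's own PC plus its sign-extended immediate.

```verilog
   // Branch target: PC of the branch instruction plus the B-type immediate.
   $br_tgt_pc[31:0] = $pc + $imm;

   // PC update: reset has priority, then a taken branch, then sequential fetch.
   $next_pc[31:0] = $reset    ? 32'b0 :
                    $taken_br ? $br_tgt_pc :
                                $pc + 32'd4;
```

This `$next_pc` assignment would replace the sequential-only one from the PC Logic step, since a signal may be assigned only once.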
![image](https://hackmd.io/_uploads/r17pZfXv6.png)

Similar to the structure of the ALU, you’ll determine whether a branch is to be taken by selecting the appropriate comparison result.

![image](https://hackmd.io/_uploads/SJYCbzXDT.png)

### Rest of the decoder

![image](https://hackmd.io/_uploads/BkMiXGQwa.png)

With the exception of load and store instructions (LB, LH, LW, LBU, LHU, SB, SH, SW), complete the decode logic for the remaining non-circled instructions above ($is_<instr> = …). Remember, you can use "x" for don't-care bits. Our implementation will treat all loads and all stores the same, so assign $is_load based on opcode only. $is_s_instr already identifies stores, so we do not need any additional decode logic for stores.

### Rest of the ALU

Now we will add support in the ALU for the remaining instructions. We do this by extending the assignment statement for $result. Since there will be an expression for almost every instruction, there is a lot of code to write here. We’ll provide the expressions, but we’ll ask you to do the typing yourself so you have a chance to reflect on each instruction.

A few have common subexpressions, so let’s first create assignments for these subexpressions. Finish implementing the instructions marked in blue:

![image](https://hackmd.io/_uploads/BJa4SzmDp.png)

```
// SLTU and SLTIU (set if less than, unsigned) results:
$sltu_rslt[31:0]  = {31'b0, $src1_value < $src2_value};
$sltiu_rslt[31:0] = {31'b0, $src1_value < $imm};

// SRA and SRAI (shift right, arithmetic) results:
//   sign-extended src1
$sext_src1[63:0] = {{32{$src1_value[31]}}, $src1_value};
//   64-bit sign-extended results, to be truncated
$sra_rslt[63:0]  = $sext_src1 >> $src2_value[4:0];
$srai_rslt[63:0] = $sext_src1 >> $imm[4:0];
```

```
$result[31:0] = $is_addi  ? $src1_value + $imm :
                $is_add   ? $src1_value[31:0] + $src2_value[31:0] :
                $is_andi  ? $src1_value & $imm :
                $is_lui   ? {$imm[31:12], 12'b0} :
                $is_auipc ? $pc + {$imm[31:12], 12'b0} :
                $is_jal   ? $pc + 32'd4 :
                $is_jalr  ? $pc + 32'd4 :
                $is_slt   ? (($src1_value[31] == $src2_value[31]) ? $sltu_rslt : {31'b0, $src1_value[31]}) :
                $is_slti  ? (($src1_value[31] == $imm[31]) ? $sltiu_rslt : {31'b0, $src1_value[31]}) :
                            32'b0;
```

### Jump Logic

![image](https://hackmd.io/_uploads/Hkhqrf7wa.png)

The ISA, in addition to conditional branches, also supports jump instructions (which some other ISAs refer to as "unconditional branches"). RISC-V has two forms of jump instructions:

* JAL: Jump and link. Jumps to PC + IMM (like branches, so this target is $br_tgt_pc, already assigned).
* JALR: Jump and link register. Jumps to SRC1 + IMM.

"And link" refers to the fact that these instructions capture their original PC + 4 in a destination register, as you already coded in the ALU. (The link register is particularly useful for jumps that are used to implement function calls, which must return to the link address after function execution.)

Check the box for each completed step, to ensure none is skipped.

- [ ] Compute `$jalr_tgt_pc[31:0]` (SRC1 + IMM).
- [ ] Update the PC logic to select the correct `$next_pc` for JAL (`$br_tgt_pc`) and JALR (`$jalr_tgt_pc`).

### Memory

So far, all of our instructions are operating on register values. What good is a CPU if it has no memory? Let’s add some.

But first, let’s prepare the load and store instructions that will read from and write to this memory. Both load and store instructions require an address from which to read, or to which to write. As with the IMem, this is a byte address. Loads and stores can read/write single bytes, half-words (2 bytes), or words (4 bytes/32 bits).

![image](https://hackmd.io/_uploads/HJyIUM7PT.png)

The address for loads/stores is computed based on the value from a source register and an offset value (often zero) provided as the immediate.
`addr = rs1 + imm`

A load instruction (LW, LH, LB, LHU, LBU) takes the form:

`LOAD rd, imm(rs1)`

It uses the I-type instruction format:

![image](https://hackmd.io/_uploads/BkguUfXwp.png)

It writes its destination register with a value read from the specified address of memory, which we can denote as:

`rd <= DMem[addr]` (where `addr = rs1 + imm`)

### Store

A store instruction (SW, SH, SB) takes the form:

`STORE rs2, imm(rs1)`

It has its own S-type instruction format:

![image](https://hackmd.io/_uploads/S1b9IGmwT.png)

It writes the specified address of memory with a value from the rs2 source register:

`DMem[addr] <= rs2` (where `addr = rs1 + imm`)

- [ ] The address computation, rs1 + imm, is the same computation performed by ADDI. Since load/store instructions do not otherwise require the ALU, we will utilize the ALU for this computation.
- [ ] For loads/stores (`$is_load`/`$is_s_instr`), compute `$result` as the address (rs1 + imm).

### Data Memory

![image](https://hackmd.io/_uploads/SJzZwMmDp.png)

To keep our simulations zippy, we’ll instantiate a very small data memory, the same size as our register file. Unlike our register file, which can read two values and write one value in the same cycle, our memory needs only to read one value or write one value each cycle to process a load or a store instruction.

Similar to our register file, our DMem is word-granular. Recall that we are supporting only word loads/stores with naturally-aligned addresses (so the lower two address bits are assumed to be zero).

Based on the discussion above:

* write is enabled for stores (`$is_s_instr`)
* read is enabled for loads (`$is_load`)
* the ALU result (`$result`) provides the read/write address; this is a byte address, while our memory is indexed by 32-bit words
* rs2 (`$src2_value`) provides the write data
* the only output of the DMem is the load data (which we'll call `$ld_data`).
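The connections listed above can be sketched as follows. This is a hedged sketch, not the definitive solution: it assumes the argument order of the commented `m4+dmem` template (entries, width, reset, word address, write enable, write data, read enable, read data), that `$is_load` was decoded from the opcode as described earlier, and that the ALU's `$result` chain already has an arm for the load/store address.

```
   // Sketch only; signal names follow the lab text.
   // Assumed arm inside the existing $result chain, computing the byte address:
   //    ($is_load || $is_s_instr) ? $src1_value + $imm :

   // $result is a byte address. Dropping its two low bits gives a word index,
   // and five bits ([6:2]) are enough to select one of the 32 words.
   m4+dmem(32, 32, $reset, $result[6:2], $is_s_instr, $src2_value[31:0], $is_load, $ld_data)
```

Using bits `[6:2]` rather than `[4:0]` is what converts the byte address into a word address; with `[4:0]`, consecutive words would land on entries four apart.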
Similar to what we did for the register file, there is a commented macro instantiation for `m4+dmem(32, 32, $reset, $addr[4:0], $wr_en, $wr_data[31:0], $rd_en, $rd_data)`. Uncomment it. Provide proper macro arguments to connect the correct input and output signals. Be sure to extract the appropriate bits of the byte address to drive the DMem's word address. Since the memory has a single read port, fewer arguments are needed for the DMem than for the RF.

The load data (`$ld_data`) coming from DMem must be written to the register file. A new multiplexer is needed to select `$ld_data` for load instructions, as depicted in the figure.

- [ ] Add this new multiplexer to write `$ld_data`, rather than `$result`, to the register file when `$is_load` asserts.

## Congrats! You have your own functioning RISC-V Core.