
# Extend Lab3 to comply with Vector extension

:::danger
Still in progress, 1/14 newly updated.
:::

## Usage Example

>memcpy.s

```asm=
    .text
    .balign 4
    .global memcpy
    # void *memcpy(void* dest, const void* src, size_t n)
    # a0=dest, a1=src, a2=n
    #
memcpy:
    mv a3, a0                        # Copy destination
loop:
    vsetvli t0, a2, e8, m8, ta, ma   # Vectors of 8b
    vle8.v v0, (a1)                  # Load bytes
    add a1, a1, t0                   # Bump pointer
    sub a2, a2, t0                   # Decrement count
    vse8.v v0, (a3)                  # Store bytes
    add a3, a3, t0                   # Bump pointer
    bnez a2, loop                    # Any more?
    ret                              # Return
```

* The instructions to be implemented are listed below:

| Instruction | Type | Opcode | Task |
|:----------:|:--------:|:-------:|:--------:|
| vsetvli | vcfg | 1010111 (0x57) | set the vtype and vl CSRs |
| vle8.v | v_load | 0000111 (0x07) | load |
| vse8.v | v_store | 0100111 (0x27) | store |

:::warning
REMINDER:
* VLEN = 128 (VLMAX = LMUL × VLEN / SEW)
* Widen the 32-bit ALU to a 128-bit ALU
* Add the new instructions to InstructionDecode
* Add a new 128-bit register file
:::

## Instructions

:::info
**vsetvli**
A set of instructions is provided to allow rapid configuration of the values in vl and vtype to match application needs. The vsetvli instructions set the vtype and vl CSRs based on their arguments, and write the new value of vl into rd.
In short, vsetvli sets the vtype and vl CSRs and writes the new value of vl to rd.
:::

```asm
vsetvli rd, rs1, vtypei   # rd = new vl, rs1 = AVL, vtypei = new vtype setting
```

>usage:

```asm=
# ta = Tail agnostic
# tu = Tail undisturbed
# ma = Mask agnostic
# mu = Mask undisturbed
# e8 = SEW (selected element width) of 8 bits, m4 = LMUL (register group multiplier) of 4
vsetvli t0, a0, e8, m4, ta, ma   # Tail agnostic, mask agnostic
vsetvli t0, a0, e8, m4, tu, ma   # Tail undisturbed, mask agnostic
vsetvli t0, a0, e8, m4, ta, mu   # Tail agnostic, mask undisturbed
vsetvli t0, a0, e8, m4, tu, mu   # Tail undisturbed, mask undisturbed
```

| 0 | zimm[10:0] | rs1 | 111 | rd | Opcode |
|:----:|:---:|:---:|:---:|:----:|:-------:|
| [31] | [30:20] | [19:15] | [14:12] | [11:7] | [6:0] |

![image](https://hackmd.io/_uploads/r1fv9ERu6.png)

---

:::info
**vle8.v**
vle8.v performs a unit-stride load of 8-bit elements from memory into a vector register.
:::

```asm
# Vector unit-stride loads
# vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>)
vle8.v    vd, (rs1), vm  #  8-bit unit-stride load
vle16.v   vd, (rs1), vm  # 16-bit unit-stride load
vle32.v   vd, (rs1), vm  # 32-bit unit-stride load
vle64.v   vd, (rs1), vm  # 64-bit unit-stride load
```

| nf | mew | mop | vm | lumop | rs1 | width | vd | Opcode |
|:---:|:---:|:---:|:----:|:----------:|:-------:|:-------:|:------:|:-------:|
| [31:29] | [28] | [27:26] | [25] | [24:20] | [19:15] | [14:12] | [11:7] | [6:0] |

![image](https://hackmd.io/_uploads/Hkk38BA_a.png)

---

:::info
**vse8.v**
vse8.v performs a unit-stride store of 8-bit elements from a vector register to memory.
:::

```asm
# Vector unit-stride stores
# vs3 store data, rs1 base address, vm is mask encoding (v0.t or <missing>)
vse8.v    vs3, (rs1), vm  #  8-bit unit-stride store
vse16.v   vs3, (rs1), vm  # 16-bit unit-stride store
vse32.v   vs3, (rs1), vm  # 32-bit unit-stride store
vse64.v   vs3, (rs1), vm  # 64-bit unit-stride store
```

![image](https://hackmd.io/_uploads/H1b9YL0_a.png)

# Current Progress

## Adding Parameters

To add the new vector registers, ```riscv/Parameters.scala``` must be modified accordingly. Two new values are added to the `Parameters` object:

```scala=
val NumVectorRegisters = 32   //++++++
val VectorRegisterBits = 128  //++++++
```

## Decoding

Refer to the [vmem instruction format](https://github.com/riscv/riscv-v-spec/blob/master/vmem-format.adoc) and the [vcfg instruction format](https://github.com/riscv/riscv-v-spec/blob/master/vcfg-format.adoc).

* **vsetvli**
  >opcode: 0x57 (1010111)
* **vle8.v**
  >opcode: 0x07 (0000111)
* **vse8.v**
  >opcode: 0x27 (0100111)

Based on these opcodes, three new instruction types have been added to the existing set.
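Before these opcodes are wired into the Chisel decoder, the field positions can be cross-checked against a software model. The sketch below is a hypothetical plain-Scala reference model, not part of the Lab3 sources: `VectorDecodeModel`, `bits`, `instructionType`, `isVsetvli`, `decodeVType`, `vlmax`, `newVl` and `memcpyTrace` are all names introduced here for illustration. It assumes VLEN = 128 and the v1.0 vtype layout (vlmul in zimm[2:0], vsew in zimm[5:3], vta in zimm[6], vma in zimm[7]). Because vsetvli shares the OP-V opcode 1010111 with the vector arithmetic instructions, the model identifies it by funct3 = 111 and inst[31] = 0, as in the encoding table above. For the new vl it picks min(AVL, VLMAX), which is one of the values the spec permits rather than the only one.

```scala
// Hypothetical plain-Scala reference model (not Chisel, not part of the Lab3
// sources) of the decode step for the three vector instructions in memcpy.s.
object VectorDecodeModel {
  val VLEN = 128 // bits, assumed as in the REMINDER above

  // Opcodes of the three new instruction types
  val OpVLoad  = 0x07 // 0000111, e.g. vle8.v
  val OpVStore = 0x27 // 0100111, e.g. vse8.v
  val OpV      = 0x57 // 1010111, OP-V (vector arithmetic and vsetvli)

  /** Returns inst[hi:lo] as an unsigned value (fields up to 31 bits wide). */
  def bits(inst: Int, hi: Int, lo: Int): Int = {
    require(lo >= 0 && hi <= 31 && hi >= lo && hi - lo <= 30, "field too wide")
    (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)
  }

  /** Mirrors the InstructionTypes names used in the Chisel decoder. */
  def instructionType(inst: Int): String = bits(inst, 6, 0) match {
    case OpVLoad  => "V_load"
    case OpVStore => "V_store"
    case OpV      => "V_R"
    case _        => "other"
  }

  /** vsetvli: OP-V opcode, funct3 = 111 and inst[31] = 0. */
  def isVsetvli(inst: Int): Boolean =
    bits(inst, 6, 0) == OpV && bits(inst, 14, 12) == 7 && bits(inst, 31, 31) == 0

  /** Decoded vtype; LMUL is kept as the fraction lmulNum / lmulDen. */
  final case class VType(sew: Int, lmulNum: Int, lmulDen: Int, ta: Boolean, ma: Boolean)

  /** Decodes vtype from zimm: vlmul = [2:0], vsew = [5:3], vta = [6], vma = [7]. */
  def decodeVType(zimm: Int): VType = {
    val vsew = bits(zimm, 5, 3)
    require(vsew <= 3, s"reserved vsew $vsew")
    val (num, den) = bits(zimm, 2, 0) match {
      case 0 => (1, 1)
      case 1 => (2, 1)
      case 2 => (4, 1)
      case 3 => (8, 1)
      case 5 => (1, 8)
      case 6 => (1, 4)
      case 7 => (1, 2)
      case r => throw new IllegalArgumentException(s"reserved vlmul $r")
    }
    VType(8 << vsew, num, den, bits(zimm, 6, 6) == 1, bits(zimm, 7, 7) == 1)
  }

  /** VLMAX = LMUL * VLEN / SEW */
  def vlmax(vt: VType): Int = VLEN * vt.lmulNum / (vt.sew * vt.lmulDen)

  /** vl = min(AVL, VLMAX): one choice the spec allows, not the only one. */
  def newVl(avl: Int, vt: VType): Int = {
    require(vlmax(vt) > 0, "SEW/LMUL combination not supported with this VLEN")
    math.min(avl, vlmax(vt))
  }

  /** vl of every pass of the memcpy loop (do-while, so n = 0 still runs once). */
  def memcpyTrace(n: Int, vt: VType): List[Int] = {
    val vl = newVl(n, vt)
    if (n - vl == 0) List(vl) else vl :: memcpyTrace(n - vl, vt)
  }
}
```

For instance, `vsetvli t0, a2, e8, m8, ta, ma` from memcpy.s assembles by hand (from the field table above) to `0x0C3672D7`. The model decodes it as SEW = 8 and LMUL = 8, so VLMAX = 8 × 128 / 8 = 128, and copying 300 bytes takes three passes with vl = 128, 128 and 44. With m8, each vle8.v/vse8.v therefore moves a group of eight 128-bit registers (v0 to v7). A model along these lines could also serve as a golden reference when testing the Chisel decoder.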
```scala=
object InstructionTypes {
  val L       = "b0000011".U
  val I       = "b0010011".U
  val S       = "b0100011".U
  val RM      = "b0110011".U
  val B       = "b1100011".U
  val V_load  = "b0000111".U // vector loads, e.g. vle8.v
  val V_store = "b0100111".U // vector stores, e.g. vse8.v
  val V_R     = "b1010111".U // OP-V: vector arithmetic and vsetvli
}
```

```scala=
object Instructions {
  val lui     = "b0110111".U
  val nop     = "b0000001".U
  val jal     = "b1101111".U
  val jalr    = "b1100111".U
  val auipc   = "b0010111".U
  val csr     = "b1110011".U
  val fence   = "b0001111".U
  val vsetvli = "b1010111".U //++++++++ same opcode as V_R; vsetvli has funct3 = 111 and inst[31] = 0
}
```

```scala=
import chisel3._
import chisel3.util._
// ArraySeq is renamed to work around the Jupyter kernel issue described under "Problem Resolved"
import scala.collection.mutable.{ArraySeq => mutable_ArraySeq}

class InstructionDecode_v extends Module {
  val io = IO(new Bundle {
    val instruction            = Input(UInt(Parameters.InstructionWidth))
    val regs_reg1_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
    val regs_reg2_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
    val ex_immediate           = Output(UInt(Parameters.DataWidth))
    val ex_aluop1_source       = Output(UInt(1.W))
    val ex_aluop2_source       = Output(UInt(1.W))
    val memory_read_enable     = Output(Bool())
    val memory_write_enable    = Output(Bool())
    val wb_reg_write_source    = Output(UInt(2.W))
    val reg_write_enable       = Output(Bool())
    val reg_write_address      = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
  })
  val opcode = io.instruction(6, 0)
  val funct3 = io.instruction(14, 12)
  val funct7 = io.instruction(31, 25)
  val rd     = io.instruction(11, 7)
  val rs1    = io.instruction(19, 15)
  val rs2    = io.instruction(24, 20)
  val zimm   = io.instruction(30, 20) //++++++++ vtype immediate of vsetvli

  io.regs_reg1_read_address := Mux(opcode === Instructions.lui, 0.U(Parameters.PhysicalRegisterAddrWidth), rs1)
  io.regs_reg2_read_address := rs2
  val immediate = MuxLookup(
    opcode,
    Cat(Fill(20, io.instruction(31)), io.instruction(31, 20)),
    IndexedSeq(
      InstructionTypes.I -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
      InstructionTypes.L -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
      Instructions.jalr  -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
      InstructionTypes.S -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 25), io.instruction(11, 7)),
      InstructionTypes.B -> Cat(
        Fill(20, io.instruction(31)),
        io.instruction(7),
        io.instruction(30, 25),
        io.instruction(11, 8),
        0.U(1.W)
      ),
      Instructions.lui   -> Cat(io.instruction(31, 12), 0.U(12.W)),
      Instructions.auipc -> Cat(io.instruction(31, 12), 0.U(12.W)),
      // jal's imm represents a multiple of 2 bytes.
      Instructions.jal -> Cat(
        Fill(12, io.instruction(31)),
        io.instruction(19, 12),
        io.instruction(20),
        io.instruction(30, 21),
        0.U(1.W)
      ),
      Instructions.vsetvli -> 0.U(32.W) //++++++++
    )
  )
  io.ex_immediate := immediate
  io.ex_aluop1_source := Mux(
    opcode === Instructions.auipc || opcode === InstructionTypes.B || opcode === Instructions.jal,
    ALUOp1Source.InstructionAddress,
    ALUOp1Source.Register
  )

  // ALU op2 from reg: R-type,
  // ALU op2 from imm: L-Type (I-type subtype),
  //                   I-type (nop=addi, jalr, csr-class, fence),
  //                   J-type (jal),
  //                   U-type (lui, auipc),
  //                   S-type (rs2 value sent to MemControl, ALU computes rs1 + imm.)
  //                   B-type (rs2 compares with rs1 in jump judge unit, ALU computes jump address PC+imm.)
  io.ex_aluop2_source := Mux(
    opcode === InstructionTypes.RM,
    ALUOp2Source.Register,
    ALUOp2Source.Immediate
  )

  // lab3(InstructionDecode) begin
  io.memory_read_enable  := (opcode === InstructionTypes.L) || (opcode === InstructionTypes.V_load)  //++++++++
  io.memory_write_enable := (opcode === InstructionTypes.S) || (opcode === InstructionTypes.V_store) //++++++++
  // lab3(InstructionDecode) end

  io.wb_reg_write_source := MuxCase(
    RegWriteSource.ALUResult,
    mutable_ArraySeq(
      (opcode === InstructionTypes.RM || opcode === InstructionTypes.I ||
        opcode === Instructions.lui || opcode === Instructions.auipc) -> RegWriteSource.ALUResult, // same as default
      (opcode === InstructionTypes.L) -> RegWriteSource.Memory,
      (opcode === Instructions.jal || opcode === Instructions.jalr) -> RegWriteSource.NextInstructionAddress,
      (opcode === Instructions.vsetvli) -> RegWriteSource.Vset //++++++++
    )
  )

  io.reg_write_enable := (opcode === InstructionTypes.RM) || (opcode === InstructionTypes.I) ||
    (opcode === InstructionTypes.L) || (opcode === Instructions.auipc) || (opcode === Instructions.lui) ||
    (opcode === Instructions.jal) || (opcode === Instructions.jalr) ||
    (opcode === Instructions.vsetvli) //++++++++
  io.reg_write_address := rd
}
```

# Issues

* When a vector register is loaded from or stored to memory, how can the transfer be completed in a single clock cycle?
* vsetvli must write more than one register (rd as well as the vtype/vl CSRs). Can an additional write port be added?

:::warning
Additional states are required.
:::

# Problem Resolved

* In the Jupyter Scala kernel, ```ArraySeq``` is only available in ```scala.collection.mutable```, whereas [typically](https://www.scala-lang.org/api/2.13.4/scala/collection/immutable/ArraySeq.html) ```ArraySeq``` is imported from ```scala.collection.immutable```. This is most likely because the kernel runs Scala 2.12 or earlier, since ```scala.collection.immutable.ArraySeq``` was only introduced in Scala 2.13. Compilation fails with:

```
cmd33.sc:15: object ArraySeq is not a member of package scala.collection.immutable
import scala.collection.immutable.ArraySeq
                                  ^
Compilation Failed
Compilation Failed
```

* The workaround is to import ```ArraySeq``` from ```scala.collection.mutable``` under a new name and use ```mutable_ArraySeq(...)``` in its place:

```scala
import scala.collection.immutable
import scala.collection.mutable.{ArraySeq => mutable_ArraySeq}
```

# Reference

1. [RISC-V "V" Vector Extension Spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc)
2. [RISCV-V-1.0 Vector Extension Instruction Set Study Notes (in Chinese)](https://www.cnblogs.com/lyc-seu/p/16999784.html)
3. [Scala ArraySeq](https://github.com/scala/scala/blob/v2.13.4/src/library/scala/collection/immutable/ArraySeq.scala#L265)
4. [Penta Five (a RISC-V vector processor written in Chisel HDL)](https://github.com/madsrumlenordstrom/penta-five/blob/main/src/main/scala/vector/VecDecoder.scala)
5. [The Problem with RISC-V V Mask Bits](https://www.computerenhance.com/p/the-problem-with-risc-v-v-mask-bits)
6. [RISC-V-Vector-Processor](https://github.com/martinriis/RISC-V-Vector-Processor)
7. [RISC-V-vector-processor-for-the-acceleration-of-Machine-learning-algorithms](https://github.com/Nikola2444/RISC-V-vector-processor)
8. [Vicuna - a RISC-V Zve32x Vector Coprocessor](https://github.com/vproc/vicuna)
9. [Effective element width encoding in vector load/stores](https://lists.riscv.org/g/tech-vector-ext/topic/effective_element_width/73070232)