# Extending Lab3 to Support the RISC-V Vector Extension
:::danger
Still in progress (last updated 1/14).
:::
## Usage Example
>memcpy.s
```asm=
.text
.balign 4
.global memcpy
# void *memcpy(void* dest, const void* src, size_t n)
# a0=dest, a1=src, a2=n
#
memcpy:
mv a3, a0 # Copy destination
loop:
vsetvli t0, a2, e8, m8, ta, ma # Vectors of 8b
vle8.v v0, (a1) # Load bytes
add a1, a1, t0 # Bump pointer
sub a2, a2, t0 # Decrement count
vse8.v v0, (a3) # Store bytes
add a3, a3, t0 # Bump pointer
bnez a2, loop # Any more?
ret # Return
```
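
To see how the strip-mined loop behaves, here is a minimal software model of `memcpy.s` (plain Scala, not hardware). It assumes VLEN = 128 and the `e8, m8` setting used by the loop, and it picks `vl = min(AVL, VLMAX)`, which is one legal choice the spec allows for `vsetvli`. `MemcpyModel` is a hypothetical helper name.

```scala
import scala.collection.mutable.ArrayBuffer

// Software model of memcpy.s. Assumes VLEN = 128 and e8/m8, so
// VLMAX = LMUL * VLEN / SEW = 8 * 128 / 8 = 128 bytes per iteration.
object MemcpyModel {
  val Vlen = 128
  val Sew = 8
  val Lmul = 8
  val VlMax: Int = Lmul * Vlen / Sew

  // Copies n bytes the way the assembly loop does and returns the vl of each iteration.
  def memcpy(dest: Array[Byte], src: Array[Byte], n: Int): Seq[Int] = {
    val vls = ArrayBuffer.empty[Int]
    var srcPtr = 0    // a1
    var dstPtr = 0    // a3
    var remaining = n // a2
    var done = false
    while (!done) {
      // vsetvli t0, a2, e8, m8, ta, ma (vl = min(AVL, VLMAX) is one legal choice)
      val vl = math.min(remaining, VlMax)
      // vle8.v + vse8.v: move vl bytes through the register group v0..v7
      System.arraycopy(src, srcPtr, dest, dstPtr, vl)
      srcPtr += vl          // add a1, a1, t0
      remaining -= vl       // sub a2, a2, t0
      dstPtr += vl          // add a3, a3, t0
      vls += vl
      done = remaining == 0 // bnez a2, loop
    }
    vls.toList
  }
}
```

For example, a 300-byte copy takes three iterations with vl = 128, 128 and 44; with n = 0 the loop body still runs once with vl = 0, just like the assembly.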
* The instructions to be implemented are listed below:

| Instruction | Type | Opcode | Task |
|:----------:|:--------:|:-------:|:--------:|
| vsetvli | vcfg | 1010111 (0x57) | set the vl and vtype CSRs |
| vle8.v | v_load | 0000111 (0x07) | unit-stride load of 8-bit elements |
| vse8.v | v_store | 0100111 (0x27) | unit-stride store of 8-bit elements |
:::warning
REMINDER:
* VLEN = 128 (VLMAX = LMUL × VLEN / SEW)
* Widen the 32-bit ALU to 128 bits
* Add the new instructions to InstructionDecode
* Add a new register file of 128-bit vector registers
:::
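
A quick way to sanity-check the VLMAX formula for VLEN = 128 is to tabulate it. This sketch only covers integer LMUL (m1 to m8); fractional LMUL is left out, and `VlmaxCalc` is a hypothetical name.

```scala
// VLMAX = LMUL * VLEN / SEW for VLEN = 128 (integer LMUL only).
object VlmaxCalc {
  val Vlen = 128

  def vlmax(sew: Int, lmul: Int): Int = {
    require(Seq(8, 16, 32, 64).contains(sew), s"unsupported SEW $sew")
    require(Seq(1, 2, 4, 8).contains(lmul), s"unsupported LMUL $lmul")
    lmul * Vlen / sew
  }
}
```

With `e8, m1` one register holds 16 elements; with `e8, m8` (as in `memcpy.s`) the group v0 to v7 holds 128 elements, so each loop iteration can move up to 128 bytes.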
## Instructions
:::info
**vsetvli**
The vsetvli instruction belongs to a small set of configuration instructions that let software quickly set vl and vtype to match application needs. It sets the vtype and vl CSRs based on its arguments and writes the new value of vl into rd.
:::
```asm
vsetvli rd, rs1, vtypei # rd = new vl, rs1 = AVL, vtypei = new vtype setting
```
>usage:
```asm=
# ta = Tail agnostic
# tu = Tail undisturbed
# ma = Mask agnostic
# mu = Mask undisturbed
# e8 = SEW of 8 bits (8-bit elements)
# m4 = LMUL of 4 (each operand is a group of 4 vector registers)
vsetvli t0, a0, e8, m4, ta, ma # Tail agnostic, mask agnostic
vsetvli t0, a0, e8, m4, tu, ma # Tail undisturbed, mask agnostic
vsetvli t0, a0, e8, m4, ta, mu # Tail agnostic, mask undisturbed
vsetvli t0, a0, e8, m4, tu, mu # Tail undisturbed, mask undisturbed
```
| 0 | zimm[10:0] | rs1 | 111 | rd | Op code |
|:----:|:---:|:---:|:---:|:----:|:-------:|
| [31] | [30:20] | [19:15] | [14:12] | [11:7] | [6:0] |
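
The fields above can be checked with a small reference decoder. This sketch follows the RVV 1.0 vtype layout (vlmul = zimm[2:0], vsew = zimm[5:3], vta = zimm[6], vma = zimm[7]), assumes VLEN = 128, and does not model the rs1 = x0 special cases. `VsetvliModel` and its methods are hypothetical names, not part of lab3.

```scala
// Reference decoder for vsetvli and its vtype immediate (software model only).
object VsetvliModel {
  final case class Fields(zimm: Int, rs1: Int, rd: Int)
  final case class VType(sew: Int, lmulNum: Int, lmulDen: Int, ta: Boolean, ma: Boolean)

  private def bits(x: Int, hi: Int, lo: Int): Int = (x >>> lo) & ((1 << (hi - lo + 1)) - 1)

  // zimm = inst[30:20], rs1 = inst[19:15], rd = inst[11:7]
  def fields(inst: Int): Fields =
    Fields(zimm = bits(inst, 30, 20), rs1 = bits(inst, 19, 15), rd = bits(inst, 11, 7))

  def vtype(zimm: Int): VType = {
    val vsew = bits(zimm, 5, 3)
    require(vsew <= 3, "reserved vsew encoding")
    val (num, den) = bits(zimm, 2, 0) match {
      case 0 => (1, 1)
      case 1 => (2, 1)
      case 2 => (4, 1)
      case 3 => (8, 1)
      case 5 => (1, 8)
      case 6 => (1, 4)
      case 7 => (1, 2)
      case _ => throw new IllegalArgumentException("reserved vlmul encoding")
    }
    VType(sew = 8 << vsew, lmulNum = num, lmulDen = den,
      ta = bits(zimm, 6, 6) == 1, ma = bits(zimm, 7, 7) == 1)
  }

  // VLMAX = LMUL * VLEN / SEW; vl = min(AVL, VLMAX) is one legal choice.
  def newVl(avl: Long, t: VType, vlen: Int = 128): Long = {
    val vlmax = vlen * t.lmulNum / (t.lmulDen * t.sew)
    math.min(avl, vlmax.toLong)
  }
}
```

For `vsetvli t0, a2, e8, m8, ta, ma` (0x0C3672D7) this yields zimm = 0xC3, rs1 = 12 (a2) and rd = 5 (t0). The four `e8, m4` lines in the usage example correspond to zimm = 0xC2 (ta, ma), 0x82 (tu, ma), 0x42 (ta, mu) and 0x02 (tu, mu).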

---
:::info
**vle8.v**
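vle8.v is a unit-stride load: it loads vl consecutive 8-bit elements from memory, starting at the address in rs1, into vector register vd.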
:::
```asm
# Vector unit-stride loads
# vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>)
vle8.v vd, (rs1), vm # 8-bit unit-stride load
vle16.v vd, (rs1), vm # 16-bit unit-stride load
vle32.v vd, (rs1), vm # 32-bit unit-stride load
vle64.v vd, (rs1), vm # 64-bit unit-stride load
```
| nf | mew | mop | vm | lumop | rs1 | width | vd | Op code |
|:---:|:---:|:---:|:----:|:----------:|:-------:|:-------:|:------:|:-------:|
| [31:29] | [28] | [27:26] | [25] | [24:20] | [19:15] | [14:12] | [11:7] | [6:0] |
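
To produce test programs for the decoder, a unit-stride load can be assembled directly from this table. The sketch below fixes nf = 000, mew = 0, mop = 00 and lumop = 00000; the width values (000, 101, 110, 111 for 8/16/32/64-bit elements) are the vector encodings of the width field. `VLoadEncoding` is a hypothetical helper.

```scala
// Assembles vleN.v vd, (rs1) from the field layout above (software helper only).
object VLoadEncoding {
  // width field (bits 14:12) for 8/16/32/64-bit vector elements
  val WidthBits: Map[Int, Int] = Map(8 -> 0x0, 16 -> 0x5, 32 -> 0x6, 64 -> 0x7)

  // vm = 1 means unmasked; vm = 0 means masked by v0.t
  def vle(eew: Int, vd: Int, rs1: Int, masked: Boolean = false): Int = {
    require(vd >= 0 && vd < 32 && rs1 >= 0 && rs1 < 32)
    val vm = if (masked) 0 else 1
    (vm << 25) | (rs1 << 15) | (WidthBits(eew) << 12) | (vd << 7) | 0x07
  }
}
```

For instance, `vle8.v v0, (a1)` from `memcpy.s` (a1 = x11) encodes to 0x02058007.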

---
:::info
**vse8.v**
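vse8.v is a unit-stride store: it stores vl consecutive 8-bit elements from vector register vs3 to memory, starting at the address in rs1.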
:::
```asm
# Vector unit-stride stores
# vs3 store data, rs1 base address, vm is mask encoding (v0.t or <missing>)
vse8.v vs3, (rs1), vm # 8-bit unit-stride store
vse16.v vs3, (rs1), vm # 16-bit unit-stride store
vse32.v vs3, (rs1), vm # 32-bit unit-stride store
vse64.v vs3, (rs1), vm # 64-bit unit-stride store
```
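
The behavior of the two byte-element instructions can be pinned down with a small model, assuming VLEN = 128 and LMUL = 1 (16 bytes per register). Only elements 0 to vl-1 are transferred; the tail of vd is left unchanged, which is legal under both tu and ta. The store encoding uses the same layout as the load table, with sumop = 00000 in bits 24:20, vs3 in bits 11:7 and opcode 0100111. `UnitStride8` is a hypothetical name.

```scala
// Behavioral model of vle8.v / vse8.v (VLEN = 128, LMUL = 1) plus a vse8.v encoder.
object UnitStride8 {
  val VlenBytes = 16

  // vle8.v: vd[i] = mem[addr + i] for i < vl; tail bytes of vd stay unchanged
  def vle8(vd: Array[Byte], mem: Array[Byte], addr: Int, vl: Int): Unit = {
    require(vd.length == VlenBytes && vl >= 0 && vl <= VlenBytes)
    for (i <- 0 until vl) vd(i) = mem(addr + i)
  }

  // vse8.v: mem[addr + i] = vs3[i] for i < vl; memory beyond vl is not touched
  def vse8(vs3: Array[Byte], mem: Array[Byte], addr: Int, vl: Int): Unit = {
    require(vs3.length == VlenBytes && vl >= 0 && vl <= VlenBytes)
    for (i <- 0 until vl) mem(addr + i) = vs3(i)
  }

  // vse8.v vs3, (rs1), unmasked: vm = 1, width = 000 (8-bit), opcode = 0100111
  def encodeVse8(vs3: Int, rs1: Int): Int =
    (1 << 25) | (rs1 << 15) | (vs3 << 7) | 0x27
}
```

For example, `vse8.v v0, (a3)` from `memcpy.s` (a3 = x13) encodes to 0x02068027.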

# Current Progress
## Adding Parameters
To add the vector registers, ```riscv/Parameters.scala``` has to be modified. Two new values are added to the ```Parameters``` object:
```scala=
val NumVectorRegisters = 32 //++++++
val VectorRegisterBits = 128 //++++++
```
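
Before writing the Chisel register file, it can help to have a software reference model to compare against. This sketch mirrors the two parameters (32 registers of 128 bits) and stores each register as a `BigInt`; the class name and methods are hypothetical, and it is not a hardware description.

```scala
// Reference model of the vector register file (software only).
final class VectorRegisterFile(val numRegisters: Int = 32, val registerBits: Int = 128) {
  require(Integer.bitCount(numRegisters) == 1, "register count must be a power of two")
  // Address width of vd/vs1/vs2/vs3: log2(32) = 5 bits, same as rd/rs1/rs2.
  val addrWidth: Int = Integer.numberOfTrailingZeros(numRegisters)
  private val mask = (BigInt(1) << registerBits) - 1
  // Unlike x0, v0 is an ordinary register (it also serves as the mask register).
  private val regs = Array.fill(numRegisters)(BigInt(0))

  def read(addr: Int): BigInt = regs(addr)

  // Writes are truncated to registerBits, as a 128-bit register would be.
  def write(addr: Int, data: BigInt): Unit = regs(addr) = data & mask

  // Element i of width sew; element 0 sits in the least-significant bits.
  def element(addr: Int, sew: Int, i: Int): BigInt = {
    require(i >= 0 && (i + 1) * sew <= registerBits, "element index out of range")
    (regs(addr) >> (i * sew)) & ((BigInt(1) << sew) - 1)
  }
}
```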
## Decoding
Refer to the [vmem instruction format](https://github.com/riscv/riscv-v-spec/blob/master/vmem-format.adoc) and the [vcfg instruction format](https://github.com/riscv/riscv-v-spec/blob/master/vcfg-format.adoc) for the encodings.
* **vsetvli**
>opcode: 0x57 (1010111)
* **vle8.v**
>opcode: 0x07 (0000111)
* **vse8.v**
>opcode: 0x27 (0100111)
Based on these opcodes, three new instruction types are added to ```InstructionTypes```, and ```vsetvli``` is added to ```Instructions```.
```scala=
object InstructionTypes {
val L = "b0000011".U
val I = "b0010011".U
val S = "b0100011".U
val RM = "b0110011".U
val B = "b1100011".U
val V_load = "b0000111".U
val V_store = "b0100111".U
val V_R = "b1010111".U
}
```
```scala=
object Instructions {
val lui = "b0110111".U
val nop = "b0000001".U
val jal = "b1101111".U
val jalr = "b1100111".U
val auipc = "b0010111".U
val csr = "b1110011".U
val fence = "b0001111".U
val vsetvli = "b1010111".U // same value as InstructionTypes.V_R (OP-V) //++++++++
}
```
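
Note that `Instructions.vsetvli` and `InstructionTypes.V_R` have the same value, because every OP-V instruction uses opcode 1010111. A minimal classification sketch (plain Scala, hypothetical `VDecode` helper) shows the extra checks needed to single out vsetvli: funct3 = 111 (OPCFG) and bit 31 = 0, since vsetvl and vsetivli set bit 31.

```scala
// Opcode-level classification of the new instructions (software model only).
// 0000111 / 0100111 are also the scalar FP load/store opcodes (LOAD-FP / STORE-FP);
// lab3 has no FP unit, so the width-field check that separates them is omitted.
object VDecode {
  sealed trait Kind
  case object VLoad extends Kind
  case object VStore extends Kind
  case object Vsetvli extends Kind
  case object OtherOpV extends Kind // vector arithmetic, vsetvl, vsetivli
  case object Scalar extends Kind

  private def bits(x: Int, hi: Int, lo: Int): Int = (x >>> lo) & ((1 << (hi - lo + 1)) - 1)

  def classify(inst: Int): Kind = bits(inst, 6, 0) match {
    case 0x07 => VLoad
    case 0x27 => VStore
    case 0x57 if bits(inst, 14, 12) == 0x7 && bits(inst, 31, 31) == 0 => Vsetvli
    case 0x57 => OtherOpV
    case _ => Scalar
  }
}
```

Comparing only the opcode, as `opcode === Instructions.vsetvli` does, would treat an OP-V arithmetic instruction such as `vadd.vv v1, v2, v3` (0x022180D7) as vsetvli.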
```scala=
class InstructionDecode_v extends Module {
val io = IO(new Bundle {
val instruction = Input(UInt(Parameters.InstructionWidth))
val regs_reg1_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
val regs_reg2_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
val ex_immediate = Output(UInt(Parameters.DataWidth))
val ex_aluop1_source = Output(UInt(1.W))
val ex_aluop2_source = Output(UInt(1.W))
val memory_read_enable = Output(Bool())
val memory_write_enable = Output(Bool())
val wb_reg_write_source = Output(UInt(2.W))
val reg_write_enable = Output(Bool())
val reg_write_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
})
val opcode = io.instruction(6, 0)
val funct3 = io.instruction(14, 12)
val funct7 = io.instruction(31, 25)
val rd = io.instruction(11, 7)
val rs1 = io.instruction(19, 15)
val rs2 = io.instruction(24, 20)
val zimm = io.instruction(30, 20) // vsetvli: vtype immediate zimm[10:0] //++++++++
io.regs_reg1_read_address := Mux(opcode === Instructions.lui, 0.U(Parameters.PhysicalRegisterAddrWidth), rs1)
io.regs_reg2_read_address := rs2
val immediate = MuxLookup(
opcode,
Cat(Fill(20, io.instruction(31)), io.instruction(31, 20)),
IndexedSeq(
InstructionTypes.I -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
InstructionTypes.L -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
Instructions.jalr -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
InstructionTypes.S -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 25), io.instruction(11, 7)),
InstructionTypes.B -> Cat(
Fill(20, io.instruction(31)),
io.instruction(7),
io.instruction(30, 25),
io.instruction(11, 8),
0.U(1.W)
),
Instructions.lui -> Cat(io.instruction(31, 12), 0.U(12.W)),
Instructions.auipc -> Cat(io.instruction(31, 12), 0.U(12.W)),
// jal's imm represents a multiple of 2 bytes.
Instructions.jal -> Cat(
Fill(12, io.instruction(31)),
io.instruction(19, 12),
io.instruction(20),
io.instruction(30, 21),
0.U(1.W)
),
Instructions.vsetvli -> 0.U(32.W) //++++++++
)
)
io.ex_immediate := immediate
io.ex_aluop1_source := Mux(
opcode === Instructions.auipc || opcode === InstructionTypes.B || opcode === Instructions.jal,
ALUOp1Source.InstructionAddress,
ALUOp1Source.Register
)
// ALU op2 from reg: R-type,
// ALU op2 from imm: L-Type (I-type subtype),
// I-type (nop=addi, jalr, csr-class, fence),
// J-type (jal),
// U-type (lui, auipc),
// S-type (rs2 value sent to MemControl, ALU computes rs1 + imm.)
// B-type (rs2 compares with rs1 in jump judge unit, ALU computes jump address PC+imm.)
io.ex_aluop2_source := Mux(
opcode === InstructionTypes.RM,
ALUOp2Source.Register,
ALUOp2Source.Immediate
)
// lab3(InstructionDecode) begin
io.memory_read_enable := (opcode === InstructionTypes.L) || (opcode === InstructionTypes.V_load) //++++++++
io.memory_write_enable := (opcode === InstructionTypes.S) || (opcode === InstructionTypes.V_store) //++++++++
// lab3(InstructionDecode) end
io.wb_reg_write_source := MuxCase(
RegWriteSource.ALUResult,
mutable_ArraySeq(
(opcode === InstructionTypes.RM || opcode === InstructionTypes.I ||
opcode === Instructions.lui || opcode === Instructions.auipc) -> RegWriteSource.ALUResult, // same as default
(opcode === InstructionTypes.L) -> RegWriteSource.Memory,
(opcode === Instructions.jal || opcode === Instructions.jalr) -> RegWriteSource.NextInstructionAddress,
(opcode === Instructions.vsetvli) -> RegWriteSource.Vset //++++++++
)
)
io.reg_write_enable := (opcode === InstructionTypes.RM) || (opcode === InstructionTypes.I) ||
(opcode === InstructionTypes.L) || (opcode === Instructions.auipc) || (opcode === Instructions.lui) ||
(opcode === Instructions.jal) || (opcode === Instructions.jalr) || (opcode === Instructions.vsetvli) //++++++++
io.reg_write_address := rd
}
```
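
To test `InstructionDecode_v`, the expected control signals for the new opcodes can be written down as a plain-Scala golden model that a ChiselTest could compare against `io.memory_read_enable`, `io.memory_write_enable` and `io.reg_write_enable`. `DecodeGolden` is a hypothetical helper; it mirrors the decoder exactly as written, including the fact that `reg_write_enable` is set for every OP-V instruction because only the opcode is compared.

```scala
// Expected decoder outputs for the new opcodes (software golden model).
object DecodeGolden {
  final case class Controls(memoryRead: Boolean, memoryWrite: Boolean, regWrite: Boolean)

  val VLoad = 0x07  // InstructionTypes.V_load  (0000111)
  val VStore = 0x27 // InstructionTypes.V_store (0100111)
  val OpV = 0x57    // InstructionTypes.V_R / Instructions.vsetvli (1010111)

  def expected(inst: Int): Option[Controls] = (inst & 0x7f) match {
    case VLoad  => Some(Controls(memoryRead = true, memoryWrite = false, regWrite = false))
    case VStore => Some(Controls(memoryRead = false, memoryWrite = true, regWrite = false))
    case OpV    => Some(Controls(memoryRead = false, memoryWrite = false, regWrite = true))
    case _      => None // scalar instructions are covered by the original lab3 tests
  }
}
```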
# Issues
* When a vector register is loaded from or stored to memory, how can the transfer be completed in a single clock cycle?
* vsetvli has to write rd as well as the vl and vtype CSRs. Can an additional write port be added?
:::warning
Additional states are required.
:::
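
One way to picture the extra states: assuming the existing data-memory port stays 32 bits wide, a 128-bit register needs four 32-bit beats. The sketch below is a hypothetical software FSM (`MultiBeatLoad`, not lab3 code) that stays in a `Busy` state for one cycle per beat and assembles the words little-endian.

```scala
// Hypothetical multi-cycle vector load (VLEN = 128, LMUL = 1) over a 32-bit port.
object MultiBeatLoad {
  sealed trait State
  final case class Busy(beat: Int) extends State
  case object Done extends State

  val Beats: Int = 128 / 32

  // readWord models the existing 32-bit data-memory read port.
  // Returns the assembled 128-bit value and the number of cycles spent.
  def load(readWord: Int => Int, base: Int): (BigInt, Int) = {
    var state: State = Busy(0)
    var acc = BigInt(0)
    var cycles = 0
    while (state != Done) {
      state match {
        case Busy(beat) =>
          val word = BigInt(readWord(base + 4 * beat)) & BigInt(0xFFFFFFFFL) // zero-extend
          acc = acc | (word << (32 * beat)) // beat 0 fills bits 31:0
          cycles += 1
          state = if (beat == Beats - 1) Done else Busy(beat + 1)
        case Done => ()
      }
    }
    (acc, cycles)
  }
}
```

A wider memory port would remove the extra states at the cost of a 128-bit data path to memory; the sketch only illustrates the multi-cycle option.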
# Problem Resolved
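* In the Jupyter Scala kernel, ```ArraySeq``` is only available in ```scala.collection.mutable```, most likely because the kernel runs Scala 2.12. [Since Scala 2.13](https://www.scala-lang.org/api/2.13.4/scala/collection/immutable/ArraySeq.html), ```ArraySeq``` also exists in ```scala.collection.immutable```, which is where it is usually imported from. Importing it from there in the kernel fails: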
```
cmd33.sc:15: object ArraySeq is not a member of package scala.collection.immutable
import scala.collection.immutable.ArraySeq
^Compilation Failed
Compilation Failed
```
* The workaround is to import ```ArraySeq``` from ```scala.collection.mutable``` and rename it to ```mutable_ArraySeq```, the name used in ```InstructionDecode_v``` above:
```scala
import scala.collection.immutable
import scala.collection.mutable.{ArraySeq => mutable_ArraySeq}
```
# Reference
1. [RISC-V "V" Vector Extension Spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc)
2. [Learning the RISC-V V 1.0 Vector Extension Instruction Set (in Chinese)](https://www.cnblogs.com/lyc-seu/p/16999784.html)
3. [Scala ArraySeq](https://github.com/scala/scala/blob/v2.13.4/src/library/scala/collection/immutable/ArraySeq.scala#L265)
4. [Penta Five (a RISC-V vector processor written in Chisel HDL)](https://github.com/madsrumlenordstrom/penta-five/blob/main/src/main/scala/vector/VecDecoder.scala)
5. [The Problem with RISC-V V Mask Bits](https://www.computerenhance.com/p/the-problem-with-risc-v-v-mask-bits)
6. [RISC-V-Vector-Processor](https://github.com/martinriis/RISC-V-Vector-Processor)
7. [RISC-V-vector-processor-for-the-acceleration-of-Machine-learning-algorithms](https://github.com/Nikola2444/RISC-V-vector-processor)
8. [Vicuna - a RISC-V Zve32x Vector Coprocessor](https://github.com/vproc/vicuna)
9. [Effective element width encoding in vector load/stores](https://lists.riscv.org/g/tech-vector-ext/topic/effective_element_width/73070232)