contributed by < kaeteyaruyo >
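In the algebraic definition, the dot product of two vectors $\mathbf{a} = [a_1, a_2, \ldots, a_n]$ and $\mathbf{b} = [b_1, b_2, \ldots, b_n]$ is defined as:

$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$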
In the implementation, we store the two vectors in two arrays of the same length, traverse them, multiply the elements at the same position, and add each product to a running sum. When the traversal finishes, the sum is the dot product.
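The loop described above can be sketched in Python as follows. This only illustrates the algorithm; it is not the author's RISC-V source, and the example vectors are made up.

```python
def dot_product(a, b):
    """Multiply elements at the same position and accumulate the sum."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]  # one multiply followed by one add per element
    return total


print(dot_product([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

In the assembly version, each loop iteration corresponds to two loads, one multiply, and one add into the register that holds the running sum.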
You can find the source code here. Feel free to fork and modify it.
I use arrays of length 3 to illustrate the dot product process. You can change the length or the elements inside them to check that the result is still correct.
In this example, I don't handle addition or multiplication overflow, so please make sure that the results of mul at line 33 and add at line 34 stay within the range of valid values ($-2^{31}$ to $2^{31}-1$).
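On RV32 the registers are 32 bits wide: mul writes only the low 32 bits of the product to the destination register, and add silently wraps around on overflow. The Python sketch below (the function names are our own) shows what a 32-bit register would end up holding, and how a checked version of the dot product could detect the problem instead of returning a wrong sum.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def to_int32(x):
    """Keep the low 32 bits of x and reinterpret them as a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - 2 ** 32 if x >= 2 ** 31 else x


def checked_dot_product(a, b):
    """Dot product that raises if any product or partial sum leaves the int32 range."""
    total = 0
    for x, y in zip(a, b):
        product = x * y
        if not INT32_MIN <= product <= INT32_MAX:
            raise OverflowError(f"{x} * {y} does not fit in 32 bits")
        total += product
        if not INT32_MIN <= total <= INT32_MAX:
            raise OverflowError("partial sum does not fit in 32 bits")
    return total


# What a 32-bit register would actually hold after an overflowing multiply:
print(to_int32(65536 * 65536))  # 2**32 wraps around to 0
```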
We test our code using the Ripes simulator.
When we put the code above into the editor, we will see that Ripes doesn't execute it literally. Instead, it replaces pseudo-instructions (p.110) with equivalent base instructions, and changes register names from ABI names (such as s1) to numbered ones (such as x9).
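For example, a load-address pseudo-instruction such as la is typically expanded into an auipc/addi pair. The sketch below shows how the PC-relative offset is split into a 20-bit upper part for auipc and a signed 12-bit lower part for addi; the + 0x800 rounding compensates for addi sign-extending its immediate. As a hypothetical, assume the program starts with la s1, <array label> at address 0x00000000 and that the label sits at 0x10000000: the split then yields auipc x9 0x10000, which matches the first translated instruction discussed below, but the source code itself is not shown here.

```python
def split_pcrel(pc, target):
    """Split (target - pc) into the immediates of an auipc/addi pair."""
    offset = (target - pc) & 0xFFFFFFFF
    if offset >= 2 ** 31:          # treat the offset as a signed 32-bit value
        offset -= 2 ** 32
    hi = (offset + 0x800) >> 12    # round so the low part fits in [-2048, 2047]
    lo = offset - (hi << 12)       # signed 12-bit immediate for addi
    return hi & 0xFFFFF, lo        # auipc takes the upper 20 bits


hi, lo = split_pcrel(0x00000000, 0x10000000)
print(f"auipc x9 {hi:#x}")   # auipc x9 0x10000
print(f"addi x9 x9 {lo}")    # addi x9 x9 0
```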
The translated code looks like:
Each row shows the instruction's address in instruction memory, its machine code (in hex), and the instruction itself.
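To see how the machine code in a row relates to its instruction, the Python sketch below pulls the U-type fields out of 0x10000497, the encoding of auipc x9 0x10000. The bit positions follow the RISC-V base U-type format; the function name is our own.

```python
def decode_u_type(instr):
    """Extract the fields of a 32-bit U-type RISC-V instruction."""
    opcode = instr & 0x7F          # bits [6:0]
    rd = (instr >> 7) & 0x1F       # bits [11:7]
    imm = instr & 0xFFFFF000       # bits [31:12], lowest 12 bits zero
    return opcode, rd, imm


opcode, rd, imm = decode_u_type(0x10000497)
# 0b0010111 is the AUIPC opcode.
print(f"opcode={opcode:#09b} rd=x{rd} imm={imm:#010x}")
# prints: opcode=0b0010111 rd=x9 imm=0x10000000
```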
Now we can choose a processor to run this code. Ripes provides four kinds of processors to choose from: a single-cycle processor, a 5-stage processor, a 5-stage processor with hazard detection, and a 5-stage processor with forwarding and hazard detection. Here we choose the 5-stage processor. Its block diagram looks like this:
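"5-stage" means this processor uses a five-stage pipeline, so several instructions can be in flight at the same time, each in a different stage. The stages are:
- IF: instruction fetch
- ID: instruction decode and register read
- EX: execute
- MEM: memory access
- WB: write back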
In the block diagram, you can see that the stages are separated by pipeline registers (the rectangular blocks with stage names on each side).
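An idealized timing diagram helps to picture how the pipeline overlaps instructions. The Python sketch below assumes one instruction enters per cycle and nothing ever stalls, which ignores hazards entirely; the instruction labels are placeholders.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """For each instruction, list the stage it occupies in every cycle (no stalls)."""
    n_cycles = len(instructions) + len(STAGES) - 1
    table = []
    for i, name in enumerate(instructions):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage     # instruction i is in stage s during cycle i + s
        table.append((name, row))
    return table


for name, row in pipeline_diagram(["instr 0", "instr 1", "instr 2"]):
    print(f"{name:8}", " ".join(f"{cell:3}" for cell in row))
```

Three instructions finish in 3 + 5 - 1 = 7 cycles instead of the 15 a non-overlapped design would need.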
Instructions of different formats go through the 5 stages with different control signals turned on. Let's discuss each format in detail with an example.
The first instruction in this program is auipc x9 0x10000. According to the RISC-V Manual (p.14):
AUIPC (add upper immediate to pc) is used to build pc-relative addresses and uses the U-type format. AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then places the result in register rd.
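The quoted semantics translate directly into arithmetic. A minimal Python sketch, assuming a 32-bit (RV32) PC so the sum wraps modulo 2^32; the function name is our own:

```python
MASK32 = 0xFFFFFFFF


def auipc(pc, imm20):
    """Return pc + (imm20 << 12), wrapped to 32 bits as on RV32."""
    offset = (imm20 & 0xFFFFF) << 12   # 20-bit immediate, lowest 12 bits zero
    return (pc + offset) & MASK32


print(hex(auipc(0x00000000, 0x10000)))  # 0x10000000, the value that ends up in x9
```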
Let's see how auipc x9 0x10000 goes through each stage.
IF (instruction fetch):
- The current PC value is 0x00000000, so addr is equal to 0x00000000.
- The instruction stored at this address is 0x10000497 (you can look it up in the code snippet above), so instr is equal to 0x10000497.
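A minimal model of the IF stage, assuming instruction memory is a Python dict keyed by byte address (a simplification of what Ripes actually models). Besides reading the instruction, the stage computes the next PC value, 0x00000004, which is passed along through the later stages.

```python
def fetch(pc, instr_mem):
    """IF stage: use the PC as the address, read the instruction, compute PC + 4."""
    addr = pc
    instr = instr_mem[addr]
    next_pc = (pc + 4) & 0xFFFFFFFF
    return addr, instr, next_pc


# Hypothetical instruction memory holding only the first instruction of the program.
instr_mem = {0x00000000: 0x10000497}
addr, instr, next_pc = fetch(0x00000000, instr_mem)
print(f"addr={addr:#010x} instr={instr:#010x} next_pc={next_pc:#010x}")
# prints: addr=0x00000000 instr=0x10000497 next_pc=0x00000004
```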
ID (instruction decode):
- 0x10000497 is decoded into three parts: opcode = auipc, Wr idx = 0x09, and imm. = 0x10000000 (0x10000 in the upper 20 bits, with the lowest 12 bits filled with zeros).
- R1 idx and R2 idx are both 0x00.
- Reg 1 and Reg 2 are read from register 0x00 (x0), so their values are both 0x00000000 too.
- The current PC value (0x00000000) and next PC value (0x00000004) are just passed through this stage; they are not used here.

EX (execute):
- Reg 1 and Reg 2 are carried into this stage, but since this is a U-type instruction, we don't use them, so they are filtered out by the second-level multiplexer.
- Reg 1 and Reg 2 are also sent to the branch block, but no branch is taken.
- The current PC (0x00000000) and imm. (0x10000000) are selected as Op1 and Op2 of the ALU, so Res is equal to 0x00000000 + 0x10000000 = 0x10000000.
- The next PC value (0x00000004) and Wr idx (0x09) are just passed through this stage; we don't use them.

MEM (memory access):
- Res from the ALU is sent three ways. One of them is the Addr input of data memory: the address is 0x10000000, so Read out is equal to 0x00000001. The table below denotes the data section of memory.
- Reg 2 is sent to Data in, but memory writing is not enabled.
- The next PC value (0x00000004) and Wr idx (0x09) are just passed through this stage; we don't use them.

WB (write back):
- The multiplexer selects Res from the ALU as the final output, so the output value is 0x10000000.
- The output value and Wr idx are sent back to the registers block. Finally, the value 0x10000000 is written into register x9, whose ABI name is s1.
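The MEM and WB steps of the auipc x9 0x10000 walkthrough can be sketched in Python as below. This is a simplified model, not Ripes' implementation: the write-back selector values ("alu", "mem", "pc4") are hypothetical names, data memory is a dict, and unmapped words are assumed to read as 0.

```python
MASK32 = 0xFFFFFFFF


def mem_stage(res, reg2, data_mem, mem_write=False):
    """MEM: Res is the address; Reg 2 reaches Data in but is stored only if writing is enabled."""
    if mem_write:
        data_mem[res] = reg2 & MASK32
    return data_mem.get(res, 0)        # Read out (assumption: unmapped words read as 0)


def wb_stage(regs, wr_idx, res, read_out, next_pc, wb_src):
    """WB: a multiplexer picks the value to write back, then the register file is updated."""
    value = {"alu": res, "mem": read_out, "pc4": next_pc}[wb_src]
    if wr_idx != 0:                    # x0 is hard-wired to zero
        regs[wr_idx] = value & MASK32
    return value


# Values from the walkthrough above; the data memory contents are assumed.
regs = [0] * 32
data_mem = {0x10000000: 0x00000001}
read_out = mem_stage(res=0x10000000, reg2=0x00000000, data_mem=data_mem)
value = wb_stage(regs, wr_idx=9, res=0x10000000, read_out=read_out,
                 next_pc=0x00000004, wb_src="alu")
print(hex(read_out), hex(value), hex(regs[9]))  # 0x1 0x10000000 0x10000000
```

Even though memory is read, the write-back multiplexer discards Read out for auipc, and because writing is disabled, Data in never changes memory.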
After all these stages are done, the register is updated like this: