In this part, we need to iterate through the entire input array. For each element in the array that is less than zero, we should change it to zero, ensuring that all elements in the input array are non-negative.
In this approach, we begin by assuming the first element of the input array is the maximum value. Starting from the second element, we iterate through each element in the array. At each step, we compare the current element with the current maximum value. If the current element is larger, we update the maximum value accordingly. This process continues until we have examined all elements in the array, leaving us with the maximum value of the input array.
In this section, there are two input arrays, and we need to calculate the precise offset for each element in the arrays. It’s important to also consider the stride of each array. For example:
To access the ith element in an array, we calculate the offset as i * s
, where s
is the stride of the array. Since processors typically use byte addressing mode and a word is 4 bytes by default, we multiply the offset by 4. This allows us to access the correct element from memory.
I am asked to not use M extension instruction, so I should implement multiply function personally. To implement multiplication without using the M extension instruction, I apply the traditional multiplication in binary. Here’s how it works step by step:
Example for the binary multiplication 10002 x 10012 = 10010002 , follow these steps:
Step 1: Check if the Multiplier is Zero
If the multiplier is zero, the product is zero, and the computation is complete. Otherwise, proceed to Step 2.
Step 2: Check the Least Significant Bit (LSB) of the Multiplier. To determine if the least significant bit(LSB) of the multiplier is 1 , perform a bitwise AND operation between the multiplier and 1 .
For example:
Multiplier = 1001
and 0001
-----------------
result 0001
Step 3: Update the Multiplicand and Multiplier Perform the following bit-shift operations:
1000 -> Multiplicand
x 1001 -> Multiplier
---------
1000
0000
0000
+ 1000
---------
1001000 -> Product
After updating, return to Step 2 to continue the process until the multiplier becomes zero.
After loading the correct elements from memory as
While implementing RISC-V assembly, I made some mistakes, such as using the wrong destination register, which ended up overwriting parameters needed later.
To resolve this issue, I used a temporary register as the destination register, which prevents overwriting parameters needed later.
All two-dimensional matrices in this project will be stored as 1D vectors in row-major order. This means the rows of the matrix are concatenated to form a single continuous array. Alternatively, matrices could be stored in column-major order, but in this project, we stick to row-major order.
To implement matrix multiplication, we use nested loops. The outer loop iterates through the rows of the first matrix, while the inner loop iterates through the columns of the second matrix. For each pair of a row from the first matrix and a column from the second matrix, we compute the dot product of the two. This dot product represents the value of the corresponding element in the resulting matrix.
In this section, multiple mul
instructions are used extensively. To improve code readability, I implemented a multiply.s
file as an external function in the feature branch
. Before invoking the multiply function, the required parameter values are stored on the stack using the sw
instruction. After the function call, the results are retrieved from the stack using the lw
instruction.
However, this approach introduces significant execution overhead due to the frequent use of lw / sw
instructions for parameter passing and result retrieval. As a result, the execution time has increased to approximately three times that of the previous version.
To further optimize the implementation, it may be possible to minimize stack operations by leveraging registers more effectively. Alternatively, adopting an inline assembly approach could reduce the function call overhead, potentially improving performance.