In this section, I focused on implementing fundamental matrix operations commonly used in neural networks. Specifically, I developed functions for dot product, matrix multiplication, element-wise ReLU, and ArgMax.
All matrices were represented as 1D vectors in row-major order. This representation required careful attention to memory access patterns, particularly when working with strides to access elements non-contiguously in memory.
Challenge:
Transforming a 2D matrix into a 1D vector was the first hurdle. Working with MNIST-like data, I had to ensure that the flattened representation accurately followed the row-major order.
Solution:
Through precise index calculations, I ensured that each row of the 2D matrix was concatenated correctly. This set a solid foundation for subsequent matrix operations.
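To make the index arithmetic concrete, here is a minimal sketch (not the exact code from my solution; the function name and register choices are my own) of how the address of element (i, j) of a row-major matrix of 4-byte words can be computed:

```assembly
# index2d: compute the address of element (i, j) of a row-major int matrix.
# a0 = base address, a1 = row index i, a2 = column index j, a3 = number of columns
# Returns the element's address in a0.
index2d:
    mul  t0, a1, a3      # t0 = i * num_cols
    add  t0, t0, a2      # t0 = i * num_cols + j   (flattened index)
    slli t0, t0, 2       # multiply by 4 to turn it into a byte offset
    add  a0, a0, t0      # a0 = base + byte offset
    ret
```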
Implementation:
ReLU was implemented by looping through the input array, using the index i to locate and process each value individually. Negative values were replaced with zero.
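A minimal sketch of that loop in RV32 assembly is shown below; the labels and register assignments are illustrative rather than the exact ones from my submission:

```assembly
# relu: replace every negative element of an int array with 0, in place.
# a0 = pointer to the array, a1 = number of elements
relu:
    li   t0, 0               # t0 = i, the loop index
relu_loop:
    bge  t0, a1, relu_done   # stop once i >= length
    slli t1, t0, 2           # byte offset = i * 4
    add  t1, a0, t1          # t1 = address of element i
    lw   t2, 0(t1)           # load the element
    bge  t2, zero, relu_next # non-negative values stay unchanged
    sw   zero, 0(t1)         # negative values become 0
relu_next:
    addi t0, t0, 1           # i++
    j    relu_loop
relu_done:
    ret
```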
Challenge:
The VENUS system call requires a0 as the control register. During static data testing, using a0 for output caused conflicts when performing system calls.
Solution:
To resolve this, I temporarily stored the output in a1 before making system calls, so that a0 remained free to control the system call.
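For illustration, a hypothetical helper under this convention (assuming Venus's print-integer call, with the ID in a0 and the value in a1) could look like this:

```assembly
# print_a0: print the value currently held in a0 without losing it.
# Assumes the Venus ecall convention: a0 = syscall ID (1 = print integer), a1 = argument.
print_a0:
    mv   a1, a0      # move the result out of a0 so a0 can hold the syscall ID
    li   a0, 1       # syscall 1: print integer
    ecall            # Venus prints the value in a1
    mv   a0, a1      # restore the original value to a0
    ret
```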
Implementation:
ArgMax was based on a "max register" function. The algorithm iterated through the array, comparing each element and updating the "reg_kingdom" (max value register) when a new maximum was found. Simultaneously, it tracked the corresponding index of the maximum value.
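Translated into a sketch of RV32 assembly (the register choices are my own; only the "max register" idea comes from the actual implementation), the loop looks roughly like this:

```assembly
# argmax: return the index of the first maximum element of an int array.
# a0 = pointer to the array, a1 = number of elements (assumed >= 1)
# Returns the index in a0.
argmax:
    lw   t0, 0(a0)           # t0 = current max value (the "reg_kingdom")
    li   t1, 0               # t1 = index of the current max
    li   t2, 1               # t2 = i, starting from the second element
argmax_loop:
    bge  t2, a1, argmax_done
    slli t3, t2, 2
    add  t3, a0, t3
    lw   t4, 0(t3)           # t4 = array[i]
    ble  t4, t0, argmax_next # keep the old max on ties (first occurrence wins)
    mv   t0, t4              # new maximum found: update the value...
    mv   t1, t2              # ...and remember its index
argmax_next:
    addi t2, t2, 1
    j    argmax_loop
argmax_done:
    mv   a0, t1              # return the index of the maximum
    ret
```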
Thought Process:
Version 1:
The initial implementation focused on functionality, assuming both arrays had the same stride. Sizes size1 and size2 were independently configurable.
⚠️ Key Issues:
matmul assumes stride1 = 1 and stride2 = matrixB_col.
Version 2 (Improvements):
Added the ability to handle different strides (stride1 and stride2).
🔑 Key Insights:
In matmul, stride1 is always 1, while stride2 typically exceeds 1 (e.g., the number of columns in matrix B).
Additionally, I reduced the number of registers by combining size1 and size2 into a single size register.
Future Improvements:
Develop a version where both stride1 and stride2 can be independently configured, enhancing the function's general utility; a sketch of such a generalized dot product follows below.
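A fully generalized dot product, where both strides arrive as arguments, might look like the following sketch; the argument layout in a0 through a4 is my own assumption rather than the project's official signature:

```assembly
# dot: strided dot product of two int arrays.
# a0 = pointer to v0, a1 = pointer to v1,
# a2 = number of elements to use, a3 = stride of v0, a4 = stride of v1
# Returns the dot product in a0.
dot:
    li   t0, 0               # t0 = running sum
    slli t5, a3, 2           # byte step for v0 = stride1 * 4
    slli t6, a4, 2           # byte step for v1 = stride2 * 4
dot_loop:
    beq  a2, zero, dot_done
    lw   t1, 0(a0)           # current element of v0
    lw   t2, 0(a1)           # current element of v1
    mul  t3, t1, t2
    add  t0, t0, t3          # sum += v0[k] * v1[k]
    add  a0, a0, t5          # advance v0 by its stride
    add  a1, a1, t6          # advance v1 by its stride
    addi a2, a2, -1          # one fewer element to go
    j    dot_loop
dot_done:
    mv   a0, t0
    ret
```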
Implementation:
Matrix multiplication was achieved by calculating the dot product between rows of M1 and columns of M2. This process repeated for each row and column pair, with the total number of dot products equaling M1_row_number × M2_col_number.
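Combined with the generalized strided dot product sketched earlier, the loop structure can be expressed roughly as below; the argument layout and register allocation are my own assumptions, not the official function signature:

```assembly
# matmul: D = M1 x M2 for row-major int matrices, built on the strided dot above.
# a0 = M1, a1 = rows of M1, a2 = cols of M1 (= rows of M2),
# a3 = M2, a4 = cols of M2, a5 = pointer to the output matrix D
matmul:
    addi sp, sp, -40         # prologue: save ra and the s-registers we will use
    sw   ra, 0(sp)
    sw   s0, 4(sp)
    sw   s1, 8(sp)
    sw   s2, 12(sp)
    sw   s3, 16(sp)
    sw   s4, 20(sp)
    sw   s5, 24(sp)
    sw   s6, 28(sp)
    sw   s7, 32(sp)
    mv   s0, a0              # s0 = pointer to the current row of M1
    mv   s1, a1              # s1 = number of rows of M1
    mv   s2, a2              # s2 = cols of M1 (= rows of M2 = dot length)
    mv   s3, a3              # s3 = M2
    mv   s4, a4              # s4 = number of columns of M2
    mv   s5, a5              # s5 = next free slot in the output matrix D
    li   s6, 0               # s6 = i (row index)
matmul_row:
    bge  s6, s1, matmul_done
    li   s7, 0               # s7 = j (column index)
matmul_col:
    bge  s7, s4, matmul_next_row
    mv   a0, s0              # v0 = row i of M1
    slli t0, s7, 2
    add  a1, s3, t0          # v1 = column j of M2 (starts at M2 + j*4)
    mv   a2, s2              # length of each dot product
    li   a3, 1               # row elements are contiguous
    mv   a4, s4              # column elements are one full row apart
    jal  dot                 # a0 = row_i . col_j
    sw   a0, 0(s5)           # store D[i][j]
    addi s5, s5, 4
    addi s7, s7, 1
    j    matmul_col
matmul_next_row:
    slli t0, s2, 2
    add  s0, s0, t0          # advance to the next row of M1
    addi s6, s6, 1
    j    matmul_row
matmul_done:
    lw   ra, 0(sp)           # epilogue: restore saved registers and return
    lw   s0, 4(sp)
    lw   s1, 8(sp)
    lw   s2, 12(sp)
    lw   s3, 16(sp)
    lw   s4, 20(sp)
    lw   s5, 24(sp)
    lw   s6, 28(sp)
    lw   s7, 32(sp)
    addi sp, sp, 40
    ret
```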
Throughout PART A, I learned to define static memory in the data segment for testing each function. This approach allowed for easy verification of function outputs without worrying about dynamic memory issues during initial development.
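As an example, a small test fixture of this kind (the values, labels, and the exit call are illustrative) lets relu run against a known matrix straight from the data segment:

```assembly
.data
test_matrix: .word 1, -2, 3, -4, 5, -6    # a 2x3 matrix stored in row-major order

.text
main:
    la   a0, test_matrix     # a0 = pointer to the flattened matrix
    li   a1, 6               # a1 = number of elements (2 * 3)
    jal  relu                # run the function under test
    ebreak                   # pause here and inspect memory in Venus
    li   a0, 10              # Venus syscall 10: exit
    ecall
```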
Debugging assembly can be daunting, but the VENUS Web Simulator proved invaluable. By stepping through each instruction and observing register values, I could pinpoint errors. Setting breakpoints using EBREAK allowed me to halt execution at critical points, making the debugging process more manageable and systematic.
Implementing these functions solidified my understanding of RISC-V calling conventions. Specifically:
Caller-saved registers (reg_a, reg_t) were used for temporary values.
Callee-saved registers (reg_s, reg_ra) ensured that critical values were preserved across function calls.
Through practice, I became adept at crafting robust prologue and epilogue sections for each function, ensuring that register states were properly saved and restored.
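The prologue/epilogue pattern I kept returning to looks roughly like this generic sketch, with the frame size and the set of saved registers varying from function to function:

```assembly
example_fn:
    addi sp, sp, -8          # prologue: allocate a stack frame
    sw   ra, 0(sp)           # save the return address (this function makes calls)
    sw   s0, 4(sp)           # save a callee-saved register before using it

    # ... function body: s0 and ra survive any jal made here ...

    lw   ra, 0(sp)           # epilogue: restore what the prologue saved
    lw   s0, 4(sp)
    addi sp, sp, 8           # release the stack frame
    ret
```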
This section was a deep dive into low-level programming and assembly concepts. It challenged my understanding of memory management, register handling, and function calling conventions. By the end, I felt more confident in writing efficient, bug-free assembly code, a foundational skill for building more complex systems in the future.