# Assignment 2: Complete Applications
> Due: ==Nov 18, 2024==
## Project Overview
This project challenges you to develop a simple yet highly practical system using **RISC-V assembly language**. Throughout the process, you will explore essential low-level programming concepts, such as:
- Efficient use of registers for optimized code execution.
- Writing functions and adhering to RISC-V calling conventions for both internal and external functions.
- Memory management by allocating space on the stack and heap.
- Pointer manipulation for working with matrix and vector data structures.
To make the project engaging, you will implement various matrix and vector operations, such as matrix multiplication. These functions will serve as the building blocks for constructing a simple [Artificial Neural Network](https://en.wikipedia.org/wiki/Neural_network_(machine_learning)) (ANN) capable of classifying handwritten digits. Through this exercise, you will observe how ANNs can be implemented using basic numerical operations like vector inner products, matrix multiplications, and non-linear thresholding.
## Overview of Neural Networks
At its core, a **neural network** attempts to approximate a (non-linear) function that maps input data to the desired output. A fundamental building block of a neural network is a **neuron**, which performs a weighted linear combination of inputs followed by a non-linear activation function, such as a threshold.
Consider the following example of a neuron that mimics the logical AND operation:
- With input $A = 0$, $B = 0$, the weighted sum is $0 \times 0.6 + 0 \times 0.6 = 0$. Since this is below the threshold of 1, the output is 0.
- For inputs $A = 0$, $B = 1$ or $A = 1$, $B = 0$, the sum becomes $1 \times 0.6 + 0 \times 0.6 = 0.6$, still below the threshold, resulting in 0.
- Finally, with $A = 1$, $B = 1$, the sum is $1 \times 0.6 + 1 \times 0.6 = 1.2$. This exceeds the threshold, producing an output of 1.
This simple neuron operation can be viewed as the **inner product** of two vectors, $[A, B]^T$ and $[0.6, 0.6]^T$, followed by a non-linear threshold function.
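The three cases above can be condensed into a few lines of Python (an illustrative sketch only; the names `neuron_and`, `weights`, and `threshold` are ours, not part of the project code):

```python
# AND neuron: inner product of [A, B] with [0.6, 0.6], then a hard threshold at 1
def neuron_and(a, b, weights=(0.6, 0.6), threshold=1.0):
    weighted_sum = a * weights[0] + b * weights[1]  # inner product
    return 1 if weighted_sum >= threshold else 0    # non-linear threshold

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", neuron_and(a, b))
```

Only the (1, 1) case reaches 1.2, which clears the threshold; every other weighted sum stays below 1.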
You might wonder how these weights are determined. Weight selection is beyond the scope of this project and involves specialized topics in **numerical linear algebra, signal processing, machine learning, and optimization**. Typically, weights are learned through a process called training, where the network adjusts its parameters to minimize the error between its predictions and the correct outputs. In contrast, the use of trained weights to make predictions is known as inference. For this project, you will only perform inference using pre-trained weights provided by your instructor.
## Handwritten Digit Classification
In this project, you will implement a slightly more advanced network capable of classifying handwritten digits. The input to the network will be images from the [MNIST dataset](https://www.tensorflow.org/datasets/catalog/mnist), which consists of 60,000 28×28-pixel images of digits ranging from 0 to 9. Each image will be treated as a flattened input vector of size 784 × 1.
![handwriting](https://hackmd.io/_uploads/rkH6uViJ1g.png =60%x)
The network will rely on pre-trained weight matrices $m_0$ and $m_1$ to perform matrix multiplications. Unlike the earlier example using a threshold function, this network will apply two types of non-linear activation functions:
- ReLU (Rectified Linear Unit): ReLU sets all negative values to zero, introducing non-linearity.
- ArgMax: This function identifies the output neuron with the highest value, indicating the predicted class.
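In Python terms (an illustrative sketch; you will implement both in RISC-V assembly), the two activations behave like:

```python
def relu(v):
    # set every negative entry to zero; positive entries pass through
    return [x if x > 0 else 0 for x in v]

def argmax(v):
    # index of the largest element; ties resolve to the smallest index
    best = 0
    for i in range(1, len(v)):
        if v[i] > v[best]:
            best = i
    return best

print(relu([-2, 3, -1, 5]))   # negatives clamped to 0
print(argmax([1, 7, 7, 2]))   # first of the tied maxima
```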
## Check Lab 3 of UC Berkeley CS61C
This specification contains essential details about the provided functions, how to run Venus (the RISC-V simulator), and how to test your code. Make sure to read through the [entire document](https://cs61c.org/fa24/labs/lab03/) and its [slides](https://docs.google.com/presentation/d/1vsXYoqsyJtZTE1_z2wtnXgsnbZaVFHGLjQdJsckxFIc/edit?usp=sharing) thoroughly, as they provide critical instructions for completing the project successfully.
## Install required software
The following sub-sections contain instructions for specific OSes (operating systems). **Please use the instructions for your specific OS**; using commands meant for the wrong OS may cause errors and potentially break your system! **Note:** Homework assignments will support only **GNU/Linux and macOS**.
### Ubuntu Linux
Ubuntu 22.04 and later versions include the required programs in the default APT repositories. Use the following commands to install them automatically:
```shell
$ sudo apt update
$ sudo apt install curl git openjdk-11-jdk python3 python3-pip
```
Any terminal and any modern web browser should be sufficient.
### macOS
1. Open the **Terminal** app.
2. Install the Xcode Command Line Tools:
```shell
$ xcode-select --install
```
3. Check your Java version:
```shell
$ java -version
```
If you don't already have Java 11 or higher, download and install **Adoptium OpenJDK 11**. We recommend using the `.pkg` installer.
- If you have an **Apple Silicon CPU**, choose the `aarch64` version.
- For **Intel/AMD CPUs**, choose the `x64` version.
4. Check your Python version:
```shell
$ python3 --version
```
If you don't already have Python 3.6 or higher, download and install Python 3.
## Running RISC-V in Venus
All of the code you write for this project will be in RISC-V assembly and executed using the Venus simulator. There are two ways to run your code with Venus: through the web interface or the command line using a Java `.jar` file. We recommend that you primarily develop your code locally using the `.jar` file and only use the web interface for debugging or quick edits to individual files.
To run a RISC-V file with the `.jar` version, use the following command:
```shell
$ java -jar venus.jar <FILENAME>
```
If you encounter an error related to the **maximum instruction count** being reached (common with larger MNIST inputs), you can increase the limit using the `-ms` flag. To remove the limit entirely, set the instruction count to a negative value:
```shell
$ java -jar venus.jar -ms -1 <FILENAME>
```
You can explore additional options by running:
```shell
$ java -jar venus.jar -h
```
Just like the web version, you can disable **mutable text sections** with the `-it` flag.
You can also debug locally by using the print functions provided in `utils.s`, such as `print_int` and `print_int_array`.
### Enabling the Tracer in the `.jar` Version
The tracer feature can be enabled when using the `.jar` file. Use the `-t` flag to turn on tracing, and you can further customize its behavior with:
- `-tf`: Load the trace pattern from a file.
- `-tp`: Specify the trace pattern directly via the command line.
- `-tb`: Define the **base** of the register output (e.g., decimal, hexadecimal).
Here is an example of how to print each instruction along with the program counter (`pc`) and register `x1` in hexadecimal at every step of the program:
```shell
$ java -jar venus.jar <FILENAME> -t -tb 16 -tp "%decode%\n%pc%\n%x1%\n"
```
## Testing Framework
The `tests/` directory contains RISC-V files for testing the functions you’ll write. Each function you need to implement, except for the `main` function, has a corresponding test file. These tests fall into two categories:
- Pre-provided tests: Fully implemented tests that only require input modification.
- Starter code: Partial tests that you will need to complete.
Keep in mind that the autograder provides only basic sanity checks. It is important to create and run your own tests to thoroughly validate your code.
We have also included pregenerated inputs in the `inputs` directory, which may be helpful in your test creation. Additionally, the `outputs` directory contains the expected outputs for some of the tests run by the test runner.
### Using the Test Runner
To simplify testing, we added scripts to help you run and verify your tests. Passing `all` to the test runner executes every available test in the **tests** directory:
```shell
$ bash test.sh all
```
To run specific tests, pass their test IDs as arguments. The test IDs match the names of the `.s` files. For example, to run the argmax tests:
```shell
$ bash test.sh test_argmax
```
Note: Make sure to run the above command within the top-level directory.
## RISC-V Calling Convention
Your code will be evaluated according to the RISC-V calling conventions discussed in lectures. Functions that modify **callee-saved registers** (registers that must be preserved according to the calling convention) are required to include a **prologue** and **epilogue**. These sections ensure that such register values are saved to the stack at the beginning of the function and restored at the end.
While sanity tests in Part A will not enforce calling conventions, all subsequent tests in Part B and beyond will check for adherence to these conventions. Following these conventions is essential, as your functions will call other functions you write. Maintaining the abstraction barrier provided by the conventions will help you avoid bugs and make your code easier to manage.
We have included `# Prologue` and `# Epilogue` comments in your function templates as reminders. However, depending on your implementation, some functions may not need a prologue or epilogue. If they are not necessary, feel free to remove or ignore the provided comments.
For a more detailed explanation of the **RISC-V calling conventions**, refer to the notes from a former head TA provided with your course materials.
### Important Notes on Autograder Tests
Many of the autograder tests directly evaluate your adherence to the calling conventions. Failing to follow them will result in significant point deductions. Make sure to carefully review the conventions as you implement your functions.
### Common Errors and Troubleshooting Tips
1. "Ran for more than max allowed steps!"
- Issue: Venus terminates your program if it exceeds the maximum step count.
- Cause: This is expected for large inputs (e.g., MNIST), but if it happens with small inputs, you may have an infinite loop.
- Solution: Use the `-ms` flag to increase the instruction limit or remove it with `-ms -1`.
2. "Attempting to access uninitialized memory between the stack and heap."
- Issue: Your code is trying to read or write to an invalid memory region, causing a **segmentation fault**.
- Cause: This often happens when you don't allocate enough memory or access memory at the wrong address.
   - Solution: Double-check your memory allocations and ensure you are accessing valid addresses within allocated memory.
3. "The magic value for this malloc node is incorrect!"
- Issue: Your code is overwriting the **malloc metadata** or referencing the wrong malloc node.
- Cause: Venus uses a **linked list-based malloc** system. Each node includes metadata stored just below the pointer to the allocated memory. Writing to an incorrect location can corrupt the metadata, triggering this error.
- Solution: Verify that your code correctly indexes and accesses the allocated memory. Ensure you are not writing beyond the allocated bounds to avoid corrupting malloc metadata.
By adhering to the RISC-V conventions and carefully managing memory, you can avoid common pitfalls and ensure your code runs smoothly in Venus.
## Part A: Mathematical Functions
In this section, you will implement essential matrix operations used in neural networks. Specifically, you’ll create functions for dot product, matrix multiplication, element-wise ReLU, and argmax.
Note: Only the Part A sanity tests skip the calling-convention checks; all other tests enforce them.
### Matrix Representation
All two-dimensional matrices in this project will be stored as 1D vectors in row-major order. This means the rows of the matrix are concatenated to form a single continuous array. Alternatively, matrices could be stored in column-major order, but in this project, we stick to row-major order.
For more information on **row-major vs. column-major order**, refer to [this Wikipedia page](https://en.wikipedia.org/wiki/Row-_and_column-major_order).
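As a quick illustration (Python used only for the sketch; `flat_index` is our own helper name), element $(i, j)$ of a row-major matrix with `num_cols` columns sits at flat index `i * num_cols + j`:

```python
def flat_index(i, j, num_cols):
    # row-major: skip i complete rows, then move j columns into the row
    return i * num_cols + j

# the 2x3 matrix [[10, 11, 12], [20, 21, 22]] flattened in row-major order
m = [10, 11, 12, 20, 21, 22]
print(m[flat_index(1, 2, 3)])  # row 1, column 2
```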
### Vector/Array Strides
The stride of a vector refers to the number of memory locations between consecutive elements, measured in the size of the vector's elements. For example:
- Stride = 1: Elements are stored contiguously (e.g., `a[0], a[1], a[2]`).
- Stride = 4: Elements are spaced apart in memory (e.g., `a[0], a[4], a[8]`).
In RISC-V, accessing the $i$-th element of a vector with stride $s$ is expressed as:
```c
a[i * s]        /* array indexing               */
*(a + i * s)    /* equivalent pointer arithmetic */
```
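For instance (a Python sketch; `get_strided` is an illustrative helper, not starter code), reading every fourth element of a flat array corresponds to a stride of 4:

```python
def get_strided(a, i, stride):
    # i-th logical element of a strided view over the flat array a
    return a[i * stride]

a = list(range(12))  # [0, 1, 2, ..., 11]
col = [get_strided(a, i, 4) for i in range(3)]
print(col)           # picks out a[0], a[4], a[8]
```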
### Task 1: ReLU
In `relu.s`, implement the ReLU function, which applies the transformation:
$$
\text{ReLU}(a) = \max(a, 0)
$$
Each element of the input array will be individually processed by setting negative values to 0. Since the matrix is stored as a **1D row-major vector**, this function operates directly on the flattened array.
Use `test_relu.s` to set up and run tests on your ReLU function. You can define the matrix values in static memory, and the test will print the matrix before and after applying ReLU.
### Task 2: ArgMax
In `argmax.s`, implement the argmax function, which returns the index of the largest element in a given vector. If multiple elements share the largest value, return the smallest index. This function operates on 1D vectors.
Use `test_argmax.s` to test the function. You can modify the static vector and its length in the test file. Running the test will print the index of the largest element.
### Task 3.1: Dot Product
In `dot.s`, implement the dot product function, defined as:
$$
\text{dot}(a, b) = \sum_{i=0}^{n-1} (a_i \cdot b_i)
$$
You will need to account for stride when accessing the vector elements. No overflow handling is required, so you will not need the `mulh` instruction.
Fill out `test_dot.s` using the provided starter code to test your dot product function. Below is an example:
- Vectors:
```
v0 = [1, 2, 3]
v1 = [1, 3, 5]
```
- Result:
```
dot(v0, v1) = 1 * 1 + 2 * 3 + 3 * 5 = 22
```
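The example above can be sketched in Python (illustrative only; your implementation will be in RISC-V assembly, and the argument order here is our own choice):

```python
def dot(a, b, n, stride_a, stride_b):
    # sum of n products, stepping each vector by its own stride
    total = 0
    for i in range(n):
        total += a[i * stride_a] * b[i * stride_b]
    return total

print(dot([1, 2, 3], [1, 3, 5], 3, 1, 1))  # 1*1 + 2*3 + 3*5 = 22
```

With a non-unit stride on the second vector, the same function walks a column of a flattened matrix.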
### Task 3.2: Matrix Multiplication
In `matmul.s`, implement matrix multiplication, where:
$$
C[i][j] = \text{dot}(A[i], B[:,j])
$$
Given matrices $A$ (size $n \times m$) and $B$ (size $m \times k$), the output matrix $C$ will have dimensions $n \times k$.
- Rows of matrix $A$ will have **stride = 1**.
- Columns of matrix $B$ will require calculating the correct starting index and stride.
If the dimensions of the matrices are incompatible, the program should exit with code 4.
Use `test_matmul.s` to test your matrix multiplication. Here is an example:
- Input Matrices
```
m0 = [1, 2, 3
4, 5, 6
7, 8, 9]
m1 = [1, 2, 3
4, 5, 6
7, 8, 9]
```
- Output Matrix:
```
matmul(m0, m1) =
[30, 36, 42
66, 81, 96
102, 126, 150]
```
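As a minimal sketch of the computation (Python for illustration; the real implementation is your assembly `matmul` calling your `dot`), note how row $i$ of $A$ starts at `i * m` with stride 1, while column $j$ of $B$ starts at `j` with stride `k`:

```python
def matmul(a, b, n, m, k):
    # a is n x m, b is m x k, both flattened row-major; the result is n x k
    c = [0] * (n * k)
    for i in range(n):
        for j in range(k):
            # dot(row i of a, column j of b)
            c[i * k + j] = sum(a[i * m + t] * b[t * k + j] for t in range(m))
    return c

m0 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(matmul(m0, m0, 3, 3, 3))
```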
## Part B: File Operations and Main
This section focuses on reading and writing matrices to files and building the **main function** to perform digit classification using the pretrained MNIST weights.
### Matrix File Format
- Plaintext: Begins with two integers (rows, columns), followed by matrix rows.
- Binary: Stores matrix dimensions in the first 8 bytes (two 4-byte integers), followed by matrix elements in row-major order.
Use the `convert.py` tool to convert between binary and plaintext formats.
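A sketch of the binary layout in Python (the function names are ours, and little-endian byte order is an assumption consistent with RV32; `convert.py` remains the authoritative tool):

```python
import os
import struct
import tempfile

def write_matrix_bin(path, rows, cols, data):
    # header: rows and cols as two 4-byte integers (little-endian assumed),
    # followed by rows*cols 4-byte integers in row-major order
    with open(path, "wb") as f:
        f.write(struct.pack("<2i", rows, cols))
        f.write(struct.pack(f"<{rows * cols}i", *data))

def read_matrix_bin(path):
    with open(path, "rb") as f:
        rows, cols = struct.unpack("<2i", f.read(8))
        data = list(struct.unpack(f"<{rows * cols}i", f.read(4 * rows * cols)))
    return rows, cols, data

# round-trip a 2x3 matrix through a temporary file
path = os.path.join(tempfile.mkdtemp(), "m.bin")
write_matrix_bin(path, 2, 3, [1, 2, 3, 4, 5, 6])
print(read_matrix_bin(path))
```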
### Task 1: Read Matrix
In `read_matrix.s`, implement the function to **read a binary matrix** from a file and load it into memory. If any file operation fails, exit with the following codes:
- 48: Malloc error
- 50: fopen error
- 51: fread error
- 52: fclose error
### Task 2: Write Matrix
In `write_matrix.s`, implement the function to **write a matrix to a binary file**. Use the following exit codes for errors:
- 53: fopen error
- 54: fwrite error
- 55: fclose error
### Task 3: Classification
In `classify.s`, bring everything together to classify an input using two weight matrices and the ReLU and ArgMax functions. Use the following sequence:
1. Matrix Multiplication:
   $\text{hidden\_layer} = \text{matmul}(m_0, \text{input})$
2. ReLU Activation:
   $\text{relu}(\text{hidden\_layer})$
3. Second Matrix Multiplication:
   $\text{scores} = \text{matmul}(m_1, \text{hidden\_layer})$
4. Classification:
   Call `argmax` to find the index of the highest score.
Ensure that all dynamically allocated memory is freed after use.
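The four steps above can be sketched end to end in Python (a toy example with made-up 2×2 matrices; the function signature and names are illustrative, not the starter-code interface):

```python
def classify(m0, m0_rows, m0_cols, m1, m1_rows, m1_cols, x):
    # hidden_layer = matmul(m0, input), with m0 flattened row-major
    hidden = [sum(m0[i * m0_cols + t] * x[t] for t in range(m0_cols))
              for i in range(m0_rows)]
    hidden = [h if h > 0 else 0 for h in hidden]          # relu(hidden_layer)
    # scores = matmul(m1, hidden_layer)
    scores = [sum(m1[i * m1_cols + t] * hidden[t] for t in range(m1_cols))
              for i in range(m1_rows)]
    best = 0                                              # argmax(scores)
    for i in range(1, len(scores)):
        if scores[i] > scores[best]:
            best = i
    return best

# toy example: m0 is the 2x2 identity, m1 swaps the two scores
print(classify([1, 0, 0, 1], 2, 2, [0, 1, 1, 0], 2, 2, [3, -2]))
```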
## Submission and Grading
Here are the steps to complete your assignment:
1. Fork the repository at [classify-rv32i](https://github.com/sysprog21/classify-rv32i) on GitHub, and make sure that your forked repository is publicly accessible. $\to$ [Fork a repository](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo)
2. Clone the workspace from your forked repository and navigate to the `classify-rv32i` directory.
3. Run `tools/download_tools.sh` to download the Venus simulator.
4. Execute `./test.sh all` to initiate the test suite. You will encounter multiple failures that require you to implement your RISC-V assembly routines, replacing the original code (in `src` directory except the file `src/utils.s`) to pass the tests. $\to$ [Pushing commits to a remote repository](https://docs.github.com/en/get-started/using-git/pushing-commits-to-a-remote-repository)
5. While making your modifications, you must use **only RV32I instructions**; instructions from the M extension, such as `mul`, are not permitted. Once your changes are complete, use git commands to commit your work and push it to your forked repository.
6. Submit your code for both Part A and Part B. The autograder will run sanity tests, but most grading will rely on hidden tests that evaluate correctness and adherence to **RISC-V calling conventions**. Expected output:
```
test_abs_minus_one (__main__.TestAbs.test_abs_minus_one) ... ok
test_abs_one (__main__.TestAbs.test_abs_one) ... ok
test_abs_zero (__main__.TestAbs.test_abs_zero) ... ok
test_argmax_invalid_n (__main__.TestArgmax.test_argmax_invalid_n) ... ok
test_argmax_length_1 (__main__.TestArgmax.test_argmax_length_1) ... ok
test_argmax_standard (__main__.TestArgmax.test_argmax_standard) ... ok
test_chain_1 (__main__.TestChain.test_chain_1) ... ok
test_classify_1_silent (__main__.TestClassify.test_classify_1_silent) ... ok
test_classify_2_print (__main__.TestClassify.test_classify_2_print) ... ok
test_classify_3_print (__main__.TestClassify.test_classify_3_print) ... ok
test_classify_fail_malloc (__main__.TestClassify.test_classify_fail_malloc) ... ok
test_classify_not_enough_args (__main__.TestClassify.test_classify_not_enough_args) ... ok
test_dot_length_1 (__main__.TestDot.test_dot_length_1) ... ok
test_dot_length_error (__main__.TestDot.test_dot_length_error) ... ok
test_dot_length_error2 (__main__.TestDot.test_dot_length_error2) ... ok
test_dot_standard (__main__.TestDot.test_dot_standard) ... ok
test_dot_stride (__main__.TestDot.test_dot_stride) ... ok
test_dot_stride_error1 (__main__.TestDot.test_dot_stride_error1) ... ok
test_dot_stride_error2 (__main__.TestDot.test_dot_stride_error2) ... ok
test_matmul_incorrect_check (__main__.TestMatmul.test_matmul_incorrect_check) ... ok
test_matmul_length_1 (__main__.TestMatmul.test_matmul_length_1) ... ok
test_matmul_negative_dim_m0_x (__main__.TestMatmul.test_matmul_negative_dim_m0_x) ... ok
test_matmul_negative_dim_m0_y (__main__.TestMatmul.test_matmul_negative_dim_m0_y) ... ok
test_matmul_negative_dim_m1_x (__main__.TestMatmul.test_matmul_negative_dim_m1_x) ... ok
test_matmul_negative_dim_m1_y (__main__.TestMatmul.test_matmul_negative_dim_m1_y) ... ok
test_matmul_nonsquare_1 (__main__.TestMatmul.test_matmul_nonsquare_1) ... ok
test_matmul_nonsquare_2 (__main__.TestMatmul.test_matmul_nonsquare_2) ... ok
test_matmul_nonsquare_outer_dims (__main__.TestMatmul.test_matmul_nonsquare_outer_dims) ... ok
test_matmul_square (__main__.TestMatmul.test_matmul_square) ... ok
test_matmul_unmatched_dims (__main__.TestMatmul.test_matmul_unmatched_dims) ... ok
test_matmul_zero_dim_m0 (__main__.TestMatmul.test_matmul_zero_dim_m0) ... ok
test_matmul_zero_dim_m1 (__main__.TestMatmul.test_matmul_zero_dim_m1) ... ok
test_read_1 (__main__.TestReadMatrix.test_read_1) ... ok
test_read_2 (__main__.TestReadMatrix.test_read_2) ... ok
test_read_3 (__main__.TestReadMatrix.test_read_3) ... ok
test_read_fail_fclose (__main__.TestReadMatrix.test_read_fail_fclose) ... ok
test_read_fail_fopen (__main__.TestReadMatrix.test_read_fail_fopen) ... ok
test_read_fail_fread (__main__.TestReadMatrix.test_read_fail_fread) ... ok
test_read_fail_malloc (__main__.TestReadMatrix.test_read_fail_malloc) ... ok
test_relu_invalid_n (__main__.TestRelu.test_relu_invalid_n) ... ok
test_relu_length_1 (__main__.TestRelu.test_relu_length_1) ... ok
test_relu_standard (__main__.TestRelu.test_relu_standard) ... ok
test_write_1 (__main__.TestWriteMatrix.test_write_1) ... ok
test_write_fail_fclose (__main__.TestWriteMatrix.test_write_fail_fclose) ... ok
test_write_fail_fopen (__main__.TestWriteMatrix.test_write_fail_fopen) ... ok
test_write_fail_fwrite (__main__.TestWriteMatrix.test_write_fail_fwrite) ... ok
----------------------------------------------------------------------
Ran 46 tests in 26.208s
OK
```
7. Update the top-level `README.md` file to document your work, explaining the functionality of the essential operations and detailing how you addressed and overcame the challenges. Of course, you must write in clear and expressive English.
8. Fill the [Google form](https://docs.google.com/forms/d/e/1FAIpQLSfDzAm5uf_ZU-8txiAsB3aE8cVioc5zro_rk0XYS2R1tQMi2A/viewform) to submit the URL of your forked repository.