Due: Nov 18, 2024
This project challenges you to develop a simple yet highly practical system using RISC-V assembly language. Throughout the process, you will explore essential low-level programming concepts, such as:
To make the project engaging, you will implement various matrix and vector operations, such as matrix multiplication. These functions will serve as the building blocks for constructing a simple Artificial Neural Network (ANN) capable of classifying handwritten digits. Through this exercise, you will observe how ANNs can be implemented using basic numerical operations like vector inner products, matrix multiplications, and non-linear thresholding.
At its core, a neural network attempts to approximate a (non-linear) function that maps input data to the desired output. A fundamental building block of a neural network is a neuron, which performs a weighted linear combination of inputs followed by a non-linear activation function, such as a threshold.
Consider the following example of a neuron that mimics the logical AND operation:
This simple neuron operation can be viewed as the inner product of two vectors, the inputs and the weights, followed by a threshold.
You might wonder how these weights are determined. Weight selection is beyond the scope of this project and involves specialized topics in numerical linear algebra, signal processing, machine learning, and optimization. Typically, weights are learned through a process called training, where the network adjusts its parameters to minimize the error between its predictions and the correct outputs. In contrast, the use of trained weights to make predictions is known as inference. For this project, you will only perform inference using pre-trained weights provided by your instructor.
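As a rough illustration of inference with fixed weights, the AND neuron described above can be modeled in a few lines of Python. The weight values (0.6 each) and the threshold (1) are assumptions chosen for this sketch, and the function names are hypothetical; they are not part of the provided project code.

```python
def neuron(inputs, weights, threshold):
    """Weighted sum of the inputs followed by a hard threshold (returns 0 or 1)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


# Assumed values for illustration only: any weights w with w < threshold <= 2w
# would also behave like logical AND on 0/1 inputs.
AND_WEIGHTS = [0.6, 0.6]
AND_THRESHOLD = 1.0


def and_neuron(a, b):
    """Inference only: the weights are fixed, nothing is learned here."""
    return neuron([a, b], AND_WEIGHTS, AND_THRESHOLD)
```

Only the input pair (1, 1) produces a weighted sum that reaches the threshold, so the neuron outputs 1 for that pair and 0 otherwise.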
In this project, you will implement a slightly more advanced network capable of classifying handwritten digits. The input to the network will be images from the MNIST dataset, which consists of 60,000 28x28 pixel images of digits ranging from 0 to 9. Each image will be treated as a flattened input vector of size 784 × 1.
The network will rely on pre-trained weight matrices
This specification contains essential details about the provided functions, how to run Venus (RISC-V simulator), and how to test your code. Make sure to read through the entire document and its slides thoroughly, as it provides critical instructions for completing the project successfully.
The following sub-sections contain instructions for specific OSes (operating systems). Please use the instructions for your specific OS; using commands meant for the wrong OS may cause errors and potentially break your system! Note: Homework assignments will support only GNU/Linux and macOS.
Ubuntu 22.04 and later versions include the required programs in the default APT repositories. Use the following commands to install them automatically:
$ sudo apt update
$ sudo apt install curl git openjdk-11-jdk python3 python3-pip
Any terminal and any modern web browser should be sufficient.
$ xcode-select --install
$ java -version
.pkg installer.
aarch64 version.
x64 version.
$ python3 --version
All of the code you write for this project will be in RISC-V assembly and executed using the Venus simulator. There are two ways to run your code with Venus: through the web interface or the command line using a Java .jar file. We recommend that you primarily develop your code locally using the .jar file and only use the web interface for debugging or quick edits to individual files.
To run a RISC-V file with the .jar version, use the following command:
$ java -jar venus.jar <FILENAME>
If you encounter an error related to the maximum instruction count being reached (common with larger MNIST inputs), you can increase the limit using the -ms flag. To remove the limit entirely, set the instruction count to a negative value:
$ java -jar venus.jar -ms -1 <FILENAME>
You can explore additional options by running:
$ java -jar venus.jar -h
Just like the web version, you can disable mutable text sections with the -it flag.
You can also debug locally by using the print functions provided in utils.s, such as print_int and print_int_array.
.jar Version
The tracer feature can be enabled when using the .jar file. Use the -t flag to turn on tracing, and you can further customize its behavior with:
-tf: Load the trace pattern from a file.
-tp: Specify the trace pattern directly via the command line.
-tb: Define the base of the register output (e.g., decimal, hexadecimal).
Here is an example of how to print each instruction along with the program counter (pc) and register x1 in hexadecimal at every step of the program:
java -jar venus.jar <FILENAME> -t -tb 16 -tp "%decode%\n%pc%\n%x1%\n"
The tests/ directory contains RISC-V files for testing the functions you’ll write. Each function you need to implement, except for the main function, has a corresponding test file. These tests fall into two categories:
Keep in mind that the autograder provides only basic sanity checks. It is important to create and run your own tests to thoroughly validate your code.
We have also included pregenerated inputs in the inputs directory, which may be helpful when you create tests. Additionally, the outputs directory contains the expected outputs for some of the tests run by the test runner.
To simplify testing, we added scripts to help you run and verify your tests. Running the test runner with the all argument will execute all available tests in the tests directory:
$ bash test.sh all
To run specific tests, pass their test IDs as arguments. The test IDs match the names of the .s files. For example, to run the argmax tests:
$ bash test.sh test_argmax
Note: Make sure to run the above command within the top-level directory.
Your code will be evaluated according to the RISC-V calling conventions discussed in lectures. Functions that modify callee-saved registers (registers that must be preserved according to the calling convention) are required to include a prologue and epilogue. These sections ensure that such register values are saved to the stack at the beginning of the function and restored at the end.
While sanity tests in Part A will not enforce calling conventions, all subsequent tests in Part B and beyond will check for adherence to these conventions. Following these conventions is essential, as your functions will call other functions you write. Maintaining the abstraction barrier provided by the conventions will help you avoid bugs and make your code easier to manage.
We have included # Prologue and # Epilogue comments in your function templates as reminders. However, depending on your implementation, some functions may not need a prologue or epilogue. If they are not necessary, feel free to remove or ignore the provided comments.
For a more detailed explanation of the RISC-V calling conventions, refer to the notes from a former head TA provided with your course materials.
Many of the autograder tests directly evaluate your adherence to the calling conventions. Failing to follow them will result in significant point deductions. Make sure to carefully review the conventions as you implement your functions.
Use the -ms flag to increase the instruction limit or remove it with -ms -1.
By adhering to the RISC-V conventions and carefully managing memory, you can avoid common pitfalls and ensure your code runs smoothly in Venus.
In this section, you will implement essential matrix operations used in neural networks. Specifically, you’ll create functions for dot product, matrix multiplication, element-wise ReLU, and argmax.
Note: The Part A sanity tests are the only tests that do not check adherence to the calling convention.
All two-dimensional matrices in this project will be stored as 1D vectors in row-major order. This means the rows of the matrix are concatenated to form a single continuous array. Alternatively, matrices could be stored in column-major order, but in this project, we stick to row-major order.
For more information on row-major vs. column-major order, refer to this Wikipedia page.
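To make the row-major layout concrete, the following Python sketch maps a (row, column) pair to an index in the flattened array. The helper names flat_index and get are hypothetical and exist only for this illustration.

```python
def flat_index(row, col, num_cols):
    """Index of element (row, col) in a row-major flattened matrix."""
    return row * num_cols + col


def get(matrix, row, col, num_cols):
    """Read element (row, col) from the flattened matrix."""
    return matrix[flat_index(row, col, num_cols)]


# A 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored as one contiguous array.
example = [1, 2, 3, 4, 5, 6]
```

With this layout, moving one column to the right advances the index by 1, and moving one row down advances it by the number of columns.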
The stride of a vector refers to the number of memory locations between consecutive elements, measured in the size of the vector's elements. For example:
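Stride 1: consecutive elements (a[0], a[1], a[2]).
Stride 4: every fourth element (a[0], a[4], a[8]).
In RISC-V, accessing the i-th element of a vector a with stride s corresponds to the C expression:
a[i * s] or *(a + i * s)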
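The indexing expression a[i * s] can be sketched in Python as follows. The byte-offset helper assumes 4-byte (32-bit) integer elements, which is an assumption of this example rather than something stated above; in assembly, that offset is what would be added to the base address of the array.

```python
WORD_SIZE = 4  # assumption: each element is a 32-bit integer


def strided_elements(a, n, stride):
    """Return the n elements a[0], a[stride], a[2 * stride], ..."""
    return [a[i * stride] for i in range(n)]


def byte_offset(i, stride, word_size=WORD_SIZE):
    """Byte offset from the base address of a to the i-th strided element."""
    return i * stride * word_size
```

For example, with stride 4 the third element (i = 2) is a[8], which under the 4-byte assumption sits 32 bytes past the start of the array.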
In relu.s, implement the ReLU function, which applies the transformation:
ReLU(a) = max(a, 0)
Each element of the input array will be individually processed by setting negative values to 0. Since the matrix is stored as a 1D row-major vector, this function operates directly on the flattened array.
Use test_relu.s to set up and run tests on your ReLU function. You can define the matrix values in static memory, and the test will print the matrix before and after applying ReLU.
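A Python reference model can be handy for predicting what the test should print. The sketch below assumes the array is modified in place, which matches the "before and after" printing described above but is otherwise an assumption of this example.

```python
def relu(a):
    """Replace every negative element of the flattened array with 0, in place."""
    for i in range(len(a)):
        if a[i] < 0:
            a[i] = 0
    return a  # returned only for convenience; the input list itself is modified
```

Because the matrix is stored as a flat array, no row or column information is needed: a single loop over all elements is enough.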
In argmax.s, implement the argmax function, which returns the index of the largest element in a given vector. If multiple elements share the largest value, return the smallest index. This function operates on 1D vectors.
Use test_argmax.s to test the function. You can modify the static vector and its length in the test file. Running the test will print the index of the largest element.
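The tie-breaking rule is easy to get wrong, so here is a small Python reference model. It is a sketch, not the required implementation; raising ValueError for an empty vector is a stand-in chosen for this example, since the required error behavior is not described above.

```python
def argmax(v):
    """Index of the largest element; the smallest index wins on ties."""
    if len(v) < 1:
        raise ValueError("vector must contain at least one element")
    best = 0
    for i in range(1, len(v)):
        # Strict comparison keeps the earliest index when values are equal.
        if v[i] > v[best]:
            best = i
    return best
```

Using a strict greater-than comparison is what guarantees the first occurrence of the maximum is returned.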
In dot.s, implement the dot product function, defined as:
dot(a, b) = a[0] * b[0] + a[1] * b[1] + ... + a[n-1] * b[n-1]
You will need to account for stride when accessing the vector elements. No overflow handling is required, so you will not need the mulh instruction.
Fill out test_dot.s using the provided starter code to test your dot product function. Below is an example:
v0 = [1, 2, 3]
v1 = [1, 3, 5]
dot(v0, v1) = 1 * 1 + 2 * 3 + 3 * 5 = 22
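The example above uses a stride of 1 for both vectors. The following Python sketch shows how the same computation generalizes to arbitrary strides; the parameter names and their order are assumptions of this example and need not match the assembly function's argument registers.

```python
def dot(v0, v1, n, stride0=1, stride1=1):
    """Dot product of n elements taken from v0 and v1 with the given strides."""
    total = 0
    for i in range(n):
        total += v0[i * stride0] * v1[i * stride1]
    return total
```

With stride0 = 2, the call dot([1, 2, 3, 4, 5, 6], [1, 3, 5], 3, stride0=2) reads elements 1, 3, 5 from the first array and pairs them with 1, 3, 5 from the second.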
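In matmul.s, implement matrix multiplication, where:
C[i][j] = dot(A[i], B[:, j])
Given matrices A (of size n × m) and B (of size m × k), the output matrix C has dimensions n × k.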
If the dimensions of the matrices are incompatible, the program should exit with code 4.
Use test_matmul.s to test your matrix multiplication. Here is an example:
m0 = [1, 2, 3
4, 5, 6
7, 8, 9]
m1 = [1, 2, 3
4, 5, 6
7, 8, 9]
matmul(m0, m1) =
[30, 36, 42
66, 81, 96
102, 126, 150]
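The example above can be reproduced with a short Python reference model operating on row-major flattened arrays. This is a sketch under assumptions: the argument order is hypothetical, and ValueError stands in for the assembly program exiting with an error code on incompatible dimensions.

```python
def matmul(m0, rows0, cols0, m1, rows1, cols1):
    """Multiply two row-major flattened matrices and return the flattened result."""
    if cols0 != rows1:
        raise ValueError("incompatible matrix dimensions")
    out = [0] * (rows0 * cols1)
    for i in range(rows0):
        for j in range(cols1):
            acc = 0
            for k in range(cols0):
                # Row i of m0 is read with stride 1 starting at i * cols0.
                # Column j of m1 is read with stride cols1 starting at j.
                acc += m0[i * cols0 + k] * m1[j + k * cols1]
            out[i * cols1 + j] = acc
    return out
```

The inner loop is a dot product between a row of the first matrix (stride 1) and a column of the second (stride equal to its number of columns), which is why a stride-aware dot product is enough to build matrix multiplication.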
This section focuses on reading and writing matrices to files and building the main function to perform digit classification using the pretrained MNIST weights.
Use the convert.py tool to convert between binary and plaintext formats.
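To build intuition for what a binary matrix file might contain, here is a Python sketch of one plausible layout. The layout is an assumption of this example (two 4-byte little-endian integers for the row and column counts, followed by the elements as 4-byte integers in row-major order); check the provided files and convert.py for the actual format. The helper names are hypothetical.

```python
import struct


def pack_matrix(rows, cols, values):
    """Encode a matrix using the assumed layout: rows, cols, then the elements."""
    if len(values) != rows * cols:
        raise ValueError("number of values does not match the dimensions")
    return struct.pack(f"<{2 + len(values)}i", rows, cols, *values)


def unpack_matrix(data):
    """Decode bytes in the assumed layout back into (rows, cols, values)."""
    rows, cols = struct.unpack_from("<2i", data, 0)
    values = list(struct.unpack_from(f"<{rows * cols}i", data, 8))
    return rows, cols, values
```

Under this assumed layout, a 2 x 2 matrix occupies 8 header bytes plus 16 data bytes.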
In read_matrix.s, implement the function to read a binary matrix from a file and load it into memory. If any file operation fails, exit with the following codes:
In write_matrix.s, implement the function to write a matrix to a binary file. Use the following exit codes for errors:
In classify.s, bring everything together to classify an input using two weight matrices and the ReLU and ArgMax functions. Use the following sequence:
argmax to find the index of the highest score.
Ensure that all dynamically allocated memory is freed after use.
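One plausible reading of the classification pipeline, sketched in Python: multiply the first weight matrix by the input, apply ReLU, multiply the second weight matrix by the result, then take the argmax. The exact order of steps, the names m0 and m1, and the argument layout are assumptions of this sketch; follow the steps given for classify.s where they differ.

```python
def matmul(a, a_rows, a_cols, b, b_rows, b_cols):
    """Row-major matrix product of a (a_rows x a_cols) and b (b_rows x b_cols)."""
    if a_cols != b_rows:
        raise ValueError("incompatible matrix dimensions")
    return [
        sum(a[i * a_cols + k] * b[k * b_cols + j] for k in range(a_cols))
        for i in range(a_rows)
        for j in range(b_cols)
    ]


def classify(m0, m0_rows, m0_cols, m1, m1_rows, m1_cols, x):
    """Assumed pipeline: argmax(m1 * relu(m0 * x)) for a column vector x."""
    hidden = matmul(m0, m0_rows, m0_cols, x, len(x), 1)
    hidden = [max(v, 0) for v in hidden]       # element-wise ReLU
    scores = matmul(m1, m1_rows, m1_cols, hidden, m0_rows, 1)
    return scores.index(max(scores))           # first index of the largest score
```

In Python the intermediate lists are garbage collected; in the assembly version, each intermediate buffer obtained through dynamic allocation would need to be freed explicitly.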
Here are the steps to complete your assignment:
classify-rv32i directory.
Run tools/download_tools.sh to download the Venus simulator.
Run ./test.sh all to initiate the test suite. You will encounter multiple failures that require you to implement your RISC-V assembly routines, replacing the original code (in the src directory, except the file src/utils.s) to pass the tests.
Instructions such as mul are not permitted. Once your changes are complete, use git commands to commit your work and push it to your forked repository.
test_abs_minus_one (__main__.TestAbs.test_abs_minus_one) ... ok
test_abs_one (__main__.TestAbs.test_abs_one) ... ok
test_abs_zero (__main__.TestAbs.test_abs_zero) ... ok
test_argmax_invalid_n (__main__.TestArgmax.test_argmax_invalid_n) ... ok
test_argmax_length_1 (__main__.TestArgmax.test_argmax_length_1) ... ok
test_argmax_standard (__main__.TestArgmax.test_argmax_standard) ... ok
test_chain_1 (__main__.TestChain.test_chain_1) ... ok
test_classify_1_silent (__main__.TestClassify.test_classify_1_silent) ... ok
test_classify_2_print (__main__.TestClassify.test_classify_2_print) ... ok
test_classify_3_print (__main__.TestClassify.test_classify_3_print) ... ok
test_classify_fail_malloc (__main__.TestClassify.test_classify_fail_malloc) ... ok
test_classify_not_enough_args (__main__.TestClassify.test_classify_not_enough_args) ... ok
test_dot_length_1 (__main__.TestDot.test_dot_length_1) ... ok
test_dot_length_error (__main__.TestDot.test_dot_length_error) ... ok
test_dot_length_error2 (__main__.TestDot.test_dot_length_error2) ... ok
test_dot_standard (__main__.TestDot.test_dot_standard) ... ok
test_dot_stride (__main__.TestDot.test_dot_stride) ... ok
test_dot_stride_error1 (__main__.TestDot.test_dot_stride_error1) ... ok
test_dot_stride_error2 (__main__.TestDot.test_dot_stride_error2) ... ok
test_matmul_incorrect_check (__main__.TestMatmul.test_matmul_incorrect_check) ... ok
test_matmul_length_1 (__main__.TestMatmul.test_matmul_length_1) ... ok
test_matmul_negative_dim_m0_x (__main__.TestMatmul.test_matmul_negative_dim_m0_x) ... ok
test_matmul_negative_dim_m0_y (__main__.TestMatmul.test_matmul_negative_dim_m0_y) ... ok
test_matmul_negative_dim_m1_x (__main__.TestMatmul.test_matmul_negative_dim_m1_x) ... ok
test_matmul_negative_dim_m1_y (__main__.TestMatmul.test_matmul_negative_dim_m1_y) ... ok
test_matmul_nonsquare_1 (__main__.TestMatmul.test_matmul_nonsquare_1) ... ok
test_matmul_nonsquare_2 (__main__.TestMatmul.test_matmul_nonsquare_2) ... ok
test_matmul_nonsquare_outer_dims (__main__.TestMatmul.test_matmul_nonsquare_outer_dims) ... ok
test_matmul_square (__main__.TestMatmul.test_matmul_square) ... ok
test_matmul_unmatched_dims (__main__.TestMatmul.test_matmul_unmatched_dims) ... ok
test_matmul_zero_dim_m0 (__main__.TestMatmul.test_matmul_zero_dim_m0) ... ok
test_matmul_zero_dim_m1 (__main__.TestMatmul.test_matmul_zero_dim_m1) ... ok
test_read_1 (__main__.TestReadMatrix.test_read_1) ... ok
test_read_2 (__main__.TestReadMatrix.test_read_2) ... ok
test_read_3 (__main__.TestReadMatrix.test_read_3) ... ok
test_read_fail_fclose (__main__.TestReadMatrix.test_read_fail_fclose) ... ok
test_read_fail_fopen (__main__.TestReadMatrix.test_read_fail_fopen) ... ok
test_read_fail_fread (__main__.TestReadMatrix.test_read_fail_fread) ... ok
test_read_fail_malloc (__main__.TestReadMatrix.test_read_fail_malloc) ... ok
test_relu_invalid_n (__main__.TestRelu.test_relu_invalid_n) ... ok
test_relu_length_1 (__main__.TestRelu.test_relu_length_1) ... ok
test_relu_standard (__main__.TestRelu.test_relu_standard) ... ok
test_write_1 (__main__.TestWriteMatrix.test_write_1) ... ok
test_write_fail_fclose (__main__.TestWriteMatrix.test_write_fail_fclose) ... ok
test_write_fail_fopen (__main__.TestWriteMatrix.test_write_fail_fopen) ... ok
test_write_fail_fwrite (__main__.TestWriteMatrix.test_write_fail_fwrite) ... ok
----------------------------------------------------------------------
Ran 46 tests in 26.208s
OK
Use the README.md file to document your work, explaining the functionality of the essential operations and detailing how you addressed and overcame the challenges. Of course, you must write in clear and expressive English.