# Assignment 2: Complete Applications
> Due: ==Nov 18, 2024==
## Project Overview
This project challenges you to develop a simple yet highly practical system using **RISC-V assembly language**. Throughout the process, you will explore essential low-level programming concepts, such as:
- Efficient use of registers for optimized code execution.
- Writing functions and adhering to RISC-V calling conventions for both internal and external functions.
- Memory management by allocating space on the stack and heap.
- Pointer manipulation for working with matrix and vector data structures.
To make the project engaging, you will implement various matrix and vector operations, such as matrix multiplication. These functions will serve as the building blocks for constructing a simple [Artificial Neural Network](https://en.wikipedia.org/wiki/Neural_network_(machine_learning)) (ANN) capable of classifying handwritten digits. Through this exercise, you will observe how ANNs can be implemented using basic numerical operations like vector inner products, matrix multiplications, and non-linear thresholding.
## Overview of Neural Networks
At its core, a **neural network** attempts to approximate a (non-linear) function that maps input data to the desired output. A fundamental building block of a neural network is a **neuron**, which performs a weighted linear combination of inputs followed by a non-linear activation function, such as a threshold.
Consider the following example of a neuron that mimics the logical AND operation:
- With input $A = 0$, $B = 0$, the weighted sum is $0 \times 0.6 + 0 \times 0.6 = 0$. Since this is below the threshold of 1, the output is 0.
- For inputs $A = 0$, $B = 1$ or $A = 1$, $B = 0$, the sum becomes $1 \times 0.6 + 0 \times 0.6 = 0.6$, still below the threshold, resulting in 0.
- Finally, with $A = 1$, $B = 1$, the sum is $1 \times 0.6 + 1 \times 0.6 = 1.2$. This exceeds the threshold, producing an output of 1.
This simple neuron operation can be viewed as the **inner product** of two vectors, $[A, B]^T$ and $[0.6, 0.6]^T$, followed by a non-linear threshold function.
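The three cases above can be condensed into a few lines of Python (an illustrative sketch only; the names `neuron_and`, `weights`, and `threshold` are ours, not part of the project code):

```python
# AND neuron: inner product of [A, B] with [0.6, 0.6], then a hard threshold at 1
def neuron_and(a, b, weights=(0.6, 0.6), threshold=1.0):
    weighted_sum = a * weights[0] + b * weights[1]  # inner product
    return 1 if weighted_sum >= threshold else 0    # non-linear threshold

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", neuron_and(a, b))
```

Only the (1, 1) case reaches 1.2, which clears the threshold; every other weighted sum stays below 1.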
You might wonder how these weights are determined. Weight selection is beyond the scope of this project and involves specialized topics in **numerical linear algebra, signal processing, machine learning, and optimization**. Typically, weights are learned through a process called training, where the network adjusts its parameters to minimize the error between its predictions and the correct outputs. In contrast, the use of trained weights to make predictions is known as inference. For this project, you will only perform inference using pre-trained weights provided by your instructor.
## Handwritten Digit Classification
In this project, you will implement a slightly more advanced network capable of classifying handwritten digits. The input to the network will be images from the [MNIST dataset](https://www.tensorflow.org/datasets/catalog/mnist), which consists of 60,000 28×28-pixel images of digits ranging from 0 to 9. Each image will be treated as a flattened input vector of size 784 × 1.
![handwriting](https://hackmd.io/_uploads/rkH6uViJ1g.png =60%x)
The network will rely on pre-trained weight matrices $m_0$ and $m_1$ to perform matrix multiplications. Unlike the earlier example using a threshold function, this network will apply two types of non-linear activation functions:
- ReLU (Rectified Linear Unit): ReLU sets all negative values to zero, introducing non-linearity.
- ArgMax: This function identifies the output neuron with the highest value, indicating the predicted class.
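In Python terms (an illustrative sketch; you will implement both in RISC-V assembly), the two activations behave like:

```python
def relu(v):
    # set every negative entry to zero; positive entries pass through
    return [x if x > 0 else 0 for x in v]

def argmax(v):
    # index of the largest element; ties resolve to the smallest index
    best = 0
    for i in range(1, len(v)):
        if v[i] > v[best]:
            best = i
    return best

print(relu([-2, 3, -1, 5]))   # negatives clamped to 0
print(argmax([1, 7, 7, 2]))   # first of the tied maxima
```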
## Check Lab 3 of UC Berkeley CS61C
This specification contains essential details about the provided functions, how to run Venus (the RISC-V simulator), and how to test your code. Make sure to read through the [entire document](https://cs61c.org/fa24/labs/lab03/) and its [slides](https://docs.google.com/presentation/d/1vsXYoqsyJtZTE1_z2wtnXgsnbZaVFHGLjQdJsckxFIc/edit?usp=sharing) thoroughly, as they provide critical instructions for completing the project successfully.
## Install required software
The following sub-sections contain instructions for specific OSes (operating systems). **Please use the instructions for your specific OS**; using commands meant for the wrong OS may cause errors and potentially break your system! **Note:** Homework assignments will support only **GNU/Linux and macOS**.
### Ubuntu Linux
Ubuntu 22.04 and later versions include the required programs in the default APT repositories. Use the following commands to install them automatically:
```shell
$ sudo apt update
$ sudo apt install curl git openjdk-11-jdk python3 python3-pip
```
Any terminal and any modern web browser should be sufficient.
### macOS
1. Open the **Terminal** app.
2. Install the Xcode Command Line Tools:
```shell
$ xcode-select --install
```
3. Check your Java version:
```shell
$ java -version
```
If you don't already have Java 11 or higher, download and install **Adoptium OpenJDK 11**. We recommend using the `.pkg` installer.
- If you have an **Apple Silicon CPU**, choose the `aarch64` version.
- For **Intel/AMD CPUs**, choose the `x64` version.
4. Check your Python version:
```shell
$ python3 --version
```
If you don't already have Python 3.6 or higher, download and install Python 3.
## Running RISC-V in Venus
All of the code you write for this project will be in RISC-V assembly and executed using the Venus simulator. There are two ways to run your code with Venus: through the web interface or the command line using a Java `.jar` file. We recommend that you primarily develop your code locally using the `.jar` file and only use the web interface for debugging or quick edits to individual files.
To run a RISC-V file with the `.jar` version, use the following command:
```shell
$ java -jar venus.jar <FILENAME>
```
If you encounter an error related to the **maximum instruction count** being reached (common with larger MNIST inputs), you can increase the limit using the `-ms` flag. To remove the limit entirely, set the instruction count to a negative value:
```shell
$ java -jar venus.jar -ms -1 <FILENAME>
```
You can explore additional options by running:
```shell
$ java -jar venus.jar -h
```
Just like the web version, you can disable **mutable text sections** with the `-it` flag.
You can also debug locally by using the print functions provided in `utils.s`, such as `print_int` and `print_int_array`.
### Enabling the Tracer in the `.jar` Version
The tracer feature can be enabled when using the `.jar` file. Use the `-t` flag to turn on tracing, and you can further customize its behavior with:
- `-tf`: Load the trace pattern from a file.
- `-tp`: Specify the trace pattern directly via the command line.
- `-tb`: Define the **base** of the register output (e.g., decimal, hexadecimal).
Here is an example of how to print each instruction along with the program counter (`pc`) and register `x1` in hexadecimal at every step of the program:
```shell
$ java -jar venus.jar <FILENAME> -t -tb 16 -tp "%decode%\n%pc%\n%x1%\n"
```
## Testing Framework
The `tests/` directory contains RISC-V files for testing the functions you’ll write. Each function you need to implement, except for the `main` function, has a corresponding test file. These tests fall into two categories:
- Pre-provided tests: Fully implemented tests that only require input modification.
- Starter code: Partial tests that you will need to complete.
Keep in mind that the autograder provides only basic sanity checks. It is important to create and run your own tests to thoroughly validate your code.
We have also included pregenerated inputs in the `inputs` directory, which may be helpful in your test creation. Additionally, the `outputs` directory contains the expected outputs for some of the tests run by the test runner.
### Using the Test Runner
To simplify testing, we added scripts to help you run and verify your tests. Passing `all` to the test runner executes every available test in the **tests** directory:
```shell
$ bash test.sh all
```
To run specific tests, pass their test IDs as arguments. The test IDs match the names of the `.s` files. For example, to run the argmax tests:
```shell
$ bash test.sh test_argmax
```
Note: Make sure to run the above command within the top-level directory.
## RISC-V Calling Convention
Your code will be evaluated according to the RISC-V calling conventions discussed in lectures. Functions that modify **callee-saved registers** (registers that must be preserved according to the calling convention) are required to include a **prologue** and **epilogue**. These sections ensure that such register values are saved to the stack at the beginning of the function and restored at the end.
While sanity tests in Part A will not enforce calling conventions, all subsequent tests in Part B and beyond will check for adherence to these conventions. Following these conventions is essential, as your functions will call other functions you write. Maintaining the abstraction barrier provided by the conventions will help you avoid bugs and make your code easier to manage.
We have included `# Prologue` and `# Epilogue` comments in your function templates as reminders. However, depending on your implementation, some functions may not need a prologue or epilogue. If they are not necessary, feel free to remove or ignore the provided comments.
For a more detailed explanation of the **RISC-V calling conventions**, refer to the notes from a former head TA provided with your course materials.
### Important Notes on Autograder Tests
Many of the autograder tests directly evaluate your adherence to the calling conventions. Failing to follow them will result in significant point deductions. Make sure to carefully review the conventions as you implement your functions.
### Common Errors and Troubleshooting Tips
1. "Ran for more than max allowed steps!"
- Issue: Venus terminates your program if it exceeds the maximum step count.
- Cause: This is expected for large inputs (e.g., MNIST), but if it happens with small inputs, you may have an infinite loop.
- Solution: Use the `-ms` flag to increase the instruction limit or remove it with `-ms -1`.
2. "Attempting to access uninitialized memory between the stack and heap."
- Issue: Your code is trying to read or write to an invalid memory region, causing a **segmentation fault**.
- Cause: This often happens when you don't allocate enough memory or access memory at the wrong address.
   - Solution: Double-check your memory allocations and ensure you are accessing valid addresses within allocated memory.
3. "The magic value for this malloc node is incorrect!"
- Issue: Your code is overwriting the **malloc metadata** or referencing the wrong malloc node.
- Cause: Venus uses a **linked list-based malloc** system. Each node includes metadata stored just below the pointer to the allocated memory. Writing to an incorrect location can corrupt the metadata, triggering this error.
- Solution: Verify that your code correctly indexes and accesses the allocated memory. Ensure you are not writing beyond the allocated bounds to avoid corrupting malloc metadata.
By adhering to the RISC-V conventions and carefully managing memory, you can avoid common pitfalls and ensure your code runs smoothly in Venus.
## Part A: Mathematical Functions
In this section, you will implement essential matrix operations used in neural networks. Specifically, you’ll create functions for dot product, matrix multiplication, element-wise ReLU, and argmax.
Note: Only the Part A sanity tests skip the calling-convention checks; all other tests enforce them.
### Matrix Representation
All two-dimensional matrices in this project will be stored as 1D vectors in row-major order. This means the rows of the matrix are concatenated to form a single continuous array. Alternatively, matrices could be stored in column-major order, but in this project, we stick to row-major order.
For more information on **row-major vs. column-major order**, refer to [this Wikipedia page](https://en.wikipedia.org/wiki/Row-_and_column-major_order).
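As a quick illustration (Python used only for the sketch; `flat_index` is our own helper name), element $(i, j)$ of a row-major matrix with `num_cols` columns sits at flat index `i * num_cols + j`:

```python
def flat_index(i, j, num_cols):
    # row-major: skip i complete rows, then move j columns into the row
    return i * num_cols + j

# the 2x3 matrix [[10, 11, 12], [20, 21, 22]] flattened in row-major order
m = [10, 11, 12, 20, 21, 22]
print(m[flat_index(1, 2, 3)])  # row 1, column 2
```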
### Vector/Array Strides
The stride of a vector refers to the number of memory locations between consecutive elements, measured in the size of the vector's elements. For example:
- Stride = 1: Elements are stored contiguously (e.g., `a[0], a[1], a[2]`).
- Stride = 4: Elements are spaced apart in memory (e.g., `a[0], a[4], a[8]`).
In RISC-V, accessing the $i$-th element of a vector with stride $s$ is expressed as:
```c
a[i * s]        /* array indexing               */
*(a + i * s)    /* equivalent pointer arithmetic */
```
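For instance (a Python sketch; `get_strided` is an illustrative helper, not starter code), reading every fourth element of a flat array corresponds to a stride of 4:

```python
def get_strided(a, i, stride):
    # i-th logical element of a strided view over the flat array a
    return a[i * stride]

a = list(range(12))  # [0, 1, 2, ..., 11]
col = [get_strided(a, i, 4) for i in range(3)]
print(col)           # picks out a[0], a[4], a[8]
```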
### Task 1: ReLU
In `relu.s`, implement the ReLU function, which applies the transformation:
$$
\text{ReLU}(a) = \max(a, 0)
$$
Each element of the input array will be individually processed by setting negative values to 0. Since the matrix is stored as a **1D row-major vector**, this function operates directly on the flattened array.
Use `test_relu.s` to set up and run tests on your ReLU function. You can define the matrix values in static memory, and the test will print the matrix before and after applying ReLU.
### Task 2: ArgMax
In `argmax.s`, implement the argmax function, which returns the index of the largest element in a given vector. If multiple elements share the largest value, return the smallest index. This function operates on 1D vectors.
Use `test_argmax.s` to test the function. You can modify the static vector and its length in the test file. Running the test will print the index of the largest element.
### Task 3.1: Dot Product
In `dot.s`, implement the dot product function, defined as:
$$
\text{dot}(a, b) = \sum_{i=0}^{n-1} (a_i \cdot b_i)
$$
You will need to account for stride when accessing the vector elements. No overflow handling is required, so you will not need the `mulh` instruction.
Fill out `test_dot.s` using the provided starter code to test your dot product function. Below is an example:
- Vectors:
```
v0 = [1, 2, 3]
v1 = [1, 3, 5]
```
- Result:
```
dot(v0, v1) = 1 * 1 + 2 * 3 + 3 * 5 = 22
```
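The example above can be sketched in Python (illustrative only; your implementation will be in RISC-V assembly, and the argument order here is our own choice):

```python
def dot(a, b, n, stride_a, stride_b):
    # sum of n products, stepping each vector by its own stride
    total = 0
    for i in range(n):
        total += a[i * stride_a] * b[i * stride_b]
    return total

print(dot([1, 2, 3], [1, 3, 5], 3, 1, 1))  # 1*1 + 2*3 + 3*5 = 22
```

With a non-unit stride on the second vector, the same function walks a column of a flattened matrix.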
### Task 3.2: Matrix Multiplication
In `matmul.s`, implement matrix multiplication, where:
$$
C[i][j] = \text{dot}(A[i], B[:,j])
$$
Given matrices $A$ (size $n \times m$) and $B$ (size $m \times k$), the output matrix $C$ will have dimensions $n \times k$.
- Rows of matrix $A$ will have **stride = 1**.
- Columns of matrix $B$ will require calculating the correct starting index and stride.
If the dimensions of the matrices are incompatible, the program should exit with code 4.
Use `test_matmul.s` to test your matrix multiplication. Here is an example:
- Input Matrices
```
m0 = [1, 2, 3
4, 5, 6
7, 8, 9]
m1 = [1, 2, 3
4, 5, 6
7, 8, 9]
```
- Output Matrix:
```
matmul(m0, m1) =
[30, 36, 42
66, 81, 96
102, 126, 150]
```
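As a minimal sketch of the computation (Python for illustration; the real implementation is your assembly `matmul` calling your `dot`), note how row $i$ of $A$ starts at `i * m` with stride 1, while column $j$ of $B$ starts at `j` with stride `k`:

```python
def matmul(a, b, n, m, k):
    # a is n x m, b is m x k, both flattened row-major; the result is n x k
    c = [0] * (n * k)
    for i in range(n):
        for j in range(k):
            # dot(row i of a, column j of b)
            c[i * k + j] = sum(a[i * m + t] * b[t * k + j] for t in range(m))
    return c

m0 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(matmul(m0, m0, 3, 3, 3))
```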
## Part B: File Operations and Main
This section focuses on reading and writing matrices to files and building the **main function** to perform digit classification using the pretrained MNIST weights.
### Matrix File Format
- Plaintext: Begins with two integers (rows, columns), followed by matrix rows.
- Binary: Stores matrix dimensions in the first 8 bytes (two 4-byte integers), followed by matrix elements in row-major order.
Use the `convert.py` tool to convert between binary and plaintext formats.
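A sketch of the binary layout in Python (the function names are ours, and little-endian byte order is an assumption consistent with RV32; `convert.py` remains the authoritative tool):

```python
import os
import struct
import tempfile

def write_matrix_bin(path, rows, cols, data):
    # header: rows and cols as two 4-byte integers (little-endian assumed),
    # followed by rows*cols 4-byte integers in row-major order
    with open(path, "wb") as f:
        f.write(struct.pack("<2i", rows, cols))
        f.write(struct.pack(f"<{rows * cols}i", *data))

def read_matrix_bin(path):
    with open(path, "rb") as f:
        rows, cols = struct.unpack("<2i", f.read(8))
        data = list(struct.unpack(f"<{rows * cols}i", f.read(4 * rows * cols)))
    return rows, cols, data

# round-trip a 2x3 matrix through a temporary file
path = os.path.join(tempfile.mkdtemp(), "m.bin")
write_matrix_bin(path, 2, 3, [1, 2, 3, 4, 5, 6])
print(read_matrix_bin(path))
```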
### Task 1: Read Matrix
In `read_matrix.s`, implement the function to **read a binary matrix** from a file and load it into memory. If any file operation fails, exit with the following codes:
- 48: Malloc error
- 50: fopen error
- 51: fread error
- 52: fclose error
### Task 2: Write Matrix
In `write_matrix.s`, implement the function to **write a matrix to a binary file**. Use the following exit codes for errors:
- 53: fopen error
- 54: fwrite error
- 55: fclose error
### Task 3: Classification
In `classify.s`, bring everything together to classify an input using two weight matrices and the ReLU and ArgMax functions. Use the following sequence:
1. Matrix Multiplication:
   $\text{hidden\_layer} = \text{matmul}(m_0, \text{input})$
2. ReLU Activation:
   $\text{relu}(\text{hidden\_layer})$
3. Second Matrix Multiplication:
   $\text{scores} = \text{matmul}(m_1, \text{hidden\_layer})$
4. Classification:
   Call `argmax` to find the index of the highest score.
Ensure that all dynamically allocated memory is freed after use.
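The four steps above can be sketched end to end in Python (a toy example with made-up 2×2 matrices; the function signature and names are illustrative, not the starter-code interface):

```python
def classify(m0, m0_rows, m0_cols, m1, m1_rows, m1_cols, x):
    # hidden_layer = matmul(m0, input), with m0 flattened row-major
    hidden = [sum(m0[i * m0_cols + t] * x[t] for t in range(m0_cols))
              for i in range(m0_rows)]
    hidden = [h if h > 0 else 0 for h in hidden]          # relu(hidden_layer)
    # scores = matmul(m1, hidden_layer)
    scores = [sum(m1[i * m1_cols + t] * hidden[t] for t in range(m1_cols))
              for i in range(m1_rows)]
    best = 0                                              # argmax(scores)
    for i in range(1, len(scores)):
        if scores[i] > scores[best]:
            best = i
    return best

# toy example: m0 is the 2x2 identity, m1 swaps the two scores
print(classify([1, 0, 0, 1], 2, 2, [0, 1, 1, 0], 2, 2, [3, -2]))
```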
## Submission and Grading
Here are the steps to complete your assignment:
1. Fork the repository at [classify-rv32i](https://github.com/sysprog21/classify-rv32i) on GitHub, and make sure that your forked repository is publicly accessible. $\to$ [Fork a repository](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo)
2. Clone the workspace from your forked repository and navigate to the `classify-rv32i` directory.
3. Run `tools/download_tools.sh` to download the Venus simulator.
4. Execute `./test.sh all` to initiate the test suite. You will encounter multiple failures that require you to implement your RISC-V assembly routines, replacing the original code (in `src` directory except the file `src/utils.s`) to pass the tests. $\to$ [Pushing commits to a remote repository](https://docs.github.com/en/get-started/using-git/pushing-commits-to-a-remote-repository)
5. While making your modifications, you must use **only RV32I instructions**; instructions from the M extension, such as `mul`, are not permitted. Once your changes are complete, use git commands to commit your work and push it to your forked repository.
6. Submit your code for both Part A and Part B. The autograder will run sanity tests, but most grading will rely on hidden tests that evaluate correctness and adherence to **RISC-V calling conventions**. Expected output:
```
test_abs_minus_one (__main__.TestAbs.test_abs_minus_one) ... ok
test_abs_one (__main__.TestAbs.test_abs_one) ... ok
test_abs_zero (__main__.TestAbs.test_abs_zero) ... ok
test_argmax_invalid_n (__main__.TestArgmax.test_argmax_invalid_n) ... ok
test_argmax_length_1 (__main__.TestArgmax.test_argmax_length_1) ... ok
test_argmax_standard (__main__.TestArgmax.test_argmax_standard) ... ok
test_chain_1 (__main__.TestChain.test_chain_1) ... ok
test_classify_1_silent (__main__.TestClassify.test_classify_1_silent) ... ok
test_classify_2_print (__main__.TestClassify.test_classify_2_print) ... ok
test_classify_3_print (__main__.TestClassify.test_classify_3_print) ... ok
test_classify_fail_malloc (__main__.TestClassify.test_classify_fail_malloc) ... ok
test_classify_not_enough_args (__main__.TestClassify.test_classify_not_enough_args) ... ok
test_dot_length_1 (__main__.TestDot.test_dot_length_1) ... ok
test_dot_length_error (__main__.TestDot.test_dot_length_error) ... ok
test_dot_length_error2 (__main__.TestDot.test_dot_length_error2) ... ok
test_dot_standard (__main__.TestDot.test_dot_standard) ... ok
test_dot_stride (__main__.TestDot.test_dot_stride) ... ok
test_dot_stride_error1 (__main__.TestDot.test_dot_stride_error1) ... ok
test_dot_stride_error2 (__main__.TestDot.test_dot_stride_error2) ... ok
test_matmul_incorrect_check (__main__.TestMatmul.test_matmul_incorrect_check) ... ok
test_matmul_length_1 (__main__.TestMatmul.test_matmul_length_1) ... ok
test_matmul_negative_dim_m0_x (__main__.TestMatmul.test_matmul_negative_dim_m0_x) ... ok
test_matmul_negative_dim_m0_y (__main__.TestMatmul.test_matmul_negative_dim_m0_y) ... ok
test_matmul_negative_dim_m1_x (__main__.TestMatmul.test_matmul_negative_dim_m1_x) ... ok
test_matmul_negative_dim_m1_y (__main__.TestMatmul.test_matmul_negative_dim_m1_y) ... ok
test_matmul_nonsquare_1 (__main__.TestMatmul.test_matmul_nonsquare_1) ... ok
test_matmul_nonsquare_2 (__main__.TestMatmul.test_matmul_nonsquare_2) ... ok
test_matmul_nonsquare_outer_dims (__main__.TestMatmul.test_matmul_nonsquare_outer_dims) ... ok
test_matmul_square (__main__.TestMatmul.test_matmul_square) ... ok
test_matmul_unmatched_dims (__main__.TestMatmul.test_matmul_unmatched_dims) ... ok
test_matmul_zero_dim_m0 (__main__.TestMatmul.test_matmul_zero_dim_m0) ... ok
test_matmul_zero_dim_m1 (__main__.TestMatmul.test_matmul_zero_dim_m1) ... ok
test_read_1 (__main__.TestReadMatrix.test_read_1) ... ok
test_read_2 (__main__.TestReadMatrix.test_read_2) ... ok
test_read_3 (__main__.TestReadMatrix.test_read_3) ... ok
test_read_fail_fclose (__main__.TestReadMatrix.test_read_fail_fclose) ... ok
test_read_fail_fopen (__main__.TestReadMatrix.test_read_fail_fopen) ... ok
test_read_fail_fread (__main__.TestReadMatrix.test_read_fail_fread) ... ok
test_read_fail_malloc (__main__.TestReadMatrix.test_read_fail_malloc) ... ok
test_relu_invalid_n (__main__.TestRelu.test_relu_invalid_n) ... ok
test_relu_length_1 (__main__.TestRelu.test_relu_length_1) ... ok
test_relu_standard (__main__.TestRelu.test_relu_standard) ... ok
test_write_1 (__main__.TestWriteMatrix.test_write_1) ... ok
test_write_fail_fclose (__main__.TestWriteMatrix.test_write_fail_fclose) ... ok
test_write_fail_fopen (__main__.TestWriteMatrix.test_write_fail_fopen) ... ok
test_write_fail_fwrite (__main__.TestWriteMatrix.test_write_fail_fwrite) ... ok
----------------------------------------------------------------------
Ran 46 tests in 26.208s
OK
```
7. Update the top-level `README.md` file to document your work, explaining the functionality of the essential operations and detailing how you addressed and overcame the challenges. Of course, you must write in clear and expressive English.
8. Fill the [Google form](https://docs.google.com/forms/d/e/1FAIpQLSfDzAm5uf_ZU-8txiAsB3aE8cVioc5zro_rk0XYS2R1tQMi2A/viewform) to submit the URL of your forked repository.