# Assignment 2: Classify
## Part A: Mathematical Functions
### Task 1: ReLU
- #### Description
Applies ReLU (Rectified Linear Unit) operation in-place: `For each element x in array: x = max(0, x)`
- #### Implementation
First, check the value is positive or negative. If it is positive, then store the original value. Otherwise, store zero back to the array.
- #### modified part:
``` assembly=
loop_start:
ble a1, x0, finish
lw t2, 0(a0)
blt t2, x0, less_than_zero
sw t2, 0(a0) # store the original value into the array
addi a1, a1, -1
addi a0, a0, 4 # update the position
bge a1, x0, loop_start
less_than_zero: # store zero back to the array
sw x0, 0(a0)
addi a1, a1, -1
addi a0, a0, 4
j loop_start
finish:
jr ra
```
### Task 2: ArgMax
- #### Description
Scans an integer array to find its maximum value and returns the position of its first occurrence. In cases where multiple elements share the maximum value, returns the smallest index.
- #### Implementation
`t3` stores the current max value, and `t1` stores the position of the current max value. When there is a value larger than the current value, it will go to `change_max` tag to change the max value and the position index. Finally, return the position of the max value.
- #### modified part:
``` assembly=
loop_start:
addi t3, x0, -999 # initialized to store the max value
add t4, x0, x0 # loop counter
find_max:
ble a1, x0, finish
lw t0, 0(a0)
addi t4, t4, 1
bgt t0, t3, change_max
addi a1, a1, -1
addi a0, a0, 4 # update the position of the array
bge a1, x0, find_max
change_max: # change the max value and the position
add t3, x0, t0
addi t1, t4, -1
addi a1, a1, -1
addi a0, a0, 4
bge a1, x0, find_max
finish:
add a0, x0, t1
jr ra
```
### Task 3: Dot Product
- #### Description
Calculates `sum(arr0[i * stride0] * arr1[i * stride1])` where i ranges from 0 to (element_count - 1)
I stucked at this task for a while. I originally used the **accumulated** way to calculate multiplication, but when I tested `test_chain`, it would take too much time to execute. As a result, I change to use the **shift_and_add** way to implement multiplication.
- #### Implementation
1. `stride_count`: Computes the strides length of two arrays.
2. `product_start`: Load the values from two arrays and initialize the result register `t4`.
3. `product_loop`: Check the LSB of the mutiplier. If it is zero, go to `skip_add` and left shift the multiplicand `t2`, right shift the multiplier `t3`. Otherwise, add multiplicand to the result register.
4. `product_done`: the multiplication is completed, and add the result into the dot result register `t0`. Go back to `loop_start` to check if there are some elements not been compute.
- #### modified part:
``` assembly=
loop_start:
bge t1, a2, loop_end # loop index comparison
stride_count: # compute the stride of two arrays
beqz t1, product_start # the first element do not need to compute
# computing for arr0
add t5, x0, a3
slli t5, t5, 2
add a0, a0, t5
#computing for arr1
add t5, x0, a4
slli t5, t5, 2
add a1, a1, t5
product_start:
add t4, x0, x0 # initialized to store the product
lw t2, 0(a0) # multiplicand
lw t3, 0(a1) # multiplier
product_loop:
beqz t3, product_done
andi t5, t3, 1 # get the LSB of the multiplier
beq t5, x0, skip_add
add t4, t4, t2
skip_add: # the LSB of the multiplier is zero
slli t2, t2, 1
srli t3, t3, 1
j product_loop
product_done:
add t0, t0, t4 # add the product into the dot result
addi t1, t1, 1
j loop_start
loop_end:
mv a0, t0
jr ra
```
### Task 3-2: Matrix Multiplication
- #### Description
Performs operation: D = Matrix A × Matrix B
Where:
- Matrix A is a (rows0 × cols0) matrix
- Matrix B is a (rows1 × cols1) matrix
- D is a (rows0 × cols1) result matrix
- #### Arguments
- First Matrix (A):
- a0: Memory address of first element
- a1: Row count
- a2: Column count
- Second Matrix (B):
- a3: Memory address of first element
- a4: Row count
- a5: Column count
- Output Matrix (D):
- a6: Memory address for result storage
- #### Implementation
To complete this task, it is important to clearly understand the behavior of inner loop and outer loop.
1. Everytime when the `inner_loop` ended, `s0` was added by 1 to go to the next row. `s3` was added by the number of elements in a row of Matrix A to find the position of the first element in the next row.
2. When the `outer_loop` finishing, we need to recover the registers that temporarily stored in the stack.
- #### modified part:
``` assembly=
inner_loop_end: # update the pointer of Matrix A to the next row
addi s0, s0, 1 # update the outer_loop counter
add t0, x0, a2 # the number of elements in each row
slli t0, t0, 2 # multiplied by 4
add s3, s3, t0
j outer_loop_start
outer_loop_end: # recover the registers and return control to the caller
lw ra, 0(sp)
lw s0, 4(sp)
lw s1, 8(sp)
lw s2, 12(sp)
lw s3, 16(sp)
lw s4, 20(sp)
lw s5, 24(sp)
addi sp, sp, 28
jr ra
```
## Part B: File Operations and Main
In this part, we only have to modify the `mul` function in the three tasks.
I use the **accumulate** way to implement the multiplication.
### Task 1: Read Matrix
``` assembly=
# mul s1, t1, t2 # s1 is number of elements
# FIXME: Replace 'mul' with your own implementation
# my implementation
li s1, 0
mul_start:
addi t2, t2, -1
add s1, s1, t1
bnez t2, mul_start
```
### Task 2: Write Matrix
``` assembly=
# mul s4, s2, s3 # s4 = total elements
# FIXME: Replace 'mul' with your own implementation
# my implementation
li s4, 0
mul_start:
addi s3, s3, -1
add s4, s4, s2
bnez s3, mul_start
```
### Task 3: Classification
- #### First part
``` assembly=
# mul a0, t0, t1
# FIXME: Replace 'mul' with your own implementation
# my implementation
li a0, 0
mul_start:
addi t1, t1, -1
add a0, a0, t0
bnez t1, mul_start
```
- #### Second part
``` assembly=
# mul a1, t0, t1 # length of h array and set it as second argument
# FIXME: Replace 'mul' with your own implementation
# my implementation
li a1, 0
mul_start1:
addi t1, t1, -1
add a1, a1, t0
bnez t1, mul_start1
```
- #### Third part
``` assembly=
# mul a0, t0, t1
# FIXME: Replace 'mul' with your own implementation
# my implementation
li a0, 0
mul_start2:
addi t1, t1, -1
add a0, a0, t0
bnez t1, mul_start2
```
- #### Forth part
``` assembly=
mul a1, t0, t1 # load length of array into second arg
# FIXME: Replace 'mul' with your own implementation
# my implementation
li a1, 0
mul_start3:
addi t1, t1, -1
add a1, a1, t0
bnez t1, mul_start3
```
## The testing result
```
test_abs_minus_one (__main__.TestAbs) ... ok
test_abs_one (__main__.TestAbs) ... ok
test_abs_zero (__main__.TestAbs) ... ok
test_argmax_invalid_n (__main__.TestArgmax) ... ok
test_argmax_length_1 (__main__.TestArgmax) ... ok
test_argmax_standard (__main__.TestArgmax) ... ok
test_chain_1 (__main__.TestChain) ... ok
test_classify_1_silent (__main__.TestClassify) ... ok
test_classify_2_print (__main__.TestClassify) ... ok
test_classify_3_print (__main__.TestClassify) ... ok
test_classify_fail_malloc (__main__.TestClassify) ... ok
test_classify_not_enough_args (__main__.TestClassify) ... ok
test_dot_length_1 (__main__.TestDot) ... ok
test_dot_length_error (__main__.TestDot) ... ok
test_dot_length_error2 (__main__.TestDot) ... ok
test_dot_standard (__main__.TestDot) ... ok
test_dot_stride (__main__.TestDot) ... ok
test_dot_stride_error1 (__main__.TestDot) ... ok
test_dot_stride_error2 (__main__.TestDot) ... ok
test_matmul_incorrect_check (__main__.TestMatmul) ... ok
test_matmul_length_1 (__main__.TestMatmul) ... ok
test_matmul_negative_dim_m0_x (__main__.TestMatmul) ... ok
test_matmul_negative_dim_m0_y (__main__.TestMatmul) ... ok
test_matmul_negative_dim_m1_x (__main__.TestMatmul) ... ok
test_matmul_negative_dim_m1_y (__main__.TestMatmul) ... ok
test_matmul_nonsquare_1 (__main__.TestMatmul) ... ok
test_matmul_nonsquare_2 (__main__.TestMatmul) ... ok
test_matmul_nonsquare_outer_dims (__main__.TestMatmul) ... ok
test_matmul_square (__main__.TestMatmul) ... ok
test_matmul_unmatched_dims (__main__.TestMatmul) ... ok
test_matmul_zero_dim_m0 (__main__.TestMatmul) ... ok
test_matmul_zero_dim_m1 (__main__.TestMatmul) ... ok
test_read_1 (__main__.TestReadMatrix) ... ok
test_read_2 (__main__.TestReadMatrix) ... ok
test_read_3 (__main__.TestReadMatrix) ... ok
test_read_fail_fclose (__main__.TestReadMatrix) ... ok
test_read_fail_fopen (__main__.TestReadMatrix) ... ok
test_read_fail_fread (__main__.TestReadMatrix) ... ok
test_read_fail_malloc (__main__.TestReadMatrix) ... ok
test_relu_invalid_n (__main__.TestRelu) ... ok
test_relu_length_1 (__main__.TestRelu) ... ok
test_relu_standard (__main__.TestRelu) ... ok
test_write_1 (__main__.TestWriteMatrix) ... ok
test_write_fail_fclose (__main__.TestWriteMatrix) ... ok
test_write_fail_fopen (__main__.TestWriteMatrix) ... ok
test_write_fail_fwrite (__main__.TestWriteMatrix) ... ok
----------------------------------------------------------------------
Ran 46 tests in 48.305s
OK
```