# Final Project: Adapt a non-trivial application running on MyCPU (Homework3)
> contribute by 林子勝
## Non-trivial question
I have decided to use this C code which operating convolution on a matrix to run on my CPU. The two challenges I expect encountering are that don't relate to 'printf' and how to fix or replace 'multiple operating' and 'square root operation'.
I hope the final goal is to achieve the functionality of this program.
:::spoiler goal program
```clike=
#include <stdio.h>
#include <math.h>
int main() {
int convolution_result[25] = {0};
int matrix[25] = {
1, 2, 3, 4, 5,
6, 7, 8, 9, 0,
1, 2, 3, 4, 5,
6, 7, 8, 9, 0,
1, 2, 3, 4, 5
};
int width = 5;
int height = 5;
int horizontal_kernel[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
int vertical_kernel[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
for (int y = 0; y < height; y++) {
for (int x = 0; x < width; x++) {
int horizontal_value = 0;
int vertical_value = 0;
for (int i = -1; i <= 1; i++) {
for (int j = -1; j <= 1; j++) {
int yy = y + i;
int xx = x + j;
if (yy >= 0 && yy < height && xx >= 0 && xx < width) {
horizontal_value += matrix[yy * width + xx] * horizontal_kernel[i + 1][j + 1];
vertical_value += matrix[yy * width + xx] * vertical_kernel[i + 1][j + 1];
convolution_result[y * width + x] = sqrt(pow(horizontal_value, 2) + pow(vertical_value, 2));
}
}
}
}
}
return 0;
}
```
:::
### matrix data
I transform the matrix to one dimension array.
```code=
int main()
{
int matrix[25] = {
1, 2, 3, 4, 5,
6, 7, 8, 9, 0,
1, 2, 3, 4, 5,
6, 7, 8, 9, 0,
1, 2, 3, 4, 5
};
return 0;
}
```
---
> This is the compiled result
```
riscv-none-elf-as -R -march=rv32i_zicsr -mabi=ilp32 -o init.o init.S
riscv-none-elf-gcc -O0 -Wall -march=rv32i_zicsr -mabi=ilp32 -c -o test.o test.c
test.c: In function 'main':
test.c:3:9: warning: unused variable 'mat' [-Wunused-variable]
3 | int mat[25] = {
| ^~~
riscv-none-elf-ld -o test.elf -T link.lds --oformat=elf32-littleriscv test.o init.o
riscv-none-elf-ld: test.o: in function `main':
test.c:(.text+0x30): undefined reference to `memcpy'
make: *** [Makefile:19:test.elf] Error 1
rm init.o
```
But I found that ```quicksort.c``` also declares a one-dimensional array but does not have this problem. After testing, I found that this problem will occur when the size of the array is greater than 16.
:::spoiler quicksort.c
```clike=
static void quicksort(int *arr, int l, int r)
{
if (l >= r)
return;
int pivot = arr[l];
int i = l, j = r;
while (i < j) {
while (arr[j] >= pivot && i < j)
--j;
arr[i] = arr[j];
while (arr[i] < pivot && i < j)
++i;
arr[j] = arr[i];
}
arr[i] = pivot;
quicksort(arr, l, i - 1);
quicksort(arr, i + 1, r);
}
int main()
{
int nums[10] = {6, 2, 4, 10, 5, 1, 0, 9, 7, 8};
quicksort(nums, 0, 9);
for (int i = 1; i <= 10; ++i)
*(volatile int *) (i * 4) = nums[i - 1];
return 0;
}
```
:::
---
After searching on the Internet, I found that it seems to be related to the optimizer of the compiler, but the optimizer selected in my Makefile is already -O0. In principle, it seems that this problem should not occur? So I used another method to initialize it manually.
:::spoiler code
```clike=
int main()
{
int matrix[25];
matrix[0] = 1;
matrix[1] = 2;
matrix[2] = 3;
matrix[3] = 4;
matrix[4] = 5;
matrix[5] = 6;
matrix[6] = 7;
matrix[7] = 8;
matrix[8] = 9;
matrix[9] = 0;
matrix[10] = 1;
matrix[11] = 2;
matrix[12] = 3;
matrix[13] = 4;
matrix[14] = 5;
matrix[15] = 6;
matrix[16] = 7;
matrix[17] = 8;
matrix[18] = 9;
matrix[19] = 0;
matrix[20] = 1;
matrix[21] = 2;
matrix[22] = 3;
matrix[23] = 4;
matrix[24] = 5;
return 0;
}
```
:::
---
Next step, I tried to print the value in the memory when the code testing. The same problem happened... I could run expectantly on the ```quicksort.c``` , but when the array size is greater than 16, It's wrong again.
I adjusted the code below, and the result show as following.
:::spoiler CPUTest.scala
```scala=
class QuicksortTest extends AnyFlatSpec with ChiselScalatestTester {
behavior.of("Single Cycle CPU")
it should "perform a quicksort on 10 numbers" in {
test(new TestTopModule("quicksort.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
for (i <- 1 to 50) {
c.clock.step(1000)
c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
}
for (i <- 1 until 10) {
val address = (i * 4).U
c.io.mem_debug_read_address.poke(address)
c.clock.step(1)
val value = c.io.mem_debug_read_data.peek().litValue
println(s"Value at address $address: $value")
c.io.mem_debug_read_address.poke(0.U)
c.clock.step(1)
}
}
}
}
```
:::
<br>
> This is the compiled result
```
Value at address UInt<3>(4): 0
Value at address UInt<4>(8): 1
Value at address UInt<4>(12): 2
Value at address UInt<5>(16): 4
Value at address UInt<5>(20): 5
Value at address UInt<5>(24): 6
Value at address UInt<5>(28): 7
Value at address UInt<6>(32): 8
Value at address UInt<6>(36): 9
```
<br>
There is when I evaluate the array which size is 25 below, all the value are 0.
> This is the compiled result
```
Value at address UInt<1>(0): 0
Value at address UInt<3>(4): 0
Value at address UInt<4>(8): 0
Value at address UInt<4>(12): 0
Value at address UInt<5>(16): 0
Value at address UInt<5>(20): 0
Value at address UInt<5>(24): 0
Value at address UInt<5>(28): 0
Value at address UInt<6>(32): 0
Value at address UInt<6>(36): 0
Value at address UInt<6>(40): 0
Value at address UInt<6>(44): 0
Value at address UInt<6>(48): 0
Value at address UInt<6>(52): 0
Value at address UInt<6>(56): 0
Value at address UInt<6>(60): 0
Value at address UInt<7>(64): 0
Value at address UInt<7>(68): 0
Value at address UInt<7>(72): 0
Value at address UInt<7>(76): 0
Value at address UInt<7>(80): 0
Value at address UInt<7>(84): 0
Value at address UInt<7>(88): 0
Value at address UInt<7>(92): 0
Value at address UInt<7>(96): 0
```
---
Before I figure out the reason, I plan to try another method first. I directly write the value to the address.
:::spoiler code
```clike=
int main()
{
*((volatile int *)4) = 1;
*((volatile int *)8) = 2;
*((volatile int *)12) = 3;
*((volatile int *)16) = 4;
*((volatile int *)20) = 5;
*((volatile int *)24) = 6;
*((volatile int *)28) = 7;
*((volatile int *)32) = 8;
*((volatile int *)36) = 9;
*((volatile int *)40) = 0;
*((volatile int *)44) = 1;
*((volatile int *)48) = 2;
*((volatile int *)52) = 3;
*((volatile int *)56) = 4;
*((volatile int *)60) = 5;
*((volatile int *)64) = 6;
*((volatile int *)68) = 7;
*((volatile int *)72) = 8;
*((volatile int *)76) = 9;
*((volatile int *)80) = 0;
*((volatile int *)84) = 1;
*((volatile int *)88) = 2;
*((volatile int *)92) = 3;
*((volatile int *)96) = 4;
*((volatile int *)100) = 5;
return 0;
}
```
:::
<br>
Although I don't think this is a good method, at least with the pointer, I can decide where I want to check.
> This is the compiled result
```
Value at address UInt<3>(4): 1
Value at address UInt<4>(8): 2
Value at address UInt<4>(12): 3
Value at address UInt<5>(16): 4
Value at address UInt<5>(20): 5
Value at address UInt<5>(24): 6
Value at address UInt<5>(28): 7
Value at address UInt<6>(32): 8
Value at address UInt<6>(36): 9
Value at address UInt<6>(40): 0
Value at address UInt<6>(44): 1
Value at address UInt<6>(48): 2
Value at address UInt<6>(52): 3
Value at address UInt<6>(56): 4
Value at address UInt<6>(60): 5
Value at address UInt<7>(64): 6
Value at address UInt<7>(68): 7
Value at address UInt<7>(72): 8
Value at address UInt<7>(76): 9
Value at address UInt<7>(80): 0
Value at address UInt<7>(84): 1
Value at address UInt<7>(88): 2
Value at address UInt<7>(92): 3
Value at address UInt<7>(96): 4
```
### matrix data operation
Next I plan to do the convolution operation
:::spoiler convolution operation
```clike=
int main() {
int convolution_result[25] = {0};
int matrix[25] = {
1, 2, 3, 4, 5,
6, 7, 8, 9, 0,
1, 2, 3, 4, 5,
6, 7, 8, 9, 0,
1, 2, 3, 4, 5
};
int width = 5;
int height = 5;
int horizontal_kernel[9] = {-1, 0, 1,
-2, 0, 2,
-1, 0, 1};
int vertical_kernel[9] = {-1, -2, -1,
0, 0, 0,
1, 2, 1};
for (int y = 1; y < height - 1; y++) {
for (int x = 1; x < width - 1; x++) {
int horizontal_value = 0;
int vertical_value = 0;
for (int ky = -1; ky <= 1; ky++) {
for (int kx = -1; kx <= 1; kx++) {
int pixel = matrix[(y + ky) * width + (x + kx)];
horizontal_value += pixel * horizontal_kernel[(ky + 1) * 3 + (kx + 1)];
vertical_value += pixel * vertical_kernel[(ky + 1) * 3 + (kx + 1)];
}
}
convolution_result[y * width + x] = horizontal_value + vertical_value;
// convolution_result[y * width + x] = sqrt(horizontal_value * horizontal_value + vertical_value * vertical_value);
}
}
return 0;
}
```
:::
<br>
> This is the compiled result
```
riscv-none-elf-as -R -march=rv32i_zicsr -mabi=ilp32 -o init.o init.S
riscv-none-elf-gcc -O0 -Wall -march=rv32i_zicsr -mabi=ilp32 -lm -c -o test.o test.c
test.c: In function 'main':
test.c:3:9: warning: variable 'convolution_result' set but not used [-Wunused-but-set-variable]
3 | int convolution_result[25] = {0};
| ^~~~~~~~~~~~~~~~~~
riscv-none-elf-ld -o test.elf -T link.lds --oformat=elf32-littleriscv test.o init.o
riscv-none-elf-ld: test.o: in function `main':
test.c:(.text+0x24): undefined reference to `memset'
riscv-none-elf-ld: test.c:(.text+0x4c): undefined reference to `memcpy'
riscv-none-elf-ld: test.c:(.text+0x184): undefined reference to `__mulsi3'
riscv-none-elf-ld: test.c:(.text+0x1e4): undefined reference to `__mulsi3'
riscv-none-elf-ld: test.c:(.text+0x208): undefined reference to `__mulsi3'
riscv-none-elf-ld: test.c:(.text+0x268): undefined reference to `__mulsi3'
riscv-none-elf-ld: test.c:(.text+0x28c): undefined reference to `__mulsi3'
make: *** [Makefile:19:test.elf] Error 1
```
I guess when the compilation is completed, the ```memcpy``` ```memset``` ... function is available, but they are not linked to in the link part. So, I tried to link these function libraries, and the result seems to be workable. This also solves the previous problem of matrix assignment one by one.
There is what I fixed in ``Makefile``.
```makefile=17
%.elf: %.c init.o
$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o \
-L/home/zisheng/riscv-none-elf-gcc/riscv-none-elf/lib -lc \
-L/home/zisheng/riscv-none-elf-gcc/lib/gcc/riscv-none-elf/13.2.0 -lgcc
```
---
Although it compiles successfully, it cannot execute on the MyCPU. That is not a feasible way. I finally decided to modify the c code. In addition to assigning values one by one, I also need to solve the problem of multiplication operations. That is my code and result below. We can see that is a feasible way.
```clike=
static void convolution(int* matrix) {
int temp[25];
for(int i=0; i<25; i++)
{
temp[i] = matrix[i];
}
int horizontal_kernel[9];
horizontal_kernel[0] = 1;
horizontal_kernel[1] = 0;
horizontal_kernel[2] = 1;
horizontal_kernel[3] = 2;
horizontal_kernel[4] = 0;
horizontal_kernel[5] = 2;
horizontal_kernel[6] = 1;
horizontal_kernel[7] = 0;
horizontal_kernel[8] = 1;
int vertical_kernel[9];
vertical_kernel[0] = 1;
vertical_kernel[1] = 2;
vertical_kernel[2] = 1;
vertical_kernel[3] = 0;
vertical_kernel[4] = 0;
vertical_kernel[5] = 0;
vertical_kernel[6] = 1;
vertical_kernel[7] = 2;
vertical_kernel[8] = 1;
int matrix_index = 5; // Skipping the first row, starting from the first element of the second row
for (int y = 1; y < 5 - 1; y++) {
matrix_index++; // Skipping the first element of each row
for (int x = 1; x < 5 - 1; x++) {
int horizontal_value = 0;
int vertical_value = 0;
int kernel_index = 0; // Used for accessing convolution kernel elements
int row_start_index = matrix_index - 1 - 5; // Starting index of the current row
for (int ky = 0; ky <= 2; ky++) {
int local_temp_index = row_start_index;
for (int kx = 0; kx <= 2; kx++) {
int pixel = temp[local_temp_index + kx];
// Using looped addition instead of multiplication
for (int n = 0; n < horizontal_kernel[kernel_index]; n++) {
horizontal_value += pixel;
}
for (int n = 0; n < vertical_kernel[kernel_index]; n++) {
vertical_value += pixel;
}
kernel_index++;
}
row_start_index += 5; // Moving to the next row
}
matrix[matrix_index] = horizontal_value + vertical_value;
// matrix[matrix_index] = sqrt(horizontal_value*horizontal_value + vertical_value*vertical_value);
matrix_index++; // Moving to the next element
}
matrix_index++; // Skipping the last element of each row
}
}
int main() {
int matrix[25];
matrix[0] = 1;
matrix[1] = 2;
matrix[2] = 3;
matrix[3] = 4;
matrix[4] = 5;
matrix[5] = 6;
matrix[6] = 7;
matrix[7] = 8;
matrix[8] = 9;
matrix[9] = 0;
matrix[10] = 1;
matrix[11] = 2;
matrix[12] = 3;
matrix[13] = 4;
matrix[14] = 5;
matrix[15] = 6;
matrix[16] = 7;
matrix[17] = 8;
matrix[18] = 9;
matrix[19] = 0;
matrix[20] = 1;
matrix[21] = 2;
matrix[22] = 3;
matrix[23] = 4;
matrix[24] = 5;
convolution(matrix);
*(volatile int *)4 = matrix[6];
*(volatile int *)8 = matrix[7];
*(volatile int *)12 = matrix[8];
*(volatile int *)16 = matrix[11];
*(volatile int *)20 = matrix[12];
*(volatile int *)24 = matrix[13];
*(volatile int *)28 = matrix[16];
*(volatile int *)32 = matrix[17];
*(volatile int *)36 = matrix[18];
return 0;
}
```
```command=
[info] welcome to sbt 1.9.7 (Ubuntu Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/zisheng/Computer_Architecture_11201/homework_3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/zisheng/Computer_Architecture_11201/homework_3/ca2023-lab3/)
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
Value at address UInt<3>(4): 52
Value at address UInt<4>(8): 68
Value at address UInt<4>(12): 64
Value at address UInt<5>(16): 92
Value at address UInt<5>(20): 108
Value at address UInt<5>(24): 84
Value at address UInt<5>(28): 52
Value at address UInt<6>(32): 68
Value at address UInt<6>(36): 64
[info] test:
[info] test to evaluate
[info] - should perform a 9 numbers matrix
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 17 seconds, 915 milliseconds.
[info] Total number of tests run: 7
[info] Suites: completed 5, aborted 0
[info] Tests: succeeded 7, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 19 s, completed Jan 14, 2024, 1:19:47 AM
```
---
### waveform
We can see that the value of our calculated result is correctly placed at the specified address.
