Final Project: Adapt a non-trivial application running on MyCPU (Homework3)

# Final Project: Adapt a non-trivial application running on MyCPU (Homework3) > contribute by 林子勝 ## Non-trivial question I have decided to use this C code which operating convolution on a matrix to run on my CPU. The two challenges I expect encountering are that don't relate to 'printf' and how to fix or replace 'multiple operating' and 'square root operation'. I hope the final goal is to achieve the functionality of this program. :::spoiler goal program ```clike= #include <stdio.h> #include <math.h> int main() { int convolution_result[25] = {0}; int matrix[25] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5 }; int width = 5; int height = 5; int horizontal_kernel[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}}; int vertical_kernel[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}}; for (int y = 0; y < height; y++) { for (int x = 0; x < width; x++) { int horizontal_value = 0; int vertical_value = 0; for (int i = -1; i <= 1; i++) { for (int j = -1; j <= 1; j++) { int yy = y + i; int xx = x + j; if (yy >= 0 && yy < height && xx >= 0 && xx < width) { horizontal_value += matrix[yy * width + xx] * horizontal_kernel[i + 1][j + 1]; vertical_value += matrix[yy * width + xx] * vertical_kernel[i + 1][j + 1]; convolution_result[y * width + x] = sqrt(pow(horizontal_value, 2) + pow(vertical_value, 2)); } } } } } return 0; } ``` ::: ### matrix data I transform the matrix to one dimension array. ```code= int main() { int matrix[25] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5 }; return 0; } ``` --- > This is the compiled result ``` riscv-none-elf-as -R -march=rv32i_zicsr -mabi=ilp32 -o init.o init.S riscv-none-elf-gcc -O0 -Wall -march=rv32i_zicsr -mabi=ilp32 -c -o test.o test.c test.c: In function 'main': test.c:3:9: warning: unused variable 'mat' [-Wunused-variable] 3 | int mat[25] = { | ^~~ riscv-none-elf-ld -o test.elf -T link.lds --oformat=elf32-littleriscv test.o init.o riscv-none-elf-ld: test.o: in function `main': test.c:(.text+0x30): undefined reference to `memcpy' make: *** [Makefile:19：test.elf] Error 1 rm init.o ``` But I found that ```quicksort.c``` also declares a one-dimensional array but does not have this problem. After testing, I found that this problem will occur when the size of the array is greater than 16. :::spoiler quicksort.c ```clike= static void quicksort(int *arr, int l, int r) { if (l >= r) return; int pivot = arr[l]; int i = l, j = r; while (i < j) { while (arr[j] >= pivot && i < j) --j; arr[i] = arr[j]; while (arr[i] < pivot && i < j) ++i; arr[j] = arr[i]; } arr[i] = pivot; quicksort(arr, l, i - 1); quicksort(arr, i + 1, r); } int main() { int nums[10] = {6, 2, 4, 10, 5, 1, 0, 9, 7, 8}; quicksort(nums, 0, 9); for (int i = 1; i <= 10; ++i) *(volatile int *) (i * 4) = nums[i - 1]; return 0; } ``` ::: --- After searching on the Internet, I found that it seems to be related to the optimizer of the compiler, but the optimizer selected in my Makefile is already -O0. In principle, it seems that this problem should not occur? So I used another method to initialize it manually. :::spoiler code ```clike= int main() { int matrix[25]; matrix[0] = 1; matrix[1] = 2; matrix[2] = 3; matrix[3] = 4; matrix[4] = 5; matrix[5] = 6; matrix[6] = 7; matrix[7] = 8; matrix[8] = 9; matrix[9] = 0; matrix[10] = 1; matrix[11] = 2; matrix[12] = 3; matrix[13] = 4; matrix[14] = 5; matrix[15] = 6; matrix[16] = 7; matrix[17] = 8; matrix[18] = 9; matrix[19] = 0; matrix[20] = 1; matrix[21] = 2; matrix[22] = 3; matrix[23] = 4; matrix[24] = 5; return 0; } ``` ::: --- Next step, I tried to print the value in the memory when the code testing. The same problem happened... I could run expectantly on the ```quicksort.c``` , but when the array size is greater than 16, It's wrong again. I adjusted the code below, and the result show as following. :::spoiler CPUTest.scala ```scala= class QuicksortTest extends AnyFlatSpec with ChiselScalatestTester { behavior.of("Single Cycle CPU") it should "perform a quicksort on 10 numbers" in { test(new TestTopModule("quicksort.asmbin")).withAnnotations(TestAnnotations.annos) { c => for (i <- 1 to 50) { c.clock.step(1000) c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout } for (i <- 1 until 10) { val address = (i * 4).U c.io.mem_debug_read_address.poke(address) c.clock.step(1) val value = c.io.mem_debug_read_data.peek().litValue println(s"Value at address $address: $value") c.io.mem_debug_read_address.poke(0.U) c.clock.step(1) } } } } ``` ::: <br> > This is the compiled result ``` Value at address UInt<3>(4): 0 Value at address UInt<4>(8): 1 Value at address UInt<4>(12): 2 Value at address UInt<5>(16): 4 Value at address UInt<5>(20): 5 Value at address UInt<5>(24): 6 Value at address UInt<5>(28): 7 Value at address UInt<6>(32): 8 Value at address UInt<6>(36): 9 ``` <br> There is when I evaluate the array which size is 25 below, all the value are 0. > This is the compiled result ``` Value at address UInt<1>(0): 0 Value at address UInt<3>(4): 0 Value at address UInt<4>(8): 0 Value at address UInt<4>(12): 0 Value at address UInt<5>(16): 0 Value at address UInt<5>(20): 0 Value at address UInt<5>(24): 0 Value at address UInt<5>(28): 0 Value at address UInt<6>(32): 0 Value at address UInt<6>(36): 0 Value at address UInt<6>(40): 0 Value at address UInt<6>(44): 0 Value at address UInt<6>(48): 0 Value at address UInt<6>(52): 0 Value at address UInt<6>(56): 0 Value at address UInt<6>(60): 0 Value at address UInt<7>(64): 0 Value at address UInt<7>(68): 0 Value at address UInt<7>(72): 0 Value at address UInt<7>(76): 0 Value at address UInt<7>(80): 0 Value at address UInt<7>(84): 0 Value at address UInt<7>(88): 0 Value at address UInt<7>(92): 0 Value at address UInt<7>(96): 0 ``` --- Before I figure out the reason, I plan to try another method first. I directly write the value to the address. :::spoiler code ```clike= int main() { *((volatile int *)4) = 1; *((volatile int *)8) = 2; *((volatile int *)12) = 3; *((volatile int *)16) = 4; *((volatile int *)20) = 5; *((volatile int *)24) = 6; *((volatile int *)28) = 7; *((volatile int *)32) = 8; *((volatile int *)36) = 9; *((volatile int *)40) = 0; *((volatile int *)44) = 1; *((volatile int *)48) = 2; *((volatile int *)52) = 3; *((volatile int *)56) = 4; *((volatile int *)60) = 5; *((volatile int *)64) = 6; *((volatile int *)68) = 7; *((volatile int *)72) = 8; *((volatile int *)76) = 9; *((volatile int *)80) = 0; *((volatile int *)84) = 1; *((volatile int *)88) = 2; *((volatile int *)92) = 3; *((volatile int *)96) = 4; *((volatile int *)100) = 5; return 0; } ``` ::: <br> Although I don't think this is a good method, at least with the pointer, I can decide where I want to check. > This is the compiled result ``` Value at address UInt<3>(4): 1 Value at address UInt<4>(8): 2 Value at address UInt<4>(12): 3 Value at address UInt<5>(16): 4 Value at address UInt<5>(20): 5 Value at address UInt<5>(24): 6 Value at address UInt<5>(28): 7 Value at address UInt<6>(32): 8 Value at address UInt<6>(36): 9 Value at address UInt<6>(40): 0 Value at address UInt<6>(44): 1 Value at address UInt<6>(48): 2 Value at address UInt<6>(52): 3 Value at address UInt<6>(56): 4 Value at address UInt<6>(60): 5 Value at address UInt<7>(64): 6 Value at address UInt<7>(68): 7 Value at address UInt<7>(72): 8 Value at address UInt<7>(76): 9 Value at address UInt<7>(80): 0 Value at address UInt<7>(84): 1 Value at address UInt<7>(88): 2 Value at address UInt<7>(92): 3 Value at address UInt<7>(96): 4 ``` ### matrix data operation Next I plan to do the convolution operation :::spoiler convolution operation ```clike= int main() { int convolution_result[25] = {0}; int matrix[25] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5 }; int width = 5; int height = 5; int horizontal_kernel[9] = {-1, 0, 1, -2, 0, 2, -1, 0, 1}; int vertical_kernel[9] = {-1, -2, -1, 0, 0, 0, 1, 2, 1}; for (int y = 1; y < height - 1; y++) { for (int x = 1; x < width - 1; x++) { int horizontal_value = 0; int vertical_value = 0; for (int ky = -1; ky <= 1; ky++) { for (int kx = -1; kx <= 1; kx++) { int pixel = matrix[(y + ky) * width + (x + kx)]; horizontal_value += pixel * horizontal_kernel[(ky + 1) * 3 + (kx + 1)]; vertical_value += pixel * vertical_kernel[(ky + 1) * 3 + (kx + 1)]; } } convolution_result[y * width + x] = horizontal_value + vertical_value; // convolution_result[y * width + x] = sqrt(horizontal_value * horizontal_value + vertical_value * vertical_value); } } return 0; } ``` ::: <br> > This is the compiled result ``` riscv-none-elf-as -R -march=rv32i_zicsr -mabi=ilp32 -o init.o init.S riscv-none-elf-gcc -O0 -Wall -march=rv32i_zicsr -mabi=ilp32 -lm -c -o test.o test.c test.c: In function 'main': test.c:3:9: warning: variable 'convolution_result' set but not used [-Wunused-but-set-variable] 3 | int convolution_result[25] = {0}; | ^~~~~~~~~~~~~~~~~~ riscv-none-elf-ld -o test.elf -T link.lds --oformat=elf32-littleriscv test.o init.o riscv-none-elf-ld: test.o: in function `main': test.c:(.text+0x24): undefined reference to `memset' riscv-none-elf-ld: test.c:(.text+0x4c): undefined reference to `memcpy' riscv-none-elf-ld: test.c:(.text+0x184): undefined reference to `__mulsi3' riscv-none-elf-ld: test.c:(.text+0x1e4): undefined reference to `__mulsi3' riscv-none-elf-ld: test.c:(.text+0x208): undefined reference to `__mulsi3' riscv-none-elf-ld: test.c:(.text+0x268): undefined reference to `__mulsi3' riscv-none-elf-ld: test.c:(.text+0x28c): undefined reference to `__mulsi3' make: *** [Makefile:19：test.elf] Error 1 ``` I guess when the compilation is completed, the ```memcpy``` ```memset``` ... function is available, but they are not linked to in the link part. So, I tried to link these function libraries, and the result seems to be workable. This also solves the previous problem of matrix assignment one by one. There is what I fixed in ``Makefile``. ```makefile=17 %.elf: %.c init.o $(CC) $(CFLAGS) -c -o $(@:.elf=.o) $< $(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o \ -L/home/zisheng/riscv-none-elf-gcc/riscv-none-elf/lib -lc \ -L/home/zisheng/riscv-none-elf-gcc/lib/gcc/riscv-none-elf/13.2.0 -lgcc ``` --- Although it compiles successfully, it cannot execute on the MyCPU. That is not a feasible way. I finally decided to modify the c code. In addition to assigning values one by one, I also need to solve the problem of multiplication operations. That is my code and result below. We can see that is a feasible way. ```clike= static void convolution(int* matrix) { int temp[25]; for(int i=0; i<25; i++) { temp[i] = matrix[i]; } int horizontal_kernel[9]; horizontal_kernel[0] = 1; horizontal_kernel[1] = 0; horizontal_kernel[2] = 1; horizontal_kernel[3] = 2; horizontal_kernel[4] = 0; horizontal_kernel[5] = 2; horizontal_kernel[6] = 1; horizontal_kernel[7] = 0; horizontal_kernel[8] = 1; int vertical_kernel[9]; vertical_kernel[0] = 1; vertical_kernel[1] = 2; vertical_kernel[2] = 1; vertical_kernel[3] = 0; vertical_kernel[4] = 0; vertical_kernel[5] = 0; vertical_kernel[6] = 1; vertical_kernel[7] = 2; vertical_kernel[8] = 1; int matrix_index = 5; // Skipping the first row, starting from the first element of the second row for (int y = 1; y < 5 - 1; y++) { matrix_index++; // Skipping the first element of each row for (int x = 1; x < 5 - 1; x++) { int horizontal_value = 0; int vertical_value = 0; int kernel_index = 0; // Used for accessing convolution kernel elements int row_start_index = matrix_index - 1 - 5; // Starting index of the current row for (int ky = 0; ky <= 2; ky++) { int local_temp_index = row_start_index; for (int kx = 0; kx <= 2; kx++) { int pixel = temp[local_temp_index + kx]; // Using looped addition instead of multiplication for (int n = 0; n < horizontal_kernel[kernel_index]; n++) { horizontal_value += pixel; } for (int n = 0; n < vertical_kernel[kernel_index]; n++) { vertical_value += pixel; } kernel_index++; } row_start_index += 5; // Moving to the next row } matrix[matrix_index] = horizontal_value + vertical_value; // matrix[matrix_index] = sqrt(horizontal_value*horizontal_value + vertical_value*vertical_value); matrix_index++; // Moving to the next element } matrix_index++; // Skipping the last element of each row } } int main() { int matrix[25]; matrix[0] = 1; matrix[1] = 2; matrix[2] = 3; matrix[3] = 4; matrix[4] = 5; matrix[5] = 6; matrix[6] = 7; matrix[7] = 8; matrix[8] = 9; matrix[9] = 0; matrix[10] = 1; matrix[11] = 2; matrix[12] = 3; matrix[13] = 4; matrix[14] = 5; matrix[15] = 6; matrix[16] = 7; matrix[17] = 8; matrix[18] = 9; matrix[19] = 0; matrix[20] = 1; matrix[21] = 2; matrix[22] = 3; matrix[23] = 4; matrix[24] = 5; convolution(matrix); *(volatile int *)4 = matrix[6]; *(volatile int *)8 = matrix[7]; *(volatile int *)12 = matrix[8]; *(volatile int *)16 = matrix[11]; *(volatile int *)20 = matrix[12]; *(volatile int *)24 = matrix[13]; *(volatile int *)28 = matrix[16]; *(volatile int *)32 = matrix[17]; *(volatile int *)36 = matrix[18]; return 0; } ``` ```command= [info] welcome to sbt 1.9.7 (Ubuntu Java 11.0.21) [info] loading settings for project ca2023-lab3-build from plugins.sbt ... [info] loading project definition from /home/zisheng/Computer_Architecture_11201/homework_3/ca2023-lab3/project [info] loading settings for project root from build.sbt ... [info] set current project to mycpu (in build file:/home/zisheng/Computer_Architecture_11201/homework_3/ca2023-lab3/) [info] InstructionDecoderTest: [info] InstructionDecoder of Single Cycle CPU [info] - should produce correct control signal [info] InstructionFetchTest: [info] InstructionFetch of Single Cycle CPU [info] - should fetch instruction [info] ExecuteTest: [info] Execution of Single Cycle CPU [info] - should execute correctly Value at address UInt<3>(4): 52 Value at address UInt<4>(8): 68 Value at address UInt<4>(12): 64 Value at address UInt<5>(16): 92 Value at address UInt<5>(20): 108 Value at address UInt<5>(24): 84 Value at address UInt<5>(28): 52 Value at address UInt<6>(32): 68 Value at address UInt<6>(36): 64 [info] test: [info] test to evaluate [info] - should perform a 9 numbers matrix [info] RegisterFileTest: [info] Register File of Single Cycle CPU [info] - should read the written content [info] - should x0 always be zero [info] - should read the writing content [info] Run completed in 17 seconds, 915 milliseconds. [info] Total number of tests run: 7 [info] Suites: completed 5, aborted 0 [info] Tests: succeeded 7, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. [success] Total time: 19 s, completed Jan 14, 2024, 1:19:47 AM ``` --- ### waveform We can see that the value of our calculated result is correctly placed at the specified address. ![image](https://hackmd.io/_uploads/BkiRcBxFT.png)