# Lab2: Build Array from Permutation
###### tags: `Computer Architecture 2022`
## Problem Description
### Description
Given a zero-based permutation nums (0-indexed), build an array ans of the same length where ans[i] = nums[nums[i]] for each 0 <= i < nums.length and return it.
### Motivation
The form of this problem makes the person who answered the question better understand of the concept of pointer and memory access which is important in actual CPI performed in RISCV.
## C code Implementation
The original code is lack of the corresponding library and the main function. So I revised it as the code below:
```cpp=
#include <stdio.h>
#include <stdlib.h>
int* buildArray(int *nums, int numsSize, int *returnSize)
{
int c0=0;
int *p=malloc(sizeof(int) * numsSize);
for(c0=0; c0<numsSize; c0++){
p[c0] = nums[nums[c0]];
}
*returnSize = numsSize;
return p;
}
int main(void)
{
int nums[6] = {0, 2, 1, 5, 4, 3};
int* len;
int* k = buildArray(nums, 6, len);
int i;
for(i=0; i<*len; i++){
printf("%d ", k[i]);
}
printf("\n");
return 0;
}
```
## Order of Processing
### Compile
:::info
* riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -O0 -o hw2_0 hw1.c
* I made hw2_1, hw2_2, hw2_3 using O1, O2, O3 optimization respectively
:::
### Size
:::info
* riscv-none-elf-size hw2_0 > hw2_0_size.txt
:::
### ELF FILE header
:::info
* riscv-none-elf-readelf -h hw2_0 > hw2_0_head.txt
:::
### Assembly code
:::info
* riscv-none-elf-objdump -d hw2_0 > asm0.txt
:::
## Compare Size
### O0
```
text data bss dec hex filename
75464 2816 812 79092 134f4 hw2_0
```
### O1
```
text data bss dec hex filename
75392 2816 812 79020 134ac hw2_1
```
### O2
```
text data bss dec hex filename
75392 2816 812 79020 134ac hw2_2
```
### O3
```
text data bss dec hex filename
75392 2816 812 79020 134ac hw2_3
```
:::info
O0->O1: slightly changed
O1->O2->O3: basically not changed
:::
## Compare ELF Header
### O0
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100dc
Start of program headers: 52 (bytes into file)
Start of section headers: 95216 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
### O1
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100dc
Start of program headers: 52 (bytes into file)
Start of section headers: 95216 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
### O2
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x10190
Start of program headers: 52 (bytes into file)
Start of section headers: 95232 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
### O3
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x10190
Start of program headers: 52 (bytes into file)
Start of section headers: 95232 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
:::info
The elf file header are almostly same, just a bit difference in the entrypoint.
:::
## Compare Assembly Code
### O0
```cpp=
00010184 <buildArray>:
10184: fd010113 addi sp,sp,-48
10188: 02112623 sw ra,44(sp)
1018c: 02812423 sw s0,40(sp)
10190: 03010413 addi s0,sp,48
10194: fca42e23 sw a0,-36(s0)
10198: fcb42c23 sw a1,-40(s0)
1019c: fcc42a23 sw a2,-44(s0)
101a0: fe042623 sw zero,-20(s0)
101a4: fd842783 lw a5,-40(s0)
101a8: 00279793 slli a5,a5,0x2
101ac: 00078513 mv a0,a5
101b0: 254000ef jal ra,10404 <malloc>
101b4: 00050793 mv a5,a0
101b8: fef42423 sw a5,-24(s0)
101bc: fe042623 sw zero,-20(s0)
101c0: 0480006f j 10208 <buildArray+0x84>
101c4: fec42783 lw a5,-20(s0)
101c8: 00279793 slli a5,a5,0x2
101cc: fdc42703 lw a4,-36(s0)
101d0: 00f707b3 add a5,a4,a5
101d4: 0007a783 lw a5,0(a5)
101d8: 00279793 slli a5,a5,0x2
101dc: fdc42703 lw a4,-36(s0)
101e0: 00f70733 add a4,a4,a5
101e4: fec42783 lw a5,-20(s0)
101e8: 00279793 slli a5,a5,0x2
101ec: fe842683 lw a3,-24(s0)
101f0: 00f687b3 add a5,a3,a5
101f4: 00072703 lw a4,0(a4)
101f8: 00e7a023 sw a4,0(a5)
101fc: fec42783 lw a5,-20(s0)
10200: 00178793 addi a5,a5,1
10204: fef42623 sw a5,-20(s0)
10208: fec42703 lw a4,-20(s0)
1020c: fd842783 lw a5,-40(s0)
10210: faf74ae3 blt a4,a5,101c4 <buildArray+0x40>
10214: fd442783 lw a5,-44(s0)
10218: fd842703 lw a4,-40(s0)
1021c: 00e7a023 sw a4,0(a5)
10220: fe842783 lw a5,-24(s0)
10224: 00078513 mv a0,a5
10228: 02c12083 lw ra,44(sp)
1022c: 02812403 lw s0,40(sp)
10230: 03010113 addi sp,sp,48
10234: 00008067 ret
00010238 <main>:
10238: fc010113 addi sp,sp,-64
1023c: 02112e23 sw ra,60(sp)
10240: 02812c23 sw s0,56(sp)
10244: 04010413 addi s0,sp,64
10248: 000227b7 lui a5,0x22
1024c: 9a478793 addi a5,a5,-1628 # 219a4 <__clzsi2+0x90>
10250: 0007a503 lw a0,0(a5)
10254: 0047a583 lw a1,4(a5)
10258: 0087a603 lw a2,8(a5)
1025c: 00c7a683 lw a3,12(a5)
10260: 0107a703 lw a4,16(a5)
10264: 0147a783 lw a5,20(a5)
10268: fca42623 sw a0,-52(s0)
1026c: fcb42823 sw a1,-48(s0)
10270: fcc42a23 sw a2,-44(s0)
10274: fcd42c23 sw a3,-40(s0)
10278: fce42e23 sw a4,-36(s0)
1027c: fef42023 sw a5,-32(s0)
10280: fcc40793 addi a5,s0,-52
10284: fe842603 lw a2,-24(s0)
10288: 00600593 li a1,6
1028c: 00078513 mv a0,a5
10290: ef5ff0ef jal ra,10184 <buildArray>
10294: fea42223 sw a0,-28(s0)
10298: fe042623 sw zero,-20(s0)
1029c: 0340006f j 102d0 <main+0x98>
102a0: fec42783 lw a5,-20(s0)
102a4: 00279793 slli a5,a5,0x2
102a8: fe442703 lw a4,-28(s0)
102ac: 00f707b3 add a5,a4,a5
102b0: 0007a783 lw a5,0(a5)
102b4: 00078593 mv a1,a5
102b8: 000227b7 lui a5,0x22
102bc: 9a078513 addi a0,a5,-1632 # 219a0 <__clzsi2+0x8c>
102c0: 209000ef jal ra,10cc8 <printf>
102c4: fec42783 lw a5,-20(s0)
102c8: 00178793 addi a5,a5,1
102cc: fef42623 sw a5,-20(s0)
102d0: fe842783 lw a5,-24(s0)
102d4: 0007a783 lw a5,0(a5)
102d8: fec42703 lw a4,-20(s0)
102dc: fcf742e3 blt a4,a5,102a0 <main+0x68>
102e0: 00a00513 li a0,10
102e4: 239000ef jal ra,10d1c <putchar>
102e8: 00000793 li a5,0
102ec: 00078513 mv a0,a5
102f0: 03c12083 lw ra,60(sp)
102f4: 03812403 lw s0,56(sp)
102f8: 04010113 addi sp,sp,64
102fc: 00008067 ret
```
### O1
```cpp=
00010184 <buildArray>:
10184: fe010113 addi sp,sp,-32
10188: 00112e23 sw ra,28(sp)
1018c: 00812c23 sw s0,24(sp)
10190: 00912a23 sw s1,20(sp)
10194: 01212823 sw s2,16(sp)
10198: 01312623 sw s3,12(sp)
1019c: 00050413 mv s0,a0
101a0: 00058913 mv s2,a1
101a4: 00060993 mv s3,a2
101a8: 00259493 slli s1,a1,0x2
101ac: 00048513 mv a0,s1
101b0: 20c000ef jal ra,103bc <malloc>
101b4: 03205863 blez s2,101e4 <buildArray+0x60>
101b8: 00040713 mv a4,s0
101bc: 00050693 mv a3,a0
101c0: 00848833 add a6,s1,s0
101c4: 00072783 lw a5,0(a4)
101c8: 00279793 slli a5,a5,0x2
101cc: 00f407b3 add a5,s0,a5
101d0: 0007a783 lw a5,0(a5)
101d4: 00f6a023 sw a5,0(a3)
101d8: 00470713 addi a4,a4,4
101dc: 00468693 addi a3,a3,4
101e0: ff0712e3 bne a4,a6,101c4 <buildArray+0x40>
101e4: 0129a023 sw s2,0(s3)
101e8: 01c12083 lw ra,28(sp)
101ec: 01812403 lw s0,24(sp)
101f0: 01412483 lw s1,20(sp)
101f4: 01012903 lw s2,16(sp)
101f8: 00c12983 lw s3,12(sp)
101fc: 02010113 addi sp,sp,32
10200: 00008067 ret
00010204 <main>:
10204: fd010113 addi sp,sp,-48
10208: 02112623 sw ra,44(sp)
1020c: 02812423 sw s0,40(sp)
10210: 02912223 sw s1,36(sp)
10214: 03212023 sw s2,32(sp)
10218: 000227b7 lui a5,0x22
1021c: 95c78793 addi a5,a5,-1700 # 2195c <__clzsi2+0x90>
10220: 0007a503 lw a0,0(a5)
10224: 0047a583 lw a1,4(a5)
10228: 0087a603 lw a2,8(a5)
1022c: 00c7a683 lw a3,12(a5)
10230: 0107a703 lw a4,16(a5)
10234: 0147a783 lw a5,20(a5)
10238: 00a12423 sw a0,8(sp)
1023c: 00b12623 sw a1,12(sp)
10240: 00c12823 sw a2,16(sp)
10244: 00d12a23 sw a3,20(sp)
10248: 00e12c23 sw a4,24(sp)
1024c: 00f12e23 sw a5,28(sp)
10250: 00000413 li s0,0
10254: 00040613 mv a2,s0
10258: 00600593 li a1,6
1025c: 00810513 addi a0,sp,8
10260: f25ff0ef jal ra,10184 <buildArray>
10264: 00042783 lw a5,0(s0)
10268: 02f05663 blez a5,10294 <main+0x90>
1026c: 00050413 mv s0,a0
10270: 00000493 li s1,0
10274: 00022937 lui s2,0x22
10278: 00042583 lw a1,0(s0)
1027c: 95890513 addi a0,s2,-1704 # 21958 <__clzsi2+0x8c>
10280: 201000ef jal ra,10c80 <printf>
10284: 00148493 addi s1,s1,1
10288: 00440413 addi s0,s0,4
1028c: 00002783 lw a5,0(zero) # 0 <exit-0x10094>
10290: fef4c4e3 blt s1,a5,10278 <main+0x74>
10294: 00a00513 li a0,10
10298: 23d000ef jal ra,10cd4 <putchar>
1029c: 00000513 li a0,0
102a0: 02c12083 lw ra,44(sp)
102a4: 02812403 lw s0,40(sp)
102a8: 02412483 lw s1,36(sp)
102ac: 02012903 lw s2,32(sp)
102b0: 03010113 addi sp,sp,48
102b4: 00008067 ret
```
### O2
```cpp=
00010238 <buildArray>:
10238: fe010113 addi sp,sp,-32
1023c: 00912a23 sw s1,20(sp)
10240: 00259493 slli s1,a1,0x2
10244: 00812c23 sw s0,24(sp)
10248: 01212823 sw s2,16(sp)
1024c: 00050413 mv s0,a0
10250: 00058913 mv s2,a1
10254: 00048513 mv a0,s1
10258: 01312623 sw s3,12(sp)
1025c: 00112e23 sw ra,28(sp)
10260: 00060993 mv s3,a2
10264: 158000ef jal ra,103bc <malloc>
10268: 03205863 blez s2,10298 <buildArray+0x60>
1026c: 00040713 mv a4,s0
10270: 00050693 mv a3,a0
10274: 00848833 add a6,s1,s0
10278: 00072783 lw a5,0(a4)
1027c: 00468693 addi a3,a3,4
10280: 00470713 addi a4,a4,4
10284: 00279793 slli a5,a5,0x2
10288: 00f407b3 add a5,s0,a5
1028c: 0007a783 lw a5,0(a5)
10290: fef6ae23 sw a5,-4(a3)
10294: fee812e3 bne a6,a4,10278 <buildArray+0x40>
10298: 01c12083 lw ra,28(sp)
1029c: 01812403 lw s0,24(sp)
102a0: 0129a023 sw s2,0(s3)
102a4: 01412483 lw s1,20(sp)
102a8: 01012903 lw s2,16(sp)
102ac: 00c12983 lw s3,12(sp)
102b0: 02010113 addi sp,sp,32
102b4: 00008067 ret
000100c4 <main>:
100c4: 000227b7 lui a5,0x22
100c8: 95c78793 addi a5,a5,-1700 # 2195c <__clzsi2+0x90>
100cc: fd010113 addi sp,sp,-48
100d0: 0007a303 lw t1,0(a5)
100d4: 0047a883 lw a7,4(a5)
100d8: 0087a803 lw a6,8(a5)
100dc: 00c7a683 lw a3,12(a5)
100e0: 0107a703 lw a4,16(a5)
100e4: 0147a783 lw a5,20(a5)
100e8: 02812423 sw s0,40(sp)
100ec: 00000413 li s0,0
100f0: 00040613 mv a2,s0
100f4: 00600593 li a1,6
100f8: 00810513 addi a0,sp,8
100fc: 00f12e23 sw a5,28(sp)
10100: 02112623 sw ra,44(sp)
10104: 02912223 sw s1,36(sp)
10108: 03212023 sw s2,32(sp)
1010c: 00612423 sw t1,8(sp)
10110: 01112623 sw a7,12(sp)
10114: 01012823 sw a6,16(sp)
10118: 00d12a23 sw a3,20(sp)
1011c: 00e12c23 sw a4,24(sp)
10120: 118000ef jal ra,10238 <buildArray>
10124: 00042783 lw a5,0(s0)
10128: 02f05663 blez a5,10154 <main+0x90>
1012c: 00050413 mv s0,a0
10130: 00000493 li s1,0
10134: 00022937 lui s2,0x22
10138: 00042583 lw a1,0(s0)
1013c: 95890513 addi a0,s2,-1704 # 21958 <__clzsi2+0x8c>
10140: 00148493 addi s1,s1,1
10144: 33d000ef jal ra,10c80 <printf>
10148: 00002783 lw a5,0(zero) # 0 <exit-0x10094>
1014c: 00440413 addi s0,s0,4
10150: fef4c4e3 blt s1,a5,10138 <main+0x74>
10154: 00a00513 li a0,10
10158: 37d000ef jal ra,10cd4 <putchar>
1015c: 02c12083 lw ra,44(sp)
10160: 02812403 lw s0,40(sp)
10164: 02412483 lw s1,36(sp)
10168: 02012903 lw s2,32(sp)
1016c: 00000513 li a0,0
10170: 03010113 addi sp,sp,48
10174: 00008067 ret
```
### O3
```cpp=
00010238 <buildArray>:
10238: fe010113 addi sp,sp,-32
1023c: 00912a23 sw s1,20(sp)
10240: 00259493 slli s1,a1,0x2
10244: 00812c23 sw s0,24(sp)
10248: 01212823 sw s2,16(sp)
1024c: 00050413 mv s0,a0
10250: 00058913 mv s2,a1
10254: 00048513 mv a0,s1
10258: 01312623 sw s3,12(sp)
1025c: 00112e23 sw ra,28(sp)
10260: 00060993 mv s3,a2
10264: 158000ef jal ra,103bc <malloc>
10268: 03205863 blez s2,10298 <buildArray+0x60>
1026c: 00040713 mv a4,s0
10270: 00050693 mv a3,a0
10274: 00848833 add a6,s1,s0
10278: 00072783 lw a5,0(a4)
1027c: 00468693 addi a3,a3,4
10280: 00470713 addi a4,a4,4
10284: 00279793 slli a5,a5,0x2
10288: 00f407b3 add a5,s0,a5
1028c: 0007a783 lw a5,0(a5)
10290: fef6ae23 sw a5,-4(a3)
10294: fee812e3 bne a6,a4,10278 <buildArray+0x40>
10298: 01c12083 lw ra,28(sp)
1029c: 01812403 lw s0,24(sp)
102a0: 0129a023 sw s2,0(s3)
102a4: 01412483 lw s1,20(sp)
102a8: 01012903 lw s2,16(sp)
102ac: 00c12983 lw s3,12(sp)
102b0: 02010113 addi sp,sp,32
102b4: 00008067 ret
000100c4 <main>:
100c4: 000227b7 lui a5,0x22
100c8: 95c78793 addi a5,a5,-1700 # 2195c <__clzsi2+0x90>
100cc: fd010113 addi sp,sp,-48
100d0: 0007a303 lw t1,0(a5)
100d4: 0047a883 lw a7,4(a5)
100d8: 0087a803 lw a6,8(a5)
100dc: 00c7a683 lw a3,12(a5)
100e0: 0107a703 lw a4,16(a5)
100e4: 0147a783 lw a5,20(a5)
100e8: 02812423 sw s0,40(sp)
100ec: 00000413 li s0,0
100f0: 00040613 mv a2,s0
100f4: 00600593 li a1,6
100f8: 00810513 addi a0,sp,8
100fc: 00f12e23 sw a5,28(sp)
10100: 02112623 sw ra,44(sp)
10104: 02912223 sw s1,36(sp)
10108: 03212023 sw s2,32(sp)
1010c: 00612423 sw t1,8(sp)
10110: 01112623 sw a7,12(sp)
10114: 01012823 sw a6,16(sp)
10118: 00d12a23 sw a3,20(sp)
1011c: 00e12c23 sw a4,24(sp)
10120: 118000ef jal ra,10238 <buildArray>
10124: 00042783 lw a5,0(s0)
10128: 02f05663 blez a5,10154 <main+0x90>
1012c: 00050413 mv s0,a0
10130: 00000493 li s1,0
10134: 00022937 lui s2,0x22
10138: 00042583 lw a1,0(s0)
1013c: 95890513 addi a0,s2,-1704 # 21958 <__clzsi2+0x8c>
10140: 00148493 addi s1,s1,1
10144: 33d000ef jal ra,10c80 <printf>
10148: 00002783 lw a5,0(zero) # 0 <exit-0x10094>
1014c: 00440413 addi s0,s0,4
10150: fef4c4e3 blt s1,a5,10138 <main+0x74>
10154: 00a00513 li a0,10
10158: 37d000ef jal ra,10cd4 <putchar>
1015c: 02c12083 lw ra,44(sp)
10160: 02812403 lw s0,40(sp)
10164: 02412483 lw s1,36(sp)
10168: 02012903 lw s2,32(sp)
1016c: 00000513 li a0,0
10170: 03010113 addi sp,sp,48
10174: 00008067 ret
```
:::info
Optimization: O0->O1->O2->O3
code lines : 97->79->79->79
register usage: 10(sp+ra+zero+a0-a5+s0)->13(sp+ra+s0-s3+a0-a6)->14(sp+ra+s0-s3+a0-a6+t1)->14(sp+ra+s0-s3+a0-a6+t1)
stack usage: 64->48->48->48
load usage: 32->20->20->20
store usage: 22->17->17->17
:::
## Observation
:::info
O0->O1 : lesser stack usage and load/store usage, more aggresive scheduling on register, and the text size slightly changed
O1->O2 : more aggresive scheduling on register, but almost the same except the usage of temporary register
O2->O3 : almost exactly the same
:::