MoonChin
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Make a copy
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Make a copy Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    # Assignment2: Complete Applications Contributed by <[moonchin819](https://github.com/moonchin819)> ## Implementation Here's the [source code](https://github.com/moonchin819/computer_architecture_hw2) ## [Homework 1 uf8](https://hackmd.io/@sysprog/arch2025-quiz1-sol#Problem-B) To make the program from First Quiz Problem B run in a bare-metal environment using rv32emu’s system emulation, we need five files. * Makefile - To specify the cross compiler and link script (linker.ld) * Define all .S/.c files to be compiled in the project * Control build target : `all`, `clean`, `run`, `dump`, etc... * .S, C programs → link → generate elf file which can be executed. * linker.ld - Define memory layout of the program * To tell linker * Where to put .text(program) * .data (initialized data) * .bss (Uninitialized global variables) * Where are the stack and heap * Replace the operating system loader in bare-metal Because there's no OS to help allocate memory, so this linker.ld file is the only one which tells CPU where to start, matching the `_start` entry point in start.S file. * start.S - Startup code for the project, responsible for all the initialization * `_start` entry point assigned by linker.ld * Initialize stack pointer, clear .bss section * Call `main()` after setting * perfcounter.S - Performance counters * To read CSR counter of RISC-V * Help measure execution information in the situation without OS. * main.c * Provide function `main()` and called by start.S * As a high-level control, to call the assembly language I write * Can also call the function provided by `perfcounter.S` to measure the efficacy. * hw2.S - Main algorithm implementation * Include all the assembly code related to the quiz, assignment, or whatever the project is going to deal with. * We usually define it as a function, so that it could be called by `main.c` * Might include some bitwise operation and results printing. ### Bare-metal programming I have to modify the original program. In the first line, `.section .text` or `.text` should be added. It tells compiler and linker, the following instructions belong to "executable code section" of the program. Under the OS like Linux, compiler and linker will automatically put the program into the section such as `.text`, `.data`, `.bss`. But in bare-metal environment, there's no OS and standart startup code, so we need to clearly tell the compiler and linker "which section is code, and which is data" :::success I only show the modified parts from Homework 1 Problem B, so the encoding part will not be present since the changes are same as decoding part. ::: ```asm .section .text clz: li t0, 32 # n = 32, c = 16 srli t2, a0, 16 # y = x >> 16 beqz t2, c_equal_8 # branch if y is equal to zero addi t0, t0, -16 # n -= 16 add a0, t2, x0 # x = y c_equal_8: srli t2, a0, 8 # y = x >> 8 beqz t2, c_equal_4 addi t0, t0, -8 # n -= 8 add a0, t2, x0 # x = y c_equal_4: srli t2, a0, 4 # y = x >> 4 beqz t2, c_equal_2 addi t0, t0, -4 # n -= 4 add a0, t2, x0 # x = y c_equal_2: srli t2, a0, 2 # y = x >> 2 beqz t2, c_equal_1 addi t0, t0, -2 # n -= 2 add a0, t2, x0 # x = y c_equal_1: srli t2, a0, 1 # y = x >> 1 beqz t2, iter_end addi t0, t0, -1 # n -= 1 add a0, t2, x0 # x = y iter_end: sub a0, t0, a0 # return n - x jr ra offset_table: .word 0, 16, 48, 112, 240, 496, 1008, 2032, 4080, 8176, 16368, 32752, 65520, 131056, 262128, 524272 ``` `.type uf8_decode, %function` is a symbol type label for the linker and debugger. It tells linker that symbol `uf8_decode` represents a "function", but not "variable" The instruction `.align 2` aligns the starting address of the following code to a 2^2 = 4 bytes boundary. * Because the length of each instruction in RISC-V is 4 bytes, if the starting address of the function didn't align to 4 byte, CPU might occur some misaligned fetch exception. * So, here we use `.align 2`, since it's the safest, and most common approach. * It can also avoid unnecessary instruction fetching overhead. ```asm .globl uf8_decode .type uf8_decode,%function .align 2 uf8_decode: andi t0, a0, 0x0F # mantissa = f1 & 0xF srli t1, a0, 4 # exponent = f1 >> 4 la t2, offset_table slli t3, t1, 2 # offset << 4 add t2, t2, t3 lw t2, 0(t2) sll t0, t0, t1 # mantissa << exponent add a0, t0, t2 # mantissa + offset jr ra .size uf8_decode,.-uf8_decode ``` At the end of the code, I add `.size uf8_decode,.-uf8_encode`, although it is unnecessary line. `.size` is a symbol table information instruction for the linker, it tells linker the length of the function, that is, from where to where. ### Test Result ```tex 16:18:00 INFO src/main.c:315: RISC-V emulator is created and ready to run hw1b: PASSED Cycles: 28490 Instructions: 28490 === All Tests Completed ===16:18:00 INFO src/main.c:338: RISC-V emulator is destroyed ``` ## [Quiz 2 Problem A](https://hackmd.io/@sysprog/arch2025-quiz2-sol#Problem-A) ### Hanoi Tower In the game loop, we need to iterate 8 steps, because 3 disks for hanoi tower need 2^3-1=7 steps, with initial state we have 8 conditions in total. ```asm addi x8, x0, 1 # Counter starts from 1 game_loop: # BLANK 4: Check loop termination (2^3 moves). addi x5, x0, 8 # 8 steps beq x8, x5, finish_game # If x8 == 8, skip to the end ``` Gray Code calculation : It's an binary encoding, the number nearby has only one bit difference, which is used for deciding which disk to move every time. ```asm # BLANK 5: What is k for Gray code? srli x5, x8, 1 # BLANK 6: Complete Gray(n) calculation xor x6, x8, x5 ``` Gray(n) XOR Gray(n-1) will get the number with only one "1", it indicates which disk has to move. ```asm addi x7, x8, -1 srli x28, x7, 1 # BLANK 9: Generate Gray(n-1) xor x7, x7, x28 # BLANK 10: Which bits changed? xor x5, x6, x7 ``` To determine which disk to move, first we set the number of the disk to default 0. ```asm addi x9, x0, 0 # BLANK 11: Mask for testing LSB andi x6, x5, 1 # BLANK 12: Branch if disk 0 moves bne x6, x0, disk_found # BLANK 13: Set disk 1 addi x9, x0, 1 # BLANK 14: Test second bit with proper mask andi x6, x5, 2 bne x6, x0, disk_found # BLANK 15: Last disk number addi x9, x0, 2 ``` It's wrong if we have bit 0 and bit 2 at the same time. But this will not happen under normal circumstances. ```asm disk_found: # BLANK 16: Check impossible pattern (multiple bits) andi x30, x5, 5 addi x31, x0, 5 beq x30, x31, pattern_match jal x0, continue_move ``` To calculate the position of the target, we load the current position, and then we calculate the new position. ```asm continue_move: # BLANK 17: Word-align disk index (multiply by what?) slli x5, x9, 2 # BLANK 18: Base offset for disk array addi x5, x5, 20 add x5, x2, x5 lw x18, 0(x5) ``` Calculate the new place for smallest disk 0. The smallest disk will always jump over a peg when moved, and that's why add 2 in the second instruction here. The fourth instruction means that if the new position is less than 3, we don't have to adjust that, otherwise, we keep on to the next line to subtract 3 to adjust. (Because we only have three pegs.) After the adjustment, we can move on to the display the movement. ```asm bne x9, x0, handle_large # BLANK 19: Small disk moves by how many positions? addi x19, x18, 2 # BLANK 20: Number of pegs addi x6, x0, 3 blt x19, x6, display_move sub x19, x19, x6 jal x0, display_move ``` For two larger disks : disk 1 and disk 2. Move the large disks to the only peg which is not occupied by the smallest disk. ```asm handle_large: # BLANK 21: Load reference disk position lw x6, 20(x2) # BLANK 22: Sum of all peg indices (0+1+2) addi x19, x0, 3 sub x19, x19, x18 sub x19, x19, x6 ``` ```asm display_move: la x20, obdata # Load the address of obdata section to x20 add x5, x20, x18 # Calculate the address of the source peg # x5 points to the item number "x18" in obdata array # BLANK 23: Load byte from obfuscated data lbu x11, 0(x5) # Load 1 byte, which means the lowest 8 bits+ # BLANK 24: Decode constant (0x6F) li x6, 0x6F # Load decode constant 0x6F xor x11, x11, x6 # Decoding process。The origin obdata was fused,so we need to revert in to actual ascii code. ``` Same process as above to decode target peg, x12 here is the ascii code of target peg. ```asm add x7, x20, x19 lbu x12, 0(x7) xor x12, x12, x6 addi x12, x12, -0x12 ``` Here we calculate the offset in a stack, to record "Where is each disk currently located in peg ? " ```asm # BLANK 26: Calculate storage offset slli x5, x9, 2 addi x5, x5, 20 add x5, x2, x5 ``` ### Test Result ```tex 16:19:03 INFO src/main.c:315: RISC-V emulator is created and ready to run =====Hanoi Tower Test===== Origin code test with only print format adjustment. Move Disk 1 from A to C Move Disk 2 from A to B Move Disk 1 from C to B Move Disk 3 from A to C Move Disk 1 from B to A Move Disk 2 from B to C Move Disk 1 from A to C Cycles: 674 Instructions: 674 === Test Completed ===16:19:03 INFO src/main.c:338: RISC-V emulator is destroyed ``` ## [Quiz 3 Problem C](https://hackmd.io/@sysprog/arch2025-quiz3-sol#Problem-C) Fixed-point format `Actual value = Integer value / 2^N` * 2^0^ : No scaling * 2^16^ : 16 times scaling * 2^32^ : 32 times scaling Because y represents 1/sqrt(x), theoretically, its range is (0, 1], and scaled by 2^16^ the range become [0, 65536]. So it's stored as uint32_t, but the valid values are all below 17 bits. ```c static uint64_t mul32(uint32_t a, uint32_t b) { uint64_t r = 0; for (int i = 0; i < 32; i++){ if (b & (1U << i)) r += (uint64_t)a << i; } return r; } ``` ### rsqrt_org.h ```cpp #ifndef RSQRT_ORG // if not defined RSQRT_ORG (if RSQRT macro isn't defined yet). To check if this header file has already been included. #define RSQRT_ORG // define the macro RSQRT_ORG(its content is empty), indicates that the header file has already been included. //If the file is included again the following time, #ifndef will check that it's already defined and skip the entire file's contents. static uint64_t mul32(uint32_t a, uint32_t b); static int clz(uint32_t x); uint32_t fast_rsqrt(uint32_t x); #endif ``` In `main.c` file, we have to use the function `TEST_OUTPUT` to print the number from the table, so I wrote this simple function to calculate the length of the string. ```c static uint32_t str_len(const char *s) { const char *p = s; while (*p) p++; return p - s; } ``` ```tex 16:23:42 INFO src/main.c:315: RISC-V emulator is created and ready to run =====Fast Reciprocal sqrt Test===== Original version fast_rsqrt(1) = 65536Exact: 65536), Approx. Error = 0%, Remainder = 0/6553600 Cycles: 61 Instructions: 61 fast_rsqrt(4) = 32768Exact: 32768), Approx. Error = 0%, Remainder = 0/3276800 Cycles: 2783 Instructions: 2783 fast_rsqrt(16) = 16384Exact: 16384), Approx. Error = 0%, Remainder = 0/1638400 Cycles: 2789 Instructions: 2789 fast_rsqrt(24) = 13377Exact: 13377.48), Approx. Error = 0%, Remainder = 4800/1337748 Cycles: 4203 Instructions: 4203 fast_rsqrt(87) = 7026Exact: 7026.2), Approx. Error = 0%, Remainder = 2000/702620 Cycles: 4432 Instructions: 4432 fast_rsqrt(269) = 3995Exact: 3995.8), Approx. Error = 0%, Remainder = 8000/399580 Cycles: 4375 Instructions: 4375 fast_rsqrt(666) = 2539Exact: 2539.47), Approx. Error = 0%, Remainder = 4700/253947 Cycles: 4331 Instructions: 4331 Total Cycles: 236297 Total Instructions: 236297 === All Test Passed ===16:23:42 INFO src/main.c:338: RISC-V emulator is destroyed ``` ### -Os (Optimized for size) I encounter this error while changing Makefile for optimized for size. This error might relate to the implementation way of 64-bit operation under RV32 environment. In `-march=rv32i` environment, the processor only have 32-bit integer register, so there's no native arithmetic operations for 64-bit. Therefore, if I utilize arithmetic operations for `uint64_t`, the compiler will automatically insert function like `__ashldi.3`. ```tex riscv-none-elf-ld: rsqrt_org.o: in function `fast_rsqrt': /home/david0819/riscv-none-elf-gcc/rv32emu/calHW2/quiz3c/rsqrt_org.c:48:(.text+0xf4): undefined reference to `__lshrdi3' riscv-none-elf-ld: rsqrt_org.o: in function `mul32': /home/david0819/riscv-none-elf-gcc/rv32emu/calHW2/quiz3c/rsqrt_org.c:32:(.text+0x140): undefined reference to `__ashldi3' /home/david0819/riscv-none-elf-gcc/rv32emu/calHW2/quiz3c/rsqrt_org.c:32:(.text+0x1d0): undefined reference to `__ashldi3' make: *** [Makefile:26: test.elf] Error 1 riscv-none-elf-ld -T linker.ld -o test.elf start.o main.o perfcounter.o rsqrt_org.o riscv-none-elf-ld: warning: test.elf has a LOAD segment with RWX permissions riscv-none-elf-ld: rsqrt_org.o: in function `fast_rsqrt': /home/david0819/riscv-none-elf-gcc/rv32emu/calHW2/quiz3c/rsqrt_org.c:48:(.text+0xf4): undefined reference to `__lshrdi3' riscv-none-elf-ld: rsqrt_org.o: in function `mul32': /home/david0819/riscv-none-elf-gcc/rv32emu/calHW2/quiz3c/rsqrt_org.c:32:(.text+0x140): undefined reference to `__ashldi3' /home/david0819/riscv-none-elf-gcc/rv32emu/calHW2/quiz3c/rsqrt_org.c:32:(.text+0x1d0): undefined reference to `__ashldi3' make: *** [Makefile:26: test.elf] Error 1 ``` That means I can only use 32-bit operation for this ```c static void mul32(uint64_result_t *result, uint32_t a, uint32_t b) // can't use uint64_t for size optimization { result->lo = 0; // initialization result->hi = 0; uint32_t a_lo = a; // low 32 bits of a uint32_t a_hi = 0; // high 32 bits of a, it's 0 at the beginning because 'a' have only 32 bits // so 'a' is then seen as 64-bit number for (int i = 0; i < 32; i++){ // check every single bit of 'b' from low to high if (b & 1U){ // check if the LSB of 'b' is 1 uint32_t old_lo = result->lo; // save previous results of result->lo result->lo += a_lo; // if true, this indicates that we should add " the currently shifted 'a' " if (result->lo < old_lo){ // carry check, if result->lo become smaller after adding a_lo // it means overflow has occured result->hi++; // so we have to add the carry to high 32 bits } result->hi += a_hi; // add high 32 bits of a to high 32 bits of the result } // then shift left 'a' to prepare for the next round a_hi = (a_hi << 1) | (a_lo >> 31); // this step puts the MSB of a_lo to the LSB of a_hi a_lo = a_lo << 1; b >>= 1; // also shift right 'b' for the next round } ``` ```c uint32_t fast_rsqrt(uint32_t x) { if (x == 0) return 0xFFFFFFFF; // Return Inf if (x == 1) return 65536; int exp = 31 - clz(x); uint32_t y = rsqrt_table[exp]; if (x > (1u << exp)) { uint32_t y_next = (exp < 31) ? rsqrt_table[exp + 1] : 0; uint32_t delta = y - y_next; // uint32_t frac = (uint32_t) ((((uint64_t)x - (1UL << exp)) << 16) >> exp); // 改寫 uint32_t frac = ((x - (1u << exp)) << 16) >> exp; y -= (uint32_t) ((delta * frac) >> 16); } for (int iter = 0; iter < 2; iter++){ uint64_result_t y2_result; mul32(&y2_result, y, y); uint32_t y2 = y2_result.lo; uint64_result_t xy2_temp; mul32(&xy2_temp, x, y2); uint32_t xy2 = (xy2_temp.hi << 16) | (xy2_temp.lo >> 16); uint64_result_t y_temp; mul32(&y_temp, y, (3u << 16) - xy2); uint32_t result = (y_temp.hi << 15) | (y_temp.lo >> 17); y = result; } return y; } ``` That means in some optimization modes (e.g. -Os、-Ofast), gcc will automatically use uint64_t for simulation, which isn't provided in bare-metal. ```tex =====Fast Reciprocal sqrt Test===== Original version fast_rsqrt(1) = 65536(Exact: 65536), Approx. Error = 0%, Remainder = 0/6553600 Cycles: 43 Instructions: 43 fast_rsqrt(4) = 32768(Exact: 32768), Approx. Error = 0%, Remainder = 0/3276800 Cycles: 1907 Instructions: 1907 fast_rsqrt(16) = 16384(Exact: 16384), Approx. Error = 0%, Remainder = 0/1638400 Cycles: 1911 Instructions: 1911 fast_rsqrt(24) = 13377(Exact: 13377.48), Approx. Error = 0%, Remainder = 4800/1337748 Cycles: 2345 Instructions: 2345 fast_rsqrt(87) = 7026(Exact: 7026.2), Approx. Error = 0%, Remainder = 2000/702620 Cycles: 2398 Instructions: 2398 fast_rsqrt(269) = 3995(Exact: 3995.8), Approx. Error = 0%, Remainder = 8000/399580 Cycles: 2385 Instructions: 2385 fast_rsqrt(666) = 2539(Exact: 2539.47), Approx. Error = 0%, Remainder = 4700/253947 Cycles: 2356 Instructions: 2356 Total Cycles: 103769 Total Instructions: 103767 === All Test Passed === ``` ### Problem for `-Os` and `-Ofast` After I test the original handwritten code in the requirement, I'd like to test on `-Os` (optimized for size) and `-Ofast` (optimized for speed). I encounter the same problem in both mode. The compilter optimized out the function that print strings and numbers. ```c #define TEST_OUTPUT(msg, length) printstr(msg, length) #define TEST_LOGGER(msg) \ { \ char _msg[] = msg; \ TEST_OUTPUT(_msg, sizeof(_msg) - 1); \ } ``` When I `make run`, the results will be like this. ```tex 21:42:54 INFO src/main.c:315: RISC-V emulator is created and ready to run 1655366553600655360036 36 432768327680032768001486 1486 16163841638400163840049 49 241337713377.480480013377481118 1118 8770267026.202000702620209 209 26939953995.808000399580190 190 66625392539.470470025394759 59 d2�i94105 d2�i�� 94105 d2�i�� �21:42:54 ``` So I have to add clobber argument "memory" in `printstr` to avoid this according to this [online doc](https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gcc/Extended-Asm.html?utm_source=chatgpt.com#Clobbers-and-Scratch-Registers). It tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands. ### -Ofast (Optimized for speed) Changing the compilation option to `-Ofast` will make cycle count decrease in most cases. The reason might be : * Reduction of instructions * Constant folding : Some constant arithmetic operation can be solve during compilation. * Inline expansion and branch simplification : Some small function might go through inline expansion, reducing the cost of function call (call/return) and pipeline flush. * Rearrange the order of instructions to obscure memory delay or pipeline hazard, which makes cycle count lower. ```tex =====Fast Reciprocal sqrt Test===== Original version fast_rsqrt(1) = 65536 (Exact: 65536), Approx. Error = 0%, Remainder = 0/6553600 Cycles: 36 Instructions: 36 fast_rsqrt(4) = 32768 (Exact: 32768), Approx. Error = 0%, Remainder = 0/3276800 Cycles: 1486 Instructions: 1486 fast_rsqrt(16) = 16384 (Exact: 16384), Approx. Error = 0%, Remainder = 0/1638400 Cycles: 49 Instructions: 49 fast_rsqrt(24) = 13377 (Exact: 13377.48), Approx. Error = 0%, Remainder = 4800/1337748 Cycles: 1118 Instructions: 1118 fast_rsqrt(87) = 7026 (Exact: 7026.2), Approx. Error = 0%, Remainder = 2000/702620 Cycles: 209 Instructions: 209 fast_rsqrt(269) = 3995 (Exact: 3995.8), Approx. Error = 0%, Remainder = 8000/399580 Cycles: 190 Instructions: 190 fast_rsqrt(666) = 2539 (Exact: 2539.47), Approx. Error = 0%, Remainder = 4700/253947 Cycles: 59 Instructions: 59 Total Cycles: 94551 Total Instructions: 94551 === All Test Passed === ``` ### Section Size Comparison We can use command `riscv-none-elf-size` to show memory usage breakdown by section. And we can conclude that `-Os` perform the best in code size, reducing approximately half of it compared to the original one. The most signficant difference is `Os` is 452 bytes smaller than `Ofast`. Take a look into data section, they're all zero. Which means there's no initialized global variable in the code, or they might be optimized out by the compiler. BSS means uninitialized data segment, and they're almost the same in those three results. This part doesn't take up flash memory. #### For the Original version ```tex text data bss dec hex filename 5036 0 4100 9136 23b0 test.elf ``` #### For `Osize` ```tex text data bss dec hex filename 2596 0 4108 6704 1a30 test.elf ``` The reason why `Os` is the smallest : 1. Loop Optimization : The loop in the optimization might be unrolled, and some duplicate code might be merged. 2. Choice of instructions : `Os` will use the instruction which is more compact. 3. Arrangement of registers : Reduce the instructions like load/store #### For `Ofast` ```tex text data bss dec hex filename 3048 0 4104 7152 1bf0 test.elf ``` The features of `Ofast` including : 1. More aggressive inlining 2. Loop expansion ### Conclusion We can conclude from the results : * `Os` optimization is highly effective, which reduced the code size by 48.5%. * RAM usage is stable, because we could see that the difference across all optimization levels is minimal * `Os` is suitable for embedded systems

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully