# 2025q1 Homework4 (quiz3+4) >contributed by < timsong1 > ## [第 3 周測驗題](https://hackmd.io/@sysprog/linux2025-quiz3#%E6%B8%AC%E9%A9%97-2) ### quiz 1 該題目在實作相比於 [Multiple Precision Arithmetic](https://en.wikipedia.org/wiki/GNU_Multiple_Precision_Arithmetic_Library) 更高精度的整數運算 ```cpp #include <inttypes.h> #include <stdint.h> /* mpi: Multi-Precision Integers */ typedef struct { uint32_t *data; size_t capacity; } mpi_t[1]; typedef size_t mp_bitcnt_t; ``` ```cpp /* ceiling division without needing floating-point operations. */ static size_t ceil_div(size_t n, size_t d) { return AAAA; } ``` 要實作整數除法的向上取整(以 n / d 為例),直觀上來講可以分兩種 case: case 1 : n % d == 0,則 `return n / d` case 2 : n % d != 0,則 `return n / d + 1` 因為是整數除法(等於是向下取整),所以我們可以透過把 n 加上 `d - 1` 來達成: `AAAA = (n+d-1)/d` 實際上在 linux 核心程式碼中([include/linux/math.h](https://github.com/torvalds/linux/blob/master/include/linux/math.h) 和 [include/uapi/linux/const.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/const.h))也是這樣實作: ```cpp #define DIV_ROUND_UP __KERNEL_DIV_ROUND_UP #define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d)) ``` ```cpp #define INTMAX BBBB ``` 接下來要定義 integer 的最大範圍(十六進位) `BBBB = 0x7fffffff` #### 整數乘法 ```cpp= static void mpi_mul_naive(mpi_t rop, const mpi_t op1, const mpi_t op2) { size_t capacity = op1->capacity + op2->capacity; mpi_t tmp; mpi_init(tmp); mpi_enlarge(tmp, capacity); for (size_t n = 0; n < tmp->capacity; ++n) tmp->data[n] = 0; for (size_t n = 0; n < op1->capacity; ++n) { for (size_t m = 0; m < op2->capacity; ++m) { uint64_t r = (uint64_t) op1->data[n] * op2->data[m]; uint64_t c = 0; for (size_t k = CCCC; c || r; ++k) { if (k >= tmp->capacity) mpi_enlarge(tmp, tmp->capacity + 1); tmp->data[k] += (r & INTMAX) + c; r >>= 31; c = tmp->data[k] >> 31; tmp->data[k] &= INTMAX; } } } mpi_set(rop, tmp); mpi_compact(rop); mpi_clear(tmp); } ``` 這個做乘法的函式是一個較直觀的方法,像是一般的**直式乘法**,複雜度是$O(n^2)$ 運作原理是: (行3-8)兩個變數相乘,其`capacity`最大可能為兩數的`capacity`相加,因此宣告變數`tmp`,`capacity`為`op1->capacity + op2->capacity` (行13-14)透過雙迴圈讓`op1`和`op2`的每個 word 相乘 (行15)因為每個`data`陣列元素是 32 bits,相乘起來可能為 64 bits,所以先用`uint64_t`來做乘法才不會有 overflow 的問題 `r`代表某兩個 word 相乘的結果(64 bits) `c`代表之前計算結果的 carry,因為`data`的每個元素只能儲存 31 bits,MSB(Most Significant Bit) 如果為 1 的話必須要進位放到下個`data`元素 接下來要把64位元的這個數放到 tmp 裡,放的`data`起始位置會從`n+m`開始遞增,直到 `c`和`r`都為 0 (行20-23)把`r`的最右邊 31 bits 跟 carry `c`加入到 `data`,並更新`r`和`c`,最後再把 MSB 清除 `CCCC = n+m` #### 整數除法,得出商數和餘數 ```cpp= void mpi_fdiv_qr(mpi_t q, mpi_t r, const mpi_t n, const mpi_t d) { mpi_t n0, d0; mpi_init(n0); mpi_init(d0); mpi_set(n0, n); mpi_set(d0, d); if (mpi_cmp_u32(d0, 0) == 0) { fprintf(stderr, "Division by zero\n"); abort(); } mpi_set_u32(q, 0); mpi_set_u32(r, 0); size_t start = mpi_sizeinbase(n0, 2) - 1; for (size_t i = start; i != (size_t) -1; --i) { mpi_mul_2exp(r, r, DDDD); if (mpi_testbit(n0, i) != 0) mpi_setbit(r, 0); if (mpi_cmp(r, d0) >= 0) { mpi_sub(r, r, d0); mpi_setbit(q, i); } } mpi_clear(n0); mpi_clear(d0); } ``` 這個函式的核心概念是運用[長除法](https://zh.wikipedia.org/zh-tw/%E9%95%B7%E9%99%A4%E6%B3%95)(Long division)的概念,從被除數最左邊的位置開始,把被除數要被比較的部分跟除數比較,決定商跟餘數的一部份 ![image](https://hackmd.io/_uploads/ryW5nBWRyl.png) 此外,**被除數要被比較的部分會放在餘數`r`裡**,跟除數`d0`做比較 (行16)決定被除數`n0`的 bit 數 (行18)迴圈從`n0`的 MSB 開始遍歷,直到 0 迴圈內會做三件事: (行19)對餘數`r`向左位移 1 位(乘以 2) `DDDD = 1` (行20-21)如果`n0`的第`i`個 bit 為 1,就把 r 的 LSB 設成 1。 (行22-24)如果`r`大於除數`d0`,把`r`減掉`d0`且商數`q`第`i`個 bit 設為 1 #### 最大公因數 ```cpp= void mpi_gcd(mpi_t rop, const mpi_t op1, const mpi_t op2) { if (mpi_cmp_u32(op2, 0) == 0) { mpi_set(rop, op1); return; } mpi_t q, r; mpi_init(q); mpi_init(r); mpi_fdiv_qr(q, r, op1, op2); EEEE; mpi_clear(q); mpi_clear(r); } ``` 一開始看到行 3 沒有判斷`op1`就大概猜到會用到遞迴 (行3)判斷`op2`是否為 0,是的話`rop`設為`op1` 此函式是使用輾轉相除法來實作,參考[最大公因數特性和實作考量](https://hackmd.io/@sysprog/gcd-impl)的公式: 另 $a_0=a, b_0=b$,則 \begin{split} \gcd(a_0, b_0) &= \gcd(b_0 \% a_0 = \underline{b_1}, a_0) \\ \gcd(\underline{b_1}, a_0) &= \gcd(a_0 \% b_1 = \underline{a_1}, b_1) \\ \gcd(\underline{a_1}, b_1) &= \gcd(b_1 \% a_1 = b_2, a_1) \\ &\vdots \\ \gcd(a_i, b_i) &= \gcd(b_i \% a_i = a_{i+1}, b_i) \\ &\vdots \\ \gcd(\mathbf{0}, b_n) &= |b_n| \end{split} (行12)所以要先找出 `op1` 除以 `op2` 的餘數`r` 接下來行 14 則會再呼叫`mpi_gcd`,而根據行 3 是判斷`op2`是否為 0,可以知道呼叫`mpi_gcd`時`r`要放在參數最後面 `EEEE = mpi_gcd(rop,op2,r)` ### quiz 1 改進:避免使用遞迴並降低記憶體開銷 ### quiz 1 改進:改進最大公因數的運算效率 --- ### quiz 2 ### quiz 2 疑慮: 測試程式有提到把`x`和`y`合併成一個`uint64_t`: ```cpp static inline uint64_t bit_compound(uint32_t x, uint32_t y) { return ((0L + x) << 32) | ((y + 0L) & (-1L >> 32)); } ``` 其中有對`-1L`向右移32位元 而根據 **c99 standard §6.5.7/5**: >The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity, 2 raised to the power E2. **If E1 has a signed type and a negative value, the resulting value is implementation-defined.** 對**有號負數**向右位移是屬於 implementation-defined,代表不同編譯器可能做出邏輯位移或是算數位移 ``` tim@tim-ZenBook-UX325JA-UX325JA:~/linux2025$ gcc --version gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 Copyright (C) 2023 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. tim@tim-ZenBook-UX325JA-UX325JA:~/linux2025$ gcc -S test.c -o test.s tim@tim-ZenBook-UX325JA-UX325JA:~/linux2025$ cat test.s .file "test.c" .text .section .rodata .LC0: .string "x = %d (0x%08X)\n" .LC1: .string "x >> 1 = %d (0x%08X)\n" .text .globl main .type main, @function main: .LFB0: .cfi_startproc endbr64 pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq %rsp, %rbp .cfi_def_cfa_register 6 subq $16, %rsp movl $-1, -8(%rbp) movl -8(%rbp), %eax sarl %eax movl %eax, -4(%rbp) movl -8(%rbp), %edx movl -8(%rbp), %eax movl %eax, %esi leaq .LC0(%rip), %rax movq %rax, %rdi movl $0, %eax call printf@PLT movl -4(%rbp), %edx movl -4(%rbp), %eax movl %eax, %esi leaq .LC1(%rip), %rax movq %rax, %rdi movl $0, %eax call printf@PLT movl $0, %eax leave .cfi_def_cfa 7, 8 ret .cfi_endproc .LFE0: .size main, .-main .ident "GCC: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0" .section .note.GNU-stack,"",@progbits .section .note.gnu.property,"a" .align 8 .long 1f - 0f .long 4f - 1f .long 5 0: .string "GNU" 1: .align 8 .long 0xc0000002 .long 3f - 2f 2: .long 0x3 3: .align 8 4: ``` ``` tim@tim-ZenBook-UX325JA-UX325JA:~/linux2025$ arm-linux-gnueabi-gcc -S test.c -o test_arm.s tim@tim-ZenBook-UX325JA-UX325JA:~/linux2025$ cat test_arm.s .arch armv5t .fpu softvfp .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 2 .eabi_attribute 30, 6 .eabi_attribute 34, 0 .eabi_attribute 18, 4 .file "test.c" .text .section .rodata .align 2 .LC0: .ascii "x = %d (0x%08X)\012\000" .align 2 .LC1: .ascii "x >> 1 = %d (0x%08X)\012\000" .text .align 2 .global main .syntax unified .arm .type main, %function main: @ args = 0, pretend = 0, frame = 8 @ frame_needed = 1, uses_anonymous_args = 0 push {fp, lr} add fp, sp, #4 sub sp, sp, #8 mvn r3, #0 str r3, [fp, #-12] ldr r3, [fp, #-12] asr r3, r3, #1 str r3, [fp, #-8] ldr r2, [fp, #-12] ldr r1, [fp, #-12] ldr r0, .L3 bl printf ldr r2, [fp, #-8] ldr r1, [fp, #-8] ldr r0, .L3+4 bl printf mov r3, #0 mov r0, r3 sub sp, fp, #4 @ sp needed pop {fp, pc} .L4: .align 2 .L3: .word .LC0 .word .LC1 .size main, .-main .ident "GCC: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0" .section .note.GNU-stack,"",%progbits ```