# 2025q1 Homework4 contributed by < [BennyWang1007](https://github.com/BennyWang1007) > {%hackmd NrmQUGbRQWemgwPfhzXj6g %} ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 $ lscpu 架構: x86_64 CPU 作業模式: 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 20 On-line CPU(s) list: 0-19 供應商識別號: GenuineIntel Model name: 12th Gen Intel(R) Core(TM) i7-12700H CPU 家族: 6 型號: 154 每核心執行緒數: 2 每通訊端核心數: 14 Socket(s): 1 製程: 3 CPU(s) scaling MHz: 23% CPU max MHz: 4700.0000 CPU min MHz: 400.0000 BogoMIPS: 5376.00 Virtualization features: 虛擬: VT-x Caches (sum of all): L1d: 544 KiB (14 instances) L1i: 704 KiB (14 instances) L2: 11.5 MiB (8 instances) L3: 24 MiB (1 instance) NUMA: NUMA 節點: 1 NUMA node0 CPU(s): 0-19 ``` ### Quiz03-1 #### 參考答案 AAAA = (n+d-1)/d BBBB = 0x7fffffff CCCC = n+m DDDD = 1 EEEE = mpi_gcd(rop,op2,r) #### 延伸1: 解釋上述程式碼運作原理 ```c static size_t ceil_div(size_t n, size_t d) { // return AAAA; return (n+d-1)/d; } ``` 除法取上高斯等同於加上(除數-1)然後相除取下高斯: $\lceil \frac{n}{d} \rceil = \lfloor \frac{n + (d-1)}{d} \rfloor$ ```c // #define INTMAX BBBB #define INTMAX 0x7fffffff ``` 有號整數的最大值為 `0x7fffffff` 在 `mpi_mul_naive` 中,首先對每一對 31-bit 的 limb 相乘,會產生一個最大 62-bit 的中間結果,接下來進行加總與進位處理(res[n+m] += a[n] * b[m]): ```c // for (size_t k = CCCC; c || r; ++k) for (size_t k = n+m; c || r; ++k) ``` 最後用位元運算操作加法以及進位。 ```mpi_fdiv_qr``` 模擬二進位長除法(從高位開始往低位掃)。 用 mpi_testbit() 檢查 n 的每一位,把 `r` 左移一位,加上該位。 若 r ≥ d,則減去 `d`,並把 `q` 的那一位設為 1。 `mpi_gcd` 等價於輾轉相除法: ```c if (op2 == 0) return op1; else gcd(op2, op1 % op2) ``` 因此 EEEE = `mpi_gcd(rop, op2, r)`; #### 延伸二: 這項實作依賴遞迴呼叫,請避免使用並降低記憶體開銷 將程式碼改成避免遞迴 ```c void mpi_gcd(mpi_t rop, const mpi_t op1, const mpi_t op2) { ... while (mpi_cmp_u32(b, 0) != 0) { mpi_fdiv_qr(NULL, t, a, b); // t = a % b, don't care the quotient mpi_set(a, b); // a = b mpi_set(b, t); // b = t } ... ``` #### 延伸三: 改進最大公因數的運算效率 可以使用「Stein's GCD(Binary GCD)」: 1. 若兩數都是偶數,gcd(a,b) = 2 * gcd(a/2, b/2) 2. 若一偶一奇,則 gcd(a,b) = gcd(a/2, b) or gcd(a, b/2) 3. 若兩數皆為奇數,則 gcd(a,b) = gcd(|a-b|, min(a,b)) 以下為最終程式碼 ```c int mpi_even(const mpi_t op) { return (op->capacity == 0) || ((op->data[0] & 1) == 0); } void mpi_rshift1(mpi_t rop) { uint32_t carry = 0; for (size_t i = rop->capacity; i-- > 0;) { uint32_t new_carry = rop->data[i] & 1; rop->data[i] = (rop->data[i] >> 1) | (carry << 30); carry = new_carry; } mpi_compact(rop); } void mpi_lshift(mpi_t rop, const mpi_t op, size_t k) { mpi_set(rop, op); while (k--) mpi_lshift1(rop); } void mpi_swap(mpi_t a, mpi_t b) { mpi_t tmp; mpi_init(tmp); mpi_set(tmp, a); mpi_set(a, b); mpi_set(b, tmp); mpi_clear(tmp); } void mpi_gcd(mpi_t rop, const mpi_t op1, const mpi_t op2) { mpi_t a, b; mpi_init(a); mpi_init(b); mpi_set(a, op1); mpi_set(b, op2); if (mpi_cmp_u32(a, 0) == 0) { mpi_set(rop, b); mpi_clear(a); mpi_clear(b); return; } if (mpi_cmp_u32(b, 0) == 0) { mpi_set(rop, a); mpi_clear(a); mpi_clear(b); return; } // Find common power of 2 (d) size_t d = 0; while (mpi_even(a) && mpi_even(b)) { mpi_rshift1(a); mpi_rshift1(b); ++d; } // Divide a by 2 until it becomes odd while (mpi_even(a)) mpi_rshift1(a); do { while (mpi_even(b)) mpi_rshift1(b); if (mpi_cmp(a, b) > 0) mpi_swap(a, b); // Ensure a <= b mpi_sub(b, b, a); // b = b - a } while (mpi_cmp_u32(b, 0) != 0); // Restore common factors of 2 if (d > 0) mpi_lshift(a, a, d); mpi_set(rop, a); mpi_clear(a); mpi_clear(b); } ``` ### Quiz03-2 #### 參考答案 測驗二: \ ### Quiz03-3 #### 參考答案 AAAA = ENV_RUNNABLE BBBB = attempts+1 CCCC = coro_schedule DDDD = coro_yield