# 2025q1 Homework4 (quiz3+4)
>contributed by < timsong1 >
## [第 3 周測驗題](https://hackmd.io/@sysprog/linux2025-quiz3#%E6%B8%AC%E9%A9%97-2)
### quiz 1
該題目在實作相比於 [Multiple Precision Arithmetic](https://en.wikipedia.org/wiki/GNU_Multiple_Precision_Arithmetic_Library) 更高精度的整數運算
```cpp
#include <inttypes.h>
#include <stdint.h>
/* mpi: Multi-Precision Integers */
typedef struct {
uint32_t *data;
size_t capacity;
} mpi_t[1];
typedef size_t mp_bitcnt_t;
```
```cpp
/* ceiling division without needing floating-point operations. */
static size_t ceil_div(size_t n, size_t d)
{
return AAAA;
}
```
要實作整數除法的向上取整(以 n / d 為例),直觀上來講可以分兩種 case:
case 1 : n % d == 0,則 `return n / d`
case 2 : n % d != 0,則 `return n / d + 1`
因為是整數除法(等於是向下取整),所以我們可以透過把 n 加上 `d - 1` 來達成:
`AAAA = (n+d-1)/d`
實際上在 linux 核心程式碼中([include/linux/math.h](https://github.com/torvalds/linux/blob/master/include/linux/math.h) 和 [include/uapi/linux/const.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/const.h))也是這樣實作:
```cpp
#define DIV_ROUND_UP __KERNEL_DIV_ROUND_UP
#define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
```
```cpp
#define INTMAX BBBB
```
接下來要定義 integer 的最大範圍(十六進位)
`BBBB = 0x7fffffff`
#### 整數乘法
```cpp=
static void mpi_mul_naive(mpi_t rop, const mpi_t op1, const mpi_t op2)
{
size_t capacity = op1->capacity + op2->capacity;
mpi_t tmp;
mpi_init(tmp);
mpi_enlarge(tmp, capacity);
for (size_t n = 0; n < tmp->capacity; ++n)
tmp->data[n] = 0;
for (size_t n = 0; n < op1->capacity; ++n) {
for (size_t m = 0; m < op2->capacity; ++m) {
uint64_t r = (uint64_t) op1->data[n] * op2->data[m];
uint64_t c = 0;
for (size_t k = CCCC; c || r; ++k) {
if (k >= tmp->capacity)
mpi_enlarge(tmp, tmp->capacity + 1);
tmp->data[k] += (r & INTMAX) + c;
r >>= 31;
c = tmp->data[k] >> 31;
tmp->data[k] &= INTMAX;
}
}
}
mpi_set(rop, tmp);
mpi_compact(rop);
mpi_clear(tmp);
}
```
這個做乘法的函式是一個較直觀的方法,像是一般的**直式乘法**,複雜度是$O(n^2)$
運作原理是:
(行3-8)兩個變數相乘,其`capacity`最大可能為兩數的`capacity`相加,因此宣告變數`tmp`,`capacity`為`op1->capacity + op2->capacity`
(行13-14)透過雙迴圈讓`op1`和`op2`的每個 word 相乘
(行15)因為每個`data`陣列元素是 32 bits,相乘起來可能為 64 bits,所以先用`uint64_t`來做乘法才不會有 overflow 的問題
`r`代表某兩個 word 相乘的結果(64 bits)
`c`代表之前計算結果的 carry,因為`data`的每個元素只能儲存 31 bits,MSB(Most Significant Bit) 如果為 1 的話必須要進位放到下個`data`元素
接下來要把64位元的這個數放到 tmp 裡,放的`data`起始位置會從`n+m`開始遞增,直到 `c`和`r`都為 0
(行20-23)把`r`的最右邊 31 bits 跟 carry `c`加入到 `data`,並更新`r`和`c`,最後再把 MSB 清除
`CCCC = n+m`
#### 整數除法,得出商數和餘數
```cpp=
void mpi_fdiv_qr(mpi_t q, mpi_t r, const mpi_t n, const mpi_t d)
{
mpi_t n0, d0;
mpi_init(n0);
mpi_init(d0);
mpi_set(n0, n);
mpi_set(d0, d);
if (mpi_cmp_u32(d0, 0) == 0) {
fprintf(stderr, "Division by zero\n");
abort();
}
mpi_set_u32(q, 0);
mpi_set_u32(r, 0);
size_t start = mpi_sizeinbase(n0, 2) - 1;
for (size_t i = start; i != (size_t) -1; --i) {
mpi_mul_2exp(r, r, DDDD);
if (mpi_testbit(n0, i) != 0)
mpi_setbit(r, 0);
if (mpi_cmp(r, d0) >= 0) {
mpi_sub(r, r, d0);
mpi_setbit(q, i);
}
}
mpi_clear(n0);
mpi_clear(d0);
}
```
這個函式的核心概念是運用[長除法](https://zh.wikipedia.org/zh-tw/%E9%95%B7%E9%99%A4%E6%B3%95)(Long division)的概念,從被除數最左邊的位置開始,把被除數要被比較的部分跟除數比較,決定商跟餘數的一部份

此外,**被除數要被比較的部分會放在餘數`r`裡**,跟除數`d0`做比較
(行16)決定被除數`n0`的 bit 數
(行18)迴圈從`n0`的 MSB 開始遍歷,直到 0
迴圈內會做三件事:
(行19)對餘數`r`向左位移 1 位(乘以 2)
`DDDD = 1`
(行20-21)如果`n0`的第`i`個 bit 為 1,就把 r 的 LSB 設成 1。
(行22-24)如果`r`大於除數`d0`,把`r`減掉`d0`且商數`q`第`i`個 bit 設為 1
#### 最大公因數
```cpp=
void mpi_gcd(mpi_t rop, const mpi_t op1, const mpi_t op2)
{
if (mpi_cmp_u32(op2, 0) == 0) {
mpi_set(rop, op1);
return;
}
mpi_t q, r;
mpi_init(q);
mpi_init(r);
mpi_fdiv_qr(q, r, op1, op2);
EEEE;
mpi_clear(q);
mpi_clear(r);
}
```
一開始看到行 3 沒有判斷`op1`就大概猜到會用到遞迴
(行3)判斷`op2`是否為 0,是的話`rop`設為`op1`
此函式是使用輾轉相除法來實作,參考[最大公因數特性和實作考量](https://hackmd.io/@sysprog/gcd-impl)的公式:
另 $a_0=a, b_0=b$,則
\begin{split}
\gcd(a_0, b_0) &= \gcd(b_0 \% a_0 = \underline{b_1}, a_0) \\
\gcd(\underline{b_1}, a_0) &= \gcd(a_0 \% b_1 = \underline{a_1}, b_1) \\
\gcd(\underline{a_1}, b_1) &= \gcd(b_1 \% a_1 = b_2, a_1) \\
&\vdots \\
\gcd(a_i, b_i) &= \gcd(b_i \% a_i = a_{i+1}, b_i) \\
&\vdots \\
\gcd(\mathbf{0}, b_n) &= |b_n|
\end{split}
(行12)所以要先找出 `op1` 除以 `op2` 的餘數`r`
接下來行 14 則會再呼叫`mpi_gcd`,而根據行 3 是判斷`op2`是否為 0,可以知道呼叫`mpi_gcd`時`r`要放在參數最後面
`EEEE = mpi_gcd(rop,op2,r)`
### quiz 1 改進:避免使用遞迴並降低記憶體開銷
### quiz 1 改進:改進最大公因數的運算效率
---
### quiz 2
### quiz 2 疑慮:
測試程式有提到把`x`和`y`合併成一個`uint64_t`:
```cpp
static inline uint64_t bit_compound(uint32_t x, uint32_t y)
{
return ((0L + x) << 32) | ((y + 0L) & (-1L >> 32));
}
```
其中有對`-1L`向右移32位元
而根據 **c99 standard §6.5.7/5**:
>The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity, 2 raised to the power E2. **If E1 has a signed type and a negative value, the resulting value is implementation-defined.**
對**有號負數**向右位移是屬於 implementation-defined,代表不同編譯器可能做出邏輯位移或是算數位移
```
tim@tim-ZenBook-UX325JA-UX325JA:~/linux2025$ gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
tim@tim-ZenBook-UX325JA-UX325JA:~/linux2025$ gcc -S test.c -o test.s
tim@tim-ZenBook-UX325JA-UX325JA:~/linux2025$ cat test.s
.file "test.c"
.text
.section .rodata
.LC0:
.string "x = %d (0x%08X)\n"
.LC1:
.string "x >> 1 = %d (0x%08X)\n"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $-1, -8(%rbp)
movl -8(%rbp), %eax
sarl %eax
movl %eax, -4(%rbp)
movl -8(%rbp), %edx
movl -8(%rbp), %eax
movl %eax, %esi
leaq .LC0(%rip), %rax
movq %rax, %rdi
movl $0, %eax
call printf@PLT
movl -4(%rbp), %edx
movl -4(%rbp), %eax
movl %eax, %esi
leaq .LC1(%rip), %rax
movq %rax, %rdi
movl $0, %eax
call printf@PLT
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0"
.section .note.GNU-stack,"",@progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:
```
```
tim@tim-ZenBook-UX325JA-UX325JA:~/linux2025$ arm-linux-gnueabi-gcc -S test.c -o test_arm.s
tim@tim-ZenBook-UX325JA-UX325JA:~/linux2025$ cat test_arm.s
.arch armv5t
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 6
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "test.c"
.text
.section .rodata
.align 2
.LC0:
.ascii "x = %d (0x%08X)\012\000"
.align 2
.LC1:
.ascii "x >> 1 = %d (0x%08X)\012\000"
.text
.align 2
.global main
.syntax unified
.arm
.type main, %function
main:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 1, uses_anonymous_args = 0
push {fp, lr}
add fp, sp, #4
sub sp, sp, #8
mvn r3, #0
str r3, [fp, #-12]
ldr r3, [fp, #-12]
asr r3, r3, #1
str r3, [fp, #-8]
ldr r2, [fp, #-12]
ldr r1, [fp, #-12]
ldr r0, .L3
bl printf
ldr r2, [fp, #-8]
ldr r1, [fp, #-8]
ldr r0, .L3+4
bl printf
mov r3, #0
mov r0, r3
sub sp, fp, #4
@ sp needed
pop {fp, pc}
.L4:
.align 2
.L3:
.word .LC0
.word .LC1
.size main, .-main
.ident "GCC: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0"
.section .note.GNU-stack,"",%progbits
```