# 2026-03-17/24 Q&A Notes
## Review of Last Week's Discussion
> [2026-03-10](https://hackmd.io/p87SQ6WpT-ehlVhXglXk9w)
:::warning
Students who scored above 40 on the [Week 3 quiz](https://hackmd.io/@sysprog/linux2026-quiz3) should read the [kbox](https://github.com/sysprog21/kbox) project (alongside [LKL: 重用 Linux 核心的成果](https://hackmd.io/@sysprog/linux-lkl)) and email the instructor describing the improvements they plan to make to the project.
Recipient: `<jserv.tw@gmail.com>`
Subject: Linux2026/Week4: 針對 kbox 預計的改進
:::
https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E-%E6%BC%A2%E8%AA%9E-%E7%B9%81%E9%AB%94/deterministic
:::info
Review ==[Homework 1](https://hackmd.io/@sysprog/linux2026-homework1)== and ==[Homework 2](https://hackmd.io/@sysprog/linux2026-homework2)==
:::
## YANG-CHUN-CHIH
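What is the English term for 無條件進位 (unconditional rounding up)?
Why does Linux not use floating-point numbers?
How does the DCT process work with the product-to-sum identities?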
## JessYu-1011
Stack, BSS, Heap
- Why do `img_y`, `img_cg`, and `img_co` each hold $2^{20}$ `int` elements?
- They are placed in the BSS segment rather than on the stack because that is deterministic
## JOJOTOOL
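For a randomly permuted list, what is the expected number of comparisons made by `list_sort`?
What are the worst-case time complexities of quick sort on a linked list and on an array, and how are they derived?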
## Shaoen-Lin
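1. What compression problems can be seen in the amei image?
2. In which computation does most of the compression loss occur?
3. What image characteristics make DCT a good fit?
## This Week's Discussion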
## deantee
### Reading quiz Problem D in class
TL;DR: `float32_to_float16` converts between floating-point precisions using bitwise operations
```clike
/**
* convert float32 to float16
*
* single precision (float32) format: 1 (sign) + 8 (exponent) + 23 (mantissa) =
* 32 bits half precision (float16) format: 1 (sign) + 5 (exponent) + 10
* (mantissa) = 16 bits
*/
unsigned float32_to_float16(unsigned f) {
/* extract the value of sign `sign`, exponent `exp`, fraction `frac` */
unsigned sign = (f >> 31) & 1;
unsigned exp = (f >> 23) & 0xFF;
unsigned frac = f & 0x7FFFFF;
/* align sign bit to float16 format */
unsigned hsign = sign << 15;
/* handle special values (infinity & NaN) */
if (exp == 0xFF) {
/* handle infinity */
if (frac == 0) {
/**
* setting the exponent bit to all ones
* 0x7C00 = 01111100 00000000
*/
return hsign | 0x7C00;
}
/**
* handle NaN
* 0x7E00 = 01111110 00000000
* 0x7E00 has one extra set bit compared to 0x7C00, which is located right
* after the least significant set bit in 0x7C00 (to make the mantissa
* non-zero \implies NaN)
*
* TODO: find out why is the encoded data inside mantissa is not preserved
* (maybe not important?)
*
* UPDATE: turns out it's due to the inability to
* encode (2^{23} - 1) of information with only (2^{10} - 1) possible values
* without special handling
*/
return hsign | 0x7E00;
}
/**
* return zero if it's denormalized
*
* the greatest denormalized float32 value is much lower than least float16
* value
*
* TODO: prove by finding the exact values.
*
* UPDATE: greatest denormalized
* float32 value = (1 - \frac{1}{2^{23}}) \cdot 2^{-126} = 2^{-126} - 2^{-149}
* least denormalized float16 value = \frac{1}{2^{10}} \cdot 2^{-14} = 2^{-24}
* \gg 2^{-126} - 2^{-149} = greatest denormalized float32 value
*/
if (exp == 0) {
return hsign;
}
/**
* let E_{32} be the exponent of float32
* let E_{16} be the exponent of float16
* 2^{E_{32} - 127} = 2^{E_{16} - 15} \implies E_{16} = E_{32} - 112, so __D01__ = 112
*/
int hexp = (int)exp - 112;
/**
* the norm is too large to be encoded in float 16
* returns infinity
*/
if (hexp >= 31) {
return hsign | 0x7C00;
}
/* handle the case where the norm ends up small enough to be a denormalized
* float16 */
if (hexp <= 0) {
/**
* includes the additional one as 2^{23} into the mantissa
* this is because it was (1 + \frac{M}{2^{23}}) and we're turning it into
* the form \frac{M'}{2^{10}}
*/
unsigned mant = frac | 0x800000;
/**
* storing 1 + (23 (\#\mathrm{mantissa}_{32})) = 24 bits inside 10
* (\#\mathrm{mantissa}_{16}) bits: drop the lowest 24 - 10 = 14 = __D02__
* bits, plus an additional (-hexp) bits
*
* derivation of the (-hexp) part: the float32 value is
* mant * 2^{exp - 127 - 23}, and one unit of a denormalized float16 mantissa
* is 2^{-24}, so
* val = mant * 2^{exp - 126} = mant \gg (126 - exp)
* = mant \gg (126 - (hexp + 112)) = mant \gg (14 - hexp)
*/
int shift = 14 - hexp;
/* no precision would remain if the shift width is greater than the (1 +
* \#\mathrm{mantissa}) */
if (shift > 24) {
return hsign;
}
/**
* shift and round
*
* round: round to nearest integer, ties to even
* lost > half: check if it's greater than a half, round up if so
* lost == half && (val & 1): check if it's exactly half and the least
* significant bit is odd, round up if so
*
* TODO: find out why there's no carry handling like there is for the
* 'normalized' case.
*
* UPDATE: if rounding carries past the 10 mantissa bits, i.e. 00000011
* 11111111 rounds up to 00000100 00000000, the carry lands in the exponent
* field and yields the smallest normalized float16, which is already correct
*/
unsigned val = mant >> shift;
unsigned lost = mant & (1u << shift) - 1;
unsigned half = 1u << shift - 1;
if (lost > half || lost == half && (val & 1)) {
++val;
}
return hsign | val;
}
/* shift and round */
unsigned hfrac = frac >> 13;
unsigned lost = frac & 0x1FFF;
/* __D03__ = 1u << 13 - 1 = 1u << 12 = 0x1000 */
unsigned half = 0x1000;
if (lost > half || lost == half && (hfrac & 1)) {
++hfrac;
/**
* check if hfrac becomes all zeros after incrementing due to chain of
* carrying, increment the exponent if so.
*
* TODO: should be unlikely (suppose being unlikely means not occurring more
* than 1% of the time) because this only happens in the one case where all
* kept mantissa bits are set & the most significant lost bit is set.
*
* UPDATE: the determining factor is the 11 most significant bits of the
* mantissa, which must all be set, a chance of
* \frac{1}{2^{11}} = \frac{1}{2048} \approx 0.049\%, so I was right
*/
if (hfrac == 0x400) {
hfrac = 0;
++hexp;
/**
* return infinity if its exponent becomes all set
* TODO: find out why using `>=` when `==` would suffice
*
* UPDATE: using `>=` here is just defensive
*/
if (hexp >= 31) {
return hsign | 0x7C00;
}
}
}
/* packing the sign, exponent and mantissa; this must sit outside the
* rounding branch so that values needing no round-up are returned too */
return hsign | hexp << 10 | hfrac;
}
```
### After class: arbitrary-precision* conversion
\* Not truly arbitrary: it only converts among $\{2, 4, 8\}\text{-byte}$ floats
```clike
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef uint64_t u64;
typedef uint8_t u8;
/* convert n-byte float to m-byte float */
u64 float_n_to_float_m(u64 src, u64 n, u64 m) {
if (!(n >= 2 && n <= 8 && m >= 2 && m <= 8 && !(n & n - 1) && !(m & m - 1)))
exit(1);
if (n == m) return src;
int nexp = __builtin_ctzll(n) * 3 + 2;
int nman = (n << 3) - nexp - 1;
int mexp = __builtin_ctzll(m) * 3 + 2;
int mman = (m << 3) - mexp - 1;
u64 sgn = src >> ((n << 3) - 1);
u64 exp = src >> nman & (1ULL << nexp) - 1;
u64 man = src & (1ULL << nman) - 1;
u64 dsgn = sgn << ((m << 3) - 1);
if (exp == (1 << nexp) - 1) {
if (man) return dsgn | (u64)((1 << (mexp + 1)) - 1) << (mman - 1);
return dsgn | (u64)((1 << mexp) - 1) << mman;
}
if (n > m) {
if (exp == 0) return dsgn;
int dexp = exp - (1 << (nexp - 1)) + (1 << (mexp - 1));
if (dexp >= (1 << mexp) - 1) return dsgn | (u64)((1 << mexp) - 1) << mman;
if (dexp <= 0) {
u64 val = man | 1ULL << nman;
int shift = nman - mman - dexp + 1;
/* shift == nman + 1 can still round up to the smallest denormal */
if (shift > nman + 1) return dsgn;
u64 dman = val >> shift;
u64 lost = val & (1ULL << shift) - 1ULL;
u64 half = 1ULL << (shift - 1);
if (lost > half || (lost == half && (dman & 1))) {
++dman;
}
return dsgn | dman;
}
u64 dman = man >> (nman - mman);
u64 lost = man & (1ULL << (nman - mman)) - 1ULL;
u64 half = 1ULL << (nman - mman - 1);
if (lost > half || (lost == half && (dman & 1))) {
++dman;
if (dman == 1ULL << mman) {
dman = 0;
++dexp;
if (dexp >= (1 << mexp) - 1)
return dsgn | (u64)((1 << mexp) - 1) << mman;
}
}
return dsgn | (u64)dexp << mman | dman;
}
if (exp == 0) {
if (man == 0) return dsgn;
int idx = 63 - __builtin_clzll(man);
u64 dexp = idx - nman + 1 - (1 << (nexp - 1)) + (1 << (mexp - 1));
u64 dman = (man ^ (1ULL << idx)) << (mman - idx);
return dsgn | dexp << mman | dman;
}
u64 dexp = exp - (1 << (nexp - 1)) + (1 << (mexp - 1));
u64 dman = man << (mman - nman);
return dsgn | dexp << mman | dman;
}
void print_bin(void* ptr, size_t size) {
u8* p = ptr;
while (size--) {
for (size_t i = 8; i--;) putchar((p[size] >> i & 1) | '0');
putchar(size ? ' ' : '\n');
}
}
int main(void) {
int n;
int m;
double f;
printf("this program converts n-byte float to m-byte float in IEEE-754\n");
printf("n = ");
scanf("%d", &n);
if (!(n == 2 || n == 4 || n == 8)) {
printf("invalid n: n must be in [2, 4, 8]\n");
return 0;
}
printf("m = ");
scanf("%d", &m);
if (!(m == 2 || m == 4 || m == 8)) {
printf("invalid m: m must be in [2, 4, 8]\n");
return 0;
}
printf("f = ");
scanf("%lf", &f);
/* zero the upper bytes: memcpy below fills only the low n bytes */
u64 val = 0;
if (n == 2) {
_Float16 tmp = f;
memcpy(&val, &tmp, n);
} else if (n == 4) {
float tmp = f;
memcpy(&val, &tmp, n);
} else {
memcpy(&val, &f, n);
}
printf("bit representation: ");
print_bin(&val, n);
val = float_n_to_float_m(val, n, m);
double dst;
if (m == 2) {
_Float16 tmp;
memcpy(&tmp, &val, m);
dst = tmp;
} else if (m == 4) {
float tmp;
memcpy(&tmp, &val, m);
dst = tmp;
} else {
memcpy(&dst, &val, m);
}
printf("result: %lf\n", dst);
printf("bit representation: ");
print_bin(&val, m);
}
```
Sample run:
```
this program converts n-byte float to m-byte float in IEEE-754
n = 8
m = 2
f = 3.14159
bit representation: 01000000 00001001 00100001 11111001 11110000 00011011 10000110 01101110
result: 3.140625
bit representation: 01000010 01001000
```
### After class: 64-bit integer to 64-bit floating point
```clike
#include <stdint.h>
#include <stdio.h>
#include <string.h>
typedef uint64_t u64;
typedef int64_t i64;
u64 i64_to_f64(u64 n) {
if (n == 0) return 0;
u64 sgn = n >> 63;
/* absolute value via two's complement negation (sgn is 0 or 1, not a mask) */
u64 val = sgn ? ~n + 1 : n;
u64 idx = 63 - __builtin_clzll(val);
u64 exp = idx + 1023;
if (idx <= 52)
return sgn << 63 | exp << 52 | (val & (1ULL << idx) - 1ULL) << (52 - idx);
return sgn << 63 | exp << 52 | (val >> (idx - 52) & (1ULL << 52) - 1ULL);
}
int main(void) {
i64 n;
scanf("%ld", &n);
n = i64_to_f64(n);
double flt;
memcpy(&flt, &n, 8);
printf("%.16lf\n", flt);
}
```
### After class: 64-bit floating point to 64-bit integer
```clike
#include <stdint.h>
#include <stdio.h>
#include <string.h>
typedef uint64_t u64;
typedef int64_t i64;
u64 f64_to_i64(u64 n) {
u64 sgn = n >> 63 << 63;
u64 exp = n >> 52 & 0x7FF;
u64 man = n & 0xFFFFFFFFFFFFFULL;
u64 val = man | 0x10000000000000ULL;
int shift = exp - 1023;
if (shift < 0) {
return 0;
} else if (shift < 52) {
u64 res = val >> (52 - shift);
return sgn ? ~res + 1 : res;
} else if (shift < 63) { /* 2^63 and above no longer fit: saturate below */
u64 res = val << (shift - 52);
return sgn ? ~res + 1 : res;
}
return sgn ? 1ULL << 63 : ~0ULL >> 1;
}
int main(void) {
double dbl;
scanf("%lf", &dbl);
u64 val;
memcpy(&val, &dbl, 8);
val = f64_to_i64(val);
i64 res;
memcpy(&res, &val, 8);
printf("%ld\n", res);
}
```
## hding4915
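When converting floating-point precision (e.g. double $\to$ single), how can special values such as positive infinity and NaN be preserved?
Use bit masks: an all-ones exponent field identifies infinity and NaN, so the conversion tests for it and emits an all-ones exponent in the target format, with a zero mantissa for infinity and a non-zero mantissa for NaN.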
## Eason0729
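Meaning of the nearest representable floating-point value: https://hackmd.io/@sysprog/c-floating-point#Rounding
Write a function that converts between float and double
Parameters:
1. address of float
2. address of double
Expected behavior:
- If the float at the given address stores 0.0, convert the double to float
- If the double at the given address stores 0.0, convert the float to double
- Otherwise leave both values unchanged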
```c
#include <stdint.h>

/* WIDTH consecutive set bits starting at bit START; 1ULL keeps the shift
 * 64 bits wide even where long is 32 bits (WIDTH must be below 64) */
#define GEN_MASK(START, WIDTH) (((1ULL << (WIDTH)) - 1) << (START))

// Bit-level worker operating on the raw encodings.
// TODO: the conversion itself was left unfinished in these notes.
void float_double_convert_inner(uint32_t* f, uint64_t* d) {
  // double: 1 11 52
  // float: 1 8 23
  // float exponent field: 0x00 => zero or subnormal, 0x01..0xFE => normal,
  // 0xFF => inf (mantissa == 0) or NaN (mantissa != 0)
  (void)f;
  (void)d;
}

// A two-direction float double convert function
//
// If f is zero, convert double to float
// If d is zero, convert float to double
void float_double_convert(float* f, double* d) {
  union {
    float f;
    uint32_t i;
  } fp = {.f = *f};
  union {
    double d;
    uint64_t i;
  } dp = {.d = *d};
  float_double_convert_inner(&fp.i, &dp.i);
  // write the (possibly converted) encodings back to the caller
  *f = fp.f;
  *d = dp.d;
}
```
## patata0717
QuickSort recurrence [corrected]:
$$T(n) = T(k) + T(n - k - 1) + O(n)$$
The worst case occurs when every partition leaves one side empty:
$$T(n) = T(n-1) + T(0) + O(n)$$
$$T(n) = O(n) + O(n-1) + O(n-2) + \dots + O(1)$$
which is $O(n^2)$.
$2 \times \log_2(n)$
Q: prove it and verify by experiment
https://github.com/torvalds/linux/blob/c369299895a591d96745d6492d4888259b004a9e/lib/sort.c#L5
```
* A fast, small, non-recursive O(n log n) sort for the Linux kernel
*
* This performs n*log2(n) + 0.37*n + o(n) comparisons on average,
* and 1.5*n*log2(n) + O(n) in the (very contrived) worst case.
*
* Quicksort manages n*log2(n) - 1.26*n for random inputs (1.63*n
* better) at the expense of stack usage and much larger code to avoid
* quicksort's O(n^2) worst case.
```
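Analysis approach:
A. Array implementation
B. Circular doubly linked list implementation
Count the number of comparisons, swaps, and so on
Derive cost and latency
All possibilities can be enumerated and drawn as a tree.
I think this can be computed with DP.
Assumptions:
1. The pivot lands at each of the $n$ possible ranks with probability $1/n$, where $n$ is the size of the current sublist.
- Expected number of comparisons
$$C(n) = (n - 1) + \frac{1}{n} \sum_{k=0}^{n-1} \left[ C(k) + C(n - 1 - k) \right]$$
$$C(n) = (n - 1) + \frac{2}{n} \sum_{k=0}^{n-1} C(k)$$
$$C(n) = (n - 1) + \frac{2}{n} \cdot (\text{running sum of the DP table})$$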
```c
#include <stdio.h>
int main() {
int n = 3;
double dp_table[n+1];
dp_table[0] = 0.0;
dp_table[1] = 0.0;
double current_sum = dp_table[0] + dp_table[1];
for (int i = 2; i <= n; i++) {
dp_table[i] = (i - 1) + (2 / (double)i) * current_sum;
printf("dp_table[%d] = %.4f\n", i, dp_table[i]);
current_sum += dp_table[i];
printf("current_sum = %.4f\n", current_sum);
}
// print the result
printf("For N = %d, the expected number of QuickSort comparisons is %.4f\n", n, dp_table[n]);
return 0;
}
```

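- Expected number of swaps
※ This counts comparisons between data items in the list, not loop-index comparisons; the two can differ significantly in cost (e.g. string comparison).
Assume the pivot's position is uniformly distributed within the sublist.
The key to this model is knowing the rank of the pivot within the sublist, which is what determines the count.
Starting from the recurrence above:
$$C(n) = Cost(n) + \frac{2}{n} \sum_{k=0}^{n-1} C(k)$$
For comparisons $Cost(n)$ is a fixed value, but for swaps it varies and has to be computed as an expectation.
If the pivot is the $k$-th smallest element of the sublist, the remaining $n-1$ elements split into two groups: $k-1$ elements smaller than the pivot and $n-k$ elements larger.
The number of swaps is the number of misplaced elements, i.e. elements larger than the pivot that sit in the half reserved for smaller ones, and vice versa; the two misplaced counts are equal.
Treat the $k-1$ smaller elements as red balls and the $n-k$ larger elements as green balls. The list has a red zone of size $k-1$ and a green zone of size $n-k$, and the balls are uniformly distributed.
The expected number of green balls in the red zone (or of red balls in the green zone) is
$$(k-1)*\frac{1}{n-1}*(n-k)$$
$$\text{red-zone size} \times \Pr(\text{a given ball occupies a given slot}) \times \text{number of green balls}$$
$k$ is also uniformly distributed, so averaging the expression above over $k$ gives
$$\text{Cost}_{\text{swap}}(n) = \frac{1}{n} \sum_{k=1}^{n} \frac{(k-1)(n-k)}{n-1}$$
which equals $(n-2)/6$.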
$$C(n) = \frac{n-2}{6} + \frac{2}{n} \sum_{k=0}^{n-1} C(k)$$
Running the DP on this recurrence gives

log-log

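(In practice it is still hard to tell $n$ from $n\log n$ by looking at the plot.)
Although the curve looks linear and the R value is close to 1, this contradicts the expected result, so the next step is to expand the recurrence.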
Comparison:
$$C(n)=(n-1)+\frac{2}{n}\sum_{k=0}^{n-1}C(k)$$
Multiply both sides by $n$:
$$nC(n)=n(n-1)+2\sum_{k=0}^{n-1}C(k)$$
Substitute $n-1$ for $n$:
$$(n-1)C(n-1)=(n-1)(n-2)+2\sum_{k=0}^{n-2}C(k)$$
Subtract the second equation from the first:
$$nC(n)-(n-1)C(n-1)=2(n-1)+2C(n-1)$$
Simplify:
$$nC(n)=(n+1)C(n-1)+2(n-1)$$
Divide both sides by $n(n+1)$:
$$\frac{C(n)}{n+1}=\frac{C(n-1)}{n}+\frac{2(n-1)}{n(n+1)}$$
Substitute $n-1$:
$$\frac{C(n-1)}{n}=\frac{C(n-2)}{n-1}+\frac{2(n-2)}{(n-1)(n)}$$
Substitute $n-2$:
$$\frac{C(n-2)}{n-1}=\frac{C(n-3)}{n-2}+\frac{2(n-3)}{(n-2)(n-1)}$$
...
Substitute $2$:
$$\frac{C(2)}{3}=\frac{C(1)}{2}+\frac{2 \cdot 1}{2 \cdot 3}$$
Summing these equations telescopes, and $C(1)=0$, so
$$\frac{C(n)}{n+1}=\frac{C(1)}{2}+\sum_{k=2}^n\frac{2(k-1)}{k(k+1)}=\sum_{k=2}^n\frac{2(k-1)}{k(k+1)}$$
Partial fractions on $\frac{2(k-1)}{k(k+1)}$ give
$$2(\frac{2}{k+1}-\frac{1}{k})$$
Harmonic series:
$$\sum_{k=1}^{n}\frac{1}{k}=H_n\approx\ln n + \gamma + \frac{1}{2n}$$
$\gamma \approx 0.577$
$$\sum_{k=2}^n\frac{1}{k}=H_n - \frac{1}{1}$$
$$\sum_{k=2}^n\frac{1}{k+1}=H_n - \frac{1}{1} - \frac{1}{2} + \frac{1}{n+1}$$
$$\frac{C(n)}{(n+1)}=2(2(H_n-1.5+\frac{1}{n+1})-(H_n-1))$$
Simplify:
$$\frac{C(n)}{(n+1)}=2H_n-4+\frac{4}{n+1}$$
Multiply both sides by $(n+1)$:
$$C(n)\approx(n+1)(2(\ln n + \gamma + \frac{1}{2n})-4+\frac{4}{n+1})$$
Collect terms:
$$C(n)\approx2n\ln n+(2\gamma-4)n+2\ln n+(2\gamma+1)+\frac{1}{n}$$
Using $\ln n = \ln 2 \cdot \log_2 n$, this becomes
$$C(n)\approx1.39n\log_2 n -2.85n+1.39\log_2 n+2.15$$
The $1/n$ term vanishes as $n$ goes to infinity (and fractions need not be considered when counting operations anyway).
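Swap:
The same method applies. With $C(0)=C(1)=0$ the swap recurrence gives $C(2)=0$, so the telescoping sum starts at $k=3$:
$$\frac{C(n)}{n+1}=\sum_{k=3}^{n}\frac{2k-3}{6k(k+1)}=\frac{1}{6}\left(2H_n-\frac{14}{3}+\frac{5}{n+1}\right)$$
$$C(n)\approx\frac{1}{6}(1.39n\log_2 n-3.51n+1.39\log_2 n +2.49)$$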
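- Comparison with rrrchii's approach
rrrchii uses indicator random variables and arrives at the same expression (the $H_n$ approximation used there omits the $\gamma$ term).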
## rrrchii
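Understand the definition of quick sort and compute the expected number of comparisons.
Let the elements be $A = [a_1, a_2, \dots, a_n]$ with $a_1 < a_2 < \dots < a_n$.
Let $X$ be the total number of comparisons and $X_{ij}$ the number of comparisons between $a_i$ and $a_j$. The two are compared exactly when the first pivot chosen among $a_i, \dots, a_j$ is $a_i$ or $a_j$, which happens with probability $2/(j-i+1)$, so
$E[X]=\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2/(j-i+1)$
Let $k=j-i+1$. When $j=i+1$ (nearest neighbours), $k=(i+1)-i+1=2$.
When $j=n$ and $i=1$ (the two far ends), $k=n-1+1=n$.
So $k$ ranges from $2$ to $n$.
For $k=2$ the pairs are $(1,2),(2,3),\dots,(n-1,n)$, $n-1$ pairs in total.
For $k=3$ the pairs are $(1,3),(2,4),\dots,(n-2,n)$, $n-2$ pairs in total. In general, every $k$ has $(n-k+1)$ pairs.
$E[X]=\sum_{k=2}^{n}(n-k+1)(2/k)$
This rearranges to $E[X]=2(n+1) \sum_{k=2}^{n}(1/k)- \sum_{k=2}^{n}2$
For large $n$, $\sum_{k=1}^{n}(1/k) \approx \ln n$,
so $\sum_{k=2}^{n}(1/k) \approx \ln n-1$
$E[X] \approx 2n\ln n+2\ln n-2n-2-2n+2$
For large enough $n$ the $-4n$ and $2\ln n$ terms can be neglected:
$E[X] \approx 2n\ln n \approx 1.39 \cdot n\log_2 n$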
## grawis
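Q: What are the trade-offs between quick sort and merge sort on a linked list, and why is quick sort not chosen?
(1) Quick sort behavior
Procedure
1. Pick a pivot and compare every element against it
2. Split the elements into two sides (larger on one side, smaller on the other)
3. Recursively sort both sides
Advantages
Very fast on contiguous memory such as arrays
* Random access is supported
* Swapping elements costs only $O(1)$
* Partitioning needs only a linear scan
None of these advantages carry over to linked lists, and quick sort also has drawbacks of its own: a bad pivot degrades it to the $O(n^2)$ worst case, and it is not a stable sort.
(2) Quick sort on a linked list
* No random access: reaching the $i$-th element means walking from the head, which costs $O(i)$
* Swapping is complicated: locate the two nodes -> locate their predecessors -> relink the `next` pointers. Adjacent nodes are an additional special case.
* Partitioning has to split the list into three lists and then concatenate them again
(3) Merge sort on a linked list
By comparison, merge sort suits linked lists more naturally: merging only needs a sequential scan and pointer adjustments with no swaps, the best, average, and worst cases are all $O(n \log n)$, and it can be written as a bottom-up, non-recursive version to reduce kernel stack usage.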
## Charlie-Tsai1123
- Q: How wide is `int` in C99?
According to C99 6.2.5, paragraph 5:
>A ‘‘plain’’ int object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range
INT_MIN to INT_MAX as defined in the header <limits.h>).
The size of `int` differs across hardware architectures, but it must be able to hold every value from `INT_MIN` to `INT_MAX` as defined in `<limits.h>`.
According to C99 5.2.4.2.1:
>Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
>
>— minimum value for an object of type int
INT_MIN -32767 // −(2^15^ − 1)
— maximum value for an object of type int
INT_MAX +32767 // 2^15^ − 1
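The values listed above are minimum magnitudes; an implementation's own limits must be equal to or greater than these.
Because `int` must hold at least $-(2^{15} - 1)$ to $2^{15} - 1$, it needs at least 16 bits, i.e. 2 bytes when a byte is 8 bits.
- Q: How wide is `unsigned long` in C99?
According to C99 6.2.6.2, paragraph 1: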
>For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^N^−1 , so that objects of that type shall be capable of representing values from 0 to 2^N^ − 1 using a pure binary representation.
So `unsigned long` can represent 0 to $2^N - 1$.
C99 5.2.4.2.1 gives the minimum value of $2^N - 1$:
>— maximum value for an object of type unsigned long int
ULONG_MAX 4294967295 // 2^32^ − 1
So `unsigned long` occupies at least 32 bits in memory, i.e. 4 bytes when a byte is 8 bits.
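- Q: How do the sizes of these types relate to one another?
The [64-bit computing](https://en.wikipedia.org/wiki/64-bit_computing) article lists how many bytes each type occupies under the different data models. Under ILP64, `long long`, `long int`, and `int` can all be 64 bits wide.
According to 6.2.5, paragraph 6: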
>For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
So every signed integer type has a corresponding unsigned integer type with the same amount of storage, which means `unsigned long` and `long int` occupy the same space.
6.2.5, paragraph 8:
>For any two integer types with the same signedness and different integer conversion rank (see 6.3.1.1), the range of values of the type with smaller integer conversion rank is a subrange of the values of the other type.
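In other words, for any two integer types of the same signedness, the value range of the type with the smaller conversion rank is always a subrange of the type with the greater rank.
Signed integers: `signed char` $\subseteq$ `short int` $\subseteq$ `int` $\subseteq$ `long int` $\subseteq$ `long long int`
Unsigned integers: `unsigned char` $\subseteq$ `unsigned short int` $\subseteq$ `unsigned int` $\subseteq$ `unsigned long int` $\subseteq$ `unsigned long long int`
- Q: What does GENMASK do?
It generates a mask whose bits `l` through `h` (inclusive) are set to 1.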