Eroiko
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    1
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # 2023q1 Homework3 (fibdrv) contribute by < [`Shiritai`](https://github.com/Shiritai/fibdrv) > :::spoiler 開發環境 ## 開發環境 ```bash $ gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 12 On-line CPU(s) list: 0-11 Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz CPU family: 6 Model: 158 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 Stepping: 10 CPU max MHz: 4100.0000 CPU min MHz: 800.0000 BogoMIPS: 4399.99 ``` ::: ## 以 VSCode 開發 Linux Kernel Module 的準備 :::success 根據 @jserv 的延伸問題,為此單元的內容進行擴充。 以下留下舊的懶人包,[點我瀏覽完整筆記](https://hackmd.io/@Eroiko/vscode-for-linux-kernel)與[客製化腳本](https://github.com/Shiritai/make2IntelliSense)。 ::: 如果純粹將 reposiroty clone 好後使用 VSCode 開啟比如 `fibdrv.c`,相信會看到貼心的 IntelliSense 的警告。 ![vscode-intellisense-error](https://i.imgur.com/xoUqDkX.png) 解決辦法整理如下。 ### 確認 kernel header 路徑 [開始寫作業前](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-b)都應按流程安裝好 linux kernel header,確認其安裝的路徑,可移動至 `/usr/src` 下查看。 ```bash $ ls /usr/src | grep linux-headers linux-headers-5.19.0-32-generic linux-headers-5.19.0-35-generic ``` 由以下指令確認當前 Linux 核心版本,得知第二個是我們感興趣的 kernel header。 ```bash uname -r 5.19.0-35-generic ``` ### 修改 C/C++ Extension 設定檔 由前車之鑑 [1](https://stackoverflow.com/questions/58386640/how-to-develop-linux-kernel-module-with-vscode-without-incorrect-error-detection) 和 [2](https://github.com/microsoft/vscode-cpptools/issues/5588) 得知 IntelliSense 需要調整 [C/C++ Extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode.cpptools) 設定檔: `c_cpp_properties.json`。 1. 準備 Linux 設定檔模板 :::success 由 @DokiDokiPB 的建議,對此份作業來說,`cStandard` 的前綴應該改成 gnu,以協助 intelliSense 解析一些結構。對此我也更新腳本法的實作,詳見 [commit df77974](https://github.com/Shiritai/make2IntelliSense/commit/df7797490d4304961a2b60f488d45b6b7f0be9a8)。 ::: ```diff { "configurations": [ { "name": "Linux", "browse": { "limitSymbolsToIncludedHeaders": true, "databaseFilename": "" }, "intelliSenseMode": "linux-gcc-x64", "compilerPath": "/usr/bin/gcc", - "cStandard": "c99", + "cStandard": "gnu99", "cppStandard": "gnu++17" } ], "version": 4 } ``` 2. 加入 `includePath` 設定 ```json ... "name": "Linux", "includePath": [ "${workspaceFolder}/**", "/usr/include", "/usr/local/include", "/usr/src/LINUX_KERNEL_HEADER/include", "/usr/src/LINUX_KERNEL_HEADER/arch/x86/include", "/usr/src/LINUX_KERNEL_HEADER/arch/x86/include/generated", "/usr/lib/gcc/x86_64-linux-gnu/GCC_VERSION/include" ], "browse": { ... ``` 要調整兩點。 * `LINUX_KERNEL_HEADER` 換成自己的 kernel header 路徑名。 * `GCC_VERSION` 換成自己的 GCC 版本。 基本上 `includePath` 中前五項都是開發 Kernel Module 的常客,當然第七項編譯器本身也是。第六項本次實作會用到,不過未來開發其他模組的話可能會引入其他的 header。 3. 調整 `defines` 使 `MODULE_*` 等巨集被正確的識別 `defines` 是為了解決條件編譯的問題,當中加入的字串會使 IntelliSense 在 indexing 時看見需指定條件編譯後才會編譯到的程式碼。 ```json ... "compilerPath": "/usr/bin/gcc", "defines": [ "__GNUC__", "__KERNEL__", "MODULE" ], "cStandard": "c99", ... 到此便完成設定,IntelliSense 復活了 :) ![vscode-intellisense-good](https://i.imgur.com/9B5yt8R.png) ## 費氏數列的計算方法 > Reference > * [L04: fibdrv](https://hackmd.io/@sysprog/linux2023-fibdrv-a#-%E8%B2%BB%E6%B0%8F%E6%95%B8%E5%88%97) > * [費氏數列分析](https://hackmd.io/@sysprog/fibonacci-number) > * [費氏數怎麼算](https://scm.iis.sinica.edu.tw/ncs/2019/06/how-to-compute-fibonacci/) ### 經典款 * 樹狀遞迴版 $$ fib :: \mathbb{R} \rightarrow \mathbb{R} \\ fib(0) = 0 \\ ‍‍‍‍‍‍fib(1) = 1 \\ ‍‍‍‍‍‍fib(n+2) = fib(n+1) + fib(n) $$ * 線性遞迴版 $$ fib2 :: \mathbb{R} \rightarrow (\mathbb{R}, \mathbb{R}) \\ fib2(0) = (0, 1) \\ ‍‍‍‍‍‍fib2(n+1) = (y, x + y) \\ where\ (x, y) = fib2(n) $$ ### 公式解 > Binet’s Formula $$ \dfrac{1}{\sqrt{5}} \Big( \big( \dfrac{1 + \sqrt{5}}{2} \big)^n - \big( \dfrac{1 - \sqrt{5}}{2} \big)^n \Big) $$ 當中指數與根號運算造成的計算誤差和複雜度都不容小覷。[費氏數列分析](https://hackmd.io/@sysprog/fibonacci-number#%E8%A7%A3%E6%B1%BA%E9%81%8B%E7%AE%97%E7%B2%BE%E6%BA%96%E5%BA%A6%E7%9A%84%E5%95%8F%E9%A1%8C)一文使用 GNU 的 `GMP`, `MPFFR` 這倆透過盡情使用記憶體資源計算遞迴版 (下述分解法)和本公式解的運算時間,得到前者大勝後者的結果。 ### 分解法與 Fast Doubling > 最早可追溯至 [Nikolai. N. Vorobev 提出的方法](https://www.amazon.com/Fibonacci-Numbers-Dover-Books-Mathematics/dp/048648386X)。 由費氏數列的遞迴定義,也許你會猜測是否能尋找費氏數列中各項的關聯,分解法便是對此疑問的解答。令費氏數第 $n$ 項為 $f_n$,則 $$ f_{n+k} = f_{n-1} \cdot f_k + f_n \cdot f_{k+1} $$ 上式對 $n \ge 1, k \ge 0 \ \land n, k \in \mathbb{R}$ 成立。 :::success **證明** 如果根據費氏數列定義拆解此式,可以得到如回音般的結果。 $$ f_{n+k} = f_{n-1} \cdot f_k + f_n \cdot f_{k+1} \\ = f_{n-1} \cdot f_k + f_{n} \cdot \big( f_{k} + f_{k-1} \big) \\ = \big( f_{n-1} + f_{n} \big) \cdot f_{k} + f_n + f_{k-1} \\ = f_{n+1} \cdot f_k + f_n \cdot f_{k-1} $$ 這樣的結果與原本分解式的展開只差在 $n$ 和 $k$ 的互換。並不能給我們額外的結論,也許我們應該另尋其道。通常遇到這種難以直接證明的,我們可以用終極絕招數學歸納法: * Base case: 帶入即正確。 > 好像沒有寫的必要 :) $$ f_{1} = f_{1+0} = f_{0} \cdot f_{0} + f_{1} \cdot f_{1} = f{1} = 1 $$ * Inductive case 假設分解式對 $t \le n$ 成立,則當 $t = n + 1$ 時,利用費氏數列的定義和 $t \le n$ 的情況,可得 $$ f_{(n+1)+k} \stackrel{\rm{def\ of\ fib}}{=} f_{n+k} + f_{(n-1)+k} \\ \stackrel{\rm{eq\ when\ }t \le n}{=} f_{n-1} \cdot f_k + f_n \cdot f_{k+1} + f_{n-2} \cdot f_k + f_{n-1} \cdot f_{k+1} \\ = \big( f_{n-1} + f_{n-2} \big) \cdot f_k + \big( f_{n} + f_{n-1} \big) \cdot f_{k+1} \\ \stackrel{\rm{def\ of\ fib}}{=} f_{n} \cdot f_k + f_{n+1} \cdot f_{k+1} $$ 由數學歸納法得證。 ::: 以分解法為基礎,今若要求 $f_{2n}$,有 $$ f_{2n} = f_{n+n} = f_{n-1} \cdot f_{n} + f_{n} \cdot f_{n+1} \\ = f_{n-1} \cdot f_{n} + f_{n} \cdot (f_{n} + f_{n-1}) \\ = f_{n-1} \cdot 2 f_{n} + f_{n} \cdot f_{n} = f_n \cdot (2f_{n-1} + f_{n}) $$ 可以發現 $f_{2n}$ 需要 $f_{n-1}, f_{n}$。那麼 $f_{2n-1}$ 呢? $$ f_{2n-1} = f_{n+(n-1)} = f_{n-1} \cdot f_{n-1} + f_{n} \cdot f_{n} = f_{n-1}^2 + f_n^2 $$ 也需要 $f_{n-1}, f_{n}$,真好。 不過其實有其他做法,同樣考慮 $f_{2n}$,有 $$ f_{2n} \stackrel{\rm{prev\ result}}{=} f_n \cdot (2f_{n-1} + f_{n}) \\ = f_n \cdot (2f_{n+1} - f_{n}) $$ 由於我們透過合併改關注 $n+1$ 項,故改成考慮 $2n+1$ 項 $f_{2n+1}$,有 $$ f_{2n+1} = f_{(n+1)+n} = f_{n} \cdot f_{n} + f_{n+1} \cdot f_{n+1} = f_{n}^2 + f_{n+1}^2 $$ 後者的結論是 $f_{2n}$ 和 $f_{2n+1}$ 都需要 $f_n$ 和 $f_{n+1}$,是[原文](https://hackmd.io/@sysprog/fibonacci-number#Vorobev-%E7%AD%89%E5%BC%8F)中的方法,兩者的結果都很漂亮,畢竟是等價的東西。 以前者 ($f_{2n}$, $f_{2n-1}$ 需 $f_{n}, f_{n-1}$) 為例,取得二的冪就不是問題,假設我們想知道 $f_{16}$,可以藉由以下順序求得: ```graphviz digraph twos_power { rankdir=LR; node[shape=record]; edge[arrowsize=0.5] 12[label="<b> f2|<a> f1"]; 34[label="<b> f4|<a> f3"]; 78[label="<b> f8|<a> f7"]; 16[label="f16"]; 12:a -> 34:a -> 78:a -> 16; 12:b -> 34:b -> 78:b -> 16; 12:a -> 34:b -> 78:a; 12:b -> 34:a -> 78:b; } ``` 繼續以前者為例,那如果是任意數呢?假設要計算 $f_{21} = f_{2n-1}$,我們需要 $f_n = f_{11}$ 和 $f_{n-1} = f_{10}$,為此我們還需要 $f_6, f_5$,同樣我們又需要 $f_3, f_2$, 最後回到 $f_2, f_1$ 便皆大歡喜。看來無論哪個正整數,都可以使用此演算法。下圖圖像化這段流程,與上圖不同在於稍作一點簡化。 ```graphviz digraph flow_21 { rankdir=LR; node[shape=square]; 1[label="f2 \n f1"]; 2[label="f3 \n f2"]; 3[label="f6 \n f5"]; 4[label="f11 \n f10"]; 5[label="f22 \n f21"]; 1 -> 2 -> 3 -> 4 -> 5; } ``` 另外前述「前者」與「後者」的差異可透過上圖描述。前者是以數字大者 (比如 $[22, 21]$ 中的 $22$) 為主視角,後者則是以數字小者 ($[22, 21]$ 中的 $21$) 為主視角,兩者實際上會畫出一樣的關係圖。 由於實作上以後者視角比較好描述,今後將採用該視角,與[作業說明](https://hackmd.io/@sysprog/linux2023-fibdrv-a#-%E8%B2%BB%E6%B0%8F%E6%95%B8%E5%88%97)同步。 ### Fast Doubling 的程式化 以 $f_{21}$ 為例,當初我們以 top-down 的角度分析,實作時勢必需要遞迴,如果不想要遞迴呢?我們要如何知道接下來是單純從 $(5, 6)$ 爬兩倍到 $(10, 11)$,還是額外加一,從 $(2, 3)$ 到 $(5, 6)$? ```graphviz digraph flow_21 { rankdir=LR; node[shape=square]; subgraph 1 { 3[label="f6 \n f5"]; 4[label="f11 \n f10"]; 3 -> 4; } subgraph 2 { 1[label="f3 \n f2"]; 2[label="f6 \n f5"]; 1 -> 2; } } ``` 把之前的流程圖加上是否「加一」的標記:令以 $1$ 表加一,以 $0$ 表不加,同時將 $f_0, f_1$ 這個 trivial 項補回來... ```graphviz digraph flow_21 { rankdir=LR; node[shape=square]; 0[label="f1 \n f0"]; 1[label="f2 \n f1"]; 2[label="f3 \n f2"]; 3[label="f6 \n f5"]; 4[label="f11 \n f10"]; 5[label="f22 \n f21"]; 0 -> 1 [label="1"]; 1 -> 2 [label="0"]; 2 -> 3 [label="1"]; 3 -> 4 [label="0"]; 4 -> 5 [label="1"]; } ``` 敏銳而有看作業說明的你想必已經觀察到了,標記的 $10101_{2}$ 不正是二進制版的 $21_{10}$ 嗎?這顯然不是巧合,因為前述問題實際上等價於: :::info 給定兩種操作:乘二、加一。 如何從 $0$ 開始用最少計算次數抵達某正整數 $n$? ::: 上述是在任意進位制的角度下問的問題,換成熟悉二進制與位元運算的你所熟悉的語言就是: :::success 給定兩種操作:左移一位、加一。 如何從 $0$ 開始用最少計算次數產生某二進制數 $n$? ::: 這一問等於白問,因為答案即藏在 $n$ 的二進制編碼 XD 希望這能幫助你理解[作業說明中虛擬碼](https://hackmd.io/@sysprog/linux2023-fibdrv-a#-%E8%B2%BB%E6%B0%8F%E6%95%B8%E5%88%97)和[硬體加速手法](https://hackmd.io/@sysprog/linux2023-fibdrv-a#%E8%80%83%E6%85%AE%E5%88%B0%E7%A1%AC%E9%AB%94%E5%8A%A0%E9%80%9F-Fn-%E7%9A%84%E6%89%8B%E6%B3%95)的原理。 ### Fast doubling 實作 不考慮大數運算的情況下,針對 `long long`,為了實現 Fast doubling,首先須取得目標費波那契數的二進制編碼,接著由高至低走訪這些編碼完成乘二和加一的邏輯。 取得一個數字精確的二進制編碼,有以下兩個針對 `long long` 的 gcc 內建函式: 1. `__builtin_clzll`: 其在 intel x86_64 架構下可能使用硬體指令 bit search reverse 找尋首個 `1` bit,接著以 63 減去而取得領導的 `0` bit 數量。 2. `__builtin_ctzll` 同理在 intel x86_64 可能使用 bit search forward,最終取得末尾的 `0` bit 數量。 前者的使用可以加速略過領導之 `0` bits 的過程,後者則由於 fast doubling 的「加一與否」牽扯到 branching,而且很難直接避免而轉換成 branchless,不過若所有位元皆為零就不需顧慮 branch 操作,故可以透過後者完成末尾 `0` bits 的 branchless 乘二操作,同時降低 branching 時的比較開銷。 具體實作解釋如下。 1. 取得領導、末尾 `0` bits 數量,並調整 $f(n)$ 的 $n$,將領導的 `1` bit 上調至最高位元。 :::success 注意 `c{l,t}z_long_long` 為[自己實作的函式](#Count-leading-zeros-%E5%AF%A6%E4%BD%9C),以巨集控制編譯時要採用何者。 ::: ```c= long long fn = 0ll, // f(n) fnp1 = 1ll; // f(n+1) n plus 1 #ifdef MY_CZ const int lz = clz_long_long(n); const int tz = ctz_long_long(n); #else const int lz = __builtin_clzll(n); const int tz = __builtin_ctzll(n); #endif /* MY_CZ */ n <<= lz; // so now there is no leading zeros ``` 2. Branching 部分: 根據當前最高位元,處理「乘二」和「是否加一」,注意到「乘二」的邏輯可使用 bitwise left shift 完成,且 branching 的條件只需考慮是否還有 `1` bit 的存在,即 $n$ 是否為零。 ```c= while (n) { // exists "1" bit long long f2n = fn * ((fnp1 << 1) - fn), f2np1 = fn * fn + fnp1 * fnp1; if (n & 0x8000000000000000ll) { fn = f2np1, fnp1 = f2np1 + f2n; } else { fn = f2n, fnp1 = f2np1; } n <<= 1; } ``` 3. Branchless 部分: 完成末尾 `0` bits 所代表之「乘二」邏輯。 ```c= // speed up for tailing zeros: branchless for (int i = 0; i < tz; ++i) { long long f2n = fn * ((fnp1 << 1) - fn), f2np1 = fn * fn + fnp1 * fnp1; fn = f2n, fnp1 = f2np1; } return fn; ``` ### 與作業文件的比較 > 迴圈內 if...else... 的部份可用 `-!!(target & (1 << i))` 作為 mask 的技巧簡化成: > > ```c= > static inline uint64_t fast_doubling_iter(uint64_t target) > { > if (target <= 2) > return !!target; > > // find first 1 > uint8_t count = 63 - __builtin_clzll(target); > uint64_t fib_n0 = 1, fib_n1 = 1; > > for (uint64_t i = count, fib_2n0, fib_2n1, mask; i-- > 0;) { > fib_2n0 = fib_n0 * ((fib_n1 << 1) - fib_n0); > fib_2n1 = fib_n0 * fib_n0 + fib_n1 * fib_n1; > > mask = -!!(target & (1UL << i)); > fib_n0 = (fib_2n0 & ~mask) + (fib_2n1 & mask); > fib_n1 = (fib_2n0 & mask) + fib_2n1; > } > return fib_n0; > } > ``` `(target & (1 << i))` 的意思為檢查某位元是否為 `1`,前面加上 `!!` 表取得真或假 (0 ro 1),加負號表取得 `0xFFFFFFFFFFFFFFFF`。 :::spoiler 順帶一提,類似的技巧也出現在無分支取絕對值中。 ```c= int absWithoutBranching(int number) { // 對 number > 0 來說,mask = 0x00000000 (0) // 對負數來說, mask = 0xffffffff (-1) const int mask = number >> 31; // 二補數的「負數取正」為「減一後位元反轉」 // 對負數來說 // 加上 mask 相當於 -1,相當於「減一」 // 對 -1 = 0xffffffff 取 XOR 相當於「位元反轉」得到負數取正 // 對正數來說 // 加上 mask = 0 等於沒做事 // 對 0 取 XOR 相當於沒做事,保持原數 return (mask + number) ^ mask; } ``` ::: 最後根據此 mask 過濾是否採用 `fib_2n0`, `fib_2n1` 與否,十分精妙。 最後我採納此技巧,將原本的 branching 調整成 branchless,不過還是保留第三階段的 branchless,以達到效能的最大提升。以下展示原 branching 調整的部分。 ```diff while (n) { // exists "1" bit long long f2n = fn * ((fnp1 << 1) - fn), f2np1 = fn * fn + fnp1 * fnp1; + long long is_one = -!!(n & 0x8000000000000000ll); + fn = (f2np1 & is_one) + (f2n & ~is_one); + fnp1 = f2np1 + (f2n & is_one); - if (n & 0x8000000000000000ll) { - fn = f2np1, fnp1 = f2np1 + f2n; - } else { - fn = f2n, fnp1 = f2np1; - } n <<= 1; } ``` ## Count leading zeros 實作 雖然 `__builtin_clzll` 和 `__builtin_ctzll` 好用且高效,但本方法的使用受限於 gcc 與特定型別 (支援 `unsigned int`, `unsigned long` 和 `unsigned long long`);且參考 [GCC 官方文件](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html),當輸入為 `0` 時屬於 undefined behavior,可見略有不足之處。於是我受 bitwise 操作教學文件的啟發,設計巨集來生成 `clz` 和 `ctz` 函式,支援所有長度在 64 以內的型別。 原本教學文件中的程式碼如下。 ```c= int clz (int x){ if (x == 0) return 32; int n = 0; if ((x & 0xFFFF0000) == 0) {n += 16; x <<= 16;} if ((x & 0xFF000000) == 0) {n += 8; x <<= 8;} if ((x & 0xF0000000) == 0) {n += 4; x <<= 4;} if ((x & 0xC0000000) == 0) {n += 2; x <<= 2;} if ((x & 0x80000000) == 0) n += 1; return n; } ``` 要改成 `long long` 很簡單,簡單調整一下即可,如下。 ```diff int clz(long long x){ if (x == 0) return 64; int n = 0; + if ((x & 0xFFFFFFFF00000000) == 0) {n += 32; x <<= 32;} if ((x & 0xFFFF000000000000) == 0) {n += 16; x <<= 16;} if ((x & 0xFF00000000000000) == 0) {n += 8; x <<= 8;} if ((x & 0xF000000000000000) == 0) {n += 4; x <<= 4;} if ((x & 0xC000000000000000) == 0) {n += 2; x <<= 2;} if ((x & 0x8000000000000000) == 0) n += 1; return n; } ``` 計算時間一樣是確定的 (deterministic),不過實在有失優雅。主要有三點: * `== 0` 邏輯可以簡化 * 近乎相同的位移、加法邏輯重複寫好幾遍,即 bad smell * 針對每個不同的型別都要重寫,明明內容都差不多,這又是 bad smell 第一點即採用 `!(x & __)`。第二點的做法可能根據考量點而有所不同,比如[此文提供一個很有趣的想法](https://stackoverflow.com/questions/45221914/number-of-trailing-zeroes),考量最後一步的效能,他利用 bitwise 操作將分支與加法合併: ```diff - if ((x & 0x8000000000000000) == 0) n += 1; + n += (x & 1) ^ 1; ``` 若考慮消去 bad smell 的話,採用迴圈可能是更好的做法。雖然效能上可能會稍微降低,不過程式碼會變的簡潔易讀,且容易擴展。搭配前置處理器技巧,想擴展至更多型別便不是問題。 首先將函式調整成迴圈版。 ```c= int clz(long long x){ if (x == 0) return 64; int n = 0; long long mask = 0xFFFFFFFF00000000; int shift = 32; while (shift) { if (!(x & mask)) { n += shift; x <<= shift; } shift >>= 1; mask <<= shift; } return n; } ``` 可以發現,透過改成迴圈版,我們將與 `long long` 型別有關的變因獨立在迴圈之前,那是不是調整這些變因就可以適用於其他型別呢?沒錯,並且搭配前置處理器,程式碼的前段變成: ```c= #define __CZ_MAX_ONES 0xFFFFFFFFFFFFFFFF #define __clz(type, fn_postfix, bit_len, ones) \ static int clz_##fn_postfix(type x) \ { \ if (!x) \ return bit_len; \ int bits = 0; \ int shift = bit_len >> 1; \ type mask = (type) (ones << shift); \ /* ... */ /** * Macro to generate "count leading zeros" * for type which length is less than int64_t * Named: clz_"fn_postfix" */ #define clz(type, fn_postfix) \ __clz(type, fn_postfix, (sizeof(type) * 8), __CZ_MAX_ONES) ``` 如此一來,我們可以輕鬆產生長度在 64 bits 內的所有 `clz`,產生方法如下: ```c= #include <stdint.h> clz(long long, long_long); // clz_long_long ctz(long long, long_long); // clz_long_long clz(int, int); // clz_int clz(short, short); // clz_short clz(char, char); // clz_char clz(int8_t, int8); // clz_int8 clz(int16_t, int16); // clz_int16 clz(int32_t, int32); // clz_int32 clz(int64_t, int64); // clz_int64 clz(uint8_t, uint8); // clz_uint8 clz(uint16_t, uint16); // clz_uint16 clz(uint32_t, uint32); // clz_uint32 clz(uint64_t, uint64); // clz_uint64 ``` 上百行的函式就這麼樸實無華的產生了。`ctz` 同理,只要進行如下的調整... :::info 注意到第一個差異使用 `~` 而非反方向位移是因為右移與型別 (`type`) 的符號有關,對於有號數,全一再怎麼 (算數) 右移都是全一。 ::: ```diff #define __ctz(type, fn_postfix, bit_len, ones) \ static int ctz_##fn_postfix(type x) \ { \ if (!x) \ return bit_len; \ int bits = 0; \ int shift = bit_len >> 1; \ - type mask = (type) ones << shift; \ + type mask = (type) ~(ones << shift); \ while (shift) { \ if (!(x & mask)) { \ bits += shift; \ - x <<= shift; \ + x >>= shift; \ } \ shift >>= 1; \ - mask <<= shift; \ + mask >>= shift; \ } \ return bits; \ } ``` 如此,我們又可以使出四兩撥千斤之術生成 `ctz_XXX` 函式們,一勞永逸 :) :::info 測試的部分,可利用與 `__builtin_c{l,t}z` 來比較輸出,在此省略細節。不過我也是因此注意到 `__builtin_c{l,t}z` 的 undefined behavior。 ::: ## 大數運算的實作 > 恩,我不知道自己多久以後才會完善測試與筆記,還是直接放連結吧。 > 至於 Repo 中怎麼沒什麼東西...不好意思等我一...下 QwQ > [name=Shiritai] * 筆記:[手刻一套 C 語言大數運算之物件導向函式庫](https://hackmd.io/@Eroiko/impl-big-num)。 * Reposiroty: [BigNum](https://github.com/Shiritai/BigNum) Repo 的部分... ![](https://i.imgur.com/dFCvJPR.jpg) ## 導讀紀錄 循著作業說明操作的紀錄。 ### 環境準備 ```bash ls -l /dev/fibonacci crw------- 1 root root 510, 0 三 8 23:51 /dev/fibonacci ``` 可以看見 `510, 0` 兩數,分別為 device major/minor number。 :::info 看完作業說明影片後發現自己讀+實作/實驗太少了,先回去讀... :::

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully