tinyynoob
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    # 2022q1 Homework3 (fibdrv) contributed by < [`tinyynoob`](https://github.com/tinyynoob) > > [作業要求](https://hackmd.io/@sysprog/linux2022-fibdrv) > 最近的方向可以跳到 [June rework](https://hackmd.io/MJ1kMQBLRBCuo08yEmhDeQ?view#June-Rework) 開始看 > 因為一篇筆記放不下了,再開一篇:[fibdrv 2](https://hackmd.io/@tinyynoob/linux2022q1-fibdrv2) ## 研讀資料 ### Fibonacci 相關性質 > https://hackmd.io/@sysprog/fibonacci-number > [Algorithms for Computing Fibonacci Numbers Quickly](https://ir.library.oregonstate.edu/downloads/t435gg51w) [Fast doubling 算法](https://chunminchang.github.io/blog/post/calculating-fibonacci-numbers-by-fast-doubling) $$\begin{aligned} F_{2n+1} &= F_{n+1}^2 + F_n^2 \\ F_{2n} &= F_n\cdot(2F_{n+1}-F_n) \end{aligned} $$ 作業說明中有給出 pseudocode : ```c Fast_Fib(n) a = 0; b = 1; // m = 0 for i = (number of binary digit in n) to 1 t1 = a*(2*b - a); t2 = b^2 + a^2; a = t1; b = t2; // m *= 2 if (current binary digit == 1) t1 = a + b; // m++ a = b; b = t1; return a; ``` 注意到這個 `^` 是指 power 而非 xor 。 目前看不出 `clz()` 能對計算 Fib 做到什麼強力優化,似乎只能加快 initialize `i` 而已,需要再研究研究。 :::info 待 ::: ### Follow 《The Linux Kernel Module Programming Guide》 > 將自己研讀的摘要整理到另一篇筆記: https://hackmd.io/@tinyynoob/Hk89qPw-9 本地版本: ```shell $ apt-cache search linux-headers-`uname -r` linux-headers-5.13.0-30-generic - Linux kernel headers for version 5.13.0 on 64 bit x86 SMP ``` 根據書中範例嘗試建立 **hello-1.c** 及對應的 **makefile** 。 make 時馬上出現問題: :::spoiler log ```shell $ make -p | grep PWD scripts/Makefile.build:44: /home/tinyynoob/code/kernel/hello-1/Makefile: 沒有此一檔案或目錄 make[2]: *** 沒有規則可製作目標「/home/tinyynoob/code/kernel/hello-1/Makefile」。 停止。 OLDPWD = /home/tinyynoob/code/kernel PWD = /home/tinyynoob/code/kernel/hello-1 CONFIG_HPWDT_NMI_DECODING = y make[1]: *** [Makefile:1879:/home/tinyynoob/code/kernel/hello-1] 錯誤 2 OLDPWD = /home/tinyynoob/code/kernel PWD = /home/tinyynoob/code/kernel/hello-1 CONFIG_HPWDT_NMI_DECODING = y @echo ' Syntax: make -C path/to/kernel/src M=$$PWD target' make: *** [makefile:7:all] 錯誤 2 PWD := /home/tinyynoob/code/kernel/hello-1 OLDPWD = /home/tinyynoob/code/kernel echo $((PWD) make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean ``` ::: \ 發現原來檔名把 `Makefile` 寫成 `makefile` 也不行,更改後得到: :::spoiler log ```shell $ make make -C /lib/modules/5.13.0-30-generic/build M=/home/tinyynoob/code/kernel/hello-1 modules make[1]: 進入目錄「/usr/src/linux-headers-5.13.0-30-generic」 CC [M] /home/tinyynoob/code/kernel/hello-1/hello-1.o MODPOST /home/tinyynoob/code/kernel/hello-1/Module.symvers CC [M] /home/tinyynoob/code/kernel/hello-1/hello-1.mod.o LD [M] /home/tinyynoob/code/kernel/hello-1/hello-1.ko BTF [M] /home/tinyynoob/code/kernel/hello-1/hello-1.ko Skipping BTF generation for /home/tinyynoob/code/kernel/hello-1/hello-1.ko due to unavailability of vmlinux make[1]: 離開目錄「/usr/src/linux-headers-5.13.0-30-generic」 ``` ::: \ 看似往正確的方向前進了,也有產出 **hello-1.ko** ,但沒有完全成功,它在 `vmlinux` 遇到問題 :::info `vmlinux` 問題尚未解決 ::: 根據 `modinfo` 可以發現在 **hello-1.ko** 的 vermagic 欄位有 `modversions` 。回去閱讀書中提到 Modversioning 的部份,得知我可能需要關掉它。 參考[此](https://bugs.launchpad.net/ubuntu/+source/linux-raspi2/+bug/1863245),試著進到 /boot/config-5.13.0-30-generic 關掉 `CONFIG_MODVERSIONS` 。 :::info 關掉 modversions 但尚未 recompile kernel ::: 把 **hello-1.c** 裡的 return 值改成負數會導致: ```shell insmod: ERROR: could not insert module hello-1.ko: Operation not permitted ``` ### 研讀數字字串運算範例程式 > [source code](https://github.com/AdrianHuang/fibdrv/tree/big_number) 數字字串在運算時可以因應字串 size 選擇不同的策略。 範例程式使用的字串使用 union 定義如下: ```c= typedef union { /* allow strings up to 15 bytes to stay on the stack * use the last byte as a null terminator and to store flags * much like fbstring: * https://github.com/facebook/folly/blob/master/folly/docs/FBString.md */ char data[16]; struct { uint8_t filler[15], /* how many free bytes in this stack allocated string * same idea as fbstring */ space_left : 4, /* if it is on heap, set to 1 */ is_ptr : 1, is_large_string : 1, flag2 : 1, flag3 : 1; }; /* heap allocated */ struct { char *ptr; /* supports strings up to 2^MAX_STR_LEN_BITS - 1 bytes */ size_t size : MAX_STR_LEN_BITS, /* capacity is always a power of 2 (unsigned)-1 */ capacity : 6; /* the last 4 bits are important flags */ }; } xs; ``` - 在短碼的情況會使用第 9 行的 struct ,直接將資料字串存在 `filler` 並把一些 flag 資訊放在最後一個 byte - 在長碼的情況會使用第 20 行的 struct ,使用一個 `ptr` 指標並配置空間在 heap ,後面的 8 byte 拿來存放一些 flag :::warning 還沒完全理解在長碼情況下, 1. 如何繼續使用 `is_ptr` flag 2. `capacity` 使用的詳細狀況 ::: 範例程式在 `fib_sequence()` 計算 Fib ,算法使用逐項相加: ```c for (i = 2; i <= k; i++) string_number_add(&f[i - 1], &f[i - 2], &f[i]); ``` 因此若是我們能改善乘法及平方的複雜度,則可以使用 fast doubling 的技巧來改善目前算法。 ### Linux 效能分析 嘗試查看 CPU affinity : ```shell $ taskset -p 8528 pid 8528 目前的近似者遮罩:ff ``` ff 的二進位為 11111111 ,代表這個 process 可以在 7~0 任意一個 CPU 上執行。試著更改: ```shell $ taskset -p FA 8528 pid 8528 目前的近似者遮罩:ff pid 8528 新的近似者遮罩:fa ``` 根據[網路上的教學](https://charleslin74.pixnet.net/blog/post/405173965),在 */boot/grub/grub.cfg* 中加入 `isolcpus=0,1` 。 最後,完成 [排除干擾效能分析的因素](https://hackmd.io/PMozqaxqRA-jF2KmgTZcPQ?view#%E6%8E%92%E9%99%A4%E5%B9%B2%E6%93%BE%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E5%9B%A0%E7%B4%A0) 中的所有設定。 reboot 之後查看 process 發現 core mask 變成了 fc : ```shell $ taskset -p 1 pid 1 目前的近似者遮罩:fc ``` 雖然有些已經變成 fc 但有些仍顯示 ff ,因此重新依據[參考共筆](https://hackmd.io/@KYWeng/rkGdultSU)進行設定。結果仍失敗,目前的在設定 smp_affinity 上因腳本錯誤遇到困難。 --- Fork 嘗試掛載模組之後輸入 `sudo ./client` 可以得到 :::spoiler 結果 ```shell Reading from /dev/fibonacci at offset 0, returned the sequence 0. Reading from /dev/fibonacci at offset 1, returned the sequence 1. Reading from /dev/fibonacci at offset 2, returned the sequence 1. Reading from /dev/fibonacci at offset 3, returned the sequence 2. Reading from /dev/fibonacci at offset 4, returned the sequence 3. Reading from /dev/fibonacci at offset 5, returned the sequence 5. Reading from /dev/fibonacci at offset 6, returned the sequence 8. Reading from /dev/fibonacci at offset 7, returned the sequence 13. Reading from /dev/fibonacci at offset 8, returned the sequence 21. Reading from /dev/fibonacci at offset 9, returned the sequence 34. Reading from /dev/fibonacci at offset 10, returned the sequence 55. Reading from /dev/fibonacci at offset 11, returned the sequence 89. Reading from /dev/fibonacci at offset 12, returned the sequence 144. Reading from /dev/fibonacci at offset 13, returned the sequence 233. Reading from /dev/fibonacci at offset 14, returned the sequence 377. Reading from /dev/fibonacci at offset 15, returned the sequence 610. Reading from /dev/fibonacci at offset 16, returned the sequence 987. Reading from /dev/fibonacci at offset 17, returned the sequence 1597. Reading from /dev/fibonacci at offset 18, returned the sequence 2584. Reading from /dev/fibonacci at offset 19, returned the sequence 4181. Reading from /dev/fibonacci at offset 20, returned the sequence 6765. Reading from /dev/fibonacci at offset 21, returned the sequence 10946. Reading from /dev/fibonacci at offset 22, returned the sequence 17711. Reading from /dev/fibonacci at offset 23, returned the sequence 28657. Reading from /dev/fibonacci at offset 24, returned the sequence 46368. Reading from /dev/fibonacci at offset 25, returned the sequence 75025. Reading from /dev/fibonacci at offset 26, returned the sequence 121393. Reading from /dev/fibonacci at offset 27, returned the sequence 196418. Reading from /dev/fibonacci at offset 28, returned the sequence 317811. Reading from /dev/fibonacci at offset 29, returned the sequence 514229. Reading from /dev/fibonacci at offset 30, returned the sequence 832040. Reading from /dev/fibonacci at offset 31, returned the sequence 1346269. Reading from /dev/fibonacci at offset 32, returned the sequence 2178309. Reading from /dev/fibonacci at offset 33, returned the sequence 3524578. Reading from /dev/fibonacci at offset 34, returned the sequence 5702887. Reading from /dev/fibonacci at offset 35, returned the sequence 9227465. Reading from /dev/fibonacci at offset 36, returned the sequence 14930352. Reading from /dev/fibonacci at offset 37, returned the sequence 24157817. Reading from /dev/fibonacci at offset 38, returned the sequence 39088169. Reading from /dev/fibonacci at offset 39, returned the sequence 63245986. Reading from /dev/fibonacci at offset 40, returned the sequence 102334155. Reading from /dev/fibonacci at offset 41, returned the sequence 165580141. Reading from /dev/fibonacci at offset 42, returned the sequence 267914296. Reading from /dev/fibonacci at offset 43, returned the sequence 433494437. Reading from /dev/fibonacci at offset 44, returned the sequence 701408733. Reading from /dev/fibonacci at offset 45, returned the sequence 1134903170. Reading from /dev/fibonacci at offset 46, returned the sequence 1836311903. Reading from /dev/fibonacci at offset 47, returned the sequence 2971215073. Reading from /dev/fibonacci at offset 48, returned the sequence 4807526976. Reading from /dev/fibonacci at offset 49, returned the sequence 7778742049. Reading from /dev/fibonacci at offset 50, returned the sequence 12586269025. Reading from /dev/fibonacci at offset 51, returned the sequence 20365011074. Reading from /dev/fibonacci at offset 52, returned the sequence 32951280099. Reading from /dev/fibonacci at offset 53, returned the sequence 53316291173. Reading from /dev/fibonacci at offset 54, returned the sequence 86267571272. Reading from /dev/fibonacci at offset 55, returned the sequence 139583862445. Reading from /dev/fibonacci at offset 56, returned the sequence 225851433717. Reading from /dev/fibonacci at offset 57, returned the sequence 365435296162. Reading from /dev/fibonacci at offset 58, returned the sequence 591286729879. Reading from /dev/fibonacci at offset 59, returned the sequence 956722026041. Reading from /dev/fibonacci at offset 60, returned the sequence 1548008755920. Reading from /dev/fibonacci at offset 61, returned the sequence 2504730781961. Reading from /dev/fibonacci at offset 62, returned the sequence 4052739537881. Reading from /dev/fibonacci at offset 63, returned the sequence 6557470319842. Reading from /dev/fibonacci at offset 64, returned the sequence 10610209857723. Reading from /dev/fibonacci at offset 65, returned the sequence 17167680177565. Reading from /dev/fibonacci at offset 66, returned the sequence 27777890035288. Reading from /dev/fibonacci at offset 67, returned the sequence 44945570212853. Reading from /dev/fibonacci at offset 68, returned the sequence 72723460248141. Reading from /dev/fibonacci at offset 69, returned the sequence 117669030460994. Reading from /dev/fibonacci at offset 70, returned the sequence 190392490709135. Reading from /dev/fibonacci at offset 71, returned the sequence 308061521170129. Reading from /dev/fibonacci at offset 72, returned the sequence 498454011879264. Reading from /dev/fibonacci at offset 73, returned the sequence 806515533049393. Reading from /dev/fibonacci at offset 74, returned the sequence 1304969544928657. Reading from /dev/fibonacci at offset 75, returned the sequence 2111485077978050. Reading from /dev/fibonacci at offset 76, returned the sequence 3416454622906707. Reading from /dev/fibonacci at offset 77, returned the sequence 5527939700884757. Reading from /dev/fibonacci at offset 78, returned the sequence 8944394323791464. Reading from /dev/fibonacci at offset 79, returned the sequence 14472334024676221. Reading from /dev/fibonacci at offset 80, returned the sequence 23416728348467685. Reading from /dev/fibonacci at offset 81, returned the sequence 37889062373143906. Reading from /dev/fibonacci at offset 82, returned the sequence 61305790721611591. Reading from /dev/fibonacci at offset 83, returned the sequence 99194853094755497. Reading from /dev/fibonacci at offset 84, returned the sequence 160500643816367088. Reading from /dev/fibonacci at offset 85, returned the sequence 259695496911122585. Reading from /dev/fibonacci at offset 86, returned the sequence 420196140727489673. Reading from /dev/fibonacci at offset 87, returned the sequence 679891637638612258. Reading from /dev/fibonacci at offset 88, returned the sequence 1100087778366101931. Reading from /dev/fibonacci at offset 89, returned the sequence 1779979416004714189. Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120. Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309. Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 95, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 96, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 97, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 98, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 99, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 100, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 100, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 99, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 98, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 97, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 96, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 95, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309. Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120. Reading from /dev/fibonacci at offset 89, returned the sequence 1779979416004714189. Reading from /dev/fibonacci at offset 88, returned the sequence 1100087778366101931. Reading from /dev/fibonacci at offset 87, returned the sequence 679891637638612258. Reading from /dev/fibonacci at offset 86, returned the sequence 420196140727489673. Reading from /dev/fibonacci at offset 85, returned the sequence 259695496911122585. Reading from /dev/fibonacci at offset 84, returned the sequence 160500643816367088. Reading from /dev/fibonacci at offset 83, returned the sequence 99194853094755497. Reading from /dev/fibonacci at offset 82, returned the sequence 61305790721611591. Reading from /dev/fibonacci at offset 81, returned the sequence 37889062373143906. Reading from /dev/fibonacci at offset 80, returned the sequence 23416728348467685. Reading from /dev/fibonacci at offset 79, returned the sequence 14472334024676221. Reading from /dev/fibonacci at offset 78, returned the sequence 8944394323791464. Reading from /dev/fibonacci at offset 77, returned the sequence 5527939700884757. Reading from /dev/fibonacci at offset 76, returned the sequence 3416454622906707. Reading from /dev/fibonacci at offset 75, returned the sequence 2111485077978050. Reading from /dev/fibonacci at offset 74, returned the sequence 1304969544928657. Reading from /dev/fibonacci at offset 73, returned the sequence 806515533049393. Reading from /dev/fibonacci at offset 72, returned the sequence 498454011879264. Reading from /dev/fibonacci at offset 71, returned the sequence 308061521170129. Reading from /dev/fibonacci at offset 70, returned the sequence 190392490709135. Reading from /dev/fibonacci at offset 69, returned the sequence 117669030460994. Reading from /dev/fibonacci at offset 68, returned the sequence 72723460248141. Reading from /dev/fibonacci at offset 67, returned the sequence 44945570212853. Reading from /dev/fibonacci at offset 66, returned the sequence 27777890035288. Reading from /dev/fibonacci at offset 65, returned the sequence 17167680177565. Reading from /dev/fibonacci at offset 64, returned the sequence 10610209857723. Reading from /dev/fibonacci at offset 63, returned the sequence 6557470319842. Reading from /dev/fibonacci at offset 62, returned the sequence 4052739537881. Reading from /dev/fibonacci at offset 61, returned the sequence 2504730781961. Reading from /dev/fibonacci at offset 60, returned the sequence 1548008755920. Reading from /dev/fibonacci at offset 59, returned the sequence 956722026041. Reading from /dev/fibonacci at offset 58, returned the sequence 591286729879. Reading from /dev/fibonacci at offset 57, returned the sequence 365435296162. Reading from /dev/fibonacci at offset 56, returned the sequence 225851433717. Reading from /dev/fibonacci at offset 55, returned the sequence 139583862445. Reading from /dev/fibonacci at offset 54, returned the sequence 86267571272. Reading from /dev/fibonacci at offset 53, returned the sequence 53316291173. Reading from /dev/fibonacci at offset 52, returned the sequence 32951280099. Reading from /dev/fibonacci at offset 51, returned the sequence 20365011074. Reading from /dev/fibonacci at offset 50, returned the sequence 12586269025. Reading from /dev/fibonacci at offset 49, returned the sequence 7778742049. Reading from /dev/fibonacci at offset 48, returned the sequence 4807526976. Reading from /dev/fibonacci at offset 47, returned the sequence 2971215073. Reading from /dev/fibonacci at offset 46, returned the sequence 1836311903. Reading from /dev/fibonacci at offset 45, returned the sequence 1134903170. Reading from /dev/fibonacci at offset 44, returned the sequence 701408733. Reading from /dev/fibonacci at offset 43, returned the sequence 433494437. Reading from /dev/fibonacci at offset 42, returned the sequence 267914296. Reading from /dev/fibonacci at offset 41, returned the sequence 165580141. Reading from /dev/fibonacci at offset 40, returned the sequence 102334155. Reading from /dev/fibonacci at offset 39, returned the sequence 63245986. Reading from /dev/fibonacci at offset 38, returned the sequence 39088169. Reading from /dev/fibonacci at offset 37, returned the sequence 24157817. Reading from /dev/fibonacci at offset 36, returned the sequence 14930352. Reading from /dev/fibonacci at offset 35, returned the sequence 9227465. Reading from /dev/fibonacci at offset 34, returned the sequence 5702887. Reading from /dev/fibonacci at offset 33, returned the sequence 3524578. Reading from /dev/fibonacci at offset 32, returned the sequence 2178309. Reading from /dev/fibonacci at offset 31, returned the sequence 1346269. Reading from /dev/fibonacci at offset 30, returned the sequence 832040. Reading from /dev/fibonacci at offset 29, returned the sequence 514229. Reading from /dev/fibonacci at offset 28, returned the sequence 317811. Reading from /dev/fibonacci at offset 27, returned the sequence 196418. Reading from /dev/fibonacci at offset 26, returned the sequence 121393. Reading from /dev/fibonacci at offset 25, returned the sequence 75025. Reading from /dev/fibonacci at offset 24, returned the sequence 46368. Reading from /dev/fibonacci at offset 23, returned the sequence 28657. Reading from /dev/fibonacci at offset 22, returned the sequence 17711. Reading from /dev/fibonacci at offset 21, returned the sequence 10946. Reading from /dev/fibonacci at offset 20, returned the sequence 6765. Reading from /dev/fibonacci at offset 19, returned the sequence 4181. Reading from /dev/fibonacci at offset 18, returned the sequence 2584. Reading from /dev/fibonacci at offset 17, returned the sequence 1597. Reading from /dev/fibonacci at offset 16, returned the sequence 987. Reading from /dev/fibonacci at offset 15, returned the sequence 610. Reading from /dev/fibonacci at offset 14, returned the sequence 377. Reading from /dev/fibonacci at offset 13, returned the sequence 233. Reading from /dev/fibonacci at offset 12, returned the sequence 144. Reading from /dev/fibonacci at offset 11, returned the sequence 89. Reading from /dev/fibonacci at offset 10, returned the sequence 55. Reading from /dev/fibonacci at offset 9, returned the sequence 34. Reading from /dev/fibonacci at offset 8, returned the sequence 21. Reading from /dev/fibonacci at offset 7, returned the sequence 13. Reading from /dev/fibonacci at offset 6, returned the sequence 8. Reading from /dev/fibonacci at offset 5, returned the sequence 5. Reading from /dev/fibonacci at offset 4, returned the sequence 3. Reading from /dev/fibonacci at offset 3, returned the sequence 2. Reading from /dev/fibonacci at offset 2, returned the sequence 1. Reading from /dev/fibonacci at offset 1, returned the sequence 1. Reading from /dev/fibonacci at offset 0, returned the sequence 0. ``` ::: \ 可以發現到 92 項以後數值就不再改變,代表這裡存在嚴重缺陷,對於超過 64 位元的數值無能為力。 另外在輸入指令 ```shell $ ls -l /dev/fibonacci ``` 後,不是得到 235 也不是得到 236 而是 505 ```shell crw------- 1 root root 505, 0 3月 11 13:10 /dev/fibonacci ``` > [ktime doc](https://www.kernel.org/doc/html/latest/core-api/timekeeping.html) 看[參考共筆](https://hackmd.io/@KYWeng/rkGdultSU)依樣畫葫蘆,由於目前 **fibdrv.c** 的 `fib_write()` 沒有功能,所以在其中塞入測量時間的程式碼: ```c static ssize_t fib_write(struct file *file, const char *buf, size_t size, loff_t *offset) { ktime_t kt; kt = ktime_get(); fib_sequence(*offset); kt = ktime_sub(ktime_get(), kt); return (ssize_t) ktime_to_ns(kt); } ``` 然而輸出的數值看起來都差不多,回頭想找找看這個 `offset` 到底是怎麼設的。我發現在 **client.c** 呼叫時傳入的參數並不包含 `offset` ,不過在呼叫 `write()` 跟 `read()` 時有個不同是 `read()` 多了一行: ```c lseek(fd, i, SEEK_SET); ``` 其中 `SEEK_SET` 將被展開為 0 。觀察發現呼叫 `lseek()` 會連到 **fibdrv.c** 中的: ```c static loff_t fib_device_lseek(struct file *file, loff_t offset, int orig) { loff_t new_pos = 0; switch (orig) { case 0: /* SEEK_SET: */ new_pos = offset; break; case 1: /* SEEK_CUR: */ new_pos = file->f_pos + offset; break; case 2: /* SEEK_END: */ new_pos = MAX_LENGTH - offset; break; } if (new_pos > MAX_LENGTH) new_pos = MAX_LENGTH; // max case if (new_pos < 0) new_pos = 0; // min case file->f_pos = new_pos; // This is what we'll use now return new_pos; } ``` 雖然可以猜測這個函式的回傳值會更改 `offset` ,但是目前不知道這到底是如何達成的,我甚至不曉得 `offset` 到底存在哪裡。 :::info TODO: - 釐清 `offset` 答: 我認為就是存在 file 結構的 `f_pos` 成員中 ::: ### 研讀參考共筆中的大數運算 > https://hackmd.io/@KYWeng/rkGdultSU#%E5%A6%82%E4%BD%95%E6%B8%9B%E5%B0%91%E5%A4%A7%E6%95%B8%E9%81%8B%E7%AE%97%E7%9A%84%E6%88%90%E6%9C%AC 這份共筆提出多種實作上改進計算 Fib 速度的技巧,摘要如下: #### 調整 Fast Fib 算法 將原先的算法由 $$\begin{aligned} F_{2n+1} &= F_{n+1}^2 + F_n^2 \\ F_{2n} &= F_n\cdot(2F_{n+1}-F_n) \end{aligned} $$ 改良為: $$\begin{aligned} F_{2n-1} &= F_n^2 + F_{n-1}^2 \\ F_{2n} &= F_n\cdot(2F_{n-1}+F_n) \end{aligned} $$ 以迴避所有減法運算,並且不會再出現任何負數,可以全部用 unsigned 型態儲存。 #### 使用 memory pool 在大數結構中引入 memory pool 的概念來避免頻繁的 `realloc()` 。 #### 使用內嵌組合語言加速乘法 做乘法運算時直接指定高低位的位置,而不是使用擴展整數型別。 以參考共筆中的語法為例: ``` __asm__("mulq %3" : "=a"(low), "=d"(high) : "%0"(a[i]), "rm"(k)); assembly language : output : input ``` 其中 instruction 顯然為 `mulq` ,而 1. `=` 是 write only 的意思 2. `a`, `d` 分別表示 `$rax`, `$rdx` 3. `( )` 內使用 C 變數 4. `rm` 表示由 compiler 自己選擇存放位置 5. `%0` 在此等同 `a` 暫存器 6. `%3` 代表 `rm` 參考: > https://ithelp.ithome.com.tw/articles/10213500 > https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html #### 引入 [Karatsuba algorithm](https://en.wikipedia.org/wiki/Karatsuba_algorithm) 算法 核心思想為在做乘法時,將一些乘法運算改為加減,考慮: $$\begin{aligned} (a_12^n+a_0)\times(b_12^n+b_0) &= c_22^{2n} + c_12^n +c_0 \\ &= (a_1b_1)2^{2n} + (a_1b_0+a_0b_1)2^n + a_0b_0 \end{aligned} $$ 原先需要做 4 次乘法運算,經過一些設計可以變成: $$ (2^{2n}+2^n)(a_1b_1) + 2^n(a_1-a_0)(b_0-b_1) + (2^n+1)(a_0b_0) $$ 便能少做一次乘法。 ## 改進 fibdrv > [實作程式碼 github](https://github.com/tinyynoob/fibdrv/tree/master) ### 大數運算 在為 fibdrv 改進效能之前,我們需要先確保它的正確性,為此我們首先引入大數運算的機制。對於存放大數的方式前面提出了兩種設計: - 單純使用結構 - 使用 union 達成 small string optimization 由於在計算 Fib 的情境下不會一直需要宣告新的變數,也不會有頻繁居於小數計算的情況,因此我們選擇使用結構的方式,如此也較便利於 maintain resize 的功能。 因為我目前不太熟悉 kernel programming 的各種 function 用法,因此我打算先在 user space 開發,並 debug 確認功能沒問題之後再移植到 kernel space 。 我想借助 macro 來完成此工作,例如這樣: ```c #if DEBUG N = (ubn *) malloc(sizeof(ubn)); #else N = (ubn *) kmalloc(sizeof(ubn), GFP_KERNEL); #endif ``` 由於在計算 Fib 的過程中不會遇到任何負數,因此我的大數運算假設不存在負數,相關的運算也都沒有提供。 引入[共筆用的 macro](https://hackmd.io/@KYWeng/rkGdultSU#%E6%94%B9%E5%96%84%E6%96%B9%E6%A1%88-4---%E5%96%84%E7%94%A8-64-bit-CPU-%E7%9A%84%E5%84%AA%E5%8B%A2) : ```c #if defined(__LP64__) || defined(__x86_64__) || defined(__amd64__) || \ defined(__aarch64__) typedef uint64_t ubn_unit; // typedef unsigned __int128 bn_data_tmp; // gcc support __int128 #define ubn_unit_bit 64 #else typedef uint32_t ubn_unit; // typedef uint64_t bn_data_tmp; #define ubn_unit_bit 32 #endif ``` 目前還未使用到加倍的型別,暫時先註解掉。 基於沒有負數的假設,我們得以直接移除共筆結構中的 `sign` 成員,訂出: ```c /* unsigned big number * @data: MSB:[size-1], LSB:[0] * @size: sizeof(ubn_unit) * @capacity: allocated size */ typedef struct { ubn_unit *data; int size; int capacity; } ubn; ``` 預期 (無號) 大數運算的界面提供以下功能: - `ubignum_init()` - `ubignum_free()` - `ubignum_resize()` - `ubignum_add()` - `ubignum_mult()` - `ubignum_assign()` 目前比較沒有頭緒的是 `assign()` 的作法 #### ubignum_init() :::spoiler 程式碼 ```c bool ubignum_init(ubn *N) { #if DEBUG N = (ubn *) malloc(sizeof(ubn)); #else N = (ubn *) kmalloc(sizeof(ubn), GFP_KERNEL); #endif if (!N) goto struct_aloc_failed; #if DEBUG N->data = (ubn_unit *) calloc(sizeof(ubn_unit), DEFAULT_CAPACITY); // including mem reset #else N->data = (ubn_unit *) kcalloc(sizeof(ubn_unit), DEFAULT_CAPACITY, GFP_KERNEL); // including mem reset #endif if (!N->data) goto data_aloc_failed; N->size = 0; N->capacity = DEFAULT_CAPACITY; return true; data_aloc_failed: free(N); N = NULL; struct_aloc_failed: return false; } ``` ::: 其中 `DEFAULT_CAPACITY` 目前訂為 16 。 靜態分析出現一堆問題,這才發現我原本的寫法根本不會正確把初始化後的 `N` 回傳,因此修改程式碼: #### ubignum_resize() :::spoiler 程式碼 ```c bool ubignum_resize(ubn *N, int new_size) { if (new_size < 0) return false; #if DEBUG ubn_unit *new = (ubn_unit *) realloc(N->data, sizeof(ubn_unit) * new_size); #else ubn_unit *new = (ubn_unit *) krealloc(N->data, sizeof(ubn_unit) * new_size, GFP_KERNEL); #endif if (!new) return false; N = new; N->capacity = new_size; return true; } ``` ::: 目前不確定允許 `new_size` 傳入 0 是否是個好主意。 #### ubignum_add() ```c bool ubignum_add(const ubn *a, const ubn *b, ubn *out) ``` 目前設計讓使用者傳入存放 sum 的變數,但這麼做就需要解決 pointer aliasing 的問題。 由低到高做加法的流程大致可以分成兩段: 1. 兩數相加 2. 一數加上 1. 的 carry out 首先開發 1. 的部份: ```c= ubn_unit carry = 0; int i = 0; for (i = 0; i < min(a->size, b->size); i++) { if (__builtin_expect(i >= out->capacity, 0)) { if (!resize(out, out->capacity * 2)) { return false; } } out->data[i] = a->data[i] + b->data[i] + carry; // The following consideration does not include carry // fix it! carry = ((a->data[i] ^ b->data[i]) >> 1 ^ (a->data[i] & b->data[i])) & (1u << (ubn_unit_bit - 1)); } ``` 若超出 `out` 範圍則對它進行 `resize()` ,其中加入 `builtin_expect()` 協助 compiler 進行 branch prediction 。 撰寫過程馬上讓我遭遇了幾個問題: 1. 第 11 行,我嘗試引入 [quiz2 第一題](https://hackmd.io/cmwilKeBSsStEwvKUgkfMA#%E6%B8%AC%E9%A9%97-1) 計算平均的方式再取 MSB 來得出新的 `carry` ,但我發覺這樣的話舊的 `carry` 沒有被加進來,所以在例如 `(a, b, c) = (UINT_MAX, 0, 1)` 的情況下會算出錯誤結果。 2. 我不熟悉 gcc 對於 unsigned 型態的右移會如何處理 根據 [gcc 的 mail list](https://gcc.gnu.org/legacy-ml/gcc/2000-04/msg00152.html) : > For right shifts, if the type is signed, then we use an arithmetic shift; if the type is unsigned then we use a logical. 因此對於 unsigned 右移的行為符合我的預期。 :::info 1\. 待解決 $\to$ 後面完全更換作法就不再有這個問題 ::: 為了應對 pointer aliasing 的問題,我們應該不能直接拿 `a->data` 和 `b->data` 去運算,因為它們在第 9 行可能會被改變 (如果他們跟 `out` 指到同樣位置) ,那麼第 12 行的運算將會錯誤。因此需要先將它們在第 9 行的值存起來,將程式碼改成: ```c= for (i = 0; i < min(a->size, b->size); i++) { if (__builtin_expect(i >= out->capacity, 0)) { if (!resize(out, out->capacity * 2)) { goto realoc_failed; } } ubn_unit an = a->data[i]; ubn_unit bn = b->data[i]; out->data[i] = an + bn + carry; // The following consideration does not include carry // fix it! carry = ((an ^ bn) >> 1 + (an & bn)) & (1u << (ubn_unit_bit - 1)); } ``` 其中第 4 行跳到最後去把 `a` 跟 `b` 還原成原本的樣子,這也是因為 aliasing 才有的問題。 第二段程式碼: ```c= ubn *remain = NULL; if (i == a->size) remain = b; else if (i == b->size) remain = a; for (; carry || i < remain->size; i++) { if (__builtin_expect(i >= out->capacity, 0)) { if (!resize(out, out->capacity * 2)) { return false; } } if (__builtin_expect(i < remain->size, 1)) { out->data[i] = remain->data[i] + carry; carry = ((remain->data[i] ^ carry) >> 1 ^ (remain->data[i] & carry)) & (1u << (ubn_unit_bit - 1)); } else { // the last carry out carries to a new ubn_unit out->data[i] = carry; // = 1 carry = 0; } } ``` 先使用 `remain` 來記住哪一個加數還有剩餘的位數沒加,接著就進入迴圈繼續加完,大部分的情況都會走 12 行,唯一會走到 17 行的情況是最高位有 carry out 而且剛好 carry 到一個新的 `ubn_unit` chunk 。 同樣處理 pointer aliasing 的問題並改進寫法得: ```c ubn *remain = (i == a->size) ? b : a; for (; carry || i < remain->size; i++) { if (__builtin_expect(i >= out->capacity, 0)) { if (!resize(out, out->capacity * 2)) { goto realoc_failed; } } if (__builtin_expect(i < remain->size, 1)) { ubn_unit rn = remain->data[i]; out->data[i] = rn + carry; carry = ((rn ^ carry) >> 1 ^ (rn & carry)) & (1u << (ubn_unit_bit - 1)); } else { // the last carry out carries to a new ubn_unit out->data[i] = carry; // = 1 carry = 0; } } ``` 處理 aliasing 的程式碼: ```c ubn *store = NULL; int alias = 0; if (a == out) // pointer aliasing alias ^= 1; if (b == out) // pointer aliasing alias ^= 2; if (alias) { store = ubignum_duplicate(store, out); if (!store) return false; } ``` 直接使用一個變數 `alias` 中的 bit 來觀察,若有 aliasing 則配置空間給保留原值。 ```c realoc_failed: switch (alias) { case 1: swapptr(&a, &store); break; case 2: case 3: swapptr(&b, &store); case 0: } return false; ``` 注意到這裡 2, 3 共用的原因是,如果 `a` 和 `b` 指到同一個位置,那麼恢復 `b` 也同時恢復了 `a` 。 --- aliasing 的問題比我預期的還更難處理,我直接重改邏輯,變成如果有 aliasing 則配置一個新空間存放計算結果,算完再放到 `out` ,改寫過程中也修復了一些 bug 。 ::: spoiler 更改後的 `ubignum_add()` 程式碼 ```c= /* out = a + b * Aliasing arguments are acceptable. * If it return true, the result is put at @out. * If return false with alias input, the @out would be unchanged. * If return false without alias input, the @out would return neither answer nor * original out. */ bool ubignum_add(const ubn *a, const ubn *b, ubn **out) { ubn *ans = *out; int alias = 0; if (a == *out) // pointer aliasing alias ^= 1; if (b == *out) // pointer aliasing alias ^= 2; if (alias) { // if so, allocate space to store the result if (!ubignum_init(&ans)) return false; } ubn_unit carry = 0; int i = 0; for (i = 0; i < min(a->size, b->size); i++) { if (i >= ans->size) { if (__builtin_expect(i >= ans->capacity, 0)) if (!ubignum_resize(ans, ans->capacity * 2)) goto realoc_failed; ans->size++; } const ubn_unit_extend sum = a->data[i] + b->data[i] + carry; ans->data[i] = sum; carry = sum >> ubn_unit_bit; } const ubn *remain = (i == a->size) ? b : a; for (; carry || i < remain->size; i++) { if (i >= ans->size) { if (__builtin_expect(i >= ans->capacity, 0)) if (!ubignum_resize(ans, ans->capacity * 2)) goto realoc_failed; ans->size++; } if (__builtin_expect(i < remain->size, 1)) { const ubn_unit_extend sum = remain->data[i] + carry; ans->data[i] = sum; carry = sum >> ubn_unit_bit; } else { // the last carry out carries to a new ubn_unit ans->data[i] = carry; // = 1 carry = 0; } } *out = ans; // no condition needed return true; realoc_failed: if (alias) ubignum_free(ans); return false; } ``` ::: \ 其中 `ubn_unit_extend` 在 x64 定義為 `unsigned __int128` ,目前嘗試在 user space 配合 gdb 工具進行測試,得到錯誤的結果, `carry` 沒有被正確計算 ```c carry = sum >> ubn_unit_bit; ``` 疑似 uint128 的 shift operation 有問題,我嘗試去找 [gcc doc](https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html) ,但是內容只有三行,沒有提到是否支援 shift operation 。 我在 assignment 之前加入強制轉型就有我預期的結果了 ```c const ubn_unit_extend sum = (ubn_unit_extend) a->data[i] + b->data[i] + carry; ``` 我好像已經吃了很多次這種虧,卻都沒有記住,我一直誤以為計算的時候就會轉成左邊的型態,這次特別寫進 commit message ,希望能確實記得。 #### ubignum_mult() 雖然前面有提及[加速的算法](https://hackmd.io/MJ1kMQBLRBCuo08yEmhDeQ?both#%E5%BC%95%E5%85%A5-Karatsuba-algorithm-%E7%AE%97%E6%B3%95),但是因為我目前的大數機制沒有減法,所以暫時先略過此項。 在開發時發現 `ubn_unit` 的加法一直重複遇到,所以將其模組化 ```c /* no checking, sum and cout input should be guarantee */ static inline void ubn_unit_add(const ubn_unit a, const ubn_unit b, const int cin, ubn_unit *sum, int *cout) { const ubn_unit_extend SUM = (ubn_unit_extend) a + b + cin; *sum = SUM; *cout = SUM >> ubn_unit_bit; } ``` 長得有點像硬體加法器的樣子。 `cin` 和 `cout` 都只會用到一個 bit ,用什麼型態似乎都蠻浪費的,最後選用了最常見的 int 。 :::spoiler `mult()` 程式碼 ```c= bool ubignum_mult(const ubn *a, const ubn *b, ubn **out) { if (!a || !b || !out || !*out) return false; ubn *ans = *out; int alias = 0; if (a == *out) // pointer aliasing alias ^= 1; if (b == *out) // pointer aliasing alias ^= 2; if (alias) { // if alias, allocate space to store the result if (!ubignum_init(&ans)) return false; } /* keep mcand longer then mplier */ const ubn *mcand = a->size > b->size ? a : b; const ubn *mplier = a->size > b->size ? b : a; ubn *prod; if (!ubignum_init(&prod)) goto prod_aloc_failed; if (prod->capacity < mcand->size + 1) if (!ubignum_resize(&prod, mcand->size + 1)) goto prod_resize_failed; /* The outer loop */ for (int i = 0; i < mplier->size; i++) { /* The inner loop explanation: * a b c = mcand->data[2, 1, 0] * * d = mplier->data[0] * -------- * c c = (high, low) denotes result of c * d * b b * + a a * ---------- * x y ? c sum of bb and cc, it may be carry out to x * + a a * * let x be recorded in @carry and y be @overlap * no realloc needed for @prod */ int carry = 0; ubn_unit overlap = 0; for (int j = 0; j < mcand->size; j++) { ubn_unit low, high; // need macro to select appropriate assembly __asm__("mulq %3" : "=a"(low), "=d"(high) : "%0"(mcand->data[j]), "rm"(mplier->data[i])); int cout = 0; ubn_unit_add(low, overlap, carry & 2, &prod->data[j], &cout); carry = (carry << 1) | cout; // update carry where cout is 0 or 1 overlap = high; } prod->data[mcand->size] = overlap + (ubn_unit) carry & 2; // no carry out would be generated if (!ubignum_mult_add(prod, i, &ans)) goto multadd_failed; } *out = ans; return true; multadd_failed: prod_resize_failed: ubignum_free(prod); prod_aloc_failed: if (alias) ubignum_free(ans); return false; } ``` ::: \ 目前的 bug 是 inline asm 沒有正確作用,看起來像是 `low` 跟 `high` 相反了,也可以看出確實是把 `$rdx`, `$rax` 分別接到 `high` 跟 `low` 。 ```shell Breakpoint 2, ubignum_mult (a=0x5555555592a0, b=0x555555559350, out=0x7fffffffdc88) at bignum.h:336 336 ubn_unit_add(low, overlap, carry & 2, &prod->data[j], &cout); (gdb) p mcand->data[j] $1 = 18446744073709551615 (gdb) p i $2 = 0 (gdb) p j $3 = 0 (gdb) p mplier->data[i] $4 = 18446744073709551615 (gdb) p low $5 = 1 (gdb) p high $6 = 18446744073709551614 (gdb) info registers rax rax 0x1 1 (gdb) info registers rdx rdx 0xfffffffffffffffe -2 ``` 難道說它執行 `mulq` instruction 時的資料放置方式是跟原本相反嗎? 我嘗試去查看組合語言的編譯結果,看到: ``` mulq %rdx ``` 雖然它把 `"rm"` 選成 `$rdx` ,但這應該也不會影響指令執行結果才對。 我發現其實它沒有算錯,這個結果是正確的,但因為我對乘法的直覺還不夠,誤用加法的邏輯來思考乘法才誤判結果。 在 pointer aliasing 的情況下無法做到 in place 的算法,否則錯誤處理會出問題。 --- ### 大數運算 第二版 在開發的過程中,因為大數運算難以 debug 只能經由 gdb 慢慢嘗試,因此修修改改花了很多時間,改了又錯又重改的情況多次發生,難以一一列舉,在此大略敘述過程遇到的狀況。 #### pointer aliasing issues 首先是前面提及 pointer aliasing 的問題,這個狀況是我在第一個版本遇到最大的問題,第一次是在開發加法時遇到。例如當想計算 `a = a + b` 時,若加出更大的位數則需要 reallocation ,但中途的 reallocation 失敗會導致傳入值 `a` 被破壞而正確的新 `a` 也沒有算出來,進退兩難的情況。 起初為了應對這個問題,我每次計算都先查看是否有 aliasing ,若有則進行複製, maintain 一份副本以利在 reallocation 失敗時能將數值復原。然而這個方法顯然導致很糟糕的效能,雖然之後為了節省交換的操作而直接計算在新空間,但仍然用了 3 個數。 使用者在使用界面時若指定 `a = a + b` ,那通常直覺也會認為自己使用了 in place 的算法,可是我在實作上卻無論如何都需要 3 個數的空間,違反使用直覺。 :::spoiler 原程式碼 (add) ```c /* (*out) = a + b * Aliasing arguments are acceptable. * If it return true, the result is put at @out. * If return false with alias input, the @out would be unchanged. * If return false without alias input, the @out would return neither answer nor * original out. */ bool ubignum_add(const ubn *a, const ubn *b, ubn **out) { if (!a || !b || !out || !*out) return false; ubn *ans = *out; int alias = 0; if (a == *out) // pointer aliasing alias ^= 1; if (b == *out) // pointer aliasing alias ^= 2; if (alias) { // if alias, allocate space to store the result if (!ubignum_init(&ans)) return false; } int i = 0, carry = 0; for (i = 0; i < min(a->size, b->size); i++) { if (i >= ans->size) { if (__builtin_expect(i >= ans->capacity, 0)) if (!ubignum_resize(&ans, ans->capacity * 2)) goto realoc_failed; ans->size++; } ubn_unit_add(a->data[i], b->data[i], carry, &ans->data[i], &carry); } const ubn *remain = (i == a->size) ? b : a; for (; carry || i < remain->size; i++) { if (i >= ans->size) { if (__builtin_expect(i >= ans->capacity, 0)) if (!ubignum_resize(&ans, ans->capacity * 2)) goto realoc_failed; ans->size++; } ubn_unit_add(remain->data[i], 0, carry, &ans->data[i], &carry); } if (alias) ubignum_free(*out); *out = ans; return true; realoc_failed: if (alias) ubignum_free(ans); return false; } ``` ::: \ 對於每個 $n$ 測量 1000 次 $\mathrm{fib}(n)$ 計算後印出平均消耗時間。 其糟糕效能如下: ![](https://i.imgur.com/xFs08l4.png) 為此我後來發現若是能提早知道 reallocation 的大小,或至少知道該次計算的上界,那麼即可在開始計算之前就進行 reallocation 。如此一來即使配置失敗,那麼直接回傳 false 原先的值也不會被破壞。因此在計算 `a = a + b` 時便可以達成 in place 的目標,使用 2 個數的空間完成加法,也因為每次 `ubignum_add()` 都只會做一次 reallocation ,大幅降低了 memory allocation 的花費。此外也不再需要確認 aliasing ,移除判斷條件後也使程式碼簡潔許多。 :::spoiler 改良 (add) ```c /* (*out) = a + b * Aliasing arguments are acceptable. * If false is returned, the values of (*out) remains unchanged. */ bool ubignum_add(const ubn *a, const ubn *b, ubn **out) { if (!a || !b || !out || !*out) return false; while (__builtin_expect((*out)->capacity < max(a->size, b->size) + 1, 0)) if (!ubignum_resize(out, (*out)->capacity * 2)) return false; if (*out != a && *out != b) // if no pointer aliasing ubignum_zero(*out); int i = 0, carry = 0; for (i = 0; i < min(a->size, b->size); i++) ubn_unit_add(a->data[i], b->data[i], carry, &(*out)->data[i], &carry); const ubn *const remain = (i == a->size) ? b : a; for (; i < remain->size; i++) ubn_unit_add(remain->data[i], 0, carry, &(*out)->data[i], &carry); (*out)->size = remain->size; if (carry) { (*out)->data[i] = 1; (*out)->size++; } return true; } ``` ::: \ 效能: ![](https://i.imgur.com/jFQsPRi.png) 相比原先的版本加速將近十倍之多。 #### 列印 第二個困難出在列印,雖然我的 `ubignum` 已能進行數種運算,也正確地將數值儲存在檔案中,但它們就是一串由上百、上千個 0, 1 組成的資料,難以為人類所理解。 為了將很長的二進位轉成十進制,我左思右想許久,最後認為終究是離不開除法的命運,還是得乖乖寫一個除法。我想起前幾周的測驗題中,恰好有[計算除法的內容](https://hackmd.io/u4lvdDcAQeC8UoQ1dT7LUg?view#%E6%B8%AC%E9%A9%97-5),於是我就嘗試套用這個模式,意圖以位移和減法來達成除法運算。此外因為目前沒有其它除法的應用場景,所以可以只考量「除以 10 」的情況並進行特化。 為了實作出 `ubignum_divby_ten()` ,首先需要 `ubignum_left_shift()` 及 `ubignum_sub()` 。 ##### ubignum_sub() 由於我的大數體系原先完全不存在任何負的概念,因此想引入減法也需要做一些權衡。 不過因為除法中使用的減法會保證結果為非負 (餘數) ,如此一來就不必引入負數,減法直接使用二補數加法進行,並把最高的進位拋棄。基於直接假設減法都是大減小,原本的 ubignum 體系可以維持不變。 以下程式碼都展示解決前述 pointer aliasing 問題後的版本。 :::spoiler `sub()` 程式碼 ```c bool ubignum_sub(const ubn *a, const ubn *b, ubn **out) { if (!a || !b || !out || !*out) return false; /* ones' complement of b */ ubn *cmt; if (!ubignum_init(&cmt)) goto cmt_failed; if (__builtin_expect(cmt->capacity < a->size, 0)) if (!ubignum_resize(&cmt, a->size)) goto cmt_realoc_failed; cmt->size = a->size; for (int i = 0; i < b->size; i++) cmt->data[i] = ~b->data[i]; for (int i = b->size; i < a->size; i++) cmt->data[i] = CPU_64 ? UINT64_MAX : UINT32_MAX; int carry = 1; for (int i = 0; i < a->size; i++) // compute result and store in cmt ubn_unit_add(a->data[i], cmt->data[i], carry, &cmt->data[i], &carry); // the last carry is discarded for (int i = a->size - 1; i >= 0; i--) if (!cmt->data[i]) cmt->size--; ubignum_free(*out); *out = cmt; return true; cmt_realoc_failed: ubignum_free(cmt); cmt_failed: return false; } ``` ::: ##### ubignum_left_shift() :::spoiler `left_shift()` 程式碼 ```c= /* left shift a->data by d bit */ bool ubignum_left_shift(const ubn *a, int d, ubn **out) { if (!a || d < 0 || !out || !*out) return false; else if (d == 0) { if (a == *out) return true; if (__builtin_expect((*out)->capacity < a->size, 0)) if (!ubignum_resize(out, a->size)) return false; for (int i = 0; i < a->size; i++) (*out)->data[i] = a->data[i]; (*out)->size = a->size; return true; } const int chunk_shift = d / ubn_unit_bit; const int shift = d % ubn_unit_bit; const int new_size = a->size + chunk_shift + 1; if (__builtin_expect((*out)->capacity < new_size, 0)) if (!ubignum_resize(out, new_size)) return false; int ai = new_size - chunk_shift - 1; // note a->size may be changed due to // aliasing, should not use a->size int oi = new_size - 1; /* copy data from a to (*out) */ if (shift) { (*out)->data[oi--] = a->data[ai - 1] >> (ubn_unit_bit - shift); for (ai--; ai > 0; ai--) (*out)->data[oi--] = (a->data[ai] << shift) | (a->data[ai - 1] >> (ubn_unit_bit - shift)); (*out)->data[oi--] = a->data[ai] << shift; // ai is now 0 } else { while (ai >= 0) (*out)->data[oi--] = a->data[ai--]; } /* end copy */ while (oi >= 0) // set remaining part of (*out) to 0 (*out)->data[oi--] = 0; (*out)->size = new_size; if ((*out)->data[(*out)->size - 1] == 0) // if MS chunk is 0 (*out)->size--; return true; } ``` ::: \ 大致上就是一些搬移的操作,在此解釋 29 至 33 行的操作,假設我們要將 `a` : ``` 00000000 11111111 01001111 00000000 ``` 左移 3 位。 - 注意到 chunk 編號是 $(3, 2, 1, 0)$ ,左邊為 most significant 。 每次我們需要目前 chunk 的最低 5 位放到高 (置左) 與其右方 chunk 的最高 3 位放到低 (置右) 進行合併,以編號 $2$ 為例即是: ``` 11111... from [2] or .....010 from [1] ---------- 11111010 ``` - 取最低 5 位放到高:左移 $3$ - 取最高 3 位放到低:右移 $(8-3)$ 最高跟最低的 chunk 都只有分別一邊所以需要特殊處理,最終得: ``` 00000111 11111010 01111000 00000000 ``` 由於我們每次都不會去取目標左方的新 chunk ,因此 pointer aliasing 在此並不會造成問題。 另外,考量到當 `shift` 為 0 時,原本的程式碼會導致數值右移超過自己的 size ,產生 undefined behavior ,因此需要第 28 行的 if..else 來進行 branch 。 ##### ubignum_divby_ten() 概念源於 [quiz3](https://hackmd.io/u4lvdDcAQeC8UoQ1dT7LUg?view#%E6%B8%AC%E9%A9%97-5) 的算法。 在撰寫的過程中,再多寫了一個 `ubignum_compare()` 函式以利比較。 :::spoiler `divby_ten()` 程式碼 ```c /* a / 10 = (*quo)...rmd */ bool ubignum_divby_ten(const ubn *a, ubn **quo, int *rmd) { if (!a || !a->size || !quo || !*quo || !rmd) return false; ubn *ans, *dvd, *suber, *ten; if (!ubignum_init(&ans)) return false; if (!ubignum_resize(&ans, a->size)) goto cleanup_ans; if (!ubignum_init(&dvd)) goto cleanup_ans; if (!ubignum_resize(&dvd, a->size)) goto cleanup_dvd; if (!ubignum_init(&suber)) goto cleanup_dvd; if (!ubignum_resize(&suber, dvd->capacity + 1)) goto cleanup_suber; if (!ubignum_init(&ten)) goto cleanup_suber; ubignum_uint(ten, 10); for (int i = 0; i < a->size; i++) dvd->data[i] = a->data[i]; dvd->size = a->size; int shift = dvd->size * ubn_unit_bit - 4; // can be improved by clz() ubignum_left_shift(ten, shift, &suber); while (ubignum_compare(dvd, ten) >= 0) { // if dvd >= 10 while (ubignum_compare(dvd, suber) < 0) { shift--; ubignum_left_shift(ten, shift, &suber); } ans->data[shift / ubn_unit_bit] |= (ubn_unit) 1 << (shift % ubn_unit_bit); ubignum_sub(dvd, suber, &dvd); } ans->size = a->size; if (ans->data[a->size - 1] == 0) ans->size--; *rmd = (int) dvd->data[0]; ubignum_free(ten); ubignum_free(suber); ubignum_free(dvd); ubignum_free(*quo); *quo = ans; return true; cleanup_suber: ubignum_free(suber); cleanup_dvd: ubignum_free(dvd); cleanup_ans: ubignum_free(ans); return false; } ``` ::: \ 之後同樣如 quiz3 引入 `builtin_clz()` 並重新包裝成 `ubignum_clz()` 進行改進,關鍵段落變化: ```diff for (int i = 0; i < a->size; i++) dvd->data[i] = a->data[i]; dvd->size = a->size; - int shift = dvd->size * ubn_unit_bit - 4; - ubignum_left_shift(ten, shift, &suber); while (ubignum_compare(dvd, ten) >= 0) { // if dvd >= 10 - while (ubignum_compare(dvd, suber) < 0) { - shift--; - ubignum_left_shift(ten, shift, &suber); - } + int shift = dvd->size * ubn_unit_bit - ubignum_clz(dvd) - 4; + ubignum_left_shift(ten, shift, &suber); + if (ubignum_compare(dvd, suber) < 0) + ubignum_left_shift(ten, --shift, &suber); ans->data[shift / ubn_unit_bit] |= (ubn_unit) 1 << (shift % ubn_unit_bit); ubignum_sub(dvd, suber, &dvd); } ans->size = a->size; // FIX: ans = 0 should be considered if (ans->data[a->size - 1] == 0) ans->size--; *rmd = (int) dvd->data[0]; ``` ##### ubignum_2decimal() 在實作完成除以 10 之後,我們即可透過不斷除以 10 來將二進位數轉成字串印出。 :::spoiler `2decimal()` 程式碼 ```c /* convert the ubn to ascii string */ char *ubignum_2decimal(const ubn *N) { if (!N) return NULL; if (ubignum_iszero(N)) { #if DEBUG char *ans = (char *) calloc(sizeof(char), 2); #else char *ans = (char *) kcalloc(sizeof(char), 2, GFP_KERNEL); #endif if (!ans) return NULL; ans[0] = '0'; ans[1] = 0; return ans; } ubn *dvd = NULL; if (!ubignum_init(&dvd)) return NULL; if (!ubignum_resize(&dvd, N->size)) goto cleanup_dvd; for (int i = 0; i < N->size; i++) dvd->data[i] = N->data[i]; dvd->size = N->size; /* Let n be the number. * digit = 1 + log_10(n) = 1 + \frac{log_2(n)}{log_2(10)} * log_2(10) \approx 3.3219 \approx 7/2, we simply choose 3 */ unsigned digit = (ubn_unit_bit * N->size / 3) + 1; #if DEBUG char *ans = (char *) calloc(sizeof(char), digit); #else char *ans = (char *) kcalloc(sizeof(char), digit, GFP_KERNEL); #endif if (!ans) goto cleanup_dvd; /* convert 2-base to 10-base */ int index = 0; while (dvd->size) { int rmd; if (__builtin_expect(!ubignum_divby_ten(dvd, &dvd, &rmd), 0)) goto cleanup_ans; ans[index++] = rmd | '0'; } int len = index; /* reverse string */ index = len - 1; for (int i = 0; i < index; i++, index--) { char tmp = ans[i]; ans[i] = ans[index]; ans[index] = tmp; } ubignum_free(dvd); return ans; cleanup_ans: #if DEBUG free(ans); #else kfree(ans); #endif cleanup_dvd: ubignum_free(dvd); return NULL; } ``` ::: \ 目前我認為字串反轉的方式還可以再改良,但因為這個部份並非 fib 運算的主軸,因此暫時就先如此。 --- ### fibdrv 效能分析與改進 原先想與[範例程式碼](https://github.com/AdrianHuang/fibdrv/tree/big_number)比較,然而稍微更改後在自己電腦實驗的結果卻非常糟: ![](https://i.imgur.com/BDVkuDe.png) ![](https://i.imgur.com/Kd0V1c9.png) 因此目前先以自己改進為方向, 由於速度的提昇,我們現在將範圍拉到 $n=1000$ 來測試。 ![](https://i.imgur.com/xE4ejiO.png) 對於目前 fib sequence 逐項往上加的算法 ```c static inline void ubn_unit_add(const ubn_unit a, const ubn_unit b, const int cin, ubn_unit *sum, int *cout) { const ubn_unit_extend SUM = (ubn_unit_extend) a + b + cin; *sum = SUM; *cout = SUM >> ubn_unit_bit; } ``` 我認為仍有可能改進的部份: 1. 使用 inline asm 來做加法,如此 (在 64 位元情況) 可以避免用到 `uint128_t` 今天 (3/24) 上課的時候,聽到老師介紹 gcc 有 builtin 的 overflow 函式,藉由這點我們即可改寫原本的 `ubn_unit_add()`,不需要低到 asm 的層級,相對 asm 的優點在於 ==architecture independence==。然而這裡是需要 3 個數相加,因此需要呼叫兩次 `builtin_uadd_overflow()`。 ```c static inline void ubn_unit_add(const ubn_unit a, const ubn_unit b, const int cin, ubn_unit *sum, int *cout) { *cout = __builtin_uaddll_overflow(a, cin, sum); *cout |= __builtin_uaddll_overflow(*sum, b, sum); } ``` ![](https://i.imgur.com/uD7JceJ.png) 測試後可以發現在較大的 case 上,使用 `builtin_uadd_overflow()` 可以得到略佳的效果。再測到更大一些得: ![](https://i.imgur.com/BUdjRc1.png) --- 當我嘗試將 $n$ 提高到 3000 時,卻觀察到一個奇怪的現象,就是在 1000 之後,計算時間就不再有提昇的趨勢,如圖: ![](https://i.imgur.com/FhKNPup.png) 我覺得這非常不合理,因為以我的算法對於越大的數肯定有越高的計算量。目前想到的可能出在,因數值太大而產生 overflow ,但 20000 $\times$ 1000 也仍不到 32-bit int 的上界,應不可能超出 long long ,且可以從圖中見得有些數字遠遠超過水平線卻仍被印了出來。或許需要更理解 ktime_t 相關的知識。 > On 64-bit systems, a ktime_t is really just a 64-bit integer value in nanoseconds. > --- [The high-resolution timer API](https://lwn.net/Articles/167897/) 以這個資料來說,顯然更不是時間變數資料型態的問題了。 :::success 經過數天之後我終於發現問題,是 **fibdrv.c** 中的 `MAX_LENGTH` 。之前改過一次之後我就忘記有這個參數,目前直接設定成 1000000 以利後續實驗。 更正後試跑已可正常進行統計: ![](https://i.imgur.com/TJMiICj.png) ::: --- 嘗試引入 fast doubling 算法加速 ![](https://i.imgur.com/EBIoEDE.png) 效能改善的情況高於我的預期,我原先以為 fast doubling 要到很大的 case 才會有顯著的改善。但從實驗結果可以發現,在 $n=400$ 左右兩種算法即出現執行時間的交叉。 試著仿造使用 `perf stat` 測量我寫的 fast doubling 算法到 $n=3000$ : ```shell $ sudo perf stat -e cycles,instructions,cache-misses,cache-references,branch-instructions,branch-misses ./exp 1 > /dev/null Performance counter stats for './exp 1': 118,264,893,816 cycles 264,687,477,706 instructions # 2.24 insn per cycle 15,142,354 cache-misses # 15.801 % of all cache refs 95,829,386 cache-references 38,631,044,731 branch-instructions 69,373,051 branch-misses # 0.18% of all branches 35.744624633 seconds time elapsed 0.931959000 seconds user 34.806501000 seconds sys ``` 可以看到我的實作程式碼有高達 15% 的 cache miss 。 我想利用 cachegrind 來測試是哪個函式發生最多 cache miss ,結果如下: ```shell $ sudo valgrind --tool=cachegrind ./exp 1 > /dev/null ==37329== Cachegrind, a cache and branch-prediction profiler ==37329== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al. ==37329== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info ==37329== Command: ./exp 1 ==37329== --37329-- warning: L3 cache found, using its data for the LL simulation. ==37329== ==37329== I refs: 69,563,678 ==37329== I1 misses: 1,159 ==37329== LLi misses: 1,128 ==37329== I1 miss rate: 0.00% ==37329== LLi miss rate: 0.00% ==37329== ==37329== D refs: 28,270,312 (24,764,894 rd + 3,505,418 wr) ==37329== D1 misses: 3,323 ( 2,585 rd + 738 wr) ==37329== LLd misses: 2,733 ( 2,070 rd + 663 wr) ==37329== D1 miss rate: 0.0% ( 0.0% + 0.0% ) ==37329== LLd miss rate: 0.0% ( 0.0% + 0.0% ) ==37329== ==37329== LL refs: 4,482 ( 3,744 rd + 738 wr) ==37329== LL misses: 3,861 ( 3,198 rd + 663 wr) ==37329== LL miss rate: 0.0% ( 0.0% + 0.0% ) $ sudo cg_annotate cachegrind.out.37329 -------------------------------------------------------------------------------- I1 cache: 32768 B, 64 B, 8-way associative D1 cache: 32768 B, 64 B, 8-way associative LL cache: 6291456 B, 64 B, 12-way associative Command: ./exp 1 Data file: cachegrind.out.37329 Events recorded: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw Events shown: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw Event sort order: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw Thresholds: 0.1 100 100 100 100 100 100 100 100 Include dirs: User annotated: Auto-annotation: off -------------------------------------------------------------------------------- Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw -------------------------------------------------------------------------------- 69,563,678 1,159 1,128 24,764,894 2,585 2,070 3,505,418 738 663 PROGRAM TOTALS -------------------------------------------------------------------------------- Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw file:function -------------------------------------------------------------------------------- 33,078,041 4 4 15,024,011 1 0 3,012,013 0 0 ???:main 27,000,072 1 1 6,000,016 0 0 0 0 0 /build/glibc-sMfBJT/glibc-2.31/io/../sysdeps/unix/sysv/linux/write.c:write 6,054,221 27 24 3,027,082 11 0 15 1 1 ???:??? 1,493,991 48 47 398,997 19 6 261,001 6 2 /build/glibc-sMfBJT/glibc-2.31/stdio-common/vfprintf-internal.c:__vfprintf_internal 648,170 4 4 141,032 4 1 114,012 1 0 /build/glibc-sMfBJT/glibc-2.31/libio/fileops.c:_IO_file_xsputn@@GLIBC_2.2.5 418,262 3 3 30,733 2 1 24,733 0 0 /build/glibc-sMfBJT/glibc-2.31/stdio-common/_itoa.c:_itoa_word 228,000 5 5 21,000 3 2 0 0 0 /build/glibc-sMfBJT/glibc-2.31/string/../sysdeps/x86_64/multiarch/strchr-avx2.S:__strchrnul_avx2 163,977 2 2 29,983 0 0 17,985 63 63 /build/glibc-sMfBJT/glibc-2.31/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms 90,000 4 4 18,000 3 0 33,000 3 2 /build/glibc-sMfBJT/glibc-2.31/stdio-common/printf.c:printf 78,000 1 1 24,000 0 0 6,000 0 0 /build/glibc-sMfBJT/glibc-2.31/stdio-common/../libio/libioP.h:__vfprintf_internal 71,545 10 10 14,316 1,080 893 15 2 0 /build/glibc-sMfBJT/glibc-2.31/elf/dl-addr.c:_dl_addr ``` :::warning cachegrind 似乎無法配合 taskset 使用 ::: :::success 改用 ```shell sudo taskset -c 1 valgrind --tool=cachegrind ./exp 1 > /dev/null ``` 即可正常使用 cachegrind ,推測為 Valgrind 實作方式的影響 ::: 測試結果竟顯示 data cache miss 為 0% ,與從 perf stat 收到的資訊天差地遠。 可以觀察到 cache reference 在 perf 顯示 95,829,386 ;而在 cachegrind 顯示為 28,270,312 。 :::spoiler 進行更多次 perf 測試,發現每次的 cache references 竟有巨大落差: ```shell Performance counter stats for './exp 1': 112,781,466,006 cycles 261,497,199,771 instructions # 2.32 insn per cycle 6,962,452 cache-misses # 11.023 % of all cache refs 63,160,276 cache-references 38,179,989,631 branch-instructions 67,070,255 branch-misses # 0.18% of all branches 33.386655206 seconds time elapsed 0.975406000 seconds user 32.384290000 seconds sys ``` ```shell Performance counter stats for './exp 1': 108,714,836,439 cycles 264,555,729,876 instructions # 2.43 insn per cycle 1,520,786 cache-misses # 14.088 % of all cache refs 10,795,259 cache-references 38,609,436,202 branch-instructions 63,560,865 branch-misses # 0.16% of all branches 32.059672731 seconds time elapsed 0.847985000 seconds user 31.207482000 seconds sys ``` ::: \ 我不曉得為什麼會出現這樣的情況。 :::success 後續發現是測試次數過少而導致的數值不穩定,後面會補充結果。 ::: --- 回去觀摩[參考共筆](https://hackmd.io/@KYWeng/rkGdultSU#%E6%94%B9%E5%96%84%E6%96%B9%E6%A1%88-1---%E6%94%B9%E5%AF%AB-bn_fib_fdoubling-%E5%AF%A6%E4%BD%9C%E7%9A%84%E6%96%B9%E5%BC%8F),發現因為我先前閱讀的重點都在運算加速,沒注意到資料搬移優化的部份,嘗試將其納入我的實作: ```diff for (int currbit = 1 << (32 - __builtin_clzll(k) - 1 - 1); currbit; currbit = currbit >> 1) { /* compute 2n-1 */ ubignum_mult(fast[1], fast[1], &fast[0]); ubignum_mult(fast[2], fast[2], &fast[3]); ubignum_add(fast[0], fast[3], &fast[3]); /* compute 2n */ ubignum_left_shift(fast[1], 1, &fast[4]); ubignum_add(fast[4], fast[2], &fast[4]); ubignum_mult(fast[4], fast[2], &fast[4]); n *= 2; if (k & currbit) { ubignum_add(fast[3], fast[4], &fast[0]); n++; - ubignum_copy(fast[2], fast[0]); - ubignum_copy(fast[1], fast[4]); + ubignum_swapptr(&fast[2], &fast[0]); + ubignum_swapptr(&fast[1], &fast[4]); } else { - ubignum_copy(fast[2], fast[4]); - ubignum_copy(fast[1], fast[3]); + ubignum_swapptr(&fast[2], &fast[4]); + ubignum_swapptr(&fast[1], &fast[3]); } } ``` 效能差異: ![](https://i.imgur.com/vchjKJS.png) 我測出來的結果和參考共筆的結論不符合,並沒有出現效能大幅改善的情形。 perf 測起來仍然非常不穩: :::spoiler perf 測試 ```shell $ sudo perf stat -e cycles,instructions,cache-misses,cache-references,branch-instructions,branch-misses ./exp 1 > /dev/null Performance counter stats for './exp 1': 99,537,523,424 cycles 241,725,789,043 instructions # 2.43 insn per cycle 1,579,228 cache-misses # 53.451 % of all cache refs 2,954,538 cache-references 34,729,788,080 branch-instructions 72,001,270 branch-misses # 0.21% of all branches 29.372597211 seconds time elapsed 0.963940000 seconds user 28.402259000 seconds sys ``` ```shell Performance counter stats for './exp 1': 110,100,048,298 cycles 251,991,005,629 instructions # 2.29 insn per cycle 11,886,534 cache-misses # 14.939 % of all cache refs 79,566,707 cache-references 36,196,420,296 branch-instructions 71,695,967 branch-misses # 0.20% of all branches 32.880962908 seconds time elapsed 0.939414000 seconds user 31.908111000 seconds sys ``` ::: \ 嘗試在 perf 命令設置參數 `-r 10` 來降低測量數值的波動,為避免時間過長,改為測試 fib 到 $n=1000$ 。 經過這個設定後,每次測試間得到的數值就不再有顯著差異,至少每個 events 的前兩個數字會保持不變。 ```shell $ bash measure ... Performance counter stats for 'taskset -c 1 ./exp 1' (10 runs): 23,018,839,194 cycles ( +- 0.27% ) 49,622,366,443 instructions # 2.16 insn per cycle ( +- 0.00% ) 1,634,045 cache-misses # 20.062 % of all cache refs ( +- 6.49% ) 8,145,159 cache-references ( +- 5.01% ) 8,319,541,416 branch-instructions ( +- 0.00% ) 13,417,461 branch-misses # 0.16% of all branches ( +- 0.18% ) 14.4259 +- 0.0388 seconds time elapsed ( +- 0.27% ) ``` 確立測試方法之後再回去重新測量使用 `ubignum_copy()` 的版本,得: ```shell $ bash measure ... Performance counter stats for 'taskset -c 1 ./exp 1' (10 runs): 25,215,127,347 cycles ( +- 2.31% ) 51,971,604,860 instructions # 2.06 insn per cycle ( +- 0.00% ) 2,423,932 cache-misses # 13.441 % of all cache refs ( +- 18.29% ) 18,033,276 cache-references ( +- 24.11% ) 8,811,912,111 branch-instructions ( +- 0.00% ) 13,058,167 branch-misses # 0.15% of all branches ( +- 1.10% ) 15.804 +- 0.367 seconds time elapsed ( +- 2.32% ) ``` 對照之下,從舊版到新版雖然 cache-miss-rate 由 13% 增長到 20% ,但我們可以注意到 cache-references 總數大約降低了一半,我推測這個差距來自於 copy 的過程對資料的 access 。 因此就 data cache 而言,因為省下這些操作,新版仍是更佳的實作方式。 20% 的 cache miss 非常驚人,因此目前最直接的改進目標就是要想辦法降低 miss rate 。 --- 引入 `ubignum_square()` ```shell Performance counter stats for 'taskset -c 1 ./exp 1' (10 runs): 25,405,898,533 cycles ( +- 1.21% ) 53,288,279,701 instructions # 2.10 insn per cycle ( +- 0.00% ) 1,370,567 cache-misses # 20.309 % of all cache refs ( +- 34.02% ) 6,748,682 cache-references ( +- 32.89% ) 9,033,980,854 branch-instructions ( +- 0.00% ) 12,907,672 branch-misses # 0.14% of all branches ( +- 0.62% ) 15.921 +- 0.193 seconds time elapsed ( +- 1.21% ) ``` 差異實在也不是很明顯,甚至是落後於全數使用 `ubignum_mult()` 的版本。 雖然我想要嘗試測到大一點的 case , --- 執行測試有時會出現頻繁的失敗,不確定原因是什麼,但與引入 `ubignum_square()` 應該沒有關聯,因為即使改回 mult 的版本也沒有恢復。其錯誤情景如下: ```shell $ sudo perf stat -r 10 -e cycles,instructions,cache-misses,cache-references,branch-instructions,branch-misses taskset -c 0 ./exp 1 > /dev/null taskset: 強制結束 taskset: 強制結束 taskset: 強制結束 taskset: 強制結束 taskset: 強制結束 taskset: 強制結束 taskset: 強制結束 taskset: 強制結束 taskset: 強制結束 taskset: 強制結束 ``` 我想看看是否是 `fib_fast()` 的計算過程中的問題,例如 reallocation 失敗等等,因此嘗試引入 `flag` 變數來紀錄: :::spoiler `fib_fast()` ```c static ubn *fib_fast(long long k) { ubn *fast[5] = {NULL}; bool flag = true; if (k == 0) { flag &= ubignum_init(&fast[0]); ubignum_zero(fast[0]); return fast[0]; } else if (k == 1) { flag &= ubignum_init(&fast[0]); ubignum_uint(fast[0], 1); return fast[0]; } for (int i = 0; i < 5; i++) flag &= ubignum_init(&fast[i]); ubignum_zero(fast[1]); ubignum_uint(fast[2], 1); int n = 1; for (int currbit = 1 << (32 - __builtin_clzll(k) - 1 - 1); currbit; currbit = currbit >> 1) { /* compute 2n-1 */ flag &= ubignum_square(fast[1], &fast[0]); flag &= ubignum_square(fast[2], &fast[3]); //flag &= ubignum_mult(fast[1], fast[1], &fast[0]); //flag &= ubignum_mult(fast[2], fast[2], &fast[3]); flag &= ubignum_add(fast[0], fast[3], &fast[3]); /* compute 2n */ flag &= ubignum_left_shift(fast[1], 1, &fast[4]); flag &= ubignum_add(fast[4], fast[2], &fast[4]); flag &= ubignum_mult(fast[4], fast[2], &fast[4]); n *= 2; if (k & currbit) { flag &= ubignum_add(fast[3], fast[4], &fast[0]); n++; ubignum_swapptr(&fast[2], &fast[0]); ubignum_swapptr(&fast[1], &fast[4]); } else { ubignum_swapptr(&fast[2], &fast[4]); ubignum_swapptr(&fast[1], &fast[3]); } } ubignum_free(fast[0]); ubignum_free(fast[1]); ubignum_free(fast[3]); ubignum_free(fast[4]); if(!flag) printk(KERN_INFO "flag is %d\n", flag); return fast[2]; } ``` ::: \ 執行後於 `dmesg` 中並未看到任何 `flag` 相關的報錯,不過出現了一段紅底黑字: ``` [ 990.777318] BUG: unable to handle page fault for address: ffff96d435beb220 [ 990.777322] #PF: supervisor read access in kernel mode [ 990.777324] #PF: error_code(0x0000) - not-present page ``` --- ## June Rework 隔一段時間沒有著手於這份專案,回來先進行一些實驗重做等等,並補上一些之前忽略的內容。 由於第一次做的時候認為需要專注於提昇運算 Fibonacci number 的速度,而只測量計算消耗的時間。然而這類 character device 在使用上總是要 read,因此應該也進行實驗量測 user space 到 kernel space 的時間、以及資料複製的時間等等納入成本計算,進行效能的全盤考量,完整地進行實驗才能正確認知到效能瓶頸何在。 ### user-kernel transition time #### 時間量測 首先實驗量測由 user mode 進出 kernel mode 的耗時 :::spoiler code ```c for (int i = 0; i <= offset; i++) { lseek(fd, i, SEEK_SET); unsigned long long ucons = 0, kcons = 0; for (int j = 0; j < TEST_NUM; j++) { struct timespec t1, t2; clock_gettime(CLOCK_MONOTONIC, &t1); kcons += write(fd, NULL, select); clock_gettime(CLOCK_MONOTONIC, &t2); ucons += (unsigned long long) (t2.tv_sec * 1e9 + t2.tv_nsec) - (t1.tv_sec * 1e9 + t1.tv_nsec); } printf("%d,%llu,%llu,%llu\n", i, kcons / TEST_NUM, ucons / TEST_NUM, (ucons - kcons)/TEST_NUM); } ``` ```c /* write operation is skipped */ /* try to measure time within this function */ static ssize_t fib_write(struct file *file, const char *buf, size_t size, loff_t *offset) { ubn *N = NULL; // do not initialize ktime_t kt = ktime_get(); switch (size) { case 0: N = fib_sequence(*offset); break; case 1: N = fib_fast(*offset); break; default: return 0; } ubignum_free(N); kt = ktime_sub(ktime_get(), kt); return (ssize_t) ktime_to_ns(kt); } ``` ::: \ 以 iteration 算法測試,並使用較小的 offset 來突顯該數值,兩個時間相減大致可得 (user $\to$ kernel) $+$ (kernel $\to$ user) 的時間。 ![](https://i.imgur.com/XbQbwnF.png) 即使用上了[作業說明](https://hackmd.io/@sysprog/linux2022-fibdrv)和[參考共筆](https://hackmd.io/@KYWeng/rkGdultSU)提及的所有方式,仍會產生這種巨大的 peak。從 csv 檔觀察,發現其位於 $n = 13,43,44,63,64$ - fib(13) = 233 - fib(43) = 433494437 - fib(44) = 701408733 - fib(63) = 6557470319842 - fib(64) = 10610209857723 第一猜想是使用 `ubignum_resize()` 導致的消耗,但結果這幾個值看起來也並非是 `UINT32_MAX` 或 `UINT64_MAX`,從此 value 上看不出有特別意義。接著又測試幾次,發現每次 peak 都產生在不同地方。 試著排除 peak 點再次作圖,結果發現在其它情況我的量測也不太穩定。 ![](https://i.imgur.com/bdrU2Lr.png) 換成另一種叫簡易的方式來測量:將 `fib_write()` 函式清空並直接把 user space 下的來回時間當成是 transition time。 :::success 注意到這時 x 軸 $n$ 不再有意義 ::: ![](https://i.imgur.com/BpktAXO.png) 無論怎麼做都一直量到 peak,但可以看出這個來回時間幾乎都非常貼近 1000 ns。 為了考察這個問題,我試著把每次測試中過大的的耗時數值印出來,結果發現偶爾會出現 case 有十倍左右的時間消耗。於是我了解到[排除統計上的離群值](https://hackmd.io/@sysprog/linux2022-fibdrv#%E7%94%A8%E7%B5%B1%E8%A8%88%E6%89%8B%E6%B3%95%E5%8E%BB%E9%99%A4%E6%A5%B5%E7%AB%AF%E5%80%BC)有其必要性,果然是不該偷吃步。 #### 排除統計離群值 在統計的過程中,對於某些不特定的 $n$ 會出現標準差大於平均值的奇怪現象,於是我同樣把過程中的資料全部存下來,再去奇怪的 $n$ 看看出了什麼問題,借助 `sprintf()` 的小技巧來把資料全部存起來: ```c char name[128]; sprintf(name, "data/%d.dat", i); FILE *f = fopen(name, "w"); ``` 我這裡跑完時間測量後,看到 $n=67$ 時 $\mu=1073, \; \sigma=1322$,於是查詢 **data/67.dat**,原來其中一次的耗時測量到 42656。然而經過離群值的排除,調整後的平均值正常地落在了 1085。 :::spoiler outlier elimination ```c /* Eliminate 5% outliers and compute the average. */ uint64_t average(int64_t *nums, int numsSize) { int64_t mu = 0; for (int i = 0; i < numsSize; i++) mu += nums[i]; mu /= numsSize; int64_t deviation = 0; for (int i = 0; i < numsSize; i++) deviation += (nums[i] - mu) * (nums[i] - mu) / numsSize; deviation = sqrt(deviation); for (int i = 0; i < numsSize; i++) if (nums[i] >= mu + 2 * deviation || nums[i] <= mu - 2 * deviation) nums[i] = -1; // printf("%ld, %ld\t", mu, deviation); uint64_t ans = 0; for (int i = 0; i < numsSize; i++) if (nums[i] != -1) ans += nums[i]; return ans / (numsSize / 20 * 19); } ``` 其中 `sqrt()` 為我自己實作的整數開根號函式。 ::: \ 至此我們成功排除掉了 peak 的問題,於是耗時的細微變化便可在圖表上呈現出來。 ![](https://i.imgur.com/32bLFD9.png) 圖表的刻度變小之後,可以看出測量上有 100 ~ 300 ns 的起伏,仍無法達到如[ KYG 同學的平滑結果](https://hackmd.io/@KYWeng/rkGdultSU#user-space-%E8%88%87-kernel-space-%E7%9A%84%E5%82%B3%E9%81%9E%E6%99%82%E9%96%93)。這裡由於任何 $n$ 的輸入都是直接 return 沒有計算,因此最理想的狀況應該要是一條水平線。 打算做一個 commit 時收到警告訊息: ``` Dangerous function detected in /home/tinyynoob/code/fibdrv/exp.c 54: sprintf(name, "data/%d.dat", i); ``` 於是改為使用 `snprint()` 並順利完成 commit。 #### 於 desktop 量測 由於怎麼測總是拿到漂浮的數據,我想 laptop 或許天生就較為不穩定,因此我想從 laptop 改成用 desktop 跑跑看是否會出現相同的結果。 :::spoiler cpu info ``` 架構: x86_64 CPU 作業模式: 32-bit, 64-bit Byte Order: Little Endian Address sizes: 39 bits physical, 48 bits virtual CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數: 1 每通訊端核心數: 4 Socket(s): 1 NUMA 節點: 1 供應商識別號: GenuineIntel CPU 家族: 6 型號: 94 Model name: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz 製程: 3 CPU MHz: 2511.819 CPU max MHz: 3600.0000 CPU min MHz: 800.0000 BogoMIPS: 6399.96 虛擬: VT-x L1d 快取: 128 KiB L1i 快取: 128 KiB L2 快取: 1 MiB L3 快取: 6 MiB NUMA node0 CPU(s): 0-3 Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cach e flushes, SMT disabled Vulnerability Mds: Mitigation; Clear CPU buffers; SMT disabled Vulnerability Meltdown: Mitigation; PTI Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled v ia prctl and seccomp Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_ FW, STIBP disabled, RSB filling Vulnerability Srbds: Mitigation; Microcode Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT disabled Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtr r pge mca cmov pat pse36 clflush dts acpi mmx f xsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rd tscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperf mperf pni pclmulqdq dtes64 monitor ds_cpl vmx s mx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid s se4_1 sse4_2 x2apic movbe popcnt tsc_deadline_t imer aes xsave avx f16c rdrand lahf_lm abm 3dno wprefetch cpuid_fault invpcid_single pti ssbd i brs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsa ves dtherm ida arat pln pts hwp hwp_notify hwp_ act_window hwp_epp md_clear flush_l1d ``` ::: \ 然而同樣的程式碼和 scripts 移過來執行竟然出現 segmentation fault。 我使用 gdb 工具查詢,發現出錯在 `fprintf()` 處,我的 `f` 意圖打開目錄 **data** 下的檔案,然而不存在這個 directory 因此產生記憶體區段錯誤,`mkdir data` 後解決。 跑出來的圖表: ![](https://i.imgur.com/UlmxlW0.png) 雖然看似仍有很大的浮動,但仔細觀察可以注意到 y 軸的刻度由 50 降至 10,確實穩定了不少。從比較大的 scale 看是像這樣子: ![](https://i.imgur.com/my86CIE.png) 雖然還沒辦法變成完全的水平線,但已從最初上千 ns 的 peak 改善許多。 > 版本 commit id:b80d536b08bd1a3c3325b53d684f97e39ac734f9 接著再把之前從 `fib_write()` 清空的內容補回進行測試: ![](https://i.imgur.com/uWsgTOd.png) 可以看出無論 $n$ 的變化,user space 到 kernel space 的來回時間穩定落在 600 ns 左右。 即使一路算到 $n=500$ 也是如此: ![](https://i.imgur.com/GHOdAgH.png) > 版本 id:766a4b80f6fbcf8a0f5ee0931113b16e6ea629ec ### 運算時間 繼續使用 desktop 來重新量測兩種 Fibonacci number 算法在 kernel space 的耗時: ![](https://i.imgur.com/P2ik2ZG.png) 像之前一樣看到我實作的 fast fib 算法會出現這種階梯形的跳躍現象,這次我想要好好研究這些階梯的成因。 先從小一點的 case 開始探討: ![](https://i.imgur.com/6kLMTPL.png) 從 **fast.csv** 查看發現跳點出現在 $n=2,4,8,16,32,64$,每次往上跳 500 或 1000 ns。每個跳躍竟然都恰好落在 2 的冪,這肯定有很好的理由可以解釋。 仔細想想之後也很直觀,因為 fast fib 算法每次迴圈都會將 $n$ 加倍,因此需要 $\log(n)$ 次 iteration 以算完要求結果。而這麼解釋就代表每個跳躍便是 fast doubling 算法 each iteration 的耗時,從 `fast_fib()` 程式碼也可以看出有個迴圈: ```c int n = 1; for (int currbit = 1 << (32 - __builtin_clzll(k) - 1 - 1); currbit; currbit = currbit >> 1) { /* compute 2n-1 */ flag &= ubignum_square(fast[1], &fast[0]); flag &= ubignum_square(fast[2], &fast[3]); flag &= ubignum_add(fast[0], fast[3], &fast[3]); /* compute 2n */ flag &= ubignum_left_shift(fast[1], 1, &fast[4]); flag &= ubignum_add(fast[4], fast[2], &fast[4]); flag &= ubignum_mult(fast[4], fast[2], &fast[4]); n *= 2; if (k & currbit) { flag &= ubignum_add(fast[3], fast[4], &fast[0]); n++; ubignum_swapptr(&fast[2], &fast[0]); ubignum_swapptr(&fast[1], &fast[4]); } else { ubignum_swapptr(&fast[2], &fast[4]); ubignum_swapptr(&fast[1], &fast[3]); } } ``` 在這裡可以簡單比較兩種算法的特性如下: ||sequencial|fast doubling| |:-:|:-:|:-:| |迭代次數|$n$|$\lfloor\log(n)\rfloor$| |每輪迭代的複雜度|低,約 ==16== ns|高,約 ==834== ns| > 該數字使用 $n=0$ 到 $n=500$ 的測量結果算得 ### 比較 #### KYG 同學 為了確認自己實作的水準,clone [KYG 同學的 fibdrv](https://github.com/KYG-yaya573142/fibdrv) 進行對比,使用 optimize-bn 分支在我的電腦上執行 `make plot`,結果超乎我的預期: ![](https://i.imgur.com/wVMqyny.png) 耗時僅僅我的 $1/10$,顯然我的程式碼有不只一點改進空間。此外,為何使用相同的設定,它的程式就能有非常平滑的結果,而我卻會一直抖動,這也需要再好好研究。 但仔細去看程式碼發現其測量有誤,我 clone 到的版本並非 bignum 的實作。接著繼續閱讀之後,發現同學並未把自己的最新版本推上 github,因此我無法在自己的電腦上與其進行耗時比較。 :::info 需要找找看哪個同學的程式碼適合拿來做效能對比 ::: ### 研讀 [0xff07 同學的實作](https://github.com/0xff07/bignum/tree/fibdrv) 研讀同學的實作,看看是否有值得借鏡的地方。 #### apm_internal.h > 此 apm 取的是 Arbitrary-Precision arithmetic 的意思 首先我看到同學將 inline assembly 的部份包裝成 macro: ```c #if defined(__amd64__) || defined(__x86_64__) #define digit_mul(u, v, hi, lo) \ __asm__("mulq %3" : "=a"(lo), "=d"(hi) : "%0"(u), "rm"(v)) #define digit_div(n1, n0, d, q, r) \ __asm__("divq %4" : "=a"(q), "=d"(r) : "0"(n0), "1"(n1), "rm"(d)) #endif ``` 我認為這是非常好的想法,由於 inline assembly 的語法邏輯跟 C 的邏輯有些不同,因此包裝起來可以使人閱讀時統一依 C 的邏輯理解,較為易讀,同時也方便重複使用。 同時他在無法使用 x86 assembly 的情況下仍提供了對應實作: ```c #ifndef digit_mul #define digit_mul(u, v, hi, lo) \ do { \ apm_digit __u0 = (u); \ apm_digit __u1 = __u0 >> APM_DIGIT_HSHIFT; \ __u0 &= APM_DIGIT_LMASK; \ apm_digit __v0 = (v); \ apm_digit __v1 = __v0 >> APM_DIGIT_HSHIFT; \ __v0 &= APM_DIGIT_LMASK; \ apm_digit __lo = __u0 * __v0; \ apm_digit __hi = __u1 * __v1; \ __u1 *= __v0; \ __v0 = __u1 << APM_DIGIT_HSHIFT; \ __u1 >>= APM_DIGIT_HSHIFT; \ __hi += __u1 + ((__lo += __v0) < __v0); \ __v1 *= __u0; \ __u0 = __v1 << APM_DIGIT_HSHIFT; \ __v1 >>= APM_DIGIT_HSHIFT; \ __hi += __v1 + ((__lo += __u0) < __u0); \ (lo) = __lo; \ (hi) = __hi; \ } while (0) #endif ``` #### apm.c apm.h 這部份提供許多大數的 operations 包括: - increment/decrement - add - sub - dmul,計算 partial product 的函式 - shift - cmp 預設是 `u` 和 `v` 運算後將結果放到 `w`,其中沒看到計算 apm 乘法的函式,僅有計算 partial product 的 `dmul()`。 :::success apm 乘法被獨立寫在 **mul.c** ::: - 帶 *i* postfix 的 operations 似乎是代表 in-place,都是 `u`, `v` 運算後放到 `u` - 帶 *n* postfix 的 operations 不確定語意上是代表哪個英文單字,不過其共通點為 `u`, `v` 兩數的初始 size 相同 #### mul.c 在其乘法實作當中混用了兩種乘法 - base,最原始的直式乘法方式 - [Karatsuba multiplication](https://en.wikipedia.org/wiki/Karatsuba_algorithm) 短碼時使用直式乘法,長碼時使用 Karatsuba,使兩者各展所長。 ##### Karatsuba multiplication 解說 假設我們要作 $x\times y$,那麼藉由分解較大的 $x$ 及 $y$,可以達到: $$ \begin{aligned} x\times y &= (x_1\times 2^b+x_0)\times(y_1\times 2^b+y_0) \\ &= (x_1 y_1)2^{2b} + (x_1 y_0+ x_0 y_1)2^b + (x_0 y_0) \end{aligned} $$ 因 $(x_1 y_0+ x_0 y_1) = (x_1+x_0)\times(y_1+y_0) - x_1y_1 - x_0y_0$,可將乘法次數由 4 次降到 3 次。 此外每個 $x_iy_i$ 又呈現 $x\times y$ 的形式,因此可以再次遞迴下去,總結而言這是個 divide and conquer 的算法。 #### bn.h 此部份將前面大數相關的 type 使用 struct 包起來: ```c typedef struct { apm_digit *digits; /* Digits of number. */ apm_size size; /* Length of number. */ apm_size alloc; /* Size of allocation. */ unsigned sign : 1; /* Sign bit. */ } bn, bn_t[1]; ``` 並使用 apm 的介面提供一系列的 arithmetic operations,再次包裝成 bn 介面。

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully