2024q1 Homework6 (integration)

# 2024q1 Homework6 (integration) contributed by < `dockyu` > ## 自我檢查清單 - [x] 研讀前述 Linux 效能分析描述，在自己的實體電腦運作 GNU/Linux，做好必要的設定和準備工作從中也該理解為何不希望在虛擬機器中進行實驗; - [x] 閱讀〈Linux 核心模組運作原理〉並對照 Linux 核心原始程式碼 (v6.1+)，解釋 insmod 後，Linux 核心模組的符號 (symbol) 如何被 Linux 核心找到 (使用 List API)、MODULE_LICENSE 巨集指定的授權條款又對核心有什麼影響 (GPL 與否對於可用的符號列表有關)，以及藉由 strace 追蹤 Linux 核心的掛載，涉及哪些些系統呼叫和子系統？ - [ ] 探討 Timsort, Pattern Defeating Quicksort (pdqsort) 及 Linux 核心 lib/sort.c 在排序過程中的平均比較次數，並提供對應的數學證明; - [ ] 解釋 xoroshiro128+ 的原理 (對照〈Scrambled Linear Pseudorandom Number Generators〉論文)，並利用 ksort 提供的 xoro 核心模組，比較 Linux 核心內建的 /dev/random 及 /dev/urandom 的速度，說明 xoroshiro128+ 是否有速度的優勢？其弱點又是什麼？ #### Linux 效能分析的重點 :::danger 查閱第一手材料了嗎？ ::: 1. 時間的計算: 我看不懂 jiffies ， ktime_t 就是一個可以用來取得當前時間並計算時間差的結構 2. 不受干擾的環境: CPU affinity 可以將行程綁定在某顆 CPU 上，也可以在開機時指定 CPU 只給綁定的行程使用。文中還有提到一些可以排除干擾的指令 3. 統計方法: 排除分佈兩端的極端值 :::danger 注意用語： * virtual machine 是「虛擬機器」，不該簡稱為「機」，否則無法區分 engine 的翻譯 * physical 不作為學科時，翻譯為「實體」 > 了解，已修改 ::: 就算在虛擬機器上建立了排除干擾，但是在實體的 CPU 上其實還是存在，因為虛擬機只是模擬而無法直接操作硬體，所以不應該在虛擬機進行 #### 閱讀〈Linux 核心模組運作原理〉參考 [vax-r/linux2024-homework6](https://hackmd.io/@vax-r/linux2024-homework6) strace insmod 可以找到 `finit_module(3, "", 0)` ， finit_module -> load_module -> [simplify_symbols](https://elixir.bootlin.com/linux/v4.18/source/kernel/module.c#L2237) ###### [kernel/module.c](https://elixir.bootlin.com/linux/v4.18/source/kernel/module.c#L2237) ```c static int simplify_symbols(struct module *mod, const struct load_info *info) { Elf_Shdr *symsec = &info->sechdrs[info->index.sym]; Elf_Sym *sym = (void *)symsec->sh_addr; const struct kernel_symbol *ksym; for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) { switch (sym[i].st_shndx) { switch (sym[i].st_shndx) { case SHN_COMMON: case SHN_ABS: case SHN_LIVEPATCH: case SHN_UNDEF: case SHN_UNDEF: ksym = resolve_symbol_wait(mod, info, name); default: } } ``` 可以從上面的程式碼知道首先會從核心模組的 ELF 中提取 symbol table 的 section ，並遍歷所有 symbol ，如果是 undefined 那就代表這個 symbol 是在外部的所以要用 resolve_symbol_wait 去查找 [resolve_symbol_wait](https://elixir.bootlin.com/linux/v4.18/source/kernel/module.c#L1427) -> [resolve_symbol](https://elixir.bootlin.com/linux/v4.18/source/kernel/module.c#L1385) -> [find_symbol](https://elixir.bootlin.com/linux/v4.18/source/kernel/module.c#L576) -> [each_symbol_section](https://elixir.bootlin.com/linux/v4.18/source/kernel/module.c#L441) -> [list_for_each_entry_rcu](https://elixir.bootlin.com/linux/v4.18/source/include/linux/rculist.h#L351) ###### rculist.h/list_for_each_entry_rcu ```c /** * list_for_each_entry_rcu - iterate over rcu list of given type * @pos: the type * to use as a loop cursor. * @head: the head for your list. * @member: the name of the list_head within the struct. * * This list-traversal primitive may safely run concurrently with * the _rcu list-mutation primitives such as list_add_rcu() * as long as the traversal is guarded by rcu_read_lock(). */ #define list_for_each_entry_rcu(pos, head, member) \ for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \ &pos->member != (head); \ pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) ``` list_entry_rcu 是一個用 List API 用來遍歷 rcu list #### linux 檔案系統 :::danger 注意用語： * file 是「檔案」，而非「文件」(document) > 已修改 ::: :::danger 注意用語： * access 是「存取」，而非「訪問」(visit) * 參見[詞彙對照表](https://hackmd.io/@l10n-tw/glossaries) ::: :::danger 查閱第一手材料，避免閱讀低劣的簡體中文翻譯。 ::: 試著執行 `cat /dev/random` 會一直顯示無法解碼的隨機數到終端 kernel module 提供的功能可以透過 device (`/dev/<device>`)讓用戶空間透過讀寫操作使用，這也是這一次作業要做的 :::danger 什麼是「噴出」？ > 已修改 expose 不該翻譯為「暴露」，你學習漢語多年，可以好好說話嗎？ ::: ## 作業主題一: 並行的混合排序 ksort 已經定義 /dev/sort 的 read 操作如下(有簡化過)，會掉用 sort_main 函式 ```c static ssize_t sort_read(struct file *file, char *buf, size_t size, loff_t *offset) { void *sort_buffer = kmalloc(size, GFP_KERNEL); len = copy_from_user(sort_buffer, buf, size); sort_main(sort_buffer, size / es, es, num_compare); len = copy_to_user(buf, sort_buffer, size); kfree(sort_buffer); } static const struct file_operations fops = { .read = sort_read, } ``` sort_main 的定義如下(有簡化過)，會執行 qsort_algo ```c static void init_qsort(struct qsort *q, void *elems, size_t size, struct common *common) { INIT_WORK(&q->w, qsort_algo); } static void qsort_algo(struct work_struct *w) { // qsort algorithm } void sort_main(void *sort_buffer, size_t size, size_t es, cmp_t cmp) { init_qsort(q, sort_buffer, size, &common); queue_work(workqueue, &q->w); } ``` 這次作業的目標是要在 sort module 實做3種排序演算法，並且可以切換，我認為應該要在 sort 設備的 write 操作進行設定，並在 sort_main 判斷要用哪一個用一個枚舉表示 ```c typedef enum { QSORT, TIMSORT, PDQSORT, LIB_SORT } sort_method_t; ``` sort(read) 前先設定(write) ```c sort_method_t method = TIMSORT; if (write(fd, &method, sizeof(method)) != sizeof(method)) { perror("Failed to set sort method"); close(fd); goto error; } ssize_t r_sz = read(fd, inbuf, size); ``` > [commit 80b7ece](https://github.com/dockyu/ksort/commit/80b7ecef54de4db6a9fa91a02cf1203e6ad4c8fd) sort 時判斷要用哪一個演算法 ```c switch (sort_method) { case QSORT: printk(KERN_INFO "Start sorting: qsort\n"); break; case TIMSORT: printk(KERN_INFO "Start sorting: timsort\n"); break; case PDQSORT: printk(KERN_INFO "Start sorting: pdqsort\n"); break; case LIB_SORT: printk(KERN_INFO "Start sorting: lib/sort\n"); break; } ``` > [commit a201a1b](https://github.com/dockyu/ksort/commit/a201a1b178fa0c2a05d1452af6ccb97e47ec4eb4) ### workqueue 運作方式 ```c struct workqueue_struct *workqueue; struct qsort { struct work_struct w; struct common *common; void *a; size_t n; }; ``` 查看 [workqueue_types.h](https://elixir.bootlin.com/linux/latest/source/include/linux/workqueue_types.h#L16) ```c typedef void (*work_func_t)(struct work_struct *work); struct work_struct { atomic_long_t data; struct list_head entry; work_func_t func; } ``` 可以發現 func 是一個 pointer to function ，接著看到 ksort 的 qsort_algo 函式，這是一個排序演算法的實現也就是這個 work 實際要執行的函式，函式原型吻合 work_func_t 的型別 ```c static void qsort_algo(struct work_struct *w) ``` 在看到 init_qsort 函式使用的一個 `INIT_WORK(&q->w, qsort_algo);` ```c static void init_qsort(struct qsort *q, void *elems, size_t size, struct common *common) { INIT_WORK(&q->w, qsort_algo); q->a = elems; q->n = size; q->common = common; } ``` INIT_WORK 定義在 [workqueue.h#L288](https://elixir.bootlin.com/linux/latest/source/include/linux/workqueue.h#L288) ```c #define INIT_WORK(_work, _func) \ __INIT_WORK((_work), (_func), 0) ``` 展開 `INIT_WORK` ```c __INIT_WORK((&q->w), (qsort_algo), 0) ``` `__INIT_WORK` 的定義 ```c #define __INIT_WORK(_work, _func, _onstack) \ do { \ static __maybe_unused struct lock_class_key __key; \ \ __INIT_WORK_KEY(_work, _func, _onstack, &__key); \ } while (0) ``` 展開 `__INIT_WORK` ```c do { static __maybe_unused struct lock_class_key __key; __INIT_WORK_KEY(&q->w, qsort_algo, 0, &__key); } while (0) ``` `__INIT_WORK_KEY` 的定義 ```c #define __INIT_WORK_KEY(_work, _func, _onstack, _key) \ do { \ __init_work((_work), _onstack); \ (_work)->data = (atomic_long_t) WORK_DATA_INIT(); \ INIT_LIST_HEAD(&(_work)->entry); \ (_work)->func = (_func); \ } while (0) ``` 展開 `__INIT_WORK_KEY` ```c do { static __maybe_unused struct lock_class_key __key; do { __init_work((&q->w), 0); (&q->w)->data = (atomic_long_t) WORK_DATA_INIT(); INIT_LIST_HEAD(&(&q->w)->entry); (&q->w)->func = (qsort_algo); } while (0) } while (0) ``` 可以看到 `(&q->w)->func = (qsort_algo);` 會把 work_struct 的 func 設為 qsort_algo ```c init_qsort(q, sort_buffer, size, &common); ``` 接著 init_qsort 後半部將要排序的資料 (sort_buffer) 放進 struct qsort init_qsort 之後是 queue_work ```c queue_work(workqueue, &q->w); ``` 查看 [workqueue.h#L545](https://elixir.bootlin.com/linux/latest/source/include/linux/workqueue.h#L545) ```c static inline bool queue_work(struct workqueue_struct *wq, struct work_struct *work) { return queue_work_on(WORK_CPU_UNBOUND, wq, work); } ``` 查看 [kernel/workqueue.c#L1828](https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L1828) `queue_work_on` 裡面呼叫 ```c __queue_work(cpu, wq, work); ``` 查看 [kernel/workqueue.c#L1705](https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L1705) `__queue_work` 裡面呼叫 ```c static void __queue_work(int cpu, struct workqueue_struct *wq, struct work_struct *work) { struct pool_workqueue *pwq; struct worker_pool *last_pool, *pool; if (likely(pwq->nr_active < pwq->max_active)) { insert_work(pwq, work, &pool->worklist, work_flags); } else { insert_work(pwq, work, &pwq->inactive_works, work_flags); } ``` 查看 [kernel/workqueue.c#L1647](https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L1647) `insert_work` ，參數的 head 在正常情況下是 worker_pool 的 worklist 屬性， work_struct 的 entry 被加進去 ```c static void insert_work(struct pool_workqueue *pwq, struct work_struct *work, struct list_head *head, unsigned int extra_flags) { list_add_tail(&work->entry, head); ``` 從 [kernel/workqueue.c](https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L171) 可以看到 worklist 的註解 ```c struct worker_pool { struct list_head worklist; /* L: list of pending works */ ``` 閱讀 [kernel/workqueue.c#L2539](https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L2539) 可以看到我覺得最精妙的設計，先複習一下 struct qsort 裡的屬性有 work_struct 以及要排序的資料以及 compare function ，接著 work_struct 裡的屬性有一個 func 這是一個指到要執行的函式的指標，這個函式接受一個 work_struct ????? **這不是套娃嗎??????**，但是又好像非常合理因為要排序的資料確實可以用 container_of 巨集拿到，我認為這種作法的好處是可以統一 work_struct ，不用為每種功能都定義自己 work_struct 而是將 work_struct 當成一個屬性，而 work_queue 只管理 work_struct ，需要的資料使用者可以自己用 container_of 拿到，以下程式碼展示 kernel 是怎麼做到 **我的屬性拿我自己當作參數傳入** ```c static void process_one_work(struct worker *worker, struct work_struct *work) { worker->current_func = work->func; worker->current_func(work); ``` ### Timsort 嘗試使用 [2024q1 第 1 週測驗題](https://hackmd.io/@sysprog/linux2024-quiz1) 中的 timsort 程式碼，只需要修改 compare function 為整數的形式參考 [vax-r/lab0-c](https://github.com/vax-r/lab0-c)， [millaker/lab0-c](https://github.com/millaker/lab0-c) commit: [8366658](https://github.com/dockyu/ksort/commit/83666581028a2a22bcfded9d2a0aca8c6cd203df) ### linux sort 原本的程式碼已經有 `#include <linux/sort.h>` ```c void sort(void *base, size_t num, size_t size, cmp_func_t cmp_func, swap_func_t swap_func); ``` 只需要有適合的 compare function 和 swap funciton 就可以使用， compare funciton 直接使用和 qsort 同一個就好 commit: [d3b5381](https://github.com/dockyu/ksort/commit/d3b53814e28d5bd6204e875341b78dc41ad817e3) ### userspace 測量時間 commit: [a58c220](https://github.com/dockyu/ksort/commit/a58c220e4299a40640e99a8d00841c513d0625b9) |最小資料大小|最大資料大小|步數| |:-:|:-:|:-:| |1000|20000|500| 資料太多或是跑太多次都有可能卡住，建議從較小的數字開始測試 |糟糕的資料|正常的資料| |:-:|:-:| |![image](https://hackmd.io/_uploads/B1BgyJVzA.png)|![userspace_sorting_times](https://hackmd.io/_uploads/S1iAwJNfR.png)| ### kernelspace 測量時間 ```c start = ktime_get(); queue_work(workqueue, &q->w); /* Ensure completion of all work before proceeding, as reliance on * objects allocated on the stack necessitates this. If not, there is a * risk of the work item referencing a pointer that has ceased to exist. */ drain_workqueue(workqueue); end = ktime_get(); ``` |kernel space|user space| |:-:|:-:| |![image](https://hackmd.io/_uploads/SkyqHzHMC.png)|![userspace_sorting_times](https://hackmd.io/_uploads/HyKX8zBG0.png)| commit: [7a9366c](https://github.com/dockyu/ksort/commit/7a9366c92b93b57b5627764ada6af3ed8bded7f7) ### 指定cpu執行未指定前的輸出 ```bash [92911.088946] Working on CPU 13 [92911.088955] Set sort method: Quick Sort [92911.089297] Set sort method: Tim Sort [92911.089312] Start sorting: timsort [92911.090026] Working on CPU 16 [92911.090027] Start timsort_algo [92911.092800] Set sort method: Tim Sort [92911.093131] Set sort method: Library Sort [92911.093141] Start sorting: linux/sort [92911.093145] Working on CPU 16 [92911.094921] Set sort method: Library Sort [92911.095267] Set sort method: Quick Sort [92911.095278] Start sorting: qsort [92911.095281] Working on CPU 16 ``` ```diff - queue_work(workqueue, &l->w); + queue_work_on(cpu_id, workqueue, &l->w); ``` 指定後的輸出 ```bash [93789.082169] Working on CPU 10 [93789.082170] Working on CPU 10 [93789.082170] Working on CPU 10 [93789.082177] Set sort method: Quick Sort [93789.082452] Set sort method: Tim Sort [93789.082458] Start sorting: timsort [93789.082800] Working on CPU 10 [93789.082800] Start timsort_algo [93789.085116] Set sort method: Tim Sort [93789.085365] Set sort method: Library Sort [93789.085370] Start sorting: linux/sort [93789.085372] Working on CPU 10 [93789.086783] Set sort method: Library Sort [93789.134480] sort: unloaded [93789.178624] XORO: Exit ``` ### 綁定 cpu ![image](https://hackmd.io/_uploads/S1GeLXIGR.png) ```bash $ ps -eo pid,psr,cmd | awk '$2 == 10' 44 10 [cpuhp/10] 45 10 [idle_inject/10] 46 10 [migration/10] 47 10 [ksoftirqd/10] 143 10 [oom_reaper] 169 10 [mld] 667 10 [card0-crtc0] 668 10 [card0-crtc1] 669 10 [card0-crtc2] 913 10 /lib/systemd/systemd-logind 1015 10 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 129:137 2169 10 /usr/libexec/packagekitd 2550 10 /usr/libexec/gvfs-afc-volume-monitor 2575 10 /usr/libexec/goa-identity-service 2595 10 /usr/libexec/gdm-x-session --run-script env GNOME_SHELL_SESSION_MODE=ubuntu /usr/bin/gnome-session --session=ubuntu 2718 10 /usr/libexec/gnome-session-ctl --monitor 2859 10 /snap/snapd-desktop-integration/157/usr/bin/snapd-desktop-integration 12350 10 /usr/libexec/gnome-terminal-server 49373 10 /usr/share/code/code --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=none --crashpad-handler-pid=49350 --enable-crash-reporter=ae5778e7-98d3-44f9-a135-93a16359f3ea,no_channel --user-data-dir=/home/dockyu/.config/Code --standard-schemes=vscode-webview,vscode-file --enable-sandbox --secure-schemes=vscode-webview,vscode-file --cors-schemes=vscode-webview,vscode-file --fetch-schemes=vscode-webview,vscode-file --service-worker-schemes=vscode-webview --code-cache-schemes=vscode-webview,vscode-file --shared-files=v8_context_snapshot_data:100 --field-trial-handle=0,i,17914157464607285421,3578834844614996182,262144 --disable-features=CalculateNativeWinOcclusion,SpareRendererForSitePerProcess 49493 10 /home/dockyu/.vscode/extensions/ms-vscode.cpptools-1.19.9-linux-x64/bin/cpptools 50449 10 [kworker/10:5H-ttm] 50451 10 [kworker/10:7H-ttm] 50874 10 [kworker/10:0H-ttm] 50933 10 [kworker/10:1H-ttm] 50934 10 [kworker/10:2H-ttm] 50935 10 [kworker/10:3H-ttm] 50936 10 [kworker/10:4H-ttm] 50937 10 [kworker/10:6H-ttm] 50938 10 [kworker/10:8H-ttm] 52961 10 [kworker/10:9H-ttm] 52962 10 [kworker/10:10H-ttm] 52963 10 [kworker/10:11H-ttm] 52964 10 [kworker/10:12H-ttm] 52965 10 [kworker/10:13H-ttm] 52966 10 [kworker/10:14H-ttm] 52967 10 [kworker/10:15H-ttm] 52968 10 [kworker/10:16H] 53616 10 [kworker/10:1-events] 60136 10 [kworker/10:0-mm_percpu_wq] 60272 10 /snap/firefox/4173/usr/lib/firefox/firefox -contentproc -childID 149 -isForBrowser -prefsLen 33447 -prefMapSize 249080 -jsInitLen 234952 -parentBuildID 20240419214016 -greomni /snap/firefox/4173/usr/lib/firefox/omni.ja -appomni /snap/firefox/4173/usr/lib/firefox/browser/omni.ja -appDir /snap/firefox/4173/usr/lib/firefox/browser {a0f3b1dc-0b7e-4f91-b433-f31d23320b5d} 11743 true tab ``` 更改 `/etc/default/grub` ```diff - GRUB_CMDLINE_LINUX_DEFAULT=" " + GRUB_CMDLINE_LINUX_DEFAULT="isolcpus=10" ``` ```bash $ ps -eo pid,psr,cmd | awk '$2 == 10' 16 10 [rcu_preempt] 44 10 [cpuhp/10] 45 10 [idle_inject/10] 46 10 [migration/10] 47 10 [ksoftirqd/10] 48 10 [kworker/10:0-events] 49 10 [kworker/10:0H-kblockd] 134 10 [kdevtmpfs] 136 10 [kauditd] 183 10 [kworker/10:1-events] 343 10 [jbd2/nvme0n1p7-8] 2993 10 [kworker/10:1H-kblockd] ``` ## 論文閱讀 Xorshift RNGs 提出 Xorshift RNGS，可以通過各種亂度檢驗各種 xorshift RNGs 的變形的週期(period) 可以達到 $2^{160}$ ，代表這個方法快又可靠(週期足夠大) 其他 RNG 也可以達到這麼大的週期，但是通常會需要乘法運算，而 xorshift RNG 只需要 xor 和 shift :::success T 是 nonsingular matrix 代表 T 有反矩陣 T 的 order 是 K 代表 $T^K$ 是單位矩陣，且 K 為最小的可能 Cayley-Hamilton 定理可以將矩陣的高維運算轉為低維的，ex: $T^3=3T+2I$ 可以用奇異值分解（SVD）方法快速判斷一個矩陣是不是 nonsingular matrix ::: 這句好像寫錯 ![image](https://hackmd.io/_uploads/H1PQOGwyC.png) --- ### 推導 #### 週期性函數 $\beta$ 是一個 $1*n$ 的 binary vector ex: [0, 1, 1, 0, 1, 1, 0, 0,...] $T$ 是一個 $n*n$ 的 binary matrix $T$ 的 order 為 $2^n-1$ 的**充要條件**為 $T$ 是 non-singular matrix 如果 $T$ 的 order 為 $2^n-1$ 代表 sequence: $\beta, \beta T, \beta T^2, \beta T^3 \dots$ 的週期是 $2^n-1$ ，因為 $T^{2^n-1}=I$ ，得出 $\beta T^{2^n-1}=\beta I=\beta$ 直到第 $2^n-1$ 個才會重複如果找到一個 $T$ 那麼就可以做出一個週期為 $2^n-1$ 的 RNG，且因為 $\beta$ 是 binary vector 剛好就是二進制表示的數字形式所有 non-singular 的 nxn matrix 都可以作為 $T$，但是**矩陣乘法太慢了** --- #### 快速操作如果 $\beta T$ 能變成在電腦上的簡單操作(ex: xor, shift) 就可以加快速度，但是並不是每一個 $T$ 都可以轉換成這種操作論文提出一種方法如果假設 $y$ 是一個 n-bits 表示的數字， $y\oplus (y<<a)$ 操作等同於 $y * (I+L^a)$ ，$L^a$ 是一個 nxn 矩陣，是基於 $a$ 的(並非a次方) :::info ##### 範例如果 $y=[0,1,1,0]$ ， $a=2$ 那麼 $y\ll a = [1,0,0,0]$ $L^a$ 是 $I$ 向下位移 $a$ 個位子 \begin{align} I= \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} ,L^a= \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 \end{bmatrix} \end{align} \begin{align} I+L^a= \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 \end{bmatrix} \end{align} \begin{align} y(I+L^a)= \begin{bmatrix} 0 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 \end{bmatrix} \end{align} 可以發現 $y(I+L^a)$ 之後在對每一格取 $\mod 2$ 其實等同於 $y\oplus (y \ll a)$ ::: ###### $T=(I+L^a)$ 所以找到一個數字 $a$ 可以得出 $L^a$ ，如果$T=I+L^a$ 且 $T$ 是 non-singular 那麼 $\beta T$ 可以視為 $\beta\oplus(\beta\ll a)$ ，在電腦上可以快速做到因為電腦上數字是用 32-bits 表示，所以 n 要是 32 的倍數 ex: 32, 64, 96，當 n 為 32 時 $a$ 只有可能是 1~32(因為 $\beta$ 的長度是 32)，這 32 個 $a$ 得到的 $T$ 是 non-singular 嗎? 可惜都不是，這代表 $T=I+L^a$ 這個形式的 $T$ 的 order 不會是 $2^n-1$ 週期無法保證夠長 --- ###### $T=(I+L^a)(I+R^b)$ 如果 $T=(I+L^a)(I+R^b)$ \begin{align} \beta T = \beta (I+L^a)(I+R^b) \end{align} 等同先左移再右移 \begin{align} \beta=\beta\oplus(\beta\ll a) \\ \beta=\beta\oplus(\beta\gg b) \end{align} 這個形式的 $T$ 由 $(a,b)$ 決定 $1\le a \le 32,1\le b \le 32$ ， $(a,b)$ 共有 $32*32=1024$ 種可能，所以 $T$ 也有 $1024$ 種那麼這 $1024$ 種 $T$ 有 non-singular 嗎?可惜也沒有 ###### $T=(I+L^a)(I+R^b)(I+L^c)$ 這種形式的 $T$ 做 $\beta T$ 等同於 \begin{align} \beta=\beta\oplus(\beta\ll a) \\ \beta=\beta\oplus(\beta\gg b) \\ \beta=\beta\oplus(\beta\ll c) \end{align} $T$ 是由 $(a,b,c)$ 得出的，$1\le a \le 32,1\le b \le 32,1\le c \le 32$ ， $(a,b,c)$ 共有 $32*32*32=32768$ 種可能，這 32768 種 $T$ 有多種是 non-singular 也就是 order $2^n-1$ 其中有 81 種 $a<c$，同時會在下面的式子代表的 $T$ 也會滿足 non-singular 所以也可以視作找到 $32*8=648$ 種 xorshift 的參數使週期等於 $2^{32}-1$ (在 n 為 32 的情況下) ###### 32-bit 週期為 $2^{32}-1$ 的 xorshift 參數 |$(a,b,c)$||||||||| |:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:| | 1, 3,10| 1, 5,16| 1, 5,19| 1, 9,29| 1,11, 6| 1,11,16| 1,19, 3| 1,21,20| 1,27,27| | 2, 5,15| 2, 5,21| 2, 7, 7| 2, 7, 9| 2, 7,25| 2, 9,15| 2,15,17| 2,15,25| 2,21, 9| | 3, 1,14| 3, 3,26| 3, 3,28| 3, 3,29| 3, 5,20| 3, 5,22| 3, 5,25| 3, 7,29| 3,13, 7| | 3,23,25| 3,25,24| 3,27,11| 4, 3,17| 4, 3,27| 4, 5,15| 5, 3,21| 5, 7,22| 5, 9,7 | | 5, 9,28| 5, 9,31| 5,13, 6| 5,15,17| 5,17,13| 5,21,12| 5,27, 8| 5,27,21| 5,27,25| | 5,27,28| 6, 1,11| 6, 3,17| 6,17, 9| 6,21, 7| 6,21,13| 7, 1, 9| 7, 1,18| 7, 1,25| | 7,13,25| 7,17,21| 7,25,12| 7,25,20| 8, 7,23| 8,9,23 | 9, 5,1 | 9, 5,25| 9,11,19| | 9,21,16|10, 9,21|10, 9,25|11, 7,12|11, 7,16|11,17,13|11,21,13|12, 9,23|13, 3,17| |13, 3,27|13, 5,19|13,17,15|14, 1,15|14,13,15|15, 1,29|17,15,20|17,15,23|17,15,26| ###### 32-bit 週期為 $2^{32}-1$ 的 xorshift 作法 ```txt yˆ=y<<a; yˆ=y>>b; yˆ=y<<c; yˆ=y<<c; yˆ=y>>b; yˆ=y<<a; yˆ=y>>a; yˆ=y<<b; yˆ=y>>c; yˆ=y>>c; yˆ=y<<b; yˆ=y>>a; yˆ=y<<a; yˆ=y<<c; yˆ=y>>b; yˆ=y<<c; yˆ=y<<a; yˆ=y>>b; yˆ=y>>a; yˆ=y>>c; yˆ=y<<b; yˆ=y>>c; yˆ=y>>a; yˆ=y<<b; ``` --- 64-bit 下則有 275 組 triple $(a,b,c)$ 可以讓 order 來到 $2^{64}-1$ 也可以視為 $8*275=2200$ 組 --- ### 論文證明的邏輯 1. non-singular matrix 的 order 為 $2{n}-1$ 2. non-singular matrix 可以讓 binary vector(可以視為 32-bit 數字) $\beta$ 作為 seed 時，RNG: $f(\beta)=\beta T$ 的週期為 $2^{n}-1$ 3. 如果 matrix 可以寫成這種形式 $T=(I+L^a)$ 就可以用 xorshift $\beta\text{^}(\beta\ll a)$ 取代 $\beta T$ 4. 暴力找所有 $(a,b,c)$ 的組合(32768 個)，得出多個(32768 個) $T=(I+L^a)(I+R^b)(I+L^c)$ 5. 找出其中是 non-singular 的 T，共 81 個 --- ### 總結因為 $\beta T$ 的週期為 $2^{n}-1$ 是在 $\beta$ 為 binary vector 的 space 成立的，也就是說只要是 binary vector(ex: 電腦的 unsigned int)就可以當作 seed ，用得出的任一 triple (a,b,c) 做 xorshift 週期必定為 $2^n-1$ |xorshift|其他 RNG| |:-:|:-:| |任一(for all) seed 都有很長的週期|某些 seed 週期可能很短| |只有 xor, shift 操作，速度快|可能會使用到乘法|