Jack Ho
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # 2020q3 Homework3 (dict) contributed by < `quantabase13` > ## CPY 和 REF 實作的效能差異分析 在 `dict` 目錄下完成編譯後,執行 ```shell $ ./test_common --bench CPY ``` 可以得到 `bench_ref.txt` 文件。 為了確認這份文件中的資料代表的意思,觀察了`test_common.c` 中的程式碼,其中一部份如下: ```cpp if (argc == 3 && strcmp(argv[1], "--bench") == 0) { int stat = bench_test(root, BENCH_TEST_FILE, LMAX); tst_free(root); free(pool); return stat; } ``` `bench_test` 的程式碼被定義在 `bench.c` ,裡面的程式碼如下: ```cpp #include <stdio.h> #include <stdlib.h> #include <time.h> #include "bench.h" #define DICT_FILE "cities.txt" #define WORDMAX 256 #define PREFIX_LEN 3 double tvgetf() { struct timespec ts; double sec; clock_gettime(CLOCK_REALTIME, &ts); sec = ts.tv_nsec; sec /= 1e9; sec += ts.tv_sec; return sec; } int bench_test(const tst_node *root, char *out_file, const int max) { char prefix[PREFIX_LEN + 1] = ""; char word[WORDMAX] = ""; char **sgl; FILE *fp = fopen(out_file, "w"); FILE *dict = fopen(DICT_FILE, "r"); int idx = 0, sidx = 0; double t1, t2; if (!fp || !dict) { if (fp) { fprintf(stderr, "error: file open failed in '%s'.\n", DICT_FILE); fclose(fp); } if (dict) { fprintf(stderr, "error: file open failed in '%s'.\n", out_file); fclose(dict); } return 1; } sgl = (char **) malloc(sizeof(char *) * max); while (fscanf(dict, "%s", word) != EOF) { if (strlen(word) < sizeof(prefix) - 1) continue; strncpy(prefix, word, sizeof(prefix) - 1); t1 = tvgetf(); tst_search_prefix(root, prefix, sgl, &sidx, max); t2 = tvgetf(); fprintf(fp, "%d %f msec\n", idx, (t2 - t1) * 1000); idx++; } free(sgl); fclose(fp); fclose(dict); return 0; } ``` 關鍵是中間的 `while` 迴圈。仔細觀察後可以發現 `bench_test` 的工作內容就是將 `cities.txt` 中的城市名稱取出,取其前三個字母作為 prefix (根據 `#define PREFIX_LEN 3` ),傳進函式 `tst_search_prefix` ,最後將 `tst_search_prefix` 執行的時間寫入 `bench_ref.txt`。 從 `test_common.c` 中,可以知道命令 ```shell $ ./test_common --bench CPY ``` 裡的 `CPY` 決定了 string 在 `ternary search tree` 的儲存方式。 試著將其視覺化後,得到下圖: ![](https://i.imgur.com/VtUmUft.png) 如果改成執行 ```shell $ ./test_common --bench REF ``` 視覺化後能得到下圖: ![](https://i.imgur.com/8wdbZQ8.png) 在 inputfile 皆為 `cities.txt` 的條件下,可以發現 `CPY` 的版本 與 `REF` 比起來,執行 `tst_search_prefix` 有明顯速度上的差異。 嘗試用 `perf` 來細部分析這兩種不同命令,得到如下兩個結果: ```shell= ho@ho-Nitro-AN515-54:~/dict$ sudo perf stat ./test_common --bench CPY CPY mechanism ternary_tree, loaded 206849 words in 0.098053 sec Performance counter stats for './test_common --bench CPY': 11,433.94 msec task-clock # 1.000 CPUs utilized 34 context-switches # 0.003 K/sec 0 cpu-migrations # 0.000 K/sec 7,347 page-faults # 0.643 K/sec 47,111,915,437 cycles # 4.120 GHz 50,429,200,586 instructions # 1.07 insn per cycle 7,537,729,166 branches # 659.241 M/sec 232,021,975 branch-misses # 3.08% of all branches 11.439066191 seconds time elapsed 11.390165000 seconds user 0.043992000 seconds sys ``` ```shell= ho@ho-Nitro-AN515-54:~/dict$ sudo perf stat ./test_common --bench REF REF mechanism ternary_tree, loaded 206849 words in 0.096954 sec Performance counter stats for './test_common --bench REF': 12,613.26 msec task-clock # 1.000 CPUs utilized 43 context-switches # 0.003 K/sec 0 cpu-migrations # 0.000 K/sec 7,170 page-faults # 0.568 K/sec 51,906,233,997 cycles # 4.115 GHz 50,404,576,872 instructions # 0.97 insn per cycle 7,533,050,891 branches # 597.232 M/sec 232,130,089 branch-misses # 3.08% of all branches 12.614219162 seconds time elapsed 12.569555000 seconds user 0.044005000 seconds sys ``` 奇怪的是,執行 ```shell= ho@ho-Nitro-AN515-54:~/dict$ sudo perf stat ./test_common --bench REF ``` 出來的結果中,發現每個 cycle 只執行不到1個 instruction。 為了深入釐清,我們使用 `perf record -g` 取樣: ```shell= ho@ho-Nitro-AN515-54:~/dict$ sudo perf record -g ./test_common --bench REF ``` 接著以 `perf report` 讀取報告: ```shell= ho@ho-Nitro-AN515-54:~/dict$ sudo perf report -g graph,0.5,caller ``` 得到以下結果: ```shell= Samples: 51K of event 'cycles', Event count (approx.): 52400465598 Children Self Command Shared Object Symbol + 99.48% 0.00% test_common [unknown] [k] 0x0c96258d4c544155 + 99.48% 0.00% test_common libc-2.27.so [.] __libc_start_main + 99.48% 0.09% test_common test_common [.] main + 98.06% 0.01% test_common test_common [.] bench_test + 97.99% 0.48% test_common test_common [.] tst_search_prefix + 97.50% 97.38% test_common test_common [.] tst_suggest + 0.51% 0.00% test_common libc-2.27.so [.] fprintf 0.45% 0.12% test_common libc-2.27.so [.] vfprintf 0.36% 0.36% test_common test_common [.] tst_ins_del 0.28% 0.26% test_common libc-2.27.so [.] __GI___printf_fp_l 0.25% 0.00% test_common [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwf 0.25% 0.01% test_common [kernel.kallsyms] [k] do_syscall_64 0.21% 0.00% test_common libc-2.27.so [.] __GI___libc_write 0.20% 0.00% test_common [kernel.kallsyms] [k] __x64_sys_write 0.20% 0.00% test_common [kernel.kallsyms] [k] ksys_write 0.20% 0.00% test_common [kernel.kallsyms] [k] vfs_write 0.16% 0.08% test_common test_common [.] bloom_add ``` 發現 `tst_suggest` 花掉最多CPU 週期(`self overhead` 佔總體的 `97.38%`),因此我們決定細部分析 `tst_suggest` 所用到的指令: ```shell= Samples: 51K of event 'cycles', 4000 Hz, Event count (approx.): 52400465598 tst_suggest /home/ho/dict/test_common [Percent: local period] 1.10 │ mov %al,-0xc(%rbp) │ if (!p || *n >= max) 2.09 │ cmpq $0x0,-0x8(%rbp) 0.89 │ ↓ je 119 2.73 │ mov -0x28(%rbp),%rax 1.17 │ mov (%rax),%eax 0.62 │ cmp %eax,-0x10(%rbp) │ ↓ jle 119 │ return; │ tst_suggest(p->lokid, c, nchr, a, n, max); 0.15 │ movsbl -0xc(%rbp),%esi 0.18 │ mov -0x8(%rbp),%rax 36.73 │ mov 0x8(%rax),%rax 0.29 │ mov -0x10(%rbp),%r8d 0.08 │ mov -0x28(%rbp),%rdi 0.12 │ mov -0x20(%rbp),%rcx 0.72 │ mov -0x18(%rbp),%rdx 0.22 │ mov %r8d,%r9d 0.09 │ mov %rdi,%r8 0.10 │ mov %rax,%rdi 0.92 │ → callq tst_suggest │ if (p->key) 0.68 │ mov -0x8(%rbp),%rax 0.66 │ movzbl (%rax),%eax ``` 這裡列出的其實就是 function 的 `disassembly code`。我們觀察到 `tst_suggest(p->lokid, c, nchr, a, n, max);` 這個 function call 中,`mov 0x8(%rax),%rax` 指令所占時間比例最高,於是我們想確認 `%rax` 對應 function call 中的那一個 argument。 參考相關的第一手資料 [x86-64 psABI](https://gitlab.com/x86-psABIs/x86-64-ABI) 3.2.3 [Parameter Passing] > • Arguments of types (signed and unsigned) _Bool , char , short , int , long , long long , and pointers are in the INTEGER class. > Passing Once arguments are classified, the registers get assigned (in left-to-right order) for passing as follows: >1. If the class is MEMORY, pass the argument on the stack. >2. If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used. . >3. If the class is SSE, the next available vector register is used, the registers >are taken in the order from %xmm0 to %xmm7. >4. If the class is SSEUP, the eightbyte is passed in the next available eightbyte chunk of the last used vector register. >5. If the class is X87, X87UP or COMPLEX_X87, it is passed in memory. 我們可以確認,在 `tst_suggest(p->lokid, c, nchr, a, n, max);` 中: * `%rdi` 存的是 `p->lokid` * `%esi` 存的是 `c` * `%rdx` 存的是 `nchr` * `%rcx` 存的是 `a` * `%r8` 存的是 `n` * `%r9d` 存的是 `max` 並且,`p->lokid` 這項操作花了最長的時間 。 到這裡似乎可以深入分析是否是 `cache misses` 造成的影響了,但是我遇到一個不知如何解釋的狀況: 執行命令 ```shell= $ sudo perf record -g -e cycles ./test_common --bench REF ``` 接著細部分析 `tst_suggest` 的指令熱點,會發現: ```shell= Samples: 56K of event 'cycles', 4000 Hz, Event count (approx.): 57186636114 tst_suggest /home/ho/dict/test_common [Percent: local period] Percent│ if (!p || *n >= max) 0.98 │ cmpq $0x0,-0x8(%rbp) 1.90 │ ↓ je 119 1.61 │ mov -0x28(%rbp),%rax 1.41 │ mov (%rax),%eax 1.01 │ cmp %eax,-0x10(%rbp) │ ↓ jle 119 │ return; │ tst_suggest(p->lokid, c, nchr, a, n, max); 0.52 │ movsbl -0xc(%rbp),%esi 0.14 │ mov -0x8(%rbp),%rax 0.17 │ mov 0x8(%rax),%rax 37.93 │ mov -0x10(%rbp),%r8d 0.24 │ mov -0x28(%rbp),%rdi 0.09 │ mov -0x20(%rbp),%rcx 0.09 │ mov -0x18(%rbp),%rdx 0.60 │ mov %r8d,%r9d 0.21 │ mov %rdi,%r8 0.11 │ mov %rax,%rdi 0.09 │ → callq tst_suggest │ if (p->key) 0.31 │ mov -0x8(%rbp),%rax 0.51 │ movzbl (%rax),%eax ``` 同樣都使用相同的 `event` ,只是差別在有沒有在命令列顯式給出使用的 `event` (根據[Perf wiki](https://perf.wiki.kernel.org/index.php/Tutorial) , perf record 預設採用 `cycles` 作為 `Sampling event`),最後的熱點竟然不同。 同時,若使用`perf top` 命令即時監控的話,會發現熱點仍然是 `mov 0x8(%rax),%rax` 。 另外,若是用 `perf` 分析 `CPY` 的版本,也會得到類似的 output: ```shell= Samples: 46K of event 'cycles', Event count (approx.): 47723392335 Children Self Command Shared Object Symbol + 99.46% 0.00% test_common [unknown] [.] 0x0c96258d4c544155 ◆ + 99.46% 0.00% test_common libc-2.27.so [.] __libc_start_main ▒ + 99.46% 0.13% test_common test_common [.] main ▒ + 97.95% 0.02% test_common test_common [.] bench_test ▒ + 97.84% 0.50% test_common test_common [.] tst_search_prefix ▒ + 97.36% 97.25% test_common test_common [.] tst_suggest ▒ + 0.52% 0.00% test_common libc-2.27.so [.] fprintf ▒ 0.47% 0.13% test_common libc-2.27.so [.] vfprintf ▒ 0.40% 0.39% test_common test_common [.] tst_ins_del ▒ 0.28% 0.26% test_common libc-2.27.so [.] __GI___printf_fp_l ▒ 0.22% 0.00% test_common [kernel.kallsyms] [k] entry_SYSCALL_64_after_hw▒ 0.22% 0.01% test_common [kernel.kallsyms] [k] do_syscall_64 ▒ 0.17% 0.00% test_common libc-2.27.so [.] __GI___libc_write ▒ 0.17% 0.00% test_common [kernel.kallsyms] [k] __x64_sys_write ▒ 0.17% 0.00% test_common [kernel.kallsyms] [k] ksys_write ▒ 0.16% 0.03% test_common libc-2.27.so [.] __isoc99_fscanf ▒ 0.15% 0.00% test_common [kernel.kallsyms] [k] vfs_write ▒ 0.15% 0.06% test_common test_common [.] bloom_add ▒ ``` ```shell= Samples: 46K of event 'cycles', 4000 Hz, Event count (approx.): 47723392335 tst_suggest /home/ho/dict/test_common [Percent: local period] 1.51 │ ↓ je 119 ▒ 4.55 │ mov -0x28(%rbp),%rax ▒ 1.65 │ mov (%rax),%eax ▒ 0.76 │ cmp %eax,-0x10(%rbp) ▒ │ ↓ jle 119 ▒ │ return; ▒ │ tst_suggest(p->lokid, c, nchr, a, n, max); ▒ 0.11 │ movsbl -0xc(%rbp),%esi ▒ 0.19 │ mov -0x8(%rbp),%rax ▒ 40.62 │ mov 0x8(%rax),%rax ▒ 0.33 │ mov -0x10(%rbp),%r8d ▒ 0.08 │ mov -0x28(%rbp),%rdi ▒ 0.09 │ mov -0x20(%rbp),%rcx ◆ 0.80 │ mov -0x18(%rbp),%rdx ▒ 0.26 │ mov %r8d,%r9d ▒ 0.05 │ mov %rdi,%r8 ▒ 0.12 │ mov %rax,%rdi ▒ 1.01 │ → callq tst_suggest ▒ Press 'h' for help on key bindings ``` 如果兩種實作都有同樣位置的熱點,是否表示 `CPY` 的實作方式的其實沒辦法降低 cache misses ? 我們比較看看 `cache-misses` event 的數量: * 以下是 `CPY` 的版本。 輸入命令為 ```shell= $ sudo perf record -g -e cache-misses,cache-references,instructions,cycles ./test_common --bench CPY ``` 得到 ```shell= Samples: 46K of event 'cache-misses', Event count (approx.): 1078849051 Children Self Command Shared Object Symbol + 99.52% 0.00% test_common [unknown] [.] 0x0c96258d4c544155 + 99.52% 0.00% test_common libc-2.27.so [.] __libc_start_main + 99.52% 0.00% test_common test_common [.] main + 99.21% 0.01% test_common test_common [.] bench_test + 99.14% 0.26% test_common test_common [.] tst_search_prefix + 98.89% 98.78% test_common test_common [.] tst_suggest 0.35% 0.00% test_common [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe 0.35% 0.00% test_common [kernel.kallsyms] [k] do_syscall_64 0.30% 0.00% test_common libc-2.27.so [.] __GI___libc_write 0.30% 0.00% test_common [kernel.kallsyms] [k] __x64_sys_write 0.30% 0.00% test_common [kernel.kallsyms] [k] ksys_write 0.30% 0.00% test_common [kernel.kallsyms] [k] vfs_write 0.23% 0.00% test_common [kernel.kallsyms] [k] __vfs_write ``` * 以下是 `REF` 的版本。 輸入命令為 ```shell= $ sudo perf record -g -e cache-misses,cache-references,instructions,cycles ./test_common --bench REF ``` 得到 ```shell= Samples: 54K of event 'cache-misses', Event count (approx.): 1329973047 Children Self Command Shared Object Symbol + 99.55% 0.00% test_common [unknown] [k] 0x0c96258d4c544155 + 99.55% 0.00% test_common libc-2.27.so [.] __libc_start_main + 99.54% 0.00% test_common test_common [.] main + 99.30% 0.01% test_common test_common [.] bench_test + 99.26% 0.18% test_common test_common [.] tst_search_prefix + 99.08% 98.70% test_common test_common [.] tst_suggest 0.34% 0.00% test_common [kernel.kallsyms] [k] do_syscall_64 0.34% 0.00% test_common [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe 0.31% 0.00% test_common libc-2.27.so [.] __GI___libc_write 0.30% 0.00% test_common [kernel.kallsyms] [k] __x64_sys_write 0.30% 0.00% test_common [kernel.kallsyms] [k] ksys_write 0.30% 0.00% test_common [kernel.kallsyms] [k] vfs_write 0.24% 0.00% test_common [kernel.kallsyms] [k] __vfs_write ``` 兩者的 `cache-misses` 的數量比為$\dfrac{1329973047}{1078849051}\approx1.23$。 再看看當輸入命令為 ```shell= $ sudo perf record -e cache--misses -g ./test_common --bench CPY ``` 或 ```shell= $ sudo perf record -e cache-misses -g ./test_common --bench REF ``` 時,`$ perf top -e cache-misses` 會給出哪個指令是熱點的判斷。 結果發現熱點是 `mov -0x10(%rbp),%r8d`,也就是 `mov 0x8(%rax),%rax` 的下一條指令。 ## 研讀 Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters 論文 [論文的連結在此](https://arxiv.org/abs/1912.08258) Xor Filters 的演算法可分為四個部份: * Membership Tests * Construction * Mapping Step * Assigning Step 以下我依序針對各個部份記錄我的理解。 ### Membership Tests 這部份的演算法很簡短,只有比較 $fingerprint(x)$ 是否與 $B[h_0(x)] \space xor \space B[h_1(x)] \space xor \space B[h_2(x)]$ 相等。我們透過這部份的演算法得知 input $x$ 是否屬於當初構建 Xor Filter 時所用的 Key Set (就是判斷資料庫裡有沒有 $x$)。 ### Construction 從這部份可以看出演算法的架構,內容如下: * 我們需要: * set of keys S * a fingerprint function * 演算法的工作流程: * 隨機選擇3個 hash function $h_0$, $h_1$, $h_2$, 直到 $map(S, h_0, h_1, h_2)$ 返回 Sucess 以及一個 stack $\sigma$ 。(關於這裡的隨機,有特別強調要 `independent from the fingerprint function`, 但我不懂一個 function `indepentdent from` 另一個 function 的定義是什麼。需要查閱相關資料。) * 設 B 為一個 大小為 $\lfloor{1.23}\cdot|S|\rfloor+32$ 的 array, 其中每一個元素有 k bits。 * $assign(\sigma, B, h_0, h_1, h_2)$ * 我們最後能得到: * array B and the hash functions $h_0$, $h_1$, $h_2$ 這一部份出現了兩個新 function $map(S, h_0, h_1, h_2)$ 及 $assign(\sigma, B, h_0, h_1, h_2)$,它們是 Xor Filter 這個演算法的核心,分別對應到 `Mapping Step` 和 `Assigning Step` ### Mapping Step ```graphviz digraph g { graph[rankdir=LR, center=true, margin=0.2, nodesep=1, ranksep=1] edge[arrowsize=1.0, arrowhead=vee] node[shape = record, fontsize=14, width=1.5, height=1, fixedsize=false]; subgraph cluster_0 { node1[label = "<f0> h0(x) = 0 |<f1>h1(x) = 1 |<f2>h2(x) = 2 |<f3>.\n.\n.\n|<f4> c - 1"]; label = "H"; color=none; } } ``` $map()$ 的想法如下: * 設 $H$ 為大小為 $\lfloor{1.23}\cdot|S|\rfloor+32$ 的 array, 其中每一個元素皆為 key 的集合(一開始為空)。 * 對用來構建 Xor Filter 的 Key Set 中的每一個成員 $x$ ,計算$h_0(x)$, $h_1(x)$, $h_2(x)$,並將 $x$ 填入 H 中對應的集合 $H[h_0(x)]$、$H[h_1(x)]$、$H[h_2(x)]$ 中 接下來: * 初始化一個 empty queue Q,並遍歷整個 $H$ ,如果 $H(i)$ 只有一個元素,就將數字 $i$ 加入 Q 中。 最後: * 初始化一個 empty stack $\sigma$。 * 檢查 Q 是否為空,如果不是,就將一個元素 $i$ 從 Q 中移除,並且: * 如果 $H[i]$ 只有一個元素 $x$,就把 ($x$, $i$) push 到 stack $\sigma$ * 把 $x$ 從 $H[h_j(x)]$ 移除 (j = 0 to 2),並且: * 如果移除 $x$ 後 $H[h_j(x)]$ 只剩一個元素,就把 $h_j(x)$ 加入 Q 中。 如果 stack $\sigma$ 大小 |$\sigma$|=|S| ,return success and the stack $\sigma$。 我們在此先分析 stack $\sigma$ 元素的特性: 根據 >如果 $H[i]$ 只有一個元素 $x$,就把 ($x$, $i$) push 到 stack $\sigma$ 這項規則,每次將 ($x$, $i$) push 到 stack $\sigma$ 後,$i$ 會與用之後遇到的每個 $x\prime$ 計算出的 $h_0(x\prime)$, $h_1(x\prime)$, $h_2(x\prime)$ 不同 因此,我們最後得到的 stack $\sigma$ 有一個非常重要、之後的 `Assigning Step` 會用到的特點: >$i$ is different from $h_0(x\prime)$,$h_1(x\prime)$,$h_2(x\prime)$ for all keys $x\prime$ encountered so far. 這裡的 `encounter so far `的時間線是指從 stack 頂端到 stack 底端遍歷的方向。 ### Assigning Step $assign()$ 的 參數 是 stack $\sigma$, B, hash funtions $h_0$、$h_1$、$h_2$。這一步的工作如下: 對 stack $\sigma$ 中的每一個元素($x$, $i$): * 令 $B[i] = 0$ * 令 $B[i] = fingerprint(x) \space xor\space B[h_0(x)]\space xor \space B[h_1(x)]\space xor \space B[h_2(x)]$ 由於 Stack 中元素的特性: >$i$ is different from $h_0(x\prime)$,$h_1(x\prime)$,$h_2(x\prime)$ for all keys $x\prime$ encountered so far. 所以每個 $B[i]$ 都只會被 set 一次。 雖然不妨礙理解演算法,但一個令人疑惑的問題是,`Assigning Step` 中,,$B[h_0(x)]\space 、\space B[h_1(x)]\space 、 \space B[h_2(x)]$(至少其中兩個) 在還沒被初始化前就被引用,這樣的表達方式是否合理? ## 參閱對應程式碼實作 xor_singleheader, 引入 dict 程式碼 [xor_singleheader 的實作程式碼在此](https://github.com/FastFilter/xor_singleheader) xor_singleheader 實作上跟 bloom filter 有一個差別,導致不能直接將其引入程式碼: * 用來構建 Xor filter 的 Key Set,其中不能有重複的元素。 相反的,bloom filter 容許重複元素。 因此,決定先對資料進行前處理。前處理的步驟如下: * 將 `cities.txt` 資料根據 [資料檔案的分析](https://hackmd.io/@sysprog/2020-dict#%E8%B3%87%E6%96%99%E6%AA%94%E6%A1%88%E7%9A%84%E5%88%86%E6%9E%90) 所述加以修改,以利解析。 * 讀取資料至 `array`,並進行排序。 * 將 `array` 中重複的元素移除。 * 針對 `array` 中每個元素計算其 hash 值(使用 `djb2` 作為 hash function) 以下是我前處理的核心程式碼(第一次嘗試): ```c= u_int64_t *cities_parsing(FILE *fp, char **stringSet, int *setsize) { int size = 0; char line[WRDMAX]; while (fgets(line, WRDMAX, fp)!= NULL) { char city[WRDMAX] = ""; char province[WRDMAX] = ""; char nation[WRDMAX] = ""; sscanf(line, "%[^,^\n], %[^,^\n], %[^,^\n]", city, province, nation); char *city_dup = strdup(city); char *province_dup = strdup(province); char *nation_dup = strdup(nation); if (*nation) { *(stringSet + size) = city_dup; *(stringSet + size + 1) = province_dup; *(stringSet + size + 2) = nation_dup; size = size + 3; } else { *(stringSet + size) = city_dup; *(stringSet + size + 1) = province_dup; free(nation_dup); size = size + 2; } } qsort(stringSet, size, sizeof(char *), cstring_cmp); rm_dup(stringSet, &size); *setsize = size; u_int64_t *stringSet_hashed = malloc(size * sizeof(u_int64_t)); for (int j = 0; j < size; j++) { *(stringSet_hashed + j) = djb2(*(stringSet + j)); } return stringSet_hashed; } ``` 我把它稱作第一次嘗試,是因為檢查 hash 出來的結果時,發現有duplicate (共有 1 個)。 把 hash funciton 從 `dbj2` 換成 `jenkins` 後就沒有 duplicate 發生。 使用的 `jenkins` 實作如下: ```c= static unsigned long jenkins(const void *_str) { const char *key = _str; unsigned long hash = 0; while (*key) { hash += *key; hash += (hash << 10); hash ^= (hash >> 6); key++; } hash += (hash << 3); hash ^= (hash >> 11); hash += (hash << 15); return hash; } ``` 接下來就是實作 benchmark 的相關程式: * Xor filter 的 benchmark 程式碼部份如下: ```c= FILE *fp_xor = fopen(IN_FILE, "r"); char **stringSet = malloc(300000 * sizeof(char *)); u_int64_t *stringSet_hashed = cities_parsing(fp_xor, stringSet, &setsize); xor8_t filter; xor8_allocate(setsize, &filter); bool is_ok = xor8_populate(stringSet_hashed, setsize, &filter); if (!is_ok) { printf("You have duplicate keys"); return 0; } if (argc == 3 && strcmp(argv[1], "--benchXOR") == 0) { int min = 0; int max = setsize - 1; FILE *fp = fopen(BENCH_TEST_FILE, "w"); for (int i = 0; i < setsize; i++) { int x = rand() % (max - min) + min; char *tmp = stringSet[x]; t1 = tvgetf(); if (!(xor8_contain(jenkins(tmp), &filter))) { t2 = tvgetf(); fprintf(fp, "%d %f msec\n", i, (t2 - t1) * 1000); continue; } res = tst_search(root, tmp); t2 = tvgetf(); fprintf(fp, "%d %f msec\n", i, (t2 - t1) * 1000); } tst_free(root); free(pool); return 0; } ``` 想法是:從 `stringSet` 裡隨機選擇一個 string (`stringSet` 就是儲存 `cities.txt` 中所有名稱(且不重複)的 array),將其輸入 `Xor filter` 確認是否存在。 以下是分別針對 CPY 及 REF 測試的圖: ![](https://i.imgur.com/Lz5efpQ.png) ![](https://i.imgur.com/r1BQtpR.png) 奇怪的地方是,兩張圖在 y = 0 ( x 軸) 處都仍有數據點分佈,理論上這些數據點代表不屬於 filter 的 string (直接出局不用到 tree 裡 search)。但所有輸入的 string 都屬於 filter,應該不會出現這種分佈。 關於 Bloom filter 的部份,benchmark程式碼如下: ```c= if (argc == 3 && strcmp(argv[1], "--benchBloom") == 0) { int min = 0; int max = setsize - 1; FILE *fp = fopen(BENCH_TEST_FILE, "w"); for (int i = 0; i < setsize; i++) { int x = rand() % (max - min) + min; char *tmp = stringSet[x]; t1 = tvgetf(); if (!(bloom_test(bloom, word))) { t2 = tvgetf(); fprintf(fp, "%d %f msec\n", i, (t2 - t1) * 1000); continue; } res = tst_search(root, tmp); t2 = tvgetf(); fprintf(fp, "%d %f msec\n", i, (t2 - t1) * 1000); } tst_free(root); free(pool); return 0; } ``` 以下是分別針對 CPY 及 REF 測試的圖: ![](https://i.imgur.com/VwddqAr.png) ![](https://i.imgur.com/4sJbdfd.png) 到這邊可以知道我的 bench mark 設計一定有問題,需要再花時間去釐清。

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully