雷承勳
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    2018q1 Homework2 (prefix-search) === contributed by <`raygoah`> ## 系統環境 ```shell $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 60 Model name: Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz Stepping: 3 CPU MHz: 2394.425 CPU max MHz: 3400.0000 CPU min MHz: 800.0000 BogoMIPS: 4788.85 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K NUMA node0 CPU(s): 0-7 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmxest tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single ptitpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts ``` ## 理解 Ternary search tree ### 閱讀 * [`rex662624`](https://hackmd.io/K4v3wF-8SFudCT2-ckK9Hg) 詳細的解說 * [Ternary Search Tree 視覺化](https://www.cs.usfca.edu/~galles/visualization/TST.html) * [Trie 和 Ternary Search Tree 的學習總結 - JK_RUSH - 博客園](http://www.cnblogs.com/rush/archive/2012/12/30/2839996.html) ### 介紹 * Trie 的時間複雜度為 $O(n)$ ,且能實現使用 BST 或是 Hash 難以實現的 prefix-search * 但因為在 Trie 中每個節點中,都要同時保留 26 個英文字母的子節點,因此在空間使用的效率上會很不好,為了改善這點,但同時想要保持 Trie 在時間複雜度方面表現良好的特點,因此結合了 Trie 在時間效率以及 BST 在空間效率上兩方的優點,產生了 Ternary search tree * 和 Trie 在每個節點中保留 26 個子節點的作法不同,Ternary search tree 的作法和 BST 較為接近,只是不同的地方在於 BST 每個 node 的 children 只有 left child 和 right child,但 Ternary search tree 的分支數是三,分成代表大於 key 的 higher child,代表等於 key 的 middle chlid,以及表示小於 key 的 lower child,如下所示: * 每個節點 (Node) 最多有三個分支,以及一個 key * Key 用來記錄 Keys 中的其中一個字元 * lower (left) :用來指向比當前 node 的 key 小 的 node * equal (middle) :用來指向與當前 node 的 key 一樣大 的 node * high (right):用來指向比當前 node 的 Key 大 的 node * 插入順序 ``` 插入 dog 插入 hero 因為 h 大於 d 插入 dish 因為 d 相同 所以放在 d 代表大於的右子樹中 比較 o 和 i 得到 i 小於 o 所以放在左邊 d d d | | \ | \ o --> o h --> o h | | | / | | g g e i g e | | | r s r | | | o h o ``` 由 [Ternary Search Tree 視覺化](https://www.cs.usfca.edu/~galles/visualization/TST.html) 繪製的結果: ![](https://i.imgur.com/NbvyDVO.png) --> ![](https://i.imgur.com/azmtcYd.png) --> ![](https://i.imgur.com/RXvXAhr.png) ## 實作 ### 原始版本 * make 後執行,看看這個程式是要做什麼 ``` $ make $ ./test_cpy ``` * 程式的執行畫面如下 ```shell ternary_tree, loaded 259112 words in 0.119982 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: ``` * 可以依照所需輸入 command,在這邊輸入作業說明中嘗試的 f,會出現 `find word in tree:` 接著可以輸入要查詢的字串 * 沒找到 America ```shell Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: f find word in tree: America America not found. ``` * 有找到 China ```shell Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: f find word in tree: China found China in 0.000001 sec. ``` * 再來嘗試 command s,可以尋找和使用者輸入具有相同 prefix 的字串,以下只節錄部份結果 ```shell choice: s find words matching prefix (at least 1 char): Chin Chin - searched prefix in 0.000255 sec suggest[0] : Chiná, suggest[1] : Chinácota, suggest[2] : Chinú, ... suggest[45] : Chinteni, suggest[46] : Chiny, ``` ### 修改 Makefile * 增加 astyle ,輸入 `$ make astyle` 即可執行 astyle 的檢查及排版 ``` astyle:     astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.\[ch\] ``` * 參考 hw1 phonebook 的 makefile,加以修改以達成作業要求的 `$ make bench`,以及可對 ternary search tree 的實作做出效能評比 ``` cache-test: $(TESTS) echo 3 | sudo tee /proc/sys/vm/drop_caches; perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./test_cpy --bench $(TEST_DATA) perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./test_ref --bench $(TEST_DATA) ``` * 修改完 Makefile 後,同時必須將 `test_cpy.c` 以及 `test_ref.c` 做修改,加入判別 `--bench` 的部份,並且將 `for` 迴圈中的 `switch` `case` 進行對應的修改 ```clike /* 判斷執行程式時有無加入'--bench' or '--BENCH' 有的話將 flag 設為 1 */ if (argc > 1) { if ((strcmp(argv[1],"--bench") == 0 ) || (strcmp(argv[1], "--BENCH") == 0)) { bench_flag = 1; } else { bench_flag = 0; } } ``` * 而此時跑跑看 `$ make cache-test`,會發現程式無法結束 * 原因在於 `perf stat --repeat 100` 是要讓程式重複執行 100 次後,再計算效能評比,但目前的程式是會一直停留在 `for(;;)` 的迴圈中,直到使用者輸入了 `q` 後,才會結束,因此必須修改為若是在 `--bench` 的模式,第一次進入 `switch` `case` 後,把字串 `word` 設為 "q",這樣便可以跳出迴圈,結束這一次的執行,即可順利利用 `peref` 重複執行多次,並得到分析的結果,如下所示: ```shell Performance counter stats for './test_cpy --bench s Tai' (100 runs): 120,3726 cache-misses # 38.584 % of all cache refs ( +- 0.90% ) 311,9773 cache-references ( +- 1.63% ) 5,3514,1267 instructions # 1.01 insn per cycle ( +- 0.02% ) 5,2921,4313 cycles ( +- 0.68% ) 0.268488851 seconds time elapsed ( +- 2.26% ) ``` * 插曲: ```shell $ git commit --- .merge_file_MC22k0 2018-03-21 15:16:56.404597484 +0800 +++ /tmp/.merge_file_MC22k0.SkJUOb 2018-03-21 15:16:56.408597505 +0800 @@ -177,7 +177,7 @@ int main(int argc, char **argv) break; default: fprintf(stderr, "error: invalid selection.\n"); - break; + break; } } [!] test_cpy.c does not follow the consistent coding style. Make sure you have run astyle as the following: astyle --style=kr --indent=spaces=4 --suffix=none test_ref.c ``` * 錯誤訊息表示 `break;` 那行不符合 coding style,嗯..... * 按照提示訊息輸入 `astyle --style=kr --indent=spaces=4 --suffix=none test_ref.c` ,但卻還是一樣的錯誤,原來不是每次用 astyle 就可以改成對的,不然就是我用錯了 QQ * 改了很多次之後,同樣的錯誤訊息仍舊一直重複出現,此時耐心耗盡的我決定採用之前學長教的方法,睡眠 debug 法 * 睡一覺起來,真的有用,原來是我原本的 code 中,在分號後面還有好幾個空白,自己都忘記什麼時候改到的,但因為在錯誤訊息中實在是看不出來,而我刪掉 `break;` 重打一次,但後方的空白仍舊沒被刪掉,所以還是錯誤,最後是突然想到才發現的,又學了一課....賺!! > 請使用 git diff 比較程式更改的地方,可以更快了解出錯的部份,好用的 tool 如 meld, tig , gitk 等,請行查尋關鍵字 > [name=ryanpatiency][color=green] > > OK ,感謝指教 [name=raygoah][color=blue] ### Fixed test_ref.c * 開始閱讀程式碼,發現一開始就不知道 `enum` 是什麼,Google 是大家的老師,在 [enum 的用途](http://ascii-iicsa.blogspot.tw/2010/08/enum.html) 這篇文章中,有清楚的介紹 ```clike enum { INS, DEL, WRDMAX = 256, STKMAX = 512, LMAX = 1024 } ``` 因此我們可以理解到,在這裡 INS 為 0,而 DEL 則為 1 ``` #define REF INS #define CPY DEL ``` 所以在這邊 REF 也為 0,而 CPY 為 1 * 這邊開始遇到 FIXME 的部份,因為我們需要分析 test_ref.c 以及 test_cpy.c 的效能,而在前面 makefile 修改完後,因為尚未修改 test_ref.c 的部份,所以結果只有對於 cpy 部份的效能分析,因此現在要將 ref 的部份補上 * 在這裡會發現,test_ref.c 中共有三個 FIXME,而這三個 FIXME 皆為 ```clike= tst_ins_del(&root, &p, INS, CPY) ``` 將 CPY 改為 REF ```clike= tst_ins_del(&root, &p, INS, REF) ``` * 想說改完了,好開心,馬上來跑跑看 ```shell choice: s find words matching prefix (at least 1 char): Tain Tain - searched prefix in 0.000015 sec suggest[0] : Tain suggest[1] : Tain suggest[2] : Tain suggest[3] : Tain suggest[4] : Tain suggest[5] : Tain suggest[6] : Tain suggest[7] : Tain suggest[8] : Tain suggest[9] : Tain suggest[10] : Tain suggest[11] : Tain ``` 夢醒了,果然沒這麼簡單... * 參考 [`tina0405`](https://hackmd.io/mI8HxMMJRqy_I2BmDzZWAQ#) * 觀察 `tst_ins_del()`,先從 test.h 中宣告 function 的部份看起,會發現到註解寫的十分詳細,其中有寫到 `if 'cpy' is non-zero allocate storage for 's', otherwise save pointer to 's'`,因為之前已經將 CPY 改為 REF,因此這裡傳入的參數其值會為 0,進入 `else` 的部份,直接指向傳入字串 `s` 的地址 ```clike /* Place nodes until end of the string, at end of stign allocate * space for data, copy data as final eqkid, and return. */ if (*p++ == 0) { if (cpy) { /* allocate storage for 's' */ const char *eqdata = strdup(*s); if (!eqdata) return NULL; curr->eqkid = (tst_node *) eqdata; return (void *) eqdata; } else { /* save pointer to 's' (allocated elsewhere) */ curr->eqkid = (tst_node *) *s; return (void *) *s; } } ``` * 利用 array,保有 `save pointer to ‘s’`,且回傳不一樣的空間,將 input 的城市存入 array 中 ```clike char all_word[20000][WRDMAX]; // fix while ((rtn = fscanf(fp, "%s", all_word[index])) != EOF) { char *p = all_word[index++]; /* FIXME: insert reference to each string */ if (!tst_ins_del(&root, &p, INS, REF)) { fprintf(stderr, "error: memory exhausted, tst_insert.\\n"); fclose(fp); return 1; } idx++; } ``` * 在這邊要將 cities.txt 的資料量減少,否則會造成 core dumped,在這邊先用 4999 筆測試看看,程式可以順利執行,並得到正確的結果 ``` choice: s find words matching prefix (at least 1 char): Am Am - searched prefix in 0.000025 sec suggest[0] : Amadora, suggest[1] : Amagasaki, suggest[2] : Amaigbo, suggest[3] : Amalner, suggest[4] : Amarillo, suggest[5] : Amarnāth, suggest[6] : Amasya, suggest[7] : Ambāla, suggest[8] : Ambarawa, suggest[9] : Ambato, suggest[10] : Ambattūr, suggest[11] : Ambon, suggest[12] : Americana, suggest[13] : Amersfoort, suggest[14] : Amiens, suggest[15] : Amman, suggest[16] : Amrāvati, suggest[17] : Amreli, suggest[18] : Amritsar, suggest[19] : Amroha, suggest[20] : Amsterdam, suggest[21] : Amsterdam-Zuidoost, ``` * 但在這邊發現,雖然程式更改完後,執行得到的結果是正確的,但是在選擇 q 結束後,卻會出現錯誤訊息 * 參考 [`rex662624`](https://hackmd.io/s/SJbHD5sYM) 1. 在 tst_ref 要 free 時,改用 `tst_free(tst_node *p)`,因為現在利用 array 來儲存字串,不能將 array 的空間做 `free()`,因此改用`tst_free(tst_node *p)`,防止 `tst_free_all(tst_node *p)` 中 `free(p->eqkid)` 2. 另外是關於刪除會有問題的部份,tst_ins_del() 中 : `return tst_del_word(root, curr, &stk, 1);` 改為 `return tst_del_word(root, curr, &stk, cpy);`,這個問題一開始我還沒有發現,是看了 `rex662624` 的筆記後才發現的,非常感謝這位同學詳細的筆記 ## 比較效能 * 在修改完 tst_ref 的部份後,現在可以對 ref 以及 cpy 兩種版本,進行效能上的比較,兩者各跑 1000 次 * CPY: ```shell Performance counter stats for './test_cpy --bench s Tai' (1000 runs): 4,6064 cache-misses # 42.489 % of all cache refs ( +- 1.14% ) 10,8414 cache-references ( +- 0.32% ) 3115,0273 instructions # 1.23 insn per cycle ( +- 0.00% ) 2527,6335 cycles ( +- 0.34% ) 0.008810699 seconds time elapsed ( +- 1.47% ) ``` * REF: ```shell Performance counter stats for './test_ref --bench s Tai' (1000 runs): 5,2886 cache-misses # 40.790 % of all cache refs ( +- 1.03% ) 12,9653 cache-references ( +- 0.35% ) 3102,1250 instructions # 1.11 insn per cycle ( +- 0.00% ) 2807,2603 cycles ( +- 0.37% ) 0.009552937 seconds time elapsed ( +- 0.48% ) ``` ## 參考 * [`rex662624`](https://hackmd.io/K4v3wF-8SFudCT2-ckK9Hg) * [`tina0405`](https://hackmd.io/mI8HxMMJRqy_I2BmDzZWAQ#) * [Trie 和 Ternary Search Tree 的學習總結 - JK_RUSH - 博客園](http://www.cnblogs.com/rush/archive/2012/12/30/2839996.html) * [enum 的用途](http://ascii-iicsa.blogspot.tw/2010/08/enum.html)

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully