何元皓
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    2018q1 Homework2 (prefix-search) === contributed by <`YuanHaoHo`> ## 開發環境 ```shell $ lscpu Architecture: x86_64 CPU 作業模式: 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數:2 每通訊端核心數:2 Socket(s): 1 NUMA 節點: 1 供應商識別號: GenuineIntel CPU 家族: 6 型號: 58 Model name: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz 製程: 9 CPU MHz: 1203.109 CPU max MHz: 3200.0000 CPU min MHz: 1200.0000 BogoMIPS: 5188.13 虛擬: VT-x L1d 快取: 32K L1i 快取: 32K L2 快取: 256K L3 快取: 3072K NUMA node0 CPU(s): 0-3 ``` ## 理解 Ternary search tree 在 Computer Sience 中, Ternary search tree 是一種 trie 的資料結構,與 Binary search tree 有幾分相似,但細部上卻有所不同。因此先來了解 trie 的特色: * trie 的時間複雜度為 $O(n)$ ,能實現使用 BST 或是 Hash 難以實現的 prefix-search。 * trie 的基本性質: * 根節點不包含字符,除根節點外的每個節點只包含一個字符。 * 從根節點到某一個節點,路徑上經過的字符連接起來,為該節點對應的字符串。 * 每個節點的所有子節點包含的字符串不相同。 * 如果大多數的字串有相同的 prefix,使用trie可以省下很多空間。但如果 trie 每個節點都有 26 個字母,這會消耗龐大的內存空間來執行建樹的動作,為了改善這點然後又保持原本 $O(n)$ 的搜索時間,因此有了結合 BST 和 trie 的 Ternary search tree 。 ### Ternary search tree Ternary search tree 每個節點存儲了一個字符、一個值對象或值指針以及三個指向子節點的指針。這三個字節點常被稱為等位子節點 `=`、低位子節點 `<` 和高位子節點 `>`,因此中文也翻做三元樹。 簡易呈現TST結果: ![](https://i.imgur.com/JCSHHCH.png)->![](https://i.imgur.com/P6zLNbY.png)->![](https://i.imgur.com/wyIUGAv.png)->![](https://i.imgur.com/gzrLjog.png) >以上的步驟為:創建 CABC 的順序 -> insert CABD -> insert DABC -> insert ABCD 其中比較要介紹的是,當我們起始 insert 的值不同時,可以看到有三種不同的狀態,分別是 `<C` `=C` `>C`,這個狀態可以在進行收尋時,比起 trie 的收尋方法減少很多比較的次數,可以加速搜索效率。 ->![](https://i.imgur.com/jbE62z6.png)->![](https://i.imgur.com/ZKqWsCF.png) >以上為接續上圖的模式進行以下步驟: delete CABC -> deleate CABD 可以由上看出,當我在選擇刪除 CABD 後,剩下的兩個子節點會由較小的子節點當作比較的根節點,因此在刪除時,在選擇根節點時,會出現一個情況,就是由以上圖看的出來,當 A 為根節點時,只能選擇 `=A`和`>A`,這會變成跟 BST 的缺點狀況一樣,然而在 tst 下會發生這個狀況的機率很低,只有在碰到 A 當根節點時才會發生。 ### 討論將 TST 引入到電話簿或戶政管理系統的適用性 將 tst 引入到電話簿或戶政管理系統的適用性,我覺得有以下幾點關於電話簿和戶政管理系統的狀況可以先討論: * 觀察[中華黃頁電話簿](https://www.iyp.com.tw/)的分類模式,先分成生活、文化、工商、工業和社會服務,再由以上幾點細分成更多細項。就電話簿的編制上,每一類分類只有將各縣市類似性質的職業整合在一起,但就號碼上沒有甚麼相似性,只有市號相同。 * 觀察[戶政司](https://www.ris.gov.tw/doorplateX/)的村里街路門牌查詢,有兩項搜尋模式,一是[以編釘日期、編釘類別查詢](https://www.ris.gov.tw/doorplateX/map?searchType=date),另一個是[以部分街路門牌查詢](https://www.ris.gov.tw/doorplateX/map?searchType=doorplate),就此兩種不同模式,在收尋上比較偏向是同類層級數量多,舉例:台北市->中正區->文祥里...,在搜尋區里級別時,就會出現很多分支,搜尋方向不同會使結果多樣化。 了解要分析的狀況後,來思考一下 tst 收尋的特色: * 每一收尋層級有三個子節點 * $O(n)$ 的搜索時間 * Delete 和 Insert 的方式與搜索時間差,比 BST 小。 就以上特色,對應到電話簿上,會發現不同職業性質種類的繁多,這會大幅增加 tst 的收尋層級,並且增加搜索時間,就三個子節點而言,要建出每個店名都不同的樹,光是要收尋號碼,在店名上就要多消耗很多時間,電話簿的性質確實不大適合使用 tst 。而戶政司上的門牌編號,我覺得在 insert 和 Delete 時,使用 trie 的方式會比 tst 更有效率,畢竟使用 trie 在執行 insert 和 Delete 時,不大需要改變樹的結構,但在收尋時間上 tst 會比較有效率,概念就與這次功課的為何要使用 tst 相似。 ==參考:== * [rex662624的範例](https://hackmd.io/K4v3wF-8SFudCT2-ckK9Hg) * [Ternary Search Tree 視覺化](https://www.cs.usfca.edu/~galles/visualization/TST.html) --- ## 實作 ### 修改 Makefile 這裡的作業要求是`測試程式碼應該接受 --bench 的執行參數,得以在沒有使用者輸入的前題下,對 tst 程式碼進行效能分析,應涵蓋 cpy 和 ref 兩種途徑 (詳情參閱 tst.h 的程式碼註解)`。一開始看到要寫 Makefile 有點惶恐,畢竟之前只用過沒寫過,點開來看都是`@ $ % ^`這些奇怪的東西,滿腦子問號,幸好上網查了一下能稍微了解了,但在寫 Makefile 上仍不熟悉,之後會花時間好好研究。言歸正傳,功課需要把 --bench 執行參數的判斷寫在 test_cpy.c 及 test_ref.c 當中, 一旦判斷式成立就進行效能評估, 並在 makefile 裡面增加 `bench` 要觸發的指令, 如下: ``` BEN = \ test_cpy \     test_ref bench: $(BEN)     ./test_cpy --bench;     ./test_ref --bench; ``` 這裡的 bench 是用來分析**每次**做搜索所花的時間長度,搭配 test_cpy.c 和 test_ref.c 完成這個 bench 命令。在 test_cpy.c 和 test_ref.c 中加入: ``` C= if (argc == 2 && strcmp(argv\[1\], "--bench") == 0) { int stat = bench_test(root,BENCH\_TEST\_FILE, LMAX,t2-t1,1); tst_free(root); return stat; } ``` 將以上 bench_test()寫在 bench.c 裡,並讓 Makefile 在編譯時,一起編譯。 接著還有參考`raygoah`同學的共筆,發現可以用之前在 phonebook 所用的 Pref 來分析效能,命令test_cpy.c 及 test_ref.c **各做1000收尋**後,可以看他的 cache-misses 的狀況,和每次跑搜尋時所花的時間。搜尋的字寫在 TEST_D 中可更改。 ``` TEST_D = s Tai pref: $(TESTS)     echo 3 | sudo tee /proc/sys/vm/drop_caches;     perf stat --repeat 1000 \ -e cache-misses,cache-references,instructions,cycles \ ./test_ref --bench $(TEST_D)     perf stat --repeat 1000 \ -e cache-misses,cache-references,instructions,cycles \ ./test_cpy --bench $(TEST_D) ``` ==參考:== * [簡單學 makefile](http://mropengate.blogspot.tw/2018/01/makefile.html) * [跟我一起寫 makefile](http://wiki.ubuntu.org.cn/index.php?title=%E8%B7%9F%E6%88%91%E4%B8%80%E8%B5%B7%E5%86%99Makefile:MakeFile%E4%BB%8B%E7%BB%8D&variant=zh-hant) * [raygoah](https://hackmd.io/dwDOR7H2Q8qyxGxSOsc-vA?both) 同學的共筆 * [ZixinYang](https://hackmd.io/s/S1UlwCTnZ#)同學的共筆 ## 修改 test_ref.c 除了上面在寫 Makefile 上有更改到 test_ref 外,仔細看會找到 FIXME ,因此先看這一部份。有 FIXME 的地方上都有`tst_ins_del()`,去看一下在 tst.c 裡,這 function 被寫了什麼: ```C= if (*p++ == 0) { if (cpy) { /* allocate storage for 's' */ const char *eqdata = strdup(*s); if (!eqdata) return NULL; curr->eqkid = (tst_node *) eqdata; return (void *) eqdata; } else { /* save pointer to 's' (allocated elsewhere) */ curr->eqkid = (tst_node *) *s; return (void *) *s; } } ``` 看這邊可以看的出來,若是跑 cpy 是跑 if 判斷室內的,比起 else 內的多了一行 `const char *eqdata = strdup(*s); `,阿原來在 ref 內跟 cpy 都使用一樣的函數,但使用內容受到 function 影響,因此輸入有差,造成一開始輸入就只有丟 word 的第一個字過去,沒有內容。那直覺一點,將函式有使用到的值改掉就好拉~: ```clike= - p = word; + p = strdup(word); res = tst_ins_del(&root, &p, INS, CPY); ``` 這樣把所有有 FIXME 的地方都改掉後,眼尖就覺得`tst_ins_del()`內的 CPY 怪怪的,看了其人的共筆後,發現改成 REF 才是對阿!因此才又再改了一遍: ```clike= -res = tst_ins_del(&root, &p, INS, CPY); +res = tst_ins_del(&root, &p, INS, REF); ``` 再跑一次程式後可以執行搜尋了,但在離開時總是怪怪的,會 core dumped ,回頭看程式碼,離開時是按`q`,而 `case q`裡只包含了`tst_free_all(root)`,因此去看看 tst.c 裡面這函式的全貌,會發現它會做`free(p->eqkid)`,清掉了到儲存字串的位置,而下面剛好有個`tst_free(tst_node *p)`沒有做這件事,因此改用這個函式。 ```clike= -tst_free_all(root); +tst_free(root); ``` 再執行程式後,沒問題拉可以好好的跳出來~ ==參考:== * [ZixinYang](https://hackmd.io/o6NZRCRBTBuU42-SyMwDXA?both) * [strdup](http://en.cppreference.com/w/c/experimental/dynamic/strdup) ## 效能分析 分析效能方面,我們先看 pref 的比較,在這階段我想先各做100次並將每次搜尋的時間紀錄下來。 ### pref s Tai CPY ``` Performance counter stats for './test_cpy --bench s Tai' (100 runs): 1,601,235 cache-misses # 57.855 % of all cache refs ( +- 0.36% ) 2,767,680 cache-references ( +- 0.11% ) 532,161,204 instructions # 0.95 insns per cycle ( +- 0.01% ) 559,055,387 cycles ( +- 0.24% ) 0.186330545 seconds time elapsed ( +- 0.94% ) ``` REF ``` Performance counter stats for './test_ref --bench s Tai' (100 runs): 1,733,662 cache-misses # 59.082 % of all cache refs ( +- 0.59% ) 2,934,353 cache-references ( +- 0.41% ) 568,578,335 instructions # 0.96 insns per cycle ( +- 0.00% ) 592,256,378 cycles ( +- 0.58% ) 0.195488493 seconds time elapsed ( +- 0.76% ) ``` 從這邊可以看的出來 REF 的 `cache-miss` 的比率比 CPY 大,但在我的電腦所呈獻的效果差不了多少。於是我決定各跑1000次,並用 gunplot 畫了他們分佈的圖片: ![](https://i.imgur.com/4ta3xXZ.png) 但這張圖實在是看不出來什麼,於是我花了一些時間,寫了calulate_pref.c 這程式就為了分析這張圖。我的設想是將這 1000 筆資料的時間長度分成兩百等份,從 0.000005 開始往上遞加到0.001,然後去累積每個等分內有多少比資料在這範圍內,這類似 Histogram 的分析方法,就有了以下這張圖: ![](https://i.imgur.com/hAV2CVg.png) 看了以後,確實清楚多了,能看到他們在時間上的分佈, ref 確實比 cpy 更花時間,雖然波形相似,且從 0.000005 到 0.001 內分佈平均,但實際上 ref 的資料確實會花更多時間。 ### bench_test 在 bench 分析上我主要是重現[ZixinYang](https://hackmd.io/o6NZRCRBTBuU42-SyMwDXA?both)的實驗,但是我發現他的分析重點似乎出了一點問題。我將 bench 內做每一次 tst 搜尋返回的時間存起,且取用前0~5000 次的時間大小,得到了以下的圖: ![](https://i.imgur.com/1NHdI6O.png) 後來想想其實這樣的作法沒什麼意義,搜尋時間不大會因為硬體效能或 CPY 和 REF 不同而有差,而對此我覺得`ZixinYang`同學再這個部份確實沒有分析到什麼,但確實我在看時沒有想到結果。

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully