Uno Wu
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # 2017q3 Homework2 (prefix-search) ###### tags: `sysprogram` contributed by <`st9007a`> ## [Github](https://github.com/st9007a/prefix-search) ### Reviewed by `amikai` - 建議提及理論的 prefix search 以及 TST, 在關聯到實做, 報告會比較有完整性 - 使用 `s` 來搜尋 `Tai` 的結果, 建議提供正確的版本作為對照, 可以用一些 grep 的方式之類的 - tst_traverse_fn 這個函式老師以提供了, 建議先用原本的函式的 code 做舉例 (e.g. 印出整棵樹的值),在提到 javascipt 的應用會比較流暢, 直接跳到 javascript 舉例好像怪怪的 以上是小弟淺見, 如有冒犯請見諒 ### Reviewed by `HTYISABUG` * C 有提供 `getopt.h` 幫助編程者管理 option, 直接用 `argv` 取得 argument 不夠嚴謹 ## 開發環境 ``` Linux 4.4.0-92-generic Ubuntu 16.04.1 LTS L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K ``` ## 修正 test_ref.c 首先打開,`test_ref.c` 會發現需要修正的地方都使用 copy 的方式 ```c tst_ins_del(&root, &p, INS, CPY) ``` 如果直接把 `CPY` 改成 `REF` 然後執行的話,雖然能正確編譯,但是執行結果是錯誤的,以下是使用 `s` 來搜尋 `Tai` 的結果 ``` suggest[961] : Tai suggest[962] : Tai suggest[963] : Tai suggest[964] : Tai suggest[965] : Tai suggest[966] : Tai suggest[967] : Tai suggest[968] : Tai suggest[969] : Tai suggest[970] : Tai ``` 檢查 `tst.c` 的 `tst_ins_del` ```c void *tst_ins_del(tst_node **root, char *const *s, const int del, const int cpy) { ... for (;;) { ... if (*p++ == 0) { if (cpy) { /* allocate storage for 's' */ const char *eqdata = strdup(*s); if (!eqdata) return NULL; curr->eqkid = (tst_node *) eqdata; return (void *) eqdata; } else { /* save pointer to 's' (allocated elsewhere) */ curr->eqkid = (tst_node *) *s; return (void *) *s; } } pcurr = &(curr->eqkid); } } ``` 從上面程式碼可以看到如果 `cpy` 被設為 0 時,`*s` 會被直接存進去,而配合 `test_ref.c` 的程式碼,`*s` 對映到 `word` 這個 pointer to character,換句話說所有 insert 操作的節點都存到以 `word` 為起點的連續記憶體位址的其中一個位址,所以修正方法就很明顯了,就是不要讓他們都存取到 `word` 的位址 ```c p = strdup(word); tst_ins_del(&root, &p, INS, REF); ``` >> `strdup` 的代價要考慮,對 cache locality 有衝擊。不能只比較 prefix search 時間,建立 TST 的時間和空間開銷也該考慮。 >> [name="jserv"][color=red] >> 更新嘗試使用連續記憶體的方式來取代 strdup >> [name="st9007a"] 我的作法是在 `word` assign 給 `p` 之前用 `strdup` 重新分配一段空間,如此就能保證每次 `p` 的值是不同的記憶體位址,由於 `tst_ins_del` 裡面有針對 `*s` 是否為 null 做判斷,所以這邊就不對 `p == null` 的狀況做檢查,編譯出來後執行結果就正常了 ``` suggest[28] : Taivalkoski, suggest[29] : Taivassalo, suggest[30] : Taiwala, suggest[31] : Taiwan suggest[32] : Taixing, suggest[33] : Taiynsha, suggest[34] : Taiyuan, suggest[35] : Taizhou, ``` ## 效能評比 寫一段程式來針對 prefix 長度為 3 的所有可能組合作測試 ```c static const char *upper_case_alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; static const char *lower_case_alphabet = "abcdefghijklmnopqrstuvwxyz"; int bench_test(const tst_node *root, char *out_file, const int max) { char prefix[3]; char **sgl; FILE *fp = fopen(out_file, "w"); int idx = 0, sidx = 0; double t1, t2; if (!fp) { fprintf(stderr, "error: file open failed '%s'.\n", out_file); return 1; } sgl = (char **)malloc(sizeof(char *) * max); for (int i = 0; i < 26; i++) { prefix[0] = upper_case_alphabet[i]; for (int j = 0; j < 26; j++) { prefix[1] = lower_case_alphabet[j]; for (int k = 0; k < 26; k++) { prefix[2] = lower_case_alphabet[k]; t1 = tvgetf(); tst_search_prefix(root, prefix, sgl, &sidx, max); t2 = tvgetf(); idx++; fprintf(fp, "%d %.6f sec\n", idx, t2 - t1); } } } fclose(fp); return 0; } ``` 回傳值代表程式執行結果,是給 main function 回傳用的,將這個 function 應用在 `test_cpy.c` 跟 `test_ref.c` 中 ```c if (argc == 2 && strcmp(argv[1], "--bench") == 0) { int stat = bench_test(root, BENCH_TEST_FILE, LMAX); tst_free_all(root); return stat; } ``` 在 `Makefile` 中加入 `$ make bench` ``` BIN = \ test_cpy \ test_ref bench: $(BIN) @for test in $(BIN); do\ ./$$test --bench; \ done ``` 首先先比較看看建立 tst 的速度 ``` $ make bench CPY: ternary_tree, loaded 259112 words in 0.093352 sec REF: ternary_tree, loaded 259112 words in 0.098250 sec $ make bench CPY: ternary_tree, loaded 259112 words in 0.089738 sec REF: ternary_tree, loaded 259112 words in 0.109493 sec $ make bench CPY: ternary_tree, loaded 259112 words in 0.089678 sec REF: ternary_tree, loaded 259112 words in 0.096827 sec ``` 執行三次可以看到 cpy 的方法都比 ref 快 $0.05\sim0.1$ 秒 ![](https://i.imgur.com/Rvwkg9Y.png) 上圖為針對 prefix 長度為 3 的所有可能做搜尋的執行速度比較,x 軸的數字代表的是第幾次搜尋,看起來好像差不多,找兩個比較明顯起伏的區間來看看 ![](https://i.imgur.com/k7JiQyp.png) ![](https://i.imgur.com/XhGKYgr.png) 這兩個區間分別是 $1000\sim2000$ 跟 $12000\sim14000$,從這兩個區間可以看出來 cpy 的執行速度比 ref 快一點 ### 修改 bench_test 由於原本的 `bench_test` 用三個 for loop 來測試所有字串長度為 3 的 prefix,這樣的作法會讓程式碼可讀性變差,而且也不符合實際狀況,所以改成用一個字典來儲存要測試的 prefix,這裡的字典取自 `cities.txt` 前 10000 筆資料 ```c while (fscanf(dict, "%s", word) != EOF) { if (strlen(word) < 4) continue; strncpy(prefix, word, 3); t1 = tvgetf(); tst_search_prefix(root, prefix, sgl, &sidx, max); t2 = tvgetf(); fprintf(fp, "%d %.6f sec\n", idx, t2 - t1); idx++; } ``` ## 修正 test_ref.c (Part 2) 由於之前的做法使用了 `strdup` 來避免參考到同一段位址的問題,但是根據第一周 phonebook 作業的結論,使用非連續記憶體來存取龐大的資料會導致cache miss 機率提高,所以這裡換一個方法來修正 `test_ref.c` 中建立 tst 的部分 ``` enum { INS, DEL, WRDMAX = 256, STKMAX = 512, LMAX = 1024, DICTMAX = 300000 }; char (*cities_dict)[WRDMAX] = malloc(sizeof *cities_dict * DICTMAX); while ((rtn = fscanf(fp, "%s", cities_dict[idx])) != EOF) { char *p = cities_dict[idx]; if (!tst_ins_del(&root, &p, INS, REF)) { fprintf(stderr, "error: memory exhausted, tst_insert.\n"); fclose(fp); return 1; } idx++; } ``` 這裡使用一個字串陣列 `cities_dict` 來存取資料,如此可以保證資料不會參考到同一段位址,同時一次性的分配記憶體也比每次 `strdup` 有效率多了,接著執行後選擇 `q` 這個指令會發現有錯誤 ``` *** Error in `./test_ref': free(): invalid pointer: 0x00007fa76abe4f10 *** ``` 研究了一下發現原本 `q` 這個指令所使用的 api 是 `tst_free_all`,這個 function 的註解有說明這個 api 是用來釋放儲存在內部的資料用的,但是現在的做法等同於把資料存在 `cities_dict` 這個變數中,所以要改成使用 `tst_free` 這個 api,註解有提到這個 api 是用來釋放儲存在外部的資料 再來是新增資料的部分,保持資料都存在 `cities_dict` 內,所以用 `strcpy` 把讀到的字複製進去 ```c strcpy(cities_dict[idx], word); p = cities_dict[idx]; ``` 最後刪除資料的部分,由於 cpy 跟 ref 的操作只會在新增時有差別,所以不對 `word` 做任何處理 ## 分析 copy / reference 兩種方式的效能 這裡利用 `$ perf stat` 分別對 `$ ./test_cpy --bench` 以及 `$ ./test_ref --bench` 做測試 ``` Performance counter stats for './test_ref --bench' (100 runs): 1,9833,7717 cache-misses # 58.596 % of all cache refs ( +- 0.03% ) 3,3848,2694 cache-references ( +- 0.03% ) 55,1851,5511 instructions # 0.98 insns per cycle ( +- 0.00% ) 56,5946,9075 cycles ( +- 0.02% ) 2.193260663 seconds time elapsed ( +- 0.05% ) ``` ``` Performance counter stats for './test_cpy --bench' (100 runs): 1,6970,8821 cache-misses # 54.942 % of all cache refs ( +- 0.05% ) 3,0888,9526 cache-references ( +- 0.07% ) 55,4853,7937 instructions # 1.11 insns per cycle ( +- 0.00% ) 49,8513,9041 cycles ( +- 0.04% ) 1.866943152 seconds time elapsed ( +- 0.15% ) ``` 可以看到 cpy 的效能比 ref 還好,有意思的是 ref 的 cache miss 比 cpy 還高,這邊可以分成兩部分來探討: 1. 建立 tst 2. prefix search > 不要搞錯動名詞使用的場合,抽象詞彙場合用 cache miss,而探討在特定場合的行為時,才用 cache missing,如 [Cache Missing for Fun and Profit](http://www.daemonology.net/papers/cachemissing.pdf) > [name="jserv"][color=red] ### 建立 tst 建立 tst 時,cpy 透過重複存取 `word` 來完成 tst,而 ref 則是在 `cities_dict` 上移動來完成 tst,以重複存取變數的效率來說 cpy 應該是比較好的 再來考慮到空間使用的開銷,cpy 只要維持 tst 的結構就好,而 ref 除了 tst 以外,還需要額外保存 `cities_dict` 這個變數,因此空間使用的效率上也是 cpy 比較好 ### prefix search 雖然 ref 的 key 是由連續記憶體來存取,但是一個 `word` 的長度為 256 bytes,資料尺寸過大讓 cache 沒辦法有好的效率,接著是 search 時在節點上的移動是在非連續的記憶體上,所以 prefix search 並沒有讓 ref 佔太大優勢 ## Benchmark 分析與改進 原本拿來做測試的用的字典檔是拿原本的字典檔前 10000 筆資料的前 3 個字元去做 prefix 的測試,其測試結果可以在上一節看到,但是字典檔的排序是隨機的,在這種狀況下沒辦法有效測試 cache hit 的效能( temporal locality ),所以將測試用字典檔做排序後重新測試 ``` Performance counter stats for './test_cpy --bench' (100 runs): 1,0217,7314 cache-misses # 34.198 % of all cache refs ( +- 0.20% ) 2,9878,5062 cache-references ( +- 0.07% ) 1.408432966 seconds time elapsed ( +- 0.13% ) Performance counter stats for './test_ref --bench' (100 runs): 1,2274,2242 cache-misses # 37.353 % of all cache refs ( +- 0.16% ) 3,2860,1004 cache-references ( +- 0.08% ) 1.606459157 seconds time elapsed ( +- 0.16% ) ``` 以上為新的測試結果,兩種方法的 cache miss 都有下降不少,其原因就是這種測試方式讓資料的存取有 temporal locality 的特性 ## 引入 Memory Pool 參考 [tina0405 同學共筆](https://hackmd.io/JwDhDMAYHYGMBYC0sDMliPgNgIa0QEbgBMKiWkWF4KWArBQIxA==#),嘗試將之前用字串陣列來存取外部資料的方式改成用 memory pool 來存取外部資料 ```clike typedef struct __MEM_POOL MemPool; MemPool *new_mem_pool(size_t size); char *poalloc(size_t size, MemPool *p); char *get_pool_curr(MemPool *p); void mv_pool_curr(int dist, MemPool *p); ``` 以上為 `mempool.h` 內容,由於在建立 tst 時,若使用 `poalloc()` 會需要在額外做 `strcpy()` 把 `word` 複製到 `p` 中,所以額外設計 `get_pool_curr()` 跟 `mv_pool_curr()` 來取代建立 tst 時的 memory pool allocation ``` Performance counter stats for './test_cpy --bench' (100 runs): 1,0299,9538 cache-misses # 34.449 % of all cache refs ( +- 0.20% ) 2,9898,9009 cache-references ( +- 0.05% ) 1.399240554 seconds time elapsed ( +- 0.16% ) Performance counter stats for './test_ref --bench' (100 runs): 1,1457,0305 cache-misses # 34.869 % of all cache refs ( +- 0.15% ) 3,2857,3670 cache-references ( +- 0.05% ) 1.549336182 seconds time elapsed ( +- 0.15% ) ``` 其結果 reference 的方式較原本下降了 3% 左右 ## 問題討論 ### 詳細觀察 tst.h 和 tst.c 為何要 forward declaration 呢?分離介面和實作有何好處呢?這樣的思維又該如何運用於 phonebook 呢?提出想法並且實作 forward declaration 跟分離介面可以在保持相同介面的狀況下抽換 context,在介面被大量使用的狀況下只需修改實作介面的的部分即可讓所有使用介面的程式碼套用,forward declaration 也有封裝資料的作用,能保護 struct 的欄位不會被直接存取 在第一周的 phonebook 的作業中可以實作一個通用介面 `phonebook.h` ```clike typedef struct __PHONE_BOOK_ENTRY entry; entry *findName(char lastName[], entry *pHead); void append(char lastName[], entry *pHead); ``` 然後依照不同介面實作方式來寫在不同的 .c 檔,但是寫完後基本上編譯是不會過的,因為 `main.c` 裡面會直接操作 struct 的欄位,由於 main.c 只有 include 通用介面,而通用介面並沒有做出完整的 struct 所以會出現 `incomplete type` 的錯誤 修正方法可以在通用介面中加入存取 struct 欄位的 function 來取代直接操作 struct 欄位 ```clike char *getLastName(entry *pNode); ``` 相關程式碼已經更新在 [github](https://github.com/st9007a/phonebook) ### 研究程式碼的 tst_traverse_fn 函式,並思考如何運用這個實作,注意到 callback function 的程式開發技巧/模式 [wikipedia](https://en.wikipedia.org/wiki/Callback_(computer_programming)) 提到 callback 就是將一段可執行的程式碼(即 function )作為參數傳入另一個 function,可以被應用到不少場合,比如說 Javascript 操作 Array 的 function 就有大量使用 callback (sort, filter, map ... 等等) ```javascript var arr = [1, 3, 2, 60, 0]; var sorted = arr.sort(function (a, b) { return a - b; }); // 排序陣列 var lessThan3 = arr.filter(function (el) { return el < 3; }); // 過濾掉陣列中大於等於 3 的 element var mapTo = arr.map(function (el) { return el + 1; }); // 將陣列的每個 element 加 3 ``` 在 tst_traverse_fn 中,如果不傳入 fn 的話,他就只會單純的遍歷整棵樹,可以傳入一個 fn 來印出每個節點的資訊,如此就能利用 tst_traverse_fn 來 debug ## 參考資料 [stackoverflow -- Malloc a 2D array in C](https://stackoverflow.com/questions/36890624/malloc-a-2d-array-in-c) [wikipedia -- callback (computer programming)](https://en.wikipedia.org/wiki/Callback_(computer_programming)) [tina0405 同學共筆](https://hackmd.io/JwDhDMAYHYGMBYC0sDMliPgNgIa0QEbgBMKiWkWF4KWArBQIxA==#) [Memory and Locality](https://www.cs.cornell.edu/courses/cs3110/2010sp/lectures/lec24.html)

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully