陳家年
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # 2018q3 Homework3 (dict) ## 作業目標 * 在 GitHub 上 fork [dict](https://github.com/sysprog21/dict),研讀並修改 `Makefile` 以允許 `$ make plot` 時,視覺化 ternary search tree + bloom filter 的效能表現。注意,需要設計實驗,允許不同的字串輸入並探討 TST 回應的統計分佈 * 要考慮到 prefix search 的特性還有詞彙的涵蓋率 * 限用 gnuplot 來製圖 * 參閱 [Hash Table Benchmarks](http://incise.org/hash-table-benchmarks.html) * 以 perf 分析 TST 程式碼的執行時期行為並且充分解讀 * 考慮到 CPY 和 REF 這兩種實作機制,思考效能落差的緣故,並且提出後續效能改善機制 * 充分量化並以現代處理器架構的行為解讀 * 注意:這兩者都該加入 Bloom filter,而且分析具體效益 (原有程式存在若干缺陷,也該指出並修正) ## 作業筆記 * 只要在系統輸入Key,便能找到相對應的Value,這就是Dictionary的基本概念。 * Ternary Search Tree 概念上等於 binary tree + trie tree ## 輸出有些問題 ``` allen@Mac:~/dict$ ./test_cpy ternary_tree, loaded 259112 words in 0.149099 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: f find word in tree: Taiwan Taiwan not found. Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: q allen@Mac:~/dict$ ls bench.c bloom.h cities.txt Makefile test_cpy test_ref tst.c bench.h bloom.o cpy.txt README.md test_cpy.c test_ref.c tst.h bloom.c calculate.c LICENSE scripts test_cpy.o test_ref.o tst.o allen@Mac:~/dict$ ./test_ref ternary_tree, loaded 259112 words in 0.190092 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: s find words matching prefix (at least 1 char): Tai Tai - not found Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: ``` ## Bloom Filter [Bloom Filter 影片解說](https://www.youtube.com/watch?v=bEmBh1HtYrw) ### False Positive 機率 False Positive 直白翻譯"假正確"。舉例:程式搜尋資料庫完畢後,告訴我們搜尋的東西存在資料庫裡面,但事實上資料庫裡面不存在搜尋的東西。下面這張圖片有比較好的解釋。 ![](https://i.imgur.com/4nbkY1b.png) * Table 中某個位址經過 all of elements 設定後還是 false 的機率是 ==$(1-\dfrac{1}{m})^{kn}$== * m : Table 大小 * n : elements 數量 * k : hash funtions 數量 * False Positive = all k bits of new element are already set = ==$(1-e^{-\frac{kn}{m}})^k$== * 就是丟進去新的元素發現經過 hash functions 後產生出的位置在 table 中都被設定為 True ,會讓我誤以為原來新的元素已經存在在資料庫,明明不在,卻說存在 -> False Positive ## 20181008 進度 :::info test_ref.c : tst + bloom filter test_cpy.c : tst ::: * test_ref.c 裡面的 bloom filter 是用來新增字串與查詢字串在不在,但不知道實際字串是甚麼( 實際字串就是自己搜尋的,哈哈)。主要還是透過 tst.h 來做 prefix search 與,應該還有更好的結合。 ![](https://i.imgur.com/xwamm9n.jpg) ## 20181009 進度 ### 作業問題 * 考慮到 CPY 和 REF 這兩種實作機制,思考效能落差的緣故,並且提出後續效能改善機制 * 充分量化並以現代處理器架構的行為解讀 * 注意:這兩者都該加入 Bloom filter,而且分析具體效益 (原有程式存在若干缺陷,也該指出並修正) :::warning 作業的第三點如上, REF 是 CPY 加進 bloom filter 功能。不明白的是 Bloom filter 要在加進去 CPY ?這樣不就是 REF ?還是意思是希望我們想如何讓 Bloom filter 有更好的結合與 tst ::: * intel 7700k ``` Architecture: x86_64 CPU 作業模式: 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 每核心執行緒數:2 每通訊端核心數:4 Socket(s): 1 NUMA 節點: 1 供應商識別號: GenuineIntel CPU 家族: 6 型號: 158 Model name: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz 製程: 9 CPU MHz: 800.027 CPU max MHz: 4500.0000 CPU min MHz: 800.0000 BogoMIPS: 8400.00 虛擬: VT-x L1d 快取: 32K L1i 快取: 32K L2 快取: 256K L3 快取: 8192K NUMA node0 CPU(s): 0-7 ``` L1d + L1i + L2 + L3 = 8512 8512/1024 = 8.3125 MB 與 [Intel® Core™ i7-7700K spec 描述一樣](https://ark.intel.com/zh-tw/products/97129/Intel-Core-i7-7700K-Processor-8M-Cache-up-to-4-50-GHz-) `$sudo make test` * `./test_ref performance` ``` Performance counter stats for './test_ref --bench s Tai' (100 runs): 458,0888 cache-misses # 41.055 % of all cache refs ( +- 0.17% ) 1115,7977 cache-references ( +- 0.12% ) 5,9634,0103 instructions # 1.17 insn per cycle ( +- 0.00% ) 5,0979,4107 cycles ( +- 0.11% ) 0.115807915 seconds time elapsed ( +- 0.12% ) ``` * `./test_cpy performance` ``` Performance counter stats for './test_cpy --bench s Tai' (100 runs): 415,1947 cache-misses # 43.241 % of all cache refs ( +- 0.71% ) 960,1796 cache-references ( +- 0.33% ) 5,3562,9405 instructions # 1.22 insn per cycle ( +- 0.00% ) 4,3817,3754 cycles ( +- 0.33% ) 0.100212966 seconds time elapsed ( +- 0.35% ) ``` * cache-misses : 代表 cache 沒辦法提供給 memory 多餘空間的次數,換個想法就是, cpu 在 cache 中找不到想要的東西 * cache-references : 代表 cache 命中 * 藉由 insn per cycle 與 of all cache refs 這兩個指標可以了解到 cpu 執行的指令的速度與 cache 有沒有符合 cpu 的預期給資料 * [GitHub reference](https://stackoverflow.com/questions/12601474/what-are-perf-cache-events-meaning) * [CSDN reference](https://blog.csdn.net/witsmakemen/article/details/17715775) * CSDN 應該已經 ban 掉 140.116.xxx.xxx,學校網路都進不去,可以透過家用網路,或手機都還看的到連結。 ___ * MacBook air ``` Architecture: x86_64 CPU 作業模式: 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數:2 每通訊端核心數:2 Socket(s): 1 NUMA 節點: 1 供應商識別號: GenuineIntel CPU 家族: 6 型號: 61 Model name: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz 製程: 4 CPU MHz: 1385.380 CPU max MHz: 2700.0000 CPU min MHz: 500.0000 BogoMIPS: 3199.98 虛擬: VT-x L1d 快取: 32K L1i 快取: 32K L2 快取: 256K L3 快取: 3072K ``` * `./test_ref performance` ``` Performance counter stats for './test_ref --bench s Tai' (100 runs): 3,426,304 cache-misses # 51.351 % of all cache refs ( +- 0.79% ) 6,672,330 cache-references ( +- 0.55% ) 596,753,595 instructions # 1.09 insn per cycle ( +- 0.00% ) 549,840,002 cycles ( +- 0.59% ) 0.208506766 seconds time elapsed ( +- 0.66% ) ``` * `./test_cpy performance` ``` Performance counter stats for './test_cpy --bench s Tai' (100 runs): 3,210,788 cache-misses # 52.980 % of all cache refs ( +- 0.87% ) 6,060,422 cache-references ( +- 0.65% ) 535,947,375 instructions # 1.10 insn per cycle ( +- 0.01% ) 487,229,023 cycles ( +- 0.64% ) 0.187340189 seconds time elapsed ( +- 0.76% ) ``` 可以發現到在跑這兩個程式時,這兩台電腦的確在 cpu 及 cache 效能有反應出來都是 高時脈及大快取的 i7-7700k 跑出來效能較好 ## 作業筆記紀錄 * 有些程式慢是因為計算量太大,其多數時間都應該在使用 CPU 進行計算,這叫做 CPU bound 型;有些程式慢是因為過多的 IO ,這種時候其 CPU 利用率應該不高,這叫做 IO bound 型;對于 CPU bound 程式的調校和 IO bound 的調校是不同的,出問題的方式不一樣的。 * [reference](https://www.ibm.com/developerworks/cn/linux/l-cn-perf1/index.html) * cpy.txt 與 ref.txt 紀錄 load cities.txt 時間理論上兩個時間應該要差不多 * `fopen()` 以附加 `a` 的方式打開只寫文件。若文件不存在,則會創造該文件。如果文件存在,寫入的資料會延續上次的繼續寫,即文件原先的内容會被保留。 20181024 作業紀錄 * 一個 ref 與 cpy 差在 mempry pool 這個,之前在寫其他程式時有看過,但不了解是什麼,路慢慢,路漫漫。 * plot 可以針對者兩個在 prefix 搜尋 比較效能, 當初想的很簡單,以為只差別在 bloom filter 。但後來應該是兩總實做都要加上去,可好好比較 ref 與 cpy 差別 * Bloom filter 常用再檢查這個垃圾郵件,檢查這個地址是不是有出現在之前的黑名單,等等...。之前有想到一個很笨的困擾我很久的問題,就是 Bloom filter 儲存資料並非實際把資料存進去,有點像是利用編碼,結合 hash table 來找尋是否存在,所以我很好奇那就算知道在,那資料呢?突然恍然大悟,資料就是自己查詢的資料,只是有機率會是錯的,取決你用 table size 還有 hash functin 數量。如果是要做bloom filter 結合 prefix search 就不能。 Reference: * [CPU Cache 原理探討](https://hackmd.io/s/H1U6NgK3Z#) * [Perf 使用](https://hk.saowen.com/a/b6b4110cddc0a8d10a9a5a14a44e9ab3317f88324c133bca50d1fc0ce73afed5) * [Perf 中文說明](https://wizardforcel.gitbooks.io/100-gdb-tips/modify-pc-register.html) ___ `static void rmcrlf(char *s)` ```c= static void rmcrlf(char *s) { size_t len = strlen(s); if (len && s[len - 1] == '\n'){ s[--len] = 0; } ``` `static void rmcrlf(char *s)` 是因為在原本使用 fgets 時候會多包含一個 '\n' ,在這邊做去除。實際為什麼會多一個需要再查詢 `fgets()` 實際怎麼用 。 ___ ```= allen@Mac:~/dict$ ./test_ref --bench ternary_tree, loaded 259112 words in 0.185805 sec --bench-success-- 程式記憶體區段錯誤 (core dumped) ``` :::danger 修改 ./test_ref ,讓他可以使用 --bench ,會出現 core dumped 。 在 search 與 bench 時,都沒有去新增或減少記憶體,只有查詢而已。 而在 ./test_ref 做 prefix search 時,也沒發生這個問題。 ::: ##### 找出問題點 開啟 Core Dump 功能 ```= mec@mec-MS-7A63:~/JN_work/embededHW/dict$ ulimit -c unlimited mec@mec-MS-7A63:~/JN_work/embededHW/dict$ sudo ./test_ref --bench bloom filter build success ternary_tree, loaded 259112 words in 0.100380 sec 程式記憶體區段錯誤 (core dumped) mec@mec-MS-7A63:~/JN_work/embededHW/dict$ sudo gdb ./test_ref core ... Core was generated by `./test_ref --bench'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007ffa9a299532 in __GI___libc_free (mem=0x7ffa7b555eea) at malloc.c:2967 2967 malloc.c: 沒有此一檔案或目錄. ``` 不知道錯誤是什麼意思,覺得是應該有存取到不該存取的記憶體位置, free() 到不應該 free() 的地方。 * [Reference](https://hk.saowen.com/a/f8959115092480021e7adf7b9c23f9e6a40e8b611735e06f8a06ddbbd022413d) ___ ![](https://i.imgur.com/0YCatfS.png) * prefix search compare

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully