ujiet
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # 2018q1 Homework1 (phonebook) contributed by <`ujiet`> ### Reviewed by `potter903p` * [git commit message](https://github.com/ujiet/phonebook/commit/0442dbf2d942afbda2373a370f6f0a9b2b68362c) 只有標題,沒有敘述內容的實做。 參考資料:[如何寫好 Git Commit Message](https://blog.louie.lu/2017/03/21/%E5%A6%82%E4%BD%95%E5%AF%AB%E4%B8%80%E5%80%8B-git-commit-message/) * 即使嘗試 Cache Alignment 後沒有得到顯著的進步成果但是還是可以把它 push 到 github 上當作這一次嘗試的紀錄。 * 既然知道 malloc 每次都會浪費一些記憶體來當做系統管理的資訊,建議可以參考使用 memory pool 來取代 malloc ,每次安排固定大小的連續記憶體片段來減少不連續時所造成的效能浪費。 參考資料:[ Memory pool](https://www.codeproject.com/Articles/27487/Why-to-use-memory-pool-and-how-to-implement-it) ### Reviewed by `AliennCheng` * 採取的改善作法多專注於硬體的特性上,有時間的話也可以嘗試演算法上的優化,例如改用各種其他的資料結構等等 * 輸入及操作使用上也都只有使用原本給的單一輸入,是否可以嘗試實際的電話簿資料操作,看看結果如何 * 一開始 memory leak 的問題來自原始碼本身,[原出處已更新](https://www.facebook.com/groups/system.software2018/permalink/375496642916504/) * 參考資料的超連結如果可以附上參考來源的標題或更明確的名稱,會比使用 `ref` 更直觀 ---- ## 基本資料 開發環境 ``` Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 158 Model name: Intel(R) Core(TM) i5-7600 CPU @ 3.50GHz Stepping: 9 CPU MHz: 3500.000 CPU max MHz: 4100.0000 CPU min MHz: 800.0000 BogoMIPS: 7008.00 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K NUMA node0 CPU(s): 0-3 ``` 清空 cache ``` $ echo 1 | sudo tee /proc/sys/vm/drop_caches ``` 原始版本輸出測試 ``` $ sudo make cache-test ``` 結果 ``` size of entry : 136 bytes execution time of append() : 0.035937 sec execution time of findName() : 0.004291 sec Performance counter stats for './phonebook_orig' (100 runs): 4,471,526 cache-misses # 89.792 % of all cache refs ( +- 0.05% ) 4,979,863 cache-references ( +- 0.06% ) 274,907,715 instructions # 1.36 insn per cycle ( +- 0.02% ) 201,661,803 cycles ( +- 0.09% ) 0.051002481 seconds time elapsed ( +- 0.73% ) ``` GNU plot 結果 ![](https://i.imgur.com/3xwL5r9.png) --- ## 構想一 - 減少 entry size 參考 [`ryanpatiency`](https://hackmd.io/s/SJONH8fuz#) 和 [`ryanwang522`](https://hackmd.io/s/Sk-qaWiYl#) 前輩的方法,將 node 中除了 lastName 以外的資料放進另一個 struct 中,方法如下: ```clike= typedef struct __PHONE_BOOK_DETAIL_ENTRY { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } detailEntry; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; detailEntry *detail; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 結果 ``` size of entry : 32 bytes execution time of append() : 0.026747 sec execution time of findName() : 0.001482 sec Performance counter stats for './phonebook_opt' (100 runs): 1380800 cache-misses # 61.855 % of all cache refs ( +- 0.46% ) 2232311 cache-references ( +- 0.09% ) 254380021 instructions # 1.97 insn per cycle ( +- 0.02% ) 129337744 cycles ( +- 0.09% ) 0.032863948 seconds time elapsed ( +- 0.17% ) ``` ![](https://i.imgur.com/rGcNKzV.png) Size of entry 從 136 降到 32 bytes 後,可看到 Miss rate 從原本的 89.79% 降低到 61.86% ,有著將近 28% 的進步。但對照一看,可以發現和`ryanwang522`的 40.11% 相距甚遠。 >原因不明,回頭再研究... >[name=ujiet] --- ## 構想二 - Cache Alignment 由於想知道這些 node 在 main memory 裡的配置狀況,於是在 main.c 的最後,釋放 node 空間之前,加入一小段程式碼測試: ```clike= entry *pp = pHead; printf("\n"); for (int j=0; j<8; j++) { printf("%p\n", pp); pp = pp -> pNext; } ``` 此程式碼應列出前 8 個 node 的記憶體位置。用不到時以註解隱藏。 將 Makefile 裡的重複次數改為 1 方便測試,正式求結果時再改回來。 對 orig 和 opt 各擷取一段測試結果: ``` ... size of entry : 136 bytes execution time of append() : 0.047045 sec execution time of findName() : 0.004405 sec 0x55a399920490 0x55a399921940 0x55a3999219d0 0x55a399921a60 0x55a399921af0 0x55a399921b80 0x55a399921c10 0x55a399921ca0 ... size of entry : 32 bytes execution time of append() : 0.027053 sec execution time of findName() : 0.001509 sec 0x560a50e0e490 0x560a50e0f8e0 0x560a50e0f910 0x560a50e0f940 0x560a50e0f970 0x560a50e0f9a0 0x560a50e0f9d0 0x560a50e0fa00 ... ``` 觀察發現,除了第一個 node 的指標 pHead 之外,其餘 node 可以說是 Contiguous Allocation ,且實際佔有空間如下。 | | Size of entry ( bytes ) | Mem Allocated ( bytes ) | | :--------: | :--------: | :--------: | | orig | 136 | 144 | | opt | 32 | 48 | 而 cache line size 由[這裡](https://stackoverflow.com/questions/794632/programmatically-get-the-cache-line-size)得知 `$ getconf LEVEL1_DCACHE_LINESIZE` 得到 L1 cache line size ,結果為 64 bytes 。 --- 仔細看後發現,明明每個 entry 只需要 32 bytes ,但不知為何 OS 配置了 48 bytes 給它。 於是將 opt 中多出來的 16 個 bytes 以 16 進制印出來後,得到: `0 0 0 0 0 0 0 0 31 0 0 0 0 0 0 0` 其中的 31 在 ASCII 中為 unit operator,但還是不知從何而來。 由於功力太差無從解決這多出來的 16 bytes ,只好改以手動調整讓它對齊。 參考[這裡](https://stackoverflow.com/questions/227897/how-to-allocate-aligned-memory-only-using-the-standard-library),將 phonebook_opt.c 中稍微修改如下: ```clike= if ((int64_t)e & 0x3f){ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; } else { entry *align = (entry *) malloc(sizeof(entry)+32); e->pNext = (entry *)(((int64_t)align+64) & ~ (int64_t)0x3f); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; } return e; ``` 跑`$ sudo make cache-test` 的結果為: ``` Performance counter stats for './phonebook_opt': 16,587 cache-misses # 25.891 % of all cache refs 64,065 cache-references 878,855 instructions # 0.74 insn per cycle 1,180,181 cycles 0.080130732 seconds time elapsed ``` Miss rate 大幅降低至 25.891% ,但基本上由於嚴重的 Memory leak ( 如下 ),會有大量的錯誤訊息產生,就連 `$ make run` 和 `$ make plot` 都無法正確執行出來,因此不建議嘗試。 ``` $ sudo valgrind ./phonebook_opt ==9799== HEAP SUMMARY: ==9799== in use at exit: 20,997,504 bytes in 349,901 blocks ==9799== total heap usage: 349,905 allocs, 4 frees, 21,003,728 bytes allocated ==9799== ==9799== LEAK SUMMARY: ==9799== definitely lost: 17,504,256 bytes in 273,504 blocks ==9799== indirectly lost: 0 bytes in 0 blocks ==9799== possibly lost: 2,097,088 bytes in 32,767 blocks ==9799== still reachable: 1,396,160 bytes in 43,630 blocks ==9799== suppressed: 0 bytes in 0 blocks ==9799== Rerun with --leak-check=full to see details of leaked memory ``` 到此,這個構想算是失敗了,只得再接再厲。 :::danger 不要輕易說「不建議嘗試」,上面的實驗充其量只能說明你的程式有瑕疵,不能證明想法是錯誤的,開發紀錄主要給自己看,讓旁人有機會陪你思考。就這樣淺嘗輒止的話,不符合科學精神。請繼續分析 :notes: jserv ::: >好的,我的原意只是說明上述的程式碼有錯誤且不確定是否會對系統造成影響,因此不建議其他同學沒有修改過就執行,可能讓老師有點誤解了。我會繼續嘗試改善的。 >[name=ujiet] --- ## 構想二之改善 看到[這篇文章](http://bbs.bccn.net/thread-82212-1-1.html)才解決上面的疑惑。以下節錄: >malloc()申请的时候实际上占用的内存要比申请的大。因为超出的空间是用来记录对这块内存的管理信息。…… > >大多数实现所分配的存储空间比所要求的要稍大一些,额外的空间用来记录管理信息——分配块的长度,指向下一个分配块的指针等等。这就意味着如果写过一个已分配区的尾端,则会改写后一块的管理信息。这种类型的错误是灾难性的,但是因为这种错误不会很快就暴露出来,所以也就很难发现。将指向分配块的指针向后移动也可能会改写本块的管理信息。…… - 簡言之,每次進行 malloc() ,系統都會配置比你所宣告的更多一點的空間,並在該空間的最後面存放系統所需的管理資訊。構想二裡那多出來的 16 bytes 中,後面 8 個 byte (0xf0000000) 即是管理資訊,剩下的 8 個 byte 則是只是 padding。因此若將 padding 省去,估計每個 node 只會多出 8 bytes。一般如果不是像此題這樣,實作 link list 且每一個 node 都重新 malloc() 一次,的確是很有可能被忽略的地方。 - 我試著將原 node 中,指向 detailEntry 的指標拿掉之後,entry size 變成了 24 bytes,而實測位址果然變成間隔 32 bytes,證實了我的猜想。 - 如此一來,上面程式的發生錯誤,應該是將指標向後移之後,新配置的空間和尚未釋放的記憶體空間重疊造成的錯誤。重新修改程式碼之後,總算是完成了對齊。 ```clike= entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ if (((int64_t)e & 0x1f) != 0x00){ e->pNext = (entry *) malloc(sizeof(entry)+64); void *new_addr = (void *) (((int64_t)&(e->pNext)+64) & ~(int64_t)0x3f); void *old_addr = (void *) e->pNext; free(e->pNext); char *padding = (char *) malloc((int64_t)new_addr - (int64_t)old_addr); e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; free(padding); } else { e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; } return e; } ``` - 我的構想是,在第一次建 node 時就先向後做好對齊,之後的 node 就可以連續的配置下去。中間在配置對齊過的位址時,必須先將原本重疊的記憶體釋放掉,以免產生錯誤。 - `$ sudo make plot` 實測結果,可以看到位址很好地對齊了。 ``` execution time of append() : 0.028094 sec execution time of findName() : 0.001281 sec 0x55efd8a3f490 0x7fe2000008c0 0x7fe2000008e0 0x7fe200000900 0x7fe200000920 0x7fe200000940 0x7fe200000960 0x7fe200000980 Performance counter stats for './phonebook_opt' (100 runs): 966,105 cache-misses # 60.500 % of all cache refs ( +- 0.73% ) 1,596,872 cache-references ( +- 0.10% ) 257,834,774 instructions # 1.99 insn per cycle ( +- 0.02% ) 129,517,327 cycles ( +- 0.07% ) 0.032958023 seconds time elapsed ( +- 0.14% ) ``` - 結果和原來只刪減 node size 並沒有太大區別,圖畫出來也幾乎看不出差異。猜想上次做出的結果可能只是因為程式碼錯誤導致的量測異常。只能再想看看有沒有辦法改進。 ![](https://i.imgur.com/5q7APwC.png) - 分析了一下 main.c 中的程式碼,在[這裡](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html)列出了 __builtin__clear_cache 的功能。 >Built-in Function: _void_ **\_\_builtin\_\_\_clear_cache** _(char *begin, char *end)_ > >This function is used to flush the processor’s instruction cache for the region of memory between begin inclusive and end exclusive. Some targets require that the instruction cache be flushed, after modifying memory containing code, in order to obtain deterministic behavior. > >If the target does not require instruction cache flushes, `__builtin___clear_cache` has no effect. Otherwise either instructions are emitted in-line to clear the instruction cache or a call to the `__clear_cache` function in libgcc is made. 大意就是 flush 掉某個區間內的 instruction cache 。實測把該行註解掉後,似乎沒有明顯差異,所以在此先不改動。裡面還有提到 prefetch ,晚點再看。 --- ## reference - [CPU cache alignment](https://http://blog.csdn.net/zhang_shuai_2011/article/details/38119657) - [ref](http://mark-shih.blogspot.tw/2012/10/compiler-data-alignment.html)

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully