2017q1 Homework1 (phonebook)

tags: `embedded`

contributed by <Cayonliow>
tags: perf gnuplot astyle phonebook hash cache

Reviewed by `yanang`

在改變資料結構情況下 (參考劉亮谷的共筆)，可以考慮將資料宣告為 pointer ，在確定有資料進入後在配置記憶體。
使用 hash 的情況下，除了不同的 hash function 以外， size of hash table 也會影響結果。可以嘗試使用不同的 table 大小去觀察實驗結果。

Reviewed by `ryanwang522`

Commit message 可以不用註明日期，github 會有紀錄。另外 Commit 的內容目的以及如何達到可以寫在 message 的內文，這樣日後回頭看也會比較摸得著頭緒，The future maintainer that thanks you may be yourself！
可以嘗試 Binary Search Tree 的方法提昇效能，注意 Cache line 的大小也是需要考慮的因素，參考關於 CPU Cache。

Reviewed by `SeanCCC`

hash table的大小跟查詢效率有直接相關，建議嘗試各種大小。

作業

題目： B01: phonebook
github(原來的): phonebook
hackmd(示範)]示範用的 Phonebook 作業

開發環境

























cayon@cayon-X550JX:~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Model name:            Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
Stepping:              3
CPU MHz:               2423.484
CPU max MHz:           3600.0000
CPU min MHz:           800.0000
BogoMIPS:              5188.45
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7

工具

Perf
- Linux 效能分析工具
- 文章： perf 原理和實務
gnuplot
- 制圖工具
- 文章： gnuplot 語法解說和示範
Astyle
- 代码程式碼格式化工具
- 我們現在要用的的是 K&Rstyle
- 注意： $ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]
  - *.[ch] 代表全部的意思，就不用個別檔案改
  - 改了之後要重新 git add .

"code" 台灣/繁體中文的慣用說法是「程式碼」，請留意 "jserv"

原理與概念

cache
- 文章： cache 原理和實驗
Hash
- 之前上資訊安全課程的時候有接觸過
- 文章：散列函數 (wikipedia) , hash function 觀念和實務 , 哈希表之bkdrhash算法解析及扩展 , 各種字符串Hash函數比較

開發記錄

20-23 Feb
- 開學第一周，什麼都不會,就先把老師的專案 clone 下來看，然後也弄了雙系統，開始試用在 linux 的環境下作業
- 去研究要用到的開發工具
  - perf
    - 常常忘記權限問題（sudo）
  - gnuplot
    - 位移的部分(histogram)還是有一點不懂，還在試
  - Astyle
    - 之前的 coding style 不好，在改了
    - 可是 git hook 一直弄不好
24 Feb
- 突然發現 clone 到老師的專案，這樣會 push 錯地方，所以重新 fork 了一次
- 把 hackmd 再弄了一次，之前的太亂了
25 Feb
- 研究 cache
- 第一個優化版本，參考示範用的 Phonebook 作業

原始版本

執行 phonebook_orig ，得知 append() , findName() 的時間

$ ./phonebook_orig

結果：

size of entry : 136 bytes
execution time of append() : 0.036544 sec
execution time of findName() : 0.004856 sec

檢查 cache misses

$ make cache-test

cache misses 是 85.730%

 Performance counter stats for './phonebook_orig' (100 runs):

           89,3575      cache-misses              #   85.730 % of all cache refs      ( +-  1.33% )
          104,2318      cache-references                                              ( +-  1.54% )
       2,6095,1819      instructions              #    1.38  insns per cycle          ( +-  0.02% )
       1,8911,4788      cycles                                                        ( +-  0.75% )

       0.054679129 seconds time elapsed                                          ( +-  1.12% )

透過 perf 找熱點:

$ ./phonebook_orig & sudo perf top -p $!

結果：
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →

避免用圖片表示文字資訊，直接複製終端機資訊貼上來
課程助教

好的，已修正，謝謝助教
Cayon

  23.07%  libc-2.23.so      [.] __strcasecmp_l_avx
  21.91%  phonebook_orig    [.] findName
  11.77%  phonebook_orig    [.] main
   5.50%  libc-2.23.so      [.] _IO_fgets
   4.78%  libc-2.23.so      [.] __memcpy_sse2
   4.23%  libc-2.23.so      [.] _int_malloc
   4.01%  [kernel]          [k] _raw_spin_lock
   2.81%  [unknown]         [k] 0x00007f57f681f02e
   2.55%  libc-2.23.so      [.] malloc
   2.42%  libc-2.23.so      [.] __strcpy_sse2_unaligned
   2.31%  libc-2.23.so      [.] _IO_getline_info
   1.83%  [kernel]          [k] free_pcppages_bulk
   1.78%  [kernel]          [k] __mod_zone_page_state
   1.66%  phonebook_orig    [.] append
   1.22%  [kernel]          [k] page_remove_rmap
   1.12%  [kernel]          [k] __do_page_fault
   1.09%  [kernel]          [k] handle_mm_fault
   0.79%  [kernel]          [k] mem_cgroup_try_charge
   0.61%  [kernel]          [k] free_pages_prepare
   0.61%  [kernel]          [k] uncharge_list
   0.61%  [kernel]          [k] unmap_page_range
   0.58%  phonebook_orig    [.] strcasecmp@plt
   0.56%  [kernel]          [k] perf_event_aux.part.57
   0.54%  [kernel]          [k] clear_page_c_e
   0.54%  phonebook_orig    [.] malloc@plt
   0.54%  libc-2.23.so      [.] memchr
   0.54%  [kernel]          [k] __rmqueue.isra.79
   0.03%  [kernel]          [k] native_write_msr_safe

最後結果
- findName() 所占的時間： 21.91% ，執行時間： 0.004856 sec
- append() 所占的時間： 1.66% ，執行時間： 0.036544 sec
- findName()所占的時間比 append() 多，但是 append() 所花的執行時間卻比 findName() 長

優化版本

進度要加快摟亮谷

版本一：改變資料結構大小

參考示範用的 Phonebook 作業, 發現findName()只有傳入 lastName ，所以在 phonebook_opt.h 裏的 typedef struct __PHONE_BOOK_ENTRY 刪除沒有用到的參數，改變 struct 的大小。

typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    /*1at version: Changing the size of the data structure*/
    /*char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];*/
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;

執行時間
- entry 的大小從原本的 136bytes 減少到現在的 24bytes
- append() 的執行時間從 0.036544 sec 變成 0.035499 sec
- findName() 的執行時間從 0.004856 sec 變成 0.001753 sec
- append() 的執行時間減少不大， findName() 的執行時間減少很多，可是在真實情況不可能只有lastName, 所有效用不大

size of entry : 24 bytes
execution time of append() : 0.035499 sec
execution time of findName() : 0.001753 sec

cache-misses
- 從原本的 85.73 % 降到現在的 52.099 %
  - 因爲 entry 的大小改變

 Performance counter stats for './phonebook_opt' (100 runs):

            8,6652      cache-misses              #   52.099 % of all cache refs      ( +-  1.09% )
           16,6322      cache-references                                              ( +-  1.11% )
       2,4064,7152      instructions              #    2.03  insns per cycle          ( +-  0.02% )
       1,1844,9973      cycles                                                        ( +-  0.23% )

       0.034387209 seconds time elapsed                                          ( +-  0.27% )

plot
然後試着創一個新的結構，裏頭的結構就只有存 lastName 的 array 、指向子結構（將原本的 entry 變成 sencondary_entry ）的指標跟指向下一個的指標

typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    /*2nd version: use a smaller size of structure as the main searching structure*/
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_ENTRY *pNext;
} secondary_entry;

typedef struct __LAST_NAME_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    secondary_entry *detail;
    struct __LAST_NAME_ENTRY *pNext;
} entry;

執行時間
- 執行時間變長了一點，可是這的架構在真實情況下是可行的

size of entry : 32 bytes
execution time of append() : 0.056086 sec
execution time of findName() : 0.001877 sec

cache-misses

 Performance counter stats for './phonebook_opt' (100 runs):

           13,1142      cache-misses              #   44.918 % of all cache refs      ( +-  0.51% )
           29,1957      cache-references                                              ( +-  0.48% )
       2,4393,2710      instructions              #    1.99  insns per cycle          ( +-  0.02% )
       1,2235,4305      cycles                                                        ( +-  0.12% )

       0.035973557 seconds time elapsed                                          ( +-  0.16% )

plot

版本二： (Hash)

使用 BKDRHash 的方式來進行優化
BKDRHash 的程式碼，參考各種字符串Hash函數比較

unsigned int BKDRHash(char *str) {
	unsigned int seed = 131; //(2^n)-1, n>=5
	unsigned int hash = 0;

	while (*str) 
		hash = hash * seed + (*str++);

	return (hash & 0x7FFFFFFF) % TABLE_SIZE;
}

執行時間：
- append() 的執行時間增加了很多，可是 findName() 的執行時間卻減少了更多，所以在整體的效能上是有增長的

size of entry : 32 bytes
execution time of append() : 0.099565 sec
execution time of findName() : 0.000007 sec

cache-misses
- cache misses 是 24.444%,遠遠低於原始版本的 85.730%

 Performance counter stats for './phonebook_opt_hash' (100 runs):

           10,1023      cache-misses              #   24.444 % of all cache refs      ( +-  2.24% )
           41,3287      cache-references                                              ( +-  0.34% )
       2,4139,0624      instructions              #    1.54  insns per cycle          ( +-  0.02% )
       1,5709,8979      cycles                                                        ( +-  0.35% )

       0.045364188 seconds time elapsed                                          ( +-  0.38% )

plot
我嘗試用別的 hash function 來實做
- 我選的是DJBHash

unsigned int DJBHash(char *str)
{
    unsigned int hash = 5381;

    while (*str) {
        hash += (hash << 5) + (*str++);
    }

    return (hash & 0x7FFFFFFF) % TABLE_SIZE;
}

執行時間：
- 發現append() 跟 findName() 的執行時間比 BKDRHash 的來的短

size of entry : 32 bytes
execution time of append() : 0.042704 sec
execution time of findName() : 0.000005 sec

cache-misses
- cache misses 也從 BKDRHash 的 24.44% 下降到 19.33%

 Performance counter stats for './phonebook_opt_hash' (100 runs):

            7,7377      cache-misses              #   19.330 % of all cache refs      ( +-  1.07% )
           40,0295      cache-references                                              ( +-  0.34% )
       2,4100,9693      instructions              #    1.63  insns per cycle          ( +-  0.02% )
       1,4751,4201      cycles                                                        ( +-  0.12% )

       0.042510503 seconds time elapsed                                          ( +-  0.13% )

問答

本例選用的 dataset (也就是輸入電話簿的姓名) 是否存在問題？

Q: 有代表性嗎？跟真實英文姓氏差異大嗎？
A: 根據對 ./dictionary/words.txt 的觀察，裡面有很多不是非名詞，甚至有一些是無意義的字串，所以代表性不大。檔案裡頭的資料與真實英文姓氏差異不大，裡頭里的確存有很多與真實英文姓氏的字串，只是也同時有很多無意義的字串。
Q: 資料難道「數大就是美」嗎？
A: 個人認為不是，以 ./dictionary/words.txt 來說，資料量很大，可是存在很多無意義的資料，這樣會造成降低搜尋的速度，導致效率下降。數據大而有意義才美。
Q: 如果我們要建立符合真實電話簿情境的輸入，該怎麼作？
A: 尋找真實英文姓氏的字典

2017q1 Homework1 (phonebook)

tags: embedded

Reviewed by yanang

Reviewed by ryanwang522

Reviewed by SeanCCC

作業

開發環境

工具

原理與概念

開發記錄

原始版本

優化版本

版本一： 改變資料結構大小

版本二： (Hash)

問答

tags: `embedded`

Reviewed by `yanang`

Reviewed by `ryanwang522`

Reviewed by `SeanCCC`

版本一：改變資料結構大小