changed 8 years ago
Linked with GitHub

2017q1 Homework1 (phonebook)

contributed by <bananahuang>
作業要求

Reviewed by claaaaassic

  • Github 上的 commit message 盡量打清楚這段程式碼修改的目的,使用 test 為標題會讓別人看不懂為何做這段修改,以及自己也會過一段時間就忘記原因
  • 可以嘗試使用 hash function 改善效能,可使 findName() 時間大幅縮短
  • 在你解決 memory leak 的地方有提出一段程式碼,這邊會把你原先建立出來的鏈接串列從頭開始,每一個不等於 NULL 的節點都會先讓它指向他的下一個節點,然後讓它執行 free(),建議 while 結束可以加上一行 free(pHead); ,才能讓所有 malloc 出來的都釋放
while (pHead->pNext) {
    entry *next = pHead->pNext;
    pHead->pNext = next->pNext;
    free(next);
}
free(pHead);

開發環境

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"

$lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Model name:            Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
Stepping:              7
CPU MHz:               3245.917
CPU max MHz:           3400.0000
CPU min MHz:           1600.0000
BogoMIPS:              6186.21
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3

了解電腦的系統架構圖

指令:
$ sudo apt-get install hwloc
$ lstopo

安裝相關開發工具

2017 年春季作業說明

工具熟悉: perf

step1:

檢查是否啟用 perf

$ cat "/boot/config-uname -r" | grep "PERF_EVENT"

終端機輸出:

CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_HAVE_PERF_EVENTS=y`
CONFIG_PERF_EVENTS=y`
CONFIG_HAVE_PERF_EVENTS_NMI=y

step2:

安裝 perf
$ sudo apt-get install linux-tools-common

step3:

使用 perf list
$ perf list

  List of pre-defined events (to be used in -e):

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  ref-cycles                                         [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]

  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]

step4(法1):

使用圓周率程式測試 perf

注意:用原本 terminal 收尋完 pid 要開另外一個terminal,並且要在圓周率的 process 尚未結束前在新開的terminal打上perf top -p pid,如果怕來不及可以在圓周率程式裡增加迴圈

原本 terminal
指令:
$ g++ -c perf_top_example.c
$ g++ perf_top_example.o -o example
$ ./example

新開 terminal
指令:
$perf top -p 8628

錯誤版本(當process已經執行完了在新開的視窗再打perf top -p pid會無效)

step4(法2):

指令:
$ ./ "執行檔案名稱" & sudo perf top -p $!

打完 password 就會出現下圖

reference

Documents/Perf FAQ
perf 原理和實務

工具熟悉: gnuplot(針對未改善前 phonebook 使用 gnuplot 練習)

$ cd phonebook
$ make phonebook
$ make(編譯)
$ make run(測試)
$ make plot
接下來進入 phonebook 就會看到用gnuplot建立的圖表

開發目標:

1.減少 cache miss 情形

keyword:cache line
cache line

改善方法1
將 lastname 提出來變成一個struct,收尋時先比對 lastname

閱讀程式碼

main.c

  • IMPL
  • diff_in_second:算時間比較精準
  • assert:後面判斷為真繼續執行,為否停止執行

reference
louielu

效能分析(優化前)

提醒:在執行程式(phonebook_orig 和 phonebook_opt) 前,先清空 cache

指令:$ echo 1 | sudo tee /proc/sys/vm/drop_caches
執行:$ ./phonebook_orig

顯示append()及findName()所執行時間

size of entry : 136 bytes
execution time of append() : 0.057090 sec
execution time of findName() : 0.005835 sec

找 cache-misses 發現高達90.506%

$ make cache-test(makefile 有寫好)

Performance counter stats for './phonebook_orig' (100 runs):

203,8137      cache-misses       #   90.506 % of all cache refs     ( +-  0.06% )
225,1932      cache-references                                      ( +-  0.03% )
2,6169,4498   instructions       #    1.30  insns per cycle         ( +-  0.02% )
2,0144,5323   cycles                                                ( +-  0.09% )

0.064792767 seconds time elapsed                                    ( +-  0.38% )

鎖定想優化的目標可以使用 perf stat,下面的數字可以自行設定要 run 幾次

$ perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./執行檔

 Performance counter stats for './phonebook_orig' (5 runs):

205,2030      cache-misses         #   90.374 % of all cache refs      ( +-  0.28% )
227,0589      cache-references                                         ( +-  0.22% )
2,6144,2354      instructions      #    1.29  insns per cycle          ( +-  0.11% )
2,0344,6395      cycles                                                ( +-  0.19% )

0.064507173 seconds time elapsed                                       ( +-  1.21% )

透過 perf 找熱點:$./phonebook_orig & sudo perf top -p $!

  • findName 佔了24.67%而 append 佔了1.47%,但是 append 執行時間比 findName 長
  24.67%  phonebook_orig  [.] findName                                         
   6.09%  libc-2.23.so    [.] _IO_fgets                                        
   5.82%  libc-2.23.so    [.] _int_malloc                                      
   5.54%  phonebook_orig  [.] main                                             
   3.39%  libc-2.23.so    [.] __memcpy_sse2                                    
   3.32%  [kernel]        [k] page_fault                                       
   2.36%  [kernel]        [k] handle_mm_fault                                  
   2.16%  [kernel]        [k] clear_page                                       
   1.55%  [kernel]        [k] unmap_page_range                                 
   1.53%  [kernel]        [k] free_hot_cold_page                               
   1.47%  phonebook_orig  [.] append                                           
   1.29%  libc-2.23.so    [.] __strcpy_sse2_unaligned                          
   1.23%  [kernel]        [k] policy_zonelist                                  
   1.12%  libc-2.23.so    [.] memchr                                           
   1.03%  libc-2.23.so    [.] _IO_getline_info                                 
   0.85%  [kernel]        [k] __do_page_fault                                  
   0.84%  [kernel]        [k] __rmqueue.isra.80                                
   0.83%  [kernel]        [k] nv04_fifo_intr                                   
   0.79%  [kernel]        [k] __internal_add_timer                             
   0.79%  libc-2.23.so    [.] __strcasecmp_avx 

效能分析(改變 struct)

方法1:更改原先 struct

typedef struct __PHONE_BOOK_ENTRY { //char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } other_entry; typedef struct entry{ char lastName[MAX_LAST_NAME_SIZE]; other_entry *information; struct entry *pNext; } entry entry *findName(char lastName[], entry *pHead); entry *append(char lastName[], entry *e);

執行:$ ./phonebook_opt

  • 顯示 append()及 findName()所執行時間
size of entry : 32 bytes
execution time of append() : 0.049802 sec
execution time of findName() : 0.002554 sec

cache test:$ make cache-test

  • 發現 cache-misses 降到57.459%
 Performance counter stats for './phonebook_opt' (100 runs):

           37,6305     cache-misses     # 57.459 % of all cache refs   ( +-  0.12% )
           65,4911     cache-references                                ( +-  0.13% )
       2,4415,7636     instructions     #  1.71  insns per cycle       ( +-  0.02% )
       1,4276,3682     cycles                                          ( +-  0.28% )

       0.047078644 seconds time elapsed                                ( +-  0.72% )


entry size 縮小對 cache miss 影響

Lid cache size = 32kB, Cache line=64bytes

計算 Cache line個數
32 * 1024 / 64= 512

改善結構前,每筆 entry 有136bytes,所以一次能存入Ld1 cache size 只有
64 * 512/ 136= 大約 240筆 Entry

根據大小改善的原始結構後

每筆Entry剩下32Bytes,因此一次存入L1d的數量增加為
64 * 512 / 32 大約 1024筆 Entry

觀察是否有資料留在heap上,造成memory leak

安裝工具:$sudo apt install valgrind
執行:$ valgrind --leak-check=full ./phonebook_opt

==3854== HEAP SUMMARY:
==3854==     in use at exit: 11,196,768 bytes in 349,899 blocks
==3854==   total heap usage: 349,906 allocs, 7 frees, 11,207,152 bytes allocated   
==3854== 
==3854== 11,196,768 (32 direct, 11,196,736 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 2
==3854==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3854==    by 0x400D50: append (in /home/huangtingshieh/phonebook/phonebook_opt)
==3854==    by 0x400AC8: main (in /home/huangtingshieh/phonebook/phonebook_opt)
==3854== 
==3854== LEAK SUMMARY:
==3854==    definitely lost: 32 bytes in 1 blocks
==3854==    indirectly lost: 11,196,736 bytes in 349,898 blocks
==3854==      possibly lost: 0 bytes in 0 blocks
==3854==    still reachable: 0 bytes in 0 blocks
==3854==         suppressed: 0 bytes in 0 blocks
==3854== 
==3854== For counts of detected and suppressed errors, rerun with: -v
==3854== Use --track-origins=yes to see where uninitialised values come from
==3854== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)

分析:

total heap usage:程式配置新記憶體片段349906次但只釋放7次,且有11,196,768 bytes 留在 heap 上

解決memory leak

原本

if (pHead->pNext) free(pHead->pNext); free(pHead);

修改(尚未十分理解此為參考寫法) <0140454>

while(pHead->pNext) { entry *next = pHead->pNext; pHead->pNext = next->pNext; free(next);

解決了大部份 memory leak

==4014== HEAP SUMMARY:
==4014==     in use at exit: 32 bytes in 1 blocks
==4014==   total heap usage: 349,906 allocs, 349,905 frees, 11,207,152 bytes allocated
==4014== 
==4014== 32 bytes in 1 blocks are definitely lost in loss record 1 of 1
==4014==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4014==    by 0x400A44: main (in /home/huangtingshieh/phonebook/phonebook_opt)
==4014== 
==4014== LEAK SUMMARY:
==4014==    definitely lost: 32 bytes in 1 blocks
==4014==    indirectly lost: 0 bytes in 0 blocks
==4014==      possibly lost: 0 bytes in 0 blocks
==4014==    still reachable: 0 bytes in 0 blocks
==4014==         suppressed: 0 bytes in 0 blocks
==4014== 
==4014== For counts of detected and suppressed errors, rerun with: -v
==4014== Use --track-origins=yes to see where uninitialised values come from
==4014== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)

效能也有改善一點

reference
網站資源:
淺談memory cache
共筆閱讀:
Hsiang
0140454

效能分析(使用 hash function)

新增phonebook_hash.c

#include <stdio.h> #include <stdlib.h> #include <string.h> #include <ctype.h> #include "phonebook_hash.h" /* FILL YOUR OWN IMPLEMENTATION HERE! */ /* original version */ entry *findName(char lastName[], entry *pHead[]) { /* TODO: implement */ unsigned int key=BKDRHash(lastName)%3571; while (pHead[key] != NULL) { if (strcasecmp(lastName, pHead[key]->lastName) == 0){ printf("%15s is found\n",lastName); return pHead[key]; } pHead[key] = pHead[key]->pNext; } return NULL; } entry *append(char lastName[], entry *e[]) { unsigned int key=BKDRHash(lastName)%3571; /* allocate memory for the new entry and put lastName */ e[key]->pNext = (entry *) malloc(sizeof(entry)); e[key] = e[key]->pNext; strcpy(e[key]->lastName, lastName); e[key]->pNext = NULL; return e[key]; } unsigned int BKDRHash(char lastName[]) { unsigned int seed = 131; // 31 131 1313 13131 131313 etc.. unsigned int hash = 0; while (*lastName) { hash = hash * seed + (*lastName++); } return hash; }

新增phonebook_hash.h

#ifndef _PHONEBOOK_H #define _PHONEBOOK_H #define MAX_LAST_NAME_SIZE 16 /* TODO: After modifying the original version, uncomment the following * line to set OPT properly */ #define HASH 1 typedef struct __PHONE_BOOK_DETAIL { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } phonebookdetail; typedef struct entry { char lastName[MAX_LAST_NAME_SIZE]; //__PHONE_BOOK_ENTRY phonebookdetail *detail; struct entry *pNext; } entry; entry *findName(char lastName[], entry *pHead[]); entry *append(char lastName[], entry *e[]); unsigned int BKDRHash(char lastname[]); #endif

修改runtime.gp

reset set ylabel 'time(sec)' set style fill solid set title 'perfomance comparison' set term png enhanced font 'Verdana,10' set output 'runtime.png' plot [:][:0.150]'output.txt' using 2:xtic(1) with histogram title 'original', \ '' using ($0-0.06):($2+0.001):2 with labels title ' ', \ '' using 3:xtic(1) with histogram title 'optimized' , \ '' using 4:xtic(1) with histogram title 'hash' , \ '' using ($0+0.3):($3+0.0015):3 with labels title ' ', \ '' using ($0+0.4):($4+0.0015):4 with labels title ' '

修改main.c

修改calculate.c

修改Makefile

效能改進

Performance counter stats for './phonebook_hash' (100 runs):

           29,7280      cache-misses              #   41.245 % of all cache refs      ( +-  0.09% )
           72,0775      cache-references                                              ( +-  0.11% )
       2,4553,5637      instructions              #    1.51  insns per cycle          ( +-  0.02% )
       1,6262,9456      cycles                                                        ( +-  0.07% )

       0.051042278 seconds time elapsed                                          ( +-  0.61% )

reference
hash function 觀念和實務
Hash Function
為何hash seed 是131
哈希表之bkdrhash算法解析及扩展
Which hashing algorithm is best for uniqueness and speed?
各种字符串Hash函数比较

非常重要:算法与数据结构,有HASH各種介紹
參考共筆:
Hsiang
changyuanhua

未來工作

先閱讀共筆後,再進行更多實作
陳品睿
yenWu
Hsiang
twzjwang
louielu
0140454
Weinux
hugikun999
ktvexe
0xff07
jeff60907

同步 Github fork 出来的分支

step1
先登入 github,對要 fork 的專案按下右上角的 fork

step2
打開 terminal,打上$ git clone 剛剛fork的URL
$ git clone https://github.com/bananahuang/phonebook.git

step3
$ git remote -v 羅列遠端主機並顯示網址

origin	https://github.com/bananahuang/phonebook.git (fetch)
origin	https://github.com/bananahuang/phonebook.git (push)

step4
顯示當前狀態
$git status
顯示

On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   phonebook_opt.c
	modified:   phonebook_opt.h

no changes added to commit (use "git add" and/or "git commit -a")

step5
$ git add phonebook_opt.c
$ git add phonebook_opt.h
顯示

On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   phonebook_opt.c
	modified:   phonebook_opt.h

step6
$ git commit(local workspace)會觸發 git hooks astyle, cppcheck
test 是自己的註解,要打上做了什麼修改,打錯了可以用$git commit --amend修正
最後註解會上傳github
How to Write a Git Commit Message

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Your branch is up-to-date with 'origin/master'.
#
# Changes to be committed:
#       modified:   Makefile
#       modified:   math-toolkit.h
#
# Untracked files:
#       out.jpg
#       output.jpg
#       output.png
#


[master 8f2853d] test
 2 files changed, 25 insertions(+), 5 deletions(-)

step7
$ git status
顯示

On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
nothing to commit, working directory clean

step8
上傳到遠端儲存庫 $ git push
顯示

Username for 'https://github.com': brennenhuang
Password for 'https://brennenhuang@github.com': 
Counting objects: 4, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 760 bytes | 0 bytes/s, done.
Total 4 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/bananahuang/phonebook.git
   88e1942..8f2853d  master -> master


reference
ssh key
ssh key 觀念
30 天精通 Git 版本控管
10分鐘掌握Git常用命令,做一名有知識的程式設計師
Git 教學(1) : Git 的基本使用
介绍如何使用git及gitlab
Git
Git遠程操作詳解
gnuplot
How to use makefile
etc276
如果再github建立一個空的resp,那麼再本地端要直接上傳檔案的話,用這個

[code review]

coding skill:

while()?
if(==NULL)=>if(!..)
strcmp意義比較字串後是回傳>0=0<0可能是把ascII相減

other

malloc可能配置不到記憶體位置要小心(可用man malloc去看)
c/C++已經大不同
hash function 有六種 數學意義??
=>加速資料查詢的速度?
http://blog.xuite.net/yh96301/blog/242333268-安裝Ubuntu+16.04
http://marktwtn.blogspot.tw/2014/05/diy-win8-ubuntu.html
CPU cache miss wiki
思維提升:
反藍標示 可以變成orig hash bst全都標示在main.c變成一個struct

Select a repo