contributed by <bananahuang>
作業要求
claaaaassic
free(pHead);
,才能讓所有 malloc 出來的都釋放while (pHead->pNext) {
entry *next = pHead->pNext;
pHead->pNext = next->pNext;
free(next);
}
free(pHead);
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"
$lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Model name: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
Stepping: 7
CPU MHz: 3245.917
CPU max MHz: 3400.0000
CPU min MHz: 1600.0000
BogoMIPS: 6186.21
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-3
指令:
$ sudo apt-get install hwloc
$ lstopo
檢查是否啟用 perf
$ cat "/boot/config-uname -r
" | grep "PERF_EVENT"
終端機輸出:
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_HAVE_PERF_EVENTS=y`
CONFIG_PERF_EVENTS=y`
CONFIG_HAVE_PERF_EVENTS_NMI=y
安裝 perf
$ sudo apt-get install linux-tools-common
使用 perf list
$ perf list
List of pre-defined events (to be used in -e):
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
stalled-cycles-backend OR idle-cycles-backend [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
使用圓周率程式測試 perf
注意:用原本 terminal 收尋完 pid 要開另外一個terminal,並且要在圓周率的 process 尚未結束前在新開的terminal打上perf top -p pid
,如果怕來不及可以在圓周率程式裡增加迴圈
原本 terminal
指令:
$ g++ -c perf_top_example.c
$ g++ perf_top_example.o -o example
$ ./example
新開 terminal
指令:
$perf top -p 8628
錯誤版本(當process已經執行完了在新開的視窗再打perf top -p pid
會無效)
指令:
$ ./ "執行檔案名稱" & sudo perf top -p $!
打完 password 就會出現下圖
$ cd phonebook
$ make phonebook
$ make
(編譯)
$ make run
(測試)
$ make plot
接下來進入 phonebook 就會看到用gnuplot建立的圖表
1.減少 cache miss 情形
keyword:cache line
cache line
改善方法1:
將 lastname 提出來變成一個struct,收尋時先比對 lastname
main.c
reference
louielu
提醒:在執行程式(phonebook_orig 和 phonebook_opt) 前,先清空 cache
$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ ./phonebook_orig
size of entry : 136 bytes
execution time of append() : 0.057090 sec
execution time of findName() : 0.005835 sec
$ make cache-test
(makefile 有寫好)
Performance counter stats for './phonebook_orig' (100 runs):
203,8137 cache-misses # 90.506 % of all cache refs ( +- 0.06% )
225,1932 cache-references ( +- 0.03% )
2,6169,4498 instructions # 1.30 insns per cycle ( +- 0.02% )
2,0144,5323 cycles ( +- 0.09% )
0.064792767 seconds time elapsed ( +- 0.38% )
$ perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./執行檔
Performance counter stats for './phonebook_orig' (5 runs):
205,2030 cache-misses # 90.374 % of all cache refs ( +- 0.28% )
227,0589 cache-references ( +- 0.22% )
2,6144,2354 instructions # 1.29 insns per cycle ( +- 0.11% )
2,0344,6395 cycles ( +- 0.19% )
0.064507173 seconds time elapsed ( +- 1.21% )
$./phonebook_orig & sudo perf top -p $!
24.67% phonebook_orig [.] findName
6.09% libc-2.23.so [.] _IO_fgets
5.82% libc-2.23.so [.] _int_malloc
5.54% phonebook_orig [.] main
3.39% libc-2.23.so [.] __memcpy_sse2
3.32% [kernel] [k] page_fault
2.36% [kernel] [k] handle_mm_fault
2.16% [kernel] [k] clear_page
1.55% [kernel] [k] unmap_page_range
1.53% [kernel] [k] free_hot_cold_page
1.47% phonebook_orig [.] append
1.29% libc-2.23.so [.] __strcpy_sse2_unaligned
1.23% [kernel] [k] policy_zonelist
1.12% libc-2.23.so [.] memchr
1.03% libc-2.23.so [.] _IO_getline_info
0.85% [kernel] [k] __do_page_fault
0.84% [kernel] [k] __rmqueue.isra.80
0.83% [kernel] [k] nv04_fifo_intr
0.79% [kernel] [k] __internal_add_timer
0.79% libc-2.23.so [.] __strcasecmp_avx
typedef struct __PHONE_BOOK_ENTRY {
//char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} other_entry;
typedef struct entry{
char lastName[MAX_LAST_NAME_SIZE];
other_entry *information;
struct entry *pNext;
} entry
entry *findName(char lastName[], entry *pHead);
entry *append(char lastName[], entry *e);
執行:$ ./phonebook_opt
size of entry : 32 bytes
execution time of append() : 0.049802 sec
execution time of findName() : 0.002554 sec
cache test:$ make cache-test
Performance counter stats for './phonebook_opt' (100 runs):
37,6305 cache-misses # 57.459 % of all cache refs ( +- 0.12% )
65,4911 cache-references ( +- 0.13% )
2,4415,7636 instructions # 1.71 insns per cycle ( +- 0.02% )
1,4276,3682 cycles ( +- 0.28% )
0.047078644 seconds time elapsed ( +- 0.72% )
Lid cache size = 32kB, Cache line=64bytes
計算 Cache line個數
32 * 1024 / 64= 512
改善結構前,每筆 entry 有136bytes,所以一次能存入Ld1 cache size 只有
64 * 512/ 136= 大約 240筆 Entry
根據大小改善的原始結構後
每筆Entry剩下32Bytes,因此一次存入L1d的數量增加為
64 * 512 / 32 大約 1024筆 Entry
安裝工具:$sudo apt install valgrind
執行:$ valgrind --leak-check=full ./phonebook_opt
==3854== HEAP SUMMARY:
==3854== in use at exit: 11,196,768 bytes in 349,899 blocks
==3854== total heap usage: 349,906 allocs, 7 frees, 11,207,152 bytes allocated
==3854==
==3854== 11,196,768 (32 direct, 11,196,736 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 2
==3854== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3854== by 0x400D50: append (in /home/huangtingshieh/phonebook/phonebook_opt)
==3854== by 0x400AC8: main (in /home/huangtingshieh/phonebook/phonebook_opt)
==3854==
==3854== LEAK SUMMARY:
==3854== definitely lost: 32 bytes in 1 blocks
==3854== indirectly lost: 11,196,736 bytes in 349,898 blocks
==3854== possibly lost: 0 bytes in 0 blocks
==3854== still reachable: 0 bytes in 0 blocks
==3854== suppressed: 0 bytes in 0 blocks
==3854==
==3854== For counts of detected and suppressed errors, rerun with: -v
==3854== Use --track-origins=yes to see where uninitialised values come from
==3854== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)
total heap usage:程式配置新記憶體片段349906次但只釋放7次,且有11,196,768 bytes 留在 heap 上
原本
if (pHead->pNext) free(pHead->pNext);
free(pHead);
修改(尚未十分理解此為參考寫法) <0140454>
while(pHead->pNext) {
entry *next = pHead->pNext;
pHead->pNext = next->pNext;
free(next);
解決了大部份 memory leak
==4014== HEAP SUMMARY:
==4014== in use at exit: 32 bytes in 1 blocks
==4014== total heap usage: 349,906 allocs, 349,905 frees, 11,207,152 bytes allocated
==4014==
==4014== 32 bytes in 1 blocks are definitely lost in loss record 1 of 1
==4014== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4014== by 0x400A44: main (in /home/huangtingshieh/phonebook/phonebook_opt)
==4014==
==4014== LEAK SUMMARY:
==4014== definitely lost: 32 bytes in 1 blocks
==4014== indirectly lost: 0 bytes in 0 blocks
==4014== possibly lost: 0 bytes in 0 blocks
==4014== still reachable: 0 bytes in 0 blocks
==4014== suppressed: 0 bytes in 0 blocks
==4014==
==4014== For counts of detected and suppressed errors, rerun with: -v
==4014== Use --track-origins=yes to see where uninitialised values come from
==4014== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)
效能也有改善一點
reference
網站資源:
淺談memory cache
共筆閱讀:
Hsiang
0140454
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "phonebook_hash.h"
/* FILL YOUR OWN IMPLEMENTATION HERE! */
/* original version */
entry *findName(char lastName[], entry *pHead[])
{
/* TODO: implement */
unsigned int key=BKDRHash(lastName)%3571;
while (pHead[key] != NULL) {
if (strcasecmp(lastName, pHead[key]->lastName) == 0){
printf("%15s is found\n",lastName);
return pHead[key];
}
pHead[key] = pHead[key]->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e[])
{
unsigned int key=BKDRHash(lastName)%3571;
/* allocate memory for the new entry and put lastName */
e[key]->pNext = (entry *) malloc(sizeof(entry));
e[key] = e[key]->pNext;
strcpy(e[key]->lastName, lastName);
e[key]->pNext = NULL;
return e[key];
}
unsigned int BKDRHash(char lastName[])
{
unsigned int seed = 131;
// 31 131 1313 13131 131313 etc..
unsigned int hash = 0;
while (*lastName) {
hash = hash * seed + (*lastName++);
}
return hash;
}
#ifndef _PHONEBOOK_H
#define _PHONEBOOK_H
#define MAX_LAST_NAME_SIZE 16
/* TODO: After modifying the original version, uncomment the following
* line to set OPT properly */
#define HASH 1
typedef struct __PHONE_BOOK_DETAIL {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} phonebookdetail;
typedef struct entry {
char lastName[MAX_LAST_NAME_SIZE]; //__PHONE_BOOK_ENTRY
phonebookdetail *detail;
struct entry *pNext;
} entry;
entry *findName(char lastName[], entry *pHead[]);
entry *append(char lastName[], entry *e[]);
unsigned int BKDRHash(char lastname[]);
#endif
reset
set ylabel 'time(sec)'
set style fill solid
set title 'perfomance comparison'
set term png enhanced font 'Verdana,10'
set output 'runtime.png'
plot [:][:0.150]'output.txt' using 2:xtic(1) with histogram title 'original', \
'' using ($0-0.06):($2+0.001):2 with labels title ' ', \
'' using 3:xtic(1) with histogram title 'optimized' , \
'' using 4:xtic(1) with histogram title 'hash' , \
'' using ($0+0.3):($3+0.0015):3 with labels title ' ', \
'' using ($0+0.4):($4+0.0015):4 with labels title ' '
Performance counter stats for './phonebook_hash' (100 runs):
29,7280 cache-misses # 41.245 % of all cache refs ( +- 0.09% )
72,0775 cache-references ( +- 0.11% )
2,4553,5637 instructions # 1.51 insns per cycle ( +- 0.02% )
1,6262,9456 cycles ( +- 0.07% )
0.051042278 seconds time elapsed ( +- 0.61% )
reference
hash function 觀念和實務
Hash Function
為何hash seed 是131
哈希表之bkdrhash算法解析及扩展
Which hashing algorithm is best for uniqueness and speed?
各种字符串Hash函数比较
非常重要:算法与数据结构,有HASH各種介紹
參考共筆:
Hsiang
changyuanhua
先閱讀共筆後,再進行更多實作
陳品睿
yenWu
Hsiang
twzjwang
louielu
0140454
Weinux
hugikun999
ktvexe
0xff07
jeff60907
step1
先登入 github,對要 fork 的專案按下右上角的 fork
step2
打開 terminal,打上$ git clone 剛剛fork的URL
$ git clone https://github.com/bananahuang/phonebook.git
step3
$ git remote -v
羅列遠端主機並顯示網址
origin https://github.com/bananahuang/phonebook.git (fetch)
origin https://github.com/bananahuang/phonebook.git (push)
step4
顯示當前狀態
$git status
顯示
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: phonebook_opt.c
modified: phonebook_opt.h
no changes added to commit (use "git add" and/or "git commit -a")
step5
$ git add phonebook_opt.c
$ git add phonebook_opt.h
顯示
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: phonebook_opt.c
modified: phonebook_opt.h
step6
$ git commit
(local workspace)會觸發 git hooks astyle, cppcheck
test 是自己的註解,要打上做了什麼修改,打錯了可以用$git commit --amend
修正
最後註解會上傳github
How to Write a Git Commit Message
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Your branch is up-to-date with 'origin/master'.
#
# Changes to be committed:
# modified: Makefile
# modified: math-toolkit.h
#
# Untracked files:
# out.jpg
# output.jpg
# output.png
#
[master 8f2853d] test
2 files changed, 25 insertions(+), 5 deletions(-)
step7
$ git status
顯示
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working directory clean
step8
上傳到遠端儲存庫 $ git push
顯示
Username for 'https://github.com': brennenhuang
Password for 'https://brennenhuang@github.com':
Counting objects: 4, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 760 bytes | 0 bytes/s, done.
Total 4 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/bananahuang/phonebook.git
88e1942..8f2853d master -> master
reference
ssh key
ssh key 觀念
30 天精通 Git 版本控管
10分鐘掌握Git常用命令,做一名有知識的程式設計師
Git 教學(1) : Git 的基本使用
介绍如何使用git及gitlab
Git
Git遠程操作詳解
gnuplot
How to use makefile
etc276
如果再github建立一個空的resp,那麼再本地端要直接上傳檔案的話,用這個
while()?
if(…==NULL)=>if(!..)
strcmp意義比較字串後是回傳>0=0<0可能是把ascII相減
malloc可能配置不到記憶體位置要小心(可用man malloc去看)
c/C++已經大不同
hash function 有六種 數學意義??
=>加速資料查詢的速度?
http://blog.xuite.net/yh96301/blog/242333268-安裝Ubuntu+16.04
http://marktwtn.blogspot.tw/2014/05/diy-win8-ubuntu.html
CPU cache miss wiki
思維提升:
反藍標示 可以變成orig hash bst全都標示在main.c變成一個struct