contributed by <ChiuYiTang
>
st9007a
defined(__GNUC__)
跟 diff_in_second
的敘述寫反了MAX_TABLE_SIZE
設定為 1024,可以說明一下選擇這個數字的理由tableHead
這個變數的用意是 ?OPT
來區分使用的優化方法可讀性不高,可以嘗試使用 gcc 的 -D 來處理CFLAGS
來定義 gcc 編譯的優化程度,考慮這個 repo 的使用情境,可以整合成一個 CFLAGS
( 或者作者想比較 hash table方法 + -O0 跟 binary search tree + -O1 之間的效能,如果是的話,這點請無視 )CFLAGS
統一使用的狀況下,Makefile
在產生執行檔的部分可以寫得更精簡findName
的實作中,root == NULL
的判斷提前做可讀性比較高,也可以省掉 current != NULL
跟 if(root)
這兩個判斷main.c
第 20 行再次 define MAX_TABLE_SIZE 的用意是 ? 在 phonebook_opt.h 中已經 define 過了Ubuntu 16.04.5
Linux 4.4.0-96-generic
gcc version 5.4.0 20160609
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
$ lscpu
To format your file you can execute below command:
$ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]
style決定程式中,大括弧{ }的擺放方式。
indent=space決定程式中,每個縮排為設定數值的倍數。
indent-switches讓程式中 switch-case 的 case statements 對齊
suffix=none不保留格式化前的文件
利用此行指令設定程式風格,就可以自動格式化。
diff_in_second()
,以及main函數。main()
函數分為五大部份#include IMPL
參考gcc –help得知此為gcc -DIMLP=""的macro。觀察make file發現, gcc -Dname=definition 的東西會被翻譯成 #define name definition。
其中$@為Makefule裡的target,這邊可以替換phonebook_orig 或 phonebook_opt。
defined(__GNUC__)
計算時間差的subfunction。
diff_in_second
判斷編譯器是不是GCC。
觀察free(pHead),發現似乎只有free掉部份memory。
利用valgrind檢查 phonebook_orig:
$ valgrind --leak-check=full ./phonebook_orig
結果:
==3708== HEAP SUMMARY:
==3708== in use at exit: 47,586,264 bytes in 349,899 blocks
==3708== total heap usage: 349,906 allocs, 7 frees, 47,596,856 bytes allocated
可以發現malloc 349,906次卻只free掉7次,嘗試修改 main.c 中釋放記憶體的部份:
entry *tmp;
while ((tmp = pHead) != NULL) {
pHead = pHead->pNext;
free(tmp);
}
重新檢查:
==3961== HEAP SUMMARY:
==3961== in use at exit: 0 bytes in 0 blocks
==3961== total heap usage: 349,906 allocs, 349,906 frees, 47,596,856 bytes allocated
==3961==
==3961== All heap blocks were freed -- no leaks are possible
$ perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig
size of entry : 136 bytes
execution time of append() : 0.047000 sec
execution time of findName() : 0.005270 sec
Performance counter stats for './phonebook_orig' (100 runs):
4,774,624 cache-misses # 93.941 % of all cache refs ( +- 0.13% )
5,082,555 cache-references ( +- 0.20% )
261,003,454 instructions # 1.21 insns per cycle ( +- 0.02% )
215,517,588 cycles ( +- 0.28% )
0.059957913 seconds time elapsed ( +- 0.33% )
typedef struct __PHONE_BOOK_ENTRY {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} DetailEntry;
typedef struct __LAST_NAME_ENTRY{
char lastName[MAX_LAST_NAME_SIZE];
DetailEntry *detail;
struct __LAST_NAME_ENTRY *pNext;
} entry;
size of entry : 32 bytes
execution time of append() : 0.041626 sec
execution time of findName() : 0.001664 sec
Performance counter stats for './phonebook_opt' (100 runs):
1,632,174 cache-misses # 70.076 % of all cache refs ( +- 0.31% )
2,329,162 cache-references ( +- 0.47% )
244,053,016 instructions # 1.79 insns per cycle ( +- 0.02% )
136,425,581 cycles ( +- 0.75% )
0.038485186 seconds time elapsed ( +- 0.79% )
BKDR hash function
:
unsigned int BKDRHash(char *str)
{
unsigned int seed = 131;
unsigned int hash = 0;
while(*str)
hash = hash * seed + (*str++);
return (hash % MAX_TABLE_SIZE);
}
而當碰撞發生時,即透過鏈結(Chaining)碰撞解決方法。
(可直接套用原先append()以及findName()的 linked-list 操作)
[ source ]
size of entry : 32 bytes
execution time of append() : 0.052022 sec
execution time of findName() : 0.000002 sec
Performance counter stats for './phonebook_opt' (100 runs):
586,290 cache-misses # 39.869 % of all cache refs ( +- 0.42% )
1,470,529 cache-references ( +- 1.20% )
233,415,491 instructions # 1.69 insns per cycle ( +- 0.02% )
137,911,226 cycles ( +- 0.73% )
0.038492741 seconds time elapsed ( +- 0.92% )
strcasecmp()
的回傳值為依據。strcasecmp()
的結果:The strcasecmp() function performs a byte-by-byte comparison of the
strings s1 and s2, ignoring the case of the characters. It returns
an integer less than, equal to, or greater than zero if s1 is found,
respectively, to be less than, to match, or be greater than s2.
int Length(entry *head)
{
int count = 0;
entry *current = head;
while (current != NULL) {
count++;
current = current -> pNext;
}
return count;
}
treeNode *BuildBST(entry **headRef, int length)
{
if(length <= 0)
return NULL;
treeNode *left = BuildBST(headRef, length/2);
treeNode *root = (treeNode *) malloc(sizeof(treeNode));
root->entry = *headRef;
root->left = left;
*headRef = (*headRef)->pNext;
root->right = BuildBST(headRef, length - (length / 2 + 1));
return root;
}
typedef struct __TREE_NODE {
entry *entry;
struct __TREE_NODE *right;
struct __TREE_NODE *left;
} treeNode;
size of entry : 32 bytes
execution time of append() : 0.058509 sec
execution time of findName() : 0.000000 sec
Performance counter stats for './phonebook_bst' (100 runs):
1,603,989 cache-misses # 73.761 % of all cache refs ( +- 0.15% )
2,174,566 cache-references ( +- 0.33% )
283,643,852 instructions # 1.85 insns per cycle ( +- 0.02% )
153,239,845 cycles ( +- 0.31% )
0.042808917 seconds time elapsed ( +- 0.62% )
CPU Cashe中最小的儲存單位
主流CPU Cache Line大小為 64 bytes
若能將每個 BST node 的大小剛好等於 cache line 大小或其一半,或許能降低一個 node 需要填充兩個 cache line 造成的呼叫 cache 的時間。
修改之BST節點為 64 bytes (1 cache line)
typedef struct __TREE_NODE {
entry *entry[6];
struct __TREE_NODE *right;
struct __TREE_NODE *left;
} treeNode;
append()
的部份更加有效率。對於此微小的差異感到存疑…ChiuYiTang
strcasecmp() — Compare Strings without Case Sensitivity
常见数据结构(一)-栈,队列,堆,哈希表
什么是Cache Line