contributed by <jkrvivian
>
0140454
git revert
來復原 commit。smaz.txt
。malloc()
是一個開銷很大的 function,未來可提出相對應的改善方案。繼續改進效能與研究:
挑戰進階題>"<
優化前:
避免用圖片表示文字訊息 jserv
用perf找尋熱點:
./phonebook_orig & sudo perf top -p $!
findName()
為熱點,佔了12.91%; append()只佔了1.34%,
但append()的執行時間卻遠遠超過findName()
cache misses 居然到96.047%
把lastName以外的資訊刪掉
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
/*char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5]; */
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
結果:
structure size從136byte縮減為24byte
cache misses
cache missis從原先的96.047%降至70.018%
plot
之前的只有註解掉好遜喔! 所以把資料搬動到新結構中
typedef struct __PHONE_INFO {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} info;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __PHONE_INFO *information;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
結果:
size of entry : 32 bytes
execution time of append() : 0.038601 sec
execution time of findName() : 0.002589 sec
Performance counter stats for './phonebook_opt' (100 runs):
407,872 cache-misses # 62.990 % of all cache refs ( +- 0.49% )
647,521 cache-references ( +- 0.93% )
246,533,337 instructions # 1.67 insns per cycle ( +- 0.02% )
147,266,812 cycles ( +- 1.02% )
0.050819024 seconds time elapsed ( +- 1.33% )
hash : 一般是一個整數,通過某種算法,可以把一個字符串”壓縮” 成一個整數
感覺這裡用把字符串"壓縮"成一個整數這個解釋好像有點奇怪,應該是透過算法把字符串變成一個index ,這個index在 hash table裡面就會指向這個字串的bucket(可以當作門牌)
衝突:對不同的關鍵字可能得到同一雜湊地址,即k1!=k2, f(k1)=f(k2)
hash table: hash table大小越是質數,mod取餘數就越可能均勻分布在表的各處
unsigned int BKDRHash(char lastName[])
{
unsigned int seed=31;
unsigned int hash=0;
int i=0;
while (lastName[i] != '\0') {
hash = hash * seed + lastName[i];
++i;
}
return hash %= SIZE;
}
size of entry : 24 bytes
execution time of append() : 12.750197 sec
execution time of findName() : 0.000022 sec
Performance counter stats for './phonebook_hash1' (10 runs):
153,348,502 cache-misses # 64.743 % of all cache refs ( +- 0.89% )
236,855,570 cache-references ( +- 0.14% )
2,033,121,783 instructions # 0.06 insns per cycle ( +- 0.24% )
36,137,260,010 cycles ( +- 1.41% )
12.420656874 seconds time elapsed ( +- 0.90% )
推測應該是Hash Table size太小,造成碰撞率很高,所以改成1051
size of entry : 24 bytes
execution time of append() : 3.235966 sec
execution time of findName() : 0.000005 sec
再從維基百科中找一個大一點的質數 試試3571
size of entry : 24 bytes
execution time of append() : 0.575651 sec
execution time of findName() : 0.000000 sec
size of entry : 24 bytes
execution time of append() : 0.083959 sec
execution time of findName() : 0.000001 sec
Performance counter stats for './phonebook_hash1' (100 runs):
865,677 cache-misses # 57.058 % of all cache refs ( +- 0.49% )
1,517,185 cache-references ( +- 1.46% )
348,528,862 instructions # 1.16 insns per cycle ( +- 0.02% )
300,277,793 cycles ( +- 1.04% )
0.100966169 seconds time elapsed ( +- 0.99% )
words.txt
,裡頭的名字跟真實英文姓氏差異很大,很多沒有意思的辭彙(ex:aaaa)該怎麼切入這個問題呢?
在字串壓縮法中,有很多朋友們介紹了不同的字串壓縮法,決定先以最高人氣的SMAZ試驗
int smaz_compress(char *in, int inlen, char *out, int outlen);
int smaz_decompress(char *in, int inlen, char *out, int outlen);
'This is a small string' compressed by 50%
'foobar' compressed by 34%
'the end' compressed by 58%
'not-a-g00d-Exampl333' enlarged by 15%
'1000 numbers 2000 will 10 20 30 compress very little' compressed by 10%
findName()
和append()
entry *findName(char lastname[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastname, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e)
{
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
/* compress string */
smaz_compress(lastName,sizeof(lastName),e->lastName,sizeof(lastName));
e->pNext = NULL;
return e;
}
結果:
size of entry : 32 bytes
execution time of append() : 0.269562 sec
execution time of findName() : 0.003497 sec
Performance counter stats for './phonebook_smaz' (100 runs):
613,100 cache-misses # 56.363 % of all cache refs ( +- 0.57% )
1,087,780 cache-references ( +- 2.52% )
1,143,419,207 instructions # 1.44 insns per cycle ( +- 0.00% )
794,834,236 cycles ( +- 0.61% )
0.264604214 seconds time elapsed
plot
問題
system embedded
HW1-1