# 2017q1 Homework1 (phonebook)
contributed by <`GoblinBear`>
## 開發環境
- 作業系統:Kubuntu 16.04.01 LTS (64bit)
- 硬體
- 查看硬體資訊:`$ lscpu`
```shell
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
每核心執行緒數:2
每通訊端核心數:4
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 94
Model name: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
製程: 3
CPU MHz: 808.429
CPU max MHz: 4000.0000
CPU min MHz: 800.0000
BogoMIPS: 6816.61
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 8192K
NUMA node0 CPU(s): 0-7
```
## Git Commit Message
### 重要的七大規則
- 用一行空白行分隔標題與內容
- 限制標題最多只有 50 字元
- 標題開頭要大寫
- 標題不以句點結尾
- 以祈使句撰寫標題
- 內文每行最多 72 字
- 用內文解釋 what 以及 why vs. how
### 參考資料
- [Git commit message 撰寫和改進實例: mini-arm-os](https://hackmd.io/s/HJQk5dt2x)
- [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/)
- [如何寫一個 Git Commit Message](https://blog.louie.lu/2017/03/21/%E5%A6%82%E4%BD%95%E5%AF%AB%E4%B8%80%E5%80%8B-git-commit-message/)
- [1.5 開始 - 初次設定Git](https://git-scm.com/book/zh-tw/v1/%E9%96%8B%E5%A7%8B-%E5%88%9D%E6%AC%A1%E8%A8%AD%E5%AE%9AGit)
## 理解 phonebook
- `__builtin___clear_cache((char *) pHead, (char *) pHead + sizeof(entry));`
- 查 gcc 文件:**Built-in Function**: `void __builtin___clear_cache (char *begin, char *end)`
This function is used to flush the processor’s instruction cache for the region of memory between begin inclusive and end exclusive. Some targets require that the instruction cache be flushed, after modifying memory containing code, in order to obtain deterministic behavior.
If the target does not require instruction cache flushes, `__builtin___clear_cache` has no effect. Otherwise either instructions are emitted in-line to clear the instruction cache or a call to the `__clear_cache` function in libgcc is made.
- 清理特定區域的指令快取
- 參考資料:[6.59 Other Built-in Functions Provided by GCC](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html)
> 不太懂 begin inclusive and end exclusive 的意思
## 重構 phonebook
- 在做優化之前,先整個重構,改寫成比較容易擴充的架構
- 拿掉條件式編譯,用繼承與多型的概念重構 phonebook 程式碼
### 程式碼說明
- 父類別 `Module`
```clike=
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
#define ModuleMembers() \
void *(*findName)(char lastName[], void *pHead); \
void *(*append)(char lastName[], void *e); \
char output[16];
typedef struct _Module {
ModuleMembers();
} Module;
```
- 利用 C 語言的 preprocessor macro 定義 `Module` 的成員函式和成員變數
- 一開始有想過用最直接的方式
```clike=
struct A {
int x;
int y;
};
struct B {
structA objA;
float f;
char s[10];
};
```
但在子類別中,訪問父類別成員是非常頻繁的,使用上不太方便
```clike=
struct B objB;
objB.objA.x = 10;
((struct A*)&objB)->x = 10;
```
- 通過 preprocessor macro 定義,可以很容易實現和維護這種繼承關係
```clike=
# define STRUCT_A \
intx; \
inty;
struct A {
STRUCT_A;
};
struct B {
STRUCT_A;
float f;
char s[10];
};
```
- 原本的作法和每種優化的實作都會繼承這個類別
- 指標函式`findName`和`append`會在各個子類別中指向各別的實作
- `output`為輸出檔案
- 將`findName()`和`append()`的參數和回傳值的資料型態改成`void*`,因為其中一個優化方式是「減少`entry`空間」,而 linked list 節點的資料結構就不再是`entry`
- 子類別
```clike=
#include "module.h"
typedef struct _Orig {
ModuleMembers();
} Orig;
Orig* create_Orig();
void delete_Orig(Orig* orig);
void *Orig_findName(char lastName[], void *pHead);
void *Orig_append(char lastName[], void *e);
```
- `ModuleMembers()`是繼承父類別`Module`
- 指標函式`findName`指向`Orig_findName()`
- 指標函式`append`指向`Orig_append()`
- `create_Orig()`:建構子
- `delete_Orig()`:解構子
- main.c
- 由命令行參數選擇欲執行的優化方式
```clike=
Orig* orig = create_Orig(); // origin version
Tiny* tiny = create_Tiny(); // reduce entry version
Hash* hash = create_Hash(); // hash version
Module *list[] = { (Module*)orig, (Module*)tiny, (Module*)hash };
num = atoi(argv[1]);
Module *module = list[num];
```
- 用指標函式可以避免`if-else`分支
- 產生一個 linked list 節點
```clike=
void *pHead, *e;
if (num == 1) {
pHead = (tiny_entry *) malloc(sizeof(tiny_entry));
printf("size of tiny_entry : %lu bytes\n", sizeof(tiny_entry));
e = pHead;
((tiny_entry*)e)->pNext = NULL;
}
else {
pHead = (entry *) malloc(sizeof(entry));
printf("size of entry : %lu bytes\n", sizeof(entry));
e = pHead;
((entry*)e)->pNext = NULL;
}
```
- `argv[1]`為 1 時的優化方式是縮小`entry`,這等於改變節點的資料結構
- 會有兩種 linked list 節點的資料結構,所以`pHead`和`e`是`void*`型態,再依`argv[1]`選擇`entry`或`tiny_entry`
- 將資料讀入 linked list
```clike=
e = module->append(line, e);
```
- 每個優化方式的界面都相同
- 搜尋 linked list
```clike=
module->findName(input, e);
```
- 輸出檔案
```clike=
FILE *output;
output = fopen(module->output, "a");
```
- calculate.c
```clike=
Sum* calculate(char* filename)
{
FILE *fp = fopen(filename, "r");
if (!fp) {
printf("ERROR opening input file orig.txt\n");
exit(0);
}
int i = 0;
double orig_sum_a = 0.0, orig_sum_f = 0.0, orig_a, orig_f;
for (i = 0; i < 100; i++) {
if (feof(fp)) {
printf("ERROR: You need 100 datum instead of %d\n", i);
printf("run 'make run' longer to get enough information\n\n");
exit(0);
}
fscanf(fp, "%s %s %lf %lf\n", append, find, &orig_a, &orig_f);
orig_sum_a += orig_a;
orig_sum_f += orig_f;
}
fclose(fp);
Sum* sum = (Sum*)malloc(sizeof(Sum));
sum->sum_a = orig_sum_a;
sum->sum_f = orig_sum_f;
return sum;
}
```
- 計算執行多次的平均時間的小工具
- 原本每多一個優化方式,就要改一次 calculate.c 的程式碼
- 改成每多一個優化方式,只要在命令行參數新增欲計算平均的檔名即可,而不用改程式碼
- 用法:`./calculate <file1> <file2> ...`
- e.g. `./calculate orig.txt tiny.txt`
### 參考資料
- [Inheritance and Polymorphism in C](https://www.codeproject.com/articles/108830/inheritance-and-polymorphism-in-c)
- [高等C語言 - 物件導向](https://shengwen1997.gitbooks.io/program_with_c/content/oop.html)
## 原始版本 (phonebook_orig)
- 執行時間
- `append()`:0.033310246 sec
- `findName()`:0.005050636 sec
- Cache miss
```shell
Performance counter stats for './main 0' (100 runs):
4,466,628 cache-misses # 89.160 % of all cache refs ( +- 0.24% )
5,009,673 cache-references ( +- 0.25% )
262,945,324 instructions # 1.38 insns per cycle ( +- 0.02% )
190,740,704 cycles ( +- 0.40% )
0.051425448 seconds time elapsed ( +- 1.46% )
```
## 優化效能
### 可能的效能改進方向
- 改寫`struct __PHONE_BOOK_ENTRY`的成員,搬動到新的結構中
- 使用 hash function 來加速查詢
- 既然 first name, last name, address 都是合法的英文 (可假設成立),使用字串壓縮的演算法,降低資料表示的成本
- 使用 binary search tree 改寫演算法
### Opt1:改寫資料結構 (phonebook_tiny)
- 減少 entry 空間
- 只會用到`lastName`,所以將其他用不到的結構成員隱藏在`detail`中,因此`entry`的空間由 136 byte 降到 32 (更名為`tiny_entry`)
```clike=
typedef struct __PHONE_BOOK_DETAIL {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} detail;
typedef struct __PHONE_BOOK_TINY_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
detail *d;
struct __PHONE_BOOK_TINY_ENTRY *pNext;
} tiny_entry;
```
- 執行時間
- `append()`:0.027626633 sec
- `findName()`:0.001604628 sec
- Cache miss
```shell
Performance counter stats for './main 1' (100 runs):
1,354,381 cache-misses # 59.092 % of all cache refs ( +- 0.53% )
2,291,976 cache-references ( +- 0.57% )
246,136,478 instructions # 1.99 insns per cycle ( +- 0.02% )
123,493,633 cycles ( +- 0.59% )
0.033825093 seconds time elapsed ( +- 1.66% )
```
- 效能分析
- `append()`和`findName()`都稍微變快
- L1d cache = 32K,`entry`資料變少,cache 可以存更多筆`entry`,因此 cache miss 降低
### Opt2:利用 Hash function 加速查詢 (phonebook_hash)
- 作法
- Hash function:djb2
```clike=
static unsigned long int ht_hash(hashtable_t* hashtable, char* key)
{
unsigned long int hash = 5381;
int c;
while ((c = *key++)) {
hash = ((hash << 5) + hash) + c;
}
return hash % hashtable->size;
}
```
- Table size = 42737,有約35萬個英文單字,為了減少碰撞機會,挑了一了滿大的**質數**
- 執行時間精度:優化後`findName()`執行時間小於 10^-6^ 秒,所以我把`fprintf()`的參數`%lf`改成`%.9lf`
- findName 與 append
```clike=
hashtable_t *hashtable;
void *Hash_findName(char lastName[], void *pHead)
{
return ht_get(hashtable, lastName);
}
void *Hash_append(char lastName[], void *e)
{
ht_set(hashtable, lastName);
return NULL;
}
```
- 將 hash table 建在 phonebook_hash.c 內部,避免修改到 main.c
- 執行時間
- `append()`:0.162833367 sec
- `findName()`:0.000000141 sec
- Cache miss
```shell
Performance counter stats for './main 2' (100 runs):
12,609,290 cache-misses # 65.168 % of all cache refs ( +- 0.43% )
19,349,018 cache-references ( +- 0.29% )
335,036,849 instructions # 0.53 insns per cycle ( +- 0.05% )
630,145,352 cycles ( +- 0.33% )
0.165413841 seconds time elapsed ( +- 0.53% )
```
- 效能分析
- `append()`增至 490%
- `findName()`降至 0.0028%
- 參考資料
- [A quick hashtable implementation in c](https://gist.github.com/tonious/1377667)
- [Hash Functions](http://www.cse.yorku.ca/~oz/hash.html)
- [Programming Small](https://hackmd.io/s/S1rbwmZ6)
### 三種實做比較
- 執行時間
