# 2020q3 Homework1 (lab0)
contributed by < `blueskyson` >
###### tags: `linux2020`
## 環境
在 VirtualBox 中安裝 Ubuntu 20.04.1
```
$ uname -a
Linux vm1 5.4.0-47-generic #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
```
## 實作 queue.h
### queue_t
- 在此結構加入 `list_ele_t *tail` ,使 `bool q_insert_tail()` 時間複雜度 $O(1)$
- 加入 `unsigned size` 儲存佇列長度,使 `int q_size(queue_t *q)` 時間複雜度 $O(1)$
```cpp=
typedef struct {
list_ele_t *head;
list_ele_t *tail;
unsigned size;
} queue_t;
```
## 實作 queue.c
### queue_t *q_new()
- 第 4 行先確認記憶體是否分配成功,若否就直接回傳 `NULL`
- 接下來初始化 `head` 以及 `tail` 為 `NULL`
```cpp=
queue_t *q_new()
{
queue_t *q = malloc(sizeof(queue_t));
if (!q)
return NULL;
q->head = NULL;
q->tail = NULL;
q->size = 0;
return q;
}
```
### void q_free(queue_t *q)
- 第 3 行先確認傳入的佇列是否為 NULL,若為 NULL 則直接跳出函式
- 接下來,當 `q->head` 不是 NULL 時,先用指標 `*del` 指向 `q->head` ,再把 `q->head` 指向 `q->head->next` ,然後釋放 `*del`
- 最後釋放 `q`
```cpp=
void q_free(queue_t *q)
{
if(!q)
return;
free(q);
while (q->head) {
list_ele_t *del = q->head;
q->head = q->head->next;
free(del->value);
free(del);
}
free(q);
}
```
### bool q_insert_head(queue_t *q, char *s)
- 第 3 行確認傳入的佇列是否為 NULL,若為 NULL 則直接跳出函式
- 第 6 行確認新元素 `newh` 記憶體是否配置成功
- 第 10 行確認 `*val` 的記憶體是否配置成功,若沒有配置成功,要記得把先前配置的 `newh` 釋放
- 然後使用 `memcpy()` 將傳入的字串複製進到 `val` 並讓 `newh->value` 指向 `val` ,這裡要注意記得要預留 `'\0'` 的位置
- 最後把原來的 `q->head` 接在 `newh` 後面,並把 `newh` 設為新的 `q->head`
```cpp=
bool q_insert_head(queue_t *q, char *s)
{
if (!q)
return false;
list_ele_t *newh = malloc(sizeof(list_ele_t));
if (!newh)
return false;
unsigned len = strlen(s) + 1;
char *val = malloc(sizeof(char) * len);
if (!val) {
free(newh);
return false;
}
memcpy(val, s, len);
newh->value = val;
newh->next = q->head;
q->head = newh;
if (q->size == 0)
q->tail = newh;
(q->size)++;
return true;
}
```
### bool q_insert_tail(queue_t *q, char *s)
- 前面的動作跟 `q_insert_head()` 差不多,接下來先判斷佇列 `q` 否為空,若為空,則把 `q->head` 及 `q->tail` 設為 `newt` ;若不為空,則把 `newt` 接在原本的 `q->tail` 後面,並將其設為 `q->tail`
```cpp=
bool q_insert_tail(queue_t *q, char *s)
{
if (q == NULL)
return false;
list_ele_t *newt = malloc(sizeof(list_ele_t));
if (!newt)
return false;
unsigned len = strlen(s) + 1;
char *val = malloc(sizeof(char) * len);
if (!val) {
free(newt);
return false;
}
memcpy(val, s, len);
newt->value = val;
newt->next = NULL;
if (q->size == 0) {
q->head = newt;
q->tail = newt;
} else {
q->tail->next = newt;
q->tail = newt;
}
(q->size)++;
return true;
}
```
### bool q_remove_head(queue_t *q, char *sp, size_t bufsize)
- 首先確認 `q` 是否為 `NULL` 以及佇列是否為空,然後確認佇列長度是否為 1 ,如果是,就把 `q->tail` 指向 `NULL`
- 將原本的 `q->head` 指向 `q->head->next`
- 接下來把字串複製到 `sp` ,如果字串長度大於 `bufsize` ,則只複製 `bufsize` 個字元,並且把最後一個字元改為 `\0`
- 最後將目標元素刪除,`q->size` 減 1
```cpp=
bool q_remove_head(queue_t *q, char *sp, size_t bufsize)
{
if (!q || !(q->head))
return false;
if (q->size == 1)
q->tail = NULL;
list_ele_t *del;
del = q->head;
q->head = q->head->next;
if (sp) {
if (strlen(del->value) > bufsize) {
memcpy(sp, del->value, bufsize);
sp[bufsize - 1] = '\0';
} else {
memcpy(sp, del->value, sizeof(char) * (strlen(del->value) + 1));
}
}
free(del->value);
free(del);
(q->size)--;
return true;
}
```
### int q_size(queue_t *q)
```cpp=
int q_size(queue_t *q)
{
if (!q)
return 0;
return q->size;
}
```
### void q_reverse(queue_t *q)
- `*prev` `*curr` `*next` 代表相鄰的三個節點。每當 `curr` 非 NULL,先將 `next` 指向 `curr->next` ,然後將 `curr->next` 指向 `prev` ,把三個節點往前平移,持續前述動作直到 `curr` 為 NULL,達到反轉 linked list 的效果。圖示如下:
```graphviz
digraph g1{
rankdir=LR;
node [shape=record];
1 [label = "{<data>1 |<ref>}"];
2 [label = "{<data>2 |<ref>}"];
3 [label = "{<data>3 |<ref>}"];
4 [label = "{<data>4 |<ref>}"];
5 [label = "{<data>... |<ref>}"];
1:ref -> 2:data [color=black];
2:ref -> 3:data [color=black];
3:ref -> 4:data [color=black];
4:ref -> 5:data [color=black];
prev->1;
next->3;
curr->2;
2:ref -> 1:data [color=red]
}
```
```graphviz
digraph g2{
rankdir=LR;
node [shape=record];
1 [label = "{<data>1 |<ref>}"];
2 [label = "{<data>2 |<ref>}"];
3 [label = "{<data>3 |<ref>}"];
4 [label = "{<data>4 |<ref>}"];
5 [label = "{<data>... |<ref>}"];
2:ref -> 1:data [color=black];
3:ref -> 4:data [color=black];
4:ref -> 5:data [color=black];
prev->2;
next->4;
curr->3;
3:ref -> 2:data [color=red]
}
```
```cpp=
void q_reverse(queue_t *q)
{
if (!q || !(q->head))
return;
list_ele_t *prev, *curr, *next;
prev = q->head;
q->tail = q->head;
curr = prev->next;
while (curr) {
next = curr->next;
curr->next = prev;
prev = curr;
curr = next;
}
q->head = prev;
q->tail->next = NULL;
}
```
### void q_sort(queue_t *q)
- 原本很隨便的寫了 bubble sort 交差了事,然而在 trace-15 和 trace-16 會發生 TLE,所以搜尋了其他適合 linked list 的排序法,參考 [Merge Sort for Linked Lists](https://www.geeksforgeeks.org/merge-sort-for-linked-list/) 實作 merge sort
- 由於我需要直接修改整個 linked list 的連接順序,所以需要使用 `q->head` 指標的指標,也就是 `&(q->head)` 操作其儲存的頭節點位址,而非單純修改節點的連接順序
- 首先,宣告指標 `*fast` `*slow` ,來找出佇列的中間節點, `fast` 每次會移動 2 個單位,`slow` 每次會移動 1 個單位,所以當 `fast` 指向 `NULL` 時, `slow` 正好指向中間的節點
- 找到中間的節點之後,把佇列切成兩個分割,左分割的頭是 `*headref` 和 `slow` ,右分割的頭是 `slow->next`,接下來再對其進行分割直到每個分割剩下 1 個或 0 個節點。
- 最後透過 `strcmp()` 比較字串長度及大小,把左分割及右分割的所有元素由小到大組合在一起。
```cpp=
void sort_recur(list_ele_t **headref)
{
list_ele_t *head = *headref;
if (head == NULL || head->next == NULL)
return;
/* find the middle node */
list_ele_t *fast = head->next, *slow = head;
while (fast != NULL) {
fast = fast->next;
if (fast != NULL) {
fast = fast->next;
slow = slow->next;
}
}
/* make partitions */
list_ele_t *right_head = slow->next;
slow->next = NULL;
sort_recur(&head);
sort_recur(&right_head);
list_ele_t *head_tmp;
/* find out the smallest node in 2 partitions
* and set it as the head of combined partition */
if (strcmp(head->value, right_head->value) <= 0) {
*headref = head;
head = head->next;
} else {
*headref = right_head;
right_head = right_head->next;
}
head_tmp = *headref;
/* compare values in 2 partitions and
* reorder them in combined partition */
while (true) {
if (head == NULL) {
head_tmp->next = right_head;
break;
} else if (right_head == NULL) {
head_tmp->next = head;
break;
}
if (strcmp(head->value, right_head->value) <= 0) {
head_tmp->next = head;
head = head->next;
} else {
head_tmp->next = right_head;
right_head = right_head->next;
}
head_tmp = head_tmp->next;
}
}
void q_sort(queue_t *q)
{
if (!q || q->size < 2)
return;
sort_recur(&(q->head));
}
```
## Valgrind 排除 qtest 實作的記憶體錯誤
首先,使用 valgrind leak-check 工具針對每個 trace 檢查是否有 memory leak
### 測試 trace-01-ops.cmd
```bash
$ valgrind -q --leak-check=full ./qtest -f traces/trace-01-ops.cmd
cmd> # Test of insert_head and remove_head
cmd> option fail 0
cmd> option malloc 0
cmd> new
q = []
cmd> ih gerbil
q = [gerbil]
cmd> ih bear
q = [bear gerbil]
cmd> ih dolphin
q = [dolphin bear gerbil]
cmd> rh dolphin
Removed dolphin from queue
q = [bear gerbil]
cmd> rh bear
Removed bear from queue
q = [gerbil]
cmd> rh gerbil
Removed gerbil from queue
q = []
cmd>
Freeing queue
==8226== 8 bytes in 1 blocks are still reachable in loss record 1 of 3
==8226== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==8226== by 0x4A4D50E: strdup (strdup.c:42)
==8226== by 0x10FEF9: linenoiseHistoryAdd (linenoise.c:1236)
==8226== by 0x110A8C: linenoiseHistoryLoad (linenoise.c:1325)
==8226== by 0x10C214: main (qtest.c:769)
==8226==
==8226== 160 bytes in 1 blocks are still reachable in loss record 2 of 3
==8226== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==8226== by 0x10FEB9: linenoiseHistoryAdd (linenoise.c:1224)
==8226== by 0x110A8C: linenoiseHistoryLoad (linenoise.c:1325)
==8226== by 0x10C214: main (qtest.c:769)
==8226==
==8226== 321 bytes in 19 blocks are still reachable in loss record 3 of 3
==8226== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==8226== by 0x4A4D50E: strdup (strdup.c:42)
==8226== by 0x10FE6D: linenoiseHistoryAdd (linenoise.c:1236)
==8226== by 0x110A8C: linenoiseHistoryLoad (linenoise.c:1325)
==8226== by 0x10C214: main (qtest.c:769)
==8226==
```
從提示訊息可以得知 qtest.c 第 769 行出現問題,觀察程式碼:
` linenoiseHistoryLoad(HISTORY_FILE); /* Load the history at startup */
`
得知 linenoiseHistoryLoad 函式有問題,接下來觀察 linenoise.c 第 1325 行:
`linenoiseHistoryAdd(buf);`
觀察 linenoise.c 第 1236 行:
`linecopy = strdup(line);`
最後追查到 `strdup()` 的原始碼,然而此函式好歹也包含在 C 的函式庫中,應該不會有問題,其原始碼在 [https://code.woboq.org/userspace/glibc/string/strdup.c.html](https://code.woboq.org/userspace/glibc/string/strdup.c.html)
擷取 `__strdup()` 觀察其實作:
```cpp=
/* Duplicate S, returning an identical malloc'd string. */
char *
__strdup (const char *s)
{
size_t len = strlen (s) + 1;
void *new = malloc (len);
if (new == NULL)
return NULL;
return (char *) memcpy (new, s, len);
}
```
觀察函式的內容,確認此函式沒有 memory leak 的疑慮。那問題應該是出在 linenoise.c 第 1236 行: `linecopy = strdup(line);` 我初步猜測變數 linecopy 可能忘記被清理掉了,觀察前後程式碼,於是我在第 1244 行後面補上 `free(linecopy);` 函式,在函式回傳前釋放 `linecopy` ,然後再度編譯執行:
```
$ make
CC linenoise.o
LD qtest
$ valgrind -q --leak-check=full ./qtest -f traces/trace-01-ops.cmd
==9257== Invalid read of size 1
==9257== at 0x483FED4: strcmp (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==9257== by 0x10FE61: linenoiseHistoryAdd (linenoise.c:1231)
==9257== by 0x110A94: linenoiseHistoryLoad (linenoise.c:1326)
==9257== by 0x10C214: main (qtest.c:769)
==9257== Address 0x4ba1cf0 is 0 bytes inside a block of size 8 free'd
==9257== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==9257== by 0x10FF26: linenoiseHistoryAdd (linenoise.c:1246)
==9257== by 0x110A94: linenoiseHistoryLoad
. . .
```
結果噴出更多記憶體錯誤,於是我把程式碼復原,並且再度思考,發現 ` linenoise.c` 有個全域變數:
```cpp
static char **history = NULL;
```
我從此變數被調用的形式猜測此為儲存 qtest 紀錄的東西,也許是 qtest 結束之後忘記將將其釋放造成 memory leak,所以我尋找可以釋放此變數的函式,發現以下函式可以達到目的:
```cpp=
static void freeHistory(void)
{
if (history) {
int j;
for (j = 0; j < history_len; j++)
free(history[j]);
free(history);
}
}
```
我很開心地在 qtest.c 倒數第三行呼叫 `freeHistory()` ,然而編譯時發生了錯誤:
```
$ make
CC qtest.o
qtest.c: In function ‘main’:
qtest.c:782:5: error: implicit declaration of function ‘freeHistory’ [-Werror=implicit-function-declaration]
782 | freeHistory();
| ^~~~~~~~~~~
cc1: all warnings being treated as errors
make: *** [Makefile:50: qtest.o] Error 1
```
似乎是 linenoise.h 沒有宣告這個函式造成這個錯誤,於是我在 linenoise.h 補上宣告,然後編譯:
```
$ make
CC qtest.o
In file included from console.h:5,
from qtest.c:34:
linenoise.h:73:13: error: ‘freeHistory’ used but never defined [-Werror]
73 | static void freeHistory(void);
| ^~~~~~~~~~~
cc1: all warnings being treated as errors
make: *** [Makefile:50: qtest.o] Error 1
```
經過網路爬文,在 [stackoverflow](https://stackoverflow.com/questions/5526461/gcc-warning-function-used-but-not-defined) 發現問題的原因,是因為 static 前綴的函式只能被實作的 .c 檔看到,其他 .c 檔無法呼叫,所以不能在 qtest.c 呼叫它!最後我只好自幹一個跟 `freeHistory()` 功能一樣,但是沒有 static 前綴的函式:
```cpp=
void deleteHistory(void)
{
if (history) {
int j;
for (j = 0; j < history_len; j++)
free(history[j]);
free(history);
}
}
```
在 linenoise.h 宣告 `void deleteHistory(void);` 、並且在 qtest.c 倒數第三行呼叫 `deleteHistory();` 之後, trace-01-ops.cmd 終於通過記憶體測試:
```
$ make
CC qtest.o
CC console.o
CC dudect/fixture.o
CC linenoise.o
LD qtest
$ valgrind -q --leak-check=full ./qtest -f traces/trace-01-ops.cmd
cmd> # Test of insert_head and remove_head
cmd> option fail 0
cmd> option malloc 0
cmd> new
q = []
cmd> ih gerbil
q = [gerbil]
cmd> ih bear
q = [bear gerbil]
cmd> ih dolphin
q = [dolphin bear gerbil]
cmd> rh dolphin
Removed dolphin from queue
q = [bear gerbil]
cmd> rh bear
Removed bear from queue
q = [gerbil]
cmd> rh gerbil
Removed gerbil from queue
q = []
cmd>
Freeing queue
```
### 測式其他 trace
接下來用類似的方法依序測試其他 trace:
```
$ valgrind -q --leak-check=full ./qtest -f traces/trace-02-ops.cmd
$ valgrind -q --leak-check=full ./qtest -f traces/trace-03-ops.cmd
$ valgrind -q --leak-check=full ./qtest -f traces/trace-04-ops.cmd
. . .
```
都沒有再出現記憶體問題,然而在測試 trace-12 時會發生 malloc error、測試 trace-13 時會發生 TLE ,不確定是不是 valgrind 造成的。於是乎我想到使用 Makefile 自帶的 make valgrind:
```
$ make valgrind
# Explicitly disable sanitizer(s)
make clean SANITIZER=0 qtest
make[1]: Entering directory '/home/vm1/lab0-c'
rm -f qtest.o report.o console.o harness.o queue.o random.o dudect/constant.o dudect/fixture.o dudect/ttest.o linenoise.o .qtest.o.d .report.o.d .console.o.d .harness.o.d .queue.o.d .random.o.d .dudect/constant.o.d .dudect/fixture.o.d .dudect/ttest.o.d .linenoise.o.d *~ qtest /tmp/qtest.*
rm -rf .dudect
rm -rf *.dSYM
(cd traces; rm -f *~)
CC qtest.o
CC report.o
CC console.o
CC harness.o
CC queue.o
CC random.o
CC dudect/constant.o
CC dudect/fixture.o
CC dudect/ttest.o
CC linenoise.o
LD qtest
make[1]: Leaving directory '/home/vm1/lab0-c'
cp qtest /tmp/qtest.Bv7bLm
chmod u+x /tmp/qtest.Bv7bLm
sed -i "s/alarm/isnan/g" /tmp/qtest.Bv7bLm
scripts/driver.py -p /tmp/qtest.Bv7bLm --valgrind
--- Trace Points
+++ TESTING trace trace-01-ops:
# Test of insert_head and remove_head
--- trace-01-ops 6/6
+++ TESTING trace trace-02-ops:
# Test of insert_head, insert_tail, and remove_head
--- trace-02-ops 6/6
+++ TESTING trace trace-03-ops:
# Test of insert_head, insert_tail, reverse, and remove_head
--- trace-03-ops 6/6
+++ TESTING trace trace-04-ops:
# Test of insert_head, insert_tail, size, and sort
--- trace-04-ops 6/6
+++ TESTING trace trace-05-ops:
# Test of insert_head, insert_tail, remove_head, reverse, size, and sort
--- trace-05-ops 5/5
+++ TESTING trace trace-06-string:
# Test of truncated strings
--- trace-06-string 6/6
+++ TESTING trace trace-07-robust:
# Test operations on NULL queue
--- trace-07-robust 6/6
+++ TESTING trace trace-08-robust:
# Test operations on empty queue
--- trace-08-robust 6/6
+++ TESTING trace trace-09-robust:
# Test remove_head with NULL argument
--- trace-09-robust 6/6
+++ TESTING trace trace-10-malloc:
# Test of malloc failure on new
--- trace-10-malloc 6/6
+++ TESTING trace trace-11-malloc:
# Test of malloc failure on insert_head
--- trace-11-malloc 6/6
+++ TESTING trace trace-12-malloc:
# Test of malloc failure on insert_tail
--- trace-12-malloc 6/6
+++ TESTING trace trace-13-perf:
# Test performance of insert_tail
--- trace-13-perf 6/6
+++ TESTING trace trace-14-perf:
# Test performance of size
--- trace-14-perf 6/6
+++ TESTING trace trace-15-perf:
# Test performance of insert_tail, size, reverse, and sort
--- trace-15-perf 6/6
+++ TESTING trace trace-16-perf:
# Test performance of sort with random and descending orders
# 10000: all correct sorting algorithms are expected pass
# 50000: sorting algorithms with O(n^2) time complexity are expected failed
# 100000: sorting algorithms with O(nlogn) time complexity are expected pass
--- trace-16-perf 6/6
+++ TESTING trace trace-17-complexity:
# Test if q_insert_tail and q_size is constant time complexity
Probably constant time
Probably constant time
--- trace-17-complexity 5/5
--- TOTAL 100/100
Test with specific case by running command:
scripts/driver.py -p /tmp/qtest.Bv7bLm --valgrind -t <tid>
```
完美通過測試
## Massif 視覺化
### 實驗 1
實驗方法為: 比較 qtest.c 中有無呼叫 `deleteHistory()` 的圖案差別,以 trace-01 為例:
```
$ valgrind --tool=massif ./qtest -f traces/trace-01-ops.cmd
$ massif-visualizer massif.out.*
```
下圖是有呼叫 `deleteHistory()` 的結果,可以看到最後的 Total Memory Heap Consumption 有歸零,代表記憶體有確實歸還
![](https://i.imgur.com/HakghEW.jpg)
下圖是無呼叫 `deleteHistory()` 的結果,可以看到最後的 Total Memory Heap Consumption 尚未歸零,然後程式直接結束,代表記憶體有缺失
![](https://i.imgur.com/zVDmizE.jpg)
### 實驗 2
實驗方法為: 比較 qtest.c 中有無呼叫 deleteHistory() 的圖案差別,以 make test 為例:
```
$ valgrind --tool=massif make test
$ massif-visualizer massif.out.*
```
![](https://i.imgur.com/lyoNjdK.jpg)
![](https://i.imgur.com/IvWk0Op.jpg)
兩張圖的 Total Memory Heap Consumption 都長的差不多
## 研讀 Dudect
### a) Variable-time implementation
首先研讀論文,但是我實在有看沒有懂,就算丟去翻譯成中文,一樣看的不是很懂。就我所理解的,作者第一個實驗測試 Memory comparison ,分成兩組測資,第一組使用均勻分布的 16-byte 字串,命名為 “random class”;另一組測資為完全相同的字串,命名為 “fix class” 。然後觀察這兩種 class 跟一個字串 s 做比較所耗費的 cpu cycle ,並畫出累積分布函數 (cdf) :
![](https://i.imgur.com/nBVBBei.png)
由上圖可以發現紅色所代表的 class 耗用的 cycle 數比較低、藍色較高,可見他們的分佈完全不同,接下來作者對實驗做不同的前處理,然後做多次的測試並比較兩種 class 分布的 t-statistic 的絕對值。當這個值大於 4.5 時,代表這兩種 class 的分布是不一樣的:
![](https://i.imgur.com/287ViOF.png)
很明顯,每次實驗的 |t| statistic 都大於 4.5, “random class”、“fix class” 的 Memory comparison 為不同的分布,他們耗費的 cpu cycle 差太多,出現 timing leakage
接下來,作者對實驗方法做了小幅修改,它降低評估工具的能力,並且讓評估工具不知道字串 s (我無法理解這句 "the evaluator does not know s" 的意思,是指不知道 s 儲存的值,還是連 s 的存在都不知道,還是別的情況...) ,然後將 fixed class 的值變成 0 ,再度進行實驗:
![](https://i.imgur.com/9sqF5Zv.png)
![](https://i.imgur.com/LTaDXTs.png)
經實驗後發現: 兩者耗費 cycle 數的分布變靠近了,然而 |t| statistic 在經過一些前處理之後,仍然無法保持在 4.5 以下,所以兩者分布仍有顯著差異
### b) Constant-time implementation
接下來作者設計了一個預期為 constant time 的函式,這個函式會透過邏輯運算比較字串,並且不會在偵測到不同字元時提早結束
![](https://i.imgur.com/dEgk8KE.png)
![](https://i.imgur.com/BB9krEc.png)
由 |t| statistics 小於 4.5 可以得知兩種 class 的分布是一樣的,無 timing leakage,由此可以推估此函式的複雜度為 $O(1)$ 。
接下來作者以類似的方法分析 AES128 、 Curve25519 on an ARM7TDMI 的 timing leakage ,就不多做贅述了。
### 總結
作者使用 fix class 來測試 corner-case 的執行情況、 random class 測試一般情況,觀察是否有 timing leakage。有時候測資太小,難以觀察 timing leakage ,此時將測試次數增加,增加到百萬倍以上觀察 timing leakage 是否被拉開,若無,則可以得知該程式有很大的可能為 constant time。
然而我在想,如果 corner-case 並不影響程式執行的 cycle ,也就是 corner-case 無法造成 timing leakage,也許這個方法測試 constant time 會失準?
我的理解可能有錯,歡迎各位糾正
### 解釋 dudect 實作
首先我觀察 qtest.c 的 `do_insert_tail()` ,當變數 `simulation` 為 `TRUE` 時,裡面有一個測量時間複雜度的函式 `is_insert_tail_const()` 就會被呼叫,進而啟動量測的程序,該函式在 dudect/fixture.c;在 `is_insert_tail_const()` 裡,會多次呼叫 `result = doit(0);` 如果 `doit()` 判斷 insert_tail 為 constant time ,會回傳 `TRUE` ,然後停止測試。
接下來觀察 `doit()` 的實作:
```cpp=
static bool doit(int mode)
{
int64_t *before_ticks = calloc(number_measurements + 1, sizeof(int64_t));
int64_t *after_ticks = calloc(number_measurements + 1, sizeof(int64_t));
int64_t *exec_times = calloc(number_measurements, sizeof(int64_t));
uint8_t *classes = calloc(number_measurements, sizeof(uint8_t));
uint8_t *input_data =
calloc(number_measurements * chunk_size, sizeof(uint8_t));
if (!before_ticks || !after_ticks || !exec_times || !classes ||
!input_data) {
die();
}
prepare_inputs(input_data, classes);
measure(before_ticks, after_ticks, input_data, mode);
differentiate(exec_times, before_ticks, after_ticks);
update_statistics(exec_times, classes);
bool ret = report();
free(before_ticks);
free(after_ticks);
free(exec_times);
free(classes);
free(input_data);
return ret;
}
```
- 第 15 行 `prepare_inputs(input_data, classes);` 會產生測資跟 classes 的資訊,每一筆測資會對應一種 class (fix 或 random)。
- 第 17 行 `measure(before_ticks, after_ticks, input_data, mode);` 是用來測量 cpu cycle 的,其實作的關鍵部分如下:
```cpp
for (size_t i = drop_size; i < number_measurements - drop_size; i++) {
char *s = get_random_string();
dut_new();
dut_insert_head(
get_random_string(),
*(uint16_t *) (input_data + i * chunk_size) % 10000);
before_ticks[i] = cpucycles();
dut_insert_tail(s, 1);
after_ticks[i] = cpucycles();
dut_free();
}
```
其中 dut_* 看似為函式,其實都是 macro ,其目的應該是為了盡可能減少呼叫函式時的堆疊切換,以提高 cycle 量測的精準度,以 `dut_insert_tail(s, 1);` 為例:
```cpp
#define dut_insert_tail(s,n) do { int j = n; while (j--) q_insert_tail(q, s); } while (0)
```
我們可以看到 `dut_insert_tail(s, 1);` 被 `before_ticks[i] = cpucycles();` 和 `after_ticks[i] = cpucycles();` 夾起來,這就是用來儲存 cpu cycle 的實作方法
再細看 `cpucycle()` 的原始碼:
```cpp
#include <stdint.h>
// http://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-execution-paper.html
static inline int64_t cpucycles(void)
{
#if defined(__i386__) || defined(__x86_64__)
unsigned int hi, lo;
__asm__ volatile("rdtsc\n\t" : "=a"(lo), "=d"(hi));
return ((int64_t) lo) | (((int64_t) hi) << 32);
#else
#error Unsupported Architecture
#endif
}
```
上面那個 Intel 的網址如右: [Code Execution Times: IA-32/IA-64 Instruction Set Architecture](http://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-execution-paper.html),參考 Intel 白皮書,我得知:
`"rdtsc"` 的全名為 Read Time-Stamp Counter and Processor ID IA assembly instruction ,是用來讀取 cpu cycle 的指令
`"=a"(lo)` 的 `=` 表示 write-only,`a` 表示暫存器 EAX,`(lo)` 則是指存放到變數 `lo`
`"=d"(hi)` 的 `=` 表示 write-only,`d` 表示暫存器 EDX,`(hi)` 則是指存放到變數 `hi`
最後將 `lo` 和 `high` 放入 `int64_t` 回傳
- 終於,我們知道了量測 cpu cycle 的過程,接下來回到 `doit()` 的第 18 行 `differentiate(exec_times, before_ticks, after_ticks);` 這個函式很單純,就是計算 `after_ticks` 和 `before_ticks` 的差值:
```cpp=
static void differentiate(int64_t *exec_times,
int64_t *before_ticks,
int64_t *after_ticks)
{
for (size_t i = 0; i < number_measurements; i++) {
exec_times[i] = after_ticks[i] - before_ticks[i];
}
}
```
- 再來 `doit()` 第 19 行 `update_statistics(exec_times, classes);` 用來更新 t test 的資料,這些資料會透過 `t_push()` 存放至 `t_ctx *t` :
```cpp
void t_push(t_ctx *ctx, double x, uint8_t class)
{
assert(class == 0 || class == 1);
ctx->n[class]++;
/* Welford method for computing online variance
* in a numerically stable way.
*/
double delta = x - ctx->mean[class];
ctx->mean[class] = ctx->mean[class] + delta / ctx->n[class];
ctx->m2[class] = ctx->m2[class] + delta * (x - ctx->mean[class]);
}
```
- 最後,來到了 `doit()` 第 20 行 `bool ret = report();` 此函式會把 [t-test](https://en.wikipedia.org/wiki/Student%27s_t-test) 的一些參數計算出來:
```cpp=
static bool report(void)
{
double max_t = fabs(t_compute(t));
double number_traces_max_t = t->n[0] + t->n[1];
double max_tau = max_t / sqrt(number_traces_max_t);
printf("\033[A\033[2K");
printf("meas: %7.2lf M, ", (number_traces_max_t / 1e6));
if (number_traces_max_t < enough_measurements) {
printf("not enough measurements (%.0f still to go).\n",
enough_measurements - number_traces_max_t);
return false;
}
/*
* max_t: the t statistic value
* max_tau: a t value normalized by sqrt(number of measurements).
* this way we can compare max_tau taken with different
* number of measurements. This is sort of "distance
* between distributions", independent of number of
* measurements.
* (5/tau)^2: how many measurements we would need to barely
* detect the leak, if present. "barely detect the
* leak" = have a t value greater than 5.
*/
printf("max t: %+7.2f, max tau: %.2e, (5/tau)^2: %.2e.\n", max_t, max_tau,
(double) (5 * 5) / (double) (max_tau * max_tau));
if (max_t > t_threshold_bananas) {
return false;
} else if (max_t > t_threshold_moderate) {
return false;
} else { /* max_t < t_threshold_moderate */
return true;
}
}
```
如果測資不夠多,會直接回傳 `FALSE` ;如果測資夠多,接下來看有沒有 t 值超過 `t_threshold_moderate` (預設為 10) ,如果都沒超過就回傳 `TRUE` ,代表 `do_insert_tail()` 非常有可能為 $O(1)$