# 2025q1 Homework1 (lab0)
contributed by < `SimonLiu423` >

{%hackmd NrmQUGbRQWemgwPfhzXj6g %}

## Development Environment

```shell
$ gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          40 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                GenuineIntel
  Model name:             Common KVM processor
    CPU family:           15
    Model:                6
    Thread(s) per core:   1
    Core(s) per socket:   4
    Socket(s):            1
    Stepping:             1
    BogoMIPS:             5999.99
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology cpuid tsc_known_freq pni cx16 x2apic hypervisor lahf_lm cpuid_fault pti
Virtualization features:
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    128 KiB (4 instances)
  L1i:                    128 KiB (4 instances)
  L2:                     16 MiB (4 instances)
  L3:                     16 MiB (1 instance)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-3
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          KVM: Mitigation: VMX unsupported
  L1tf:                   Mitigation; PTE Inversion
  Mds:                    Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Unknown: No mitigations
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Vulnerable
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline
  Srbds:                  Not affected
  Tsx async abort:        Not affected
```

## Implementation of queue operations

### q_insert_head()

Initial code:

```c
bool q_insert_head(struct list_head *head, char *s)
{
    if (!head)
        return false;

    /* Create element node with value copied from s */
    element_t *node = malloc(sizeof(element_t));
    if (!node)
        return false;

    char *val = malloc((strlen(s) + 1) * sizeof(char));
    if (!val) {
        free(node);
        return
            false;
    }
    strcpy(val, s);
    node->value = val;

    /* Add element to the head of queue */
    list_add(&node->list, head);
    return true;
}
```

In the code above, the "Create element" segment will also be used in `q_insert_tail()`. To follow the [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) principle, I extracted that part into a separate function.

```c
static inline element_t *create_element(char *s)
{
    element_t *node = malloc(sizeof(element_t));
    if (!node)
        return NULL;

    char *val = malloc((strlen(s) + 1) * sizeof(char));
    if (!val) {
        free(node);
        return NULL;
    }
    strcpy(val, s);
    node->value = val;
    return node;
}
```

This function is declared with `static` and `inline`. `static` gives the function internal linkage, so it can only be used within this file and cannot be accessed from outside, providing **encapsulation**; `inline` suggests that the compiler replace each call site with the body of the function at compile time, avoiding function-call overhead and improving efficiency.

/* To be edited — I have not fully understood this yet */
First, `static`: according to the C99 specification, **6.2.2 Linkages of identifiers**, ...
`inline`: **6.7.4 Function specifiers** ...
/******************/

After finishing this, running `git commit -a` gets blocked by the `pre-commit` hook. The reason is that `strcpy()` is unsafe: it does not check the sizes of the two string buffers before copying, so it may access memory outside the destination buffer. The fix is to use `strncpy` instead; the modified code is shown below. Note that the bound must be derived from the source `s`: `val` is freshly allocated and uninitialized, so computing `strlen(val)` would be undefined behavior. Since `val` holds exactly `strlen(s) + 1` bytes, copying that many bytes also copies the terminating NUL.

```diff
  char *val = malloc((strlen(s) + 1) * sizeof(char));
  if (!val) {
      free(node);
      return NULL;
  }
- strcpy(val, s);
+ strncpy(val, s, strlen(s) + 1);
  node->value = val;
  return node;
```

### q_delete_mid()

[jserv's material](https://hackmd.io/@sysprog/c-linked-list#Merge-Sort-%E7%9A%84%E5%AF%A6%E4%BD%9C) introduces the fast/slow-pointer technique for finding the middle of a list; part of the code is shown below:

```c
node_t *slow = head;
for (node_t *fast = head->next; fast && fast->next; fast = fast->next->next)
    slow = slow->next;
node_t *mid = slow->next;
slow->next = NULL;
```

When the queue size is even, the `mid` this code produces is one node too far to the right; for example, with `size = 4`, `mid` points to the third element of the queue. I also found this code hard to read, so I reimplemented it with a do-while loop.

Code after the two changes above: [Commit ddf0ada](https://github.com/SimonLiu423/lab0-c/commit/ddf0ada5881af469be1cfa5711e3a505b0ec180f)

### q_swap()

[Commit 18c4e7b](https://github.com/SimonLiu423/lab0-c/commit/18c4e7bec777223693e64a8e168a2a6e583a9e1a)

The approach is to traverse the first node of every pair in the list, remove it with `list_del` (which leaves the node's own `prev` and `next` pointers unchanged), and then use `list_add` to insert the node after its next node.

```c
struct list_head *node = head->next;
while (node != head && node->next != head) {
    list_del(node);
    list_add(node, node->next);
    node = node->next;
}
```

In `list_add(node, node->next)`, `node->next` is not the list head, which does not match the parameter described in the function's documentation:

```c
/**
 * list_add - Insert a node at the beginning of a circular list
 * @node: Pointer to the list_head structure to add.
 * @head: Pointer to the list_head structure representing the list head.
 *
 * Adds the specified @node immediately after @head in a circular doubly-linked
 * list, effectively placing it at the beginning. The existing first node, if
 * any, shifts to follow @node, and the list's circular structure is maintained.
 */
```

Looking at the implementation, however, the function can just as well be viewed as inserting `node` after `head`, not necessarily at the front of a list. The description of `list_add` in [torvalds/linux/include/linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h#L183) also says that `new` is inserted after `head`:

```c
/**
 * list_add - add a new entry
 * @new: new entry to be added
 * @head: list head to add it after
 *
 * Insert a new entry after the specified head.
 * This is good for implementing stacks.
 */
```

I opened pull request [#225](https://github.com/sysprog21/lab0-c/pull/225) to update the description of `list_add`.

### q_reverseK()

The strategy here is to traverse the list and, for every `k` nodes, form a local circular list by attaching the first and last node to a temporary head. Then, use `q_reverse()` with the temporary head to reverse the `k` elements. Finally, reattach the reversed nodes to the original list and continue the process for the remaining nodes.

Note that when `k = 2`, this operation is equivalent to `q_swap()`.

```c
if (k == 2) {
    q_swap(head);
    return;
}
```

### q_ascend() & q_descend()

The goal of these two functions is to remove every node that has a node with a strictly smaller (for `q_ascend()`) or larger (for `q_descend()`) value anywhere to its right.
In other words, `q_ascend()` produces a monotonically non-increasing sequence and `q_descend()` a monotonically non-decreasing one, both viewed from the last element working backward. The only difference between the two functions is whether the sequence should be non-increasing or non-decreasing, so both can share a single function, `monotonic_from_right()`, which takes an enumeration value specifying the desired order.

```c
typedef enum _order { NON_DECREASING = 1, NON_INCREASING = -1 } Order;
```

Implementation of `q_ascend()`:

```c
int q_ascend(struct list_head *head)
{
    return monotonic_from_right(head, NON_INCREASING);
}
```

`q_descend()`:

```c
int q_descend(struct list_head *head)
{
    return monotonic_from_right(head, NON_DECREASING);
}
```

For the implementation of `monotonic_from_right()`, see [Commit ba136d7](https://github.com/SimonLiu423/lab0-c/commit/ba136d7b7b76fa906fbb2fbd97d730b0937da7c7).

### q_sort()

Originally, inspired by [jserv's implementation](https://hackmd.io/Z-vcRTCjQdKqu4UAyswd5g?view#q_sort), I created an array of pointers to the nodes in the list and used bottom-up merge sort. However, since the array size is fixed, this approach has limitations: if the queue is short, it wastes memory; if the queue is too long, the program cannot handle it. To address this, I considered implementing something like a C++ `vector` with an **exponential growth policy** (doubling the capacity upon reallocation). Unfortunately, this assignment forbids the use of `malloc` here, so I came up with an alternative: maintain a stack of sublists and merge two sublists whenever their lengths become equal. This reduces the auxiliary space from `O(n)` to `O(log n)`, which is much more memory-efficient. For more details, see [Commit 621bf36](https://github.com/SimonLiu423/lab0-c/commit/621bf360a2f18977f7b6ab2ffd20970099646629).

### q_merge()

The approach is simple: move all nodes to the first queue, then perform `q_sort()`.
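The "move everything into the first queue" step can be sketched as below. This is an illustrative, self-contained sketch, not the actual implementation: it re-declares just enough of lab0-c's `list.h` (`list_splice_init` mirrors the real helper) and uses a hypothetical name `merge_into_first`; the real `q_merge()` also takes a `descend` flag and calls `q_sort()` on the combined list afterwards.

```c
#include <stddef.h>

/* Minimal stand-ins for the pieces of lab0-c's list.h this sketch needs. */
struct list_head {
    struct list_head *prev, *next;
};

typedef struct {
    struct list_head *q;    /* head of this queue's element list */
    struct list_head chain; /* links the queue contexts together */
    int size;
} queue_contex_t;

/* Move every node of @list to the front of @head, leaving @list empty. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
    if (list->next == list) /* nothing to splice */
        return;
    struct list_head *first = list->next, *last = list->prev;
    first->prev = head;
    last->next = head->next;
    head->next->prev = last;
    head->next = first;
    list->next = list->prev = list;
}

#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

/* Splice every queue in the chain into the first one and return the total
 * size; the real q_merge() would then sort the result with q_sort(). */
int merge_into_first(struct list_head *chain_head)
{
    if (!chain_head || chain_head->next == chain_head)
        return 0;
    queue_contex_t *first =
        container_of(chain_head->next, queue_contex_t, chain);
    for (struct list_head *node = chain_head->next->next; node != chain_head;
         node = node->next) {
        queue_contex_t *ctx = container_of(node, queue_contex_t, chain);
        list_splice_init(ctx->q, first->q);
        first->size += ctx->size;
        ctx->size = 0;
    }
    return first->size;
}
```

Because `list_splice_init` re-initializes each emptied queue, the other queues in the chain remain valid (empty) queues after the merge.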
If there's only one queue in the chain, we can return immediately, as each input queue is already sorted by the function's precondition. For more details, see [Commit d1ef5be](https://github.com/SimonLiu423/lab0-c/commit/d1ef5becbf314ba771115a2901c5353e6121ece5).

## Dude, is my code constant time?

### About the paper

#### Problem

Timing attacks are a type of side-channel attack used to extract secrets, such as cryptographic keys or passwords, by analyzing execution time. They are relatively easy to perform compared to other side-channel attacks. Existing detection methods have a major drawback: they require modeling the hardware, which is extremely difficult.

#### Solution

The proposed method detects timing leakage by analyzing execution time directly. It measures execution time for **two different input data classes** and then checks whether the two **distributions** are **statistically** different.

##### Step 1: Measure execution time

- Collect two sets of execution-time measurements, one for each input data class.
- Use *fix-vs-random* tests, where the fixed class may be chosen to trigger edge cases.
- Use cycle counters to measure execution time.
- To minimize the effect of environmental changes on the results, randomly assign each measurement to a class.
- Interleave the class sequences during evaluation to prevent interference from concurrent processes.

##### Step 2: Apply post-processing

![image](https://hackmd.io/_uploads/Sylw_jtsyx.png)

- Remove outliers by cropping measurements that exceed a fixed threshold.
  - Measurements may be positively skewed due to artifacts such as OS interruptions or other extraneous activities.
- Apply higher-order preprocessing techniques.

##### Step 3: Apply a statistical test

- Try to disprove the null hypothesis *"the two timing distributions are equal"*.
- Use statistical methods such as:
  - t-test
  - non-parametric tests

### Issues

Originally, the constant-time measurement passed on my local machine but failed on GitHub Actions.
After analyzing `dudect/fixture` and `dudect/constant` and comparing them with [oreparaz/dudect](https://github.com/oreparaz/dudect), I found some poorly implemented code that could lead to inaccurate results.

#### 1. Measurement dropping

As mentioned earlier, the sequence of classes should be interleaved during evaluation. However, the original code did not do this correctly. In `dudect/constant.c`:

```c
for (size_t i = DROP_SIZE; i < N_MEASURES - DROP_SIZE; i++) {
    ...
    before_ticks[i] = cpucycles();
    /* Target function to measure */
    after_ticks[i] = cpucycles();
    ...
}
```

the loop **did not properly discard** the first and last `DROP_SIZE` measurements. This resulted in biased data being included in the final analysis.

To fix this, I made the following changes:

```diff
+ int measure_idx = 0;
+ int64_t a_ticks, b_ticks;
  for (size_t i = 0; i < N_MEASURES; i++) {
      ...
-     before_ticks[i] = cpucycles();
+     b_ticks = cpucycles();
      /* Target function to measure */
-     after_ticks[i] = cpucycles();
+     a_ticks = cpucycles();
      ...
+     if (i < DROP_SIZE || i >= N_MEASURES - DROP_SIZE)
+         continue;
+     before_ticks[measure_idx] = b_ticks;
+     after_ticks[measure_idx] = a_ticks;
+     measure_idx++;
  }
```

This ensures the measurements are still performed even for the iterations whose results are about to be discarded.

#### 2. Cropping

In **oreparaz's version**, multiple tests are performed:

- one first-order uncropped test,
- `DUDECT_NUMBER_PERCENTILES` tests on cropped measurements,
- one second-order test.

The highest *t-value* among these tests determines whether the target function is constant-time. In this assignment, only the **uncropped test** was performed, and the cropping mechanism was missing. So I copied and adapted the implementation from [oreparaz/dudect](https://github.com/oreparaz/dudect).
Now I compute the **cutoff value** using `PERCENTILE_CUTOFF = 0.9`, ensuring that only values **below the cutoff** are included in the t-test:

```c
static void prepare_cutoff_value(int64_t *exec_times, double *cutoff_value)
{
    qsort(exec_times, VALID_MEASURES, sizeof(int64_t),
          (int (*)(const void *, const void *)) cmp);
    *cutoff_value = percentile(exec_times, PERCENTILE_CUTOFF, VALID_MEASURES);
}
```

```diff
- /* do a t-test on the execution time */
- t_push(t, difference, classes[i]);
+ if (difference < cutoff_value)
+     /* do a t-test on the execution time */
+     t_push(t, difference, classes[i]);
```

#### 3. Potential bug in oreparaz's dudect_main()

While modifying the code, I noticed a potential bug in oreparaz's `dudect_main()`:

```c
if (first_time) {
    // throw away the first batch of measurements.
    // this helps warming things up.
    prepare_percentiles(ctx);
} else {
    update_statistics(ctx);
    ret = report(ctx);
}
```

This code intends to discard the first batch of measurements to warm up the system. However, **it still computes the percentiles from that first batch**. I believe this is what caused this [run](https://github.com/SimonLiu423/lab0-c/actions/runs/13758105737) on GitHub Actions to get stuck while testing `remove_head()`. Additionally, I think the percentiles should be **recalculated for each batch** rather than depend only on the first one.

Final fix:

```diff
- if (*cutoff_value == -1) {
-     prepare_cutoff_value(exec_times, cutoff_value);
+ if (first_time) {
+     /* Discard first measurement to warm things up */
+     first_time = false;
  } else {
-     update_statistics(exec_times, classes, *cutoff_value);
+     prepare_cutoff_value(exec_times, &cutoff_value);
+     update_statistics(exec_times, classes, cutoff_value);
+     ret &= report();
  }
```

## Shuffle

The queue shuffling function is implemented with the [Fisher-Yates shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle).
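Fisher-Yates walks from the last index toward the front and swaps each position with a uniformly chosen position at or before it. A minimal sketch on a plain array is shown below (`fisher_yates` is a hypothetical name; the queue version must additionally walk the list to reach the chosen nodes):

```c
#include <stdlib.h>

/* Fisher-Yates: for i from n-1 down to 1, swap a[i] with a[j] where j is
 * drawn uniformly from [0, i]. If the random source is uniform, every
 * permutation of the array is equally likely. */
static void fisher_yates(int *a, size_t n)
{
    if (n < 2) /* also avoids n - 1 underflow for n == 0 */
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t) rand() % (i + 1);
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Note that `rand() % (i + 1)` carries a slight modulo bias whenever `RAND_MAX + 1` is not divisible by `i + 1`; for small `i` the effect is tiny, but it is one reason a uniformity test of the output is worthwhile.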
See [Commit 4a047ee](https://github.com/SimonLiu423/lab0-c/commit/4a047ee10a4733454262e31b299e02ba5753b8ab) for details.

### Experiments

To verify that the shuffle implementation produces **uniform random permutations**, I used two methods:

- plotting the frequency distribution
- a [Chi-squared test](https://en.wikipedia.org/wiki/Chi-squared_test)

*(**6** elements shuffled **500400** times)*

#### 1. Plotting

- **X-axis**: permutation index
- **Y-axis**: frequency of each permutation after shuffling

Since we are shuffling ***6 elements***, there are **6! = 720** unique permutations in total. The permutation index is computed with the following code ([reference](https://www.geekviewpoint.com/python/numbers/permutation_index)):

```python
import math

def get_permutation_index(n, perm):
    index = 0
    for i in range(n):
        smaller_count = sum(1 for j in range(i + 1, n) if perm[j] < perm[i])
        index += smaller_count * math.factorial(n - 1 - i)
    return index
```

![image](https://hackmd.io/_uploads/SJg71JRiJg.png)

Clearly, the result was **not uniform**. After debugging, I found the issue was calling `srand(time(NULL))` on **every** shuffle call: reseeding with the current second makes shuffles within the same second replay the same random sequence. After removing it, the distribution becomes much more reasonable.

![image](https://hackmd.io/_uploads/B1m2J1AjJg.png)

#### 2. Chi-squared test

To further validate uniformity, I conducted a Chi-squared test with the null hypothesis: *"The shuffle algorithm produces a uniform distribution."*

**Formula:**

$$
\chi^2 = \sum_{i=1}^{n} {(H_i - E)^2 \over E}
$$

Where:

- $H_i$ = observed count of permutation $i$
- $E$ = expected count $= {500400 \over 6!} = 695$
- $n$ = number of permutations $= 6! = 720$

The computed $\chi^2$ value was around $684.4$. With $720$ permutation bins, the degrees of freedom are $df = 720 - 1 = 719$ (not the number of elements minus one). For $df = 719$, a $\chi^2$ of $684.4$ corresponds to a p-value of roughly $0.8$, far above $\alpha = 0.05$, so we **fail to reject the null hypothesis**: the observed frequencies are consistent with a uniform shuffle.

:::info
Even so, two factors could still bias the randomness of the shuffle:

1. 
**Virtual Machine Environment**: the test was run on a VM, which may introduce environmental factors that affect randomness.
2. **`rand()` Function**: the `rand()` function used to generate random numbers may not produce perfectly uniform values.

Further experiments would be needed to quantify the influence of these factors.
:::

#### Reference

[Amherst College - Introduction to Computer Science II Lab-06](https://sfkaplan.people.amherst.edu/courses/2017/spring/COSC-112/assignments/lab-6.pdf)

## Address Sanitizer & Valgrind

Running `make SANITIZER=1 test` and `make valgrind` reports no errors.

## Profiling `simulation` Heap Usage With Massif

> Reference: [YM Chen](https://hackmd.io/@cymizer/2021q1_lab0), [salmoniscute](https://hackmd.io/@salmonii/linux2025-homework1)

[Massif](https://valgrind.org/docs/manual/ms-manual.html) is a heap profiler from Valgrind; it measures how much heap memory a program uses. I conducted some experiments with Massif to see whether `simulation` has any memory issues. The graphs below are plotted with `massif-visualizer`, which can be installed with:

```shell
$ sudo apt install massif-visualizer
```

### All

First, I measured `./qtest -f traces/trace-17-complexity.cmd` by running:

```shell
$ valgrind --tool=massif ./qtest -f traces/trace-17-complexity.cmd
```

Result:

![image](https://hackmd.io/_uploads/S1huLmT6ke.png)

However, this isn't very informative, since four different simulation tasks (it, ih, rh, rt) are measured together, so I broke things down and measured each operation individually. Additionally, the default `--max-snapshots` is `100`; to get better insight, the following experiments set it to `1000`.

### Insert tail

```shell
$ valgrind --tool=massif --max-snapshots=1000 ./qtest -f sim-it.cmd
```

where `sim-it.cmd` is:

```
option simulation 1
it
option simulation 0
```

Result:

![image](https://hackmd.io/_uploads/r1wAkLTpJl.png)

It shows that memory usage is inconsistent.
After looking into `measure` in `dudect/constant.c`:

```c
...
dut_insert_head(
    get_random_string(),
    *(uint16_t *) (input_data + i * CHUNK_SIZE) % 10000);
...
```

it appears that, before each measurement, a random string is inserted into the list between 0 and 9999 times, causing a different amount of memory to be allocated across measurements.

Now fix the number to `100` and measure again:

```diff
  ...
  dut_insert_head(
      get_random_string(),
-     *(uint16_t *) (input_data + i * CHUNK_SIZE) % 10000);
+     100);
  ...
```

Result:

![image](https://hackmd.io/_uploads/SyfCuOpTJx.png)

The result is still not entirely consistent, but it is better than the previous one.

:::warning
The graphs show that `test_malloc` is responsible for the inconsistent memory usage. However, when the number of random strings inserted into the list before the measurements is set to a smaller value such as `50`, the memory usage becomes consistent. Further experiments are needed to confirm the root cause.

Result when the number is set to `50`:

![image](https://hackmd.io/_uploads/BJ6xjuapyg.png)
:::

### Insert head

```shell
$ valgrind --tool=massif --max-snapshots=1000 ./qtest -f sim-ih.cmd
```

where `sim-ih.cmd` is:

```
option simulation 1
ih
option simulation 0
```

Result:

![image](https://hackmd.io/_uploads/B1JPn_paJl.png)

Result after fixing the insert count to `100`:

![image](https://hackmd.io/_uploads/SyWph_6aJl.png)

:::warning
Similar to insert tail, when the insert count is set to a small value, the memory usage becomes consistent.
Result for insert count `50`:

![image](https://hackmd.io/_uploads/Hyi-au661e.png)
:::

### Remove head

```shell
$ valgrind --tool=massif --max-snapshots=1000 ./qtest -f sim-rh.cmd
```

where `sim-rh.cmd` is:

```
option simulation 1
rh
option simulation 0
```

Result:

![image](https://hackmd.io/_uploads/H1yK6O6ayl.png)

Result after fixing the insert count to `100`:

![image](https://hackmd.io/_uploads/Sye26u6pkl.png)

:::warning
Similar to insert tail, when the insert count is set to a small value, the memory usage becomes consistent.

Result for insert count `50`:

![image](https://hackmd.io/_uploads/BJIRpOpp1e.png)
:::

### Remove tail

```shell
$ valgrind --tool=massif --max-snapshots=1000 ./qtest -f sim-rt.cmd
```

where `sim-rt.cmd` is:

```
option simulation 1
rt
option simulation 0
```

Result:

![image](https://hackmd.io/_uploads/HkZuo4Cpye.png)

Result after fixing the insert count to `100`:

![image](https://hackmd.io/_uploads/SJ24aECTkg.png)

:::warning
Similar to insert tail, when the insert count is set to a small value, the memory usage becomes consistent.

Result for insert count `50`:

![image](https://hackmd.io/_uploads/BJlf2EC6kl.png)
:::
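For reproducibility, the four per-operation measurements above can be scripted. This is a sketch under my own assumptions (`qtest` built in the current directory, output files named `massif.out.<op>`), not part of the original workflow:

```shell
# Regenerate the per-operation command files used in the experiments above.
for op in it ih rh rt; do
    printf 'option simulation 1\n%s\noption simulation 0\n' "$op" > "sim-$op.cmd"
done

# Profile each operation with Massif when the tools are available.
if command -v valgrind >/dev/null 2>&1 && [ -x ./qtest ]; then
    for op in it ih rh rt; do
        valgrind --tool=massif --max-snapshots=1000 \
            --massif-out-file="massif.out.$op" ./qtest -f "sim-$op.cmd"
    done
fi
```

Each `massif.out.<op>` file can then be opened in `massif-visualizer` to reproduce the graphs above.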