contributed by <SimonLiu423>
```shell
$ gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
$ lscpu
Architecture:             x86_64
CPU op-mode(s):           32-bit, 64-bit
Address sizes:            40 bits physical, 48 bits virtual
Byte Order:               Little Endian
CPU(s):                   4
On-line CPU(s) list:      0-3
Vendor ID:                GenuineIntel
Model name:               Common KVM processor
CPU family:               15
Model:                    6
Thread(s) per core:       1
Core(s) per socket:       4
Socket(s):                1
Stepping:                 1
BogoMIPS:                 5999.99
Flags:                    fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology cpuid tsc_known_freq pni cx16 x2apic hypervisor lahf_lm cpuid_fault pti
Virtualization features:
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    128 KiB (4 instances)
  L1i:                    128 KiB (4 instances)
  L2:                     16 MiB (4 instances)
  L3:                     16 MiB (1 instance)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-3
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          KVM: Mitigation: VMX unsupported
  L1tf:                   Mitigation; PTE Inversion
  Mds:                    Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Unknown: No mitigations
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Vulnerable
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline
  Srbds:                  Not affected
  Tsx async abort:        Not affected
```
Initial code:

```c
bool q_insert_head(struct list_head *head, char *s)
{
    if (!head)
        return false;
    /* Create element node with value copied from s */
    element_t *node = malloc(sizeof(element_t));
    if (!node)
        return false;
    char *val = malloc((strlen(s) + 1) * sizeof(char));
    if (!val) {
        free(node);
        return false;
    }
    strcpy(val, s);
    node->value = val;
    /* Add element to the head of queue */
    list_add(&node->list, head);
    return true;
}
```
In the code above, the "Create element" segment will also be used in `q_insert_tail()`. To follow the DRY principle, I extracted that part into a separate function.
```c
static inline element_t *create_element(char *s)
{
    element_t *node = malloc(sizeof(element_t));
    if (!node)
        return NULL;
    char *val = malloc((strlen(s) + 1) * sizeof(char));
    if (!val) {
        free(node);
        return NULL;
    }
    strcpy(val, s);
    node->value = val;
    return node;
}
```
This function is declared with `static` and `inline`. `static` gives the function internal linkage, so it can only be used within this file and cannot be accessed from outside, providing encapsulation; `inline` suggests that the compiler replace each call site with the function body at compile time, avoiding function-call overhead and improving efficiency.
/* To be edited; I have not fully worked through this yet */
First, `static`: according to C99 §6.2.2 Linkages of identifiers, …
`inline`: §6.7.4 Function specifiers …
/******************/
After finishing, running `git commit -a` got blocked by the `pre-commit` hook. The reason is that `strcpy()` is unsafe: it does not compare the lengths of the two string buffers before copying, so it may access memory outside the intended range. The fix is to use `strncpy` instead; the modified code is shown below.
```diff
 char *val = malloc((strlen(s) + 1) * sizeof(char));
 if (!val) {
     free(node);
     return NULL;
 }
-strcpy(val, s);
+strncpy(val, s, strlen(s) + 1);
 node->value = val;
 return node;
```
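Since the destination buffer is allocated with exactly `strlen(s) + 1` bytes, another option worth noting (a sketch of an alternative, not the committed fix; `dup_string` is a name invented here) is to copy that exact byte count with `memcpy`, which also carries the terminator along:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of an alternative to strncpy(): copy exactly the number of
 * bytes we allocated, including the '\0' terminator. */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1; /* bytes to allocate and copy */
    char *val = malloc(len);
    if (!val)
        return NULL;
    memcpy(val, s, len);        /* len already includes the terminator */
    return val;
}
```

Unlike `strncpy`, `memcpy` never pads the destination, and the length is computed once from the source string.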
jserv's course material mentions using fast/slow pointers to find the middle of a list; part of that code is shown below:

```c
node_t *slow = head;
for (node_t *fast = head->next; fast && fast->next; fast = fast->next->next)
    slow = slow->next;
node_t *mid = slow->next;
slow->next = NULL;
```
When the queue size is even, the resulting `mid` lands one node further to the right than expected; for example, with `size = 4`, `mid` points to the queue's third element. I also found this code hard to read, so I reimplemented it with a do-while loop.

The code after fixing these two points: Commit ddf0ada
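As an illustration of the do-while idea (a sketch, not the code from the commit; `node_t`, `find_mid`, and `mid_value` are names invented here), one possible formulation stops `slow` at the ⌈n/2⌉-th node, so for `size = 4` it stops at the second node instead of walking past the middle:

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int val;
    struct node *next;
} node_t;

/* Illustrative do-while variant: after the loop, slow is the
 * ceil(n/2)-th node of a non-empty singly linked list. */
static node_t *find_mid(node_t *head)
{
    node_t *slow = head, *fast = head;
    do {
        fast = fast->next ? fast->next->next : NULL;
        if (fast)
            slow = slow->next;
    } while (fast);
    return slow;
}

/* Test helper: build a list of n nodes valued 1..n and report the value
 * of the node find_mid() returns. */
static int mid_value(int n)
{
    node_t nodes[16];
    for (int i = 0; i < n; i++) {
        nodes[i].val = i + 1;
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
    return find_mid(&nodes[0])->val;
}
```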
Commit 18c4e7b
The implementation traverses the first node of each pair in the list and removes it from the list; the removed node's own `prev` and `next` pointers remain unchanged, so `list_add` can then insert the node right after the next node.
```c
struct list_head *node = head->next;
while (node != head && node->next != head) {
    list_del(node);
    list_add(node, node->next);
    node = node->next;
}
```
In `list_add(node, node->next)`, the second argument `node->next` is not the list head, which does not match the parameter the function's description expects:
```c
/**
 * list_add - Insert a node at the beginning of a circular list
 * @node: Pointer to the list_head structure to add.
 * @head: Pointer to the list_head structure representing the list head.
 *
 * Adds the specified @node immediately after @head in a circular doubly-linked
 * list, effectively placing it at the beginning. The existing first node, if
 * any, shifts to follow @node, and the list's circular structure is maintained.
 */
```
However, a closer look at the function's implementation shows it can just as well be viewed as inserting `node` right after `head`, not necessarily at the front of the list.
The description of `list_add` in torvalds/linux/include/linux/list.h likewise says that `new` is inserted after `head`:
```c
/**
 * list_add - add a new entry
 * @new: new entry to be added
 * @head: list head to add it after
 *
 * Insert a new entry after the specified head.
 * This is good for implementing stacks.
 */
```
For this issue I opened pull request #225 to change the description of `list_add`.

From this point on, the development diary is written in English.
The strategy here is to traverse the list and, for every k nodes, form a local circular list by attaching the first and last node to a temporary head. Then, use `q_reverse()` with the temporary head to reverse the k elements. Finally, reattach the reversed nodes to the original list and continue the process for the remaining nodes.

Note that when `k = 2`, this operation is equivalent to `q_swap()`.
```c
if (k == 2) {
    q_swap(head);
    return;
}
```
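A self-contained way to illustrate the same k-group reversal (my sketch, not the committed code, using hand-rolled helpers instead of list.h) is to move each successor of the group's first node to the group front; after k - 1 moves the group is reversed and the old first node has become the group's tail:

```c
#include <assert.h>

struct list_head {
    struct list_head *prev, *next;
};

static void list_del_node(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

static void list_add_after(struct list_head *n, struct list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

/* Reverse every group of k nodes; fewer than k trailing nodes are left
 * untouched. anchor is the node just before the current group. */
static void q_reverseK(struct list_head *head, int k)
{
    struct list_head *anchor = head;
    for (;;) {
        struct list_head *probe = anchor;
        for (int i = 0; i < k; i++) { /* are k nodes left? */
            probe = probe->next;
            if (probe == head)
                return;
        }
        struct list_head *first = anchor->next;
        for (int i = 1; i < k; i++) { /* move successors to the front */
            struct list_head *node = first->next;
            list_del_node(node);
            list_add_after(node, anchor);
        }
        anchor = first; /* first is now the tail of the reversed group */
    }
}

/* Test helper: build a 4-node list, reverse in pairs, and check that the
 * resulting order (by node index) is 1,0,3,2. */
static int reverseK_pairs_ok(void)
{
    struct list_head head, n[4];
    head.prev = head.next = &head;
    for (int i = 0; i < 4; i++)
        list_add_after(&n[i], head.prev); /* append at the tail */
    q_reverseK(&head, 2);
    return head.next == &n[1] && n[1].next == &n[0] &&
           n[0].next == &n[3] && n[3].next == &n[2] && n[2].next == &head;
}
```

With `k = 2` this degenerates into exactly the pairwise swap shown above.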
The goal of these two functions is to remove every node that has a node with a strictly smaller (for `q_ascend()`) or strictly larger (for `q_descend()`) value anywhere to its right. In other words, viewed from the last element working backward, `q_ascend()` leaves a monotonically non-increasing sequence, while `q_descend()` leaves a monotonically non-decreasing one.

The only difference between the two functions is whether the sequence should be non-increasing or non-decreasing. To handle this, both can share the same function `monotonic_from_right()` and pass an enumeration value specifying the desired order:

```c
typedef enum _order { NON_DECREASING = 1, NON_INCREASING = -1 } Order;
```
Implementation for `q_ascend()`:

```c
int q_ascend(struct list_head *head)
{
    return monotonic_from_right(head, NON_INCREASING);
}
```

`q_descend()`:

```c
int q_descend(struct list_head *head)
{
    return monotonic_from_right(head, NON_DECREASING);
}
```
For the implementation of `monotonic_from_right()`, see Commit ba136d7.
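The core of `monotonic_from_right()` can be illustrated over a plain `int` array rather than a list (a sketch under the same `Order` convention; the array form and names here are mine, not the commit's): scan from the right, track the running extreme, and keep only values that do not violate the order.

```c
#include <assert.h>

typedef enum _order { NON_DECREASING = 1, NON_INCREASING = -1 } Order;

/* Sketch: keep a[i] only if no element to its right is strictly smaller
 * (NON_INCREASING, the q_ascend case) or strictly larger (NON_DECREASING,
 * the q_descend case). Survivors are compacted to the front, preserving
 * relative order; the new length is returned. Assumes n <= 64. */
static int monotonic_from_right(int *a, int n, Order order)
{
    char keep[64] = {0};
    int extreme = 0, kept = 0;
    for (int i = n - 1; i >= 0; i--) {
        /* The rightmost element always survives; otherwise compare
         * against the running extreme using the requested order. */
        if (i == n - 1 || order * (a[i] - extreme) >= 0) {
            keep[i] = 1;
            extreme = a[i];
            kept++;
        }
    }
    int j = 0;
    for (int i = 0; i < n; i++)
        if (keep[i])
            a[j++] = a[i];
    return kept;
}
```

Equal values survive, matching the "strictly" smaller/larger removal rule.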
Originally, inspired by jserv's implementation, I created an array of pointers to the nodes in the list and used bottom-up merge sort. However, since the array size is fixed, this approach has limitations. If the length of the queue is too small, it wastes memory; if it's too large, the program can't handle it.
To address this, I considered implementing something like a C++ `vector` with an exponential growth policy (doubling the capacity upon reallocation). Unfortunately, this assignment forbids that use of `malloc`, so I came up with an alternative: maintain a stack of sorted sublists and merge two lists whenever their lengths are equal. This reduces the auxiliary space from O(n) to O(log n), which is much more efficient.
For more details, see Commit 621bf36.
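The stack idea can be sketched over simple singly linked `int` nodes (an illustration, not the author's `element_t` version): slot `i` of the stack holds a sorted run of exactly 2^i nodes, and inserting a node cascades merges the way binary addition propagates carries, so only O(log n) run pointers are ever live.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int val;
    struct node *next;
} node_t;

/* Merge two sorted lists into one sorted list. */
static node_t *merge(node_t *a, node_t *b)
{
    node_t dummy = {0, NULL}, *tail = &dummy;
    while (a && b) {
        if (a->val <= b->val) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return dummy.next;
}

/* Bottom-up merge sort with a stack of runs: slot i holds a run of 2^i
 * nodes, so 64 slots suffice for any practical list length. */
static node_t *sort_list(node_t *head)
{
    node_t *stack[64] = {NULL};
    while (head) {
        node_t *run = head;
        head = head->next;
        run->next = NULL;
        int i = 0;
        while (stack[i]) { /* carry-propagate equal-length runs */
            run = merge(stack[i], run);
            stack[i] = NULL;
            i++;
        }
        stack[i] = run;
    }
    node_t *result = NULL; /* fold the remaining runs, smallest first */
    for (int i = 0; i < 64; i++)
        if (stack[i])
            result = result ? merge(stack[i], result) : stack[i];
    return result;
}

/* Test helper: sort 3,1,2,5,4 and verify the result is 1..5. */
static int stack_sort_ok(void)
{
    node_t nodes[5];
    int vals[5] = {3, 1, 2, 5, 4};
    for (int i = 0; i < 5; i++) {
        nodes[i].val = vals[i];
        nodes[i].next = (i + 1 < 5) ? &nodes[i + 1] : NULL;
    }
    node_t *s = sort_list(&nodes[0]);
    for (int expect = 1; expect <= 5; expect++) {
        if (!s || s->val != expect)
            return 0;
        s = s->next;
    }
    return s == NULL;
}
```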
The approach is simple: move all nodes to the first queue, then perform a `q_sort()`.
If there's only one queue in the chain, we can return immediately, as the queue is already sorted by definition.
For more details, see Commit d1ef5be.
Timing attacks are a type of side-channel attack used to extract secrets, such as cryptographic keys or passwords, by analyzing execution time. They are relatively easy to perform compared to other side-channel attacks.
Existing detection methods have a major drawback: they require hardware modeling, which is extremely difficult.
This method detects timing leakage by analyzing execution time. It measures execution time for two different input data classes and then checks whether their distributions are statistically different.
Originally, the constant-time measurement passed on my local machine but failed on GitHub Actions. After analyzing `dudect/fixture` and `dudect/constant` and comparing them with oreparaz/dudect, I found some poorly implemented code that could lead to inaccurate results.
As mentioned earlier, the sequence of classes should be interleaved during evaluation. However, the original code did not do this correctly.
In `dudect/constant.c`:

```c
for (size_t i = DROP_SIZE; i < N_MEASURES - DROP_SIZE; i++) {
    ...
    before_ticks[i] = cpucycles();
    /* Target function to measure */
    after_ticks[i] = cpucycles();
    ...
}
```
the loop did not properly discard the first and last DROP_SIZE
measurements. This resulted in biased data being included in the final analysis.
To fix this, I made the following changes:

```diff
+ int measure_idx = 0;
+ int64_t a_ticks, b_ticks;
  for (size_t i = 0; i < N_MEASURES; i++) {
      ...
-     before_ticks[i] = cpucycles();
+     b_ticks = cpucycles();
      /* Target function to measure */
-     after_ticks[i] = cpucycles();
+     a_ticks = cpucycles();
      ...
+     if (i < DROP_SIZE || i >= N_MEASURES - DROP_SIZE)
+         continue;
+     before_ticks[measure_idx] = b_ticks;
+     after_ticks[measure_idx] = a_ticks;
+     measure_idx++;
  }
```
This ensures that the measurements are still taken even for iterations that are about to be discarded.
In oreparaz's version, multiple tests are performed: an uncropped test plus `DUDECT_NUMBER_PERCENTILES` cropped tests, each discarding measurements above a different percentile threshold. The highest t-value among these tests determines whether the target function is constant-time.

In this assignment, only the uncropped test was performed and the cropping mechanism was missing, so I copied and adapted the implementation from oreparaz/dudect.
Now, I calculate the cutoff value using `PERCENTILE_CUTOFF = 0.9`, ensuring only values below the cutoff are included in the t-test:

```c
static void prepare_cutoff_value(int64_t *exec_times, double *cutoff_value)
{
    qsort(exec_times, VALID_MEASURES, sizeof(int64_t),
          (int (*)(const void *, const void *)) cmp);
    *cutoff_value = percentile(exec_times, PERCENTILE_CUTOFF, VALID_MEASURES);
}
```

```diff
- /* do a t-test on the execution time */
- t_push(t, difference, classes[i]);
+ if (difference < cutoff_value)
+     /* do a t-test on the execution time */
+     t_push(t, difference, classes[i]);
```
While modifying the code, I noticed a potential bug in oreparaz's `dudect_main()`:

```c
if (first_time) {
    // throw away the first batch of measurements.
    // this helps warming things up.
    prepare_percentiles(ctx);
} else {
    update_statistics(ctx);
    ret = report(ctx);
}
```
This code intends to discard the first batch of measurements to warm up the system, yet it still computes the percentiles from that first batch. I believe this is what caused this run of GitHub Actions to get stuck while testing `remove_head()`.
Additionally, I think the percentiles should be calculated for each batch rather than depending only on the first batch.
Final fix:

```diff
- if (*cutoff_value == -1) {
-     prepare_cutoff_value(exec_times, cutoff_value);
+ if (first_time) {
+     /* Discard first measurement to warm things up */
+     first_time = false;
  } else {
-     update_statistics(exec_times, classes, *cutoff_value);
+     prepare_cutoff_value(exec_times, &cutoff_value);
+     update_statistics(exec_times, classes, cutoff_value);
+     ret &= report();
  }
```
The queue shuffling function is implemented using the Fisher-Yates shuffle.
See Commit 4a047ee for details.
To verify if the shuffle implementation produces uniform random permutations, I used two methods:
(6 elements shuffled 500400 times)
Since we are shuffling 6 elements, there are a total of 6! = 720 unique permutations.
The permutation index is computed using the following code (reference):

```python
import math

def get_permutation_index(n, perm):
    index = 0
    for i in range(n):
        smaller_count = sum(1 for j in range(i+1, n) if perm[j] < perm[i])
        index += smaller_count * math.factorial(n - 1 - i)
    return index
```
Apparently, the result was not uniform. After debugging, I found the issue was due to calling `srand(time(NULL))` on every call to shuffle. After removing it, the distribution became much more reasonable.
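For reference, here is a minimal Fisher-Yates sketch over an `int` array (the queue version moves nodes instead; this is an illustration, not the committed code), with `srand()` assumed to be seeded once by the caller rather than inside the function:

```c
#include <assert.h>
#include <stdlib.h>

/* Fisher-Yates shuffle: swap each position with a uniformly chosen
 * position at or below it. rand() % (i + 1) carries a slight modulo
 * bias, noted here for the uniformity discussion. */
static void shuffle(int *a, int n)
{
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/* Test helper: shuffle 1..5 once and verify the result is still a
 * permutation of 1..5. */
static int shuffle_is_permutation(void)
{
    int a[5] = {1, 2, 3, 4, 5};
    shuffle(a, 5);
    int seen[6] = {0};
    for (int i = 0; i < 5; i++) {
        if (a[i] < 1 || a[i] > 5 || seen[a[i]])
            return 0;
        seen[a[i]] = 1;
    }
    return 1;
}
```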
To further validate uniformity, I conducted a Chi-squared test
with null hypothesis: "The shuffle algorithm produces a uniform distribution."
Formula:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Where:
- $O_i$ is the observed count of permutation $i$
- $E_i$ is the expected count under uniformity, $E_i = N / k$ with $k = 720$ permutations

Checking the p-value of the computed $\chi^2$ statistic against $k - 1 = 719$ degrees of freedom, it fell below the significance level. Thus, we reject the null hypothesis, meaning the shuffle was not uniform.
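For completeness, the statistic itself is straightforward to compute from the observed permutation counts; a small sketch (the function name and the uniform-expectation setup are illustrative):

```c
#include <assert.h>

/* Pearson's chi-squared statistic for observed counts against a uniform
 * expectation of total/k per bucket. */
static double chi_squared(const long *observed, int k, long total)
{
    double expected = (double) total / k;
    double chi2 = 0.0;
    for (int i = 0; i < k; i++) {
        double d = (double) observed[i] - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}
```

A perfectly uniform histogram yields 0; larger values indicate greater deviation from uniformity.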
There are two possible reasons why the shuffle was not uniform:
1. The `rand()` function: the `rand()` function used for generating random numbers may not produce uniformly distributed values.
2. Modulo bias: mapping `rand()`'s output into a smaller range with the `%` operator makes some values slightly more likely than others.

Further experiments are needed to identify which of these factors is the primary cause.
Amherst College - Introduction to Computer Science II Lab-06