# 2025q1 Homework1 (lab0)
contributed by < `vicLin8712` >
{%hackmd NrmQUGbRQWemgwPfhzXj6g %}
## Testbed Information
```
$ gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz
CPU family: 6
Model: 94
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 3
CPU(s) scaling MHz: 94%
CPU max MHz: 3300.0000
CPU min MHz: 800.0000
BogoMIPS: 5399.81
```
## Progression Tracing
以下複製自「[作業要求](https://hackmd.io/@sysprog/linux2025-lab0/%2F%40sysprog%2Flinux2025-lab0-f)」
- [x] 在 GitHub 上 fork [lab0-c](https://github.com/sysprog21/lab0-c)
- [x] 依據上述指示著手修改 `queue.[ch]` 和連帶的檔案,測試後用 Git 管理各項修改,要能滿足 `$ make test` 自動評分系統的所有項目。
- [ ] Linux 核心原始程式碼的 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 並嘗試引入到 `lab0-c` 專案,比較你自己實作的 merge sort 和 Linux 核心程式碼之間效能落差
- [x] 在`qtest` 提供新的命令 `shuffle`,允許藉由 [Fisher–Yates shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle) 演算法,對佇列中所有節點進行洗牌 (shuffle) 操作,需要以統計的原理來分析你的實作,探究洗牌的「亂度」
- [x] 在`qtest` 中執行 `option entropy 1` 並搭配 `ih RAND 10` 一類的命令,觀察亂數字串的分布,並運用「資訊與熵互補,資訊就是負熵」的觀念,設計一組新的命令或選項,得以在 `qtest` 切換不同的 PRNG 的實作 (可選定 [Xorshift](https://en.wikipedia.org/wiki/Xorshift)),搭配統計分析工具,比較亂數產生器 (如 Xorshift vs. 內建) 的品質
- [x] 研讀論文〈[Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)〉,解釋本程式的 "simulation" 模式是如何透過以實驗而非理論分析,達到驗證時間複雜度,需要解釋 [Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution) 及程式實作的原理。
* [ ] 注意:現有實作存在若干致命缺陷,請討論並提出解決方案
* [ ] 解釋 [Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution)
* [ ] 在 [oreparaz/dudect](https://github.com/oreparaz/dudect) 的程式碼具備 percentile 的處理,但在 [lab0-c](https://github.com/sysprog21/lab0-c) 則無。當你改進後,可提交 pull request
- [ ] 指出現有程式的缺陷或者可改進之處 (例如更全面地使用 Linux 核心風格的鏈結串列 API),嘗試實作並提交 pull request
## Record of `queue.[ch]` Development
### q_free()
#### Confusion
I tried to introduce`list_for_each(pos, head)` in `q_free()`. I don't understand the parameter `pos` used for. So I check [The Linux Kernel API](https://www.kernel.org/doc/html/latest/core-api/kernel-api.html). Obviously, to operate `test_free()` funtion on the node, `list_for_each_safe()` is a better option due to it can remove current node safely. Next, utilize`q_release_element(node)` to release element with list embeded in.
#### Why using `test_free` but not `free` function?
Refer to [N01: lab0](https://hackmd.io/@sysprog/linux2025-lab0/%2F%40sysprog%2Flinux2025-lab0-b), for the better memory management, `harness.h` define a new structure called `__block_element`, which containing externel member `magic_header` used for memory accessing check. Different from pure `free` function, it can return alarm message when accessing error memory location.
### q_insert_head()/q_insert_tail()
#### Unsafe function
Error message as below shown. [CERN](https://security.web.cern.ch/recommendations/en/codetools/c.shtml) indicates that the function`strcpy()` is an unsafe function。
```
Dangerous function detected in /home/q36131045/lab0-c/queue.c
41: strcpy(new->value, s);
```
#### cppcheck warning: style
Both messages shown below indicate that qualifier`const`should be added brfore `char *s` in the case that `s` isn't manipulated in the function.
```
$ git commit -a
Following files need to be cleaned up:
queue.c
queue.c:51:50: style: Parameter 's' can be declared as pointer to const [constParameterPointer]
bool q_insert_head(struct list_head *head, char *s)
^
queue.c:57:50: style: Parameter 's' can be declared as pointer to const [constParameterPointer]
bool q_insert_tail(struct list_head *head, char *s)
^
nofile:0:0: information: Active checkers: 106/592 (use --checkers-report=<filename> to see details) [checkersReport]
Fail to pass static analysis.
```
Refer to [Inconsistent Cppcheck Version Leads to Differing Default Enabled Checking Flags](https://github.com/sysprog21/lab0-c/issues/153).
Edit `./script/pre-commit.hook` to suppress this warning message.
```diff
@@ -21,7 +21,6 @@ CPPCHECK_suppresses="--inline-suppr harness.c \
- --suppress=constParameterPointer:queue.c\
```
#### Utilize marco and remove redundant code
`q_insert_head() & q_insert_tail()` have extremely similarity. With the application of marco, both function usage can be simplified. Also, add a new parameter, "pos," to determine which position should be inserted.
```diff
@@ -21,7 +21,8 @@
return false; \
} \
strncpy(new->value, s, strlen(s) + 1); \
- list_add(&new->list, to); \
+ !strcmp(pos, "head") ? list_add(&new->list, head) \
+ : list_add_tail(&new->list, head); \
return true;
@@ -66,14 +67,14 @@ void q_free(struct list_head *head)
/* Insert an element at head of queue */
bool q_insert_head(struct list_head *head, char *s)
{
- q_insert_new(head, s);
+ q_insert_new(head, s, "head");
}
```
#### Fix bufsize issue in the marco
Add condiction branch to deal with the overflow problem. Also, the previous version had extreme logical error. That is, `strlen()` only return the length without last ending char, `'\0'`.
```diff
list_del(&cur_elem->list); \
if (cur_elem && sp) { \
size_t temp = strlen(cur_elem->value); \
- strncpy(sp, cur_elem->value, temp + 1); \
- sp[temp + 1] = '\0'; \
+ if (temp > bufsize - 1) \
+ temp = bufsize - 1; \
+ strncpy(sp, cur_elem->value, temp); \
+ sp[temp] = '\0'; \
} \
return cur_elem;
```
### q_delete_mid()
#### Fix mid node position
In previous version, a pointer to a pointer is not necessary. Furthermore, the final node to delete should be the next one node because for loop starts at head.
```diff
if (!head || list_is_singular(head))
return NULL;
- struct list_head **indir = &head, *fast = head;
+ struct list_head *indir = head, *fast = indir->next;
for (; fast->next != head && fast->next->next != head;
fast = fast->next->next)
- indir = &(*indir)->next;
- struct list_head *del = *indir;
+ indir = indir->next;
+ struct list_head *del = indir->next;
list_del(del);
q_release_element(list_entry(del, element_t, list));
```
### q_sort()
#### Complexity issues
In first verion, I use insertion sort to implement. However, this method can't pass complexity test. Use merge sort algorithm instead can reduce time complexity.
#### Duplicate strings order issue in merge sort
```
+++ TESTING trace trace-15-perf:
ERROR: Not stable sort. The duplicate strings "akjks" are not in the same order.
ERROR: Not stable sort. The duplicate strings "akjks" are not in the same order.
--- trace-15-perf 0/6
```
Fix: using greater-than-or-equal-to operator instead.
```diff
- if (((descend * 2) - 1) * strcmp(l->value, r->value) > 0)
+ if (((descend * 2) - 1) * strcmp(l->value, r->value) >= 0)
```
### q_merge()
#### Revise merge function
To apply the `merge` function, which can merge two lists in expected order into a new list. However, in the previous `merge` function, the merging API function didn't consider the original list state. As below showed.
```diff
@@ -206,9 +208,9 @@ void merge(struct list_head *head,
}
}
if (!list_empty(left))
- list_splice_tail(left, head);
+ list_splice_tail_init(left, head);
else
- list_splice_tail(right, head);
+ list_splice_tail_init(right, head);
}
```
Using `list_splice_tail_init` instead ensure the initialization of the head of the list can satisfy the requirement of NULL-queue after merge.
## Studying [lib/list_sort.c ](https://github.com/torvalds/linux/blob/master/lib/list_sort.c)
### `list_sort` problems record
#### `merge` function
* How this function work?
Merge two "null-terminated" linked lists into one "signly" linked list.
* Why using `for(;;)` instead `while(1)`?
Linux Kernel style. Also for cleaner implementation.
#### `merge_final` function
* What is `u8 count = 0;` stand for?
1 bit, equivalent to `uint8_t count = 0;`, defined in [types.h](https://github.com/torvalds/linux/blob/master/tools/include/linux/types.h).
* Unbalanced issue
In the final merge process, marco `unlikely` defined in [compiler.h](https://github.com/torvalds/linux/blob/master/include/linux/compiler.h), which call `cmp` function when count overflow and turn out to be zero.
>Unsolved:
>a. How `cmp` works so that it is called here?
>b. What is the statement "client's cmp() routine can invoke cond_resched() periodically" meaning for?
``` C=79
/*
* If the merge is highly unbalanced (e.g. the input is
* already sorted), this loop may run many iterations.
* Continue callbacks to the client even though no
* element comparison is needed, so the client's cmp()
* routine can invoke cond_resched() periodically.
*/
if (unlikely(!++count))
cmp(priv, b, b);
```
#### `list_sort` function
* Initial state before `do` loop

* First loop `count=0`
## `Shuffle` Command
In order to add new command, `shuffle`, I checked how `qtest.c` executed.
### Progran startup
```c
int main(int argc, char *argv[])
```
Refer to `C11 5.1.2.2.1 program startup`, it constrains that `argc` should be a nonnegative value and `argv[0]` through `argv[argc-1]` are used for program parameters where `argv[0]` is the program name.
### `while` statement and `getopt` function
Next, quite interesting, how does the function `getopt` work, and how does the `switch` statement be used? Why these function used here?
```c
while ((c = getopt(argc, argv, "hv:f:l:")) != -1) {
switch (c) {
...
}
```
Refer to [The GNU C Library](https://www.gnu.org/software/libc/manual/html_node/Using-Getopt.html), which describes the usage of function `getopt`, indicating that this function is used for getting next option arguments from argument list specified by `argc` and `argv` arguments and will return `ASCII` value.
Refer to `C11 6.8.4.2`, The controlling expresion of a `switch` shall have intger type. As return `ASCII` value of function `getopt`, now we can use `switch` with the same option argument.
Now analyze cases one by one.
```c
case 'h':
usage(argv[0]);
break;
```
Where function `usage` is:
```c
static void usage(char *cmd)
{
printf("Usage: %s [-h] [-f IFILE][-v VLEVEL][-l LFILE]\n", cmd);
printf("\t-h Print this information\n");
printf("\t-f IFILE Read commands from IFILE\n");
printf("\t-v VLEVEL Set verbosity level\n");
printf("\t-l LFILE Echo results to LFILE\n");
exit(0);
}
```
Above codes show the usages of each flag.
Next case is the usage of flag `f`, which copy the parameter `optarg` which is the argument value provided by function `getopt`. As the same functionality of case `l`.
```c
case 'f':
strncpy(buf, optarg, BUFSIZE);
buf[BUFSIZE - 1] = '\0';
infile_name = buf;
break;
case 'l':
strncpy(lbuf, optarg, BUFSIZE);
buf[BUFSIZE - 1] = '\0';
logfile_name = lbuf;
break;
```
### `q_init`
Now the program call `q_init` function, which is trying to initialize initial stae of queue. As the below code shown.
```c
static void q_init()
{
fail_count = 0;
INIT_LIST_HEAD(&chain.head);
signal(SIGSEGV, sigsegv_handler);
signal(SIGALRM, sigalrm_handler);
}
```
Briefly,refer to [lab0-c](https://hackmd.io/@sysprog/linux2025-lab0/%2F%40sysprog%2Flinux2025-lab0-b), `signal` function is used for controlling signal.
> [color=#f45d88]TODO:
> More detailed explanations of `signal`
### `init_cmd`
The function `init_cmd` is in file `console.c`. In this function, a linked list will be created which is used for storing command declare in `init_cmd`.
### Implement `shuffle`
First, I added the `q_shuffle` function, which is used for shuffling elements with a given linked list with the algorithm. Then, I added a `do_shuffle` function to check the program's status and relative variables.
>[Commit a8524fa ](https://github.com/vicLin8712/lab0-c/commit/a8524fac0b4dc6bd527d7a3fa8cf690ab4beb35b)
### test_result

```
Expectation: 41666
Observation: {'1234': 41943, '1243': 41628, '1324': 41683, '1342': 41631,
'1423': 41862, '1432': 41907, '2134': 41342, '2143': 41575, '2314': 41840,
'2341': 41733, '2413': 41479, '2431': 41790, '3124': 41502, '3142': 41631,
'3214': 41713, '3241': 41587, '3412': 41457, '3421': 41733, '4123': 41826,
'4132': 41656, '4213': 41688, '4231': 41685, '4312': 41640, '4321': 41469}
chi square sum: 12.60792972687563
```
## TODO: Enhance Accuracy of Entropy Measurement
## Reading "Dude, is my code constant time?"
### How to determine the time complexity of a piece of the code in this paper
The `dudect` calculates the cycle of piece of code to be measured with two different inputs.
Repeated specific times, the statistical property of each input can be used to determine whether the null hypothesis is rejected. More specifically, the "t-test" method is applied in this paper.
### Student's t-distribution
Student's ***t*** distribution has the PDF given by
\begin{align}
f(t)=\frac{\Gamma \bigg( \frac {v+1}{2}\bigg)}{ \sqrt {v\pi}\Gamma (\frac v2 )}\bigg(1+\frac {t^2}{v}\bigg)^{-(v+1)/2}
\end{align}
where ***$v$*** is the degree of freedom and $t$ is the random variable. Noted that if
$v\to \infty$, $t\sim N(0,1)$.
The usage of this PDF can refer to [student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test). Specifically, in this paper, [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test) is applied,which it the case that unequal variance assumed.
The parameters used in Welch's t-test are defined as below
\begin{align}
t&=\frac {\bar{X_1}-\bar{X_2}}{\sqrt{s^2_{\bar{X_1}}+s^2_{\bar{X_1}}} }\\
s^2_{\bar{X}}&=\frac {s_i}{\sqrt {N_i}}
\end{align}
where $\bar{X_i}$ and $s_{\bar{X_i}}$ are the $i_{th}$ sample mean and its standard error, with $s_i$ is the corrected sample standard deviation and $N_i$ is the sample size.
Next step is to calculate the degree of freedom $v$, which is given by
\begin{align}
v\approx \frac{{\bigg(\frac{s^2_1}{N_1}+\frac{s^2_2}{N_2}\bigg)}^2}{\frac{s^4_1}{N^2_1v_1}+\frac{s^4_2}{N^2_2v_2}}
\end{align}
As long as the degree of freedom $v$ is calculated, the two-tailed $p$ value can be calculated as below.
\begin{align}
p=2\times Pr(T>|t|)=2\times \int^\infty_tf(x)dx
\end{align}
Using significant value $\alpha=0.05$ to test. If $p\gt\alpha$, the null hypothesis fails to reject, implying that both classes of data are probably the same in the first moment.
### Implementaion in `lab0-c`
## Reading "Teach Yourself Programming in Ten Years"
In my past point of view, programming language was just a language used to communicate with computers. I didn't even know the design of programming language or the principle of how it works, and most importantly, I was not interested in digging into it.
This article makes me review my attitude toward this course. More specifically, what attitude does the professor want me to learn in this course?
* Accomplish tasks carefully not just finish tasks in the time
At the beginning of this course, my primary goal was to submit assignments on time. However, it became clear that I couldn’t guarantee the quality of my code under such tight deadlines. What made things worse was that I didn’t even know how to measure code quality or properly utilize the tools provided by the professor.
I realized that it wasn’t feasible to handle such a large workload so quickly. Therefore, I decided to revisit each piece of material provided by the teacher and study it carefully. As the teacher often says: “缺甚麼補甚麼” (fill in what you’re missing). Moving forward, I plan to rebuild my foundation so I’ll be ready for future challenges in the information technology industry.
* Interact with other programmers
In class, Jserv consistently emphasizes active interaction between the students and himself.
* Work on projects after other programmers
In `lab0`, the most valuable project and materials provided by the teacher.
* Work on projects with other programmers
We reviewed and commented on `Lab0` together. This exercise taught me how to inspect other people’s code and ideas.
* The importance of standardization of programming language
This is one of the most valuable lessons from Jserv. It encourages me to consult official standards whenever I encounter technical issues, rather than relying solely on AI tools or casual Internet searches.