# linux2021: ekangmonyet
###### tags: `linux2021`
> `ekangmonyet@posteo.net` **(correction on 9th July)**
## 題目 $\alpha-1$
An exact same implementation is defined (without using macro) in [include/linux/bitops.h](https://elixir.bootlin.com/linux/latest/source/include/linux/bitops.h#L86).
```c
/**
* rol16 - rotate a 16-bit value left
* @word: value to rotate
* @shift: bits to roll
*/
static inline __u16 rol16(__u16 word, unsigned int shift)
{
return (word << (shift & 15)) | (word >> ((-shift) & 15));
}
```
It is [mainly used](https://elixir.bootlin.com/linux/latest/A/ident/rol32) in cryptography-related functions. One example is SHA-1: (pseudocode excerpt from [Wikipedia](https://en.wikipedia.org/wiki/SHA-1))
```
Message schedule: extend the sixteen 32-bit words into eighty 32-bit words:
for i from 16 to 79
Note 3: SHA-0 differs by not having this leftrotate.
w[i] = (w[i-3] xor w[i-8] xor w[i-14] xor w[i-16]) leftrotate 1
```
Corresponding implementation in [lib/sha1.c](https://elixir.bootlin.com/linux/latest/source/lib/sha1.c#L53):
```c
#define SHA_MIX(t) rol32(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1)
```
## 題目 $\beta-1$
The binary representation of $m2^n (m, n \in N)$ will have a suffix of $n$ consecutive zeros, e.g. $11111000_2$, and $11100000_2$ are all integer multiples of $1000_2$.
~$(2^n - 1)$ is a mask that can be used to clear the trailing $n$ bits. By AND-ing it with anoher integer $x$, we can find an integer $y$ satisfying $y = m2^n, y <= x$, i.e. rounding down $x$ to a lower integer multiple of $2^n$. We can change the behaviour to round up by adding $2^n-1$ to $x$ first,.
## 題目 $\gamma-1$
After experimenting with some values, a value $NNN=12$ is found to satisfy the output length of $41925$.
```c=
for (int i = 0; i < 12; i++) {
fork();
printf("-");
fflush(stdout);
}
```
If we flush the buffer right after every `printf()`, we will get the output length of $8190$, which is an expected behaviour as $2+4+8+...+2^{12} = 8190$. The relationship between $8190$ and $41925$ is still unknown. A small experiment is then carried out:
```c=
fork();
printf("-");
fork();
printf("+");
fflush(stdout);
```
Executing the above code will print a total of 4 `-` **(instead of 2)** and 4 `+`. After researching the behaviour of `fork()` and `printf()`, I now know that `fork()` will make a copy of the memory, including the buffer of `printf()`, which will still contains 2 `-` before forking as it is not flushed automatically due to the lack of `\n`. This observation agrees with our previous experiment. The state of the buffer is visualized as follow:
```graphviz
digraph {
node[shape=record]
init1 [label="-"]
fork1 [label="<l>-|<r>+"]
fork2 [label="<l>-|<r>+"]
pfork1 [shape="plaintext" label="fork()"]
pf1 [shape="plaintext" label="printf(\"+\")"]
init1->pfork1
pfork1->fork1:l
pfork1->fork2:l
pf1->fork1:r [color=red]
pf1->fork2:r [color=red]
init2 [label="-"]
fork3 [label="<l>-|<r>+"]
fork4 [label="<l>-|<r>+"]
pfork2 [shape="plaintext" label="fork()"]
init2->pfork2
pfork2->fork3:l
pfork2->fork4:l
pf1->fork3:r [color=red]
pf1->fork4:r [color=red]
root [label=""]
pfork3 [shape="plaintext" label="fork()"]
root->pfork3
pfork3->init1
pfork3->init2
pf3 [shape="plaintext" label="printf(\"-\")"]
pf3->init1 [color=red]
pf3->init2 [color=red]
stdout [shape="plaintext" label="-+-+-+-+" fontname="Courier"]
flush [shape="plaintext" label="fflush(stdout)"]
fork1->flush
fork2->flush
fork3->flush
fork4->flush
flush->stdout
}
```
With these observations, we can calculate the number of iteration needed to get $41952$ using a simple algorithm:
```c=
int iter = 0, wc = 0;
while (wc < 41952) {
wc *= 2; // simulate fork()
wc += 1 << (iter + 1); // there will be 2^i childs doing printf()
iter++;
}
```
## 題目 $\epsilon-1$
A visualization of a simple `mpool` with 4 cells in a page:
```c=
pool[o] = (int *) &pool[o + (sz / sizeof(void *))];
```
```graphviz
digraph {
node[shape=record fontname=Courier]
hs
page16 [label="<0> free | <1> free | <2> free | <3> NUL "]
hs:e->page16:0:w
page16:0:n -> page16:1:n
page16:1:n -> page16:2:n
page16:2:n -> page16:3:n
}
```
Each cell stores a pointer to the next cell after initialization. The `hs` member of `mpool` points to the first available cell. When the cell is being allocated, `hs` will updated with the value of the free cell it is pointing to, i.e. the pointer to the next free cell:
```c=
mp->hs[i] = *cur;
```
```graphviz
digraph {
node[shape=record fontname=Courier]
hs
page16 [label="<0> used | <1> free | <2> free | <3> NUL "]
hs:e->page16:1:w [color=red]
page16:1:n -> page16:2:n
page16:2:n -> page16:3:n
}
```
When the cell is being returned/repooled, we first store the `hs` pointer to the cell being returned. _For the sake of illustration, assume the second cell is also allocated before returning._
```c=
*ip = mp->hs[i];
```
```graphviz
digraph {
node[shape=record fontname=Courier]
hs
page16 [label="<0> FREE | <1> used | <2> free | <3> NUL "]
hs:e->page16:2:w
page16:0:n -> page16:2:n [color=red]
page16:2:n -> page16:3:n
}
```
Then update `hs` to point to the returning cell.
```c=
mp->hs[i] = ip;
```
```graphviz
digraph {
node[shape=record fontname=Courier]
hs
page16 [label="<0> free | <1> used | <2> free | <3> NUL "]
hs:e->page16:0:w [color=red]
page16:0:n -> page16:2:n
page16:2:n -> page16:3:n
}
```
Thus the properties of the free list is maintained. It resembles the `push_head()` operation on a linked list.
The pointer stored in the last cell acts as a sentinel node to indicate the end of a page. When a cell is requested and `hs` points to `NULL`, a new page will be allocated.
The `queue.c` implementation stores multiple pools of different cell sizes, each is stored as individual free list as illustrated above, in the `hs` array.
As the cell size of each list is twice the previous one, we can always safely repool to `hs[0]`. The performance impact of this is not yet investigated.
## 題目 $\delta-2$ (WIP)
### Progress 2: Working Test-implementation using liburcu
https://github.com/ekangmonyet/learnurcu
- Attempt to reinvent the API provided by liburcu
### Progress 1: RCU Lockfree Queue Study
https://github.com/urcu/userspace-rcu/blob/85be4e352b8bf274704acd4d57165552cba070c7/include/urcu/static/rculfqueue.h
#### Primitives
1. `rcu_deref`
2. `atomic_cmpxchg`
#### Enqueue
1. `tail = rcu_deref(q->tail)` safely get tail
2. `cmpxchg(&tail->next, NULL, node)` try to swap node into tail node
2.1 **success:** `cmpxchg(&q->tail, tail, node)` swap node into queue tail. _Quote: `another enqueue might beat us to it, that's fine`, (why?_
2.2 **fail:** cmpxchg(&q->tail, tail, next) ... , goto 1
### Initial Commit
Idea from [CppCon 2014: Herb Sutter "Lock-Free Programming (or, Juggling Razor Blades), Part II"](https://www.youtube.com/watch?v=CmxkPChOcvw)
**Apparently this approach is wrong**
```c
int con_push(con_queue_t *restrict queue, void *restrict new_element)
{
/* Prepare new node */
node_t *node = _con_node_init(new_element);
if (!node)
return Q_ERROR;
// queue->last is an atomic pointer to a node_t
// make a copy of the pointer to the current last node
node_t *last = atomic_acquire(queue->last);
while (!atomic_compare_exchange_weak(
queue->last, &last, node)) {
// busy spin?
};
// can only reach here if we have safely pushed,
// we have a safe pointer that can be updated without race
last->next = node;
return Q_OK;
}
void *con_pop(con_queue_t *queue)
{
// queue->first is also an atomic ptr
node_t *node, *new_header;
do {
node = atomic_acquire(queue->first); /* Node to be removed */
new_header = node->next; /* become the first in the queue */
/* Queue is empty */
if (!new_header) {
return NULL;
}
}
while (!atomic_compare_exchange_weak(
queue->first, &node, new_header));
/* Queue not empty: retrieve data and rewire */
void *return_value = new_header->value; // BBB
/* Free removed node and return */
free(node);
return return_value;
}
```