# linux internals 2022
## Role of mutex in Fibonacci driver
[gist](https://gist.github.com/cantfindagoodname/11e0348b60354c6a900b71a138471150)
The algorithm for computing the Fibonacci sequence has no shared resources of its own, hence no critical section is present in the algorithm itself.
Shared resources are needed only when we wish to store computed results so that CPU cycles are not wasted recomputing known values.

Without dynamic programming there is repeated computation; for example, `Fib(1)` ends up being evaluated more than once.
We can instead compute the Fibonacci sequence bottom-up, storing each result in an array.

Since the array would be shared, a mutex could be used to guard it, so the critical-section requirements (mutual exclusion, progress, and bounded waiting) are satisfied.
```c
/* Memoization array, passed as an argument below.
 * 4 is the "not yet computed" sentinel (4 is not a Fibonacci number here). */
unsigned long long arr[93] = {[0] = 0, [1 ... 2] = 1, [3 ... 92] = 4};

/* Naive recursion: exponential number of calls */
unsigned long long fib(int n)
{
    if (n < 2)
        return n;
    else if (n == 2)
        return 1;
    return fib(n - 1) + fib(n - 2);
}

/* Memoized recursion: each value is computed at most once */
unsigned long long fib_m(unsigned long long *arr, int n)
{
    if (arr[n] != 4)
        return arr[n];
    return (arr[n - 1] = fib_m(arr, n - 1)) + (arr[n - 2] = fib_m(arr, n - 2));
}
```
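If the memoized array is shared -- as in the driver, where several readers may trigger computation concurrently -- the whole lookup-or-compute step becomes the critical section. A minimal userspace sketch with POSIX threads (the kernel driver would take a `struct mutex` with `mutex_lock`/`mutex_unlock` instead; `fib_locked` and `arr_lock` are my names, not the gist's):

```c
#include <pthread.h>

/* 4 is the "not yet computed" sentinel, as above */
static unsigned long long arr[93] = {[0] = 0, [1 ... 2] = 1, [3 ... 92] = 4};
static pthread_mutex_t arr_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned long long fib_m(unsigned long long *a, int n)
{
    if (a[n] != 4)
        return a[n];
    return (a[n - 1] = fib_m(a, n - 1)) + (a[n - 2] = fib_m(a, n - 2));
}

/* Reads and writes of arr[] must not interleave between threads,
 * so the whole lookup-or-compute runs under the mutex. */
unsigned long long fib_locked(int n)
{
    unsigned long long r;

    pthread_mutex_lock(&arr_lock);
    r = fib_m(arr, n);
    pthread_mutex_unlock(&arr_lock);
    return r;
}
```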
The actual program is not this simple: it computes the Fibonacci sequence using the fast doubling method.
Instead of walking the sequence one index at a time, it walks the bit pattern of `n`, roughly doubling the index at each step:
```text
 n               a = F(n)            b = F(n+1)
-----------------------------------------------
 0                      0                     1
 1                      1                     1
 2                      1                     2
 5                      5                     8
11                     89                   144
23                  28657                 46368
46             1836311903            2971215073
92    7540113804746346429  12200160415121876738
```
This lowers the time bound from roughly $O(2^n)$ for the naive recursion to $O(\lg n)$ arithmetic operations (the memoized iterative version sits at $O(n)$).
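A compact iterative sketch of the fast doubling identities (function name and loop structure are mine, not necessarily the gist's; `__builtin_clz` assumes GCC/Clang):

```c
#include <stdint.h>

/* Fast doubling: maintain (a, b) = (F(k), F(k+1)) and walk the bits of n
 * from the most significant down. Each step doubles k; a set bit adds one. */
uint64_t fib_fast(unsigned int n)
{
    uint64_t a = 0, b = 1; /* F(0), F(1) */

    if (n == 0)
        return 0;
    for (int i = 31 - __builtin_clz(n); i >= 0; i--) {
        uint64_t t1 = a * (2 * b - a); /* F(2k)   = F(k)[2F(k+1) - F(k)] */
        uint64_t t2 = a * a + b * b;   /* F(2k+1) = F(k+1)^2 + F(k)^2    */
        if ((n >> i) & 1) {            /* bit 1: k -> 2k + 1 */
            a = t2;
            b = t1 + t2;
        } else {                       /* bit 0: k -> 2k */
            a = t1;
            b = t2;
        }
    }
    return a;
}
```

Only one loop iteration per bit of `n` is executed, and all intermediates up to `F(93)` still fit in 64 bits.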
```shell
Result of Fib(92)
Fib(92)
Iterative w/ memoization
Time : 6208
Fast Doubling
Time : 551
```
However, when many values are requested, memoization starts to matter:
```shell
Result of Fib(0), Fib(1), ... , Fib(92)
Fib(0) - Fib(92)
Iterative w/ memoization
Time : 7470
Fast Doubling
Time : 21223
```
We can see that, when recomputation of known values is not avoided, fast doubling is significantly slower than the memoized iterative version.
:::info
The performance improvement from memoization is partly mitigated by compiler optimization; however, this does not change the fact that memoization with a random-access array makes the computation faster.
```shell
Result of Fib(92) w/ -O3 flag
Fib(92)
Iterative w/ memoization
Time : 4690
Fast Doubling
Time : 93
Result of Fib(0), Fib(1), ... , Fib(92) w/ -O3 flag
Fib(0) - Fib(92)
Iterative w/ memoization
Time : 1268
Fast Doubling
Time : 107
```
:::
An attempt was made to store the computed values in a separate data structure.
Since the fast doubling method traverses the bit pattern of `n`, going to `2n` when the bit is `0` and to `2n + 1` when the bit is `1`,
a binary tree is constructed to store the right-hand sides of the fast doubling identities: a node is inserted as the left child when the current bit is `0`, and as the right child when it is `1`.
$$
\begin{split}
F(2k) &= F(k)\,[2F(k+1) - F(k)] \\
F(2k+1) &= F(k+1)^2 + F(k)^2
\end{split}
$$
The following tree would be constructed after `Fib(11)` is called

When `Fib(10)` is called right afterwards, the program traverses down to the `(5)` node, then continues constructing the tree while computing the Fibonacci number.
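A sketch of what such a tree could look like (the struct layout and names are my assumptions; the gist's actual representation may differ). Each node caches the `(F(k), F(k+1))` pair reached by one prefix of the bit pattern, so `Fib(10)` reuses the nodes already built for `Fib(11)` down to the shared `(5)` prefix:

```c
#include <stdlib.h>

/* One node per bit-prefix of n: child[0] = append a 0 bit (k -> 2k),
 * child[1] = append a 1 bit (k -> 2k + 1). */
struct fib_node {
    unsigned long long a, b;   /* F(k), F(k+1) for this prefix */
    struct fib_node *child[2];
};

/* Walk the bits of n (n >= 1) below the leading 1, reusing cached nodes
 * and extending the tree with freshly computed pairs where it runs out. */
unsigned long long fib_tree(struct fib_node *root, unsigned int n)
{
    struct fib_node *cur = root; /* root is the prefix "1": (F(1), F(2)) */

    for (int i = 31 - __builtin_clz(n) - 1; i >= 0; i--) {
        int bit = (n >> i) & 1;
        if (!cur->child[bit]) {
            unsigned long long t1 = cur->a * (2 * cur->b - cur->a);
            unsigned long long t2 = cur->a * cur->a + cur->b * cur->b;
            struct fib_node *nw = calloc(1, sizeof(*nw));

            nw->a = bit ? t2 : t1;
            nw->b = bit ? t1 + t2 : t2;
            cur->child[bit] = nw;
        }
        cur = cur->child[bit];
    }
    return cur->a;
}
```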

There is additional overhead to construct and traverse the tree, expected to be $O(d)$, `d` being the number of bits in `n`; we therefore expect the tree construction to slow the program down.
```shell
Fib(92)
Iterative w/ memoization
Time : 5983
Fast Doubling
Time : 546
Fast Doubling w/ memoization
Time : 720
```
And even so, the memoized version cannot outperform plain fast doubling unless the overhead of constructing and traversing the tree is smaller than the cost of the multiplications and additions it saves. The measurements show that it is not, even in the case where the tree is already fully constructed and lookups should benefit from spatial locality.
```shell
Fib(0) - Fib(92)
Iterative w/ memoization
Time : 7823
Fast Doubling
Time : 21300
Fast Doubling w/ memoization
Time : 26259
Fast Doubling w/ memoization2 (After the tree are fully constructed)
Time : 24699
```
Moreover, the tree is built out of pointers. Since compiler optimization has trouble with pointer aliasing, the tree-based version suffers severe performance degradation once optimizations are enabled:
```shell
Fib(92)
Iterative w/ memoization
Time : 6029
Fast Doubling
Time : 126
Fast Doubling w/ memoization
Time : 534
Fib(0) - Fib(92)
Iterative w/ memoization
Time : 1512
Fast Doubling
Time : 96
Fast Doubling w/ memoization
Time : 15669
Fast Doubling w/ memoization2
Time : 15140
```
And so, when using fast doubling, rather than trying to make the program faster with memoization, optimizing the multiplication algorithm might be the better approach.
Shared resources are not needed, hence mutex locks are very likely not needed either.
:::spoiler
refcnt / RCU

Synchronization mechanisms:
- locking (semaphores, mutexes)
- atomics
- RCU
- mechanisms for acquisition / release synchronization
---
btree ref
simple counting - no atomics or barriers
the refcount is protected by the same lock as the tree itself
```c
/* callers already hold the b-tree lock, so a plain
 * integer count is enough -- no atomics or barriers */
struct _ref {
    int refcount;
};

remove()
{
    ref = ref_count();
    ref -= ref_to_drop;
}

insert()
{
    ref = ref_count();
    ref += ref_to_add;
}
```
---
kref
protected by atomics (fetch_add / fetch_sub)
```c
struct kref {
    refcount_t refcount;
};

init()
{
    refcount_set(&refcount, 1);
}

get()
{
    refcount_inc(&refcount);
}

put()
{
    if (refcount_dec_and_test(&refcount))
        release();
}
```
---
protected by atomics + barrier
used in networking layer (dst_clone)
```c
clone()
{
    atomic_inc();
}

release()
{
    barrier();
    atomic_dec();
}
```
---
protected by atomics + check + barrier
fget() fput()
```c
fget()
{
    rcu_read_lock();
    ...
    if (!inc_not_zero())
        rcu_read_unlock();
    ...
    rcu_read_unlock();
}

fput()
{
    if (dec_and_test())
        call_rcu();
}
```
---
rwlock
a writer waits until there are no readers, and vice versa
readers cannot coexist with a writer (though they can with each other)
---
seqlock
like an rwlock, but biased toward writers (the exact opposite of an rwlock)
often used when there are plenty of reads and few writes
writers are prioritized; readers may have to retry
an additional lock (e.g. mutex, spinlock) serializes the writers
the sequence is incremented when a writer acquires the lock and again on release (odd = write in progress, even = stable)
readers check the sequence before and after reading
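A simplified single-writer model of the sequence check with C11 atomics -- illustrative only, not the kernel implementation (which pairs `write_seqlock()` with `read_seqbegin()`/`read_seqretry()` and uses stronger barriers):

```c
#include <stdatomic.h>

static atomic_uint seq; /* odd = write in progress, even = stable */
static unsigned long long data_a, data_b;

/* Writer: bump the sequence to odd, update, bump back to even.
 * A real seqlock also takes a spinlock here to serialize writers. */
void write_pair(unsigned long long a, unsigned long long b)
{
    atomic_fetch_add_explicit(&seq, 1, memory_order_release); /* -> odd */
    data_a = a;
    data_b = b;
    atomic_fetch_add_explicit(&seq, 1, memory_order_release); /* -> even */
}

/* Reader: retry until the sequence is even and unchanged across the read. */
void read_pair(unsigned long long *a, unsigned long long *b)
{
    unsigned s;

    do {
        s = atomic_load_explicit(&seq, memory_order_acquire);
        *a = data_a;
        *b = data_b;
    } while ((s & 1) || s != atomic_load_explicit(&seq, memory_order_acquire));
}
```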
---
RCU : read-copy-update
often used when there are plenty of reads and few writes
read operations need no locks, atomics, or barriers
a write operation reads the object, makes a copy, updates the copy, then swings the pointer to it
this protects the object behind the pointer, since updating a pointer (one word wide) is an atomic operation
a read that races with a write sees either the value before the update or the value after it, never anything in between
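The read-copy-update step can be modeled in userspace with a single atomic pointer (names are mine; real RCU additionally defers freeing the old copy until a grace period via `synchronize_rcu()`/`call_rcu()` -- here the old copy is simply leaked for brevity):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct config { int threshold; };

static _Atomic(struct config *) cur_config;

/* Reader: one atomic pointer load, no lock -- it sees either the old
 * or the new object, never a partially updated one. */
int read_threshold(void)
{
    struct config *c = atomic_load_explicit(&cur_config, memory_order_acquire);

    return c->threshold;
}

/* Writer: read, copy, modify the copy, then publish with one pointer store. */
void update_threshold(int t)
{
    struct config *old = atomic_load_explicit(&cur_config, memory_order_relaxed);
    struct config *nw = malloc(sizeof(*nw));

    if (old)
        *nw = *old;    /* copy the current object */
    nw->threshold = t; /* update the private copy */
    atomic_store_explicit(&cur_config, nw, memory_order_release); /* publish */
}
```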
:::