# 2024q1 Homework2 (quiz1+2)
contributed by < `YangYeh-PD` >
## Linked List and its Relevant APIs
Consider the following node structure definition:
```c
typedef struct __node {
struct __node *left, *right;
struct __node *next;
long value;
} node_t;
```
```graphviz
digraph "queue"
{
rankdir = "LR"
subgraph "cluster node1"
{
node1_1[shape = record, label = "next"];
node1_2[shape = record, label = "value"];
node1_3[shape = record, label = "{<left> left |<right> right}"];
}
subgraph "cluster node2"
{
node2_1[shape = record, label = "next"];
node2_2[shape = record, label = "value"];
node2_3[shape = record, label = "{<left> left |<right> right}"];
}
subgraph "cluster node3"
{
node3_1[shape = record, label = "next"];
node3_2[shape = record, label = "value"];
node3_3[shape = record, label = "{<left> left |<right> right}"];
}
node1_1 -> node2_1;
node2_1 -> node3_1;
}
```
We can construct a linked list with the following APIs.
```c
void list_add(node_t **list, node_t *node)
node_t *list_tail(node_t **left)
int list_length(node_t **left)
node_t *list_construct(node_t *list, int n)
void list_free(node_t **list)
```
### `list_add()`
```c
void list_add(node_t **list, node_t *node)
{
node->next = *list;
*list = node;
}
```
The first parameter, `list`, is an **indirect pointer** to the head of the linked list, and the second parameter is a pointer to the node we want to add. The function points the `next` of the node at `*list` (the current head), then updates `*list` to point to the new node, so it ==adds the node to the front== of the linked list.
### `list_tail()`
```c
node_t *list_tail(node_t **left)
{
while ((*left) && (*left)->next)
left = &((*left)->next);
return *left;
}
```
Again, we pass an indirect pointer, `left`, to the head of the linked list as a parameter.
* If the linked list is empty (`*left` is `NULL`), the `while` condition is false immediately, and the function returns `*left`, which is `NULL`.
* If the linked list is nonempty, the loop advances `left` until the `next` of the node that `*left` points to is `NULL`, then returns the pointer `*left`.
Based on these rules, the function ==finds the tail== of the linked list.
### `list_length()`
```c
int list_length(node_t **left)
{
int n = 0;
while (*left) {
++n;
left = &((*left)->next);
}
return n;
}
```
The function again takes the indirect pointer `left` as a parameter. In each iteration we increment `n` and advance `left` to the address of the `next` pointer of the current node, then return `n` when the loop ends.
### `list_construct()`
```c
node_t *list_construct(node_t *list, int n)
{
node_t *node = malloc(sizeof(node_t));
node->next = list;
node->value = n;
return node;
}
```
First, we allocate memory of size `sizeof(node_t)` and keep its address in the pointer `node`. Then we point its `next` at `list`, assign the value `n`, and return the pointer `node`, which becomes the new head of the list.
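As a quick check of how these helpers fit together, the sketch below builds a short list with `list_construct()`, which can then be inspected with `list_length()` and `list_tail()`. `build_demo_list()` is a hypothetical helper that is not part of the original code, and the `malloc()` check is an added assumption.

```c
#include <stdlib.h>

typedef struct __node {
    struct __node *left, *right;
    struct __node *next;
    long value;
} node_t;

/* Same idea as list_construct() above, with a malloc() check added. */
static node_t *list_construct(node_t *list, int n)
{
    node_t *node = malloc(sizeof(node_t));
    if (!node)
        return list; /* assumption: keep the old list on allocation failure */
    node->left = node->right = NULL;
    node->next = list;
    node->value = n;
    return node;
}

static node_t *list_tail(node_t **left)
{
    while ((*left) && (*left)->next)
        left = &((*left)->next);
    return *left;
}

static int list_length(node_t **left)
{
    int n = 0;
    while (*left) {
        ++n;
        left = &((*left)->next);
    }
    return n;
}

/* Hypothetical helper: builds 1 -> 2 -> ... -> count by prepending. */
static node_t *build_demo_list(int count)
{
    node_t *head = NULL;
    for (int i = count; i >= 1; i--)
        head = list_construct(head, i);
    return head;
}
```

Because every call prepends, the values are inserted from `count` down to `1` so that the finished list reads in ascending order.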
### `list_free()`
```c
void list_free(node_t **list)
{
    while (*list) {
        /* Save the successor first: the node is unusable after free(). */
        node_t *node = (*list)->next;
        free(*list);
        *list = node;
    }
}
```
We pass the indirect pointer `list` as a parameter.
In each iteration, we first save the `next` pointer of the current head in `node`, since ==we cannot access the memory anymore once we free it==. Then we free the head and advance `*list` to `node`, repeating until the list is empty. Because `next` is read only after `*list` has been checked, the function is also safe to call on an empty list.
## Problem `1`: Quicksort (Non-recursive)
### How it works
We can make a non-recursive quicksort algorithm by the following functions.
:::danger
Before making a list of code snip, describe the concepts and considerations!
:::
#### `list_is_ordered()`
```c
static bool list_is_ordered(node_t *list)
{
bool first = true;
int value;
while (list) {
if (first) {
value = list->value;
first = false;
} else {
if (list->value < value)
return false;
value = list->value;
}
list = list->next;
}
return true;
}
```
This function checks ==whether the linked list is in order==. As we can see, the loop treats the first node of the linked list as a special case. We can avoid that by checking that both the current and the next node are not `NULL`, and comparing the two adjacent nodes directly.
```c
static bool list_is_ordered(node_t *list) {
while(list && (list->next)) {
if (list->value > list->next->value) {
return false;
}
list = list->next;
}
return true;
}
```
#### `shuffle()`
```c
void shuffle(int *array, size_t n)
{
if (n <= 0)
return;
for (size_t i = 0; i < n - 1; i++) {
size_t j = i + rand() / (RAND_MAX / (n - i) + 1);
int t = array[j];
array[j] = array[i];
array[i] = t;
}
}
```
This function shuffles the array into ==random order==.
:::warning
TODO: Make it computationally better.
:::
#### `quick_sort()`
The following non-recursive quicksort is not the same as in [Optimized QuickSort - C Implementation (Non-recursive)](https://alienryderflex.com/quicksort/), since that implementation needs a ==doubly linked list==, which contradicts our current `node_t` configuration.
1. We first pick the first entry of the list as the `pivot`, and assign the first and the last entries to `begin[0]` and `end[0]` respectively.
```graphviz
digraph "Linked List" {
rankdir = "LR";
node3[shape = circle, label = "3"];
node5[shape = circle, label = "5"];
node4[shape = circle, label = "4"];
node1[shape = circle, label = "1"];
node2[shape = circle, label = "2"];
null[shape = none, label = "Null"];
p[shape = none, label = "p"];
pivot[shape = none, label = "pivot", fontcolor = "red"];
pivot -> node3;
p -> node5;
node3 -> null;
node5 -> node4;
node4 -> node1;
node1 -> node2;
node2 -> null;
}
```
```c
node_t *L = begin[i], *R = end[i];
if (L != R) {
node_t *pivot = L;
value = pivot->value;
node_t *p = pivot->next;
pivot->next = NULL;
}
```
2. Then we traverse the remaining entries, adding those whose value is greater than the pivot's to `right` and all others to `left`.
```graphviz
digraph "Linked List" {
rankdir = "LR";
node3[shape = circle, label = "3"];
node5[shape = circle, label = "5"];
node4[shape = circle, label = "4"];
node1[shape = circle, label = "1"];
node2[shape = circle, label = "2"];
list[shape = none, label = "*list"];
pivot[shape = none, label = "pivot", fontcolor = "red"];
left[shape = none, label = "left", fontcolor = "blue"];
right[shape =none, label = "right", fontcolor = "orange"];
null[shape = none, label = "Null"];
list -> node3;
pivot -> node3;
left -> node2 -> node1 -> null;
right -> node4 -> node5 -> null;
}
```
```c
while (p) {
node_t *n = p;
p = p->next;
list_add(n->value > value ? &right : &left, n);
}
```
3. `begin[]` stores the head of each sublist to be processed in the following iterations, and `end[]` records ==the tail== of each `begin` element.
```graphviz
digraph "Linked List" {
node3[shape = circle, label = "3"];
node5[shape = circle, label = "5"];
node4[shape = circle, label = "4"];
node1[shape = circle, label = "1"];
node2[shape = circle, label = "2"];
begin [shape = none, label=<
<TABLE BORDER="1" CELLBORDER="1" CELLSPACING="0"> <TR>
<TD PORT="a0"> begin[0]</TD>
<TD PORT="a1"> begin[1]</TD>
<TD PORT="a2"> begin[2]</TD>
</TR> </TABLE>>];
end[shape = none, label=<
<TABLE BORDER="1" CELLBORDER="1" CELLSPACING="0"> <TR>
<TD PORT="a0"> end[0]</TD>
<TD PORT="a1"> end[1]</TD>
<TD PORT="a2"> end[2]</TD>
</TR> </TABLE>>];
end:a0 -> 1;
end:a1 -> 3;
end:a2 -> 5;
begin:a0 -> node2 -> node1;
begin:a1 -> node3;
begin:a2 -> node4 -> node5;
}
```
```diff
begin[i] = left;
+ end[i] = list_tail(&begin[i]);
begin[i + 1] = pivot;
end[i + 1] = pivot;
begin[i + 2] = right;
+ end[i + 2] = list_tail(&begin[i + 2]);
```
4. The next iteration continues from `begin[2]` and splits that sublist in the same way; this repeats until `begin[i] == end[i]`.
```graphviz
digraph "Linked List" {
rankdir = "LR";
node5[shape = circle, label = "5"];
node4[shape = circle, label = "4"];
list[shape = none, label = "*list"];
pivot[shape = none, label = "pivot", fontcolor = "red"];
left[shape = none, label = "left", fontcolor = "blue"];
right[shape =none, label = "right", fontcolor = "orange"];
null[shape = none, label = "Null"];
list -> node4;
pivot -> node4;
left -> null;
right -> node5 -> null;
}
```
```graphviz
digraph "Linked List" {
node3[shape = circle, label = "3"];
node5[shape = circle, label = "5"];
node4[shape = circle, label = "4"];
node1[shape = circle, label = "1"];
node2[shape = circle, label = "2"];
null[shape = none, label = "Null"];
begin [shape = none, label=<
<TABLE BORDER="1" CELLBORDER="1" CELLSPACING="0"> <TR>
<TD PORT="a0"> begin[0]</TD>
<TD PORT="a1"> begin[1]</TD>
<TD PORT="a2"> begin[2]</TD>
<TD PORT="a3"> begin[3]</TD>
<TD PORT="a4"> begin[4]</TD>
</TR> </TABLE>>];
begin:a0 -> node2 -> node1;
begin:a1 -> node3;
begin:a2 -> null;
begin:a3 -> node4;
begin:a4 -> node5;
}
```
At this point, since `begin[1] ~ begin[4]` cannot be divided anymore, we add them to `result`.
```c
list_add(&result, L);
```
For `begin[0]`, we continue the iteration, and its entries are added to `result` as well, so the final list is in ==ascending order==.
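Putting the four steps together, the whole routine can be sketched as follows. The fragments shown earlier are used as-is; the outer `while (i >= 0)` loop, the `2 * n` bound on the stack size, and the heap-allocated `begin`/`end` arrays are assumptions made to obtain a self-contained example, not necessarily the layout of the original quiz code.

```c
#include <stdlib.h>

typedef struct __node {
    struct __node *left, *right;
    struct __node *next;
    long value;
} node_t;

static void list_add(node_t **list, node_t *node)
{
    node->next = *list;
    *list = node;
}

static node_t *list_tail(node_t **left)
{
    while ((*left) && (*left)->next)
        left = &((*left)->next);
    return *left;
}

static int list_length(node_t **left)
{
    int n = 0;
    while (*left) {
        ++n;
        left = &((*left)->next);
    }
    return n;
}

/* Sketch: the loop structure and the 2 * n stack bound are assumptions. */
void quick_sort(node_t **list)
{
    int n = list_length(list);
    if (n < 2)
        return; /* nothing to sort */

    size_t max_level = 2 * (size_t) n;
    node_t **begin = malloc(max_level * sizeof(*begin));
    node_t **end = malloc(max_level * sizeof(*end));
    if (!begin || !end) {
        free(begin);
        free(end);
        return; /* assumption: leave the list unsorted on failure */
    }

    node_t *result = NULL, *left = NULL, *right = NULL;
    int i = 0;
    begin[0] = *list;
    end[0] = list_tail(list);

    while (i >= 0) {
        node_t *L = begin[i], *R = end[i];
        if (L != R) {
            /* Step 1: detach the pivot from the front of the sublist. */
            node_t *pivot = L;
            long value = pivot->value;
            node_t *p = pivot->next;
            pivot->next = NULL;

            /* Step 2: partition the remaining nodes around the pivot. */
            while (p) {
                node_t *cur = p;
                p = p->next;
                list_add(cur->value > value ? &right : &left, cur);
            }

            /* Step 3: push left, pivot, right; right is handled first. */
            begin[i] = left;
            end[i] = list_tail(&left);
            begin[i + 1] = pivot;
            end[i + 1] = pivot;
            begin[i + 2] = right;
            end[i + 2] = list_tail(&right);
            left = right = NULL;
            i += 2;
        } else {
            /* Step 4: an empty or single-node sublist is already sorted. */
            if (L)
                list_add(&result, L);
            i--;
        }
    }
    *list = result;
    free(begin);
    free(end);
}
```

Since the sublist with the largest values sits on top of the stack and `list_add()` prepends, the largest node reaches `result` first and ends up at the tail, which is why the final list is ascending.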
> But `left` and `right` in `node_t` seem useless...
[name=ChenYang Yeh] [time=Thu, Mar 7, 2024 14:43 PM]
### Further Improvements
`left` and `right` in `node_t` seem to be unused; maybe we can put them to good use.
### Using Linux Kernel List API
### How to Avoid Worst Cases in Quicksort?
## Linux Kernel Style Linked List
In [linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h), the linked list is actually implemented as a ==circular doubly linked list== built on `list_head`.
```c
struct list_head {
struct list_head *prev;
struct list_head *next;
};
```
Both `prev` and `next` are pointers of the `list_head` type, pointing to the type itself. In this, we can easily observe that from the beginning, the Linux kernel does not specify the data type to be stored, significantly increasing the flexibility of linked lists in the design of the Linux kernel.
```graphviz
digraph "Doubly Linked List" {
node[shape = record];
rankdir = "LR";
node1[label = "{<prev> prev |<next> next}"];
node2[label = "{<prev> prev |<next> next}"];
node3[label = "{<prev> prev |<next> next}"];
node1:next:c -> node2;
node2:prev:c -> node1;
node2:next:c -> node3;
node3:prev:c -> node2;
}
```
If we want to store integer data in the linked list, we only need to include the definition of `list_head` in list.h and then implement an additional struct.
```c
struct item {
int value;
struct list_head list;
};
```
```graphviz
digraph int {
rankdir = "LR";
subgraph "cluster int"
{
int_prev[shape = record, label = "value"];
int_next[shape = record, label = "{<prev> prev |<next> next}"];
}
}
```
To make manipulating this linked list more convenient, the Linux kernel provides corresponding APIs for our use.
```c
#define list_first_entry(ptr, type, field)
#define list_last_entry(ptr, type, field)
#define list_for_each(p, head)
#define list_for_each_safe(p, n, head)
void INIT_LIST_HEAD(struct list_head *list)
void __list_add(struct list_head *new,
struct list_head *prev,
struct list_head *next)
void list_add(struct list_head *_new, struct list_head *head)
void __list_del(struct list_head *entry)
void list_del(struct list_head *entry)
void list_move(struct list_head *entry, struct list_head *head)
```
### `list_first_entry() / list_last_entry()`
Return the first and the last entry of the linked list.
Since the linked list is circular and doubly linked, we can simply define the macros as
```c
#define list_first_entry(ptr, type, field) list_entry((ptr)->next, type, field)
#define list_last_entry(ptr, type, field) list_entry((ptr)->prev, type, field)
```
### `list_for_each()`
It uses a `for` loop to traverse all entries of the list, but ==nodes cannot be removed== during the traversal.
The loop ends once `p == head`.
```c
#define list_for_each(p, head) for (p = (head)->next; p != (head); p = p->next)
```
### `list_for_each_safe()`
The `for` loop uses two variables.
* `p`: points to the current entry.
* `n`: points to the next entry.
Since `n` is saved before the loop body runs, we can remove the current node during traversal.
```c
#define list_for_each_safe(p, n, head) \
for (p = (head)->next, n = p->next; p != (head); p = n, n = p->next)
```
### `INIT_LIST_HEAD()`
Initialize the head of the list so that both pointers refer to the head itself, which represents an empty list.
```c
void INIT_LIST_HEAD(struct list_head *list)
{
list->next = list;
list->prev = list;
}
```
### `__list_add() / list_add()`
Add the node `new` between the previous node `prev` and the next node `next`.
```c
void __list_add(struct list_head *new,
struct list_head *prev,
struct list_head *next)
{
next->prev = new;
new->next = next;
new->prev = prev;
prev->next = new;
}
```
Thus, `list_add` inserts the node `_new` right after `head`, that is, at ==the head== of the list.
```c
void list_add(struct list_head *_new, struct list_head *head)
{
__list_add(_new, head, head->next);
}
```
### `__list_del() / list_del()`
Remove the node `entry` from the list.
```c
void __list_del(struct list_head *entry)
{
entry->next->prev = entry->prev;
entry->prev->next = entry->next;
}
```
`list_del()` additionally sets both pointers of `entry` to `NULL` after it is removed from the list.
```c
void list_del(struct list_head *entry)
{
__list_del(entry);
entry->next = entry->prev = NULL;
}
```
### `list_move()`
Move the node `entry` to the new linked list `head`.
```c
void list_move(struct list_head *entry, struct list_head *head)
{
__list_del(entry);
list_add(entry, head);
}
```
So it is just divided into 2 steps.
* Remove `entry` from the old list.
* Add it to the new list `head`.
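The pieces above can be exercised together in a small sketch. It repeats the definitions from this section in a self-contained form (with `list_entry` written via the standard `offsetof()`), and `list_is_empty()` is a hypothetical helper added only for the check; it is not claimed to match the kernel's own helper.

```c
#include <stddef.h>

struct list_head {
    struct list_head *prev;
    struct list_head *next;
};

#define list_entry(ptr, type, field) \
    ((type *) ((char *) (ptr) - offsetof(type, field)))
#define list_first_entry(ptr, type, field) list_entry((ptr)->next, type, field)
#define list_last_entry(ptr, type, field) list_entry((ptr)->prev, type, field)

static void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

/* Insert _new right after head. */
static void list_add(struct list_head *_new, struct list_head *head)
{
    struct list_head *next = head->next;
    next->prev = _new;
    _new->next = next;
    _new->prev = head;
    head->next = _new;
}

/* Unlink entry from its list and clear its pointers. */
static void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = entry->prev = NULL;
}

static void list_move(struct list_head *entry, struct list_head *head)
{
    list_del(entry);
    list_add(entry, head);
}

struct item {
    int value;
    struct list_head list;
};

/* Hypothetical helper: 1 if the list has no entries, 0 otherwise. */
static int list_is_empty(const struct list_head *head)
{
    return head->next == head;
}
```

For example, after adding items `1` and then `2` to a list, `list_first_entry()` yields the item with value `2` and `list_last_entry()` the one with value `1`, because `list_add()` always inserts right after the head.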
## Problem `2`: Timsort
## Linked List for Hash Table
Consider the following structure definitions
```c
struct hlist_node {
struct hlist_node *next, **pprev;
};
```
```c
struct hlist_head {
struct hlist_node *first;
};
```
where `hlist_head` points to the first element of the hlist.
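Since `pprev` is **an indirect pointer** holding the address of the pointer that points to the node itself (the `next` of the previous node, or `first` in the head), we may illustrate the structure like this.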
```graphviz
digraph G {
rankdir = LR;
splines = false;
node[shape = "record"]
list_head[label = "<m>list_head | <n>first"]
node_1[label = "<m>node1 | {<p>pprev | <n>next}", group = list];
node_2[label = "<m>node2 | {<p>pprev | <n>next}", group = list];
node_3[label = "<m>node3 | {<p>pprev | <n>next}", group = list];
NULL_2[shape = plaintext, label = "NULL", group = list]
list_head -> node_1:m;
node_1:p -> list_head:n;
node_1:n -> node_2:m;
node_2:p -> node_1:n;
node_2:n -> node_3:m;
node_3:p -> node_2:n;
node_3:n -> NULL_2;
}
```
and again, there are several relevant APIs we can use
```c
#define container_of(ptr, type, member)
#define list_entry(ptr, type, member)
#define hlist_for_each(pos, head)
#define hlist_for_each_safe(pos, n, head)
static inline void INIT_HLIST_HEAD(struct hlist_head *h)
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
static inline void hlist_del(struct hlist_node *n)
```
### `container_of() / list_entry()`
Given a pointer to a member, the structure type and the member name, it returns ==the pointer to the beginning of the structure==.
```c
#define container_of(ptr, type, member) \
((type *) ((char *) (ptr) - (size_t) & (((type *) 0)->member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
```
In [ISO/IEC 9899 §7.17, paragraph 3](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf#page=266), the C standard library provides a macro `offsetof()`, which returns **the offset** of the `member`.
> `offsetof(type, member-designator)` which expands to an integer constant expression that has type **size_t**, the value of which is the offset in bytes, to the structure member, from the beginning of its structure (designated by type).
>
`offsetof()` is commonly implemented as
```c
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE*)0)->MEMBER)
```
Thus,
```c
((type *) ((char *) (ptr) - (size_t) & (((type *) 0)->member)))
```
is the address of the beginning of the enclosing `type` object: the address of the member minus the offset of the member.
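One way to read `((type *) 0)->member` is to pretend that an object of `type` lives at address 0; the address of its `member` is then numerically equal to the member's offset, and nothing is actually read from memory. Strictly speaking, the standard does not define this null pointer trick, so the sketch below uses `offsetof()` from `<stddef.h>` instead. `struct demo` and `demo_from_node()` are hypothetical names introduced only for illustration.

```c
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next, **pprev;
};

/* Same idea as the macro above, written with the standard offsetof(). */
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

/* Hypothetical container: `node` is deliberately not the first member. */
struct demo {
    int val;
    struct hlist_node node;
};

/* Recover the enclosing struct demo from a pointer to its `node` member. */
static struct demo *demo_from_node(struct hlist_node *n)
{
    return container_of(n, struct demo, node);
}
```

Given `struct demo d`, the call `demo_from_node(&d.node)` returns `&d`: the member's address is moved back by the member's offset.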
> What is the meaning of `((TYPE*)0)->MEMBER`? I still cannot understand in [stackoverflow](https://stackoverflow.com/questions/13723422/why-this-0-in-type0-member-in-c).[name=ChenYang Yeh] [time=Sat, Mar 9, 2024 19:03 PM]
>
### `hlist_for_each()`
It uses a `for` loop to traverse all entries of the hlist, but ==nodes cannot be removed== during the traversal.
The loop ends once `pos == NULL`.
```c
#define hlist_for_each(pos, head) \
for (pos = (head)->first; pos; pos = pos->next)
```
### `hlist_for_each_safe()`
The `for` loop uses two variables.
* `pos`: points to the current entry.
* `n`: points to the next entry.
Thus, it allows us to remove the node during traversal.
```c
#define hlist_for_each_safe(pos, n, head)        \
    for (pos = (head)->first; pos && ({          \
                                  n = pos->next; \
                                  true;          \
                              });                \
         pos = n)
```
Note that `({ n = pos->next; true; })` is a GNU C statement expression, whose value is the value of its last expression. So it assigns `pos->next` to `n`, then evaluates to `true`.
### `INIT_HLIST_HEAD()`
Initialize `first` of the `hlist_head` to `NULL`.
```c
static inline void INIT_HLIST_HEAD(struct hlist_head *h)
{
h->first = NULL;
}
```
### `hlist_add_head()`
It adds node `n` ==at the beginning== of the hlist `h`.
```c
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
if (h->first)
h->first->pprev = &n->next;
n->next = h->first;
n->pprev = &h->first;
h->first = n;
}
```
If the list is nonempty, it first makes the `pprev` of the current first node point to `n->next`. It then sets `next` and `pprev` of `n` to `h->first` and `&h->first` respectively, and finally makes `h->first` point to `n`.
### `hlist_del()`
Remove the node `n` from the list.
```c
void hlist_del(struct hlist_node *n)
{
struct hlist_node *next = n->next, **pprev = n->pprev;
*pprev = next;
if (next)
next->pprev = pprev;
}
```
First define `next` as a pointer to the entry after `n`, and `pprev` as the indirect pointer stored in `n`, which holds the address of the pointer that points to `n`.
To remove the node `n`, simply store `next` into `*pprev` and, if `next` is not `NULL`, set the `pprev` of the next node to `pprev`.
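The benefit of `pprev` is that removing the first node needs no special case: `*pprev` is either the `next` of the previous node or `first` in the head. The following sketch repeats the functions above in a self-contained form; `hlist_count()` is a hypothetical helper added only to inspect the result.

```c
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next, **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

static void INIT_HLIST_HEAD(struct hlist_head *h)
{
    h->first = NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    if (h->first)
        h->first->pprev = &n->next;
    n->next = h->first;
    n->pprev = &h->first;
    h->first = n;
}

static void hlist_del(struct hlist_node *n)
{
    struct hlist_node *next = n->next, **pprev = n->pprev;
    *pprev = next; /* works for the first node too: pprev is &h->first */
    if (next)
        next->pprev = pprev;
}

/* Hypothetical helper: number of nodes in the bucket. */
static int hlist_count(const struct hlist_head *h)
{
    int count = 0;
    for (const struct hlist_node *pos = h->first; pos; pos = pos->next)
        count++;
    return count;
}
```

With three nodes added as `c -> b -> a`, deleting `c` makes `h.first` point to `b` and `b.pprev` point to `&h.first`, using exactly the same two assignments as deleting a node in the middle.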
## Problem `3`: Building Tree
### How it works
We first define the tree node `TreeNode` and the hash table node `order_node`.
```c
struct TreeNode {
int val;
struct TreeNode *left, *right;
};
```
```c
struct order_node {
struct hlist_node node;
int val;
int idx;
};
```
Every `order_node` is added into the hash table by the following `node_add()`.
#### `node_add()`
This function adds the node `on` into the hash table `heads`.
`idx` is the index of the value in the `inorder` array, and `hash` is the result of the ==hash function==.
$$
h(x) = |x| \mod \textrm{size}.
$$
After $h(x)$ is calculated, `on` would be added into the corresponding bucket.
```c
static inline void node_add(int val,
int idx,
int size,
struct hlist_head *heads)
{
struct order_node *on = malloc(sizeof(*on));
on->val = val;
on->idx = idx;
int hash = (val < 0 ? -val : val) % size;
hlist_add_head(&on->node, &heads[hash]);
}
```
```graphviz
digraph hash_table {
nodesep=.05;
rankdir=LR;
node [shape=record];
head[label = " <h0> in_head[0] | <h1> in_head[1] | <h2> in_head[2] | <h3> in_head[3] | <h4> in_head[4] ",height=3];
in_head[shape = none, label = "in_head"];
node0[label = "9"];
node1[label = "3"];
node2[label = "15"];
node3[label = "20"];
node4[label = "7"];
in_head -> head:h0;
head:h0 -> node3 -> node2;
head:h2 -> node4;
head:h3 -> node1;
head:h4 -> node0;
}
```
#### `find()`
Find the node in the hash table, then return its index in the inorder array. Return `-1` if the node is not found.
```c
static int find(int num, int size, const struct hlist_head *heads)
{
struct hlist_node *p;
int hash = (num < 0 ? -num : num) % size;
hlist_for_each (p, &heads[hash]) {
struct order_node *on = list_entry(p, struct order_node, node);
if (num == on->val)
return on->idx;
}
return -1;
}
```
#### `dfs()`
`dfs()` constructs a tree ==in recursive manner==.
Both `preorder` and `inorder` are arrays of integers, which store the preorder and inorder sequences given by the user.
`pre_low`, `pre_high`, `in_low` and `in_high` record ==the range== of the preorder and inorder arrays at each level of recursion, which is quite important in this implementation.
```graphviz
digraph lists {
preorder[shape = record, label = " <h0> 3 | <h1> 9 | <h2> 20 | <h3> 15 | <h4> 7 ",height=0.5, width=3];
inorder[shape = record, label = " <h0> 9 | <h1> 3 | <h2> 15 | <h3> 20 | <h4> 7 ",height=0.5, width=3];
pre[shape = none, label = "preorder"];
inr[shape = none, label = "inorder"];
pre_low[shape = none, label = "pre_low", fontcolor="red"];
pre_high[shape = none, label = "pre_high", fontcolor="blue"];
in_low[shape = none, label = "in_low", fontcolor="red"];
in_high[shape = none, label = "in_high", fontcolor="blue"];
pre -> preorder:h0;
inr -> inorder:h0;
pre_low -> preorder:h0;
pre_high -> preorder:h4;
in_low -> inorder:h0;
in_high -> inorder:h4;
}
```
```c
static struct TreeNode *dfs(int *preorder,
int pre_low,
int pre_high,
int *inorder,
int in_low,
int in_high,
struct hlist_head *in_heads,
int size)
```
Once it is called, it first checks whether either range is empty (`low > high`); if so, it directly returns `NULL`.
```c
{
if (in_low > in_high || pre_low > pre_high)
return NULL;
```
Then it allocates memory for one `struct TreeNode`, stores the pointer in `tn`, and uses `preorder[pre_low]` as the root (or parent) node.
```c
struct TreeNode *tn = malloc(sizeof(*tn));
tn->val = preorder[pre_low];
```
Finally, use `find()` to find the `idx` of the node in the `inorder` array, then recursively build the left and right children based on the array ranges and `idx`.
```c
int idx = find(preorder[pre_low], size, in_heads);
tn->left = dfs(preorder, pre_low + 1, pre_low + (idx - in_low), inorder,
in_low, idx - 1, in_heads, size);
tn->right = dfs(preorder, pre_high - (in_high - idx - 1), pre_high, inorder,
idx + 1, in_high, in_heads, size);
return tn;
}
```
The rule of [preorder traversal](https://www.geeksforgeeks.org/tree-traversals-inorder-preorder-and-postorder/) in a binary tree is to visit the current (or parent) node first, then traverse the left and right children.
```c
void preOrder(struct TreeNode *root)
{
    if (!root)
        return;
    printf("%d ", root->val); /* parent first */
    preOrder(root->left);
    preOrder(root->right);
}
```
> Maybe it can be done **iteratively**.[name=ChenYang Yeh] [time=Sat, Mar 9, 2024 18:23 PM]
And in [inorder traversal](https://www.geeksforgeeks.org/tree-traversals-inorder-preorder-and-postorder/), it visits the left child first, then the parent node, and then the right child.
```c
void inOrder(struct TreeNode *root)
{
    if (!root)
        return;
    inOrder(root->left);
    printf("%d ", root->val); /* parent between the two subtrees */
    inOrder(root->right);
}
```
So, the first element of each `preorder` range is ==the root (or parent) node== of the corresponding subtree. Once we determine the parent node, we can **partition** the inorder array based on the `idx` of the parent node. Elements with a smaller index in the inorder range belong to the left subtree, and those with a larger index belong to the right subtree.
$$
\textrm{inorder range} =
\begin{cases}
\textrm{in_low} \sim \textrm{idx} - 1, & \textrm{left} \\
\textrm{idx} + 1 \sim \textrm{in_high}, & \textrm{right}
\end{cases}
$$
```graphviz
digraph SimpleTree {
A[label="3"];
B1[label="9"];
B2[label="20"];
C3[label="15"];
C4[label="7"];
A -> B1;
A -> B2;
B2 -> C3;
B2 -> C4;
}
```
We can verify the result by post ordered tree traversal.
> I still have not figured out why the range of the preorder array should be
$$
\textrm{preorder range} =
\begin{cases}
\textrm{pre_low} + 1 \sim \textrm{pre_low} + (\textrm{idx} - \textrm{in_low}), & \textrm{left} \\
\textrm{pre_high} - (\textrm{in_high} - \textrm{idx} - 1) \sim \textrm{pre_high}, & \textrm{right}
\end{cases}
$$
[name=ChenYang Yeh] [time=Sat, Mar 9, 2024 19:04 PM]
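A possible way to see the preorder range is to count nodes. The inorder range `in_low ~ idx - 1` holds `idx - in_low` nodes, so the left subtree has that many nodes; the range `idx + 1 ~ in_high` holds `in_high - idx` nodes for the right subtree. In preorder, the root at `pre_low` is followed by all left-subtree nodes and then all right-subtree nodes. The left subtree therefore occupies `pre_low + 1 ~ pre_low + (idx - in_low)`, and the right subtree takes the last `in_high - idx` slots, starting at `pre_high - (in_high - idx) + 1`, which equals `pre_high - (in_high - idx - 1)`. The sketch below makes the two sizes explicit; it replaces the hash table lookup with a linear search (`find_index()`, a hypothetical stand-in for `find()`), and `build()` is likewise an illustrative name.

```c
#include <stdlib.h>

struct TreeNode {
    int val;
    struct TreeNode *left, *right;
};

/* Linear search standing in for the hash-table based find(). */
static int find_index(const int *inorder, int lo, int hi, int val)
{
    for (int i = lo; i <= hi; i++)
        if (inorder[i] == val)
            return i;
    return -1;
}

static struct TreeNode *build(const int *preorder, int pre_low, int pre_high,
                              const int *inorder, int in_low, int in_high)
{
    if (in_low > in_high || pre_low > pre_high)
        return NULL;

    int idx = find_index(inorder, in_low, in_high, preorder[pre_low]);
    if (idx < 0)
        return NULL; /* assumption: inconsistent input yields an empty tree */

    struct TreeNode *tn = malloc(sizeof(*tn));
    if (!tn)
        return NULL;
    tn->val = preorder[pre_low];

    int left_size = idx - in_low;   /* nodes in in_low ~ idx - 1 */
    int right_size = in_high - idx; /* nodes in idx + 1 ~ in_high */

    /* Left subtree: the left_size entries right after the root. */
    tn->left = build(preorder, pre_low + 1, pre_low + left_size,
                     inorder, in_low, idx - 1);
    /* Right subtree: the last right_size entries of the range. */
    tn->right = build(preorder, pre_high - right_size + 1, pre_high,
                      inorder, idx + 1, in_high);
    return tn;
}
```

For `preorder = {3, 9, 20, 15, 7}` and `inorder = {9, 3, 15, 20, 7}`, the root `3` is found at `idx = 1`, so the left subtree has one node (`preorder[1]`) and the right subtree has three (`preorder[2..4]`).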
### Further Improvements
### Linux Kernel `cgroups`
preorder walk.
## Problem `4`: LRU Cache
Least Recently Used (LRU) cache is a type of caching mechanism used in computer systems, particularly in the context of managing memory or storage.
The goal of an LRU cache is to keep track of the usage patterns of various items and prioritize the retention of the most recently used items while evicting the least recently used items when the cache reaches its capacity.
An LRU cache can also help lower the rate of **cache misses** in certain scenarios, because it exploits **temporal locality**.
```
Input
["LRUCache", "put", "put", "get", "put", "get", "put", "get", "get", "get"]
[[2], [1, 1], [2, 2], [1], [3, 3], [2], [4, 4], [1], [3], [4]]
Output
[null, null, null, 1, null, -1, null, -1, 3, 4]
```
```graphviz
digraph G {
node[shape=box, width = 1, height = .50 ];
" | " -> "1 | *" -> "*1 | 2" -> "1 | *2";
node[shape=box, width = 1, height = .50 ];
"*1 | 3" -> "4 | *3" -> "*4 | 3" -> " 4 | *3";
}
```
`*` marks the least recently used entry, which is the next one to be evicted.
### How it works
We first define the structure of `LRUCache` and `LRUNode`.
```c
typedef struct {
int capacity;
int count;
struct list_head dhead;
struct hlist_head hhead[];
} LRUCache;
```
```c
typedef struct {
int key;
int value;
struct hlist_node node;
struct list_head link;
} LRUNode;
```
#### `lRUCacheCreate()`
Create an `LRUCache` object, and initialize `dhead` and every `hlist_head` in the array.
```c
LRUCache *lRUCacheCreate(int capacity)
{
LRUCache *cache = malloc(2 * sizeof(int) + sizeof(struct list_head) +
capacity * sizeof(struct list_head));
cache->capacity = capacity;
cache->count = 0;
INIT_LIST_HEAD(&cache->dhead);
for (int i = 0; i < capacity; i++)
INIT_HLIST_HEAD(&cache->hhead[i]);
return cache;
}
```
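Note that `LRUCache` ends with a ==flexible array member== (`hhead[]`), so `sizeof(LRUCache)` does not include any space for the buckets. `malloc(sizeof(LRUCache))` alone is therefore not enough; the space for `capacity` buckets must be added explicitly.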
> Perhaps `malloc(2 * sizeof(int) + sizeof(struct list_head) + capacity * sizeof(struct hlist_head))` is more reasonable. [name=ChenYang Yeh] [time=Sat, Mar 9, 2024 23:52 PM]
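A common idiom for a structure with a flexible array member is to allocate `sizeof` the structure plus the size of the trailing elements, which avoids adding up the member sizes by hand. The sketch below shows one possible form; `cache_alloc()` is a hypothetical variant of `lRUCacheCreate()`, and the capacity and allocation checks are added assumptions.

```c
#include <stdlib.h>

struct list_head {
    struct list_head *prev, *next;
};

struct hlist_node;

struct hlist_head {
    struct hlist_node *first;
};

typedef struct {
    int capacity;
    int count;
    struct list_head dhead;
    struct hlist_head hhead[]; /* flexible array member */
} LRUCache;

/* Hypothetical variant of lRUCacheCreate() using the sizeof idiom. */
static LRUCache *cache_alloc(int capacity)
{
    if (capacity <= 0)
        return NULL;

    /* Fixed part plus `capacity` buckets of the element type. */
    LRUCache *cache = malloc(sizeof(*cache) +
                             (size_t) capacity * sizeof(cache->hhead[0]));
    if (!cache)
        return NULL;

    cache->capacity = capacity;
    cache->count = 0;
    cache->dhead.prev = cache->dhead.next = &cache->dhead;
    for (int i = 0; i < capacity; i++)
        cache->hhead[i].first = NULL;
    return cache;
}
```

Writing the element size as `sizeof(cache->hhead[0])` keeps the allocation correct even if the bucket type changes later.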
#### `lRUCacheFree()`
Free `LRUCache` and its relevant `LRUNode`.
```c
void lRUCacheFree(LRUCache *obj)
{
struct list_head *pos, *n;
list_for_each_safe (pos, n, &obj->dhead) {
LRUNode *cache = list_entry(pos, LRUNode, link);
list_del(&cache->link);
free(cache);
}
free(obj);
}
```
```graphviz
digraph {
rankdir = "LR";
subgraph "cluster LRUNode1"
{
key1[shape = record, label = "key"];
value1[shape = record, label = "value"];
list_head1[shape = record, label = "{<prev> prev |<next> next}"];
hlist_head1[shape = record, label = "{<pprev> pprev |<next> next}"];
}
subgraph "cluster LRUCache"
{
capacity[shape = record, label = "capacity"];
count[shape = record, label = "count"];
list_head[shape = record, label = "{<prev> prev |<next> next}"];
hlist_head[shape = record, label = "{| | ... | |}"];
}
subgraph "cluster LRUNode2"
{
key2[shape = record, label = "key"];
value2[shape = record, label = "value"];
list_head2[shape = record, label = "{<prev> prev |<next> next}"];
hlist_head2[shape = record, label = "{<pprev> pprev |<next> next}"];
}
list_head1:next -> list_head:prev;
list_head:prev -> list_head1:next;
list_head:next -> list_head2:prev;
list_head2:prev -> list_head:next;
}
```
#### `lRUCacheGet()`
Move the `LRUNode` with `key` found in the hash table to the front of `dhead` of `LRUCache`, then return its `value`.
```c
int lRUCacheGet(LRUCache *obj, int key)
{
int hash = key % obj->capacity;
struct hlist_node *pos;
hlist_for_each (pos, &obj->hhead[hash]) {
LRUNode *cache = list_entry(pos, LRUNode, node);
if (cache->key == key) {
list_move(&cache->link, &obj->dhead);
return cache->value;
}
}
return -1;
}
```
It first searches for the `LRUNode` with `key` in the corresponding bucket of the hash table. If the node is found, it is moved to the beginning of `dhead` and its `value` is returned; if not, the function returns `-1`.
The hash function is
$$
h(\textrm{key}) = \textrm{key } \% \textrm{ capacity}.
$$
So at this point, we know that the hash table in `LRUCache` is used to store the `LRUNode`s for ==expected $O(1)$ lookup complexity==.
> Of course the hash function can be redesigned, or it may not meet [SUHA condition](https://en.wikipedia.org/wiki/SUHA_(computer_science)) when `capacity` is large.
[name=ChenYang Yeh] [time=Sun, Mar 10, 2024 01:01 AM]
#### `lRUCachePut()`
There are two circumstances in `lRUCachePut()`.
* If the `LRUNode` we want can be found in the hash table of `obj`, then move it to the beginning of `dhead` and change ==its `value`== to `value`.
```c
void lRUCachePut(LRUCache *obj, int key, int value)
{
LRUNode *cache = NULL;
int hash = key % obj->capacity;
struct hlist_node *pos;
hlist_for_each (pos, &obj->hhead[hash]) {
LRUNode *c = list_entry(pos, LRUNode, node);
if (c->key == key) {
list_move(&c->link, &obj->dhead);
cache = c;
}
}
```
* If the `LRUNode` we want cannot be found in the hash table,
    * If the cache is full (`count == capacity`), we reuse the last `LRUNode` in `dhead` (the least recently used one): move it to the beginning of `dhead`, move it to the bucket of the new key in the hash table, and overwrite its key and value.
    * If the cache isn't full, allocate memory for a new `LRUNode`, and add it to `dhead` and the hash table.
```c
if (!cache) {
if (obj->count == obj->capacity) {
cache = list_last_entry(&obj->dhead, LRUNode, link);
list_move(&cache->link, &obj->dhead);
hlist_del(&cache->node);
hlist_add_head(&cache->node, &obj->hhead[hash]);
} else {
cache = malloc(sizeof(LRUNode));
hlist_add_head(&cache->node, &obj->hhead[hash]);
list_add(&cache->link, &obj->dhead);
obj->count++;
}
cache->key = key;
}
cache->value = value;
}
```
### Improvements
### LRU in Linux Kernel
## Problem `5`: Find nth Bit
:::danger
You shall explain why such routine exists in the Linux kernel and other applications.
:::
### How it works
We need to figure out the following macro first.
```c
#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
#define __const_hweight8(w)
```
#### `BITMAP_LAST_WORD_MASK()`
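The purpose of this macro is to generate a bitmask that keeps only the valid low bits in the last word of a bitmap with `nbits` bits. Note that most Unix and Unix-like systems use the [LP64](https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models) data model, where `unsigned long` is 64 bits wide. If we set `BITS_PER_LONG` to 32 but keep `~0UL`, the shift count no longer matches the width of the operand, and the low 32 bits of the result are all ones instead of the expected mask.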
```
$ ./bitmap
Bitmask for setting the last 5 bits to 1: 0b11111111111111111111111111111111
```
We need to use a 32-bit unsigned integer of our own instead.
```c
#include <stdio.h>
#include <stdint.h>
uint32_t zero = 0;
#define BITS_PER_LONG 32 // Assume 32-bit words for this example
#define BITMAP_LAST_WORD_MASK(nbits) (~(zero) >> (-(nbits) & (BITS_PER_LONG - 1)))
```
```
$ ./bitmap
Bitmask for setting the last 5 bits to 1: 0b00000000000000000000000000011111
```
#### `__const_hweight8(w)`
This macro calculates the number of set bits (bits equal to 1) in an 8-bit number, also known as its Hamming weight.
```c
#define __const_hweight8(w) \
((unsigned int) ((!!((w) & (1ULL << 0))) + (!!((w) & (1ULL << 1))) + \
(!!((w) & (1ULL << 2))) + (!!((w) & (1ULL << 3))) + \
(!!((w) & (1ULL << 4))) + (!!((w) & (1ULL << 5))) + \
(!!((w) & (1ULL << 6))) + (!!((w) & (1ULL << 7)))))
```
The `!!` (double logical negation) ensures that each term is ==only 0 or 1==, regardless of which bit position was tested.
> `!!(0b00000010)` evaluates to 1 instead of 2.
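For comparison, the same sum can be written as a loop. This is only an illustrative sketch (`hweight8_loop()` is a hypothetical name); unlike the macro, it is not a constant expression, so it cannot be evaluated by the preprocessor or used where a compile-time constant is required.

```c
#include <stdint.h>

/* Loop form of the same idea: add up one 0/1 flag per bit position. */
static unsigned int hweight8_loop(uint8_t w)
{
    unsigned int count = 0;
    for (unsigned int bit = 0; bit < 8; bit++)
        count += !!(w & (1u << bit));
    return count;
}
```

Each iteration contributes exactly 0 or 1 thanks to `!!`, so `hweight8_loop(0x02)` is 1 and `hweight8_loop(0xFF)` is 8.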
### Find nth Bit in Linux Kernel