# 2021q1 Homework2 (quiz2)
contributed by < `ilkclord` >
## Schedule
* [ ] **Test 1**
* [x] Working principle
* [x] Improvement
* [ ] Standalone meta
* [ ] Explanation
* [ ] Compare
* [ ] **Test 2**
* [x] Working principle
* [x] is_power_of_2
* [ ] Slab
* [ ] **Test 3**
* [x] Working principle
* [ ] Improvement
* [ ] Linux kernel research
* [ ] **Test 4**
* [ ] Working principle
* [ ] POSIX Thread testing
* [ ] chriso/intern research
## Test 1 - List_sort
### Working Principle
* **offsetof**
Returns the distance (in bytes) from the start of the struct to the given member.
Sample code
```clike
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
int main(int argc, char const *argv[])
{
    struct s {
        int a;
        int b;
        int c;
    };
    printf("offsetof a = %lu\n", offsetof(struct s, a));
    printf("offsetof b = %lu\n", offsetof(struct s, b));
    printf("offsetof c = %lu\n", offsetof(struct s, c));
    struct s *s = malloc(sizeof(struct s)); /* not sizeof(s): that is only the pointer size */
    printf("the position of struct is %p\n", s);
    printf("the position of member b is %p\n", &(s->b));
    return 0;
}
```
Result :
```clike
offsetof a = 0
offsetof b = 4
offsetof c = 8
the position of struct is 006C1AC0
the position of member b is 006C1AC4
```
* **container_of**
```clike
#define container_of(ptr, type, member)                            \
    __extension__({                                                \
        const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
        (type *) ((char *) __pmember - offsetof(type, member));    \
    })
```
From the code
```clike
(type *) ((char *) __pmember - offsetof(type, member));
```
The member pointer is cast to `char *` and `offsetof(type, member)` is subtracted from it, which moves the pointer back to the starting position of the struct, i.e. the address of the enclosing struct.
* **list_entry**
```clike
#define list_entry(node, type, member) container_of(node, type, member)
```
Returns the address of the structure that embeds the given `list_head` node.
* **structure**
```clike
struct list_head {
    struct list_head *prev, *next;
};

typedef struct __element {
    char *value;
    struct __element *next;
    struct list_head list;
} list_ele_t;

typedef struct {
    list_ele_t *head; /* Linked list of elements */
    list_ele_t *tail;
    size_t size;
    struct list_head list;
} queue_t;
```
* **get_middle**
```clike
#define list_for_each(node, head) \
    for (node = (head)->next; node != (head); node = node->next)
```
```clike
static list_ele_t *get_middle(struct list_head *list)
{
    struct list_head *fast = list->next, *slow;
    list_for_each (slow, list) {
        if (fast->next == list || fast->next->next == list)
            break;
        fast = fast->next->next;
    }
    return list_entry(slow, list_ele_t, list);
}
```
The function finds the midpoint with the fast/slow pointer technique (Floyd's algorithm): `fast` advances two nodes for every node `slow` advances, so when `fast` reaches the end of the circular list, `slow` points at the middle.
* **list_cut_position**
```clike=
static inline void list_cut_position(struct list_head *head_to,
                                     struct list_head *head_from,
                                     struct list_head *node)
{
    struct list_head *head_from_first = head_from->next;

    if (list_empty(head_from))
        return;

    if (head_from == node) {
        INIT_LIST_HEAD(head_to);
        return;
    }

    head_from->next = node->next;
    head_from->next->prev = head_from;

    head_to->prev = node;
    node->next = head_to;
    head_to->next = head_from_first;
    head_to->next->prev = head_to;
}
```
The guards check two conditions: the empty case, and the case where `node` is the head itself (nothing to cut, so **head_to** is simply initialized empty). In the normal case, the list is cut into two separate lists: **head_to** receives the left part and **head_from** keeps the right part.
How this function moves the nodes:
Original state :
```graphviz
Digraph D{
node[shape = box];
Head_from -> Head_from_first ->list->tail ->Head_from
rankdir = LR
}
```
```graphviz
Digraph G{
node[shape = box];
Head_to ;
left_list ->_node -> node_next ->right_list
rankdir = LR
}
```
After the operations on lines 15 and 16:
```graphviz
Digraph G{
node[shape = box];
Head_from -> node_next ->right_list ->Head_from
rankdir = LR ;
}
```
Finally, lines 18 to 21:
```graphviz
Digraph G{
node[shape = box]
Head_to ->Head_from_first ->left_list->_node -> Head_to
rankdir = LR ;
}
```
The two lists are now separated, and each one still preserves the doubly linked list structure!
* **list_merge**
```clike
static void list_merge(struct list_head *lhs,
                       struct list_head *rhs,
                       struct list_head *head)
{
    INIT_LIST_HEAD(head);
    if (list_empty(lhs)) {
        list_splice_tail(rhs, head);
        return;
    }
    if (list_empty(rhs)) {
        list_splice_tail(lhs, head);
        return;
    }

    while (!list_empty(lhs) && !list_empty(rhs)) {
        char *lv = list_entry(lhs->next, list_ele_t, list)->value;
        char *rv = list_entry(rhs->next, list_ele_t, list)->value;
        struct list_head *tmp = strcmp(lv, rv) <= 0 ? lhs->next : rhs->next;
        list_del(tmp);
        list_add_tail(tmp, head);
    }
    list_splice_tail(list_empty(lhs) ? rhs : lhs, head);
}
```
The function has two guards for the case where either the left or the right list is empty. When one of them is empty there is nothing to merge: the other list is spliced to the tail of **head** and we return.
In the normal case, **tmp** records the node with the smaller value (`strcmp(lv, rv) <= 0` picks the left node on ties, which keeps the sort stable). We delete **tmp** from its own list and append it to the tail of **head**, which becomes the head node of the merged list.
* **list_merge_sort**
```clike
void list_merge_sort(queue_t *q)
{
    if (list_is_singular(&q->list))
        return;

    queue_t left;
    struct list_head sorted;
    INIT_LIST_HEAD(&left.list);
    list_cut_position(&left.list, &q->list, &get_middle(&q->list)->list);
    list_merge_sort(&left);
    list_merge_sort(q);
    list_merge(&left.list, &q->list, &sorted);
    INIT_LIST_HEAD(&q->list);
    list_splice_tail(&sorted, &q->list);
}
```
The function has a guard for the singular list, which needs no sorting.
For a non-singular list, the list is cut at the middle, and the two halves are recorded by **left** and **q->list**. Since merging requires two sorted lists, both halves are sorted recursively and then merged into **sorted**.
After all operations are done, the sorted list is spliced back into **q->list**.
### Improvement
When going through the code , we can find out there are some useless member in the structure .In **queue_t** ,***tail** can get by **head->list->pre** ,and merge_sort doesn't call **size** to use . Both of them can be removed . **list** may be useless if we operate one queue each time .
And ***next** in **element** is useless because we had a structure **list_head** to record the next node and the previous node .
The improve structure for one-queue at the sametime :
```clike
struct list_head {
    struct list_head *prev, *next;
};

typedef struct __element {
    char *value;
    struct list_head list;
} list_ele_t;

typedef struct {
    list_ele_t *head; /* Linked list of elements */
    struct list_head list;
} queue_t;
```
## Test 2 - Bit-operation
```clike
uint16_t func(uint16_t N) {
    /* change all right side bits to 1 */
    N |= N >> 1;
    N |= N >> 2;
    N |= N >> 4;
    N |= N >> 8;
    return (N + 1) >> 1;
}
```
### Working Principle
We want the largest 2^n^ such that 2^n^ <= N. Since 2^n^ has exactly one bit set, the answer is determined by the most significant set bit of N, i.e. what fls() (find last set) locates. The operations therefore must keep the position of that bit unchanged.
```clike
N |= N >> x
```
This operation keeps the position of the most significant bit: the shifted value is always smaller than **N**, and since **1 | 0 = 1**, ORing can only turn on bits to its right.
Then, to make the result a clean 2^n^, consider the carry in
```clike
00111111 + 00000001 = 01000000
```
After the OR steps, every bit to the right of the most significant bit is 1, so adding 1 carries into the next position, and a final right shift by 1 lands exactly on the target power of two:
```clike
(N + 1) >> 1;
```
The shift amounts go 1, 2, 4, 8 because each step doubles the width of the run of 1s: after `N |= N >> 1` the top two bits are set, so `N |= N >> 2` can fill the next two bits, and so on.
* **Overflow**
If N had every bit set and no integer promotion rescued the expression, `N + 1` would wrap around to 0. To prevent it, we can change the order of the operations:
```clike
return (N >> 1) + 1;
```
Note that writing `N >> 1 + 1` without parentheses would parse as `N >> (1 + 1)`, because `+` binds tighter than `>>`.
### Is_power_of_2
* **is_power_of_2**
```clike
static inline __attribute__((const))
bool is_power_of_2(unsigned long n)
{
    return (n != 0 && ((n & (n - 1)) == 0));
}
```
For 2^x^, exactly one bit of the value is set, for example `01000000`. Subtracting 1 from it produces a run of 1s below that bit:
```clike
01000000 - 00000001 = 00111111
```
Since the two values share no set bits, **n & (n - 1)** is 0 exactly when n is a power of two (the `n != 0` check excludes zero)!
:::danger
Fix the grammatical mistakes!
:notes: jserv
:::
* **roundup_pow_of_two**
```clike
static inline __attribute__((const))
unsigned long __roundup_pow_of_two(unsigned long n)
{
    return 1UL << fls_long(n - 1);
}
```
**1UL** is an unsigned long 1, and **fls_long** returns the position of the most significant set bit. The function shifts **1** left by that position, which rounds up because **1 << x** equals 2^x^. Subtracting 1 from **n** first handles the case where n is already a power of two, as the trace shows:
```clike
// input 8 (00x001000)
fls_long(8) = 4 ;
1 << 4 = 16 ;
fls_long(7) = 3 ;
1 << 3 = 8 ;
7 = 8 - 1;
// input 9 (00x001001)
fls_long(9) = 4 ;
1 << 4 = 16 ;
fls_long(8) = 4 ;
1 << 4 = 16 ;
```
From this trace we can see the function would also work for numbers that are not powers of two without subtracting 1, so the subtraction exists precisely for the 2^x^ case.
:::danger
Fix the grammatical mistakes!
:notes: jserv
:::
* **rounddown_pow_of_two**
```clike
static inline __attribute__((const))
unsigned long __rounddown_pow_of_two(unsigned long n)
{
    return 1UL << (fls_long(n) - 1);
}
```
To achieve our goal we only need the most significant set bit: shifting **1UL** left by **fls_long(n) - 1** puts the 1 exactly at that position.
### Slab Allocator
:::danger
Fix the grammatical mistakes!
:notes: jserv
:::
* **Background**
There are three different implementations in use today:
    * Slab Allocator: the original implementation, organized around per-CPU object caches.
    * Slub Allocator: an improvement over Slab; it aims to be cache-friendly, executes faster, and reduces the number of queues/chains used.
    * Slob Allocator: a minimal allocator for memory-scarce (e.g. embedded) systems, based on a **first-fit allocation algorithm**.
* **Is_power_of_2 in slab**
In `mm/slab_common.c`, from line 541 to line 570:
```clike
/* Create a cache during boot when no slab services are available yet */
void __init create_boot_cache(struct kmem_cache *s, const char *name,
                              unsigned int size, slab_flags_t flags,
                              unsigned int useroffset, unsigned int usersize)
{
    int err;
    unsigned int align = ARCH_KMALLOC_MINALIGN;

    s->name = name;
    s->size = s->object_size = size;

    /*
     * For power of two sizes, guarantee natural alignment for kmalloc
     * caches, regardless of SL*B debugging options.
     */
    if (is_power_of_2(size))
        align = max(align, size);
    s->align = calculate_alignment(flags, align, size);

    s->useroffset = useroffset;
    s->usersize = usersize;

    err = __kmem_cache_create(s, flags);

    if (err)
        panic("Creation of kmalloc slab %s size=%u failed. Reason %d\n",
              name, size, err);
    s->refcount = -1; /* Exempt from merging for now */
}
```
**is_power_of_2** comes into play here:
```clike
if (is_power_of_2(size))
    align = max(align, size);
s->align = calculate_alignment(flags, align, size);
```
**align** is the alignment requirement used for the cache. Since kmalloc cache sizes are powers of two, aligning a power-of-two-sized object to its own size guarantees natural alignment, so the object fits exactly.
Reference: <https://hammertux.github.io/slab-allocator>
Reference: <https://elixir.bootlin.com/linux/latest/source/mm/slab_common.c#L556>
## Test 3 - Bitcpy
### Working principle
* **bitcpy - setting**
```clike
size_t read_lhs = _read & 7;
size_t read_rhs = 8 - read_lhs;
const uint8_t *source = (const uint8_t *) _src + (_read / 8);

size_t write_lhs = _write & 7;
size_t write_rhs = 8 - write_lhs;
uint8_t *dest = (uint8_t *) _dest + (_write / 8);

static const uint8_t read_mask[] = {
    0x00, /* == 0    00000000b */
    0x80, /* == 1    10000000b */
    0xC0, /* == 2    11000000b */
    0xE0, /* == 3    11100000b */
    0xF0, /* == 4    11110000b */
    0xF8, /* == 5    11111000b */
    0xFC, /* == 6    11111100b */
    0xFE, /* == 7    11111110b */
    0xFF  /* == 8    11111111b */
};

static const uint8_t write_mask[] = {
    0xFF, /* == 0    11111111b */
    0x7F, /* == 1    01111111b */
    0x3F, /* == 2    00111111b */
    0x1F, /* == 3    00011111b */
    0x0F, /* == 4    00001111b */
    0x07, /* == 5    00000111b */
    0x03, /* == 6    00000011b */
    0x01, /* == 7    00000001b */
    0x00  /* == 8    00000000b */
};
```
`read_lhs`, `write_lhs`: the bit offset within the first byte where reading / writing starts (0 means byte-aligned).
`read_rhs`, `write_rhs`: the bits remaining in that byte, used to decide whether an access has to cross into the next byte.
`source`, `dest`: pointers to the starting bytes.
* **bitcpy - operation**
```clike
while (count > 0) {
    uint8_t data = *source++;
    size_t bitsize = (count > 8) ? 8 : count;
    if (read_lhs > 0) {
        data <<= read_lhs;
        if (bitsize > read_rhs)
            data |= (*source >> read_rhs);
    }

    if (bitsize < 8)
        data &= read_mask[bitsize];

    uint8_t original = *dest;
    uint8_t mask = read_mask[write_lhs];
    if (bitsize > write_rhs) {
        /* Cross multiple bytes */
        *dest++ = (original & mask) | (data >> write_lhs);
        original = *dest & write_mask[bitsize - write_rhs];
        *dest = original | (data << write_rhs);
    } else {
        // Since write_lhs + bitsize is never >= 8, no out-of-bound access.
        mask |= write_mask[write_lhs + bitsize];
        *dest++ = (original & mask) | (data >> write_lhs);
    }

    count -= bitsize;
}
```
`mask`: used to extract the bits of the destination byte that must be preserved.
`original`: records the original destination byte.
The first branch checks source alignment: if the read is not byte-aligned, the source bits are shifted into place and collected in `data`.
The second branch extracts the bits to be copied, leaving them in the form [copied bits] + [zeros].
The third branch distinguishes two cases: the write crosses a byte boundary, or it does not.
In the crossing case, `original & mask` keeps the existing bits and `|` merges in the copied bits; `dest++` then moves to the next byte, where copying continues.
In the non-crossing case, after masking, the copied bits are written into the single destination byte.
### Improvement
By aligning the destination to a byte boundary first, we can reduce the number of cross-byte operations.
**Sample code (testing)** : a compilable draft of the idea — copy bit-by-bit only for the unaligned head and tail, and fall back to whole-byte copies once both positions are byte-aligned.
```clike
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MSB-first bit access, matching the masks used by bitcpy above */
static inline int get_bit(const uint8_t *buf, size_t i)
{
    return (buf[i / 8] >> (7 - i % 8)) & 1;
}

static inline void set_bit(uint8_t *buf, size_t i, int v)
{
    uint8_t mask = 1 << (7 - i % 8);
    if (v)
        buf[i / 8] |= mask;
    else
        buf[i / 8] &= ~mask;
}

void bitcpy_aligned(uint8_t *dest, size_t write,
                    const uint8_t *src, size_t read, size_t count)
{
    /* 1. copy bit-by-bit until the write position is byte-aligned */
    while (count > 0 && (write & 7)) {
        set_bit(dest, write++, get_bit(src, read++));
        count--;
    }
    /* 2. if the read position is now aligned too, copy whole bytes at once */
    if ((read & 7) == 0) {
        size_t nbytes = count / 8;
        memcpy(dest + write / 8, src + read / 8, nbytes);
        write += nbytes * 8;
        read += nbytes * 8;
        count -= nbytes * 8;
    }
    /* 3. copy the remaining (or still-unaligned) bits one by one */
    while (count > 0) {
        set_bit(dest, write++, get_bit(src, read++));
        count--;
    }
}
```
## Test 4 - string interning
### Working Principle