# 2019q1 Homework7 (buddy) contributed by < `rebvivi` > ###### tags: `linux2019` * [F13: buddy](https://hackmd.io/s/SyPmY90jV) ## 作業要求 - 完成 第 11 週測驗題 (下) 和所有延伸題目 - 在 Linux 核心原始程式碼使用 buddy allocator 的案例,介紹其原理,設計 Linux 核心模組的實驗 - 需要涵蓋 kernel API 同步機制的運用 - 執行時期的分析 (提示: 可善用 eBPF) ## 測驗 `1` - `parent_is_split()` 函式中,第 4 行先找到 parent 的 index - `node_is_split[]` 這個 array 裡頭的每一個 element 都代表 1 個 byte ,也就是 8 個 bits - 第 5 行將 `index / 8`,找到在 `node_is_split[]` 這個 array 裡頭對應的 byte,再將 `node_is_split[index / 8]` 往右 shift `(index % 8)` ,為的是找到在 `node_is_split[index / 8]` 這個 byte 裡頭 , index 是對應到 0 ~ 7 當中哪一個 bit ,最後再和 `1` 做 `&`,判斷對應到的 bit 是 0 還是 1 ```cpp= // Given the index of a node, this returns the "is split" flag of the parent. static bool parent_is_split(size_t index) { index = (index - 1) / 2; return (bool) ((node_is_split[index / 8] >> (index % 8)) & 1); } ``` - `flip_parent_is_split()` 函式中,第 4 行先找到 parent 的 index - 第 5 行中,`1 << (index % 8)` 是先找到 index 在它對應的 byte 當中是在 0 ~ 7 中哪一個 bit,之後將那個 bit 設為 1 - 將 `node_is_split[index / 8]` ,也就是 index 原本對應到的 byte 和 `1 << (index % 8)`,也就是我們所設定 index bit 為 1 的那個 byte 做 `XOR` - 這樣如果原本 index 的 `"is split" flag` 是 `1` , 和 1 做 `XOR` 之後就會變成 `0`,而如果原本 index 的 `"is split" flag` 是 `0` , 和 1 做 `XOR` 之後就會變成 `1`,就成功達到`flips the "is split" flag` 的效果 ```cpp= // Given the index of a node, this flips the "is split" flag of the parent. static void flip_parent_is_split(size_t index) { index = (index - 1) / 2; node_is_split[index / 8] ^= 1 << (index % 8); } ``` - 在 `bucket_for_request()` 函式中,第 10 行的 while ,只要 `size` 還不夠滿足我們所需要的 memory 大小,`bucket` 這個 index 就會 -1,`size` 就會變 2 倍,直到 `size > request` 才會停止 - `bucket` 這個 index 是從 `BUCKET_COUNT - 1` 慢慢減下來的,意謂著當 index 是 0 , `size` 就是 `MAX_ALLOC` ```cpp= /* * Given the requested size passed to "malloc", this function returns the index * of the smallest bucket that can fit that size. */ static size_t bucket_for_request(size_t request) { size_t bucket = BUCKET_COUNT - 1; size_t size = MIN_ALLOC; while (size < request) { bucket--; size *= 2; } return bucket; } ``` ```cpp= /* * The tree is always rooted at the current bucket limit. This call grows the * tree by repeatedly doubling it in size until the root lies at the provided * bucket index. Each doubling lowers the bucket limit by 1. */ static bool lower_bucket_limit(size_t bucket) { while (bucket < bucket_limit) { size_t root = node_for_ptr(base_ptr, bucket_limit); /* * If the parent isn't SPLIT, that means the node at the current bucket * limit is UNUSED and our address space is entirely free. In that case, * clear the root free list, increase the bucket limit, and add a single * block with the newly-expanded address space to the new root free * list. */ if (!parent_is_split(root)) { list_remove((list_t *) base_ptr); list_init(&buckets[--bucket_limit]); list_push(&buckets[bucket_limit], (list_t *) base_ptr); continue; } /* * Otherwise, the tree is currently in use. Create a parent node for the * current root node in the SPLIT state with a right child on the free * list. Make sure to reserve the memory for the free list entry before * writing to it. Note that we do not need to flip the "is split" flag * for our current parent because it's already on (we know because we * just checked it above). */ uint8_t *right_child = ptr_for_node(root + 1, bucket_limit); if (!update_max_ptr(right_child + sizeof(list_t))) return false; list_push(&buckets[bucket_limit], (list_t *) right_child); list_init(&buckets[--bucket_limit]); /* * Set the grandparent's SPLIT flag so if we need to lower the bucket * limit again, we'll know that the new root node we just added is in * use. */ root = (root - 1) / 2; if (root != 0) flip_parent_is_split(root); } return true; } ``` ```cpp= void *malloc(size_t request) { if (!request || request + HEADER_SIZE > MAX_ALLOC) return NULL; /* * Initialize our global state if this is the first call to "malloc". At the * beginning, the tree has a single node that represents the smallest * possible allocation size. More memory will be reserved later as needed. */ if (!base_ptr) { /* FIXME: sbrk is deprecated on some platforms where mmap is suggested * as a better replacement (macOS, FreeBSD). sbrk is generally * onsidered quite archaic. */ base_ptr = max_ptr = (uint8_t *) sbrk(0); bucket_limit = BUCKET_COUNT - 1; update_max_ptr(base_ptr + sizeof(list_t)); list_init(&buckets[BUCKET_COUNT - 1]); list_push(&buckets[BUCKET_COUNT - 1], (list_t *) base_ptr); } /* * Find the smallest bucket that will fit this request. This doesn't check * that there's space for the request yet. */ size_t bucket = bucket_for_request(request + HEADER_SIZE); size_t original_bucket = bucket; /* * Search for a bucket with a non-empty free list that's as large or larger * than what we need. If there isn't an exact match, we'll need to split a * larger one to get a match. */ while (bucket + 1 != 0) { /* * We may need to grow the tree to be able to fit an allocation of this * size. Try to grow the tree and stop here if we can't. */ if (!lower_bucket_limit(bucket)) return NULL; /* * Try to pop a block off the free list for this bucket. If the free * list is empty, we're going to have to split a larger block instead. */ uint8_t *ptr = (uint8_t *) list_pop(&buckets[bucket]); if (!ptr) { /* * If we're not at the root of the tree or it's impossible to grow * the tree any more, continue on to the next bucket. */ if (bucket != bucket_limit || bucket == 0) { bucket--; continue; } /* * Otherwise, grow the tree one more level and then pop a block off * the free list again. Since we know the root of the tree is used * (because the free list was empty), this will add a parent above * this node in the SPLIT state and then add the new right child * node to the free list for this bucket. Popping the free list will * give us this right child. */ if (!lower_bucket_limit(bucket - 1)) return NULL; ptr = (uint8_t *) list_pop(&buckets[bucket]); } /* * Try to expand the address space first before going any further. If we * have run out of space, put this block back on the free list and fail. */ size_t size = (size_t) 1 << (MAX_ALLOC_LOG2 - bucket); size_t bytes_needed = bucket < original_bucket ? size / 2 + sizeof(list_t) : size; if (!update_max_ptr(ptr + bytes_needed)) { list_push(&buckets[bucket], (list_t *) ptr); return NULL; } /* * If we got a node off the free list, change the node from UNUSED to * USED. This involves flipping our parent's "is split" bit because that * bit is the exclusive-or of the UNUSED flags of both children, and our * UNUSED flag (which isn't ever stored explicitly) has just changed. * * Note that we shouldn't ever need to flip the "is split" bit of our * grandparent because we know our buddy is USED so it's impossible for * our grandparent to be UNUSED (if our buddy chunk was UNUSED, our * parent wouldn't ever have been split in the first place). */ size_t i = node_for_ptr(ptr, bucket); if (i != 0) flip_parent_is_split(i); /* * If the node we got is larger than we need, split it down to the * correct size and put the new unused child nodes on the free list in * the corresponding bucket. This is done by repeatedly moving to the * left child, splitting the parent, and then adding the right child to * the free list. */ while (bucket < original_bucket) { i = i * 2 + 1; bucket++; flip_parent_is_split(i); list_push(&buckets[bucket], (list_t *) ptr_for_node(i + 1, bucket)); } /* * Now that we have a memory address, write the block header (just the * size of the allocation) and return the address immediately after the * header. */ *(size_t *) ptr = request; return ptr + HEADER_SIZE; } return NULL; } ``` ```cpp= void free(void *ptr) { /* Ignore any attempts to free a NULL pointer */ if (!ptr) return; /* * We were given the address returned by "malloc" so get back to the actual * address of the node by subtracting off the size of the block header. Then * look up the index of the node corresponding to this address. */ ptr = (uint8_t *) ptr - HEADER_SIZE; size_t bucket = bucket_for_request(*(size_t *) ptr + HEADER_SIZE); size_t i = node_for_ptr((uint8_t *) ptr, bucket); /* * Traverse up to the root node, flipping USED blocks to UNUSED and merging * UNUSED buddies together into a single UNUSED parent. */ while (i != 0) { /* * Change this node from UNUSED to USED. This involves flipping our * parent's "is split" bit because that bit is the exclusive-or of the * UNUSED flags of both children, and our UNUSED flag (which isn't ever * stored explicitly) has just changed. */ flip_parent_is_split(i); /* * If the parent is now SPLIT, that means our buddy is USED, so don't * merge with it. Instead, stop the iteration here and add ourselves to * the free list for our bucket. * * Also stop here if we're at the current root node, even if that root * node is now UNUSED. Root nodes don't have a buddy so we can't merge * with one. */ if (parent_is_split(i) || bucket == bucket_limit) break; /* * If we get here, we know our buddy is UNUSED. In this case we should * merge with that buddy and continue traversing up to the root node. We * need to remove the buddy from its free list here but we don't need to * add the merged parent to its free list yet. That will be done once * after this loop is finished. */ list_remove((list_t *) ptr_for_node(((i - 1) ^ 1) + 1, bucket)); i = (i - 1) / 2; bucket--; } /* * Add ourselves to the free list for our bucket. We add to the back of the * list because "malloc" takes from the back of the list and we want a * "free" followed by a "malloc" of the same size to ideally use the same * address for better memory locality. */ list_push(&buckets[bucket], (list_t *) ptr_for_node(i, bucket)); } ``` ## 延伸問題 `1` >在 [sbrk 的 Linux Man Page](https://linux.die.net/man/2/sbrk) 中有提到: >brk() and sbrk() change the location of the program break, which defines the end of the process's data segment (i.e., the program break is the first location after the end of the uninitialized data segment). Increasing the program break has the effect of allocating memory to the process; decreasing the break deallocates memory. > >sbrk() increments the program's data space by increment bytes. Calling sbrk() with an increment of 0 can be used to find the current location of the program break. >>void *sbrk(intptr_t increment); > >sbrk() 用來改變 program break 的位置,改變 data segment size ,實現 virtual memory 到 physical memory 的 mapping >- 當 increment > 0 時, program break 的位置向後移動 increment 個字節,同時 return 移動之前的位置,相當於 allocate memory >- 當 increment < 0 時,program break 的位置向前移動 increment 個字節,相當與於 free memory >- 當increment = 0 時,program break 的位置不移動,只 return 當前的位置 ![](https://i.imgur.com/I8Y9xyp.jpg) - 第 6 行的 `p0` 的 address 為 program break 的位置,也就是 heap 的 ending address,執行結果為 `0x55ed28a04000` - 第 9 行的 `p1` 的 address ,執行結果為 `0x55ed28a25000`,變大了`0x21000` - 雖然 program 可能只是申請很少的 memory ,但是為了方便,系統會把很大的 memory 分配給 program ,這樣的話,就避免了多次 kernel 與 user 的切換,提高了 program 的效率 - 這 `0x21000` 的 memory 稱為 arena,之後申請的 memory 會一直從這個 arena 中獲取,直到 memory 不足 - 第 12 行的 `p2` 的 address ,執行結果為 `0x55ed28a25400`, return 的結果為第一次分配的 1024 個字節空間 - 第 15 行中, free memory ,所以第 17 行的 `p3` 會回到一開始 `p1` 所在的初始位置 ```cpp= #include <stdio.h> #include <unistd.h> int main() { char *p0 = sbrk(0); printf("p0 is : %p\n", p0); char *p1 = sbrk(1024); printf("p1 is : %p\n", p1); char *p2 = sbrk(1024); printf("p2 is : %p\n", p2); sbrk(-2048); char *p3 = sbrk(0); printf("p3 is : %p\n", p3); } ``` >輸出結果如下 ```cpp p0 is : 0x55ed28a04000 p1 is : 0x55ed28a25000 p2 is : 0x55ed28a25400 p3 is : 0x55ed28a25000 ``` ```cpp= #include <stdio.h> #include <unistd.h> int main() { void *p; p = sbrk(0); printf("Initial brk: %p\n", p); p = sbrk(1); // Increase the brk (returns OLD brk!) p = sbrk(0); // Get the new brk printf("New brk: %p\n", p); return 0; } ``` >輸出結果如下 ```cpp Initial brk: 0x5574cd170000 New brk: 0x5574cd191001 ```