# 2019q3 Homework1 (review) contributed by < `yaohsiaopid` > ## 測驗 `1` 考慮以下 C 程式的 `align4` 巨集的作用是,針對 4-byte alignment,找到給定地址的 round up alignment address。 ```cpp #include <stdio.h> #define align4(x) (((x) + K) & (-4)) int main(void) { int p = 0x1997; printf("align4(p) is %08x\n", align4(p)); return 0; } ``` 預期程式輸出 `align4(p) is 00001998` `K` = `(3)` ### Discussion #### Why it works For `sizeof(int) = 4`, The macro will become `(sizeof(n) + 3) & ~3`. Then we can prove the macro works by working out the following scenario: | sizeof(n) | intermediate steps | result | comments | | :--------: | :--------: | :--------: | :--------: | | `4k` | `(4k+3)& 1..11100` | `4k` | | `4k+1` | `(4k+4)& 1..11100` | `4(k+1)` | | `4k+2` | `(4k+4+1) & 1..11100`| `4(k+1)` | the last two digits are `01`(from 5) and `01 & 00 = 0` | | `4k+3` | `(4k+6) & 1..11100` | `4(k+1)` | likewise last two digits are `10 & 00 = 0` #### 延伸題目: 在 Linux 核心原始程式碼找出類似的 alignment 巨集並解釋其作用 * `include/linux/kernel.h` line 33: ``` c /* @a is a power of 2 value */ #define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) ``` * `include/uapi/linux/kernel.h` line 10: ``` c #define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1) #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask)) ``` **last line matches almost perfectly with the problem with mask = 3 !!** * Where `ALIGN` is used ? * `grep -rnw '.' -e 'ALIGN'` yields 2443 results. * Of these results, let's take a look at `./mm/page_alloc.c` * `./mm/page_alloc.c` - Linux page allocator * `./include/linux/mm.h:216`: The macro is to "align the pointer to the (next) page boundary" ```c #define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE) ``` * an example involved `ALIGN`: `alloc_pages_exact > make_alloc_exact > PAGE_ALIGN > ALIGN` * `void *alloc_pages_exact(size_t size, gfp_t gfp_mask)`: ```c void *alloc_pages_exact(size_t size, gfp_t gfp_mask) { unsigned int order = get_order(size); unsigned long addr; if (WARN_ON_ONCE(gfp_mask & __GFP_COMP)) gfp_mask &= ~__GFP_COMP; addr = __get_free_pages(gfp_mask, order); return make_alloc_exact(addr, order, size); } ``` * Goal: allocate an exact number physically-contiguous pages. * `size`: **number of bytes to allocate** * different from `alloc_pages()`, which can only allocate memory in **power-of-two pages** * how this function works: 1. Firstly it asks for power-of-two pages by `__get_free_pages(gfp_mask, order)` where `order = get_order(size)` At this point, it obtains $2^{order}$ pages where $2^{order}\times \text{PAGE_SIZE} > size$ 2. Free up the extra pages it asked in previous step with `make_alloc_exact`: ```c static void *make_alloc_exact(unsigned long addr, unsigned int order, size_t size) { if (addr) { unsigned long alloc_end = addr + (PAGE_SIZE << order); unsigned long used = addr + PAGE_ALIGN(size); split_page(virt_to_page((void *)addr), order); while (used < alloc_end) { free_page(used); used += PAGE_SIZE; } } return (void *)addr; } ``` * Since the allocator only distributes complete page, `PAGE_ALIGN(size)` is needed above to find out the size of pages that is able to accomodate `size` number of bytes. In return we get `used`, which is page-aligned. * In the `while` loop, it then sequentially frees the extra pages one by one. * Question/To do on this function: * `split_page` not yet investigated * A possible worst-case can be that size = `PAGE_SIZE * (2^x + 1)`, then `order = (x+1)` and `while` will unfortunately runs for `2^(x+1) -(2^x+1) = 2^x + 1` times. Not so sure if this O(size) is miserable, but one thing certain is that allocating 'exact' number of bytes can quite troublesome. * Reference * [LinuxMM: Page Allocation](https://linux-mm.org/PageAllocation) * [Physical Page Allocation](https://www.kernel.org/doc/gorman/html/understand/understand009.html) * https://ufal.mff.cuni.cz/~jernej/2018/docs/predavanja07.pdf :::danger So, what is the benefit? Can you expriment? :notes: jserv ::: --- ## 測驗 `2` 考慮以下 C 程式可取得以 `sizeof(int)` 為 alignment 基礎的整數空間,也就是說,當 `sizeof(int)` 為 `4` 的時候: * 給定 `char arr[13];`,`INT_SIZE_OF(arr)` 要得到 `16`; * 給定 `short int s;`,`INT_SIZE_OF(s)` 要得到 `4` ```cpp #include <stdint.h> #define INT_SIZE_OF(n) \ ((sizeof(n) + sizeof(int) + X) & ~(sizeof(int) + Y)) ``` * 參考資訊: [Memory access and alignment](https://lwn.net/Articles/260832/) `X` = `(-1)` `Y` = `(-1)` ### 延伸題目: 解釋運作原理並在 GitHub 找出實際應用的程式碼 - device tree compiler * `dtc/dtc.c:59`: ```c #define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1)) ``` * Why: To generate formatted .dtb file * In the process of generating a dtb file, Device Tree Compiler basically aims to write a linked list of structs into a file * I think in order to make the dtb formatted in memory, dtc thus uses `ALIGN(x,a)` for different properties of different struct * Such as when it prepares blob containing fdt header to be written as file, it will make it aligned by 8 before adding more information ```c blob = data_append_data(blob, &fdt, vi->hdr_size); blob = data_append_align(blob, 8); ``` * usage: * `data_append_align()` from `dtc/data.c`: ```c int newlen = ALIGN(d.len, align); return data_append_zeroes(d, newlen - d.len); ``` * `make_fdt_header()` from `dtc/flattree.c`: ```c /* Reserve map should be doubleword aligned */ reserve_off = ALIGN(vi->hdr_size, 8); ``` * `dt_to_blob()` from `dtc/flattree.c`: ```c if (alignsize > 0) padlen = ALIGN(fdt32_to_cpu(fdt.totalsize) + padlen, alignsize) - fdt32_to_cpu(fdt.totalsize); ``` * Reference: * [Device Tree(四):文件结构解析](http://www.wowotech.net/device_model/dt-code-file-struct-parse.html) * [Device Tree Usage](https://elinux.org/Device_Tree_Usage) --- ## 課程說明 ppt slide 10 ```c /* IEEE 754 */ #include <stdio.h> int main() { float sum = 0.0f, corr = 0.0f;/* corrective value for rounding error */ printf("1+..+t\tsum \tcorrection\tgold\n"); for (int i = 0; i < 10000 ; i++) { float y = (i + 1) - corr; /* add the correction to specific item */ float t = sum + y; /* bits might be lost */ corr = (t - sum) - y; /* recover lost bits */ sum = t; if(i > 9990 ) { printf("%d\t%f\t%f\t%d\n", i+1, sum, corr, (i+2)*(i+1)/2); } } printf("Sum: %f\n", sum); return 0; } ``` * Original code in slide 9 does not work because there will be [round-off error](https://en.wikipedia.org/wiki/Floating-point_arithmetic#Floating-point_arithmetic_operations) under the IEEE 754 representation of floating point. * The code in slide 10 works for 10000. * Out of curiosity I plot out the `corr` ![](https://i.imgur.com/1STrd1C.png) * Although the above code superficially fixes the problem for the summation to **10000** with error correction, the problem is still there: * The above C output: ``` 1+..+t sum correction gold 9992 49925028.000000 0.000000 49925028 9993 49935020.000000 -1.000000 49935021 9994 49945016.000000 1.000000 49945015 9995 49955008.000000 -2.000000 49955010 9996 49965008.000000 2.000000 49965006 9997 49975004.000000 1.000000 49975003 9998 49985000.000000 -1.000000 49985001 9999 49995000.000000 0.000000 49995000 10000 50005000.000000 0.000000 50005000 Sum: 50005000.000000 ``` * it does not work for other numbers, but I think this is because of the resolution of the fraction part of IEEE 754. * why does it works ? ``` assume corr = 0, then y = i+1 sum 1.f1 x 2^e1 i+1 +) 1.f2 x 2^e2 , assume: e1 >> e2 ---------------------------------------------------------------- sum 1.f1 x 2^e1 i+1 +) 1.f2 x 2^(e2-e1) x 2^e1 , 1.f2 = 1.f21 + f22 and f22 will be lost after shifting ---------------------------------------------------------------- t (1.f1 + 1.f2x2^(e2-e1)) x 2^e1 t-sum = 1.f2 x 2^(e2-e1)) x 2^e1 = 1.f2 x 2^e2 ---(5) t-sum-y = f22 x 2^e2 , the lost bits ! ``` * Not yet find a universal version .... :::danger Check [Pairwise summation](https://en.wikipedia.org/wiki/Pairwise_summation) :notes: jserv ::: * Reference: * [IEEE 754](https://en.wikipedia.org/wiki/Floating-point_arithmetic#IEEE_754:_floating_point_in_modern_computers)