# 2020q2 Homework2 (quiz2)
contributed by < `eecheng87` >
###### tags: `linux2020`
## 解釋程式碼運作原理,並提出改進方案
### Describe what did each function do
* union `xs`
```cpp
typedef union {
char data[16];
struct {
uint8_t filler[15],
space_left : 4,
/* if it is on heap, set to 1 */
is_ptr : 1, flag1 : 1, flag2 : 1, flag3 : 1;
};
/* heap allocated */
struct {
char *ptr;
/* supports strings up to 2^54 - 1 bytes */
size_t size : 54,
/* capacity is always a power of 2 (unsigned)-1 */
capacity : 6;
/* the last 4 bits are important flags */
};
} xs;
```
If you are not sure what exactly layout of its member, following is address of member:
```shell
(gdb) p &string
$1 = (xs *) 0x7fffffffdc20
(gdb) p &string->data
$2 = (char (*)[16]) 0x7fffffffdc20
(gdb) p &string->filler
$3 = (uint8_t (*)[15]) 0x7fffffffdc20
(gdb) p &string->space_left
$4 = (uint8_t *) 0x7fffffffdc2f ""
(gdb) p &string->is_ptr
$5 = (uint8_t *) 0x7fffffffdc2f ""
(gdb) p &string->flag1
$6 = (uint8_t *) 0x7fffffffdc2f ""
(gdb) p &string->ptr
$7 = (char **) 0x7fffffffdc20
(gdb) p &string->size
$8 = (size_t *) 0x7fffffffdc28
(gdb) p &string->capacity
$9 = (size_t *) 0x7fffffffdc28
(gdb) p &string->flag2
$10 = (uint8_t *) 0x7fffffffdc2f ""
(gdb)
```
* `space_left`, `is_ptr`, `flags` share the same 8 bits `uint8_t`
* each struct's start address is as same as `data`. (characteristic of union)
* The variable of last struct only use 60 bits. Others 4 bits are used by `is_ptr`, `flag`.
* `static inline bool xs_is_ptr(const xs *x)`
The purpose of it is as same as its function name, check whether it is pointer which is one bit variable (see also: [bit field](https://hackmd.io/@sysprog/c-bitfield)) in struct.
* `static inline size_t xs_size(const xs *x)`
* `static inline char *xs_data(const xs *x)`
If string isn't short, it stores in heap. When accessing long string, we need read `ptr` in union.
* macro `xs_tmp(x)`
This macro have two jobs to do which is seperated by comma. First, check whether input string is less than 16. If not, you will get error message by `_Static_assert` after compile. Second, call function `xs_new` and check whether input `x` is valid through concat `""` and `x`.
* macro `xs_literal_empty()`
Use [compound literal](https://en.cppreference.com/w/c/language/compound_literal) to update value of member `space_left`.
* `xs *xs_new(xs *x, const void *p)`
Illustrate in several sections
* variable `mask` : its complete definition is `uint8_t mask[32]` which contain 256 bits (32 * 8). Each character can map into this array(one to one). The most powerful part is that `mask` can record $2^{256}$ posibilities. e.g. When value of `mask[0]` is 3, it means we don't want character `NUL` and `SOH`.
* macro `set_bit(byte)` : map input character into `mask`. e.g. If input is `'\n'` which denotes $1010_b$, `mask[1]` will be set to 4 ($100_b$) through `|` operation. Divide and shift operation in this macro you can view them as hash operation. Hash operation make mapping feasible.
* macro `check_bit(byte)` : check bit of `mask` . e.g. If you want to check '\n' is in `mask` or not, just check bit 3 in `mask[1]` through `&` operation. If match, you will get 1 as r-value.
* Explain mechanism of the function:
* `const char *trimset` store the character you want to delete. e.g. pass `\n ` means you want to remove character '\n' and ' '.
* ```cpp
// this for loop aim at maping
// all charater you don't want
// into array mask
for (i = 0; i < trimlen; i++)
set_bit(trimset[i]);
// start from left of input string,
// break until encounter character
// you want to keep
for (i = 0; i < slen; i++)
if (!check_bit(dataptr[i]))
break;
// start from right of input string
// break until encounter character
// you want to keep
for (; slen > 0; slen--)
if(!check_bit(dataptr[slen - 1]))
break;
```
* `dataptr += i` : re-locate real start point of string.
* [memmove](http://tw.gitbook.net/c_standard_library/c_function_memmove.html)
* `xs *xs_grow(xs *x, size_t len)` : let allocated memory larger.
* If `is_ptr` in `xs` set -> current memory is allocated in heap. Just call `realloc` to acquire larger memory.
* If not, move string from stack to heap(`malloc`)
* `xs *xs_concat(xs *string, const xs *prefix, const xs *suffix)`
* if current size can accommodate string after concat
```cpp
// move `data` to middle
memmove(data + pres, data, size);
// copy prefix to front of string
memcpy(data, pre, pres);
// append sufix to middle
memcpy(data + pres + size, suf, sufs + 1);
string->space_left = 15 - (size + pres + sufs);
```
* if not
```cpp
xs tmps = xs_literal_empty();
xs_grow(&tmps, size + pres + sufs);
char *tmpdata = xs_data(&tmps);
// re-arrange string: pre + data + suf
// same as above procedure
memcpy(tmpdata + pres, data, size);
memcpy(tmpdata, pre, pres);
memcpy(tmpdata + pres + size, suf, sufs + 1);
xs_free(string);
*string = tmps;
string->size = size + pres + sufs;
```
## 提供字串複製 (應考慮到 CoW) 和類似 strtok 功能的函式實作
### xs_cpy
* 若為短字串的複製,就直接複製到 `data`
* 若為超過 16 個字元的字串,則讓 `ptr` 指到同一個地方。
* 利用預留的 `flag1` 來紀錄此字串為共享記憶體,若接著有寫的動作,則需要重新要一塊記憶體寫上對應的字串,並且清除 `flag1`。
* 其他存在的函數可能都要微調,因為要考慮 CoW 議題。
```cpp
xs *xs_cpy(xs *dest, xs *src) {
if (xs_is_ptr(src)){
/* string is on heap */
dest->is_ptr = true;
dest->ptr = src->ptr;
dest->flag1 = true;
dest->size = xs_size(src);
} else {
/* string is on stack */
memcpy(dest->data, src->data , xs_size(src));
dest->is_ptr = false;
dest->space_left = 15 - xs_size(src);
}
return dest;
}
```
```cpp
main() {
xs string = *xs_tmp("foobarbar");
xs prefix = *xs_tmp("(((ffss"), suffix = *xs_tmp(")))");
xs_concat(&string, &prefix, &suffix);
xs string_cpy = *xs_cpy(&xs_literal_empty(),&string);
printf("[%s] : %2zu\n", xs_data(&string_cpy), xs_size(&string_cpy));
}
```
預期輸出
```
[(((ffssfoobarbar)))] : 19
```
額外檢查兩者的字串是否真的為 copy
```cpp
printf("string ptr start at %p,
cpy's ptr start at %p\n",string.ptr,string_cpy.ptr);
```
預期輸出
```
string ptr start at 0x56098bbac260,
cpy's ptr start at 0x56098bbac260
```
值得注意的是在 heap 時,要更新 `dest->size` 大小,不能用 `strlen(src->ptr)`,結果會錯,目前尚未找到原因。
雖然以上版本的 `xs_cpy` 暫時沒有問題了,但是我們尚未解決其他函數在對複製的字串操作時,應該要做什麼措施。
:::warning
TODO: 學習 [folly::fbstring](https://github.com/facebook/folly/blob/master/folly/FBString.h) 的記憶體佈局策略,將 CoW 所需要的 [reference counting](https://en.wikipedia.org/wiki/Reference_counting) 嵌入到結構體成員中,如此空間運用的效率會更高。
:notes: jserv
:::
* 當操作複製的字串,原本那份也會被一起改
```cpp
xs string = *xs_new(&string,"aoaoaoaofoobarbar");
xs string_cpy = *xs_cpy(&xs_literal_empty(),&string);
xs prefix = *xs_tmp("(((ffss"), suffix = *xs_tmp(")))");
xs_concat(&string_cpy, &prefix, &suffix);
printf("[%s] : %2zu\n", xs_data(&string_cpy), xs_size(&string_cpy));
printf("[%s] : %2zu\n", xs_data(&string), xs_size(&string));
```
```
[(((ffssaoaoaoaofoobarbar)))] : 17
[(((ffssaoaoaoaofoobarbar)))] : 17
```
製作 reference count,我選擇在結構體加一個變數 `refcnt`。
```cpp
struct {
char *ptr;
size_t size : 54,
capacity : 6;
int *refcnt;
};
```
因為共享變數,所以要宣告成 pointer。在每次複製時會遞增,需要在 `xc_cpy` 內增加一些內容
```cpp
if (xs_is_ptr(src)) {
...
if (!src->refcnt) {
dest->refcnt = src->refcnt = (int *) malloc(sizeof(int));
*(dest->refcnt) = 1;
} else {
dest->refcnt = src->refcnt;
*(dest->refcnt) += 1;
}
}
```
```cpp
main() {
xs string = *xs_new(&string, "aoaoaoaofoobarbar");
xs string_cpy = *xs_cpy(&xs_literal_empty(),&string);
xs hihi = *xs_cpy(&xs_literal_empty(), &string);
xs qq = *xs_cpy(&xs_literal_empty(), &hihi);
printf("%d\n", *string_cpy.refcnt);
}
```
```
3
```
符合預期,總共有四份,複製三次,所以 `refcnt` 應該為 3 沒錯。
* 接著著手修改 `xs_concat`
* 若 concat 後大小仍小於 `capacity`,代表我們會到原本那塊共享的記憶體。所以我們要額外在 `malloc` 一塊,當作 concat 結果存放的地方。此外,記得修改一些變數。如:`refcnt`
```cpp
if (size + pres + sufs <= capacity) {
if (string->flag1 && string->refcnt && *(string->refcnt) > 0) {
/* create new space */
*(string->refcnt) -= 1;
string->flag1 = false;
data = string->ptr = (char *) malloc(sizeof(char) * capacity);
if (*(string->refcnt) == 0){
free(string->refcnt);
string->refcnt = NULL;
}
}
...
}
```
* 若大於 `capacity`,因為尚未考慮 CoW 版本的程式碼本來就會再要一塊新的記憶體了,所以這裡不用在判斷使否需要額外取得記憶體。但是我們要做的是判斷原本的 `string` 是否為共享。如果是,我們不能 `free` 它,因為別人還在用。
```cpp
else {
...
if (string->flag1 && string->refcnt && *(string->refcnt) > 0) {
*(string->refcnt) -= 1;
if(*(string->refcnt) == 0) {
free(string->refcnt);
string->refcnt = NULL;
}
} else {
xs_free(string);
}
...
````
* 檢查是否正確
```cpp
xs prefix = *xs_tmp("(((ffss"), suffix = *xs_tmp(")))");
xs string = *xs_new(&string,"aoaoaoaofoobarbar");
xs string_cpy = *xs_cpy(&xs_literal_empty(),&string);
xs_concat(&string_cpy, &prefix, &suffix);
printf("[%s] : %2zu\n", xs_data(&string_cpy), xs_size(&string_cpy));
printf("[%s] : %2zu\n", xs_data(&string), xs_size(&string));
printf("%d\n",*string.refcnt);
```
```
[(((ffssaoaoaoaofoobarbar)))] : 27
[aoaoaoaofoobarbar] : 17
0
```
正確,原本的字串沒有被改到。
* 著手修改 `xs_trim`
* 尚未考慮 CoW 遇到的問題
```cpp
xs prefix = *xs_tmp("((((((("), suffix = *xs_tmp("))))))))))))");
xs string = *xs_new(&string,"aoaoaoaofoobarbar");
xs_concat(&string, &prefix, &suffix);
xs string_cpy = *xs_cpy(&xs_literal_empty(),&string);
xs_trim(&string_cpy,"(");
printf("[%s] : %2zu\n", xs_data(&string_cpy), xs_size(&string_cpy));
printf("[%s] : %2zu\n", xs_data(&string), xs_size(&string));
```
```
[aoaoaoaofoobarbar))))))))))))] : 29
[aoaoaoaofoobarbar))))))))))))] : 36
```
很明顯, trim 改到共享記憶體的內容。
* 開始修改:
```cpp
if (x->flag1 && x->refcnt && *(x->refcnt) > 0) {
x->ptr = orig = (char *) malloc(sizeof(char) * strlen(x->ptr));
*(x->refcnt) -= 1;
if(*(x->refcnt) == 0) {
free(x->refcnt);
x->refcnt = NULL;
}
}
memmove(orig, dataptr, slen);
/* do not dirty memory unless it is needed */
if (orig[slen])
orig[slen] = 0;
...
```
```
[aoaoaoaofoobarbar))))))))))))] : 29
[(((((((aoaoaoaofoobarbar))))))))))))] : 36
```
在 trim 中增加判斷,若要裁剪的字串是共享的,則重新再 `malloc` 一塊。
### xs_strtok
```cpp
char *xs_strtok(char *x, const char *delimit)
{
static char *lastToken = NULL; /* UNSAFE SHARED STATE! */
char *tmp;
/* Skip leading delimiters if new string. */
if (x == NULL) {
x = lastToken;
if (x == NULL) /* End of story? */
return NULL;
} else {
x += strspn(x, delimit);
}
/* Find end of segment */
tmp = strpbrk(x, delimit);
if (tmp) {
/* Found another delimiter, split string and save state. */
*tmp = '\0';
lastToken = tmp + 1;
} else {
/* Last segment, remember that. */
lastToken = NULL;
}
return x;
}
main() {
/* Testing xs_strtok */
xs str = *xs_new(&str, "asd:aee:gdw:tfv:ddd");
char *pch = strtok(xs_data(&str), ":");
while (pch != NULL) {
printf ("%s ", pch);
pch = strtok (NULL, ":");
}
printf("\n");
}
```
```
asd aee gdw tfv ddd
```
符合預期!
:::warning
將 `strtok` 換為 `strtok_r`,後者是考慮到 [reentrant](https://en.wikipedia.org/wiki/Reentrancy_(computing)) 的實作,之後整合 reference counting 和 POSIX Thread 後,需要這方面的驗證
:notes: jserv
:::
[完整程式碼](https://github.com/eecheng87/xstring)
## 設計實驗,確認上述字串處理最佳化 (針對空間效率) 帶來的效益,應考慮到 locality of reference
雖然在還沒實驗前就知道 `xs_cpy` 一定會比原本的直接複製還要省空間。但是還是用 valgrind 來看一下到底差多少。來觀察記憶體和 locality of reference 在兩種複製方法下的差距。
額外寫一個效率差的 copy string 方法:
```cpp
void cpy(char **dest, char **src) {
*dest = (char *) malloc(sizeof(char) * strlen(*src) + 1);
strcpy(*dest, *src);
}
```
實驗:假設某情境複製了十萬次,但因為此十萬個變數都還有後續的運算,所以不 `free` 掉。(但是本實驗並沒有做後續的運算,純粹複製而已)
```cpp
/* Experiment */
/* original copy */
printf("This is original copy\n");
int n = 100000;
char* test1 = "boobooboobooboobooboobooboobooboobooboobooboobooboo";
char* test2;
while (n-->0) {
cpy(&test2, &test1);
//free(test2);
}
/* xs copy */
xs xs_test1 = *xs_new(&xs_test1, "boobooboobooboobooboobooboobooboobooboobooboobooboo");
printf("This is xs copy\n");
int n = 100000;
while (n-->0) {
xs xs_test2 = *xs_cpy(&xs_literal_empty(), &xs_test2);
//xs_free(&xs_test2);
}
```
* 記憶體量
用 valgrind 的 massif 來監測 heap 的使用狀況
* `copy` 的狀況
```
--------------------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
--------------------------------------------------------------------------------
50 16,807,531 6,854,712 4,950,904 1,903,808 0
51 17,103,631 6,976,536 5,038,888 1,937,648 0
52 17,399,731 7,098,360 5,126,872 1,971,488 0
53 17,652,360 7,201,032 5,201,024 2,000,008 0
```
* `xs_copy` 的狀況
```
--------------------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
--------------------------------------------------------------------------------
0 0 0 0 0 0
1 147,863 72 64 8 0
2 148,112 1,104 1,088 16 0
3 15,352,211 1,104 1,088 16 0
```
因為沒有 `free` 任何記憶體 (heap),所以可以直接看 total 的最底。這樣優化的複製大概省了 7000 倍的記憶體量。
* locality of reference
* `copy` 的狀況
```
$ sudo perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./xs
```
```
Performance counter stats for './xs' (5 runs):
18,3685 cache-misses # 78.011 % of all cache refs ( +- 2.58% )
23,5461 cache-references ( +- 2.43% )
3769,3837 instructions # 1.71 insn per cycle ( +- 0.03% )
2201,8832 cycles ( +- 3.98% )
0.007494794 seconds time elapsed ( +- 4.24% )
```
* `xs_copy` 的狀況
```
Performance counter stats for './xs' (5 runs):
2,0995 cache-misses # 43.695 % of all cache refs ( +- 37.65% )
4,8049 cache-references ( +- 9.57% )
1584,7683 instructions # 2.13 insn per cycle ( +- 0.04% )
744,4251 cycles ( +- 12.67% )
0.002707433 seconds time elapsed ( +- 14.53% )
```
連 cache miss 也是 `xs_cpy` 比較好
## 在 Linux 核心原始程式碼中,找出類似手法的 SSO 或 CoW 案例並解說
我在作業系統課程聽到的 CoW 案例是 fork 會避免馬上 exec 掉,所以採用此機制。我接著再 linux 原始碼中找關於 CoW 和 fork 的應用。
找了老半天,終於在 `xfs_inode_fork.c` 底下找的了蛛絲馬跡。若看到名字有 fs 相關的很有可能都和 [btrfs](https://zh.wikipedia.org/wiki/Btrfs) 很有關聯,而這是一種支援寫入時複製(CoW)的檔案系統,原始碼相當龐大,[資料結構](https://kknews.cc/zh-tw/news/roak8gx.html)也十分複雜。
:::warning
從檔案系統實作下手的話,的確太複雜,很容易落得「舉燭」的下場。或許可回頭分析 Linux 核心 fork + exec 這樣的最佳化手法,具體在哪裡實作。當然,程式註解很可能不會直接寫 CoW。
:notes: jserv
:::
首先看到 `xfs_ifork_init_cow`,註解已經講明了
>Initialize an inode's copy-on-write fork.
>
```cpp
void
xfs_ifork_init_cow(
struct xfs_inode *ip)
{
if (ip->i_cowfp)
return;
ip->i_cowfp = kmem_zone_zalloc(xfs_ifork_zone,
KM_NOFS);
ip->i_cowfp->if_flags = XFS_IFEXTENTS;
ip->i_cformat = XFS_DINODE_FMT_EXTENTS;
ip->i_cnextents = 0;
}
```
從初始化來看,作者不希望 `cowfp` 是 NULL,因為在 call 初始化前有先用 `ASSERT` 確認不為 NULL
```cpp=93
ASSERT(ip->i_cowfp == NULL);
xfs_ifork_init_cow(ip);
```
不幸的是,我在這個初始化中被沒有看到 **copy**。這邊預期能夠找到新的節點指向一塊共用的地方,但是反而看到要了一塊新的記憶體 (把 `kmem_zone_zalloc` 往內找會看到 `kmem_cache_size(zone)`)給 `cowfp`。既然找不到 **copy**,乾脆來找 **write** 後要怎麼處理。
首先我找到也是在fs系列中的 `xfs_iomap.c` 中發現 `xfs_ilock_for_iomap`。
這個函數是專門用來作為 lock 用,之所以要 lock 是因為當 COW writes 時,會發生 allocate 或 delalloc,所以我們需要額外 lock,以免發生錯誤。
```cpp=692
relock:
if (flags & IOMAP_NOWAIT) {
if (!xfs_ilock_nowait(ip, mode))
return -EAGAIN;
} else {
xfs_ilock(ip, mode);
}
/*
* The reflink iflag could have changed since the earlier unlocked
* check, so if we got ILOCK_SHARED for a write and but we're now a
* reflink inode we have to switch to ILOCK_EXCL and relock.
*/
if (mode == XFS_ILOCK_SHARED && is_write && xfs_is_cow_inode(ip)) {
xfs_iunlock(ip, mode);
mode = XFS_ILOCK_EXCL;
goto relock;
}
*lockmode = mode;
return 0;
}
```
由 `XFS_ILOCK_SHARED`,`is_write` ,`xfs_is_cow_inode(ip)`,三個參數為依據判斷是否block。
而 allocate 為我預期 CoW 被 write 時會做的事,既然已經找到預防 allocate 時會發生問題的函數,那離找到 **COW 如何處理 Write** 又更進一步。
接著我在 `xfs_direct_write_iomap_begin()` 找到有使用 block function,所以推測應該處理 **Write** 的程式碼應該在附近。
```cpp=756
if (imap_needs_cow(ip, flags, &imap, nimaps)) {
error = -EAGAIN;
if (flags & IOMAP_NOWAIT)
goto out_unlock;
/* may drop and re-acquire the ilock */
error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
&lockmode, flags & IOMAP_DIRECT);
if (error)
goto out_unlock;
if (shared)
goto out_found_cow;
end_fsb = imap.br_startoff + imap.br_blockcount;
length = XFS_FSB_TO_B(mp, end_fsb) - offset;
}
```
在 block 呼叫的附近,我找到一行註解 (762)。既然有可能把鎖丟掉,那可能接下來要做的就是 Write 後的處理,更巧的是接著 function name 竟然和 allocate 有關,所以再繼續往內看。
在 `xfs_reflink.c` 中有 `xfs_reflink_allocate_cow` 的實作。
```cpp=379
error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
```
這行就是 CoW 遇到對共享資料(唯讀)做**寫**的動作時,其中一樣會發生的事,同時和我們預期的一樣,會配置一塊新的記憶體,然後再把改的資料寫進去。
先在把前半部:**拿一塊新的記憶體** 看仔細一點。我們可在 `xfs_trans.c` 中找到 `xfs_trans_alloc` 的實作
```cpp=266
tp = kmem_zone_zalloc(xfs_trans_zone, 0);
```
這行就是實際的分配新的記憶體(事實上再往內看實作可以找到要了 `kmem_cache_size(zone)`)
接著我們來找第二部分: **把新資料寫回剛剛拿到的記憶體上**。同樣的,我們再回去 `xfs_reflink_allocate_cow` 找線索
```cpp=427
convert:
xfs_trim_extent(cmap, offset_fsb, count_fsb);
/*
* COW fork extents are supposed to remain unwritten until we're ready
* to initiate a disk write. For direct I/O we are going to write the
* data and need the conversion, but for buffered writes we're done.
*/
if (!convert_now || cmap->br_state == XFS_EXT_NORM)
return 0;
trace_xfs_reflink_convert_cow(ip, cmap);
return xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
```
經過 427 行以前的判斷,現在已經可以開始寫了,加上註解的提示,我認為 436, 437 應該就是寫的部分。
Reference:
* [What is size_t](https://stackoverflow.com/questions/2550774/what-is-size-t-in-c)
* [Perf](http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial)
* [btrfs](https://lwn.net/Articles/576276/)
* [btrfs](https://www.slideshare.net/TsungenHsiao/introduction-to-btrfs-and-zfs)