2020q3 Homework2 (quiz2)

# 2020q3 Homework2 (quiz2) contributed by < `JulianATA` > > [題目](https://hackmd.io/@sysprog/2020-quiz2) ## Bitwise Operations 在處理數學的問題時，會有一些常見的梗。例如： $Given\ x^5=512, find\ x$ 。此時，就會以 $log$ 作為降維的手法。在電腦科學中，則常常看到以 Bitwise operations 作為替換一般運算的手法。 Bitwise Operations 有非常多好處例如： * 速度相對快，對於 bitwise operation 的輸出每個 bit 只需要參照其他兩個 bit 。 * 基礎通用，在任何的電腦都可以使用。很多的高階技巧是無法在不同的架構之間通用的。 * 簡單易懂， bitwise operation 理解上都相當簡單，難點在於： * 實際上，並不清楚哪裡可以用 bitwise operation 來代替。 * 雖然，每個 bitwise opeartion 都相當簡單，但其背後的性質非常重要。 * 就像 $log$ 的概念很好懂，但是 $log$ 可以用來降維度這件事，卻並非那麼直觀。這篇文件將著重於： * 題目中的 bitwise operations * 解析額外的 bitwise operations * 思考 bitwise operation 背後的性質。 * 思考以下各種優化的程式碼，是如何設計的。 * 透過參考資料，了解各項優化場域考慮的細節。 ### Check if an Integer number is odd ```clike bool is_odd(int n){ return (bool) n%2; } ``` 在辨別一個 integer 是否為一奇數，直觀上可以用 modulus 取餘數來辨別。 ```clike bool fast_is_odd(int n){ return (bool) n & 0x1; } ``` 我們也可以透過 bitwise operation 中的 and 來實做。 ```bash Testing is_odd with 10000000 repetitions START TIME RECORDED END TIME RECORDED TIME ELAPSED: 0.047385513 Testing fast_is_odd with 10000000 repetitions START TIME RECORDED END TIME RECORDED TIME ELAPSED: 0.017787593 ``` 同樣辨識 10000000 次，使用 modulus 的方法要比 bitwise 慢上兩倍以上。這份程式碼有兩個重要的角色： * `0x1`: 一個 bitmask 的角色。 * `&`: 配合一個 bitmask 可以得出一個/一組選定的 bit 的狀態。設計的思考路徑如下: * 奇偶數在 2 進位表示法當中，差異只在於第一個 bit 。 * 因此只使用 `&` 與 `0x1` 來確認。 ### Check if a Character is ASCII > 目前電腦中用得最廣泛的字元集及其編碼，是美國國家標準局 (ANSI) 制定的 ASCII 碼，被國際標準化組織 (ISO) 採納為 ISO 646 標準。7 位碼 ASCII 是以 7 位元二進位數字進行編碼，可表示 128 個字元，第 128～255 號為擴展字元 (extended ASCII)。如果我們明確界定 7 位元編碼的字元隸屬於 ASCII，可透過以下函式來判斷指定的記憶體範圍內是否全是有效的 ASCII 字元: > [name=參考題目] ```clike bool is_ascii(const char str[], size_t size) { if (size == 0) return false; for (int i = 0; i < size; i++) if (str[i] & 0x80) return false; return true; } ``` 這份程式碼有以下角色： |Hex|用途|Binary| |---|----|------| |0x80|bitmask|1000 0000| * `&`: 配合一個 bitmask 可以得出一個/一組選定的 bit 的狀態。設計的思考路徑如下: * EASCII 是 128～255 號字元，因此所有的 EASCII 的第 8 位元皆為 1 。 * 因此，確定輸入的 `char` 的第八位元為 0 則可確認其為 ASCII 。 ```clike bool fast_is_ascii(const char str[], size_t size) { if (size == 0) return false; int i = 0; while ((i + 8) <= size) { uint64_t payload; memcpy(&payload, str + i, 8); if (payload & 0x8080808080808080) return false; i += 8; } while (i < size) { if (str[i] & 0x80) return false; i++; } return true; } ``` > 逐一字元比對的效率低，在 64 位元的架構下，我們嘗試一次比對 64 位元的資料 (即 word size)。 > [name=參考題目] ### Unaligned Memory Access 延伸優化細節： * 參考：[Unaligned Memory Accesses](https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt) Natural alignment: * 當存取/修改 N bytes data * 記憶體位置應當被 N 整除 Unaligned access 的定義為： * 違反 Natural alignment * 當存取/修改 N bytes data * 記憶體位置無法被 N 整除文件中提到，有要求 Natural alignment 的架構其實並非多數，但為了達到優秀的 `portability` ，直接考慮 Natural alignment 是最好的方法。為什麼 Unaligned access (以下稱 UA) 不好呢？ * 有些架構是可以接受 UA 。 * 但多數伴隨著較大的開銷 * 有些架構的 Processors 遇到 UA 會產生例外處理。 * 其中有可以修正 UA 的，也有無法修正 UA 的。 * 有些架構的 Processors 遇到 UA 會產生例外處理。 * 其中有可以修正 UA 的，也有無法修正 UA 的。 * 有些架構明明無法接受 UA ，卻不會有錯誤。 * 而這樣的行為，容易導致 Bug 。因此，為了保證程式碼在任何平台上都能正確的執行，考慮 memory alignment 是一件基本且重要的事情。考慮以下 `Structure` : * 假設記憶體開始位置 `0x10000` ```clike struct foo1 { u16 field1; u32 field2; u8 field3; }; ``` 直觀上來說： |Field|bytes|address| |-----|-----|-------| |field1|2|0x10000| |field2|4|0x10002| |field3|1|0x10006| * $\frac{65538}{4}=16384.5$ * field2 就會發生 UA 。但是 compiler 是不會讓這種事情發生的，它會自動安插一個 2 bytes 的 `padding` 。 |Field|bytes|address| |-----|-----|-------| |field1|2|0x10000| |padding|2|0x10002| |field2|4|0x10004| |field3|1|0x10008| * Access 1 byte data will never cause UA * GCC-specific attribute `__attribute__((packed))` * 這個 GCC-specific attribute 可以讓 compiler 不去塞入 padding 。 * 可以需要固定的記憶體排列時使用。可以透過重排順序來降低 Resident memory ```clike struct foo2 { u32 field2;// 0x10000 u16 field1; u8 field3; }; ``` |Field|bytes|address| |-----|-----|-------| |field2|4|0x10000| |field1|2|0x10004| |field3|1|0x10006| 主要發生 UA 的情況有下列兩者： * Casting * 也是上面的程式碼以 `memcpy` 規避的問題。 * Pointer arithemetic * 當這個指標取大於 2 bytes 的資料時，就應該考慮 UA 的問題。如何避免 UA ： * 在 Linux kernel 中的 `<asm/unaligned.h>` * 有提供 `put_unaligned` 以及 `get_unaligned` 兩個 macro * 但這仍然是一種開銷較大的方法。 * 另外一個方法就是使用 memcpy() 來避免 UA 。 * 當 source 或 destination 其一是 `u8*` 或 `char*` 等型態，因為皆為 1 byte 資料型態，就能成功避免。 ## Hexadecimal to Value ```clike uint8_t hexchar2val(char in){ const uint8_t letter = (uint8_t) in & 0x40; const uint8_t shift = (letter >> 3) | (letter >> 6); return (in + shift) & 0xf; } ``` ## Fast Division and Modulus ## Find Single Number ###### tags: `Cprogramming`

Read more

簡介 NUMA 架構

Hierachical spinlock embedding problem in Linux kernel

2020q3 Homework3 (dict)

Untitled