hankluo6
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
      • Invitee
    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Versions and GitHub Sync Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
Invitee
Publish Note

Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

Your note will be visible on your profile and discoverable by anyone.
Your note is now live.
This note is visible on your profile and discoverable online.
Everyone on the web can find and read all notes of this public team.
See published notes
Unpublish note
Please check the box to agree to the Community Guidelines.
View profile
Engagement control
Commenting
Permission
Disabled Forbidden Owners Signed-in users Everyone
Enable
Permission
  • Forbidden
  • Owners
  • Signed-in users
  • Everyone
Suggest edit
Permission
Disabled Forbidden Owners Signed-in users Everyone
Enable
Permission
  • Forbidden
  • Owners
  • Signed-in users
Emoji Reply
Enable
Import from Dropbox Google Drive Gist Clipboard
   owned this note    owned this note      
Published Linked with GitHub
Subscribed
  • Any changes
    Be notified of any changes
  • Mention me
    Be notified of mention me
  • Unsubscribe
Subscribe
# 2020q3 Homework (quiz3) contributed by < `hankluo6` > ### `測驗 1` ```c int asr_i(signed int m, unsigned int n) { const int logical = (((int) -1) OP1) > 0; unsigned int fixu = -(logical & (OP2)); int fix = *(int *) &fixu; return (m >> n) | (fix ^ (fix >> n)); } ``` #### `OP1` - [x] `(b)` `>> 1` #### `OP2` - [x] `(c)` `m < 0` 在有號型態上進行 ASR 的結果只有兩種可能,一種為在 MSB 補 0,另一種為在 MSB 補 1。因 `OP1` 使用到 `(int)-1` 且後方檢驗是否為正數,可以推測出 `OP1` 用來檢驗當前編譯器是選擇哪種 implementation,所以答案很明顯為 `(b) >> 1`。 已知 `logical` 為 0 或 1,接著就是實作兩種情況都有正確答案的程式碼。因 `m > 0` 時不受影響,可以分成以下 2 種: * `m < 0` 且 ASR 為 `MSB = 0`,logical shift * `m < 0` 且 ASR 為 `MSB = 1`,arithemetic shift 我們可以知道第二種情況為正確行為,所以只要讓第一種情況發生時,shift 的部分改為 1 就行了。深入程式碼可發現 `fix` 只有 0 跟 -1 兩種可能,接著當 `fix == -1` 時,`return` 當中 `(fix ^ (fix >> n)` 能讓 shift 產生的位元為 0 或 1 (如果 ASR 的 `MSB = 0` 則產生 1,反之則產生 0),接著使用 OR 就能把原本該為 1 卻為 0 的部分改為 1。因為我們只要針對 ASR 為 logical shift 時的行為,所以只考慮 `(c) m < 0`。 #### 數學證明 當 ASR 為 logical shift 時,可以當成是在 unsigned 上做 arithmetic shift,以 $-5 >> 2$ 為例子,且假設為 4 bit integer,則可以寫成: $$ \begin{align} -5(1011)_{2's} >> 2 &= (-5(1011)_{2's}+2^4-2^4) >> 2 \\ &= (11(1011)_{unsigned}-2^4) >> 2 \\ &= 2(0010)_{uinsigned} - 2^2 \\ &= 2(0010)_{uinsigned} - 2^4 + 2^3 + 2^2 \\ &= -11(0010)_{2's} + 2^3 + 2^2\\ &= -2(1110)_{2's} \end{align} $$ :::info 這邊使用 $2^4$ 來當作 2's complement 與 unsigned 之間的轉換,可以只當成形式上的代換,也可以當成數值上的相加減,詳細二補數規則可參考 [解讀計算機編碼](https://hackmd.io/@sysprog/binary-representation#%E7%AC%AC%E4%B8%80%E9%83%A8%E5%88%86%EF%BC%9A%E4%BA%8C%E8%A3%9C%E6%95%B8%E7%B7%A8%E7%A2%BC) ::: 公式內的 $2^4$ 因我們假設在 unsigned 上做 ASR,所以程式內不需特別寫出,而後方的 $-2^4$ 則是我們在 return 時,將 MSB (或所有 shift 新增的位元) 都設為 1 的這個行為,這個 $-2^4$ 會因為 shift 的關係而減少,最後為了要轉換為 2's complement 而再補齊前面幾個 bit。 #### 延伸問題 * 練習實作其他資料寬度的 ASR,可參照 [C 語言:前置處理器應用篇 撰寫通用的巨集以強化程式碼的共用程度](https://hackmd.io/@sysprog/c-preprocessor); ```c #define asr(m, n) \ _Generic((m), \ int8_t: asr_8, \ int16_t: asr_16, \ int32_t: asr_32, \ int64_t: asr_64 \ )(m, n) #define asr_type(type) \ const type logical = (((type) -1) >> 1) > 0; \ u##type fixu = -(logical & (m < 0)); \ type fix = *(type *) &fixu; \ return (m >> n) | (fix ^ (fix >> n)) int_8_t asr_8(int_8_t m, unsigned int n) { asr_type(int8_t); } int16_t asr_16(int16_t m, unsigned int n) { asr_type(int16_t); } int32_t asr_32(int32_t m, unsigned int n) { asr_type(int32_t); } int64_t asr_64(int64_t m, unsigned int n) { asr_type(int64_t); } ``` 使用 `_Generic` 來實現類似 C++ 中 Template 的效果。 ### `測驗 2` ```c bool isPowerOfFour(int num) { return num > 0 && (num & (num - 1))==0 && !(__builtin_ctz(num) OPQ); } ``` #### `OPQ` - [x] `(f)` `& 0x1` power of 4 的條件為總共只有一個 bit 為 1,且該 bit 的位置需為 2, 4, 6, 8...(從 0 開始)才為 4 的倍數。`num & (num - 1)` 可以判斷是否只有一個 bit 為 1。 所以 `__builtin_ctz(num)` 必須判斷 bit 的位置,從上述可知 `__builtin_ctz(num)` 的值要為偶數,所以`OPQ` 為 `(f)`。 #### 延伸問題 * 改寫上述程式碼,提供等價功能的不同實作,儘量降低 branch 的數量; `__builtin_ctz(num) & 0x1` 計算從 LSB 數來是否為偶數個 0,可以改用 bitmask 來檢測,因 `num` 只有一個 bit 為 1,所以 `0x55555555 & num` 能夠取得該 bit 是否在偶數位置,且此 bitmask 也能順便考慮 `num == 0` 的情況。 ```c bool isPowerOfFour(int num) { return (0x55555555 & num) && (num & (num - 1)) == 0; } ``` * 練習 LeetCode [1009. Complement of Base 10 Integer](https://leetcode.com/problems/complement-of-base-10-integer/) 和 [41. First Missing Positive](https://leetcode.com/problems/first-missing-positive/),應善用 clz; 1. LeetCode 1009. Complement of Base 10 Integer 有兩種想法,一種為使用 Not 運算再用 bitmask 將最高位元以前的 1 都 clear;另一種為對 N 的 bit 與 bitmask 做 XOR,且最高位元以前的 bitmask 皆為 0,以後皆為 1。這邊實作第二種方式: ```c int bitwiseComplement(int N) { return N ? ((1 << (32 - __builtin_clz(N))) - 1) ^ N : 1; } ``` 使用 `__builtin_clz` 能讓程式碼更簡潔 (沒有 loop)。 2. LeetCode 41. First Missing Positive 因 missing positive integer 不可能超過陣列長度,且題目要求使用 constant extra space,所以我們可以利用陣列來紀錄資料。 ```c int firstMissingPositive(int* nums, int numsSize) { for (int i = 0; i < numsSize; ++i) { if (nums[i] < 0 || nums[i] > numsSize) nums[i] = 0; } for (int i = 0; i < numsSize; ++i) { int idx = (nums[i] & 0x7fffffff) - 1; if (idx < 0 || idx >= numsSize) continue; nums[idx] |= 0x80000000; } for (int i = 0; i < numsSize; ++i) { if (!nums[i] || __builtin_clz(nums[i])) return i + 1; } return numsSize == 0 ? 1 : numsSize + 1; } ``` 把遇到過的數字應該放置的位置上的值與 `0x80000000` 做 OR 運算來當作標記,這樣不僅不會影響到原本的數字(假設都為正數),也可以紀錄哪些數字出現過。最後在看哪個數字沒有標記就是答案。 * 研讀 2017 年修課學生報告,理解 clz 的實作方式,並舉出 Exponential Golomb coding 的案例說明,需要有對應的 C 原始程式碼和測試報告; 在 [H.264](https://en.wikipedia.org/wiki/Advanced_Video_Coding) 中用到了 Exponential Golomb coding 這種編碼方式,其中內部又能在細分為 * ue(v) * me(v) * se(v) * te(v) 在 H.264 的 document 中有這段描述 >The parsing process for these syntax elements begins with reading the bits starting at the current location in the bitstream up to and including the first non-zero bit, and counting the number of leading bits that are equal to 0. 此部份就很適合使用 `clz` 來計算,翻閱 H.264 的 opensource,在 [openh264/codec/encoder/core/inc/svc_enc_golomb.h](https://github.com/cisco/openh264/blob/master/codec/encoder/core/inc/svc_enc_golomb.h) 中的 `BsSizeUe()` 是用來取得 exponent golomb code,代碼如下: ```cpp static inline uint32_t BsSizeUE (const uint32_t kiValue) { if (256 > kiValue) { return g_kuiGolombUELength[kiValue]; } else { uint32_t n = 0; uint32_t iTmpValue = kiValue + 1; if (iTmpValue & 0xffff0000) { iTmpValue >>= 16; n += 16; } if (iTmpValue & 0xff00) { iTmpValue >>= 8; n += 8; } //n += (g_kuiGolombUELength[iTmpValue] >> 1); n += (g_kuiGolombUELength[iTmpValue - 1] >> 1); return ((n << 1) + 1); } } ``` 可以看到 2 個 if 的作用就是在算 leading zero。 ### `測驗 3` ```c int numberOfSteps (int num) { return num ? AAA + 31 - BBB : 0; } ``` #### `AAA` - [x] `(a)` `__builtin_popcount(num)` #### `BBB` - [x] `(b)` `__builtin_clz(num)` 從 bits 的角度看此題,會發現偶數除 2、奇數減 1 這兩個步驟可以看成當 LSB 為 0 則 right shift one bit、當 LSB 為 1 則減 1。所以我們只需要算有幾個 bit 為 1 (代表減 1 的次數),以及從 MSB 數來第一個 bit 為 1 的位置 (代表除 2 的次數),故答案為 `(a)` 與 `(b)`。 #### 延伸問題 * 改寫上述程式碼,提供等價功能的不同實作並解說; * 避免用到編譯器的擴充功能,只用 C99 特徵及 bitwise 運算改寫 LeetCode 1342. Number of Steps to Reduce a Number to Zero,實作出 branchless 解法,儘量縮減程式碼行數和分支數量; ```c int numberOfSteps (int num) { if (!num) return 0; int i = 0, bitmask = 0x80000000; unsigned tmp, c; tmp = num - ((num >> 1) & 0x55555555); // reuse input as temporary tmp = (tmp & 0x33333333) + ((tmp >> 2) & 0x33333333); // temp c = ((tmp + (tmp >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count while (!(num & bitmask)) { ++i; bitmask >>= 1; } return c + 31 - i; } ``` 參考 [Bit Twiddling Hacks](http://graphics.stanford.edu/~seander/bithacks.html) 裡的 [Counting bits set, in parallel](http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel) 章節,實作 `__builtin_popcount` function,並使用迴圈來計算 leading zero。 :::info 不知道有沒有 brenchless 的 `__builtin_clz` 實作。 ::: 這邊很意外的運行時間比原本程式還好 (0ms vs 4ms),試著找出原因,翻閱 [`glibc/stdlib/longlong.h`](https://code.woboq.org/userspace/glibc/stdlib/longlong.h.html) 找出 `__builtin_clz` 原始碼,發現是直接使用組語實作: ```c do { SItype c_; __asm__ ("norm.f\t%0,%1\n\tmov.mi\t%0,-1" : "=r" (c_) : "r" (x) : "cc"); (count) = c_ + 1; } while (0) ``` 所以效能應該不會比我的 `while` 迴圈版本慢,查看 `__builtin_popcount` 發現有很多版本實作,推測是因為 leetcode 中 c compiler 的 `__builtin_popcount` 實作較慢。 :::info TODO: 用適合的實驗證明此推測 ::: ### `測驗 4` ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (XXX); return YYY; } ``` #### `XXX` - [x] `(b)` `v` #### `YYY` - [x] `(e)` `u << shift` 需先知道輾轉相除法有個簡單的形式: $$ \begin{align} gcd(a,b) = gcd(a-b, b) \quad \text{if a > b} \\ gcd(a,b) = gcd(a, b-a) \quad \text{if a < b} \end{align} $$ 寫成 pseudocode: ```p function gcd(a, b) if a = 0 return b while b ≠ 0 if a > b a ← a − b else b ← b − a return a ``` 此測驗中程式內 `do..while` 裡的 `if else` 其實跟這邊的 `if elst` 做的事情一樣,只是測驗中的程式多做了一次 swap 把 `u` 跟 `v` 對調。 另外程式還用了 gcd 的另一種特性:$gcd(a,b) = gcd(a/2, b) \quad \text{if b is odd}$,所以只要一方為奇數,就能將另一邊 shift 來減少運算量。 :::info 此程式用到的 gcd 相關技巧可參考 [wikipedia](https://en.wikipedia.org/wiki/Binary_GCD_algorithm): * gcd(0, v) = v, because everything divides zero, and v is the largest number that divides v. Similarly, gcd(u, 0) = u. * gcd(2u, 2v) = 2·gcd(u, v) * gcd(2u, v) = gcd(u, v), if v is odd (2 is not a common divisor). Similarly, gcd(u, 2v) = gcd(u, v) if u is odd. * gcd(u, v) = gcd(|u − v|, min(u, v)), if u and v are both odd. ::: 我們讓 `u` 保持為奇數,並且維持 `u > v`,這樣當我們 `v == 0` 時,`u` 就自然為 gcd,故 `OOO` 為 `v`;根據上面第二點提到,我們把 2 的倍數提出來,gcd 應藥補回提出的 2 的倍數,所以 `YYY` 為 `u << shift`。 #### 延伸問題 * 在 x86_64 上透過 __builtin_ctz 改寫 GCD,分析對效能的提升; ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; int a = __builtin_ctzl(u), b = __builtin_ctzl(v); shift = b ^ ((a ^ b) & -(a < b)); u >>= shift, v >>= shift; int bit; if (bit = __builtin_ctzl(u)) u >>= bit;; do { if (bit = __builtin_ctzl(v)) v >>= bit; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (v); return u << shift; } ``` 效能分析: 隨機產生 64 bit 的資料,並測量 1000000 次的運行時間 | gcd64 | __builtin_ctz gcd64 | |:----------:|:-------------------:| | 0.667711 s | 0.425873 s | 可發現效能有有效的提昇,約提昇 $\frac{0.667711-0.425873}{1000000} = 2.4 * 10^{-7}$ 秒。 ### `測驗 5` ```c #include <stddef.h> size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; uint64_t bitset; for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; while (bitset != 0) { uint64_t t = KKK; int r = __builtin_ctzll(bitset); out[pos++] = k * 64 + r; bitset ^= t; } } return pos; } ``` #### `KKK ` - [x] `(e)` `bitset & -bitset` 與參考程式比對,可知用 bit 的版本透過 bit 操作省略了 LSB 為 0 的情形,我們只操作從 LSB 數來第一個為 1 的 bit,而操作完必須 clear 該 bit,所以答案為 `(e)`。 #### 延伸問題 * 思考進一步的改進空間; 程式碼中 `k * 64` 可以改成 bit shift 運算,且清除 LSB 的 bit 可利用 `__builtin_ctzll()` 的結果來達成。改進後的程式碼如下: ```c size_t my_improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; uint64_t bitset; for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; while (bitset != 0) { uint64_t r = __builtin_ctzll(bitset); out[pos++] = k << 6 + r; bitset ^= 1 << r; } } return pos; } ``` * 設計實驗,檢驗 clz 改寫的程式碼相較原本的實作有多少改進?應考慮到不同的 [bitmap density](http://www.vki.com/2013/Support/docs/vgltools-3.html); 這邊我測試 0 ~ 64 個 bit 數量為 1 的情形,將每個為 1 的 bit 隨機分佈在 `bitset` 當中,在 butmapsize 為 128 上測量運行 10000 次的時間,結果如下: ![](https://i.imgur.com/hMsR3zg.png) * naive 的版本為線性時間成長,這是因為此版本一定會跑迴圈 64 次,所以效能影響全部由 bit 數量決定。 * improve 的版本在初期成長緩慢,之後卻急遽遞增,這是因為 bit 的數量越高,`while` 迴圈的次數就越多,而迴圈內部大量的運算就會降低程式效能。 * 改進後的版本,因 `while` 迴圈內部運算成本減少,可以發現在 bit 數高的情況下也能有效的壓低運行時間。 ###### tags: `linux2020`

Import from clipboard

Paste your markdown or webpage here...

Advanced permission required

Your current role can only read. Ask the system administrator to acquire write and comment permission.

This team is disabled

Sorry, this team is disabled. You can't edit this note.

This note is locked

Sorry, only owner can edit this note.

Reach the limit

Sorry, you've reached the max length this note can be.
Please reduce the content or divide it to more notes, thank you!

Import from Gist

Import from Snippet

or

Export to Snippet

Are you sure?

Do you really want to delete this note?
All users will lose their connection.

Create a note from template

Create a note from template

Oops...
This template has been removed or transferred.
Upgrade
All
  • All
  • Team
No template.

Create a template

Upgrade

Delete template

Do you really want to delete this template?
Turn this template into a regular note and keep its content, versions, and comments.

This page need refresh

You have an incompatible client version.
Refresh to update.
New version available!
See releases notes here
Refresh to enjoy new features.
Your user state has changed.
Refresh to load new user state.

Sign in

Forgot password

or

By clicking below, you agree to our terms of service.

Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
Wallet ( )
Connect another wallet

New to HackMD? Sign up

Help

  • English
  • 中文
  • Français
  • Deutsch
  • 日本語
  • Español
  • Català
  • Ελληνικά
  • Português
  • italiano
  • Türkçe
  • Русский
  • Nederlands
  • hrvatski jezik
  • język polski
  • Українська
  • हिन्दी
  • svenska
  • Esperanto
  • dansk

Documents

Help & Tutorial

How to use Book mode

Slide Example

API Docs

Edit in VSCode

Install browser extension

Contacts

Feedback

Discord

Send us email

Resources

Releases

Pricing

Blog

Policy

Terms

Privacy

Cheatsheet

Syntax Example Reference
# Header Header 基本排版
- Unordered List
  • Unordered List
1. Ordered List
  1. Ordered List
- [ ] Todo List
  • Todo List
> Blockquote
Blockquote
**Bold font** Bold font
*Italics font* Italics font
~~Strikethrough~~ Strikethrough
19^th^ 19th
H~2~O H2O
++Inserted text++ Inserted text
==Marked text== Marked text
[link text](https:// "title") Link
![image alt](https:// "title") Image
`Code` Code 在筆記中貼入程式碼
```javascript
var i = 0;
```
var i = 0;
:smile: :smile: Emoji list
{%youtube youtube_id %} Externals
$L^aT_eX$ LaTeX
:::info
This is a alert area.
:::

This is a alert area.

Versions and GitHub Sync
Get Full History Access

  • Edit version name
  • Delete

revision author avatar     named on  

More Less

Note content is identical to the latest version.
Compare
    Choose a version
    No search result
    Version not found
Sign in to link this note to GitHub
Learn more
This note is not linked with GitHub
 

Feedback

Submission failed, please try again

Thanks for your support.

On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

Please give us some advice and help us improve HackMD.

 

Thanks for your feedback

Remove version name

Do you want to remove this version name and description?

Transfer ownership

Transfer to
    Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

      Link with GitHub

      Please authorize HackMD on GitHub
      • Please sign in to GitHub and install the HackMD app on your GitHub repo.
      • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
      Learn more  Sign in to GitHub

      Push the note to GitHub Push to GitHub Pull a file from GitHub

        Authorize again
       

      Choose which file to push to

      Select repo
      Refresh Authorize more repos
      Select branch
      Select file
      Select branch
      Choose version(s) to push
      • Save a new version and push
      • Choose from existing versions
      Include title and tags
      Available push count

      Pull from GitHub

       
      File from GitHub
      File from HackMD

      GitHub Link Settings

      File linked

      Linked by
      File path
      Last synced branch
      Available push count

      Danger Zone

      Unlink
      You will no longer receive notification when GitHub file changes after unlink.

      Syncing

      Push failed

      Push successfully