# [Faster remainders when the divisor is a constant: beating compilers and libdivide](https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/) ###### tags: `note` 對於處理器來說,每個運算所花的成本是不同的,比如 `add`, `sub` 就低於 `mul`,而這些運算的 cost 又低於 `div` ,觀察下面的圖表 ([source](http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/)),可以看到就 `integer division` 的成本幾乎是 `integer add` 的約 50 倍  所以如何處理 `div` 運算是每個 compiler 所需解決的問題。考慮 $\dfrac{n}{d}$,$d$ 是一個 constant,除式可以表示成 $n * (\dfrac{2^n}{d}) * \dfrac{1}{2^n}$,若 $d$ 是 2 的冪次方的話可以將除法轉換成乘法跟位移,cost 會比 `div` 少很多,如果除數 $d$ 不是 2 的冪次方,則 $\dfrac{2^n}{d}$ 需要做一些特別的處理,在 computer architecture 中,若 `n` 足夠大,其實 $\dfrac{2^n}{d}$ 是可以**大約**預估出來的。 文章中還有介紹一些其他的除法如下: * [Division by invariant integers using multiplication](https://dl.acm.org/doi/10.1145/773473.178249), by Granlund and Montgomery (1994) * [Hacker's Delight](https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0321842685/), by Warren * [N-Bit Unsigned Division via N-Bit Multiply-Add](https://www.computer.org/csdl/proceedings-article/arith/2005/23660131/12OmNyQYt7U), by Robison 這篇文章的重點方法在 [Faster Remainder by Direct Computation](https://www.computer.org/csdl/proceedings-article/arith/2005/23660131/12OmNyQYt7U) 有完整的推導,下面是 fast div 的 C code ```c= uint32_t d = ...;// your divisor > 0 uint64_t c = UINT64_C(0xFFFFFFFFFFFFFFFF) / d + 1; // fastmod computes (n mod d) given precomputed c uint32_t fastmod(uint32_t n ) { uint64_t lowbits = c * n; return ((__uint128_t)lowbits * d) >> 64; } ``` divisibility test ```c= uint64_t c = 1 + UINT64_C(0xffffffffffffffff) / d; // given precomputed c, checks whether n % d == 0 bool is_divisible(uint32_t n) { return n * c <= c - 1; } ``` fast mod 還可以運用在著名的 [fizzbuzz](https://en.wikipedia.org/wiki/Fizz_buzz) 上,畢竟 [fizzbuzz](https://en.wikipedia.org/wiki/Fizz_buzz) 的核心就是在探討 division 一般版本 ```c= for (uint32_t i = 0; i < N; i++) { if ((i % 3) == 0) count3 += 1; if ((i % 5) == 0) count5 += 1; } ``` fast div 版本 ```c= uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1; uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1; for (uint32_t i = 0; i < N; i++) { if (is_divisible(i, M3)) count3 += 1; if (is_divisible(i, M5)) count5 += 1; } ``` 比較結果  重現實驗結果 :::success TODO :::
×
Sign in
Email
Password
Forgot password
or
By clicking below, you agree to our
terms of service
.
Sign in via Facebook
Sign in via Twitter
Sign in via GitHub
Sign in via Dropbox
Sign in with Wallet
Wallet (
)
Connect another wallet
New to HackMD?
Sign up