New EBB Strategy

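Code generation and compilation in the JIT carry high overhead, so each EBB needs to contain more instructions to spread that cost over more work. Ideally, an entire loop body should fit in a single EBB.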

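We studied rvemu, which first splits blocks at unconditional jumps and then pulls into each block every block reached along a taken branch from within it. Its blocks therefore contain many instructions and perform well. Our new strategy likewise aims to put many instructions into one block: we trace the entire control flow graph and include every block the trace visits. A path through the control flow graph ends in one of two cases: an unconditional jump, or a back edge, i.e., an edge pointing to a node that has already been traced. The figure below shows a simple EBB example.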

[Figure: a simple EBB example]

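Our earlier strategy pointed directly at the IR of other blocks, so whenever one of those blocks was evicted from the block cache, the EBB shrank. Another earlier strategy copied the IR with memcpy, but frequent memcpy calls were too costly and the performance gain was small. We therefore chose the cheaper option of instruction fetch & decode: each traced block is fetched and decoded again and stored directly in the EBB. With this approach, evicting other blocks from the cache no longer shrinks the EBB.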

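In addition, the EBB previously came in two versions, one for the interpreter and one for the JIT, so blocks had to be extended a second time when JIT-compiling. There is now a single, unified version.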

Instruction count: original EBB vs new EBB

[Figures: instruction counts, original EBB vs new EBB]

interpreter only (original EBB vs new EBB)

| Test | original EBB | new EBB | Speedup |
| --- | --- | --- | --- |
| CoreMark | 1158.994 Iterations/Sec | 1129.396 Iterations/Sec | -2.6% |
| dhrystone | 1020 DMIPS | 1087 DMIPS | +6.6% |
| nqueens | 8496.71 msec | 8237.29 msec | +3.1% |
| mandelbrot | 29.85 msec | 24.41 msec | +22.3% |
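The Speedup column of the interpreter-only results is consistent with the convention below, which is inferred from the numbers rather than stated in the original: for higher-is-better metrics (Iterations/Sec, DMIPS) it is the relative increase, and for run times (msec) it is the old time divided by the new time, minus one.

$$
\text{Speedup} =
\begin{cases}
\dfrac{\text{new}}{\text{original}} - 1, & \text{throughput (Iterations/Sec, DMIPS)} \\[6pt]
\dfrac{\text{original}}{\text{new}} - 1, & \text{run time (msec)}
\end{cases}
$$

For example, mandelbrot gives 29.85 / 24.41 - 1 ≈ 0.223 (+22.3%), and CoreMark gives 1129.396 / 1158.994 - 1 ≈ -0.026 (-2.6%).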

MIR JIT (rvemu EBB vs new EBB)

The original EBB performs too poorly under the JIT, so we compare against the rvemu EBB instead.

| Test | rvemu EBB | new EBB | Speedup |
| --- | --- | --- | --- |
| CoreMark | 2194.907 Iterations/Sec | 2385.122 Iterations/Sec | +8.7% |
| dhrystone | 2522 DMIPS | 2774 DMIPS | +10.0% |
| nqueens | 3947.27 msec | 3827.63 msec | +3.1% |
| mandelbrot | 102.61 msec | 93.65 msec | +9.6% |