Design of a High Performance Branch Predictor Based on Global History Considering Hardware Cost

Abstract

分支預測(Branch prediction)是提高效能的關鍵技術
- CPU 執行時，分支佔所有指令的 20~30%
基於 global history 的動態分支預測結構簡單，占用的內存少，但有 index aliasing 的問題
本文對分支預測改進，將預測生成的 index 和指令 PC 位址最後 8-bits 做 XOR，緩解 index aliasing 的問題
使用 simplescalar 3.0 模擬
實驗表明，GHR 為 12 時，改進的分支預測器效果最好，平均比原本的高出 1.93%

Introduction

以一般 MIPS 架構的 five-level pipeline 為例，如果沒有分支預測，在遇到分支指令時可能導致 pipeline 中所有指令都被清除
分支預測分為靜態和動態
- 靜態的通常預測率較低
- 基於全局歷史寄存器（GHR）的動態分支預測器使用移位的分支歷史寄存器記錄第一級最近 n 條指令的執行情況，並使用具有四個狀態值的模式歷史表（PHT）來預測分支的跳轉方向二級指令
- 因為 GHR 寬度有限，不同指令可能有相同的 GHR，導致預測率下降
- 本文目標是改進 Gselect 類型的分支預測器

Shortages of Several Classical Dynamic Branch Predictors

Branch Predictor Based on 1-bit Saturation Counter

用 1-bit 來紀錄上次有沒有跳轉
如果一直預測成功就永遠不會改變狀態
如果預測結果頻繁改變效果就會很差，實務上沒有過這種作法

Branch Predictor Based on 2-bit Saturation Counter

用 2-bits 紀錄前兩次有沒有跳轉
變成四種狀態的狀態機，概念和上面的基本一樣

One-Level Branch Predictor

用一部分的 PC 值當作 index 去找 PHT
- Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
第一個問題是不同 PC 會有同樣的 index
第二個問題是某些狀況之下預測率會很低

Two-level branch predictor

有三個 PHT
- Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
用 Choice-PHT 決定要用哪張 PHT
只改變 Choice-PHT
硬體成本高
看不太懂= =

Hybird and Neural Network Predictor

結合多個預測器
延遲高，需要時間訓練
硬體成本也高

New Scheme Proposal

Gselect 分支預測器使用 PC 高位和 GHR 低位當作 PHT 的 index，而沒使用 PC 低位
因為沒辦法準確區分沒使用到的 PC，導致 Gselect 預測率下降
為了充分使用 PC，本文將 Gselect 分支預測器生成的 index 和 PC 低位 8-bits XOR
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
假設 PC 和 GHR 為 16-bits，Gselect 將 PC 的高 8 位和 GHR 的低 8 位合併得到 index；然後我們再將 PC 的低 8 位和結果做 XOR
- 只影響結果的低 8 位
一個 Gselect 會重複 index 但 XOR 不會的例子

Index Name	Value	Gselect	XOR-Gselect
PC1	1010001111001111	1010001110101010	1010001101100101
GHR1	0011110010101010
PC2	1010001100000001	1010001110101010	1010001110101011
GHR2	1010111110101010

無法證明 XOR 有更好的表現，因此使用 simplescalar 執行一堆指令來判斷效能

Experimental Environment and Results

Experimental Environment

使用 SimpleScalar 3.0 sim-bpred 模擬，並使用 Gselect, Bimode 比較

Experimental Results

GHR 的長度會影響預測的成功率，因此這會是我們需要研究的因素之一
- Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
- 長度為 12 的時候飽和
PC 低位使用的數量也會影響結果(for XOR)
- Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
- Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
根據上面的結果，GHR 長度使用 12，並和 Bimode, Gselect 比較的結果
- Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
- 平均比 Gselect 好 1.93%，但比 Bimode 差
- Bimode 的硬體成本高，因此輸了很正常

Conclusion

使用 XOR-Gselect 改善 index aliasing 的問題
實驗表明 GHR 長度為 12 時效果最好，並且 XOR 能改善 Gselect 的預測率

tags: `paper`