
Prefetch Paper Reading

contributed by <zhanyangch>

tags:zhanyangch sysprog2017 week3

Paper Reading

When Prefetching Works, When It Doesn’t, and Why

  • Cooperation and conflict between hardware and software prefetching
  1. INTRODUCTION
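    • prefetching: a technique for tolerating cache miss latency
    • Compilers can insert only simple prefetches automatically, so programmers have to insert prefetching intrinsics by hand. There are few rigorous guidelines for doing this, and the complex interactions between software and hardware prefetching are not well understood.
    • intrinsics: look like function calls, but the compiler replaces them directly with assembly instructions
    • Figure: "sw" is software prefetching; GHB and STR are hardware prefetchers. The figure compares the speedup with software prefetching against hardware prefetching alone, and groups the results into positive, neutral, and negative using a 5% threshold.
    • When software prefetching targets short array streams, irregular memory address patterns, and L1 cache miss reduction, the overall impact is positive; the paper shows this with code examples.
    • Software prefetching can interfere with the training of the hardware prefetcher, resulting in strong negative effects, especially when software prefetch instructions cover only part of the streams.
    • stream prefetcher (STR): prefetches consecutive data
      stride prefetcher (GHB): prefetches the instruction or data one stride away; the stride is learned from the previous access history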
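To make the idea of a prefetching intrinsic concrete, here is a minimal sketch of a software prefetch inside a simple array loop. It uses the GCC/Clang builtin `__builtin_prefetch` rather than the x86 SSE intrinsic discussed in the paper, so it does not depend on x86 headers. The function name `sum_with_prefetch` and the constant `PREFETCH_AHEAD` are hypothetical, and the value 16 is an untuned placeholder.

```c
#include <stddef.h>

/* Hypothetical, untuned constant: how many elements ahead to prefetch. */
#define PREFETCH_AHEAD 16

/* Sum an array while hinting the cache about elements needed soon. */
long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Arguments: address, rw (0 = read), locality (3 = high temporal
         * locality). The guard keeps the hinted address inside the array. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        total += a[i];
    }
    return total;
}
```

A long sequential stream like this one is also what a hardware stream prefetcher handles well, so the hint here mainly shows the syntax; whether it helps on a given machine has to be measured.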
  2. BACKGROUND ON SOFTWARE AND HARDWARE PREFETCHING
    • Different data structures have different access patterns, which affects how they should be prefetched
    • Recursive Data Structures (RDS)
    • An intrinsic from the x86 SSE SIMD extensions is translated into 2 instructions (direct address) or 4 instructions (indirect address)
    • Prefetch Classification: timing-related (Timely, Late, Early), redundant (Redundant_dc, Redundant_mshr), and wrong (Incorrect)
    • Software Prefetch Distance
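      D: prefetch distance
      l: prefetch latency
      s: length of the shortest path through the loop body
      D must be large enough to cover the memory latency, but if it is too large, data in the cache is evicted early, which raises the cache miss rate
      $D \ge \lceil l / s \rceil$
    • direct memory indexing: easy to handle with hardware prefetching
      indirect memory indexing: needs special hardware, so it is easier to handle with software prefetching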
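The prefetch distance rule and the indirect indexing case can be sketched together. The code below is an illustration under assumptions, not the paper's code: `prefetch_distance` computes the smallest D with D ≥ ⌈l/s⌉ using integer arithmetic, and `gather_sum` prefetches `a[b[i + d]]`, an address that is hard to predict in hardware because it depends on the contents of `b`. Both function names are hypothetical, and `__builtin_prefetch` is the GCC/Clang builtin.

```c
#include <stddef.h>

/* Smallest D with D >= ceil(l / s).
 * l: prefetch latency in cycles, s: cycles on the shortest path through
 * the loop body. s must be non-zero. */
size_t prefetch_distance(size_t l, size_t s)
{
    return (l + s - 1) / s;   /* integer ceiling of l / s */
}

/* Indirect indexing: sum a[b[i]] while prefetching d iterations ahead. */
long gather_sum(const int *a, const size_t *b, size_t n, size_t d)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Reading b[i + d] is a real load, so it must stay in bounds. */
        if (i + d < n)
            __builtin_prefetch(&a[b[i + d]], 0, 3);
        total += a[b[i]];
    }
    return total;
}
```

For example, with a made-up latency of 200 cycles and a 30-cycle loop body, `prefetch_distance(200, 30)` gives 7 iterations. The extra load of `b[i + d]` is part of the instruction overhead that indirect prefetching adds.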
  3. POSITIVE AND NEGATIVE IMPACTS OF SOFTWARE PREFETCHING
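  • Advantages of software prefetching
    • Hardware resources are limited, so a large number of streams is hard to handle
    • the limited resources include stream detectors and book-keeping mechanisms
    • A hardware prefetcher needs training; if a stream is too short, it does not produce enough cache misses to train the prefetcher
    • Hardware prefetchers mostly place data in a lower-level cache to reduce L1 cache pollution
    • In software, loop bounds are easy to compute, and loop unrolling, software pipelining, and branch instructions can be used to avoid prefetch requests beyond the array bounds
  • Disadvantages of software prefetching
    • Increases the instruction count
    • Which data to prefetch is decided in advance and cannot adapt to runtime behavior
    • Prefetch instructions are hard to insert into loops with few instructions; techniques such as loop splitting are needed
  • Advantages of combining software and hardware prefetching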
    • Handling Multiple Streams
    • Positive Training
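  • Disadvantages of combining software and hardware prefetching
    • Negative Training: software prefetching slows down the training of the hardware prefetcher or hides streams from it
    • Harmful Software Prefetching: puts stress on the cache and memory bandwidth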
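One way to keep prefetch requests inside the array bounds without a branch in every iteration is to split the loop into a main part that always prefetches and a tail that never does. The sketch below is my own illustration of that idea, not code from the paper; the function name `scale_split` and the parameter `d` (prefetch distance in elements) are hypothetical.

```c
#include <stddef.h>

/* Multiply every element of x by k. The loop is split so the main part
 * prefetches unconditionally and the last d elements are processed
 * without any prefetch. */
void scale_split(float *x, size_t n, float k, size_t d)
{
    /* For i < main_end, the index i + d is always below n. */
    size_t main_end = (n > d) ? n - d : 0;
    size_t i = 0;

    for (; i < main_end; i++) {
        __builtin_prefetch(&x[i + d], 1, 3);  /* rw = 1: line will be written */
        x[i] *= k;
    }
    for (; i < n; i++)   /* tail: nothing left to prefetch */
        x[i] *= k;
}
```

The split removes the per-iteration bounds check from the hot loop, which reduces the added instruction count, at the cost of duplicating the loop body.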
  4. EXPERIMENTAL METHODOLOGY
    $Distance = \frac{K \times L \times IPC_{bench}}{W_{loop}}$
    • where $K$ is a constant factor, $L$ is an average memory latency, $IPC_{bench}$ is the profiled average IPC of each benchmark, and $W_{loop}$ is the average instruction count in one loop iteration
    • MacSim, a trace-driven and cycle-level simulator
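The distance formula above can be evaluated directly. The helper below is a small sketch: the function name `prefetch_distance_iters` is hypothetical, and rounding the result up to a whole number of iterations is my assumption, since the notes do not say how fractional values are handled.

```c
/* Distance = K * L * IPC_bench / W_loop, rounded up to whole iterations.
 * k: constant factor, latency_cycles: average memory latency (L),
 * ipc_bench: profiled average IPC, w_loop: average instructions per
 * loop iteration (must be non-zero). Rounding up is an assumption. */
unsigned prefetch_distance_iters(double k, double latency_cycles,
                                 double ipc_bench, double w_loop)
{
    double d = k * latency_cycles * ipc_bench / w_loop;
    unsigned whole = (unsigned)d;   /* truncates toward zero */
    return (d > (double)whole) ? whole + 1 : whole;
}
```

With illustrative inputs K = 4, L = 300 cycles, IPC = 0.5, and 70 instructions per iteration, the raw value is 600 / 70 ≈ 8.57, so the helper returns 9. A higher IPC or a shorter loop body means each iteration finishes sooner, so the prefetch has to be issued more iterations ahead.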
  5. EVALUATIONS: BASIC OBSERVATIONS ABOUT PREFETCHING
    • Instruction Overhead: prefetch instructions plus the instructions for indirect memory accesses; a high overhead does not necessarily mean poor performance
    • Software Prefetching Overhead: cache pollution, bandwidth consumption, memory access latency, redundant prefetch overhead, and instruction overhead. Each factor is removed in turn to observe its effect on performance:
      • small: cache pollution, bandwidth
      • larger: memory latency
      • negligible: redundant prefetch instructions
      • not high: instruction overhead
    • optimal distance might vary by machine configurations
    • Using software prefetching to train the hardware prefetcher has limited benefit: software prefetches place data in L1 while the hardware prefetcher works at L2, and hardware prefetch requests can be too aggressive
    • profile-guided optimization
    • ASD hardware prefetcher: short streams
    • content directed prefetching: irregular data structures
  6. CASE STUDIES

References

CMU Computer Architecture lecture slides