
When Prefetching Works, When It Doesn’t, and Why: Reading Notes

contributed by <henry0929016816>

1. INTRODUCTION

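The paper was written to address two gaps:

  • Point 1: there is little guidance on how best to insert prefetch intrinsics
  • Point 2: the complex interactions between hardware prefetching and software prefetching are not well understood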

2:2


henry0929016816: Does the "2006 CPU benchmark" on the x-axis refer to a collection of performance analysis tools?

The x-axis lists the programs used to measure performance, that is, the SPEC CPU 2006 benchmarks.

2:3

The paper finds that software prefetching has a positive effect when it targets

  • short array streams
  • irregular memory address patterns
  • L1 cache miss reduction

However, because software prefetches also train the hardware prefetcher, the effect is negative in some cases.
Hence, our hardware prefetching schemes are limited to stream and stride prefetchers, since as far as we know these are the only two implemented commercially today.

henry0929016816: I don't quite understand what "implemented commercially" means.

2 BACKGROUND ON SOFTWARE AND HARDWARE PREFETCHING


  • Terms
    • stream: accesses that advance one cache line at a time (unit stride)
    • stride: accesses that skip two or more cache lines at a time

array and some RDS data structure accesses can be easily predicted, and thereby prefetched, by software
Table I shows that accesses to arrays and to some RDS (recursive data structures) are easy to predict (that is, what the next item will be), so they can be prefetched in software.

彥享林: The table contains an a[i+d] entry. Because array data is contiguous, we can bring in the elements several positions ahead with a software prefetch.
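The a[i+d] idea can be sketched in C as follows. This is only an illustration, not code from the paper: `sum_with_prefetch` and its parameter `d` are hypothetical names, and `__builtin_prefetch` is the GCC/Clang builtin, used here in place of a vendor-specific intrinsic.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array while prefetching the element d iterations ahead.
 * d is a tuning parameter chosen by the caller (an assumption of this sketch). */
long sum_with_prefetch(const int *a, size_t n, size_t d)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + d < n)                      /* keep the address inside the array */
            __builtin_prefetch(&a[i + d]);  /* a hint only; results do not change */
        sum += a[i];
    }
    return sum;
}
```

The prefetch never changes the computed sum; it only asks the hardware to start loading a[i+d] while a[i] is being consumed.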

  • hardware-based prefetch mechanisms include
    • stream
    • GHB (global history buffer)
    • content-based

henry0929016816: I don't quite understand GHB and content-based prefetching.

2.1 Software Prefetch Intrinsics


  • Terms
    • temporal : Data will be used again soon
    • Non-temporal : Data which is referenced once and not reused in the immediate future (for example, for some multimedia data types, as the vertex buffer in a 3D graphics application).

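henry0929016816: Temporal data will be used again after its first use, so it should also be placed in the lower cache levels. For example, if it is placed in L1, it is placed in L2 and L3 too, so that once it is evicted from L1 it can still be loaded from L2 or L3. Non-temporal data will not be used again, so placing it in L1 is enough; once it is evicted, it will not be loaded back from L2 or L3.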
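The temporal versus non-temporal distinction can be expressed through the locality argument of a prefetch. The sketch below assumes GCC/Clang's `__builtin_prefetch(addr, rw, locality)`, where locality 3 asks to keep the data in all cache levels and 0 marks it as not worth keeping; on x86 these typically map to the T0 and NTA hints. The function names and the parameter `d` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Temporal: the array is read twice, so request high locality (3). */
long sum_twice_temporal(const int *a, size_t n, size_t d)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (int pass = 0; pass < 2; pass++) {
        for (size_t i = 0; i < n; i++) {
            if (i + d < n)
                __builtin_prefetch(&a[i + d], 0, 3);  /* read, keep in cache */
            sum += a[i];
        }
    }
    return sum;
}

/* Non-temporal: the source is streamed through once, so request locality 0. */
void scale_once_nontemporal(const int *src, int *dst, size_t n, size_t d, int k)
{
    assert((src != NULL && dst != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + d < n)
            __builtin_prefetch(&src[i + d], 0, 0);    /* read, no reuse expected */
        dst[i] = src[i] * k;
    }
}
```

The second and third arguments must be compile-time constants, which is why each hint is hard-coded in its own function.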

2.3 Software Prefetch Distance


  • early: the data is prefetched too early and sits in the cache for too long, so it is evicted before it is actually needed
  • late: the data is prefetched too late, so cache miss latency is still exposed

Since a prefetch issued too late cannot hide the cache miss latency, how early should the prefetch be issued?


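  • Terms
    • l : prefetch latency
    • s : length of the shortest path through the loop body
      D (the D in a[i+D] from Table I) must be large enough to hide the cache miss latency. If D is too large, however, the prefetched data may evict useful data from the cache, and the beginning of the array may not be prefetched at all. Both effects can cause more cache misses.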
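With l and s defined as above, the condition on the prefetch distance is commonly written D ≥ ⌈l / s⌉: the prefetch must be issued enough iterations ahead that even the shortest loop iterations add up to at least the prefetch latency. A minimal sketch of that arithmetic follows; `prefetch_distance` is a hypothetical helper, and measuring both quantities in cycles is an assumption of this sketch.

```c
#include <assert.h>

/* Smallest D such that D * s >= l, i.e. ceil(l / s).
 * Both arguments are assumed to be in the same unit (cycles here). */
unsigned prefetch_distance(unsigned latency_cycles, unsigned shortest_path_cycles)
{
    assert(shortest_path_cycles > 0);
    return (latency_cycles + shortest_path_cycles - 1) / shortest_path_cycles;
}
```

For instance, with an illustrative latency of 200 cycles and a shortest loop path of 30 cycles, the helper yields D = 7, since 6 iterations would cover only 180 cycles.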

2.4

Indirect memory indexing depends on computation done in software, so we expect software prefetching to be more effective than hardware prefetching in this case.
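A sketch of why software is well placed here: for an access such as a[b[i]], the program can read the index b[i+d] ahead of time and prefetch the element it points to, which a stream or stride prefetcher cannot infer from the address pattern alone. The function name, the distance `d`, and the choice to prefetch the index array at twice that distance are assumptions of this sketch; every entry of `b` is assumed to be a valid index into `a`.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a[b[i]] while prefetching both the index array and the indirect target. */
long sum_indirect(const int *a, const size_t *b, size_t n, size_t d)
{
    assert((a != NULL && b != NULL) || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 2 * d < n)
            __builtin_prefetch(&b[i + 2 * d]);  /* index needed 2*d iterations from now */
        if (i + d < n)
            __builtin_prefetch(&a[b[i + d]]);   /* loads b[i+d] to form the address */
        sum += a[b[i]];
    }
    return sum;
}
```

Note that forming the address of a[b[i+d]] performs a real load of b[i+d], so the index array itself should already be in the cache; that is the reason for the longer distance on the first prefetch.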

3

3.1

Advantages of software prefetching over hardware prefetching

  • large number of streams
  • short streams
  • irregular memory access
  • cache locality hint
  • loop bounds
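One way to read the "loop bounds" item: software knows the trip count, so it can avoid issuing prefetches past the end of the array. The sketch below does this by splitting the loop into a main part that prefetches and a tail that does not; this is one possible technique, not necessarily the one the paper uses, and `sum_split_loop` and `d` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array; only the first n - d iterations prefetch, so a[i + d] is always in bounds. */
long sum_split_loop(const int *a, size_t n, size_t d)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    size_t main_end = (n > d) ? n - d : 0;  /* last index + 1 that may prefetch */
    size_t i = 0;
    for (; i < main_end; i++) {
        __builtin_prefetch(&a[i + d]);      /* i + d < n holds here */
        sum += a[i];
    }
    for (; i < n; i++)                      /* tail: nothing left to prefetch */
        sum += a[i];
    return sum;
}
```

Compared with checking the bound inside the loop body, the split removes a branch from every iteration of the main loop.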