Computing Pi with prefetch
contributed by <ierosodin>, <oiz5201618>
tags: sysprog-hw, prefetch, π calculation
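Paper
Source:
Motivation
Many hardware- and software-based prefetch mechanisms already exist, but there is no detailed, complete description of how and when prefetching should be used. The paper therefore investigates:
Points raised in the paper (to be organized after finishing the read)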
When Prefetching Works, When It Doesn’t, and Why
stride of an array => the distance in memory between consecutive elements, which for a contiguous array is the size of one element
Streaming Prefetch
Two prefetching approaches
SW
compiler issues prefetch instructions.
(problems with extra instruction overhead)
HW
hardware decides which memory addresses to prefetch based on past accesses or future instructions.
(problems with lateness, inaccurate addresses, lengthening the critical path)
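The software approach can be sketched in C as follows. This is a minimal illustration, not code from the paper: it assumes GCC or Clang (`__builtin_prefetch` is a compiler builtin), and both the function name `sum_with_prefetch` and the distance `PREFETCH_DISTANCE` are hypothetical choices made for the example. The extra instruction issued on every iteration is the overhead mentioned above.

```c
#include <stddef.h>

/* How many elements ahead to prefetch. This value is an assumption for
 * illustration; a real distance has to be tuned for the target machine. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting the cache about elements needed soon. */
double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Arguments: address, 0 = prefetch for reading, 3 = high temporal locality. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        sum += a[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is identical with or without it; only the timing may change.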
Hardware prefetching mechanisms are generally categorized into sequential, stride, and context methods.
Sequential prefetching mechanisms are simple and efficient: they exploit spatial locality and work well for simple data structures such as arrays.
Stride-based methods monitor miss addresses and detect constant strides from loop structures. The idea here is to build a table, record miss addresses and compare successive addresses to find constant strides.
Context-based methods, also called Markov predictors, use a set of past values for prefetching. They can capture linked-list traversal and pointer-chasing activity.
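The stride-based idea (record miss addresses, compare successive ones, and look for a constant difference) can be modelled in software. The sketch below is a simplification under stated assumptions: it keeps a single table entry, whereas real hardware typically keeps many entries, and the names `stride_entry` and `stride_observe` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One entry of a simplified stride table. */
typedef struct {
    uintptr_t last_addr; /* most recent miss address */
    intptr_t  stride;    /* difference between the last two addresses */
    int       seen;      /* number of addresses recorded so far */
    int       matches;   /* consecutive times the same stride repeated */
} stride_entry;

/* Record a miss address. Returns true and sets *next to the predicted
 * address once the same non-zero stride has been observed twice in a row. */
bool stride_observe(stride_entry *e, uintptr_t addr, uintptr_t *next)
{
    if (e->seen > 0) {
        intptr_t delta = (intptr_t)(addr - e->last_addr);
        if (e->seen > 1 && delta == e->stride)
            e->matches++;
        else
            e->matches = 0;
        e->stride = delta;
    }
    e->last_addr = addr;
    e->seen++;

    if (e->matches >= 1 && e->stride != 0) {
        *next = addr + (uintptr_t)e->stride;
        return true;
    }
    return false;
}
```

Starting from a zero-initialized entry, the misses 1000, 1064, 1128 establish a stride of 64, so the third call predicts 1192; an unrelated address afterwards resets the match count.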
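Implementing a cache latency test
Design an experiment that uses rdtsc to count cycles while reading a block of memory sequentially, and observe how the cache behaves with AVX loads before and after prefetching is added.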
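A measurement of this kind might look like the sketch below. It is not the authors' test harness: it assumes GCC or Clang on x86 (for `__rdtsc()` from `<x86intrin.h>`), it uses plain scalar loads instead of AVX loads so that it builds without `-mavx`, and the function name `time_sequential_read` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc(), available with GCC/Clang on x86 */

/* Read n doubles in order and return the elapsed time-stamp-counter ticks.
 * The volatile qualifier keeps the compiler from removing the loads. */
uint64_t time_sequential_read(const volatile double *buf, size_t n)
{
    double sink = 0.0;
    uint64_t start = __rdtsc();
    for (size_t i = 0; i < n; i++)
        sink += buf[i];
    uint64_t end = __rdtsc();
    (void)sink; /* the sum itself is not needed */
    return end - start;
}
```

Calling it twice on the same buffer gives a rough cold-versus-warm comparison, since the second pass is more likely to hit in the cache.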
ASM
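Inline asm in C (with GCC's default settings) uses AT&T syntax, so the source and destination operands of an instruction appear in the opposite order from Intel syntax
Assembly for _mm256_load_pd()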
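The operand order can be seen in a one-instruction example. This is a sketch assuming GCC or Clang on x86; the function name `copy_att` is hypothetical.

```c
#include <stdint.h>

/* Copy a 32-bit value through a register-to-register move. */
uint32_t copy_att(uint32_t src)
{
    uint32_t dst;
    /* AT&T syntax:  movl source, destination
     * Intel syntax: mov  destination, source
     * %0 is the output operand (dst), %1 is the input operand (src). */
    __asm__ volatile("movl %1, %0" : "=r"(dst) : "r"(src));
    return dst;
}
```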
RDTSC Overhead count
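One common way to estimate the cost of the timing instruction itself is to read the counter twice in a row and keep the smallest difference over many trials. The sketch below takes that approach as an assumption about what this step involves; it requires GCC or Clang on x86, and the name `rdtsc_overhead` is hypothetical.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc() */

/* Smallest tick difference between two back-to-back counter reads. */
uint64_t rdtsc_overhead(int trials)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t t0 = __rdtsc();
        uint64_t t1 = __rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

The returned value can then be subtracted from later measurements so that only the timed code is counted.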
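Processing the data statistically
Experimental data
Before software prefetch was added, repeatedly using rand() to pick random array elements and load them into ymm0 did not increase cache-misses (about 2~3%)
Trying different ways of computing pi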
Geometric Constructions
[Figure: formulas, image unavailable]
For the detailed mathematical derivation, see Newton's approximation of Pi.
Beeler then rewrote these formulas using some mathematical transformations:
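The formula itself is not recoverable from this note. As an assumption, the form usually associated with this step, and the one that the widely circulated Dik T. Winter program evaluates, is the nested series

$$
\frac{\pi}{2} = \sum_{k=0}^{\infty} \frac{k!}{(2k+1)!!}
= 1 + \frac{1}{3}\left(1 + \frac{2}{5}\left(1 + \frac{3}{7}\left(1 + \cdots\right)\right)\right)
$$

Multiplying both sides by 2 gives $\pi = 2 + \frac{1}{3}\left(2 + \frac{2}{5}\left(2 + \frac{3}{7}\left(2 + \cdots\right)\right)\right)$, in which every level has the same shape: a constant 2 plus a fraction $\frac{k}{2k+1}$ times the next level.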
[Figure: transformed formula, image unavailable]
Finally, Dik T. Winter wrote it as C code.
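The code is not included in this note. For reference, below is a readable rendering of the well-known short program attributed to Dik T. Winter, which produces 800 digits of pi four at a time. The function name `pi_digits`, the variable names, and the choice to write into a buffer instead of printing are assumptions made here for clarity, so treat it as a sketch and not the original listing.

```c
#include <stdio.h>

#define TERMS 2800 /* 14 terms are consumed per 4-digit group: 200 groups */

/* Write 800 decimal digits of pi ("3141...") followed by a NUL into out. */
void pi_digits(char out[801])
{
    int r[TERMS + 1]; /* remainders of the mixed-radix representation */
    int carry = 0;    /* low 4 digits held back from the previous group */

    for (int i = 0; i < TERMS; i++)
        r[i] = 2000;  /* every level of the nested series starts at 2, scaled by 1000 */
    r[TERMS] = 0;

    for (int k = TERMS; k > 0; k -= 14) {
        long long d = 0;
        for (int i = k;;) {
            d += (long long)r[i] * 10000; /* shift by four decimal digits */
            int b = 2 * i - 1;            /* denominator of this level */
            r[i] = (int)(d % b);
            d /= b;
            if (--i == 0)
                break;
            d *= i;                       /* numerator of the next level */
        }
        snprintf(out, 5, "%.4d", carry + (int)(d / 10000));
        out += 4;
        carry = (int)(d % 10000);
    }
}
```

A caller would declare `char buf[801];`, call `pi_digits(buf);`, and print the buffer with `puts(buf);`.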
Performance tests
Spigot Algorithm for Pi
An algorithm that generates the digits of a quantity one at a time without using or requiring previously computed digits.
Reference