---
title: Adv Compiler(12/01)W12#9 - Instruction-Level Parallelism
tags: 111-1, AdvCompiler
---

[TOC]

<style>
.blue { color: blue; }
.red { color: red; }
.green { color: green; }
.purple { color: purple; }
</style>

# **Billion-Dollar Question**

- How fast can a program be run on a processor with **instruction-level parallelism**? That depends on:

⬢ The <span class="blue">potential parallelism</span> in the program.
⬢ The <span class="blue">available parallelism</span> on the processor.
⬢ Our ability to <span class="blue">extract parallelism</span> from sequential programs.
⬢ Our ability to find <span class="blue">the best parallel schedule</span> given scheduling constraints.

- If all the operations in a program are <span class="red">highly dependent</span> on one another, then <span class="red">**no amount of hardware or parallelization techniques**</span> can make the program run fast in parallel.

$\Rightarrow$ Some programs really are this highly dependent, and no parallelization technique or parallel hardware can optimize them.

Optimize in hardware where it helps, but the instructor thinks doing the optimization statically is fine: at run time the cores should be running other important work, and spending them on working out the parallelism is somewhat wasteful.

# Processor Architectures

## 5-Stage Instruction Pipeline

![](https://i.imgur.com/7XTImA1.png)

# Code-Scheduling Constraints

## Code Scheduling

⬢ Code scheduling is a form of program optimization that applies to the machine code produced by the code generator.
⬢ Code scheduling is subject to three kinds of constraints:
1. Control-dependence constraints.
    - if-else
    - case
2. Data-dependence constraints.
    - **the focus of today's lecture**
3. Resource constraints.
    - Can every core perform the same operations?
    - e.g., in a computer with 4 cores, does every core have floating-point capability? $\rightarrow$ It depends on that computer's CPU. If every core gets the full capability, IC cost increases, IC size increases, power increases, the required cooling capacity increases, and the power supply must grow as well: the demands on the whole machine go up.

## Data Dependence

⬢ True dependence: read after write (the write happens first; the later read must see that value).
⬢ Anti-dependence: write after read (if the write were moved before the read, the read would see the new value).
    - Reordering it violates the original semantics.
⬢ Output dependence: write after write (one write followed by another).
    - The value of the first write gets overwritten, so the order matters.
    - e.g., an account is first written from the bank in Hualien, then written from a bank in another county.
⬢ Q: What about read after read?
- Reads do not affect each other, so there is no dependence.

## Dependences among Memory Accesses

- The harder dependence analyses below are listed as possible master's-thesis material; putting a couple of them into a thesis can make the oral-defense committee think the work is impressive (?)
- Array Data-Dependence Analysis
    - Some instructions affect the values of a whole array, so they must be included in the read/write analysis. Code usually writes just A[i]; do we need to reason about the entire array?
    - e.g., A[5] = 5 and A[2] = 4: if a write is implemented by marking the whole array as being written, these two independent writes cannot proceed in parallel.
- Pointer-Alias Analysis
    - Multiple variables (different names) refer to or point to the same memory location (so a change through one is visible through the others); the analysis must take this into account.
    - = one location with several names.
    - reference links: 1. [打字猴](https://coherence0815.wordpress.com/2014/08/16/pointer-aliasing/) 2. [幾句話說清楚](https://decodezp.github.io/2018/12/11/quickwords4-pointer-aliasing/)
- Inter-procedural Analysis

# Basic-Block Scheduling

1. Data-Dependence
    - ![](https://i.imgur.com/tKQ0SuJ.png)
2. List Scheduling of a Basic Block
    ![](https://i.imgur.com/vg4o89g.png)
    - Algorithm Graph
    ![](https://i.imgur.com/G1H70oh.png)

# Observations

- Without resource constraints, **the shortest schedule** is given by the **critical path**, the longest path through the data-dependence graph.
- If all operations are independent, the **length of the schedule** is constrained by the **resources available**.
- We can use the source ordering to **break ties** between operations.

## Result of List Scheduling

![](https://i.imgur.com/hNRGjIw.png)

- Q: Why not parallelize the second and the third instructions?

![](https://i.imgur.com/cxFvnM8.png)

# Global Code Scheduling

## An example

![](https://i.imgur.com/pQLhnYQ.png)

## Region-Based Scheduling

- Here a region is treated as a single block.
- ![](https://i.imgur.com/aAyKKkP.png)

### Algorithm

![](https://i.imgur.com/M9hx3bl.png)

## Loop Unrolling

![](https://i.imgur.com/uZaIgZ4.png)

The code above shows that on a CPU with four cores, writing the iterations out literally saves the later work of parallelizing the loop.

# Quiz 8

![](https://i.imgur.com/LZYjszB.png)

- 80%/4 * 1000 + 20% * 1000 = 400 s (80% of the 1000 s workload parallelizes across the 4 cores; the remaining 20% stays serial).