---
title: Adv Compiler(12/01)W12#9 - Instruction-Level Parallelism
tags: 111-1, AdvCompiler
---
[TOC]
<style>
.blue {
color: blue;
}
.red {
color: red;
}
.green {
color: green;
}
.purple {
color: purple;
}
</style>
# **Billion-Dollar Question**
- How fast can a program be run on a processor with **instruction-level parallelism**?
⬢ The <span class="blue">potential parallelism</span> in the program.
⬢ The <span class="blue">available parallelism</span> on the processor.
⬢ Our ability to <span class="blue">extract parallelism</span> from sequential programs.
⬢ Our ability to find <span class="blue">the best parallel schedule</span> given scheduling constraints.
- If all the operations in a program are <span class="red">highly dependent</span> on one another, then <span class="red">**no amount of hardware or parallelization techniques**</span> can make the program run fast in parallel.
$\Rightarrow$ Some programs are so highly dependent that neither parallelization techniques nor parallel hardware can speed them up.
Do as much of this optimization as possible statically; the instructor thinks static optimization is the right place for it, because at run time (dynamically) the cores are needed for other important work, and spending them on extracting parallelism is rather wasteful.
# Processor Architectures
## 5-Stage Instruction Pipeline

# Code-Scheduling Constraints
## Code Scheduling
⬢ Code scheduling is a form of program optimization that applies to the machine code produced by the code generator.
⬢ Code scheduling is subject to three kinds of constraints:
1. Control-dependence constraints.
- if-else
- case
2. Data-dependence constraints.
- **Today's main topic**
3. Resource constraints.
- Can every core perform the same functions?
- e.g., on a computer with 4 cores, does every core have floating-point capability?
$\rightarrow$ It depends on the CPU: if every core carries a full floating-point unit, IC cost increases, IC size grows, power consumption rises, cooling requirements go up, and the power supply must be beefed up: the demands on the whole machine increase.
## Data Dependence
⬢ True dependence: read after write (the write comes first; the later read must see its value)
⬢ Anti-dependence: write after read (the read comes first; if the write is moved before it, the read picks up the new value)
- This violates the original semantics.
⬢ Output dependence: write after write (one write followed by another)
- The first written value gets overwritten by the second.
- e.g., an account is written first from the Hualien branch of the bank, then from a branch in another county; the order of the two writes determines the final balance.
⬢ Q: What about read after read?
- Reads do not affect each other, so there is no dependence.
## Dependences among Memory Accesses
- These are the harder dependence analyses, suitable as thesis topics for master's students. Putting two of them into a thesis can impress the oral-defense committee (?)
- Array Data-Dependence Analysis
- Some instructions affect the values of an entire array, so whole-array reads and writes must enter the discussion. Code usually writes a single element like A[i]; do we need to treat the whole array's values?
- e.g., A[5] = 5 and A[2] = 4: if a write is implemented by marking the whole array as being written, parallel execution runs into trouble even though the two writes touch different elements.
- Pointer-Alias Analysis
- Multiple variables (different names) refer/point to the same memory location (modifying that location through one name changes what the others see), so the analysis must take this into account.
- = one location with several different names
- reference links:
1.[打字猴](https://coherence0815.wordpress.com/2014/08/16/pointer-aliasing/)
2.[幾句話說清楚](https://decodezp.github.io/2018/12/11/quickwords4-pointer-aliasing/)
- Inter-procedural Analysis
- reference link: 1.
# Basic-Block Scheduling
1. Data-Dependence
2. List Scheduling a basic block

- Algorithm Graph

# Observations
- Without resource constraints, **the shortest schedule** is given by the **critical path**, the longest path through the data-dependence graph.
- If all operations are independent, the **length of the schedule** is constrained by the **resources available**.
- We can use the source ordering to **break ties** between operations.
## Result of List Scheduling

- Q: Why not parallelize the second and the third instructions?

# Global Code Scheduling
## An example

## Region-Based Scheduling
- One region is one block.
### Algorithm

## Loop Unrolling

The code above shows that on a 4-core CPU, writing out the calls literally (one unrolled copy per core) saves the later work of parallelizing the loop.
# Quiz 8

- Assuming 80% of a 1000 s run is parallelizable across 4 cores (Amdahl's law): $\frac{0.8}{4} \times 1000 + 0.2 \times 1000 = 200 + 200 = 400\,\text{s}$, i.e. a speedup of $1000/400 = 2.5\times$.