Try   HackMD

Billion-Dollar Question

  • How fast can a program be run on a processor with instruction-level parallelism?
    ⬢ The potential parallelism in the program.
    ⬢ The available parallelism on the processor.
    ⬢ Our ability to extract parallelism from sequential programs.
    ⬢ Our ability to find the best parallel schedule given scheduling
    constraints
  • If all the operations in a program are highly dependent on one another, then no amount of hardware or parallelization techniques can make the program run fast in parallel.
    是否會有高度相關的 proogram,無法用平行化的技術或硬體上的平行處裡做優化。
    盡可能在硬體上做優化作改良,老師覺得在靜態上做優化是 ok der,因為在 dynamic 時,core 需要 run 在其他重要的工作上,拿來解決平行處理有點浪費。

Processor Architectures

5-Stage Instruction Pipe

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Code-Scheduling Constraints

Code Scheduling

⬢ Code scheduling is a form of program optimization
that applies to the machine code that is produced
by the code generator.
⬢ Code scheduling is subject to three kinds of
constraints:

  1. Control-dependence constraints.
    • if-else
    • case
  2. Data-dependence constraints.
    • 今天主講
  3. Resource constraints.
    • 是不是每個 core 都能做相同的功能?
    • ex. 一台電腦 4 個 core,全部都有處理 floating type 的能力嗎?
      看該台電腦的 CPU 內,若每個 core 則 ic成本增加、ic size 增、功率變大、要求的散熱能力益增加、power supply upup: 對整台電腦的要求也增加。

Data Dependence

⬢ True dependence: read after write(先讀再寫)
⬢ Anti-dependence: write after read(先寫再讀: 會讀到新 value)
- 違反原本語意
⬢ Output dependence: write after write(寫完再寫)
- 第一個寫入的的 value 就會被覆蓋
- ex. an account 先從花蓮的 bank 寫入,再從其他縣市的 bank 寫入,
⬢ Q: What about read after read?

  • Read 不會互相影響。

Dependences among Memory Accesses

  • 列出來比較難的 dependence 分析,提供給碩士做論文用。放兩個進去可以讓口試委員覺得很厲害(?)

  • Array Data-Dependence Analysis

    • 有些指令會影響整個 array 的 value,就要納入 Read Write 的討論。通常只會寫 A[i],是否需要將整個 array 的值都拉出來討論?
    • ex. A[5] = 5 和 A[2] = 4 ,若 implement write 的方式是將整個 array 都標示正在 write,則平行處理上就會有問題。
  • Pointer-Alias Analysis

    • 多個變數(不同name)都參考/指向同一個記憶體位置(即改記憶體位置會改到同一塊),所以在分析計算考量時就要將此分析納入。
    • =同個 location 有不同的名字
    • referece link:
      1.打字猴
      2.幾句話說清楚
  • Inter-procedural Analysis

    • referece link: 1.

Basic-Block Scheduling

  1. Data-Dependence
  • Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
  1. List Scheduling a basic block
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
  • Algorithm Graph
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

Observations

  • Without resource constraints, the shortest schedule is given by the critical path, the longest path through the data-dependence graph.
  • If all operations are independent, the length of the schedule is constrained by the resource available.
  • We can use the source ordering to break ties between operations.

Result of List Scheduling

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

  • Q: Why not parallelize the second and the third instructions?
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

Global Code Scheduling

An example

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Region-Based Scheduling

  • 一個 region 就是一個 block
  • Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

Algorithm

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Loop Unrolling

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

上方的 code 表示在四個 core 的 CPU 裡頭,literally 呼叫,能省去後方做 parallelize 的動作。

Quiz 8

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

  • 80%/4 * 1000 + 20% * 1000 = 400s