Diana
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # [raytracing](https://hackmd.io/s/B1W5AWza) [github](https://github.com/diana0651/raytracing) contributed by <`Diana Ho`> ###### tags: `d0651` `sys` ## 案例分析: ### 作業要求 以 `make PROFILE=1` 重新編譯程式碼 以 gprof 指出效能瓶頸,改寫檔案 `math-toolkit.h` 在內的函式實做 ### 開發環境 Ubuntu 14.04 LTS ### 開發工具 - `$ sudo apt-get install graphviz` 畫示意圖 - `$ sudo apt-get install imagemagick` 轉換格式 --- ## gprof * [GNU Profiling Tool](https://sourceware.org/binutils/docs/gprof/) * [使用Gnu gprof进行Linux平台下的程序分析](http://os.51cto.com/art/200703/41426.htm) * gprof 是一個可以分析程式的每個 function 使用多少次的工具(包含執行次數、消耗時間...等),方便我們知道程式的瓶頸,並且知道應該對哪些 function 進行優化可以得到性能提升,尤其是 CPU 應用 * Gprof 實現原理 * `$ gcc –pg test.c –o test` : 新建 test.c 文件,並用 -pg 編譯和鏈結,產生一個 test.o 的目的檔 * `$./test`: 執行 test.o,產生 gmon.out 文件,供 gprof 分析數據 * `$ gprof -b a.out gmon.out | less` : 因為 gprof 输出的訊息比較多,使用 less 命令可以直接查看 gprof 的輸出 | 表示 gprof -b a.out gmon.out 的輸出作為 less 的輸入 可以看到 gprof 分析程式,每個函數所花的時間,因此可以針對時間花費最多的地方進行改善 * 使用 Gprof 分析 Cflow 開源項目 * CFlow 是程序流程分析工具,可以分析 C 語言,產生程序調用圖。和 Gprof 差不多,但 CFlow 是靜態分析且不能分析 C++ 語言 * 下載 http://www.gnu.org/software/cflow/ ,解壓縮 `$ tar zxvf cflow-1.1.tar.gz` * 編譯與執行 `$./configure` * `$ make CFLAGS=-pg LDFLAGS=-pg` 產生 cflow,但是沒有 –pg * ... * 生成圖形化的函數調用圖 * [Graphviz](http://www.graphviz.org/) : Graph Visualization 圖形可視覺化工具 ```graphviz digraph G { node1; node2; node3; node1 -> node2 [label="edge_1_2"]; node1 -> node3 [label="edge_1_3"]; node2 -> node3 [label="edge_2_3"]; } ``` --- ## 未優化版本 ### 作業操作 * 編譯時`make PROFILE=1` 產生raytracing執行檔 讓程式重新編譯並加上參數`-pg` 編譯器在每一個 function 加上 mcount * 執行時執行 `./raytracing ` 讓 gprof 啟動並分析數據,執行產生 gmon.out `$ gprof -b raytracing gmon.out | less` 得到 raytracing 中各函數的詳細資訊,**執行並紀錄,所以跑起來會比原程式慢的許多** * 與**perf top**比較: 每隔一段 period 會去做採樣(Sample),最後統計出大概的數據 ### 測試程式效能 #### 原始測試 ```clike= $ ./raytracing # Rendering scene Done! Execution time of raytracing() : 2.923739 sec $ make check # Rendering scene Done! Execution time of raytracing() : 3.038861 sec Verified OK ``` #### gprof測試 * `$ make PROFILE=1` 之前必須 `$ make clean` ``` clike= $ make PROFILE=1 cc -std=gnu99 -Wall -O0 -g -c -o objects.o objects.c cc -std=gnu99 -Wall -O0 -g -c -o raytracing.o raytracing.c cc -std=gnu99 -Wall -O0 -g -c -o main.o main.c cc -o raytracing objects.o raytracing.o main.o -lm $ ./raytracing # Rendering scene Done! Execution time of raytracing() : 5.704226 sec ``` * `clike= $ gprof raytracing gmon.out ` ##### 第一部分 shows how much time was spent executing directly in each function ```clike= Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 27.35 0.88 0.88 69646433 0.00 0.00 dot_product 17.09 1.43 0.55 56956357 0.00 0.00 subtract_vector 9.94 1.75 0.32 31410180 0.00 0.00 multiply_vector 9.01 2.04 0.29 10598450 0.00 0.00 normalize 8.70 2.32 0.28 13861875 0.00 0.00 rayRectangularIntersection 7.15 2.55 0.23 17836094 0.00 0.00 add_vector 4.04 2.68 0.13 17821809 0.00 0.00 cross_product 3.26 2.79 0.11 13861875 0.00 0.00 raySphereIntersection 2.80 2.88 0.09 4620625 0.00 0.00 ray_hit_object 2.49 2.96 0.08 1048576 0.00 0.00 ray_color 1.55 3.01 0.05 1048576 0.00 0.00 rayConstruction 1.24 3.05 0.04 1 0.04 3.22 raytracing 0.93 3.08 0.03 4221152 0.00 0.00 multiply_vectors 0.93 3.11 0.03 2110576 0.00 0.00 compute_specular_diffuse 0.78 3.13 0.03 2110576 0.00 0.00 localColor 0.62 3.15 0.02 2520791 0.00 0.00 idx_stack_top 0.62 3.17 0.02 1241598 0.00 0.00 refraction 0.62 3.19 0.02 1204003 0.00 0.00 idx_stack_push 0.31 3.20 0.01 3838091 0.00 0.00 length 0.31 3.21 0.01 2558386 0.00 0.00 idx_stack_empty 0.31 3.22 0.01 1241598 0.00 0.00 reflection ``` 執行時間較多的 function,在 math-toolkit.h 裡,以他們為目標優先優化 ##### 第二部分 shows which functions called which others, and how much time each function used when its subroutine calls are included ```clike= Call graph granularity: each sample hit covers 2 byte(s) for 0.31% of 3.22 seconds index % time self children called name 0.04 3.18 1/1 main [2] [1] 100.0 0.04 3.18 1 raytracing [1] 0.08 2.94 1048576/1048576 ray_color [3] 0.05 0.11 1048576/1048576 rayConstruction [14] 0.00 0.00 1/1 calculateBasisVectors [25] 0.00 0.00 1048576/1048576 idx_stack_init [27] ----------------------------------------------- <spontaneous> [2] 100.0 0.00 3.22 main [2] 0.04 3.18 1/1 raytracing [1] 0.00 0.00 3/3 append_sphere [29] 0.00 0.00 3/3 append_rectangular [28] 0.00 0.00 2/2 append_light [30] 0.00 0.00 1/1 write_to_ppm [35] 0.00 0.00 1/1 delete_rectangular_list [32] 0.00 0.00 1/1 delete_sphere_list [33] 0.00 0.00 1/1 delete_light_list [31] 0.00 0.00 1/1 diff_in_second [34] ----------------------------------------------- ``` ### 小結 - 由結果可知: 我們必須對前幾個執行次數高、執行時間長的函式,如`dot_product()`、`subtract_vector()`、`multiply_vector()`等進行效能的改善。 - 關於執行時間: 使用 gprof 來追蹤程式的時間較直接`$ make`來產生 raytracting 執行檔的時間久。 --- ## 優化版本1: loop unrolling 使用 loop unrolling 將 for 迴圈中的表示式直接展開時,我們可以省下 **counter 計算時間** 和 **branch predict 的 fail 的時間** >[loop unrolling](https://www.cs.umd.edu/class/fall2001/cmsc411/proj01/proja/loop.html) >[loop unrolling wiki](https://en.wikipedia.org/wiki/Loop_unrolling) - dot_product 的 function 改寫:和原先的 loop 相比,instructions 數只剩1/3 ```clike= static inline double dot_product(const double *v1, const double *v2) { double dp = 0.0; dp += v1[0] * v2[0]; dp += v1[1] * v2[1]; dp += v1[2] * v2[2]; return dp; } ``` - 對 multiply_vector、multiply_vectors、subtract_vector,有相似程式碼也進行改寫 ### 測試程式效能 - Execution time: 從 5.704226 改善到 4.397736 sec ```clike= $ ./raytracing # Rendering scene Done! Execution time of raytracing() : 4.397736 sec ``` - 經過 loop unrolling 的 function 也降低執行時間 ```clike= $ gprof -b raytracing gmon.out | head -n 20 Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 28.00 0.68 0.68 69646433 0.00 0.00 dot_product 10.29 0.93 0.25 56956357 0.00 0.00 subtract_vector 9.06 1.15 0.22 13861875 0.00 0.00 raySphereIntersection 9.06 1.37 0.22 13861875 0.00 0.00 rayRectangularIntersection 8.24 1.57 0.20 10598450 0.00 0.00 normalize 7.00 1.74 0.17 17836094 0.00 0.00 add_vector 6.59 1.90 0.16 17821809 0.00 0.00 cross_product 6.59 2.06 0.16 31410180 0.00 0.00 multiply_vector 3.71 2.15 0.09 4620625 0.00 0.00 ray_hit_object 2.26 2.21 0.06 1048576 0.00 0.00 ray_color 1.65 2.25 0.04 4221152 0.00 0.00 multiply_vectors 1.65 2.29 0.04 2110576 0.00 0.00 compute_specular_diffuse 1.24 2.32 0.03 2110576 0.00 0.00 localColor 0.82 2.34 0.02 1241598 0.00 0.00 protect_color_overflow 0.82 2.36 0.02 1241598 0.00 0.00 refraction ``` --- ## 優化版本2: force inline function 因爲在 Makefile 中擁有 -O0 的指令(關閉最佳化),所以 inline 不會被執行 inline 在編譯器中不一定會展開,因此修改函式就可以強制展開:`static inline __attribute__((always_inline))` * `static`:對 definition 有效,限制 function 或 global variable 只會在所在檔案中被 access * `lnline`:告訴編譯器將 function 直接展開成 definition 的程式,可透過加上 `__attribute__((always_line))` 強制 gcc 在最佳化沒有開啟時 inline。 >[Inline function觀念](http://blog.yam.com/swwuyam/article/11745212) >[內嵌函數(inline function)筆記](https://dotblogs.com.tw/v6610688/2013/11/27/introduction_inline_function) >[Inline function in C](http://www.greenend.org.uk/rjk/tech/inline.html) :::info [Inline function、Marco](http://ascii-iicsa.blogspot.tw/2010/08/inline-functionmarco.html) Macro 是在 Compiler 進行編譯之前就被替換,而 Inline 是會被 Compiler 看到的,且在展開時系統會自動做安全檢查或型態的轉換 Macro巨集 跟 Inline 有點像,都是對該段代碼進行直接替換 用法: 在 math-toolkit.h 中直接使用 ::: ### 測試程式效能 * Execution time: 從 4.397736 改善到 2.473517 sec ```clike= $ ./raytracing # Rendering scene Done! Execution time of raytracing() : 2.473517 sec ``` --- ## 優化版本3: OpenMP OpenMP(Open Multi-Processing)是一套支持跨平台、共享記憶體方式的多執行緒平行運算的API - 它比起使用作業系統直接建立執行緒有三大優點: - CPU核心數量的擴展性 - 便利性 - 可移植性 - OpenMP的用法: - 平行設計 parallel:用以建造一個平行塊 - 資料處理 資料處理在OpenMP是最爲重要的一部分,例如private( ):它把變數宣告爲私有變數,讓其他執行緒無法存取,可以解決race condition的問題。 - 任務排程 schedule:有四種類型:dynamic、guided、runtime和static,使用時可以根據程式的結構對線程進行不同調度。 - OpenMP在什麼情況下可提升效能? 使用平均分配的方法,適合執行工作複雜度平均的程式碼 - 以光影追蹤爲例,產生的圖片每個地方的複雜度是不一樣的,如果依照平均分配工作要進行任務,被分派至複雜度高區域做工的執行緒會比其他的慢。 > > [觀念參考](https://embedded2016.hackpad.com/Enhance-raytracing-program-f5CCUGMQ4Kp) ### OpenMP 的用法 - step1: 呼叫 `#include <omp.h>` - step2: 在迴圈前面加一行指令 `#pragma omp parallel for` - step3: 編譯時加上參數 `-fopenmp` #### 案例執行 ` #pragma omp parallel for schedule(guided,1) collapse(2) num_threads(64) private(stk), private(d),private(object_color) ` ### 測試程式效能 - 程式記憶體區段錯誤 - #pragma omp parallel for - #pragma omp parallel for num_threads(64) - (`$ ./raytracing` 可以執行) ```clike= $ ./raytracing # Rendering scene Done! Execution time of raytracing() : 0.025289 sec $ ./raytracing # Rendering scene 程式記憶體區段錯誤 (core dumped) $ make check # Rendering scene Done! Execution time of raytracing() : 0.019904 sec 二元碼檔 baseline.ppm 與 out.ppm 不同 Fail Verified OK ``` ![](https://i.imgur.com/eM8g4Sk.png) - Execution time: 從 2.473517 改善到 1.039256 sec - 因為變數 stk、d、object_color 必須要透過 private clause 宣告為執行緒私有,否則預設之下在 #pragma omp 前的變數都是 shared,會導致計算結果錯誤。 - #pragma omp parallel for num_threads(64) private(stk), private(d), private(object_color) ```clike= $ ./raytracing # Rendering scene Done! Execution time of raytracing() : 1.039256 sec $ make check # Rendering scene Done! Execution time of raytracing() : 1.044793 sec Verified OK ``` --- ## 優化版本4: Multi-Thread > [POSIX線程(pthread)入門](http://dragonspring.pixnet.net/blog/post/32963482) >[POSIX Threads Programming](https://computing.llnl.gov/tutorials/pthreads/) >[Introduction to Parallel progamming](https://computing.llnl.gov/tutorials/parallel_comp/) - 多核心程式的好處: - Parallelism:一個系統可以同時執行多個任務 - Data Parallelism:分佈相同的資料子集在多核心,每個資料具有相同的運作 - Task Parallelism:分佈執行緒在核心中,每個執行緒具有獨特的運作 - PThread: PThread 定義了一套 C 語言的型態定義、函數和常數。Pthread API 中有 100 個函式呼叫,全都以 `pthread\_` 開頭,並可以分為 4 類 - 執行緒管理:建立執行緒、等待執行緒、查詢執行緒狀態 - Mutex:建立、銷毀、鎖定、解鎖、設定屬性等操作 - Condition Variable:建立、消滅、等待、通知、設定與查詢屬性等操作 - 讀寫鎖的執行緒間的同步管理 > > [觀念參考](https://embedded2016.hackpad.com/Enhance-raytracing-program-f5CCUGMQ4Kp) ### PThread 函式 #### [pthread_create](http://man7.org/linux/man-pages/man3/pthread_create.3.html) ```clike= #include <pthread.h> int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg); ``` 產生一個 Thread 並執行附帶的 Function #### [pthread_join](http://man7.org/linux/man-pages/man3/pthread_join.3.html) ```clike= #include <pthread.h> int pthread_join(pthread_t thread, void **retval); ``` 暫停目前執行 pthread_join 的 Thread,等到目標 Thread 執行完畢之後目前執行 pthread_join 的 Thread 才會繼續執行 ### Thread 策略分析 > > [實作參考](https://embedded2016.hackpad.com/2016q1-Homework-2A-bLGQtRraTES) > > [實作參考](https://embedded2016.hackpad.com/Enhance-raytracing-program-f5CCUGMQ4Kp) 1. 使用 Light Complexity 看每一點的複雜度,降低每個thread的時間差 - 可以降低每個thread的時間差,讓比較快做完的thread不必等慢的太久 - 比較複雜部分的先做 - 縮小比較複雜的部分,減少計算這個部分的執行緒的時間 2. 決定thread平行策略,讓每個部分都可以並行處理 - 分row / 分column - 上左、上右、下左、下右 - 旋轉 --- <style> h2.part {color: red;} h3.part {color: green;} h4.part {color: blue;} h5.part {color: black;} </style>

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully