陳致佑
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # 2016q3 Homework1 (raytracing) ##### tags: `jeff60907` contributed by <`jeff60907`> ## [作業說明](https://hackmd.io/s/B1W5AWza)及預期目標 善用 POSIX Thread, OpenMP, software pipelining, 以及 loop unrolling 一類的技巧來加速程式運作 ## 開發環境 ``` 作業系統 Lubuntu 16.04 CPU:Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 8192K ``` * 依照作業介紹安裝以下開發工具 ``` $ sudo apt-get update $ sudo apt-get install graphviz $ sudo apt-get install imagemagick ``` ### 未優化前測試 ``` # Rendering scene Done! Execution time of raytracing() : 2.404955 sec ``` [學習GNU gprof ](http://os.51cto.com/art/200703/41426.htm) >可以追蹤程式時間到底花在那邊 function到底執行多少次、時間、流程 量出效能上的熱點 ``` $ make PROFILE=1 ``` ``` # Rendering scene Done! Execution time of raytracing() : 5.026501 sec ``` ``` $ gprof ./raytracing | less ``` 分析 程式執行的內容 ``` Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 21.12 0.42 0.42 69646433 0.00 0.00 dot_product 17.10 0.76 0.34 56956357 0.00 0.00 subtract_vector 11.56 0.99 0.23 31410180 0.00 0.00 multiply_vector 11.06 1.21 0.22 10598450 0.00 0.00 normalize 8.80 1.39 0.18 13861875 0.00 0.00 rayRectangularIntersection 5.78 1.50 0.12 4620625 0.00 0.00 ray_hit_object 4.02 1.58 0.08 17821809 0.00 0.00 cross_product 4.02 1.66 0.08 1048576 0.00 0.00 ray_color 3.52 1.73 0.07 17836094 0.00 0.00 add_vector 3.02 1.79 0.06 4221152 0.00 0.00 multiply_vectors 2.77 1.85 0.06 13861875 0.00 0.00 raySphereIntersection 2.01 1.89 0.04 2110576 0.00 0.00 compute_specular_diffuse 2.01 1.93 0.04 1241598 0.00 0.00 protect_color_overflow 1.01 1.95 0.02 1048576 0.00 0.00 rayConstruction 0.75 1.96 0.02 113297 0.00 0.00 fresnel ``` > 可以發現前面呼叫次數非常多,時間花費幾乎都在這邊,所以想要優化程式從這邊著手,而不是靠`感覺`去瞎猜,要依照真實數據去改善。 上課講解影片提到,Makefile 可以更改編譯最佳化測試 ``` CFLAGS = \ -std=gnu99 -Wall -O0 -g 改成 CFLAGS = \ -std=gnu99 -Wall -Ofast -g ``` ``` Execution time of raytracing() : 0.559535 sec ``` 電腦編譯自動最佳化 5秒變成0.55秒 ## 人腦VS大腦 超越電腦編譯器最佳化 ### 優化測試1 > 課程video 提到 for 回圈會造成分支,如果使用 展開可以有效提升執行速度 > 但是不可能總把回圈拆開來硬幹,要依照自己程式的需求目標。 #### loop unrolling 改善 第1名 dot_product() 把for展開 測試 ``` double dot_product(const double *v1, const double *v2) { double dp = 0.0; dp = v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2]; return dp; } ``` ``` Execution time of raytracing() : 4.577559 sec ``` > 原本5.02秒變成4.57秒,只改了一行for寫法就改善了時間效能 ``` Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 24.41 0.40 0.40 56956357 0.00 0.00 subtract_vector 12.81 0.61 0.21 69646433 0.00 0.00 dot_product 11.59 0.80 0.19 17836094 0.00 0.00 add_vector 9.76 0.96 0.16 31410180 0.00 0.00 multiply_vector ``` > 重新看了 gprof dot_product的名次也下降了,繼續看其他數學式程式碼改善效能 把add_vector()、subtract_vector()、multiply_vectors()、multiply_vector()的for loop也拆開來 ``` Execution time of raytracing() : 4.211204 sec ``` > 效能改善4.57 到 4.211 改善幅度沒有像dot_product 多 ### 優化測試2 [wiki inline function](https://en.wikipedia.org/wiki/Inline_function) [參考rnic開發紀錄](https://embedded2016.hackpad.com/2016q1-Homework-2A-GalzL151aZc) 其中提到 inline function效能提升,看資料研究原理 [參考1](http://blog.yam.com/swwuyam/article/11745212) [參考2](http://hugedream2.blogspot.tw/2008/09/macroinline.html)提到 巨集 VS inline > macro與inline的差異: > * macro是由preprocessor處理,由inline則是compiler來負責 > * macro對於檢查傳入參數的型別,而inline則會檢查型別 > * marco 不執行參數型別檢查、也沒有語法檢查,也可能產生邏輯錯誤 [參考3](http://openhome.cc/Gossip/CppGossip/inlineFunction.html)提到行內函式只能建議編譯器,也就是說建議並不一定會被採納,這視您的編譯器而定。 #### 嘗試inline function 確定展開 將原先每個 static inline 後面加上 _ _attribute _ _((always_inline))確定展開 ``` static inline __attribute__((always_inline)) ``` ``` Execution time of raytracing() : 2.048317 sec ``` > 從4.211秒 效能直接改善約一半 到2.048秒 ``` Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 36.52 0.46 0.46 13861875 0.00 0.00 rayRectangularIntersection 15.88 0.66 0.20 2110576 0.00 0.00 compute_specular_diffuse 11.91 0.81 0.15 13861875 0.00 0.00 raySphereIntersection 11.91 0.96 0.15 1048576 0.00 0.00 ray_color 7.15 1.05 0.09 4620625 0.00 0.00 ray_hit_object 4.76 1.11 0.06 1048576 0.00 0.00 rayConstruction 3.97 1.16 0.05 2110576 0.00 0.00 localColor ``` > 因為function都展開了,少了呼叫數學 function傳值的時間 嘗試把 return 計算式內容,而不是先存後傳發現效能有提高一些! ``` static inline __attribute__((always_inline)) double dot_product(const double *v1, const double *v2) { return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2]; } ``` ``` Execution time of raytracing() : 1.988770 sec ``` > 時間降約 0.005秒左右 ### 優化測試3 #### 多執行緒處理-使用OpenMP > 過去沒有做過多執行緒平行化處理的程式,知道一些些概念卻不是很懂,需要補充自己的背景知識 > > 多執行緒還有許多方法可以探討 [OpenMP系列教學文](https://kheresy.wordpress.com/2006/06/09/%E7%B0%A1%E6%98%93%E7%9A%84%E7%A8%8B%E5%BC%8F%E5%B9%B3%E8%A1%8C%E5%8C%96%E6%96%B9%E6%B3%95%EF%BC%8Dopenmp%EF%BC%88%E4%B8%80%EF%BC%89%E7%B0%A1%E4%BB%8B/) 回頭再試試看其他指令及範例練習 [參考dada實作](https://embedded2016.hackpad.com/2016q1-Homework-2-A-jugsB8br8Bt) [參考raytracing共筆](https://embedded2016.hackpad.com/ep/pad/static/f5CCUGMQ4Kp) OpenMP(Open Multi-Processing)是一套支持跨平台、共享記憶體方式的多執行緒平行運算的API。 * 它比起使用作業系統直接建立執行緒有三大優點: * CPU核心數量的擴展性 * 便利性 * 可移植性 首先要進行平行化處理,必須先探討平行處理的程式碼是不是各自獨立沒有相依性,如果有相依性,進行平行化處理thread計算速度各自不同,資料就會錯誤。 * 查看 raytracing.c的 raytracing function 發現pixel可以各自平行處理,彼此沒有相依性的問題。 在 raytracing.c 中的 raytracing function內的for建立平行化處理 * 沒有選擇多少執行緒處理,直接用電腦核心平行運算 ``` Execution time of raytracing() : 1.207583 sec ``` > 時間減少約 0.8秒 如果指定多少執行緒處理? * num_threads選擇多少數量的執行緒平行處理 * private 讓每個執行緒中,都有一份變數的複本,以免互相干擾。 ``` #pragma omp parallel for num_threads(512) \ private(stk), private(d), \ private(object_color) for (int j = 0; j < height; j++) { for (int i = 0; i < width; i++) { ``` ``` 2 threads Execution time of raytracing() : 1.965742 sec 4 threads Execution time of raytracing() : 1.655131 sec 8 threads Execution time of raytracing() : 1.229635 sec 32 threads Execution time of raytracing() : 1.006759 sec 64 threads Execution time of raytracing() : 0.968196 sec 128 threads Execution time of raytracing() : 0.941614 sec 256 threads Execution time of raytracing() : 0.911764 sec 512 threads Execution time of raytracing() : 0.908156 sec 1024 threads Execution time of raytracing() : 0.949907 sec ``` ``` Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 40.01 0.02 0.02 85023 0.00 0.00 raySphereIntersection 40.01 0.04 0.02 84535 0.00 0.00 rayRectangularIntersection 20.01 0.05 0.01 7767 0.00 0.00 rayConstruction 0.00 0.05 0.00 33026 0.00 0.00 ray_hit_object 0.00 0.05 0.00 18471 0.00 0.00 idx_stack_empty ``` > threads 前面看起來越高效果越好,可是真的是這樣子嗎? > 原本dada使用的是64 threads但是效能卻不是最高的,看了一下main.c COL 是用512,所以使用 512 threads可以有效率的平行展開處理,反而獲得最佳的處理效能。 >

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully