# **Mergesort Concurrent**

## **1. PThreads** [**[Source]**](http://shop.oreilly.com/product/9781565921153.do)

### **Mode**
* Boss/Worker Mode: when the boss receives a new request, it dynamically creates a worker to handle it; suited to servers. A fixed number of threads can be created at program start (a thread pool) to reduce runtime overhead.
* Peer Mode: unlike Boss/Worker, each thread owns its own input; suited to matrix computation, database search, and the like.
* Pipeline Mode: each thread takes its input from the previous stage and passes its result to the thread in the next stage. Overall throughput is bounded by the slowest stage, so take care to balance the work across stages as evenly as possible.

### **Synchronization**
* **pthread_join:** a thread suspends until another thread terminates
* **Mutex:** at any moment, only one thread may hold the lock and access the protected data
  * Initialize
    * Static: ```PTHREAD_MUTEX_INITIALIZER```
    * Dynamic: ```pthread_mutex_init()```
  * Lock
    * ```pthread_mutex_lock()```
  * Release
    * ```pthread_mutex_unlock()```
* **Condition variable:** reifies an event and provides functions that wake the threads waiting on it
  * Initialize
    * Static: ```PTHREAD_COND_INITIALIZER```
    * Dynamic: ```pthread_cond_init()```
  * Wait on a condition variable
    * wait until woken: ```pthread_cond_wait()```
    * wait with a timeout: ```pthread_cond_timedwait()```
  * Wake waiting threads
    * wake one of them: ```pthread_cond_signal()```
    * wake all waiting threads: ```pthread_cond_broadcast()```
  * To guard against [**Spurious Wake Ups**](https://en.wikipedia.org/wiki/Spurious_wakeup), or against another woken thread running first and invalidating the condition again, re-check the condition in a loop and go back to waiting if it does not hold (see the sketch after this list)
* **pthread_once:** guarantees that an initialization function called from many threads runs exactly once
  * declare a pthread_once_t and statically initialize it to ```PTHREAD_ONCE_INIT```
  * call the target function through pthread_once() with that once control
  * no arguments can be passed to the function protected by once
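The following is a minimal sketch (not from the source book; compile with `-pthread`) pulling these pieces together: a statically initialized mutex and condition variable, a wait loop that tolerates spurious wakeups, one-time initialization via `pthread_once`, and `pthread_join` to wait for completion.

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* static mutex init   */
static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;  /* static condvar init */
static pthread_once_t once = PTHREAD_ONCE_INIT;          /* static once init    */
static int data_ready = 0;          /* the condition, protected by the mutex */

static void init_once(void)         /* runs exactly once; cannot take arguments */
{
    puts("initialized once");
}

static void *consumer(void *arg)
{
    (void)arg;
    pthread_once(&once, init_once);
    pthread_mutex_lock(&lock);
    while (!data_ready)                   /* re-check in a loop: guards against
                                             spurious wakeups                  */
        pthread_cond_wait(&ready, &lock); /* atomically unlocks and sleeps     */
    pthread_mutex_unlock(&lock);
    puts("consumer saw data_ready");
    return NULL;
}

static void *producer(void *arg)
{
    (void)arg;
    pthread_once(&once, init_once);
    pthread_mutex_lock(&lock);
    data_ready = 1;
    pthread_cond_signal(&ready);    /* wake one waiter (broadcast wakes all) */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t c, p;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(c, NULL);          /* suspend until the thread terminates */
    pthread_join(p, NULL);
    return 0;
}
```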
### **Pthreads Management**
#### Key
A kind of pointer that associates data with a thread.
#### Thread cancellation
* **States:**
  * ```PTHREAD_CANCEL_DISABLE```
    * **Type:** ignored
  * ```PTHREAD_CANCEL_ENABLE```
    * **Types:**
      * ```PTHREAD_CANCEL_ASYNCHRONOUS```: cancel immediately
      * ```PTHREAD_CANCEL_DEFERRED```: cancellation happens when a cancellation point is reached
* **Cancellation-point**
  * ```pthread_cond_wait(), pthread_cond_timedwait(), pthread_join()```
  * user-defined: ```pthread_testcancel()```
* **Asynchronous Cancellation:** a thread in this state can be cancelled even while executing a library call or a system call; unless the call is defined as cancellation-safe, this situation should be prevented

## **2. Modern Microprocessors** [**[Source]**](http://www.lighterra.com/papers/modernmicroprocessors/)

### **2-1. Pipelining**
* One instruction is executing, the next instruction is being decoded, and the one after that is being fetched...

![](https://i.imgur.com/PFADDeP.png)

* At the beginning of each clock cycle, the data and control information for a partially processed instruction is held in a **pipeline latch**, and the stage produces its output just in time to be captured by the next pipeline latch at the end of the clock cycle.

![](https://i.imgur.com/c31XUUe.png)

* Since the result is available after the execute stage, the next instruction ought to be able to use that value immediately. To allow this, forwarding lines called **bypasses** are added.

![](https://i.imgur.com/qzyiSwb.png)

### **2-2. Deeper Pipelines – Superpipelining**
The logic gates that make up each stage can be subdivided, especially the longer ones, converting the pipeline into a deeper super-pipeline with a larger number of shorter stages.
* The processor can be run at a **higher clock speed** (because the workload for each cycle decreases)
* The processor still completes 1 instruction per cycle -> more instructions per second

### **2-3. Superscalar**
* The execute stage of the pipeline consists of **different functional units** (each doing its own task)

:::info
We can execute multiple instructions in parallel, with the fetch and decode/dispatch stages enhanced to ***decode multiple instructions*** in parallel and send them out to the "execution resources."
:::

![](https://i.imgur.com/0NtROVJ.png)

* There are independent pipelines for each functional unit
* Simpler instructions complete more quickly
* It's normal to refer to the depth of a processor's pipeline when executing integer instructions (usually the shortest)
* A bunch of bypasses run within and between the various pipelines
* Even more functional units could be added

![](https://i.imgur.com/5JfbRQD.png)

* The issue width is less than the number of functional units.

:::info
The number of instructions able to be issued, executed or completed per cycle is called a **processor's width**.
:::

* Superpipeline + superscalar (just called superscalar for short)

![](https://i.imgur.com/n5e57dF.png)

### **2-4. Very Long Instruction Word (VLIW)**
The **instructions** are groups of little sub-instructions.
* Each instruction contains information for multiple parallel operations
* Much like a superscalar, except the decode/dispatch stage is much simpler and only occurs for each group of sub-instructions
* Most VLIW designs are not **interlocked**
  * they do not check for dependencies
  * they often have no way of stalling instructions

:::danger
The compiler needs to insert the appropriate number of cycles between dependent instructions if necessary.
:::

### **2-5. Instruction Dependencies & Latencies**
* The number of cycles between when an instruction reaches the execute stage and when its result is available to be used is called the **instruction's latency**
* A deeper pipeline can easily fill up with bubbles due to instructions depending on each other
* Latencies for memory loads are particularly troublesome
  * it is difficult to fill their delays with useful instructions
  * they are unpredictable

### **2-6. Branches & Branch Prediction**
* When the processor encounters a **conditional branch**, it must make a guess to avoid losing the performance gained from pipelining.
  * The speculated instructions are not committed until the outcome of the branch is known.
  * guess wrong: the instructions are cancelled (cycles wasted)
  * guess right: the processor continues at full speed
* How the processor makes the guess
  * Static branch prediction: the compiler marks which way to go
    * a bit in the instruction format to **encode** the prediction
    * **convention**
  * Guess at runtime: an **on-chip** branch prediction table
    * two-level adaptive predictor
    * gshare or gselect predictor
    * implement several branch predictors and select between them based on which one is working best for each individual branch
* Deep pipelines suffer from [diminishing returns](https://en.wikipedia.org/wiki/Diminishing_returns)

:::info
The deeper the pipeline, the further into the future you must predict, the more likely you'll be wrong, and the greater the mispredict penalty.
:::

### **2-7. Eliminating Branches with Predication**
```
    cmp a, 7    ; a > 7 ?
    ble L1
    mov c, b    ; b = c
    br L2
L1: mov d, b    ; b = d
L2: ...
```
Simplified with a predicated instruction:

:::info
**Predicated instruction:** works by executing as normal, but only commits if its condition is true
:::

```
cmp a, 7    ; a > 7 ?
mov c, b    ; b = c
cmovle d, b ; if le, then b = d
```
It always does the first mov, then overwrites it if necessary.

:::warning
If the blocks of code in the if and else cases were longer, using predication would mean executing more instructions than using a branch
:::
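As a companion to the assembly above, here is a hedged C sketch (not from the source article) of the same if/else. With optimization enabled, compilers commonly lower the second, branch-free form to a conditional move such as x86's `cmovle`, though whether they actually do depends on the target and flags.

```c
/* Branchy vs. predication-style conditional assignment. */
#include <stdio.h>

int branchy(int a, int c, int d)
{
    int b;
    if (a > 7)      /* cmp a, 7 ; ble L1 — a conditional branch, may mispredict */
        b = c;      /* mov c, b */
    else
        b = d;      /* L1: mov d, b */
    return b;
}

int branchless(int a, int c, int d)
{
    int b = c;      /* always do the first mov */
    if (a <= 7)     /* candidate for cmovle: commits only if the condition holds */
        b = d;
    return b;
}

int main(void)
{
    printf("%d %d\n", branchy(9, 1, 2), branchless(9, 1, 2));  /* 1 1 */
    printf("%d %d\n", branchy(5, 1, 2), branchless(5, 1, 2));  /* 2 2 */
    return 0;
}
```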
### **2-8. Instruction Scheduling, Register Renaming & OOO**
* Find a couple of other instructions from further down in the program to fill the bubbles caused by branches and long-latency instructions in the pipeline(s)
* There are two ways to do that:

**1.** Reorder in hardware at **runtime**: the dispatch logic must be enhanced to look at groups of instructions and dispatch them out of order.

**Register renaming:**
* Not dealing with the raw architecturally-defined registers, but with a set of **renamed registers**
* By being mapped to different physical registers, instructions can be executed in parallel
* The processor must keep a mapping of the in-flight instructions and the physical registers they use

:::info
A larger set of real registers extracts even more parallelism out of the code
:::

**2.** The compiler optimizes the code by rearranging the instructions, called static, or compile-time, instruction scheduling.
* avoids complex OOO logic
* can see further down the program than the hardware can

:::warning
Without OOO hardware, the pipeline will stall when the compiler fails to predict something like a **cache miss**
:::

### **2-9. The Brainiac Debate**
The **brainiac vs speed-demon** debate: whether the costly out-of-order logic is really warranted, or whether compilers can do well enough without it
* Brainiac designs: lots of OOO hardware trying to squeeze every last drop of instruction-level parallelism out of the code
* Speed-demon designs: simpler, relying on a smart compiler

### **2-10. The Power Wall & The ILP Wall**
Increasing the clock speed of a processor will typically increase its power usage even more:
* The transistors switch more often
* The voltage also needs to be increased to drive the signals through the circuits faster to meet the shorter timing requirements
* Leakage current also goes up as the voltage is increased

Power increases linearly with clock frequency and as the square of the voltage:

$$P \propto fV^{2}$$

**Power wall:** it is not possible to provide that much power and cooling to a silicon chip in any practical fashion

**ILP wall:** normal programs don't have a lot of fine-grained parallelism

### **2-11. What About x86?**
**Problem:** the complex and messy x86 instruction set.

**Solution:** dynamically decode the x86 instructions into RISC-like **micro-instructions (μops)**, which are then executed by a RISC-style **register-renaming** OOO superscalar core.

![](https://i.imgur.com/dv5SwP8.png)

**Improvement:**
* a buffer or μop instruction cache, to avoid translating the same instructions over and over

:::info
Pipeline depth
* 14 stages when the processor is running from its L0 μop cache (which is the common case)
* 19 stages when running from the L1 instruction cache and having to translate the x86 instructions into μops
:::

### **Threads – SMT, Hyper-Threading & Multi-Core**
* If additional independent instructions aren't available, there is another potential source of independent instructions – other running programs, or other threads within the same program **->** used to fill those empty bubbles in the pipelines
* **Simultaneous multi-threading (SMT):** one physical processor core presents two or more logical processors to the system
  * all of the parts of the processor which store the "execution state" of each thread are duplicated
  * other resources, such as the decoders and dispatch logic, the functional units, and the caches, are shared between the threads

![](https://i.imgur.com/dhYI4Fd.png)

:::warning
SMT should not be confused with a multi-processor or multi-core processor, but there's nothing preventing a multi-core implementation where each core is an SMT design.
:::

* Downsides:
  * If one thread saturates a functional unit which the other threads need, it effectively stalls all of the other threads
  * Competition between the threads for cache space
* Applications which are limited primarily by memory latency benefit dramatically from SMT, since it offers a way of using the otherwise idle time

:::info
The Pentium 4 was the first processor to use SMT, which Intel calls **hyper-threading**.
:::

### **2-12. More Cores or Wider Cores?**
* The complex multiple-issue dispatch logic scales up roughly as the square of the issue width (each of the n candidate instructions is compared against every other candidate)
* For applications with lots of active but memory-latency-limited threads, more simple cores would be better
* For most applications, there simply are not enough threads active, and the performance of just a single thread is much more important

### **2-13. Data Parallelism – SIMD Vector Instructions**
* Rather than looking for ways to execute groups of instructions in parallel, SIMD makes one instruction apply to a group of data values in parallel.

:::info
More often, it's called vector processing.
:::

* A packed 8-bit add is the same operation as a 32-bit addition, except that every 8th carry is not propagated (see the sketch after this list).

![](https://i.imgur.com/Fen0njg.png)

* It is possible to define entirely new registers **->** more data to be processed in parallel
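To make the "every 8th carry is not propagated" point concrete, here is a hedged sketch (not from the source article) that emulates a 4-lane packed byte add using only ordinary 32-bit arithmetic, masking out the inter-lane carries — the same effect a SIMD packed add implements directly in hardware.

```c
/* Emulate a SIMD-style packed add of four 8-bit lanes held in a uint32_t. */
#include <stdint.h>
#include <stdio.h>

static uint32_t packed_add_u8x4(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of each lane; sums fit in 8 bits, so nothing
     * spills across a lane boundary. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Add the top bit of each lane with XOR, discarding its carry-out —
     * this is exactly the "every 8th carry is not propagated" rule. */
    return low7 ^ ((a ^ b) & 0x80808080u);
}

int main(void)
{
    uint32_t a = 0xFF010203u, b = 0x01010101u;
    /* The 0xFF + 0x01 lane wraps to 0x00 instead of carrying into its
     * neighbor, so the result is 0x00020304. */
    printf("%08x\n", (unsigned)packed_add_u8x4(a, b));
    return 0;
}
```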
### **2-14. Memory & The Memory Wall**
* Loads tend to occur near the beginning of code sequences (basic blocks), with most of the other instructions depending on the data being loaded **->** hard to achieve ILP
* The facts of nature hinder a fast memory system
  * speed of light: delays as signals travel out to RAM and back
  * the slow speed of charging and draining the tiny capacitors
* **Memory wall:** the gap between the processor and main memory

### **2-15. Caches & The Memory Hierarchy**
* A cache is a **small but fast** type of memory located on or near the processor chip, used to attack the memory wall
* **Memory hierarchy:** the combination of the on-chip caches, off-chip external cache and main memory... (lower levels **->** larger but slower)
* Caches achieve amazing hit rates because most programs exhibit locality
  * **Temporal Locality:** there's a good chance a program will need to re-access the same piece of memory in the near future **->** exploited by keeping recently accessed data in the cache
  * **Spatial Locality:** there's a good chance a program will need to access other nearby memory in the future **->** data is transferred from main memory into the cache in blocks of a few dozen bytes **(a cache line)** at a time
* A cache works like a two-column table
  * the higher-end part of the address **(tag)** is used for the search
  * the lower part of the address is used to index the cache

![](https://i.imgur.com/caDM5oI.png)

* Cache lookup
  * virtual address: the cache might need to be flushed on every context switch
  * physical address: the virtual-to-physical mapping must be performed as part of the cache lookup
  * **virtually-indexed physically-tagged:** the virtual-to-physical mapping (TLB lookup) can be performed in parallel with the cache indexing

### **2-16. Cache Conflicts & Associativity**
* A cache usually only allows data from any particular address in memory to occupy one, or at most a handful, of locations within the cache
* **Cache conflict:** memory locations that map to the same cache location are wanted at the same time
* **Thrashing:** the program repeatedly accesses two memory locations which happen to map to the same cache line, so the cache must keep storing to and loading from main memory
* **Associativity:** the number of places a piece of data can be stored in a cache
* Mapping (see the address-splitting sketch after this list)
  * **Direct-mapped:** each piece of data is simply mapped to address % size within the cache
  * **Set-associative:** there are several tables, all indexed in parallel, and the tags from each table are compared
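The sketch below (not from the source article) splits an address into offset, index and tag the way a direct-mapped cache does; the geometry — 64-byte lines and 512 sets, i.e. a 32 KB cache — is an assumption chosen purely for illustration. It also shows why two addresses exactly one cache-size apart conflict, which is the thrashing scenario above.

```c
/* Address decomposition for a hypothetical direct-mapped cache:
 * 64-byte lines, 512 sets (64 * 512 = 32 KB total). */
#include <stdint.h>
#include <stdio.h>

#define LINE_BITS 6   /* 2^6 = 64-byte cache line */
#define SET_BITS  9   /* 2^9 = 512 sets           */

int main(void)
{
    uintptr_t addr = 0x7ffd1234abcdu;                       /* arbitrary example */
    uintptr_t offset = addr & ((1u << LINE_BITS) - 1);      /* byte within line  */
    uintptr_t index  = (addr >> LINE_BITS) & ((1u << SET_BITS) - 1); /* which set */
    uintptr_t tag    = addr >> (LINE_BITS + SET_BITS);      /* compared on lookup */

    printf("offset=%lu index=%lu tag=%#lx\n",
           (unsigned long)offset, (unsigned long)index, (unsigned long)tag);

    /* An address exactly 32 KB away keeps the same index but a different
     * tag: in a direct-mapped cache the two lines conflict, and
     * alternating accesses between them thrash. */
    uintptr_t other = addr + (1u << (LINE_BITS + SET_BITS));
    printf("same index? %d\n",
           (int)(((other >> LINE_BITS) & ((1u << SET_BITS) - 1)) == index));
    return 0;
}
```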
[**Further reading**](https://embedded2015.hackpad.com/ep/pad/static/QXYBh2n9wLD)

### **2-17. Memory Bandwidth vs Latency**
* The transfer rate of a memory system is called its **bandwidth**
* To increase bandwidth: add more memory banks and make the busses wider
* To improve latency: synchronously clocked DRAM [**(SDRAM)**](https://en.wikipedia.org/wiki/Synchronous_dynamic_random-access_memory)

[**Source of This Note**](https://hackmd.io/s/H161bNj0)
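Finally, to connect the note's title back to the PThreads material above, here is a minimal hedged sketch (not part of either cited source) of a concurrent merge sort: each half of the array is sorted by its own peer thread — each thread owning its own input, as in Peer Mode — and the halves are merged after `pthread_join`. A real implementation would recurse with a depth cutoff or use a thread pool; error checks are omitted for brevity.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct span { int *a; size_t n; };      /* one thread's own input */

static int cmp_int(const void *x, const void *y)
{
    int xi = *(const int *)x, yi = *(const int *)y;
    return (xi > yi) - (xi < yi);
}

static void *sort_span(void *arg)       /* worker: sort one half */
{
    struct span *s = arg;
    qsort(s->a, s->n, sizeof(int), cmp_int); /* stand-in for a recursive sort */
    return NULL;
}

static void merge(int *dst, const int *l, size_t ln, const int *r, size_t rn)
{
    size_t i = 0, j = 0, k = 0;
    while (i < ln && j < rn) dst[k++] = (l[i] <= r[j]) ? l[i++] : r[j++];
    while (i < ln) dst[k++] = l[i++];
    while (j < rn) dst[k++] = r[j++];
}

int main(void)
{
    int a[] = {5, 2, 9, 1, 7, 3, 8, 4, 6, 0};
    size_t n = sizeof(a) / sizeof(a[0]), half = n / 2;
    struct span left = {a, half}, right = {a + half, n - half};
    int *out = malloc(n * sizeof(int));
    pthread_t t1, t2;

    if (!out) return 1;
    pthread_create(&t1, NULL, sort_span, &left);
    pthread_create(&t2, NULL, sort_span, &right);
    pthread_join(t1, NULL);             /* wait for both halves */
    pthread_join(t2, NULL);
    merge(out, left.a, left.n, right.a, right.n);

    for (size_t i = 0; i < n; i++) printf("%d ", out[i]);
    putchar('\n');
    free(out);
    return 0;
}
```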
