# 【ML】TinyML and Efficient Deep Learning Computing Note

## [Lecture Website](https://hanlab.mit.edu/courses/2023-fall-65940)

## Lecture 2 Basics of Deep Learning

### Deep Learning Continues to Scale

Deep learning models keep growing; model size is increasing exponentially, driven by capital.

![image](https://hackmd.io/_uploads/S1X0kOJER.png =75%x)

The problem is that compute cannot keep up, and neither can memory, which is why AI data centers and distributed training are in such demand.

![image](https://hackmd.io/_uploads/S18XlOy4A.png =75%x)

### Grouped Convolution Layer

#### Background

When AlexNet was developed, GPUs were still limited, so the convolution weights and input features had to be split across devices during the convolution. That constraint gave rise to this layer.

#### Concept

A grouped convolution layer splits the input feature map into several groups along the channel dimension and convolves each group independently.

- Input feature map: $X : (n, c_{i}, h_{i}, w_{i})$
- Output feature map: $Y : (n, c_{o}, h_{o}, w_{o})$
- Weights: $W : (c_o, c_i, k_h, k_w) \rightarrow (g \cdot c_{o}/g,\ c_i/g,\ k_h, k_w)$
- Bias: $b: (c_{o},)$

| Variable | Definition |
| -------------- | --------------------- |
| $n$ | Batch size |
| $c_{i}, c_{o}$ | Input/Output channels |
| $h_{i}, h_{o}$ | Input/Output height |
| $w_{i}, w_{o}$ | Input/Output width |
| $k_h$, $k_w$ | Kernel height/width |
| $g$ | Groups |

![image](https://hackmd.io/_uploads/Skz5uMc7R.png)

### Depthwise Convolution Layer

#### Background

Google's MobileNet had to run on phones and other edge devices, so the cost of the convolution layers had to come down.

#### Concept

Setting $g = c_o = c_i$ turns a grouped convolution into a depthwise convolution: each kernel convolves with exactly one feature channel. The channels no longer interact in the spatial features, which can hurt accuracy.

- Input feature map: $X : (n, c_{i}, h_{i}, w_{i})$
- Output feature map: $Y : (n, c_{o}, h_{o}, w_{o})$
- Weights: $W : (c_o, c_i, k_h, k_w) \rightarrow (c_{o}, k_h, k_w)$
- Bias: $b: (c_{o},)$

![image](https://hackmd.io/_uploads/HJmzFfqXR.png)

### Normalization Layer

Normalizing features has the following benefits:

* It stabilizes training by keeping features in a fixed distribution, since each batch is normalized to zero mean and unit variance.
* Because normalization discards the features' original distribution, two learnable parameters $(\gamma, \beta)$ are added to compensate.

\begin{align}
\mu &= \frac{1}{N} \sum_{i=1}^{N} x_i \\
\sigma^2 &= \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \\
\hat{x}_i &= \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \\
y_i &= \gamma \hat{x}_i + \beta
\end{align}

![image](https://hackmd.io/_uploads/ry6mG6Y7R.png)

In the PyTorch implementation, the model must be switched to inference mode manually, so that inference does not update the BatchNorm layer's running mean and running var:

```python
model.eval()
```

Feature scaling lets the next layer see features with a consistent distribution, so no feature value grows large enough to dominate and bias the result. This makes the curve of the loss with respect to the weights smoother.

![image](https://hackmd.io/_uploads/ByiCQ7cQA.png =70%x)

### Activation Functions

* Introduce nonlinearity so a neural network can approximate arbitrary functions.
* Different activation functions have different gradient behavior.
* $\text{ReLU}$ is the cheapest to compute, is easy to quantize, and preserves the gradient of positively activated neurons.

![image](https://hackmd.io/_uploads/rJmceOkVA.png)

### Efficiency Metrics

How should we measure the efficiency of neural networks?

**Three factors that matter**

1. Model size $\rightarrow$ storage required (Parameters vs. Model Size)
2. Computation speed $\rightarrow$ processing delay and rate (Latency vs. Throughput)
3. Energy $\rightarrow$ how much energy is consumed (Memory > Compute)

**Two core metrics**

1. Compute
2. Memory

![image](https://hackmd.io/_uploads/SkxJZ_1ER.png)

### Latency

Measures the delay for a specific task.

![image](https://hackmd.io/_uploads/Hyl_RYg4R.png)
![image](https://hackmd.io/_uploads/SyUgy9gN0.png =85%x)

### Throughput

Measures the rate at which data is processed.

![image](https://hackmd.io/_uploads/ryUFRFeEC.png)

### Latency vs. Throughput

There is no strict relationship between latency and throughput:

1. High throughput does not imply low latency; the work may be processed in parallel.
2. Low latency does not imply high throughput; the work may run on a single thread.

![image](https://hackmd.io/_uploads/HJgy19eVR.png)

### Energy Consumption

Moving and referencing data costs more energy than computing on it; DRAM access is the most expensive operation.

![image](https://hackmd.io/_uploads/B1o0rcgER.png)

### Number of Parameters

| Layer | \# Parameters (bias is ignored) |
| --------------------- |:--------------------------------- |
| Linear Layer | $c_o \cdot c_i$ |
| Convolution | $c_o \cdot c_i \cdot k_h \cdot k_w$ |
| Grouped Convolution | $\frac{c_o}{g} \cdot \frac{c_i}{g} \cdot k_h \cdot k_w \cdot g = c_o \cdot c_i \cdot k_h \cdot k_w \cdot \frac{1}{g}$ |
| Depthwise Convolution | $c_o \cdot k_h \cdot k_w$ |

### Model Size

* Model size measures the storage for the weights of the given neural network.
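The parameter-count formulas in the table above can be checked in plain Python. This is a minimal sketch; the helper names are my own, not from the lecture:

```python
def linear_params(c_i, c_o):
    # Linear layer: one weight per (input, output) channel pair, bias ignored
    return c_o * c_i

def conv_params(c_i, c_o, k_h, k_w, g=1):
    # Grouped convolution: each of the g groups maps c_i/g inputs to c_o/g outputs,
    # so parameters = (c_o/g) * (c_i/g) * k_h * k_w * g = c_o * c_i * k_h * k_w / g
    assert c_i % g == 0 and c_o % g == 0
    return (c_o // g) * (c_i // g) * k_h * k_w * g

def depthwise_params(c, k_h, k_w):
    # Depthwise convolution: the special case g = c_i = c_o
    return conv_params(c, c, k_h, k_w, g=c)

# Standard 3x3 convolution, 64 -> 128 channels
print(conv_params(64, 128, 3, 3))        # 73728
# Same layer with 8 groups: 8x fewer parameters
print(conv_params(64, 128, 3, 3, g=8))   # 9216
# Depthwise 3x3 on 64 channels
print(depthwise_params(64, 3, 3))        # 576
```

The grouped case shows exactly the $\frac{1}{g}$ saving from the table.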
* In general, if the whole model uses a single data type:
  * $\text{Model size} = \#\text{parameters} \cdot \text{bit width}$
* Example: AlexNet has 61M parameters
  * 32 bit: $61\text{M} \cdot 4\ \text{Bytes} = 244\ \text{MB}\ (244 \times 10^6\ \text{Bytes})$
  * 8 bit: $61\text{M} \cdot 1\ \text{Byte} = 61\ \text{MB}\ (61 \times 10^6\ \text{Bytes})$
* Example: GPT-3 has 175B parameters
  * 32 bit: $175\text{B} \cdot 4\ \text{Bytes} = 700\ \text{GB}\ (700 \times 10^9\ \text{Bytes})$

### Number of Activations

* Activations are the inputs or outputs of the layers (both definitions are in use). We measure the size of these feature maps because they can be the bottleneck on IoT devices.
* Activation size varies a lot across layers!

The figure below shows that cutting the parameter count does not reduce the peak activation by much: the inverted bottleneck design first expands the features, extracts them with a larger kernel, and then projects them back down.

![image](https://hackmd.io/_uploads/r1BKzh1U0.png =75%x)

### Number of Multiply-Accumulate Operations

* A multiply-accumulate operation (MAC) multiplies $b$ and $c$, adds the product to accumulator $a$, and stores the result back in the accumulator.
* Matrix-Vector Multiplication (MV)
* General Matrix-Matrix Multiplication (GEMM)

![image](https://hackmd.io/_uploads/r19pHOlER.png)

Note: $\text{MACs}$ is simply the plural of $\text{MAC}$.

### Number of Floating Point Operations (FLOP)

A MAC consists of two floating point operations:

1. a multiplication
2. an addition

Key point: FLOPs are roughly twice the MACs, i.e. $\text{FLOP} = 2 \times \text{MAC}$.

| Layer | MACs |
| --------------------- |:------------------------------------------------------------------------- |
| Linear Layer | $c_o \cdot c_i$ |
| Convolution | $c_o \cdot c_i \cdot k_h \cdot k_w \cdot h_o \cdot w_o$ |
| Grouped Convolution | $c_o \cdot c_i \cdot k_h \cdot k_w \cdot \frac{1}{g} \cdot h_o \cdot w_o$ |
| Depthwise Convolution | $c_o \cdot k_h \cdot k_w \cdot h_o \cdot w_o$ |

## Lecture 11 TinyEngine and Parallel Processing

### Parallel Computing Techniques

#### Loop Reordering

A loop can visit data in row-major or column-major order. Choosing the order that matches how elements are laid out in memory speeds up the loop without changing what it computes, **because it raises the cache hit rate**.

* Improves data locality
  * Data movement (a cache miss) is much more expensive than computation.
  * A chunk of memory is fetched at a time (a cache line).
* Reduces cache misses
  * Change the order of the loop iteration variables.
  * Index order: $(i,\ j,\ k) \rightarrow (i,\ k,\ j)$

![Loop-Reordering](https://hackmd.io/_uploads/rk6Y8X1b1e.png)

#### Loop Tiling

When the matrices are too large to fit in the **cache**, a data reference may evict data that is still needed to make room for other elements, incurring cache-miss overhead. The fix is to break the large matrix into small tiles and process them one at a time.

* Why this reduces cache misses:
  * Partition the loop.
  * Keep data in the cache until it has been fully used.
  * It is best when the elements touched by a loop iteration fit the cache size.

![image](https://hackmd.io/_uploads/H1kv_mJZ1x.png)

### Loop Unrolling

Expand the loop body inline: a brute-force approach well suited to convolutions with small kernels. Part of the loop is simply unrolled.

* Reduces loop overhead (condition checks and counter increments).
* May increase code size.

![image](https://hackmd.io/_uploads/Hyj2dmyZ1x.png)

### SIMD (single instruction, multiple data) programming

#### What is an instruction set?

* The bridge between software and hardware: it defines how software controls the hardware.
* It specifies what the processor can do and how it does it.

#### Instruction Set Types

1. Complex Instruction Set Computer (CISC)
2. Reduced Instruction Set Computer (RISC)

#### SIMD introduction

A parallelization technique in which a single instruction operates on multiple pieces of data at once.

* Vector register
  * A special register that can hold and process multiple data elements.
* Vector operations
  * Logical and arithmetic operations applied to multiple data elements.
* Advantages
  * Greatly increases code throughput and speed.
  * Improves energy efficiency.

![image](https://hackmd.io/_uploads/BkhwkHy-Jx.png)

### SSE

SSE is Intel's SIMD extension.

* **SSE:** `_mm_load_ps`/`_mm_mul_ps`/`_mm_add_ps`
* **mm:** multi-media
* **load/mul/add:** load/multiply/add
* **ps:** packed single-precision

### NEON

NEON is the SIMD extension for ARM processors.

* **NEON:** `vld1q_f32`/`vmulq_f32`/`vaddq_f32`
* **v:** vector
* **ld/mul/add:** load/multiply/add
* **1:** number of vectors
* **q:** quadword (16 bytes)

### Multi-threading

Use multiple threads to speed up processing. Ideally the data handled by different threads is independent, with no cross-thread dependencies, to avoid synchronization problems.

![image](https://hackmd.io/_uploads/ByULVG2Xkx.png)

#### Two Types of Tools

1. Pthreads
   * A C library for creating and managing POSIX threads.
2. OpenMP
   * An API for C/C++ and Fortran to support parallel programming using a shared-memory model.
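The reordering and tiling ideas above can be sketched in plain Python. This only illustrates the loop structure (the cache benefit itself shows up in compiled languages); the function names and tile size are my own:

```python
def matmul_ikj(A, B, n):
    # Reordered (i, k, j) loops: the innermost loop walks B and C along a row,
    # the contiguous direction in row-major layout -> better data locality.
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a = A[i][k]
            for j in range(n):
                C[i][j] += a * B[k][j]
    return C

def matmul_tiled(A, B, n, T):
    # Loop tiling: process T x T sub-blocks so each block of A, B, and C
    # can stay cache-resident until it has been fully used.
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, T):
        for k0 in range(0, n, T):
            for j0 in range(0, n, T):
                for i in range(i0, min(i0 + T, n)):
                    for k in range(k0, min(k0 + T, n)):
                        a = A[i][k]
                        for j in range(j0, min(j0 + T, n)):
                            C[i][j] += a * B[k][j]
    return C

n = 8
A = [[float(i * n + j) for j in range(n)] for i in range(n)]
B = [[float((i + j) % n) for j in range(n)] for i in range(n)]
# Both variants compute the same product; only the traversal order differs.
assert matmul_ikj(A, B, n) == matmul_tiled(A, B, n, T=4)
```

In C, the tiled version with `T` chosen so three `T x T` blocks fit in L1 cache is where the speedup actually appears.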
### CUDA Programming

For details, see [【CUDA】CUDA Programming Note](/-vx2yQneSsWW84Y06zU2ZQ).

### im2col

### Point-Wise Convolution

In PyTorch the data layout is $B \times C \times H \times W$. A $\text{Conv} 1 \times 1$ on this layout causes cache misses, because each tensor reference has to jump between elements. Rearranging the tensor's memory layout to $B \times H \times W \times C$ improves data locality.

![image](https://hackmd.io/_uploads/B1DiTvLXge.png)

### Depth-Wise Convolution

Depth-wise convolution is the opposite of point-wise: it prefers the $B \times C \times H \times W$ memory layout.

![image](https://hackmd.io/_uploads/S1nl0vL7le.png)

### Winograd Convolution

Speeds up convolution.

# Reference

1. [animatedai](https://animatedai.github.io/)
