# Tensor Core

## Background

Matrix multiplications lie at the heart of Convolutional Neural Networks (CNNs). Both training and inference require multiplying a series of matrices that hold the input data and the optimized weights of the connections between the layers of the neural net. The Tesla V100 is NVIDIA's first product to include Tensor Cores to perform such matrix multiplications very quickly. Assuming that half-precision (FP16) representations are adequate for the matrices being multiplied, CUDA 9.1 and later use Volta's Tensor Cores whenever possible to do the convolutions. Given that very large matrices may be involved, Tensor Cores can greatly improve the training and inference speed of CNNs.

## Evolution of Tensor Core

![](https://lh7-us.googleusercontent.com/4KH6XU9dOBy3pvEmOVc28k-WDDJq3vqylce1qdUiYmUPNPSWhW5osyGiq7X-JsUQH9cUN-F3D6KR9kuPlGR9R3SPJpx8EfDZoYmA96lqj4e8qNo5wgE4yerOYMj3ZHSZyKRCZ6DM8mt8P_7MSaUUXNLryQ=s2048)
![](https://lh7-us.googleusercontent.com/I7i3HTE7OXReIA3nSk3qWALpuvTg28Ptk1SC1mIfL5FAnAL-NOp-YG0DDHpzFYku-5hAwPGoC8G37pez21YmtQBh728oA_wlAfFkFqXYji4xtGEcqhaXZ3tnSLJE1qM-7XxD1GBm1Q-NCxbLbav7kZGqSw=s2048)

Ampere Tensor Cores introduce acceleration for sparse matrix multiplication with fine-grained structured sparsity and a new machine learning data type called bfloat16 (BF16). The Ampere architecture also redesigns the micro-architecture of the Tensor Cores.

## What?

> Aiming for higher throughput for GEMM (General Matrix Multiply).
> Use lower-precision FP16 for MMA (matrix multiply-and-accumulate) operations, accelerating computation while minimizing the negative impact on model accuracy.

The Tensor Core is
- A fast hardware functional block for dense matrix multiplication (inner product)
  - ![image.png](https://s2.loli.net/2024/01/31/5KYZasXNtihegVr.png)
- A specialized core that enables mixed-precision training
- Capable of performing **4×4 matrix multiply-and-accumulate (MMA)** operations, which are a series of multiplications and additions performed on arrays of numbers (matrices); see the scalar reference sketch after this list
  - These operations are fundamental to the calculations for neural network training and inference.
- When?
  - It was first introduced in NVIDIA's Volta architecture.
- Advantages?
  - Highly efficient
  - Provides a significant speedup for tasks that involve a large number of such matrix operations, which is typical in deep learning computations such as the training and inference phases of neural networks.
  - Traditionally, the multiplication and addition would be separate operations, which would require more computational cycles and would involve storing intermediate results.
- Problems?
  - Sparse matrices
    - DNNs incur excessive computation because of their over-provisioned parameter space, which reduces efficiency
    - [Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs](https://docs.google.com/presentation/d/1JmWpPJ8TDDdprFiVJbQRrRaSkS7sc3fXiLZIY5Jp1UM/edit?usp=sharing)
  - Current mma API vs. legacy wmma API
    - The wmma APIs support fewer operand shapes and cannot leverage the new sparse matrix multiplication feature of the newest Ampere Tensor Cores
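For readability, here is a minimal scalar sketch of the MMA semantics described above (our own illustration, not how the hardware is implemented): D = A × B + C on 4×4 tiles, which a Tensor Core performs as a single hardware operation on FP16 inputs, optionally accumulating in FP32.

```cpp
// Scalar reference of a 4x4x4 MMA: D = A * B + C.
// Illustration only: a Tensor Core does the equivalent work as one hardware
// operation on FP16 inputs (with optional FP32 accumulation), not as loops.
void mma_4x4x4_reference(const float A[4][4], const float B[4][4],
                         const float C[4][4], float D[4][4]) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float acc = C[i][j];              // start from the accumulator tile
            for (int k = 0; k < 4; ++k) {
                acc += A[i][k] * B[k][j];     // one 4-element dot product
            }
            D[i][j] = acc;
        }
    }
}
```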
### Tensor Core inside an SM

![image.png](https://s2.loli.net/2024/01/30/c2RW6Z8DApQqdCB.png)

- An SM consists of four sub-cores, each with its own warp scheduler, so the SM can issue four warp instructions simultaneously.
- In each sub-core there are a warp scheduler, a math dispatch unit, streaming processor arrays for multiple data types (a.k.a. CUDA Cores), special function units (SFUs), two Tensor Cores, an LD/ST unit, and a register file.
- Tensor Cores use the same memory hierarchy as CUDA Cores.
  - Both Tensor Cores and CUDA Cores can fetch data through direct addressing (the `ld` instruction) from shared memory or global memory via the **per-thread** scheme.
    - Per-thread means the instruction is executed by a single thread independently; both the behavior and the result of the thread are deterministic.
  - Tensor Cores have two special load instructions, the recent `ldmatrix` and the legacy `wmma.load`, which use the **per-warp** scheme.
    - Per-warp means the instruction is executed by the 32 threads within the same warp cooperatively; neither the behavior nor the result of an individual thread is deterministic.
- The L1 data cache and shared memory are shared among the four sub-cores within the SM.

### Tensor Core Architecture

![image.png](https://s2.loli.net/2024/01/30/usEkt5Gyq3irbNH.png)
![image.png](https://s2.loli.net/2024/01/31/LhbJACNZF72SRf4.png)

#### How a WMMA operation is mapped to the Tensor Core architecture

1. There are **2 octets** in a Tensor Core.
    1. Inside an octet there are **8 dot product (DP) units**, each of which can compute a <u>4-dim vector dot product per cycle</u>.
    2. **1 worktuple** is mapped to **1 octet**, and thus each threadgroup takes four DP units.
    3. One set of HMMA instructions is handled by one worktuple.
2. The octet has <u>operand buffers</u> that feed the worktuple with the tiled data when executing one set of HMMA instructions.
    1. Each threadgroup has dedicated operand buffers for operand A and operand C.
    2. **Each of these operand buffers can hold a 4×4 tile.**
    3. **The operand buffer dedicated to operand B can hold a 4×8 tile**, and the data inside are shared by the two threadgroups in the same worktuple.
3. **One threadgroup** computes the multiplication of a 4×4 tile by a 4×8 tile in **a set of HMMA instructions**, which is 4×8 = 32 4-dim dot products in total.
    1. Because the **4 DP units compute four 4-dim vector dot products per cycle**, each set of HMMA instructions requires at least 8 clock cycles to finish the 4×8×4 matrix multiplication.
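As a sanity check, the 8-cycle figure is just a restatement of the numbers above: one threadgroup owns 4 DP units, each producing one 4-dim dot product per cycle, while one set of HMMA instructions needs 4×8 = 32 dot products, so

$$
\text{cycles} \geq \frac{4 \times 8\ \text{dot products}}{4\ \text{DP units} \times 1\ \text{dot product per cycle}} = 8 .
$$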
## How?

### Basic Functionality

- **Each Tensor Core is able to execute a 4×4×4 MMA (matrix multiplication and addition) in one cycle.**
  - Why does the Tensor Core use 4×4 matrices?
    - Possibly to match how threads are partitioned into `threadgroup`s?
  - Tile sizes are written as M×N×K: A is M×K, B is K×N, and C and D are M×N.
    - A => 4 × 4
    - B => 4 × 4
    - C => 4 × 4
- The Tensor Core in Volta GPUs provides **2 execution modes**:
  - **FP16 mode**: In this mode, the Tensor Core **reads three 4×4 FP16 matrices** as input and performs operations on them.
    - This mode uses 16-bit floating-point matrices, which are smaller and require less computational power than higher-precision formats.
  - **Mixed-precision mode**: This mode enables the use of both 16-bit and 32-bit floating-point matrices. It reads two 4×4 16-bit matrices and pairs them with a third 4×4 32-bit matrix for accumulation. This approach balances the need for speed (with 16-bit operations) and precision (with 32-bit accumulation) to improve the performance and accuracy of the computations.

![image](https://hackmd.io/_uploads/BJDTv2vc6.png)

**Internal components:**

![image](https://hackmd.io/_uploads/ryvcJtGM0.png)

### Program Execution on the Architecture

- 2 Tensor Cores are used concurrently by a warp.

#### How does the Tensor Core operate?

![image.png](https://s2.loli.net/2024/01/30/WkcXFxgHsfJYwl2.png)

- The 32 threads within a warp collaborate to conduct the 16×16×16 matrix multiply and accumulate operation efficiently.
- To execute a WMMA, the 32 threads in a warp are divided into **8 threadgroups** (see the index-arithmetic sketch after the Compilation subsection).
  - Each threadgroup has 4 threads.
  - All threads in a threadgroup work together to compute 4×4 tile multiplications.
- For better data reuse, **2 threadgroups** work together as **a worktuple**.
  - Each **worktuple** is responsible for computing one 8×8 tile of D.
  - Worktuple i consists of threadgroup i and threadgroup i + 4.
  - The threadgroup ID of a given thread is
    $$\left\lfloor \frac{\text{threadId}}{4} \right\rfloor.$$
  - For example, Worktuple 0 (threadgroups 0 and 4) computes `D[0:7,0:7]`.
    - To achieve this, Worktuple 0 multiplies `A[0:7,0:15]` and `B[0:15,0:7]`, adds the resulting 8×8 tile to `C[0:7,0:7]`, and saves the result to `D[0:7,0:7]`.

![image.png](https://s2.loli.net/2024/01/30/vSfhap4xnc7lbjI.png)
![image.png](https://s2.loli.net/2024/01/31/L6vsGSHigJy2uTQ.png)

#### Compilation

- At compilation time, a WMMA operation breaks down into 4 sets of machine-level HMMA instructions.
  - In mixed-precision mode, each set of HMMA instructions computes the product of a 4×4 tile of A and a 4×8 tile of B.
  - The 2 threadgroups in Worktuple 0 share a 4×8 tile of B in each set.
  - The 4×4 tiles of A are private to the threadgroups.
  - The four sets of HMMA instructions of threadgroups 0 and 4 compute the product submatrices `D[0:3,0:7]` and `D[4:7,0:7]`.
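To make the warp partitioning above concrete, here is a small standalone sketch (our own illustration; the names `laneId`, `threadgroupId`, and `worktupleId` simply mirror the text) that prints which threadgroup and worktuple each of the 32 lanes belongs to.

```cpp
#include <cstdio>

// Prints the lane -> threadgroup -> worktuple mapping described above.
// laneId is a thread's position within its warp (0..31).
int main() {
    for (int laneId = 0; laneId < 32; ++laneId) {
        int threadgroupId = laneId / 4;          // 8 threadgroups of 4 threads each
        int worktupleId   = threadgroupId % 4;   // worktuple i = threadgroups {i, i + 4}
        printf("lane %2d -> threadgroup %d -> worktuple %d\n",
               laneId, threadgroupId, worktupleId);
    }
    // Per the text, worktuple 0 (threadgroups 0 and 4) computes D[0:7, 0:7].
    return 0;
}
```

Lanes 0-3 and 16-19, for example, fall into threadgroups 0 and 4, which together form worktuple 0.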
### Programming API

In the CUDA programming model, Tensor Cores are exposed to programmers through the **CUDA WMMA (Warp Matrix Multiply and Accumulate) API**.

- The WMMA API includes dedicated **matrix load and store** primitives and **matrix multiply and accumulate** operations for Tensor Cores.
- The WMMA matrix load and store operations are designed for moving data between register files and the memory hierarchy.

WMMA API in CUDA 12.2:
- Supports various tile size combinations

![image.png](https://s2.loli.net/2024/01/31/xo1EYr3bNJcT6mK.png)

Terminology:

- `tile`: a sub-section or block of a larger matrix that is processed as a unit by the GPU.
  - **Warp execution on a tile**: A "warp" is a group of threads that execute in lockstep on a GPU; a warp on NVIDIA's CUDA platform typically consists of 32 threads. When a warp executes a matrix operation using Tensor Cores, it operates on a "tile" of the matrix, i.e. a submatrix or block of the larger data set being processed.
  - Advantage:
    - Processing matrices in tiles allows the workload to be divided among multiple processing units (like Tensor Cores), enabling parallel computation. It also helps fit the computation within the fast but limited shared memory available to the GPU's multiprocessors.
- `fragment`: a data structure that holds part of the tile data.
  - A concept specific to NVIDIA's WMMA (Warp Matrix Multiply-Accumulate) API for Tensor Cores.
  - **Fragment in registers**: A "fragment" is the portion of the tile that each thread, or group of threads in the warp, works on. Fragments are small enough to fit into the fast-access registers of the individual threads, so each thread can quickly access and process its assigned portion of the tile.
  - Advantage:
    - By breaking a tile into fragments, the WMMA API enables fine-grained parallelism where each thread in a warp works on its own fragment. This allows the Tensor Cores to be used efficiently, since each core can process multiple fragments simultaneously.

To put it into the flow of execution (a minimal kernel following these steps appears at the end of this section):

1. The larger matrix is divided into tiles that are appropriate for processing by the warp's Tensor Cores.
2. Each tile is then broken down into fragments.
3. The fragments are loaded into the registers of the threads in the warp.
4. The warp's Tensor Cores perform the matrix operations on these fragments in parallel.
5. The results are then written back to main memory or used in further computations as required.

```cpp
template<typename Use, int m, int n, int k, typename T, typename Layout=void> class fragment;

void load_matrix_sync(fragment<...> &a, const T* mptr, unsigned ldm);
void load_matrix_sync(fragment<...> &a, const T* mptr, unsigned ldm, layout_t layout);
void store_matrix_sync(T* mptr, const fragment<...> &a, unsigned ldm, layout_t layout);
void fill_fragment(fragment<...> &a, const T& v);
void mma_sync(fragment<...> &d, const fragment<...> &a, const fragment<...> &b, const fragment<...> &c, bool satf=false);
```

- `fragment`: the Tensor Core data storage class; supports `matrix_a`, `matrix_b`, and `accumulator`.
- `load_matrix_sync`: the data-loading API; loads matrix data from global or shared memory into a fragment.
- `store_matrix_sync`: the result-storing API; stores results from a fragment to global or shared memory.
- `fill_fragment`: the fragment-filling API; fills a fragment with a constant value.
- `mma_sync`: the matrix-multiply API; computes D = AB + C or C = AB + C.

![image.png](https://s2.loli.net/2024/01/31/Q9vpRD7JNrejYE6.png)

The programming of Tensor Cores has also evolved over the generations. Figure 2 illustrates the legacy and current programming interfaces for Tensor Cores. The first step is to load data from shared memory to the register file and then use Tensor Cores to accelerate the matrix computation. The legacy wmma interface was introduced with the first-generation Volta Tensor Cores. In general, the legacy wmma programming interface can satisfy basic usage of Tensor Cores and requires less programming effort, because wmma.load helps manage the special input operand storage layout in the register file for Tensor Cores.

- However, wmma instructions can only leverage limited features of the Tensor Cores.
- Specifically, they **cannot use sparse acceleration and have fewer choices of matrix operand shape**.
- Furthermore, wmma.load instructions have stricter requirements on the data layout in shared memory.

By contrast, the new programming interface provides access to all features of the Tensor Cores (i.e. more matrix operand shapes and the novel sparse acceleration). The vendor's library CUTLASS also chose the new programming interface as its underlying implementation for the best possible performance. Therefore, users should use the new programming interface when seeking the best possible performance or exploiting the new sparse acceleration feature.

#### Differences between legacy wmma.mma and current mma PTX instructions

![image.png](https://s2.loli.net/2024/01/31/ZqaoXRtnW7duTyx.png)
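Putting the API declarations and the five-step flow above together, the sketch below shows a minimal WMMA kernel for a single 16×16×16 tile. It is an illustration rather than a tuned GEMM, and it assumes A is row-major FP16, B is column-major FP16, C and D are row-major FP32, and a one-warp launch.

```cpp
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes D = A * B + C for a single 16x16x16 tile.
__global__ void wmma_16x16x16(const half *A, const half *B,
                              const float *C, float *D) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    // Steps 2-3: load the tile fragments into the warp's registers.
    wmma::load_matrix_sync(a_frag, A, 16);
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::load_matrix_sync(acc_frag, C, 16, wmma::mem_row_major);

    // Step 4: the Tensor Cores perform the matrix multiply-accumulate.
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);

    // Step 5: write the result tile back to memory.
    wmma::store_matrix_sync(D, acc_frag, 16, wmma::mem_row_major);
}

// Host-side launch with exactly one warp, e.g.:
//   wmma_16x16x16<<<1, 32>>>(dA, dB, dC, dD);
```

Building this requires a GPU and compiler target with Tensor Cores, e.g. `nvcc -arch=sm_70` or newer.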
## MMA, FMA, GEMM

MMA (Matrix Multiply and Accumulate) and FMA (Fused Multiply-Add) are two types of arithmetic operations that modern processors optimize to perform computations efficiently, but they operate on different scales and are used in different contexts.

| Feature/Operation | Fused Multiply-Add (FMA) | Matrix Multiply and Accumulate (MMA) | General Matrix Multiply (GEMM) |
|-------------------|--------------------------|--------------------------------------|--------------------------------|
| **Scale** | Individual floating-point numbers | Matrices (e.g., 4×4 matrices) | Matrices of any size |
| **Operation** | Performs a single multiply-add: <br>result = a × b + c | Matrix multiplication followed by addition to an accumulator matrix: <br>D = A × B + C | Matrix-matrix multiplication followed by matrix addition: <br>C = $\alpha$ A × B + $\beta$ C |
| **Purpose** | Reduces rounding errors and increases performance for high-precision computations | Accelerates machine learning and graphics by optimizing matrix operations | Generalized operation for linear algebra computations across various applications |
| **Hardware** | Supported by most modern CPUs and GPUs, executed in a single instruction cycle | Executed on specialized hardware like NVIDIA's Tensor Cores | Software-implemented but optimized for hardware capabilities (e.g., using SIMD instructions) |
| **Application** | Widely used in numerical computations; part of CPU instruction sets like FMA3 or FMA4 | Used in specialized domains, particularly deep learning on GPUs | Core function in BLAS; crucial for high-performance computing, scientific computing, and deep learning |

### Tensor Core vs. CUDA Core

- Tensor Cores and CUDA Cores are both hardware execution units; the main differences are compute throughput and target workload.
- Workload: CUDA Cores are general-purpose floating-point units, while Tensor Cores are designed specifically for deep-learning matrix operations.
- Throughput: on matrix operations, Tensor Cores vastly outperform CUDA Cores.

| Feature | Tensor Core | CUDA Core |
| ---- | ---- | ---- |
| **Background** | Rapid growth of AI kept increasing the demand for matrix-multiplication compute; vendors proposed accelerators such as the TPU | Rapid growth of scientific computing; to tap the GPU's compute power, scientists had to recast scientific workloads as graphics tasks |
| **Design goal** | Strengthen matrix-math (AI) compute and integrate it into the CUDA ecosystem | Offer highly parallel floating-point compute to scientific computing, accelerating it and capturing that market |
| **Workload type** | Matrix multiplication | Floating-point add, multiply, and multiply-accumulate |
| **Precision** | Emphasizes low precision: FP16/FP8/INT8/INT4/B1 | IEEE-754 single-precision float |
| **Units per SM** | 8 | 128 |
| **Scheduling / execution granularity** | warp-level / warp-level | warp-level / thread-level |
| **Work per GPU cycle** | Multiplies two 4×4 FP16 matrices and adds the product to an FP32 matrix | One single-precision multiply-accumulate operation |
| **Throughput** | 125 TFLOPS | 15.7 TFLOPS |

## Lessons of the Tensor Core

### Thread-Level Parallelism (TLP) for MMA Execution

- **TLP in action**: Tensor Cores demonstrate thread-level parallelism by dividing matrix operations across multiple threads within a warp. This form of parallelism can be leveraged in other contexts where tasks can be decomposed into smaller, independent sub-tasks.

### Special Functional Units for DP Calculation

- **Dedicated hardware for specific tasks**: The specialized nature of the Tensor Core's dot-product (DP) units and mixed-precision datapaths shows the benefit of having dedicated hardware for specific types of computation. This can lead to both increased speed and efficiency.

### Data Reuse

![image.png](https://s2.loli.net/2024/01/31/kWEzAGCeoF2Odbu.png)

- **Optimizing data access**: By reusing data effectively (such as keeping a tile in shared/local memory for as long as possible), we can reduce the number of slow memory accesses. This principle of data locality and reuse is essential for optimizing performance in any system; a classic shared-memory tiling sketch follows.
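As a sketch of that principle on ordinary CUDA cores (our own illustration, not from the cited slides; it assumes square row-major matrices whose dimension `N` is a multiple of the 16×16 tile), a shared-memory tiled GEMM stages each tile once and then reuses every staged element 16 times before returning to global memory.

```cpp
// Shared-memory tiled GEMM: C = A * B for N x N row-major matrices.
// Launch with dim3 block(16, 16) and dim3 grid(N / 16, N / 16); N must be a
// multiple of 16.
#define TILE 16

__global__ void tiled_gemm(const float *A, const float *B, float *C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < N / TILE; ++t) {
        // Each thread stages one element of the current A and B tiles.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();

        // Consume the staged tiles from fast shared memory; each staged
        // element is read 16 times before the next global-memory load.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = acc;
}
```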
### ISA Support

- **Instruction Set Architecture (ISA) extensions**: Tensor Cores require support from a specialized ISA (like WMMA in CUDA) provided by the compiler. This demonstrates the importance of ISA extensions in unlocking new hardware capabilities and optimizing performance.

## References

- Course slides: Accelerator Architectures for Machine Learning, Lecture 8: Tensor Core, Tsung Tai Yeh.
- Maohua Zhu, Tao Zhang, Zhenyu Gu, and Yuan Xie. 2019. Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '52). Association for Computing Machinery, New York, NY, USA, 359–371. https://doi.org/10.1145/3352460.3358269
- Sun, W., Li, A., Geng, T., Stuijk, S., & Corporaal, H. (2022). Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors. IEEE Transactions on Parallel and Distributed Systems, 34(1), 246–261.
- [Programming Tensor Cores in CUDA 9 | NVIDIA Technical Blog](https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/)
