# ICML 2024 BiMamba Rebuttals

## Response to All Reviewers

We would like to express our sincere gratitude to the reviewers for their valuable feedback and constructive suggestions. We are glad that the reviewers found our novel *Matrix Mixer* view of sequence modeling interesting and effective, and that they found the paper to be well written and easy to follow. In this shared response, we address some of the common concerns and provide new empirical results and analyses:

1. **GLUE Performance:** Addressing the concerns raised by reviewers zzad and PoE8 about our model's GLUE performance, we gently point out that the results in the submission **exceeded the Transformer baseline without any hyperparameter tuning whatsoever**. We have since substantially improved our results, **now leading BERT by 1.1 points and M2 by 3.4 points** through minimal tuning of the finetuning hyperparameters, all while using less compute than the baselines. We further highlight that our model **excels across domains, achieving a top-1 ImageNet accuracy of 81% vs. ViT's 78.8%.**
2. **Limitations:** The reviewers have rightfully noted the need for a discussion of the limitations of our framework. We agree that this is an important discussion, rich in the nuances of the **representation-computation tradeoff and hardware-efficiency concerns**. We delve into these nuances below.
3. **Reproducibility:** We provide step-by-step instructions to pretrain the Bi-Mamba model and evaluate it on all GLUE tasks. We will open-source the model code and instructions to reproduce all main results from the submission.

<!-- different inductive biases -->

### Performance: ImageNet and GLUE

Before we present our improved GLUE results, we deem it important to highlight that the paper's scope extends beyond language tasks alone; **our proposed framework and method are broadly applicable to all sequence modeling tasks**. Notably, Bi-Mamba achieves substantially better results on ImageNet than prior methods using the standard recipe [1], **achieving a top-1 accuracy of 81% vs. ViT's 78.8%**, demonstrating our method's efficacy.

For the rest of this section, we expand on new results on the GLUE benchmark. We recognize the concerns from reviewers zzad and PoE8 regarding Bi-Mamba's model size and performance on the GLUE benchmark. In light of their feedback, we have made significant improvements to our results, **outperforming BERT by 1.1 points, a considerable jump from our previous 0.4-point lead**. To achieve this, we made the following adjustments:

- **Fine-tuning strategy**: We want to emphasize that the well-established BERT and M2 models benefit from highly optimized training and finetuning recipes. **The GLUE scores in our submission were produced using the M2 recipe out of the box, without any hyperparameter tuning**, despite which Bi-Mamba outperformed both BERT and M2. To improve our results, **we only run a short sweep over the learning rate and the number of finetuning epochs for each task**. For fairness, we ensure that the number of epochs never exceeds the original values. Please refer to Table 1 for the finetuning recipes used by BERT, M2, and Bi-Mamba.
- **Parameter-matching BERT and Bi-Mamba**: We reduce the number of layers of Bi-Mamba from $24$ to $23$ (n_param: 116M &rarr; 112M) to parameter-match the model to BERT (110M).

**Table 1:** The finetuning hyperparameters (learning rate and number of epochs) for each task. Differences from the BERT finetuning recipe are marked in **bold**.
| Model | MNLI | QNLI | QQP | RTE | SST2 | MRPC | COLA | STS |
|-------|------|------|-----|-----|------|------|------|-----|
| BERT [2] | lr=5e-5, wd=5e-6, epochs=3, seq_len=256 | lr=1e-5, wd=1e-6, epochs=10, seq_len=256 | lr=3e-5, wd=3e-6, epochs=5, seq_len=256 | lr=1e-5, wd=1e-6, epochs=3, seq_len=256 | lr=3e-5, wd=3e-6, epochs=3, seq_len=256 | lr=8e-5, wd=8e-6, epochs=10, seq_len=256 | lr=5e-5, wd=5e-6, epochs=10, seq_len=256 | lr=3e-5, wd=3e-6, epochs=10, seq_len=256 |
| M2 | lr=5e-5, wd=5e-6, epochs=3, seq_len=**128** | lr=**5e-5**, wd=1e-6, epochs=10, seq_len=**128**, **pool_all=True**^1 | lr=3e-5, wd=**1e-2**, epochs=**10**, seq_len=**128** | lr=1e-5, wd=**1e-2**, epochs=**6**, seq_len=**128** | lr=3e-5, wd=3e-6, epochs=3, seq_len=**128** | lr=**5e-5**, wd=**1e-2**, epochs=10, seq_len=**128** | lr=5e-5, wd=5e-6, epochs=10, seq_len=**128** | lr=**7e-5**, wd=**1e-2**, epochs=10, seq_len=**128** |
| Bi-Mamba | lr=**1e-4**, wd=5e-6, epochs=**2**, seq_len=256 | lr=**5e-5**, wd=1e-6, epochs=**7**, seq_len=256 | lr=**5e-5**, wd=3e-6, epochs=**3**, seq_len=256 | lr=1e-5, wd=1e-6, epochs=3, seq_len=256 | lr=**5e-5**, wd=3e-6, epochs=**2**, seq_len=256 | lr=8e-5, wd=8e-6, epochs=10, seq_len=256 | lr=**1e-4**, wd=5e-6, epochs=10, seq_len=256 | lr=3e-5, wd=3e-6, epochs=10, seq_len=256 |

^1: Global average pooling over input tokens, instead of appending a [CLS] token.

Our updated GLUE scores for Bi-Mamba (112M) are listed in Table 2. We note that **Bi-Mamba surpasses the heavily tuned scores of BERT and M2 across all tasks; Bi-Mamba achieves a 0.4-point lead on MNLI and an overall lead of 1.1 GLUE points over BERT.**

**Table 2:** *The updated GLUE scores for Bi-Mamba. The reported numbers are averages over five runs.*

| Model | #Params | MNLI | QNLI | QQP | RTE | SST2 | MRPC | COLA | STS | AVG |
|-------|---------|------|------|-----|-----|------|------|------|-----|-----|
| BERT | 110M | 84.1 | 89.8 | 91.2 | 77.2 | 91.2 | 87.5 | 54.6 | 88.9 | 83.2 |
| M2 | 116M | 80.5 | 86.0 | 87.0 | 69.3 | 92.3 | 89.2 | 56.0 | 86.9 | 80.9 |
| Bi-Mamba (w/ M2 recipe) | 116M | 83.7 | 89.7 | 89.7 | 77.4 | 92.8 | **91.5** | 54.7 | **90.1** | 83.7 |
| Bi-Mamba | 112M | **84.5** (*Δ=+0.8*) | **90.0** (*Δ=+0.3*) | **91.3** (*Δ=+1.6*) | **77.5** (*Δ=+0.1*) | **93.5** (*Δ=+0.7*) | 91.2 (*Δ=-0.3*) | **57.2** (*Δ=+2.5*) | 88.9 (*Δ=-1.2*) | **84.3** (*Δ=+0.6*) |

[1] *A ConvNet for the 2020s. Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.*
[2] *MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining. Jacob Portes, Alexander Trott, Sam Havens, Daniel King, Abhinav Venigalla, Moin Nadeem, Nikhil Sardana, Daya Khudia, Jonathan Frankle.*

### Limitations and Discussion of Our Method

We appreciate the reviewers bringing up the absence of a discussion of the limitations of our framework and method. We recognize its importance, and we now delve into the nuances of some of the trade-offs and concerns involved:

**1. Representation-Computation Tradeoff:** While **structured matrix mixers are computationally more efficient** than their dense matrix mixer counterparts like softmax attention, **they are also representationally less expressive**, which may be seen as a limitation of these methods.

For instance, concurrent works [3,4,5] have begun investigating the representational power of SSMs by analyzing their performance on memorization-centric tasks. They report that SSMs with a fixed model capacity are eventually outperformed by softmax attention on longer sequences. This can be viewed as a consequence of the matrix being *too structured*, and hence *too inexpressive*, for the problem.

On the other hand, we remark that the *degree of structure* of a structured matrix is a knob that can be tuned to the specific task; that is, we can trade the **computational efficiency of the method for greater expressivity**. For instance, within the structured matrix class of low-rank matrices, we can tune the rank up to the size of the matrix, which is the sequence length. As the rank of the matrix class increases, so does its expressive power; at the same time, however, this diminishes the compute efficiency associated with the matrix being low rank (see the sketch below).
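To make this knob concrete, the following minimal sketch (our illustration in PyTorch, not the paper's implementation) shows a rank-$r$ low-rank matrix mixer: mixing a length-$L$, dimension-$d$ sequence through $M = UV^\top$ costs $O(Lrd)$ instead of the $O(L^2d)$ of a dense mixer, and raising $r$ toward $L$ recovers full expressivity at full cost.

```python
import torch
import torch.nn as nn

class LowRankMixer(nn.Module):
    """Illustrative low-rank matrix mixer: y = (U @ V^T) x.

    The rank r is the expressivity/compute knob: small r gives a cheap
    O(L * r * d) mixer, while r = seq_len can express any dense L x L
    mixing matrix. Hypothetical sketch, not the Bi-Mamba implementation.
    """

    def __init__(self, seq_len: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(seq_len, rank) / rank**0.5)
        self.V = nn.Parameter(torch.randn(seq_len, rank) / rank**0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim). Never materialize the L x L matrix:
        # first contract the sequence axis down to r components ...
        z = torch.einsum("lr,bld->brd", self.V, x)      # O(L * r * d)
        # ... then expand back to L output positions.
        return torch.einsum("lr,brd->bld", self.U, z)   # O(L * r * d)

# rank=16 is cheap but weakly expressive; rank=1024 (= seq_len) matches
# the cost and expressivity of a dense matrix mixer.
y = LowRankMixer(seq_len=1024, rank=16)(torch.randn(2, 1024, 64))
```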
As another example, in response to Reviewer PoE8's second question on the performance of SSMs on retrieval-style tasks, **we demonstrate this tradeoff for SSD**, the modern variant of Mamba. Specifically, we show that **SSD recovers the accuracy attained by softmax attention once we control for the compute capacity** of the model. In contrast, hardware limitations of the selective-scan algorithm make it impractical to match the compute capacity in Mamba, explaining the emerging findings from [3,4,5] that SSMs underperform on memorization-centric tasks. This makes it evident that the development and analysis of SSMs is an active area of research with substantial room for exploration and improvement.

**2. Hardware Efficiency:** Even though structured matrices come with sub-quadratic matrix multiplication algorithms, the implementations of these algorithms **may not be *hardware-friendly***, which can reduce execution speed in practice.

In the next revision of the paper, we will include a comprehensive discussion of the limitations associated with structured matrices.

[3] *Zoology: Measuring and Improving Recall in Efficient Language Models. Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré.*
[4] *Simple Linear Attention Language Models Balance the Recall-Throughput Tradeoff. Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré.*
[5] *Repeat After Me: Transformers Are Better Than State Space Models at Copying. Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach.*

-----------------------------------------------------------------------------

### Reproducibility of results

Recognizing the importance of reproducibility, we are pleased to share the source code for Bi-Mamba at this [URL](https://gist.github.com/anonymous-icml-10127/c45d3f762d2a27299ac2340fde5cc713). After following the preparation steps described in the [M2 repository](https://github.com/HazyResearch/m2) and placing the provided code appropriately, the results in Table 2 can be reproduced with the following commands.
```
# Pretrain on C4
composer -n 8 main.py BiMamba-24layers-116M-C4.yaml

# Finetune from C4-pretrained weights
python glue.py BiMamba-24layers-116M-GLUE.yaml

# Print random seeds
ls ./local-finetune-checkpoints/BiMamba-24layers-116M/task=mnli/ | grep -oP 'seed=\K\d+'

# Finetune from MNLI-pretrained weights
# Insert a seed number into {SEED} printed by the above command
python glue.py BiMamba-24layers-116M-GLUE.yaml from_mnli=True \
    base_run_name="BiMamba-24layers-116M-from-mnli" \
    local_pretrain_checkpoint_folder="./local-finetune-checkpoints/BiMamba-24layers-116M/task=mnli/seed={SEED}"
```

We are finalizing preparations for the public release of the code to further facilitate research transparency. Note that the provided code uses PyTorch exclusively, for ease of understanding; in the public release, the `mamba_chunk_scan_fused` function will be replaced with a Triton-based alternative that significantly improves training speed. We welcome any further requests for clarification or additional information regarding our training procedures.

<!-- [tell them why we have not released full triton code?] We have not provided this version in this response because it is from the "State Space Duality" submission from the supplemental that our paper is based on -->

------------------------------------------

## Response to Reviewer 1

We are pleased by the reviewer's recognition of our work's contributions, particularly how the Matrix Mixer view offers valuable insights into the performance of Transformers and the latest SSMs. We now turn to addressing their questions and concerns.

> 1. The reproducibility of results is not clear.

Addressing the reviewer's concerns about reproducibility, we provide step-by-step instructions for replicating the pretraining and finetuning results of Bi-Mamba, as outlined in the common response.

----------------

> 2. the proposing of bidirectional mamba is somehow straightforward

We agree with the reviewer that using bidirectional sequence models is quite common in the literature. We further note that many approaches have been developed for incorporating bidirectionality into recurrent models [1]. However, **these approaches treat the causal sequence mixer as a black box and rely on heuristics such as addition, concatenation, and the Hadamard product to devise bidirectional encoders** (see the sketch after the references below). From the various heuristic extensions of Mamba to bidirectional settings developed by academics [2,3,4], and from the GitHub [issues](https://github.com/state-spaces/mamba/issues/99) and [pull requests](https://github.com/state-spaces/mamba/pull/52) raised by practitioners, it is clear **that this is not a settled problem in the machine learning community**.

In our work, we approach this problem **under the more natural paradigm of structured sequence mixers**. This premise allows us to narrow down an architecture search space that would otherwise be very broad. We also show that various existing, performant sequence mixers can be subsumed under this framework, lending it further credence. Under this framework, we then **arrive, in a principled manner, at Quasiseparable matrices** as the matrix mixer of choice for Bi-Mamba. Through comprehensive ablations (see Table 4 and Figure 3 in the paper), **we demonstrate that the theoretically motivated structured matrix approach outperforms the other, heuristically motivated bidirectional methods.**

[1] *Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition. Alex Graves, Santiago Fernández, Jürgen Schmidhuber.*
[2] *Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang.*
[3] *VMamba: Visual State Space Model. Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, Yunfan Liu.*
[4] *Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling. Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov.*
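To illustrate the black-box heuristics above, here is a minimal sketch (our illustration; `fwd_mixer` and `bwd_mixer` are placeholders for any causal sequence mixer such as a Mamba block, and this is not Bi-Mamba's quasiseparable construction): the input is mixed left-to-right and, after time reversal, right-to-left, and the two streams are fused by addition, Hadamard product, or concatenation.

```python
import torch
import torch.nn as nn

def heuristic_bidirectional(fwd_mixer: nn.Module, bwd_mixer: nn.Module,
                            x: torch.Tensor, combine: str = "add") -> torch.Tensor:
    """Black-box bidirectional wrapper around two causal mixers.

    x: (batch, seq_len, dim). Illustrative sketch of the heuristics
    discussed above, not the quasiseparable mixer used by Bi-Mamba.
    """
    h_fwd = fwd_mixer(x)                          # left-to-right pass
    h_bwd = bwd_mixer(x.flip(1)).flip(1)          # right-to-left pass, re-aligned
    if combine == "add":                          # heuristic 1: addition
        return h_fwd + h_bwd
    if combine == "hadamard":                     # heuristic 2: elementwise product
        return h_fwd * h_bwd
    if combine == "concat":                       # heuristic 3: concatenation
        return torch.cat([h_fwd, h_bwd], dim=-1)  # doubles the channel dimension
    raise ValueError(f"unknown combine: {combine}")

# usage with a trivial stand-in mixer (identity), purely for illustration:
y = heuristic_bidirectional(nn.Identity(), nn.Identity(),
                            torch.randn(2, 16, 64), combine="add")
```

Note that each fusion choice carries its own post-hoc decisions (e.g., concatenation doubles the channel dimension and typically requires an extra projection); this open-ended design space is precisely what the structured-matrix view lets us avoid.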
> 3. Although using more parameters, Bi-Mamba does not show clear improvements over previous models.

We understand the reviewer's concern about the model size and the performance of Bi-Mamba on GLUE tasks. We have now significantly improved our results: **Bi-Mamba achieves a lead of 0.4 points on MNLI and an average lead of 1.1 points across all tasks compared to the heavily tuned BERT results**. Furthermore, we have **parameter-matched Bi-Mamba** (116M &rarr; 112M) to BERT (110M). To achieve these improvements, unlike our initial approach of using the M2 recipe out of the box, we perform **a minimal sweep over learning rates and number of epochs for the finetuning tasks**. For complete details, we invite the reviewer to refer to our shared response.

> 4. Limitations of structured matrices.

We are glad that the reviewer raised this question, and we kindly request the reviewer to see the shared response for a detailed discussion of the **representation-computation tradeoff and the hardware concerns** associated with structured matrices.

------------------------------------------

## Response to Reviewer 2

We sincerely appreciate the reviewer's acknowledgement of the contributions of our work, particularly the insights gained from the Matrix Mixer perspective and the positive remarks on our development of Bi-Mamba, which demonstrates notable performance. We are pleased to address the reviewer's concern.

> It could not be possible to encompass all algorithms via the Matrix Mixer view.

We acknowledge that the vast landscape of potential sequence-mixing algorithms means that some may not fit within the Matrix Mixer framework. Nevertheless, the Matrix Mixer perspective remains a valuable framework for developing innovative algorithms. Our paper introduces new matrix mixers that achieve commendable accuracy, as shown in Table 5. We have conducted more experiments and evaluated additional methods, the details of which will be incorporated in the next revision.

------------------------------------------

## Response to Reviewer 3

We are grateful for the reviewer's recognition of *Matrix Mixers* as an innovative and effective approach to sequence modeling. We are also glad that the reviewer found our validation of Bi-Mamba across multiple tasks and domains extensive. We are eager to address the concerns and questions below.

> 1. It seems hard to fill the 0.4 MNLI gap, even though bi-mamba uses larger model size.

We understand the reviewer's concern about the model size and the performance of Bi-Mamba on GLUE tasks. We have now significantly improved our results: **Bi-Mamba achieves a lead of 0.4 points on MNLI and an average lead of 1.1 points across all tasks compared to the heavily tuned BERT results**. Furthermore, we have **parameter-matched Bi-Mamba** (116M &rarr; 112M) to BERT (110M). To achieve these improvements, unlike our initial approach of using the M2 recipe out of the box, we perform **a minimal sweep over learning rates and number of epochs for the finetuning tasks**.
For complete details, we invite the reviewer to refer to our shared response.

----------------

> 2. SSMs have critical weaknesses in retrieval-like tasks. The related tasks can be evaluated and reported in the paper.

We appreciate the reviewer's excellent question on the performance of Bi-Mamba, and more broadly SSMs, on retrieval-like tasks. They have correctly noted that **concurrent works** like Zoology [1] and Repeat After Me [2] report that SSM models like Mamba tend to underperform on **memorization-centric** tasks like Associative Recall (AR). However, we would like to highlight **that these experiments were performed on older variants of SSMs, and their test design does not fully demonstrate the capabilities of SSMs.** Specifically, prior works **did not control for the effective state (memorization) capacity of the models** when comparing these methods.

In a nutshell, **we can view SSMs as having a "state size" knob that controls model capacity**: they become equivalent to attention in capacity and compute cost when the state size equals the sequence length.^1 Since older variants of SSMs like Mamba were practically infeasible to run with larger state sizes, previous works did not fully test the representational capabilities of SSMs. We now empirically validate our hypothesis on the **latest iteration of SSMs, SSD, whose algorithmic improvements allow it to run with much larger state sizes**. Furthermore, we note that our model **Bi-Mamba is based on SSD, which allows it to enjoy the same retrieval capabilities as SSD.**

^1: This simplified explanation assumes that the number of layers and the model dimension are the same.

**Experimental Setting:** We test on the Multi-Query Associative Recall (MQAR) synthetic benchmark introduced in [1] (a simplified sketch of the task follows this answer). For our experiment, we chose the sequence length and the number of kv-pairs $(l,d)=(1024, 256)$; this setting is more difficult than the most challenging scenario, $(l,d)=(512,128)$, reported in [1]. Other hyperparameters remain unchanged from those used in [1].

| Model | Model Dimension | State Size | Number of Layers | Capacity w.r.t. Attention | Accuracy |
|-----------|-----------------|------------|------------------|---------------------------|----------|
| Attention | 64 | N/A | 2 | 1x | 1.00 |
| Mamba | 64 | 16 | 2 | 1/32x | 0.00 |
| SSD | 64 | 64 | 4 | 1/8x | 0.44 |
| SSD | 64 | 128 | 8 | 1/2x | 0.93 |
| SSD | 64 | 256 | 4 | 1x | 0.99 |

We observe that, due to its very low capacity, **Mamba completely fails to learn the task**. On the other hand, **as we match the capacity of the SSD model to that of softmax attention, we recover its strong performance**, thus validating our hypothesis.

[1] *Zoology: Measuring and Improving Recall in Efficient Language Models. Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré.*
[2] *Repeat After Me: Transformers Are Better Than State Space Models at Copying. Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach.*
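To make the task concrete, below is a simplified, hypothetical sketch of MQAR-style data generation (our illustration; the actual generator accompanying [1] differs in details such as vocabulary layout, query placement, and padding): the context lists key-value pairs, and the model must emit the value associated with each subsequently queried key.

```python
import torch

def make_mqar_batch(batch: int, num_pairs: int, vocab: int = 8192,
                    seed: int = 0) -> tuple[torch.Tensor, torch.Tensor]:
    """Simplified MQAR-style batch: [k1 v1 k2 v2 ... | q1 q2 ...].

    Illustrative sketch only; the real benchmark generator of [1]
    differs in details (vocabulary split, padding, query placement).
    """
    g = torch.Generator().manual_seed(seed)
    # distinct keys per example, with arbitrary values attached to them
    keys = torch.stack([torch.randperm(vocab, generator=g)[:num_pairs]
                        for _ in range(batch)])                        # (B, d)
    values = torch.randint(0, vocab, (batch, num_pairs), generator=g)  # (B, d)
    # context: interleave keys and values -> k1 v1 k2 v2 ..., length 2d
    context = torch.stack([keys, values], dim=-1).reshape(batch, -1)
    # queries: the same keys in shuffled order; targets: their values
    perm = torch.randperm(num_pairs, generator=g)
    inputs = torch.cat([context, keys[:, perm]], dim=1)  # (B, 3d) token ids
    return inputs, values[:, perm]  # predict the target at each query position

# e.g. d=256 kv-pairs yields 768 tokens before padding out to l=1024
x, y = make_mqar_batch(batch=4, num_pairs=256)
```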
-------------------

<!-- > 3. The advantages of the proposed method can be better highlighted. We thank the reviewer's feedback. We will revise our paper to more effectively underscore the advantages and contributions of our method. (@June I love this answer, wish all of them could be this!) -->

-----------------------

> 3. What are the numeric issues of the proposed model? Mamba model often needs fp32 for stable training.

We appreciate the reviewer's concern regarding the stability of training with mixed precision. However, since we use the newer variant of Mamba (SSD), **we were able to successfully train and evaluate all the structured matrix variants, including Bi-Mamba, in bfloat16**. Throughout the experiments, we encountered no instances of instability attributable to the use of low-precision floating-point numbers.

--------------------------

> 4. No scaling curves provided.

We understand and value the reviewer's request for scaling curves, which would give a more thorough picture of how Bi-Mamba's performance varies with compute, dataset size, and model size. However, since producing these plots requires substantial computational resources, it is currently beyond the scope of our academic compute budget.

--------------------------

> 5. Limitations of structured matrices.

We are glad that the reviewer raised this question, and we kindly request the reviewer to see the shared response for a detailed discussion of the **representation-computation tradeoff and the hardware concerns** associated with structured matrices.
