# Rebuttal

## General Response

We are grateful for the reviewers' efforts and the recognition of our contributions:

**Novel Research Topic:** The paper explores the unique impact of vocabulary size on language model performance, an aspect often overlooked in LLM research. [EsaU, Y8zo, zt86]

**Analyses:** We provide in-depth analyses of why an optimal vocabulary size exists for a given FLOPs budget and why the optimal vocabulary size increases with more FLOPs, with theoretical analyses in Appendix A.1 and demonstration experiments in Sec. 3 and Appendix A.2. [EsaU, zt86]

**Experiments:** The paper includes two effective methods (IsoFLOPs and a derivative-based approach) to predict the optimal vocabulary setting. Extensive experiments on **1200 pre-trained models** (6 non-vocabulary parameter settings x 10 vocabulary size settings x 20 training data settings) are conducted. [EsaU, zt86]

**Applications:** The study's findings offer two practical tools for optimizing the compute allocation of LLM training while taking the vocabulary size into consideration. [EsaU, Y8zo, zt86]

In response to the reviewers' feedback, we have performed additional analyses and experiments to address the raised concerns. Below, we summarize our responses and the improvements made to our paper.

**New Analyses and Experiments**

- We supplement the paper with a new perspective on parameter growth to demonstrate that the vocabulary parameters need to be considered separately from the total parameters, and that larger models deserve larger vocabularies.
- We evaluate the 3B pre-trained models with different vocabulary sizes on 5 more downstream tasks. The model that uses our suggested vocabulary size outperforms its counterpart by a clear margin.
- We compare the models with the same training tokens instead of the same training FLOPs, as requested.

All of the suggestions will be incorporated in our polished version.

<br><br>

## Reviewer 1 (Score 7)

### W1: This paper conducted experiments on language models of various parameter sizes, but the largest model tested was only 3 billion parameters.

Answer: We acknowledge the importance of evaluating our approach on larger models to establish its scalability. Increasing the model size necessitates pre-training on a larger corpus, which in turn demands more computational resources. For instance, conducting pre-training experiments on 7B models would require an immense computational budget exceeding 10^22 FLOPs, translating to approximately 6 weeks of training time on a cluster with 64 A100 GPUs. Such a substantial level of computational resources is currently beyond our reach during the rebuttal period. Despite our desire to explore larger model sizes, we are constrained by the practical limitations of our available resources.
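For reference, here is a minimal back-of-envelope sketch behind this kind of estimate, using the common FLOPs ≈ 6·N·D approximation; the token count, per-GPU peak throughput, and utilization below are our own illustrative assumptions rather than numbers from the paper:

```python
# Rough cost estimate for 7B pre-training (all concrete numbers are assumptions).
N = 7e9              # model parameters
D = 280e9            # training tokens (assumed); more tokens only increase the cost
flops = 6 * N * D    # common approximation: ~6 FLOPs per parameter per token
print(f"Total training FLOPs: {flops:.2e}")           # ~1.2e22, i.e. exceeding 1e22

peak_per_gpu = 312e12    # A100 BF16 peak FLOPS
utilization = 0.15       # assumed effective utilization, incl. data/communication overheads
cluster_throughput = 64 * peak_per_gpu * utilization
days = flops / cluster_throughput / 86400
print(f"Estimated wall-clock time: {days:.0f} days")  # on the order of 6 weeks
```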
Nonetheless, the significance of a scaling law lies in investigating it through experiments at a relatively small scale so that computational resources can be allocated reasonably when training a large model, avoiding wasted compute. Our experiments with 1200 pre-trained models have demonstrated the existence of an optimal vocabulary size under FLOPs constraints, and the predictions from the theoretical analysis (derivative-based approach) and the experimental fitting (IsoFLOPs-based approach) agree with each other. In fact, we have also made predictions for the pre-training of a 300B model, and the two approaches align well, so we believe our conclusions should hold for larger models.

Furthermore, the change in vocabulary size from 32K in Llama-2 to 128K in Llama-3, which resulted in performance improvements, can also be seen as a verification of our conclusion that the vocabulary size should increase when a larger computational budget is available.

In conclusion, we appreciate your suggestions and we will try our best to conduct more experiments on larger models. We hope you can understand our computational limitations. We will also discuss this in the revised version.

<br><br>

## Reviewer 2 (Score 5)

### W1: Lacks performance on large-scale models, such as whether increasing the vocabulary size to a greater extent performs better than existing models in the market. Table 2's experiments look a little bit limited.

Answer: Thanks to the reviewer for bringing up this concern; we address it with more experiments and analyses. Increasing the vocabulary size can be done in two ways: 1) train a model with a larger vocabulary size from scratch; 2) expand the existing model with continual pre-training.

1) **Train from scratch**

As shown in our Table 2, our prediction enables a better model by only adjusting the vocabulary size under different FLOPs budgets. For example, we improve performance on ARC-Challenge from 29.1 to 31.5 with the same 2.3e21 FLOPs. We add results on new downstream tasks in the table below. The model using our suggested vocabulary size consistently outperforms its counterpart by a clear margin.

| Tasks | MMLU | CommonsenseQA | CoQA | TruthfulQA | Lambada | Lambada |
|-------|------|---------------|------|------------|---------|---------|
| Metric | Normalized Accuracy | Normalized Accuracy | Exact Match | BLEU | Normalized Accuracy | Perplexity |
| $V$=32K | 25.02±0.37 | 20.15±1.15 | 32.32±1.95 | 30.35±1.61 | 43.04±0.69 | 15.57±0.48 |
| $V^{opt}$=43K | **25.46**±0.37 | **20.97**±1.10 | **37.43**±1.99 | **31.33**±1.62 | **44.91**±0.69 | **13.87**±0.40 |

> Qian: We can just remove the perplexity and leave the normalized accuracy here for Lambada? Since all the other metrics are higher-is-better.

Descriptions of the newly added tasks:

- **MMLU**: Massive Multitask Language Understanding benchmark for broad-domain language evaluation.
- **CommonsenseQA**: A multiple-choice QA dataset for measuring commonsense knowledge.
- **CoQA**: Conversational question answering tasks to test dialog understanding.
- **TruthfulQA**: A QA task aimed at evaluating the truthfulness and factual accuracy of model responses.
- **Lambada**: Tasks designed to predict the endings of text passages, testing language prediction skills.

2) **Continual pre-training**

Continual pre-training with vocabulary expansion is a good topic for discussion, but it involves several non-trivial challenges that are not strongly related to the main contributions of this paper, i.e., exploring the optimal compute allocation with consideration of the vocabulary size. Therefore, we will discuss it in our polished version and leave it as important future work. The challenges we will discuss are:

- Expanding the vocabulary necessitates changes in the tokenization process, which can lead to inconsistencies in token segmentation.
- Ensuring that the new embeddings are compatible and integrate effectively with the pre-trained embeddings is non-trivial.
- Catastrophic forgetting of the old word embeddings when learning the new word embeddings.

> It is a little bit tricky to discuss continual pre-training in parallel with the from-scratch pre-training here.
> Qian Suggestion: Thank you for raising this concern. We share the intention to compete with existing powerful models in the market, such as Llama-2-7B. However, training a 7B model on 2 trillion tokens from scratch is far beyond our current computational resources. Nevertheless, we have evaluated our 3B models on more benchmarks to alleviate your concern. Specifically, we have added new experimental results on the following benchmarks:
>
> - **MMLU**: Massive Multitask Language Understanding benchmark for broad-domain language evaluation.
> - **CommonsenseQA**: A multiple-choice QA dataset for measuring commonsense knowledge.
> - **CoQA**: Conversational question answering tasks to test dialog understanding.
> - **TruthfulQA**: A QA task aimed at evaluating the truthfulness and factual accuracy of model responses.
> - **Lambada**: Tasks designed to predict the endings of text passages, testing language prediction skills.
>
> The following table combines the original Table 2 in the paper and the new experimental results. As shown, our prediction enables better model performance by adjusting the vocabulary size within different FLOPs budgets. The 3B model with a 43K vocabulary size outperforms the 32K counterpart on 11 out of 12 tasks using the same FLOPs budget. For example, we improve performance on ARC-C from 29.1 to 31.5. In conclusion, the model using our suggested vocabulary size (i.e., 43K) consistently outperforms its counterpart (i.e., 32K) by a clear margin.
>
> | Tasks | Metric | $V$=32K (Baseline) | $V^{opt}$=43K (Ours) |
> |---------------|---------------------|-----------|---------------|
> | Winogrande | Normalized Accuracy | 55.7±1.4 | **58.7**±1.4 |
> | PIQA | Normalized Accuracy | 72.6±1.0 | **72.7**±1.0 |
> | OBQA | Normalized Accuracy | **34.4**±2.1 | 33.0±2.1 |
> | Hellaswag | Normalized Accuracy | 55.1±0.5 | **55.7**±0.5 |
> | BoolQ | Normalized Accuracy | 60.1±0.9 | **62.3**±0.8 |
> | ARC-E | Normalized Accuracy | 53.4±1.0 | **55.0**±1.0 |
> | ARC-C | Normalized Accuracy | 29.1±1.3 | **31.5**±1.4 |
> | MMLU | Normalized Accuracy | 25.0±0.4 | **25.5**±0.4 |
> | CommonsenseQA | Normalized Accuracy | 20.2±1.2 | **21.0**±1.1 |
> | CoQA | Exact Match | 32.3±2.0 | **37.4**±2.0 |
> | TruthfulQA | BLEU | 30.4±1.6 | **31.3**±1.6 |
> | Lambada | Normalized Accuracy | 43.0±0.7 | **44.9**±0.7 |
>
> Another feasible way to compete with models in the market would be continual pre-training with a larger vocabulary size. We believe this is a good topic for discussion, but it involves several non-trivial research challenges not strongly related to the main contributions of this paper, i.e., exploring the optimal compute allocation considering vocabulary sizes. Therefore, we will discuss it in the revised version and leave it as important future work. The challenges we will discuss include:
>
> - Expanding the vocabulary necessitates changes in the tokenization process, which can lead to inconsistencies in token segmentation.
> - Ensuring that these new embeddings are compatible and effectively integrate with the pre-trained embeddings is non-trivial.
> - Catastrophic forgetting of old word embeddings when learning new word embeddings.
>
> We will discuss all the above in the revised version. Thank you again for your valuable comments.

### W2.1: The IsoFLOPs method is very sensitive.
Answer: Thank you for your insightful question. You raise a valid point: the IsoFLOPs-based approach can be sensitive to some extent, depending on the granularity, range, and quality of the fitting data. Since the pioneering work on scaling laws by Kaplan et al. 2020 [1] and Hoffmann et al. 2022 [2], the IsoFLOPs-based approach has become a widely used tool for studying trends in model performance [3]. We have discussed this in Appendix B.1, and we will add more details on how to reduce the sensitivity, such as outlier removal and repeated experiments, in our polished version.

To evaluate the goodness of fit, we use the relative mean squared error (rMSE) and the coefficient of determination (R^2). As shown in the table below (also in Figure 3), the results indicate a good fit, with rMSE < 0.001 and R^2 >= 0.89 for all the considered attributes: non-vocabulary parameters ($N_{nv}$), vocabulary parameters ($N_v$), and training characters ($H$). This suggests that each of these attributes follows a power law with respect to the FLOPs budget.

| | $N_{nv}$ | $N_v$ | $H$ |
|--------|----------|------|-----|
| rMSE | 0.00026 | 0.00051 | 0.00017 |
| R<sup>2</sup> | 0.93 | 0.89 | 0.96 |
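For concreteness, the sketch below shows one way such a power-law fit and the R^2/rMSE diagnostics can be computed, using a simple log-log least-squares fit on synthetic data; the synthetic data and the exact rMSE definition are illustrative assumptions and may differ from the fitting procedure used in the paper.

```python
import numpy as np

np.random.seed(0)

# Synthetic stand-in for IsoFLOPs results: an attribute y (e.g., the optimal
# non-vocabulary parameter count) observed at several FLOPs budgets C, assumed
# to follow a noisy power law y = a * C^b.
C = np.logspace(18, 21, 20)
y = 2.5e-2 * C**0.72 * np.exp(np.random.normal(0.0, 0.02, C.size))

# Least-squares fit in log-log space: log y = log a + b * log C.
b, log_a = np.polyfit(np.log(C), np.log(y), 1)
y_hat = np.exp(log_a) * C**b

# Goodness-of-fit diagnostics (one common definition of each).
ss_res = np.sum((np.log(y) - np.log(y_hat)) ** 2)
ss_tot = np.sum((np.log(y) - np.log(y).mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
rmse = np.mean(((y - y_hat) / y) ** 2)   # relative mean squared error

print(f"fitted exponent b = {b:.3f}, R^2 = {r2:.3f}, rMSE = {rmse:.5f}")
```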
Furthermore, the optimal vocabulary predictions (reported in Table 1) from the IsoFLOPs-based method and the derivative-based method are aligned across small-scale and large-scale models. This independent verification by the derivative-based method validates the predictions from the IsoFLOPs-based method. Therefore, we believe that the IsoFLOPs-based method works well in our case.

[1] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.

[2] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. 2022. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556.

[3] Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, et al. 2021. Scaling language models: Methods, analysis & insights from training Gopher. arXiv preprint arXiv:2112.11446.

### W2.2: The experiments also look insufficient.

Answer: It is noteworthy that we conduct extensive experiments on **1200 models pre-trained from scratch** (6 non-vocabulary parameter settings x 10 vocabulary size settings x 20 training data settings) to fit our vocabulary scaling law. The key contributions of this paper are the findings about how the vocabulary affects model performance and how much compute should be allocated to the vocabulary, based on the two proposed approaches.

Following previous studies [1,2,3], we mainly use the held-out validation loss to evaluate the 1200 trained models. It is a better metric than downstream task performance: the held-out loss provides an unbiased measure of the model's ability to generalize to new data, and it is also computationally efficient. In contrast, downstream task performance varies greatly across tasks, which makes it unsuitable as the main evaluation metric. Evaluation on downstream tasks is one of the ways we verify our prediction, so we do not devote much space to it in the main paper. For downstream tasks, we conduct more experiments in the answer to your Q1. The new results will be added in our polished version.

### Q1: In determining the scaling law for the relationship between non-embedding model size and data (such as the Chinchilla law), why is it assumed that the vocabulary size is independent of these two factors?

Answer: Thanks for your question! We do not assume that the vocabulary size is independent of parameters and data. Instead, we make two adjustments in the Preliminary section: 1) we break down the total parameters into non-vocabulary parameters and vocabulary parameters; 2) we measure data not in tokens but in training characters. By doing so, the vocabulary size $V$ is independent of the non-vocabulary parameters $N_{nv}$ and the number of training characters $H$. In an experimental configuration, developers can vary the vocabulary size without affecting the non-vocabulary parameters or the training characters (see the illustrative sketch after this answer).

Below, we detail our motivation for separating the vocabulary parameters from the non-vocabulary parameters. Traditionally, scaling up model parameters in language models has been approached in two ways: increasing depth (i.e., the number of layers) or width (i.e., the hidden size). Current empirical practice often expands both simultaneously [4]. This approach overlooks a crucial distinction in how different parameters benefit from expansion. Non-vocabulary parameters can benefit from increases in both depth and width, allowing for more complex hierarchical representations and broader feature capture. In contrast, vocabulary parameters, associated with the word embeddings and the language model head, are generally confined to a single layer, limiting their ability to benefit from increases in model depth. This disparity in growth potential between non-vocabulary and vocabulary parameters suggests that, to maintain a balanced growth rate, it is better to consider the vocabulary parameters and non-vocabulary parameters separately.

[4] Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Tran, Dani Yogatama, and Donald Metzler. 2023. Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12342–12364, Singapore. Association for Computational Linguistics.
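To make this split concrete, here is a minimal sketch of one common way to count vocabulary vs. non-vocabulary parameters for a decoder-only Transformer; the untied input/output embeddings and the ~12·d² per-layer approximation are generic assumptions, not necessarily the exact accounting used in the paper.

```python
def param_split(vocab_size: int, d_model: int, n_layers: int) -> tuple[int, int]:
    """Approximate (vocabulary, non-vocabulary) parameter counts.

    Vocabulary parameters: input embedding + LM head (assumed untied), i.e.
    two V x d matrices that live in single layers and scale only with V and
    width. Non-vocabulary parameters: attention + MLP blocks, using the
    common ~12 * d_model^2 per-layer approximation (norms ignored).
    """
    n_vocab = 2 * vocab_size * d_model
    n_non_vocab = n_layers * 12 * d_model ** 2
    return n_vocab, n_non_vocab

# Illustrative ~3B-scale configuration (not the paper's exact architecture).
n_v, n_nv = param_split(vocab_size=43_000, d_model=3072, n_layers=26)
print(f"N_v  = {n_v / 1e9:.2f}B")   # grows linearly with V, independent of depth
print(f"N_nv = {n_nv / 1e9:.2f}B")  # grows with both depth and width
```

Varying `vocab_size` changes only `n_vocab` and leaves `n_non_vocab` untouched, which is exactly the independence described in the answer above.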
<br><br>

## Reviewer 3 (Score 6)

### W1: The results are probably mostly applicable to a small number of well-funded labs.

Answer: Our scaling laws are also advantageous for small labs, as we provide a verified compute allocation suggestion that lets developers achieve high model performance without multiple attempts at different vocabulary configurations, thereby saving compute resources. Moreover, the proposed derivative-based method relies on theoretical deduction rather than data fitting over heavy experiments. Developers can apply this approach via a numerical search, which takes just a few seconds on a CPU to obtain the recommended vocabulary configuration (see the schematic sketch after this answer).

> Qian Suggestion:
>
> Thanks for pointing this out! We want to clarify that we are not a well-funded lab either. Due to our limited computing resources, we can only afford to train models with up to 3B parameters in our experiments, as the cost of validating scaling law experiments is indeed very high.
>
> However, we believe that our conclusions are beneficial to the general research community, especially for small labs. Our scaling laws with vocabulary provide a compute-optimal allocation suggestion, enabling small labs to train high-performance models without repeatedly trying different vocabulary configurations, thereby saving computing resources.
>
> Even for teams who want to conduct scaling law experiments themselves, our derivative-based method offers a simple and feasible approach based on theoretical derivation. Researchers do not need to run a large number of scaling law experiments to obtain a good vocabulary configuration. This is particularly advantageous for small labs. We will also make all our scaling law experimental results public so that more people can benefit from our work.
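For illustration only, the sketch below shows the kind of lightweight numerical search referred to above: scan candidate vocabulary sizes under a fixed FLOPs budget and keep the one that minimizes a predicted loss. The `predicted_loss` formula, its coefficients, the non-vocabulary parameter count, and the characters-per-token ratio are all made-up placeholders standing in for the fitted scaling law from the paper, which is not reproduced here.

```python
def predicted_loss(n_nv: float, n_v: float, h: float) -> float:
    """Placeholder loss predictor with made-up coefficients (NOT the paper's fit)."""
    return 2.0 + 400.0 / n_nv**0.34 + 12.0 / n_v**0.29 + 1e3 / h**0.28

def search_vocab(flops_budget: float, d_model: int, n_nv: float = 2.9e9,
                 chars_per_token: float = 4.0) -> tuple[int, float]:
    """Grid-search the vocabulary size V under a fixed FLOPs budget.

    For each candidate V, take N_v = 2 * V * d_model (embedding + LM head),
    spend the remaining budget on data via FLOPs ~ 6 * (N_nv + N_v) * D,
    convert tokens D to training characters H, and score with the placeholder
    loss. All modeling choices here are illustrative assumptions.
    """
    best_v, best_loss = 0, float("inf")
    for v in range(16_000, 128_001, 1_000):
        n_v = 2 * v * d_model
        tokens = flops_budget / (6 * (n_nv + n_v))
        h = tokens * chars_per_token
        loss = predicted_loss(n_nv, n_v, h)
        if loss < best_loss:
            best_v, best_loss = v, loss
    return best_v, best_loss

# Runs in well under a second on a CPU.
print(search_vocab(flops_budget=2.3e21, d_model=3072))
```

The point is only that, once the fitted formula is fixed, the search itself is trivially cheap.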
### Q1: The abstract states that “beyond the conventional 32K” – is this really the convention? See e.g. GPT4o.

Answer: Yes, there is more than one classic vocabulary size, just as there is more than one classic model size. We chose one of the classic vocabulary sizes, 32K, following Llama, because the training corpus of Llama is mainly in English and we also use an English-dominated corpus, SlimPajama, for pre-training. The vocabulary size of GPT4o is larger because it is a multilingual model. It will be interesting to explore how to set the vocabulary size in a multilingual scenario in the future, and we discuss this in Appendix B.4. Additionally, the broader impact of our work is in drawing the community's attention to the vocabulary when training language models and to how much compute should be allocated to it. Indeed, big companies are now noticing that their previous compute allocation to the vocabulary was too small and are increasing their vocabulary sizes, e.g., Llama has increased its vocabulary size from 32K to 128K.

> Qian Suggestion:
>
> Thank you for your insightful question! We acknowledge that there is no single "conventional" vocabulary size for language models, as it can vary based on the pre-training corpus and the intended use case. A vocabulary size of 32K is widely regarded as a common choice, particularly for models trained on English-centric corpora, such as Llama-1, Llama-2, and Mistral. Since our work primarily utilizes the English-centric SlimPajama corpus for pre-training, we have adopted the 32K vocabulary setting employed by these models as a "conventional" vocabulary size. We will modify the statement in the abstract accordingly to reflect this clarification.
>
> As for GPT4o, we think its vocabulary size is relatively larger because it is designed to handle multiple languages (e.g., Chinese). This also highlights an important consideration for future research: determining the optimal vocabulary size for multilingual models, which we have discussed in Appendix B.4.
>
> Our broader goal is to draw attention to the importance of vocabulary size in training language models and to encourage the appropriate allocation of computational resources for this aspect. Recently, there has been a shift in the industry, with major companies recognizing that their previous allocations for vocabulary were insufficient. For example, Llama has increased its vocabulary size from 32K to 128K, reflecting this evolving understanding. We hope this clarification helps, and we will add the discussion in the revised version.

### Q2: How does Table 2 look if you train on the same number of tokens instead of using the same FLOPs budget?

Answer: Thanks for your prompting question! As you suggested, we also trained the model using the same number of tokens, i.e., 129B tokens, in addition to the same-FLOPs-budget setting. As shown in the following table, the performance of the model with the suggested vocabulary size of 43K improves further compared to the 32K vocabulary size when using the same number of training tokens. We will add these results in the revised version.

| **$V$** | **$N_v$** | **$D$** | **Winogrande** | **PIQA** | **OBQA** | **Hellaswag** | **BoolQ** | **ARC-E** | **ARC-C** | **Average** |
|---------|-----------|---------|----------------|----------|----------|---------------|-----------|-----------|-----------|-------------|
| 32K (Baseline) | 0.20B | 129B | 55.7±1.4 | 72.6±1.0 | **34.4**±2.1 | 55.1±0.5 | 60.1±0.9 | 53.4±1.0 | 29.1±1.3 | 51.5 |
| 43K (Ours, same FLOPs) | 0.27B | 125B | **58.7**±1.4 | **72.7**±1.0 | 33.0±2.1 | 55.7±0.5 | 62.3±0.8 | 55.0±1.0 | 31.5±1.4 | 52.7 |
| 43K (Ours, same tokens) | 0.27B | 129B | 58.6±1.4 | **72.7**±1.0 | 33.6±2.1 | **55.8**±0.5 | **62.4**±0.9 | **55.5**±1.0 | **31.5**±1.4 | **52.9** |
