## Shared answer

We thank the reviewers for their helpful feedback. Overall, reviewers appreciated our method, finding it innovative and useful (VWkY), interesting and very effective (2GW2), and able to lead to strong results (j8Na). They appreciated the depth of our empirical analyses and evaluations (wqBP, VWkY), as well as our presentation and visualizations (wqBP, VWkY, 2GW2, j8Na).

Reviewer wqBP and Reviewer VWkY expressed interest in understanding how computationally expensive Motif is. With our code, which builds on PagedAttention as implemented in vLLM (Kwon et al., 2023), annotating the full dataset of pairs on a node with eight A100s takes about 4 GPU-days with Llama 2 13b and 7.5 GPU-days with Llama 2 70b. Given the asynchronous nature of our code, the required wall-clock time can be significantly reduced if additional resources are available. We have now included all this information in the paper in Appendix A.8.4. We believe that, together with the use of open models and our algorithm's robustness to data diversity and cardinality (see Appendix A.8.3), this makes experimenting with Motif affordable and accessible to many academic labs. We will release our efficient implementation, together with the entire set of annotations used in our experiments, for the benefit of the research community. Note that, once the dataset is annotated, the LLM is no longer needed, and the combination of reward model training and a 1B-step RL training run can take less than 10 hours on an A100 GPU. This also means that a policy trained with Motif can be deployed to act in real time, as long as the policy architecture runs fast enough, regardless of the computational cost of running the LLM itself at annotation time. Our updated paper explicitly highlights these computational considerations.

Reviewer VWkY and Reviewer j8Na were interested in comparing Motif to additional approaches using LLMs for decision-making. The first LLM-based baseline to which we compare leverages Llama 2 70b as a policy on the raw text space (similar to Wang et al., 2023). This did not lead to any performance improvement over a random baseline (we discuss this in Section F of the Appendix). The second LLM-based baseline we implemented is a version of the recently-proposed ELLM algorithm (Du et al., 2023). Our implementation of ELLM closely follows the details from that paper. As intrinsic reward, it uses the cosine similarity between the representation (provided by a BERT sentence encoder) of the game messages and the "early game goal" extracted from the NetHack Wiki (the first two lines from [the early game strategy](https://nethackwiki.com/wiki/Standard_strategy#The_early_game)). Despite Motif not relying on any such information from the NetHack Wiki, it significantly outperforms ELLM in all tasks. ELLM does not provide any benefit on complex tasks: its reward function cannot, by design, exhibit the exploration and credit assignment properties of the one produced by Motif (see Section 4.2 of the paper, "Alignment with human intuition"). Note that the results for the ELLM baseline are still running and we can currently only show up to 650M steps (due to its iteration speed being considerably slower than Motif's). We will include the full curves in the final version of the paper.

## Reviewer wqBP (Score: 6, Confidence: 3)

Thank you for your feedback!
> Using a 70-billion LLM to generate a preference dataset from given captions is quite expensive

Thanks to our modular and asynchronous implementation (which will be publicly released), dataset annotation is not especially expensive. Please see the general response for the complete computational considerations. In addition, in Figure 15 we show that the performance of our method is particularly robust to the size of the dataset of annotations: Motif is able to outperform the baselines even with a dataset that is five times smaller (i.e., ~100k annotations).

> while I understand this is out of the scope of the paper, perhaps using a large VLM to annotate frames without captions might have been more economical?

The question of whether annotations extracted from a VLM would be more efficient than the ones extracted from running an LLM on captions is interesting. Unfortunately, our experiments with current open VLM models suggest that none of them are yet able to interpret visual frames well enough to provide an effective evaluation or even accurate captions (most likely because current open models are predominantly trained on natural images). However, given the current pace of VLM research, this may change very soon. Thus, investigating the difference in the efficiency of various types of feedback will be an interesting avenue for future research. In general, the question of large VLMs operating on images vs LLMs operating on captions brings interesting tradeoffs. Large VLMs are more general, since they do not assume access to captions, but are faced with a more challenging task since they work with complex images rather than compressed text descriptions. From a purely computational standpoint, if captions are available and of high quality, then LLMs are likely more economical since their inputs are smaller.

> it might be worthwhile having a baseline that gives preferences using a simpler model (say sentiment analysis) and learn the RL policy using this intrinsic reward model.

We added the results of an experiment using a sentiment analysis model as a preference model in the updated paper (Figure 8 of Appendix A.6). We use a [T5 model](https://huggingface.co/mrm8488/t5-base-finetuned-imdb-sentiment) fine-tuned for sentiment analysis and extract, for each message, a score computed as the sigmoid of the confidence of the model in its positive or negative sentiment prediction. Then, for each pair in the dataset, we assign the preference to the message with the higher sentiment score (see the sketch below). Finally, we execute reward training and RL training as with Motif. Results on the `score` task show performance close to zero, both with and without extrinsic reward. This poor performance can be easily explained: a generic sentiment analysis model cannot capture the positivity or negativity of NetHack-specific captions. For instance, killing or hitting are generally regarded as negative statements, but they become positive in the context of killing or hitting monsters in NetHack. Llama 2 can understand this out-of-the-box without any fine-tuning, as demonstrated by our experiments. Also note that such a vanilla sentiment analysis model cannot be easily steered, thus losing any opportunity for the controllability offered by Motif. To attest to Motif's strong performance, we also compared with an additional LLM-based baseline (as requested by Reviewer VWkY). The details of this additional experiment are presented in the common response above.
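For concreteness, here is a minimal sketch of this sentiment-based annotation baseline. It uses a generic Hugging Face sentiment-analysis pipeline as a stand-in for the T5 checkpoint linked above, and scores messages in a slightly simplified way (signed classifier confidence rather than the sigmoid-based score described above); function and variable names are illustrative.

```python
from transformers import pipeline

# Generic sentiment classifier; stands in for the T5 checkpoint linked above.
sentiment = pipeline("sentiment-analysis")

def sentiment_score(message: str) -> float:
    """Signed sentiment score: classifier confidence, negated for negative labels."""
    out = sentiment(message)[0]
    return out["score"] if out["label"] == "POSITIVE" else -out["score"]

def annotate_pair(message_a: str, message_b: str) -> int:
    """Return 0 if message_a is preferred, 1 otherwise (higher sentiment wins)."""
    return 0 if sentiment_score(message_a) >= sentiment_score(message_b) else 1

# A NetHack message that a generic sentiment model tends to misread as negative.
print(annotate_pair("You kill the cave spider!", "You feel hungry."))
```

As noted above, a generic classifier of this kind tends to read combat messages such as "You kill the cave spider!" as negative, which is precisely why this baseline fails on NetHack.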
> the paper mentions that the agent exhibits a natural tendency to explore the environment by preferring messages that would also be intuitively preferred by humans. Is this a consequence of having a strong LLM, or is it due to the wording of the prompt?

We believe this is due to the fact that the LLM was pretrained on massive amounts of human data and then fine-tuned on human preferences. Indeed, even when using the zero-knowledge prompt presented in Prompt 2 of Appendix B, Motif's reward function allows agents to play the game effectively even without any reward signal from the environment (see Figure 6b).

> An ablation over $\alpha_2$ been provided in the appendix, but the value of $\alpha_1$ the coefficient for the intrinsic reward is kept fixed at 0.1; could you explain the reason behind that?

When linearly combining two terms in a reward function, as in $\alpha_1 r_{\text{int}} + \alpha_2 r_{\text{ext}}$, the important factor is the relative weight given to each of them. We decided to ablate by varying $\alpha_2$ to progressively give more weight to the extrinsic reward compared to the intrinsic reward. This allows us to show that, while Motif already performs well in the absence of extrinsic reward (i.e., for $\alpha_2=0$), progressively increasing the importance of the reward signal coming from the environment (by increasing $\alpha_2$) correspondingly increases performance, but only up to a point. Once the relative weight given to the intrinsic reward becomes too small, performance starts degrading as the agent behaves more and more like the extrinsic-only baseline.

> In Figure 6c, the score for the reworded prompt is quite low but its dungeon level keeps steadily rising compared to the default prompt.

Figure 6c shows that a rewording of the prompt in a task as complex as finding the oracle can cause a complete change in the strategy followed by the agent. With the original prompt, the agent hacks the reward, solving the task without needing to go down the dungeon. With the reworded prompt, the agent instead starts going down the dungeon to look for the oracle, and finds it in up to 7% of the episodes. The plot thus shows the sensitivity of systems like Motif to prompt variations that humans would perceive as small. In the paper we put a strong emphasis on understanding such changes in behavior, as we believe this to be fundamental if we are to release any LLM-based agent in more realistic situations. As a side note, we corrected the y-axis label and normalization in Figure 6c in the updated paper to be consistent with the rest of the paper (from "Score" to "Success Rate").

## Reviewer VWkY

Thank you for your feedback!

> The paper could benefit from a more extensive comparison to other methods, especially those that also attempt to integrate LLMs into decision-making agents.

First of all, we refer the reviewer to Figure 7 in Appendix F, in which we show that Motif outperforms four competitive baselines, including E3B (Henaff et al., 2022) and NovelD (Zhang et al., 2021), two state-of-the-art approaches specifically created for procedurally-generated domains such as NetHack. In the updated paper, we have additionally added a comparison to ELLM (Du et al., 2023), a recent approach for deriving reward functions from LLMs, showing that Motif's performance significantly surpasses such LLM-based baselines across all tasks.
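To make the comparison concrete, here is a minimal sketch of an ELLM-style intrinsic reward as described in the general answer: the cosine similarity between a sentence-encoder embedding of the current game message and a fixed goal string. The encoder checkpoint and the goal text below are placeholders rather than the exact ones used in our implementation, which also follows ELLM's additional implementation details.

```python
from sentence_transformers import SentenceTransformer, util

# Stand-in sentence encoder and goal text; the actual checkpoint and the goal
# extracted from the NetHack Wiki differ in our implementation.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
EARLY_GAME_GOAL = "Explore the dungeon, fight weak monsters, and find the stairs down."
goal_embedding = encoder.encode(EARLY_GAME_GOAL, convert_to_tensor=True)

def ellm_style_reward(message: str) -> float:
    """Intrinsic reward for one game message: cosine similarity with the goal embedding."""
    if not message:  # most NetHack steps produce no message; give zero reward
        return 0.0
    message_embedding = encoder.encode(message, convert_to_tensor=True)
    return util.cos_sim(message_embedding, goal_embedding).item()

print(ellm_style_reward("You find a hidden door."))
```

By construction, such a reward depends only on the static similarity between a message and the goal text, which is why it cannot reproduce the anticipatory and exploration-related properties discussed next.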
This is due to the peculiar features of Motif's intrinsic reward (e.g., its anticipatory nature), which, by design, are not implied by a reward function based on the cosine similarity between a goal and a caption. Our implementation is described in detail in the general answer above.

> There is a lack of discussion on the computational cost and efficiency aspects of implementing Motif.

Thanks to our modular and asynchronous implementation (which will be publicly released), dataset annotation is not especially expensive. Please see the general response for the complete computational considerations. In addition, in Figure 15 we show that the performance of our method is particularly robust to the size of the dataset of annotations: Motif is able to outperform the baselines even with a dataset that is five times smaller (i.e., ~100k annotations).

> While the paper makes a strong case for Motif, it doesn't delve deeply into the limitations or potential drawbacks of relying on LLMs for intrinsic reward generation.

We believe, as the reviewer does, that addressing limitations in LLM-based work is critical: this is why a substantial fraction of our paper is devoted to analyzing limitations and pitfalls of intrinsic motivation from an LLM's feedback. In particular, we want to highlight that we dedicated a full page of the paper to demonstrating evidence for, explaining, and characterizing _misalignment by composition_, a negative phenomenon relevant to our framework, whose emergence is a current limitation of Motif. In addition, we studied the sensitivity of Motif to different prompts, showing in Figure 6c that semantically-equivalent prompts can lead, in complex tasks, to drastically different behaviors. We believe this is a limitation of current approaches based on an LLM's feedback, and hope that future work will be able to address it. Finally, we also included in Appendix H.3 a study on the impact of the data diversity of the dataset (through which we elicit preferences) on the resulting final performance. Please notice that we are also very upfront about the fundamental assumption behind Motif, which is also a fundamental assumption behind the zero-shot application of LLMs to new tasks: that the LLM contains prior knowledge about the environment of interest. Our Introduction is centered around this assumption.

> Could the authors offer insights into why agents trained on extrinsic only perform worse than those trained on intrinsic only rewards?

Our paper already provides some insights in the "Alignment with human intuition" paragraph of Section 4.2, but we will now provide an additional perspective that can be beneficial to understand this result. By inspecting the messages preferred by the intrinsic reward function, one can quickly realize that the agent will receive from the LLM's feedback three kinds of rewards: _direct rewards_, _anticipatory rewards_ and _exploration-directed rewards_. Direct rewards (e.g., for "You kill the cave spider!") leverage the LLM's knowledge of NetHack, implying a reward very similar to the score (i.e., the extrinsic reward). Motif's reward, however, goes beyond this. Anticipatory rewards (e.g., for "You hit the cave spider!") implicitly transport credit from the future to the past, encouraging events not rewarded by the extrinsic signal and easing credit assignment. Finally, exploration-directed rewards (e.g., for "You find a hidden door.") directly encourage explorative actions that will lead the agent to discover information in the environment.
Together, these three types of rewards allow the agent to maximize a proxy for the game score that is much easier to optimize than the actual game score, explaining the increased performance.

> What's the best strategy to optimally balance intrinsic and extrinsic rewards during training?

We show in Figure 10c in the Appendix that Motif is quite robust to how the two rewards are balanced. Broadly speaking, given that the nature of Motif's intrinsic reward brings it closer to a value function, future work can explore potentially more effective ways to leverage this type of intrinsic reward, for instance via potential-based reward shaping (Ng et al., 1999), i.e., adding a shaping term of the form $\gamma \Phi(s') - \Phi(s)$ for some potential function $\Phi$.

> Can the authors elaborate on the limitations of using LLMs for generating intrinsic rewards? Are there concerns about misalignment or ethical considerations?

As highlighted in our previous answer to the third point, the space given to our studies on misalignment and robustness in our paper is a conscious decision. We believe this constitutes a first step in establishing this as a common practice when designing new algorithms. We want to remark here, as we did in our conclusions, that "we encourage future work on similar systems to not only aim at increasing their capabilities but to accordingly deepen this type of analysis, developing conceptual, theoretical and methodological tools to align an agent's behavior in the presence of rewards derived from an LLM's feedback."

> How robust are agents trained with Motif against different types of adversarial attacks or when deployed in varied environments?

Our experiments on prompt sensitivity (Figures 6c, 11b, and 11c) can be interpreted as being close to this kind of study, showing that seemingly small variations of a prompt can trigger large or small variations of performance and behavior, depending on the environment. Future work should explore the possibility of studying the effect of actual adversarial attacks on prompts.

## Reviewer 2GW2

Thank you for your feedback!

> Can Motif be applied to other environments beyond the NetHack Learning Environment (NLE)?

We could not investigate this question in our current paper, which is already dense with detailed analyses of the agent's behavior, the risks of using LLMs, and the possibilities for defining diverse rewards. Adding other environments would have prevented us from providing such in-depth analysis. We strongly believe that Motif is a general method and that it can be applied to other environments with a reasonable amount of effort, as long as its main assumptions are satisfied. In particular, Motif's LLM needs to have enough knowledge about the environment, which is related to how much text about it is present on the Internet, and an event captioning system must be available. These assumptions are realistic in many environments, both when dealing with a physical system (e.g., a robot accompanied by a vision captioner) and with a simulated/virtual world (e.g., commonly played video games or Web browsing). Additionally, one could apply the general architecture of Motif to any environment based on visual observations by simply substituting a VLM in place of the LLM. We believe this is an exciting direction for future work. To give some context on our choice of environment, NetHack is a challenging and illustrative domain in which to deploy an algorithm like Motif. Captions in NetHack are non-descriptive: they do not provide a complete picture of the underlying state of the game. Moreover, these captions are sparse, appearing in only 20% of states.
This means that overall there is a high degree of partial observability. Despite this challenge, Motif thrives and shows results that we have not previously witnessed in the literature. We believe that if we were to apply Motif in other environments with more complete descriptions, we could see even stronger performance. This would raise important questions to study: what exactly is the impact of partial observability on preferences obtained from an LLM? Do more detailed captions unlock increasingly more refined behaviors from the RL agent? Such important questions could be investigated by future work.

> What a RL algorithm is used for RL fine-tuning?

We use the asynchronous PPO implementation of Sample Factory (Petrenko et al., 2020). This information is available in the paper at the bottom of page 4. We chose this implementation as it is extremely fast: we can train an agent on 2B steps of interaction in only 24h using a single V100 GPU.

## Reviewer j8Na

Thank you for your feedback!

> I think the point of this paper is that "joint optimization of preference-based and extrinsic reward helps resolve the sparse reward problems". As the source of feedback, either humans or LLMs are OK. I think describing this as LLM's contribution might be an overstatement. [...] As a preference-based RL method, I guess there are no differences from the original paper [1].

We never claim that an LLM's feedback is inherently better than feedback coming from humans, even though we believe assessing whether that could be the case is an interesting avenue for future work. Instead, we simply leverage an LLM's feedback because of its scalability: in just a few hours on a small-sized cluster, one can annotate thousands of pairs of game events, which would otherwise require significant amounts of labour from human NetHack experts. This scalability, leveraged also in recent work on chat agents (e.g., Constitutional AI from Bai et al., 2022), allows a method like Motif to fully leverage human common sense to bootstrap an agent's knowledge.

> In the LLM literature, [2] leverages GPT-4 to solve game environments, and [3] incorporates LLM-based rewards for RL pretraining.

In [2], the Voyager algorithm uses complex prompt engineering involving significant amounts of human knowledge and engineering (such as deliberately prompting the LLM to use the "explore" command regularly). Additionally, Voyager relies on a JavaScript API to bridge the gap between the LLM's language outputs and the high-dimensional observations and actions of Minecraft. Finally, Voyager relies on perfect state information about certain features of the game (e.g., agent position and neighbouring objects). Voyager also builds on GPT-4's strong coding abilities, which current open models would likely not match. Altogether, these factors strongly limit Voyager's general applicability and reproducibility. On the other hand, Motif relies on very limited human knowledge, achieving significant performance even without any information about NetHack. Moreover, Motif is considerably simpler to implement, with very few, clearly separated moving pieces, providing a robust solution for leveraging prior knowledge from large models. This makes our approach a significantly more general method that has the potential to be applied to multiple domains or possibly be combined with powerful Large Vision Language Models.
We have additionally compared Motif to the ELLM approach of [3], adapted as described in the general response, showing that Motif significantly outperforms it in all NLE tasks. Please see the common response for a detailed description of the experiment. The paper also highlights in Section 5 (Related Work) the important differences between Motif and the ELLM algorithm. In particular, ELLM's reward function cannot, by design, exhibit the exploration and credit assignment properties of the one produced by Motif (see Section 4.2 of the paper, "Alignment with human intuition"). We believe those differences are key to the strong performance of Motif.

> Terminology: I'm not sure if a preference-based reward should be treated as an "intrinsic" reward. I think it is extrinsic knowledge (from humans or LLM).

As is standard, we refer to the extrinsic reward as the reward that comes from the task to be performed in the environment, whereas intrinsic rewards are provided by the algorithm. From that point of view, the reward provided by the LLM is intrinsic (as opposed to extrinsic) to the agent. Please notice that this terminology has previously been used in the literature (see, for instance, the ELLM [3] paper).

> Which RL algorithm is used for Motif? I may miss the description in the main text.

We use the asynchronous PPO implementation of Sample Factory (Petrenko et al., 2020). This information is available in the paper at the bottom of page 4. We chose this implementation as it is extremely fast: we can train an agent on 2B steps of interaction in only 24h using a single V100 GPU.

> Are there any reason why employ LLaMA 2 rather than GPT-3.5 / 4?

Yes, we believe there are important and significant reasons to prefer Llama 2 over GPT-3.5 or GPT-4. GPT-3.5 and GPT-4 are subject to changes over time, require significant financial resources to be used at scale, and rely on unknown methodologies and practices. Despite the fact that they might provide better performance, they are problematic for rigorous scientific reproducibility, and thus significantly less preferable than Llama 2 for conducting scientific research. We explicitly made this decision for the benefit of the scientific community, and we will also release our code and dataset to ease experimenting with a method like Motif for other members of the community.

> (Minor Issue)

We thank the reviewer for spotting the typo. We corrected it in the updated version of the paper.

### Second reply to Reviewer j8Na

Thank you for your timely answer! We are glad to know that we have effectively addressed the majority of the reviewer's concerns. We now provide answers to the remaining two concerns on Contribution and Terminology.

__Contribution__

We appreciate the suggestion from the reviewer. We highlight that we already discuss this early in the paper, in our introduction, stating that "since the idea of manually providing this knowledge on a per-task basis does not scale, we ask: what if we could harness the collective high-level knowledge humanity has recorded on the Internet to endow agents with similar common sense?". To make the advantage of AI feedback even more explicit to the reader, we added a brief but precise sentence to the conclusion, saying that "[Motif] bootstraps an agent's knowledge using AI feedback as a scalable proxy for human common sense".
__Terminology__

In our paper, we demonstrate that the reward from Motif is not _only_ "a proxy for extrinsic reward", but instead captures rich information about the future that (1) helps the agent *explore* unknown parts of the environment, (2) *discover* inherently interesting patterns in the environment, and (3) achieve *creative* solutions (please see Section 4.2, "Misalignment by composition in the oracle task"). Notice that these three characteristics are part of the formal definition of intrinsic motivation of Schmidhuber, 1990. In particular, Motif's intrinsic reward "is something that is independent of external reward, although it may sometimes help to accelerate the latter" (Section V.B from Schmidhuber, 1990). Even though we do not follow the specific way in which intrinsic motivation is defined in that seminal work (i.e., through learning progress), we believe Motif adheres to the underlying principles. To better explain what we mean, we report here an excerpt of our response to VWkY, which provides a more detailed discussion of why Motif's reward is much more than simply a replication of the extrinsic reward, and instead incorporates strong elements of intrinsic motivation:

> Our paper already provides some insights in the "Alignment with human intuition" paragraph of Section 4.2, but we will now provide an additional perspective that can be beneficial to understand this result. By inspecting the messages preferred by the intrinsic reward function, one can quickly realize that the agent will receive from the LLM's feedback three kinds of rewards: direct rewards, anticipatory rewards and exploration-directed rewards. Direct rewards (e.g., for "You kill the cave spider!") leverage the LLM's knowledge of NetHack, implying a reward very similar to the score (i.e., the extrinsic reward). Motif's reward, however, goes beyond this. Anticipatory rewards (e.g., for "You hit the cave spider!") implicitly transport credit from the future to the past, encouraging events not rewarded by the extrinsic signal and easing credit assignment. Finally, exploration-directed rewards (e.g., for "You find a hidden door.") directly encourage explorative actions that will lead the agent to discover information in the environment.

In this passage we explicitly distinguish the three ways in which Motif helps: (1) through rewards directly related to the score, (2) through anticipatory rewards, and (3) through exploration-directed rewards. Notice that if Motif only provided rewards of type (1), we could see Motif as a proxy for the score. However, (2) and (3) make it abundantly clear that Motif goes much further than that and provides intrinsic motivation for the agent to discover the environment. It is also through (2) and (3) that Motif significantly outperforms the baselines. Finally, we would like to highlight that we are as explicit as we can be about the nature of the reward obtained by Motif, i.e., that it is preference-based. This is present in the title, in the introduction, and on numerous occasions throughout the paper. Notice that it is also the basis for the name of our algorithm (**Motif** -> **Moti**vation from AI **f**eedback). We hope that these answers address your concerns, but let us know if any further clarification is required.

Jürgen Schmidhuber, Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010)
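As a general reference for the preference-based reward-model training we refer to throughout this response, the following is a minimal, generic sketch of a Bradley-Terry style objective over annotated pairs. It is an illustration of the standard recipe under simplified assumptions (precomputed feature vectors and placeholder dimensions, in PyTorch), not our exact architecture or training setup.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps an encoded caption (here, a precomputed feature vector) to a scalar reward."""
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(-1)

def preference_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry negative log-likelihood: P(preferred > rejected) = sigmoid(r_pref - r_rej)."""
    return -torch.nn.functional.logsigmoid(r_preferred - r_rejected).mean()

# One illustrative gradient step on a random batch of (preferred, rejected) feature pairs.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
preferred, rejected = torch.randn(32, 128), torch.randn(32, 128)
optimizer.zero_grad()
loss = preference_loss(model(preferred), model(rejected))
loss.backward()
optimizer.step()
```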
