# Multi-Agent Reinforcement Learning (2/3): Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

This blog is based on the paper *"Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning"* by Kuba et al., available at https://arxiv.org/pdf/2109.11251.pdf. It is the second post in the series "Multi-Agent Reinforcement Learning"; you can read the first part at https://hackmd.io/rkNojzNzQzWXlU0HoaPOrg.

## The Limitations of Multi-Agent Policy Gradients

We are already familiar with **MARL**, in which independent agents want to optimise their parameterised policies $\pi^i$ to maximise

$$\mathcal{J}(\boldsymbol{\pi}) = \mathbb{E}_{s_{0:\infty}\sim \rho^{0:\infty}_{\boldsymbol{\pi}},\, \boldsymbol{a}_{0:\infty}\sim \boldsymbol{\pi}}\Big[ \sum_{t=0}^{\infty}\gamma^t r(s_t, \boldsymbol{a}_t)\Big].$$

Following the traditional deep RL approach, we would model every agent's policy with a neural network $\theta^i$ as $\pi^i_{\theta^i}$, and learn it with stochastic gradient ascent:

$$\theta^i \gets \theta^i + \alpha \nabla_{\theta^i}\mathcal{J}(\boldsymbol{\pi}_{\boldsymbol{\theta}}).$$

Such a policy-gradient (PG) approach is known to be problematic already in single-agent RL (and in machine learning in general), because the step taken by the gradient update can be too large and harm performance. When it comes to multi-agent PG (MAPG), the problem is even more severe, because the update directions pointed to by the agents' gradients can be conflicting...

What?! The gradient should always lead uphill on the reward, right?! Not exactly. Bear in mind that in MARL every agent follows its OWN gradient, which tells it what *it alone* can do to improve the joint performance. When all agents try to do this at the same time, the joint update may be a disaster.

![](https://i.imgur.com/SNckxAw.jpg)

It's as if you were driving a car with your friend and, at some point, the road forked around a tree in the middle. Assuming you do nothing, your friend pulls the wheel right to avoid a crash. However, being mindful yourself, you want to avoid the tree by turning left. You can't, though, because the force applied by your friend is stopping you. The wheel remains still...

![](https://i.imgur.com/ZIXFRg8.jpg)

Therefore, in order to make decisions that are beneficial for the whole team, the agents must ***collaborate***. Unfortunately, name a method, be it MADDPG, IPPO, or MAPPO: all of them make the agents mind only themselves and follow their own gradients. Hence, we still have no clue how to assure performance improvement in MARL... Until now 😁

## Multi-Agent Trust Region Learning

In single-agent RL, trust region learning provides stability of updates and policy improvement: at every iteration $k$, the new policy $\pi_{k+1}$ increases the return,

$$\mathcal{J}(\pi_{k+1}) \geq \mathcal{J}(\pi_k).$$

For the reasons described above, naively applying trust region learning to MARL fails: even if a trust-region update guarantees improvement for one agent, the agents' simultaneous updates can still be damaging for the whole team. Today, however, you will see the new *multi-agent trust region learning*, which implements cooperation and leads to joint policy improvement 🎉.
The key ingredients are novel multi-agent functions, which describe the contribution of subsets of agents to the joint return.

### Multi-Agent Advantage Decomposition

First, the *multi-agent state-action value function* for an arbitrary ordered subset of agents $i_{1:m} = \{i_1, \dots, i_m\}$ is defined as

$$Q^{i_{1:m}}_{\boldsymbol{\pi}}(s, \boldsymbol{a}^{i_{1:m}}) \triangleq \mathbb{E}_{\boldsymbol{a}^{-i_{1:m}}\sim\boldsymbol{\pi}^{-i_{1:m}}}\big[Q_{\boldsymbol{\pi}}(s, \boldsymbol{a}^{i_{1:m}}, \boldsymbol{a}^{-i_{1:m}}) \big].$$

Simply speaking, this function gives the average return when agents $i_{1:m}$ take the joint action $\boldsymbol{a}^{i_{1:m}}$ at state $s$, while the remaining agents follow their policies. On top of it we can define the *multi-agent advantage function*

$$A^{i_{1:m}}_{\boldsymbol{\pi}}(s, \boldsymbol{a}^{j_{1:k}}, \boldsymbol{a}^{i_{1:m}}) = Q^{j_{1:k}, i_{1:m}}_{\boldsymbol{\pi}}(s, \boldsymbol{a}^{j_{1:k}}, \boldsymbol{a}^{i_{1:m}}) - Q^{j_{1:k}}_{\boldsymbol{\pi}}(s, \boldsymbol{a}^{j_{1:k}}).$$

This function measures how much better (or worse) it is for agents $i_{1:m}$ to take the joint action $\boldsymbol{a}^{i_{1:m}}$, given that agents $j_{1:k}$ have already taken $\boldsymbol{a}^{j_{1:k}}$, compared to the average over $i_{1:m}$'s actions. Just think how useful it would be if $i_{1:m}$ could know $\boldsymbol{a}^{j_{1:k}}$ and the multi-agent advantage: they could then "react" cleverly by choosing a joint action $\boldsymbol{a}^{i_{1:m}}$ with a large multi-agent advantage... In fact, there is a lemma which describes the awesome consequence of such a scenario, known as the ***Multi-Agent Advantage Decomposition Lemma***: for any ordered subset $i_{1:m}$ of agents,

$$A^{i_{1:m}}_{\boldsymbol{\pi}}(s, \boldsymbol{a}^{i_{1:m}}) = \sum\limits_{j=1}^{m}A^{i_j}_{\boldsymbol{\pi}}(s, \boldsymbol{a}^{i_{1:j-1}}, a^{i_j}).$$

Isn't it amazing 😀?! If every agent $i_j$ knew what agents $i_{1:j-1}$ do, it could react with an action $a^{i_j}_*$ that maximises its own multi-agent advantage (whose maximum is always non-negative).

![](https://i.imgur.com/Ue5arsO.png)

Then, by setting $m=n$, the lemma assures that the joint advantage will be non-negative, and the more agents there are, the larger it can get! Let's see how to build a learning algorithm on top of this idea (a toy numerical check of the lemma appears a little further below).

### Monotonic Improvement

To learn joint policies which perform well throughout the whole game, we must look at a bigger picture than a single state and action: we consider their marginal distributions. Suppose that our agents follow a joint policy $\boldsymbol{\pi}=(\pi^1, \dots, \pi^n)$ and decide to learn according to some order $i_{1:n}$, and that agents $i_1, \dots, i_{m-1}$ have already updated to new policies $(\bar{\pi}^{i_1}, \dots, \bar{\pi}^{i_{m-1}}) = \boldsymbol{\bar{\pi}}^{i_{1:m-1}}$. Then, for any candidate policy $\hat{\pi}^{i_m}$, we define the surrogate return

$$L_{\boldsymbol{\pi}}^{i_{1:m}}(\boldsymbol{\bar{\pi}}^{i_{1:m-1}}, \hat{\pi}^{i_m}) = \mathbb{E}_{s\sim\rho_{\boldsymbol{\pi}},\, \boldsymbol{a}^{i_{1:m-1}}\sim \boldsymbol{\bar{\pi}}^{i_{1:m-1}},\, a^{i_m}\sim \hat{\pi}^{i_m}}\big[ A_{\boldsymbol{\pi}}^{i_m}(s, \boldsymbol{a}^{i_{1:m-1}}, a^{i_m}) \big].$$

This definition is just a small step beyond the multi-agent advantage: here, agent $i_m$ reacts to the others not with a specific action, but with a specific policy $\hat{\pi}^{i_m}$.
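Before using this machinery, it is worth convincing ourselves that the decomposition lemma really holds. The following is a minimal NumPy sketch (entirely our own toy construction, not code from the paper) that checks the lemma on a random joint Q-table for a single state, with the agents updated in the fixed order $1,\dots,n$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4                       # toy sizes (assumptions)

# Toy joint values Q_pi(s, a^1, ..., a^n) for one fixed state s,
# and a product policy pi(a | s) = prod_i pi^i(a^i | s).
Q = rng.normal(size=(n_actions,) * n_agents)
pi = rng.dirichlet(np.ones(n_actions), size=n_agents)   # pi[i, a] = pi^i(a | s)

def Q_subset(actions):
    """Q_pi(s, a^{1:m}): fix agents 1..m to `actions`, average out the rest."""
    q = Q[tuple(actions)]
    for i in range(n_agents - 1, len(actions) - 1, -1):
        q = q @ pi[i]                            # marginalise agent i's action
    return float(q)

def multi_adv(prev_actions, actions):
    """Multi-agent advantage of `actions`, given `prev_actions` already taken."""
    return Q_subset(prev_actions + actions) - Q_subset(prev_actions)

# Multi-Agent Advantage Decomposition Lemma, checked on a random joint action.
a = list(rng.integers(n_actions, size=n_agents))
joint_advantage = multi_adv([], a)
sequential_sum = sum(multi_adv(a[:j], [a[j]]) for j in range(n_agents))
assert np.isclose(joint_advantage, sequential_sum)
print(joint_advantage, sequential_sum)           # identical up to float error
```

The check passes because the sum of sequential advantages telescopes into the joint advantage, which is exactly the content of the lemma.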
Fortunately, in this "bigger" setting, another decomposition lemma holds. Define $C = \frac{4\gamma \max_{s, \boldsymbol{a}} |A_{\boldsymbol{\pi}}(s, \boldsymbol{a})|}{(1-\gamma)^2}$. Then

$$\mathcal{J}(\boldsymbol{\bar{\pi}}) \geq \mathcal{J}(\boldsymbol{\pi}) + \sum\limits_{m=1}^{n}\Big[ L_{\boldsymbol{\pi}}^{i_{1:m}}(\boldsymbol{\bar{\pi}}^{i_{1:m-1}}, \bar{\pi}^{i_m}) - C D_{\text{KL}}^{\text{max}}(\pi^{i_m}, \bar{\pi}^{i_m})\Big].$$

This lemma provides a lower bound on the performance of the agents' new joint policy, a lower bound that is decomposed among the agents. Such a decomposition lets the agents, one by one, improve the guarantee on the performance of the next joint policy. It is tailor-made for the *sequential update scheme*: the agents update their policies, in turn, to solve

$$\bar{\pi}^{i_m} = \text{argmax}_{\hat{\pi}^{i_m}} \ L_{\boldsymbol{\pi}}^{i_{1:m}}(\boldsymbol{\bar{\pi}}^{i_{1:m-1}}, \hat{\pi}^{i_m}) - C D_{\text{KL}}^{\text{max}}(\pi^{i_m}, \hat{\pi}^{i_m}).$$

As the maximising update is at least as good as no update at all (for which the above objective equals zero), every agent guarantees that

$$L_{\boldsymbol{\pi}}^{i_{1:m}}(\boldsymbol{\bar{\pi}}^{i_{1:m-1}}, \bar{\pi}^{i_m}) - C D_{\text{KL}}^{\text{max}}(\pi^{i_m}, \bar{\pi}^{i_m}) \geq 0.$$

Together with the decomposition inequality above, agents following this protocol achieve the **monotonic improvement property**:

$$\mathcal{J}(\boldsymbol{\bar{\pi}}) \geq \mathcal{J}(\boldsymbol{\pi}).$$

Hurrah!!! 🧨🎆 We've done it! We figured out how to make the agents improve the joint return, all with a small-update guarantee thanks to the $D_{\text{KL}}^{\text{max}}(\pi^{i_m}, \bar{\pi}^{i_m})$ penalty. You may wonder: if any order of updates in the sequential scheme guarantees monotonic improvement, which order should we use? Good question. In the paper, the order is drawn at random at every iteration; moreover, the resulting sequence of joint policies is shown to converge to a Nash equilibrium, where no agent can improve the return by changing its policy alone.

Unfortunately, just like trust region learning in single-agent RL, this exact protocol cannot be implemented in games with large state spaces. But don't worry! We can easily approximate it and boost it with neural networks, as we describe below.

## Deep Algorithms

Now you will learn about (and get excited by) the new state-of-the-art deep MARL algorithms: *Heterogeneous-Agent Trust Region Policy Optimization* (**HATRPO**) and *Heterogeneous-Agent Proximal Policy Optimization* (**HAPPO**).

### HATRPO

What we would like to do is make every agent $i_m$, one after another, solve

$$\bar{\pi}^{i_m} = \text{argmax}_{\hat{\pi}^{i_m}} \ L_{\boldsymbol{\pi}}^{i_{1:m}}(\boldsymbol{\bar{\pi}}^{i_{1:m-1}}, \hat{\pi}^{i_m}) - C D_{\text{KL}}^{\text{max}}(\pi^{i_m}, \hat{\pi}^{i_m}).$$

This is very similar to the objective of single-agent trust region learning, $L_{\pi}(\hat{\pi}) - C D_{\text{KL}}^{\text{max}}(\pi, \hat{\pi})$.
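Before diving into how each per-agent subproblem is solved in practice, here is a minimal Python sketch of the outer sequential-update loop that both HATRPO and HAPPO share. All names here (`agents`, `buffer`, `joint_advantage`, `update_agent`) are placeholders of ours, not the authors' code; the compound factor `M` is derived in the HATRPO section below.

```python
import random

def sequential_update(agents, buffer, joint_advantage, update_agent):
    """One iteration of the multi-agent trust-region scheme (sketch).

    agents:          list of per-agent policies
    buffer:          trajectories collected with the current joint policy
    joint_advantage: estimates of A_pi(s, a) for the samples in `buffer` (e.g. GAE)
    update_agent:    per-agent trust-region step; returns the ratio
                     new_pi(a^i | s) / old_pi(a^i | s) on the buffer samples
    """
    order = random.sample(range(len(agents)), len(agents))  # random permutation
    M = joint_advantage.copy()       # compound factor, starts as the joint advantage
    for i in order:
        ratio_i = update_agent(agents[i], buffer, M)
        M = M * ratio_i              # pass the updated agents' ratios on to the next
    return agents
```

The only genuinely multi-agent ingredient is the running factor `M`; everything inside `update_agent` is ordinary single-agent TRPO or PPO machinery, as we now derive.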
In the case of neural network policies, this kind of objective is solved approximately through the constrained objective of TRPO:

$$\begin{aligned} \bar{\pi} &= \text{argmax}_{\hat{\pi}} \ L_{\pi}(\hat{\pi}), \ \text{s.t.} \ \overline{D}_{\text{KL}}(\pi, \hat{\pi}) \leq \delta \\ &= \text{argmax}_{\hat{\pi}} \ \mathbb{E}_{s\sim\rho_{\pi}, a\sim\pi}\Big[ \frac{\hat{\pi}(a|s)}{\pi(a|s)} A_{\pi}(s, a)\Big], \ \text{s.t.} \ \overline{D}_{\text{KL}}(\pi, \hat{\pi}) \leq \delta, \end{aligned}$$

where $\delta$ is a small constraint parameter controlling the size of the update. The key to making this objective tractable is that the distribution over which the expectation is taken comes entirely from the "old" policy $\pi$, so we can estimate the objective from already collected data and differentiate it automatically. Unfortunately, our approximate multi-agent trust-region objective, written out,

$$\mathbb{E}_{s\sim\rho_{\boldsymbol{\pi}},\, \boldsymbol{a}^{i_{1:m-1}}\sim\boldsymbol{\bar{\pi}}^{i_{1:m-1}},\, a^{i_m}\sim\hat{\pi}^{i_m}}\big[ A^{i_m}_{\boldsymbol{\pi}}(s, \boldsymbol{a}^{i_{1:m-1}}, a^{i_m}) \big], \ \text{s.t.} \ \overline{D}_{\text{KL}}(\pi^{i_m}, \hat{\pi}^{i_m}) \leq \delta,$$

involves not only the old and the candidate policies $\pi^{i_m}$ and $\hat{\pi}^{i_m}$, but also the just-updated joint policy of agents $i_{1:m-1}$. 😭 So hard!!!

Not really! We can leverage importance sampling to make the estimation of this objective feasible. Recall that the agents $i_{1:m-1}$ have already made their updates. Agent $i_m$ is therefore free to compute the ratio

$$\frac{ \hat{\pi}^{i_m}(a^{i_m}|s)}{ \pi^{i_m}(a^{i_m}|s)} \cdot \frac{ \boldsymbol{\bar{\pi}}^{i_{1:m-1}}(\boldsymbol{a}^{i_{1:m-1}}|s) }{ \boldsymbol{\pi}^{i_{1:m-1}}(\boldsymbol{a}^{i_{1:m-1}}|s) }.$$

So let the agent compute it and plug in what it has, namely the data coming from the old joint policy $\boldsymbol{\pi}$. It turns out that the multi-agent trust-region objective can be equivalently written as

$$\mathbb{E}_{s\sim\rho_{\boldsymbol{\pi}},\, \boldsymbol{a}\sim\boldsymbol{\pi} }\Big[ \frac{ \hat{\pi}^{i_m}(a^{i_m}|s)}{ \pi^{i_m}(a^{i_m}|s)} \cdot \frac{ \boldsymbol{\bar{\pi}}^{i_{1:m-1}}(\boldsymbol{a}^{i_{1:m-1}}|s) }{ \boldsymbol{\pi}^{i_{1:m-1}}(\boldsymbol{a}^{i_{1:m-1}}|s) } A_{\boldsymbol{\pi}}(s, \boldsymbol{a}) \Big], \ \text{s.t.} \ \overline{D}_{\text{KL}}(\pi^{i_m}, \hat{\pi}^{i_m}) \leq \delta.$$

Yes, $A_{\boldsymbol{\pi}}(s, \boldsymbol{a})$ is the joint advantage function. The agents don't have to train special critics to compute multi-agent advantages; all they need is a joint advantage estimator, such as GAE. Furthermore, for brevity, we can write

$$M_{\boldsymbol{\pi}}(s, \boldsymbol{a}) = \frac{ \boldsymbol{\bar{\pi}}^{i_{1:m-1}}(\boldsymbol{a}^{i_{1:m-1}}|s) }{ \boldsymbol{\pi}^{i_{1:m-1}}(\boldsymbol{a}^{i_{1:m-1}}|s) } A_{\boldsymbol{\pi}}(s, \boldsymbol{a}),$$

which transforms our problem into the well-known TRPO objective! 🎆🎇🎈

$$\mathbb{E}_{s\sim\rho_{\boldsymbol{\pi}},\, \boldsymbol{a}\sim\boldsymbol{\pi} }\Big[ \frac{ \hat{\pi}^{i_m}(a^{i_m}|s)}{ \pi^{i_m}(a^{i_m}|s)} M_{\boldsymbol{\pi}}(s, \boldsymbol{a}) \Big], \ \text{s.t.} \ \overline{D}_{\text{KL}}(\pi^{i_m}, \hat{\pi}^{i_m}) \leq \delta.$$
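To make this concrete, here is a minimal PyTorch-style sketch (our own illustration, with assumed tensor names) of how agent $i_m$ could estimate the surrogate above from a batch of old-policy data. Here `M` is the joint advantage already multiplied by the ratios of the previously updated agents:

```python
import torch

def hatrpo_surrogate(logp_new, logp_old, M):
    """Monte-Carlo estimate of E[(hat_pi^{i_m}(a|s) / pi^{i_m}(a|s)) * M(s, a)].

    logp_new: log-probs of the buffer actions under the candidate policy (requires grad)
    logp_old: log-probs of the same actions under the old policy (stored in the buffer)
    M:        joint advantage times the ratios of agents i_{1:m-1}
    """
    ratio = torch.exp(logp_new - logp_old)   # hat_pi^{i_m}(a|s) / pi^{i_m}(a|s)
    return (ratio * M).mean()

# toy usage with random numbers, just to show the shapes involved
logp_old = torch.randn(128)
logp_new = logp_old + 0.01 * torch.randn(128)
M = torch.randn(128)
print(hatrpo_surrogate(logp_new, logp_old, M))
```

The gradient of this estimate with respect to $\theta^{i_m}$ is the $\boldsymbol{g}^{i_m}$ used in the update that follows.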
Having turned the multi-agent trust-region problem into the above objective, agent $i_m$ (with a neural-network policy parameterised by $\theta^{i_m}$) can maximise it by performing a TRPO step:

$$\theta^{i_m}_{\text{new}} = \theta^{i_m}_{\text{old}} + \alpha^j \sqrt{ \frac{2\delta}{(\boldsymbol{g}^{i_m})^{\top}(\boldsymbol{H}^{i_m})^{-1} \boldsymbol{g}^{i_m}}} (\boldsymbol{H}^{i_m})^{-1} \boldsymbol{g}^{i_m}.$$

Here, just like in TRPO, $\boldsymbol{g}^{i_m}$ is the gradient of $i_m$'s objective, and the matrix

$$\boldsymbol{H}^{i_m} = \nabla^{2}_{\theta^{i_m}}\overline{D}_{\text{KL}}\big( \pi^{i_m}_{\theta^{i_m}_{\text{old}}}, \pi^{i_m}_{\theta^{i_m}}\big)$$

is the Hessian of the average KL-divergence from the old policy. As you may know from TRPO, $\alpha \in (0,1)$ is a backtracking coefficient, and we choose the smallest power $j\in\mathbb{N}$ for which the update improves the objective estimated from the data. In this way, we enforce the update to be as good as we can get from the available data 🤩.

### HAPPO

You may not like the second-order differentiation that HATRPO requires: it is harder to code up and more computationally expensive, and sometimes we just want to implement and run an algorithm quickly. Because of such concerns, we also developed an implementation of multi-agent trust-region learning through proximal policy optimisation (PPO). As the constrained HATRPO objective has the same algebraic form as TRPO's, it can be implemented with the *clip objective*:

$$\mathbb{E}_{s\sim\rho_{\boldsymbol{\pi}},\, \boldsymbol{a}\sim\boldsymbol{\pi} }\Big[ \min\Big( \frac{ \hat{\pi}^{i_m}(a^{i_m}|s)}{ \pi^{i_m}(a^{i_m}|s)} M_{\boldsymbol{\pi}}(s, \boldsymbol{a}),\ \text{clip}\Big(\frac{ \hat{\pi}^{i_m}(a^{i_m}|s)}{ \pi^{i_m}(a^{i_m}|s)}, 1\pm \epsilon \Big) M_{\boldsymbol{\pi}}(s, \boldsymbol{a}) \Big)\Big].$$

The clip operator replaces the policy ratio with $1-\epsilon$ or $1+\epsilon$ when it falls below or above the interval $[1-\epsilon, 1+\epsilon]$; otherwise the ratio remains unchanged. For example, $\text{clip}(1.2, 1\pm 0.1) = 1.1$ and $\text{clip}(0.8, 1\pm 0.1) = 0.9$. This discourages large policy updates. The clip objective is differentiable with respect to the policy parameters, so all we have to do is initialise $\theta^{i_m} = \theta^{i_m}_{\text{old}}$ and repeat the following a few times:

$$\boldsymbol{g}^{i_m}_{\text{HAPPO}} \gets \nabla_{\theta^{i_m}}L^{\text{HAPPO}}(\theta^{i_m}), \quad\quad \theta^{i_m} \gets \theta^{i_m} + \alpha \boldsymbol{g}^{i_m}_{\text{HAPPO}}.$$

So quick and convenient! 😍 A minimal sketch of this update appears after the results below.

### Domination (verified empirically)

And how do these algorithms perform? HAPPO by itself outperforms SOTA methods such as MAPPO, IPPO, and MADDPG. HATRPO, however, completely dominates all of them (including HAPPO) and establishes the new SOTA! 💥 Here are some plots from Multi-Agent MuJoCo, the hardest MARL benchmark.

![](https://i.imgur.com/Uo32wJN.png)

So yeah, the key takeaway of multi-agent trust-region learning is that a large number of agents does not have to imply conflicts in learning. Quite the opposite: a large team of learners willing to cooperate can get very far 🗻!
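As promised, here is a minimal PyTorch-style sketch of the HAPPO update for one agent $i_m$, reusing the assumed tensor names from the HATRPO sketch (again, our own illustration rather than the authors' code):

```python
import torch

def happo_clip_loss(logp_new, logp_old, M, eps=0.2):
    """Negative clipped surrogate for agent i_m (a loss to minimise)."""
    ratio = torch.exp(logp_new - logp_old)                    # pi_new / pi_old
    unclipped = ratio * M
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * M    # the clip operator
    return -torch.min(unclipped, clipped).mean()

# toy usage: a few gradient steps on random data (illustration only)
policy = torch.nn.Linear(8, 4)                    # stand-in for i_m's policy head
optimiser = torch.optim.Adam(policy.parameters(), lr=3e-4)
obs = torch.randn(256, 8)
actions = torch.randint(0, 4, (256,))
M = torch.randn(256)                              # would come from GAE x prior ratios
idx = torch.arange(256)
with torch.no_grad():
    logp_old = torch.log_softmax(policy(obs), dim=-1)[idx, actions]
for _ in range(10):
    logp_new = torch.log_softmax(policy(obs), dim=-1)[idx, actions]
    loss = happo_clip_loss(logp_new, logp_old, M)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```

In the full algorithm, `M` would then be multiplied by agent $i_m$'s final ratio and passed to the next agent in the order, exactly as in the sequential-update sketch earlier.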
Are you wondering how they can do it *safely*? Read our next article! Thanks for reading this one; I (Kuba) am really happy about your interest in MARL, and so are my co-authors: Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, and Yaodong Yang.
