# ICLR Reviews

## Global Response

### Experiment Protocol

#### Sample Complexity Experiments

In our first experiment, we fix the patch dimension $d$ at 20 and vary the number of patches $k$ across the range $[10, 15, 20, 25, 30]$. For each $(k, d)$ pair, we plot the sample complexity for both CNNs and LCNs. We evaluate the sample complexity via the following steps (a Python sketch of this procedure is given at the end of this global response):

1. **Target loss evaluation:** We compute the optimal loss based on the ground truth and add a fixed tolerance of $0.03$ to establish the target loss.
2. **Determining the sample range:** Through trial and error, we determine that a maximum of $1000$ samples is sufficient for any model across all $k$ values.
3. **Binary search:** To find the minimum number of samples required to reach the target loss, we perform a binary search. In each step, we run a grid search over the learning rates for the weights $[10^{-1}, 10^{-2}, 10^{-3}]$ and the biases $[10^{-2}, 10^{-3}, 10^{-4}]$, and select the model with the lowest test error.
4. **Repetitions for reliability:** We repeat steps 1-3 five times and plot the mean and standard deviation of the sample complexities.

![image](https://hackmd.io/_uploads/HkUvd7dET.png)
![image](https://hackmd.io/_uploads/r1KtF7uEa.png)

Recalling the theoretical sample complexity of $O(k+d)$ for CNNs, and of $O(k(k+d))$ (upper bound) and $\Omega(kd)$ (lower bound) for LCNs, our observations align well with these expectations. We note a linear dependence on $d$ for both CNNs and LCNs, consistent with the theory. Furthermore, LCNs require significantly more samples than CNNs (about 10 to 20 times more), which corresponds to the theoretical multiplicative $d$ factor in LCNs' sample complexity.

---

In our second experiment, we set the number of patches $k$ at 20 and vary the patch dimension $d$ across the range $[10, 15, 20, 25, 30]$. We repeat the same steps (1-4) for this setup.

![image](https://hackmd.io/_uploads/Sk4VKm_Va.png)
![image](https://hackmd.io/_uploads/S1CNKQuEa.png)

We observe that LCNs exhibit a linear dependence on $k$, while CNNs show a sublinear dependence on $k$, aligning with the theoretical bounds. This provides evidence for our claim that LCNs need to learn each of the $k$ patches independently, whereas CNNs benefit from weight sharing. As in the first experiment, LCNs require approximately 20 times more samples than CNNs, reflecting the multiplicative $k$ factor in their sample complexity.
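For concreteness, the following is a minimal sketch of the binary-search procedure described in steps 1-4 above. The helpers `train_and_eval` and `optimal_test_loss` are hypothetical stand-ins for the actual training and evaluation routines (they are not part of our released code); the learning-rate grids, tolerance, and sample cap mirror the values quoted in the protocol.

```python
import itertools
import numpy as np

# Hypothetical helpers (not part of our code): `train_and_eval(model, n, lr_w, lr_b)`
# should train the given model type on `n` fresh samples and return its test loss, and
# `optimal_test_loss()` should return the loss of the ground-truth optimal predictor.

WEIGHT_LRS = [1e-1, 1e-2, 1e-3]
BIAS_LRS = [1e-2, 1e-3, 1e-4]
TOLERANCE = 0.03
MAX_SAMPLES = 1000

def best_test_loss(model, n, train_and_eval):
    """Step 3 (inner loop): grid-search the learning rates and keep the lowest test loss."""
    return min(
        train_and_eval(model, n, lr_w, lr_b)
        for lr_w, lr_b in itertools.product(WEIGHT_LRS, BIAS_LRS)
    )

def sample_complexity(model, train_and_eval, optimal_test_loss):
    """Steps 1-3: binary-search the smallest n whose best test loss reaches the target."""
    target = optimal_test_loss() + TOLERANCE          # step 1
    lo, hi = 1, MAX_SAMPLES                           # step 2
    while lo < hi:
        mid = (lo + hi) // 2
        if best_test_loss(model, mid, train_and_eval) <= target:
            hi = mid        # mid samples suffice; try fewer
        else:
            lo = mid + 1    # mid samples are not enough
    return lo

def mean_and_std(model, train_and_eval, optimal_test_loss, repeats=5):
    """Step 4: repeat the search and report the mean and standard deviation."""
    runs = [sample_complexity(model, train_and_eval, optimal_test_loss)
            for _ in range(repeats)]
    return float(np.mean(runs)), float(np.std(runs))
```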
## Reviewer 1

**Formal Response:** We genuinely thank the reviewer for their positive evaluation and feel encouraged by their recognition of our work's significance, originality, and clarity. We now address their comments below.

**Comment 1: Can experiments be added that validate the theory?**

Answer: We agree with the reviewer that empirical validation of our bounds can help connect them to practice. We have added detailed experimental protocols and results in the supplementary material, and we plan to incorporate them more fully in the next version of our manuscript.

**Comment 2: Much of the theoretical development is relegated to the appendix.**

Answer: We agree with the reviewer's concern about the placement of our theoretical development. In response, we will include a more comprehensive proof outline in the forthcoming revision of our paper. Additionally, we have provided a concise outline of the proof of Theorem 6.1 in an [official comment](https://openreview.net/forum?id=AfnsTnYphT&noteId=zesOXLpzPT), which may be of interest to the reviewer.

**Comment 3: Notational inconsistency in vector indexing on page 4.**

Answer: We thank the reviewer for pointing out this notational inconsistency, and we will rectify it in our revised submission.

We sincerely hope that our responses have addressed the concerns raised by the reviewer.

## Review 2

[Link](https://openreview.net/forum?id=AfnsTnYphT&noteId=8RIt8f8W3U)

**Formal Response:** We are grateful for the reviewer's positive feedback, and we address their questions below.

**Comment 1: Can the framework be extended to the ReLU activation function, given its popularity over the LSA activation function?**

Answer: We note that the LSA activation function can be written as the difference of two ReLU activation functions, $\phi_b(x) = \text{ReLU}(x-b) - \text{ReLU}(-x-b)$. This decomposition allows us to adapt our framework to ReLU through the following architectural modifications:

1. We add an additional hidden node for each patch in LCNs and CNNs, and an additional $k$ hidden nodes for FNNs.
2. We include a bias term for each patch.
3. We fix the second layer to $(-1, 1, \ldots, -1, 1) \in \mathbb{R}^{2d}$.

The idea is that the two hidden nodes corresponding to each patch learn $\mathbf{w}^\star$ and $-\mathbf{w}^\star$ respectively. Together with the bias and the fixed $\pm 1$ values in the second layer, we can simulate the thresholding effect of the LSA activation function.

We also note that the LSA activation function, also known as the "soft-thresholding function", is extensively used in high-dimensional sparse recovery problems (Section 18.2 of [Hastie]). Since our task involves recovering the sparse mean vector $\mathbf{w}^\star$, this justifies our use of the LSA activation function.
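As a quick, purely illustrative sanity check (not taken from the paper), the decomposition above can be verified numerically against the usual soft-thresholding form:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def lsa(x, b):
    """LSA written as a difference of two ReLUs: ReLU(x - b) - ReLU(-x - b)."""
    return relu(x - b) - relu(-x - b)

def soft_threshold(x, b):
    """Standard soft-thresholding: shrink x toward zero by b."""
    return np.sign(x) * np.maximum(np.abs(x) - b, 0.0)

x = np.linspace(-3.0, 3.0, 1001)
assert np.allclose(lsa(x, b=0.5), soft_threshold(x, b=0.5))
```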
**Comment 2: Why is the sample complexity of LCNs different in Sections 6 and 7? Is this due to learning $k$ "nearly-independent" and $k$ "independent" subtasks respectively?**

Answer: Our conjecture is that the difference between the LCN upper bound of $O(k(k+d))$ and the LCN lower bound of $\Omega(kd)$ stems primarily from the computational efficiency of the algorithms employed. The upper bound is derived for a computationally efficient (polynomial-time) equivariant gradient-descent-style algorithm. In contrast, the lower bound is derived from minimax principles and applies to both computationally efficient and inefficient equivariant algorithms. Computationally inefficient algorithms generally have lower sample complexity than their efficient counterparts, which possibly explains the difference in sample complexity.

**Comment 3: Are there settings where the sample complexities of FCNs, LCNs, and CNNs are similar?**

Answer: There are indeed scenarios where the sample complexities of FCNs, LCNs, and CNNs converge. These include:

1. Consider the setting where the signal can only be present in a smaller subset of $\tilde{k} < k$ patches, implying that the label-determining signal is "less mobile". In this case, our analysis can be readily extended to show that the sample complexities of FCNs, LCNs, and CNNs scale approximately as $\tilde{k}kd$, $\tilde{k}d$, and $d$ respectively. Therefore, as $\tilde{k}$ approaches 1, LCNs and CNNs will perform similarly, and FCNs' sample complexity will decrease, despite the number of trainable parameters remaining unchanged.
2. When $d$ is relatively large compared to $k$, that is, when the signal is not too sparse, our bounds show that the advantages of CNNs and LCNs over FNNs become less pronounced. For instance, if we choose $k$ to be a constant $c$, then the sample complexities of FCNs, LCNs, and CNNs scale as $c^2 d$, $cd$, and $d$ respectively, so the three models have similar sample efficiency.

We hope that our explanations addressed the reviewer's inquiries comprehensively.

## Review 3 - Summary

[Link](https://openreview.net/forum?id=AfnsTnYphT&noteId=OtjifWU7VA)

**Formal Response:** We thank the reviewer for their valuable comments and appreciate their insightful feedback. We now take this opportunity to address their queries below.

**Comment 1: Can experiments be added that validate the analysis?**

Answer: We completely agree with the reviewer's suggestion that empirical validation would strengthen the credibility of our theoretical analysis. We request the reviewer to refer to the newly added supplementary material, which includes the detailed experimental protocol and results. We also plan to incorporate them in the forthcoming revision.

**Comment 2: Can our framework also provide bounds for transformers, given their prevalence in machine learning?**

Answer: Regarding the potential application of our framework to transformers, we can consider a transformer architecture where each input patch corresponds to an input token. If we assume that the key $W_K$, query $W_Q$, and value $W_V$ matrices are initialized using a standard isotropic Gaussian distribution, then this transformer, trained with SGD, is equivariant to the application of the same $U \in \mathcal{O}(d)$ to each patch of the input. Ignoring the issues of semi-metricness, this leads to an $\Omega(\sigma^2 d)$ lower bound. The upper bound, however, may be more challenging, as it requires a gradient descent analysis of the softmax-based attention mechanism. Nonetheless, we fully agree with the reviewer that this is an important direction for future work.

**Comment 3: Misplaced QED symbol after the conclusion.**

Answer: We thank the reviewer for pointing out the typo, and we will fix it in the next revision.

We sincerely hope that our responses clarified the questions raised by the reviewer.

## Review 4 - Summary

[Link](https://openreview.net/forum?id=AfnsTnYphT&noteId=LU7mrnXNe7)

**Formal Response:** We thank the reviewer for their thorough and insightful comments, and we address the issues raised below.

**Comment 1: "Novelty in our work over Wang et al."**

Answer: We agree with the reviewer's observation on our use of global equivariance for FCNs, as in Li et al. and Wang et al., and local equivariance for LCNs, following Wang et al. Our work, however, extends these techniques to a more realistic task, which yields justifications for colloquial intuitions as well as practical insights. Moreover, given the non-rotationally-invariant nature of our task, our novel analysis diverges from the standard information-theoretic approaches used in Li et al. and Wang et al. Additionally, our use of gradient descent offers important theoretical advantages over an ERM analysis: it establishes the separation for computationally efficient equivariant algorithms, and it circumvents the potential non-equivariance issues in ERM. We elaborate on these aspects below.

*1. Realistic task and practical insights*

We reemphasize the realistic nature of the task, mirroring real-world image classification, where outputs are determined by local signals embedded within uninformative noise. Specifically, we represent the label-determining signals by $\pm \mathbf{w}^\star \in \mathbb{R}^d$ against a backdrop of i.i.d. Gaussian noise with variance $\sigma$, and the $k$ patches denote the potential locations to which the signal can translate.
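For intuition, a minimal sketch of this kind of patch-based data model is given below. The exact construction in the paper (the DSD distribution) may differ in its details; in particular, the label convention and the placement of noise in the signal patch are illustrative assumptions here, not the paper's definition.

```python
import numpy as np

def sample_patch_task(n, k, d, noise_std, w_star, rng=None):
    """Illustrative data model: each input has k patches of dimension d filled with
    Gaussian noise; one uniformly chosen patch additionally carries the signal
    y * w_star, where the label y is +1 or -1 with equal probability.
    (Assumed conventions; not the paper's exact DSD construction.)"""
    rng = np.random.default_rng(rng)
    y = rng.choice([-1.0, 1.0], size=n)                   # labels
    X = noise_std * rng.standard_normal((n, k, d))        # noise in every patch
    signal_patch = rng.integers(0, k, size=n)             # where the signal lands
    X[np.arange(n), signal_patch] += y[:, None] * w_star  # plant the signal
    return X.reshape(n, k * d), y

# Example usage with a unit-norm signal concentrated on one coordinate of the patch.
k, d = 20, 20
w_star = np.zeros(d)
w_star[0] = 1.0
X, y = sample_patch_task(n=100, k=k, d=d, noise_std=1.0, w_star=w_star, rng=0)
```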
In this context, our analysis rigorously brings forth the insight that FCNs incur a threefold cost: a multiplicative factor of $k$ due to the need to learn the $k$ patches independently, a cost of $k$ for isolating each patch, and a cost of $d$ for learning the sparse signal. LCNs avoid the isolation cost, while CNNs only incur the cost of $d$ to learn the sparse signal.

Our analysis offers another insight in the setting where the signal may only appear in a smaller subset of $\tilde{k} < k$ patches, implying that the label-determining signal is "less mobile". Our method can be readily extended to show that the sample complexities of FCNs, LCNs, and CNNs scale approximately as $\tilde{k}kd$, $\tilde{k}d$, and $d$ respectively. As $\tilde{k}$ nears 1, the sample complexities of LCNs and CNNs converge, and the sample complexity of FCNs decreases, even though the number of trainable parameters remains unchanged. Furthermore, we can also infer from our bounds that when $d$ is relatively large compared to $k$, that is, when the signal is not "very sparse", the advantages of CNNs and LCNs over FNNs become less pronounced. Additionally, our analysis proves the intuitive fact that lower noise levels uniformly enhance the sample efficiency of all models.

In contrast, the tasks defined in Li et al., $f(\mathbf{x}) = \sum_{i=1}^d x_i^2 - \sum_{i=d+1}^{2d} x_i^2$, and in Wang et al., $g(\mathbf{x}) = (\sum_{i=1}^d x_{2i}^2 - x_{2i+1}^2) (\sum_{i=d+1}^{2d} x_{2i}^2 - x_{2i+1}^2)$, do not possess the necessary structure to yield the insights mentioned above. Specifically, they cannot show how the sample complexity changes as the degree of locality and translation invariance in the input varies, nor can they lay out the conditions on the input under which differences between CNNs, LCNs, and FCNs are observed or absent. Furthermore, as highlighted in our paper, the workhorse for the FCN lower bound in both Li et al. (Lemma D.2) and Wang et al. (Appendix H.2) is the quadratic interaction between the first $d$ coordinates and the subsequent $d$ coordinates. This signifies an interaction phenomenon in their tasks, which is distinct from what we capture in our task. While this interaction is an interesting phenomenon, it is not the primary characteristic of the locality and translation invariance found in images. Nonetheless, we view incorporating signal-signal interaction into our framework as a promising avenue for future research.

*2. Technical challenges and significance of our analysis*

Our approach diverges from those used in Li et al. and Wang et al. because the marginal over $\mathcal{X}$ is not rotationally invariant. Specifically, this deviation precludes the use of Benedek-Itai-style bounds from Li et al., and we do not enjoy the semi-metricness of the $l_2$ or $0\text{-}1$ loss under an invariant distribution. Recognizing this, Li et al. have previously [noted](https://openreview.net/forum?id=uCY5MuAxcxU&noteId=eEtVx0qv-36) the need for new proof techniques for such distributions as an area for future exploration, and our work takes the first step in this direction.
To address the above challenge, we developed a novel reduction technique that leverages input-independent randomization and equivariance to show that both FCNs and LCNs must learn each patch independently of every other patch. This technique not only aids the computation, but also provides a theoretical justification for the intuition behind the '$k$ cost' associated with learning each patch independently. Even though the reduced problem is relatively simpler, it still does not enjoy the semi-metricness of the $l_2$ loss function. To overcome this issue, we introduce a novel variant of Fano's Lemma for input-independent randomized algorithms, which allows us to derive the lower bounds without requiring semi-metricness to hold across the entire space.

We respectfully disagree with the reviewer's perspective on the significance of our methods. First, our reduction technique through randomization, along with the tailored variant of Fano's Lemma, has broad applicability, and we believe it should be of independent mathematical interest. Additionally, given the widespread use of Gaussian Mixture Models (GMMs), our techniques could offer valuable tools for researchers in this area. Lastly, our techniques could serve as a foundation for future extensions of our work, such as incorporating the aforementioned interaction effects.

*3. Benefits of gradient descent analysis over ERM*

We acknowledge the reviewer's observation regarding the simplicity of our gradient descent analysis on a one-hidden-layer neural network with two iterations. However, our choice of gradient descent over an ERM-style argument was deliberate, for two key theoretical reasons: (1) The gradient descent analysis demonstrates a sample complexity separation between FCNs, LCNs, and CNNs for computationally efficient (polynomial-time) equivariant algorithms. This distinction is crucial because, while a separation may exist for computationally inefficient algorithms, it might disappear under constraints of computational efficiency. Given that models in practice are trained using efficient algorithms, we believe establishing a separation for efficient algorithms is theoretically important. (2) It is important to ensure that both the upper and lower bounds are derived for equivariant algorithms. Having lower bounds for equivariant algorithms and upper bounds for non-equivariant ones does not lead to a valid sample complexity separation, as non-equivariant algorithms could potentially be more sample-efficient than their equivariant counterparts. Since the equivariance of ERM is not clearly established, and Wang et al. do not discuss this aspect in their work, we believe a gradient descent analysis is more appropriate.

**Comment 2: "Shouldn't the current FCN lower bound hold for any width? Why is the width of the FCN set to $k$?"**

Answer: Our current lower bound may not trivially hold for any width. Unlike the tasks in Li et al. and Wang et al., the lack of rotational invariance of the marginal over $\mathcal{X}$ means that we cannot assume that the $l_2$ loss function is a semimetric on the space of all functions. This limitation prevents us from applying the standard Fano's Lemma as in Wang et al. Consequently, we developed a variant of Fano's Lemma that only requires the semimetric property to hold when two out of three functions are from the set of "hard instances". We had to explicitly establish this semimetric property for the underlying FCN function class with width equal to $k$.
Therefore, we cannot immediately say that the lower bound holds for all widths. We set the width of the FCNs to $k$ so that, like LCNs and CNNs, FCNs could also benefit from the denoising capability of the $k$ non-linear filters.

**Comment 3: "Why is the norm of $\mathbf{w}_i$ at most 1?"**

Answer: We imposed the constraint $\|\mathbf{w}_i\| \le 1$ primarily to simplify our analysis. We use it to prove the technical requirement of our Fano's Lemma, namely that the semimetric property should hold when two out of three functions belong to the set of "hard instances". We particularly selected the constant $1$ to enable the filters to capture the unit-norm signal $\mathbf{w}^\star$. While this choice might appear somewhat artificial, it does not impact the main takeaways of our theory.

**Comment 4: "Can the proof outline for Step 1 of Theorem 6.1 be elaborated more technically?"**

Answer: We appreciate the reviewer's request, and we will include a more technical explanation in our revised draft. We now outline Step 1 of the proof, focusing on the key aspects while avoiding overly intricate details for better understanding.

We denote the DSD distribution by $P$ and its marginal for the $i^{\text{th}}$ patch by $Q_i$. We denote the mean of the positive marginal of each $Q_i$ by $\mathbf{\mu}_i$. The proof begins by noting that the risk of the algorithm, $\bar{\theta}_n$, on $\mathbf{U} \circ P$ is given by the average of the risks of $\bar{\theta}_n$ over each $\mathbf{U} \circ Q_i$. We then use equivariance to show that the risk of $\bar{\theta}_n$ over each $\mathbf{U} \circ Q_i$ is exactly the same,
$$
\mathbb{E}_{S^n \sim (\mathbf{U} \circ P)^n} \left[ R\left( \bar{\theta}_n, \mathbf{U} \circ Q_i \right) \right] = \mathbb{E}_{S^n \sim (\mathbf{U} \circ P)^n} \left[ R\left( \bar{\theta}_n, \mathbf{U} \circ Q_j \right) \right].
$$
This implies that even though we are "training" the model using $\mathbf{U} \circ P$, it is enough to "test" the model on $\mathbf{U} \circ Q_1$. Since this holds for all $\mathbf{U}$ and $\bar{\theta}_n$,
$$
\mathbb{E}_{S^n \sim P^n} \left[ R\left( \bar{\theta}_n, P \right) \right] \ge \inf_{\bar{\theta}_n} \sup_{\mathbf{U}} \mathbb{E}_{S^n} \left[ R\left( \bar{\theta}_n, \mathbf{U} \circ Q_1 \right) \right].
$$
We denote $R( \bar{\theta}_n, \mathbf{U} \circ Q_1 ) = R_1( \bar{\theta}_n )$ for brevity. We now begin a series of reductions. First, observe that the minimax risk can only decrease if the algorithm were instead supplied with patch-separated input data $\{S^{n_i} \sim \mathbf{U} \circ Q_i\}$. This is because the algorithm can simply ignore the patch id, which is strictly extra information. Note that, with high probability, $n_i \approx n/k$:
$$
\ge \inf_{\bar{\theta}_n} \sup_{\mathbf{U}} \mathbb{E}_{\{S^{n_i}\}} \left[ R_1( \bar{\theta}_n ) \right].
$$
Next, we provide the algorithm with $S^{n_1}$ and all the quantities required for it to internally construct $S^{n_2}, \ldots, S^{n_k}$; it is easy to see that this can only reduce the minimax risk. These quantities include the means $\mathbf{U}\mathbf{\mu}_2, \ldots, \mathbf{U}\mathbf{\mu}_k$ and presampled Gaussian noise vectors, label vectors, and $n_2, \ldots, n_k$. We refer to the presampled quantities by $\xi$:
$$
\ge \inf_{\bar{\theta}_n} \sup_{\mathbf{U}} \mathbb{E}_{\{S^{n_1}, \xi\}} \left[ R_1\left( \bar{\theta}_n(\{\mathbf{U}\mathbf{\mu}_2, \ldots, \mathbf{U}\mathbf{\mu}_k\}) \right) \right].
$$
The final step is to note that since $\mathbf{U}\mathbf{\mu}_1, \ldots, \mathbf{U}\mathbf{\mu}_k$ are mutually orthogonal, the only information about $\mathbf{U}\mathbf{\mu}_1$ that we obtain from $\mathbf{U}\mathbf{\mu}_2, \ldots, \mathbf{U}\mathbf{\mu}_k$ is that the former lies in a subspace of dimension $kd-k+1 \approx kd$. This information is too little and can effectively be ignored:
$$
\ge \inf_{\bar{\theta}_n} \sup_{\mathbf{U}} \mathbb{E}_{S^{n_1}} \left[ R_1( \bar{\theta}_n ) \right].
$$
We have now reduced learning $\mathbf{U} \circ P$ with $n$ samples to learning $\mathbf{U} \circ Q_1$ with $n/k$ samples. This marks the end of Step 1. In Step 2, we show that $n/k \approx \sigma^2 kd$, which proves the result.

We sincerely hope that our responses provided clarity on the points raised by the reviewer, particularly regarding the novelty and significance of our work.
