# 100 Day Challenge - ML Advanced Topics

## Paper Note

### [Neural Architecture Search: A Survey](https://arxiv.org/pdf/1808.05377.pdf)
###### `NAS`

**NAS Overview**
- All possible neural network architectures form a space.
- NAS searches through this space for the optimal architecture for a given dataset and task.
- NAS Abstract Overview ![](https://i.imgur.com/pUwYKgG.png)
- Three Major Components:
    - **Search Space**
    - **Search Strategy**
    - **Performance Estimation**

**Search Space**
- The search space defines which architectures can be represented in principle.
- Prior knowledge about a specific task helps to reduce the search space, but introduces human bias, preventing the search from finding novel architectures.
- Common Search Spaces:
    1. Chain-Structured Networks
    2. Multi-Branch Networks
    3. Cell/Block-Based Networks
- How to choose **meta-architectures**?

**Search Strategy**
- The search strategy details how to explore the search space.
- Convergence needs to be quick while avoiding local optima.
- Historical Search Strategies
    1. Random search
    2. Bayesian optimization
    3. Evolutionary methods
    4. Reinforcement learning (RL)
    5. Gradient-based methods
    6. Hierarchical search

**Performance Estimation Strategy**
- The process of estimating a found architecture's predictive performance on unseen data.
- The simplest way of doing this is to train the architecture on training data and evaluate its performance on validation data. -> Computationally expensive
- Solution 1: Lower fidelities of the actual performance
    - shorter training times
    - training on a subset of the data
    - lower-resolution images
    - fewer filters per layer
    - Drawback: introduces human bias and may change the ranking of architectures.
- Solution 2: Learning Curve Extrapolation
    - Extrapolate initial learning curves and terminate those predicted to perform poorly, to speed up the architecture search process.
- Solution 3: Network Morphisms
    - Initialize the weights of novel architectures based on weights of other architectures that have been trained before.
    - Allows search spaces without an inherent upper bound on the architecture's size.
    - Strict network morphisms can only make architectures larger and may thus lead to overly complex architectures.
- Solution 4: One-Shot Architecture Search
    - Treats all architectures as different subgraphs of a supergraph (the one-shot model) and shares weights between architectures that have edges of this supergraph in common.

---

### [DARTS: Differentiable Architecture Search](https://arxiv.org/pdf/1806.09055.pdf)
###### `NAS`

- Purpose: find a search strategy
- Method: continuous relaxation -> can apply gradient-based search
- Three Major Methods for NAS search:
    - Evolution
    - Reinforcement Learning
    - Continuous Relaxation (proposed in this paper)
- [Code](https://github.com/quark0/darts)
- Searches for a computation cell as the building block of the final architecture.
- A cell is a directed acyclic graph consisting of an ordered sequence of N nodes.
- Continuous Relaxation (a toy sketch follows at the end of this section): ![](https://i.imgur.com/I3YuH8j.png)
- Differentiable Architecture Search Algorithm: ![](https://i.imgur.com/wHvN5Sc.png)
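To make the continuous relaxation concrete, here is a minimal NumPy sketch of a DARTS-style mixed operation: the output of an edge is a softmax($\alpha$)-weighted sum of all candidate operations' outputs. The candidate ops below are toy stand-ins, not the convolution/pooling primitives used in the paper.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Toy candidate operations standing in for conv / pooling / identity / zero.
candidate_ops = [
    lambda x: x,                   # identity
    lambda x: np.maximum(x, 0.0),  # ReLU-like op
    lambda x: np.zeros_like(x),    # "zero" op (drops the edge)
]

def mixed_op(x, alpha):
    """DARTS-style continuous relaxation of one edge:
    a softmax over alpha weights the outputs of all candidate ops."""
    w = softmax(alpha)
    return sum(w_i * op(x) for w_i, op in zip(w, candidate_ops))

x = np.array([-1.0, 0.5, 2.0])
alpha = np.array([0.1, 1.5, -0.3])  # architecture parameters for this edge
print(mixed_op(x, alpha))
# After the search, the edge is discretized by keeping the op with the largest weight.
```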
---

### [Accelerating neural architecture search using performance prediction](https://arxiv.org/pdf/1705.10823.pdf)
###### `NAS`

- Learning curve-based approach
  > Human experts are quite adept at recognizing and terminating suboptimal model configurations by inspecting their partial learning curves. In this paper we seek to emulate this behavior and automatically identify and terminate subpar model configurations in order to speed up both meta-modeling and hyperparameter optimization methods.
- Find the learning curve of networks obtained through NAS search, and early-terminate training if the learning curve is not good.
- Some interesting related topics:
    - **Neural Network Performance Prediction**: how to predict performance *during* the training process?
    - Meta-modeling: the paper defines meta-modeling as an algorithmic approach for designing neural network architectures from scratch.
    - Hyperparameter Optimization: the paper defines hyperparameter optimization as an algorithmic approach for finding optimal values of design-independent hyperparameters such as learning rate and batch size, along with a limited search through the network design space.
- This research does NN performance prediction by modeling learning curves. How? ![](https://i.imgur.com/V5RfPQG.png)

---

### [Understanding and simplifying one-shot architecture search](http://proceedings.mlr.press/v80/bender18a/bender18a.pdf)
###### `NAS`

- Also focuses on NAS search strategies.
- Aims to understand weight sharing for one-shot architecture search.
- Can efficiently identify promising architectures without either hypernetworks or RL.
- Weight Sharing: rather than training thousands of separate models from scratch, one can train a single large network capable of emulating any architecture in the search space (a toy sketch follows after this section).
    - Ex. ![](https://i.imgur.com/5V9PCzF.png)
    - As the example above shows, we can apply either a 3x3 convolution, a 5x5 convolution, or a max-pooling layer at a particular position in the network.
    - Instead of training three separate models, we can train a single model containing all three operations (the one-shot model).
    - The size of the one-shot model grows only linearly with the number of choices.
    - It is only used to rank architectures in the search space.
- One-Shot Model working steps:
    1. Design a search space that allows us to represent a wide variety of architectures using a single one-shot model.
    2. Train the one-shot model to make it predictive of the validation accuracies of the architectures.
    3. Evaluate candidate architectures on the validation set using the pre-trained one-shot model.
    4. Re-train the most promising architectures from scratch and evaluate their performance on the test set.
- One-shot Diagram: ![](https://i.imgur.com/IjCpyGJ.png)
- Argues that NAS search only needs gradient descent; no hypernetwork or reinforcement learning is required.
- Main comparison: the SMASH paper.
- The one-shot model here is really very much simplified...
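A toy sketch of the weight-sharing idea above: the one-shot model contains every candidate operation at a given position, and a specific candidate architecture is scored by masking out the operations it does not use, with no retraining. The linear/ReLU ops here are simple stand-ins for the 3x3 conv / 5x5 conv / max-pool choices in the example.

```python
import numpy as np

rng = np.random.default_rng(0)
W_a = rng.normal(size=(4, 4))  # shared weights of candidate op A
W_b = rng.normal(size=(4, 4))  # shared weights of candidate op B

# Candidate ops at one position of the supergraph; all live inside the
# single one-shot model (toy stand-ins for real conv / pooling layers).
def op_a(x): return x @ W_a
def op_b(x): return x @ W_b
def op_pool(x): return np.maximum(x, 0.0)

def one_shot_forward(x, mask):
    """Evaluate one candidate architecture by zeroing out unused ops.
    mask is a binary vector, e.g. (1, 0, 0) keeps only op_a."""
    outs = [op_a(x), op_b(x), op_pool(x)]
    return sum(m * o for m, o in zip(mask, outs))

x = rng.normal(size=(2, 4))
# Rank candidate architectures with the shared weights, no retraining:
for mask in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:
    print(mask, one_shot_forward(x, mask).mean())
```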
---

### [SMASH: one-shot model architecture search through hypernetworks](https://arxiv.org/pdf/1708.05344.pdf)
###### `NAS`

- "HyperNet" approach: use a neural net to generate the weights of the main model.
- One-Shot HyperNet: candidates are not fully trained until ranking is done, to avoid high computational cost.
- Limitation: only considers connectivity patterns and units per layer. Cannot address other hyperparameters such as regularization, learning rate, weight initialization, or data augmentation.
- Training process:
  > At each training step, we randomly sample a network architecture, generate the weights for that architecture using a HyperNet, and train the entire system end-to-end through backpropagation.
- SMASH Algorithm: ![](https://i.imgur.com/qNgxzdv.png)
- Two core components:
    - Method for sampling architectures:
        - Uses a memory-bank view of feed-forward networks that permits sampling complex, branching topologies, and encodes said topologies as binary vectors. ![](https://i.imgur.com/iqzBJdR.png)
    - Method for sampling weights for a given architecture (HyperNet):
        - A HyperNet is a neural net used to parameterize the weights of another network, the main network.

---

### [Efficient architecture search by network transformation](https://pdfs.semanticscholar.org/65d5/5e12ce62d8f2fef6e450d3afb815af81a591.pdf)
###### `NAS`

- Reinforcement learning approach: use an RL agent as the meta-controller for growing the network depth or layer width with function-preserving transformations.
- Limitation: experiments only cover the architecture space of plain convolutional neural networks (no skip-connections, branching, etc.).
- In theory EAS can explore spaces beyond plain CNNs (with skip-connections, etc.), but this is not tested in the experiments.
- EAS RL-based meta-controller overview: ![](https://i.imgur.com/F89yUUc.png)
- Actor Networks:
    - Net2Wider Network: ![](https://i.imgur.com/KZAI3ei.png)
    - Net2Deeper Network: ![](https://i.imgur.com/hLAK2D8.png)

---

### [Understanding Locally Competitive Networks](https://arxiv.org/abs/1410.1165)
###### `Activation`

- Activation functions mentioned in this paper:
    - rectified linear (ReL (Glorot et al., 2011))
    - maxout (Goodfellow et al., 2013a)
    - LWTA (Srivastava et al., 2013)
- They depart from the conventional wisdom in that they are not continuously differentiable (and sometimes non-continuous) and are piecewise linear.
- Something good about them...
    - Can be trained faster and better than sigmoidal networks
    - Do not require unsupervised training for weight initialization (Glorot et al., 2011)
    - Better gradient flow (Goodfellow et al., 2013a)
    - Mitigation of catastrophic forgetting (Srivastava et al., 2013; Goodfellow et al., 2014)
- ![](https://i.imgur.com/wVCEXi5.png)
- Figure 1: Comparison of rectified linear units (ReLUs), local winner-take-all (LWTA), and maxout activation functions.
- This paper treats them as a 'Model of Models'.
- Locally Competitive Networks: neural networks with activation functions like rectified linear, maxout and LWTA are locally competitive: local competition among units in the network decides which parts of it get activated or trained for a particular input example (a small sketch of the three activations follows below).
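To make the comparison concrete, here is a small NumPy sketch of the three locally competitive activations, assuming the pre-activations of a layer are grouped into blocks of size k (the group size and input values are arbitrary).

```python
import numpy as np

def relu(z):
    # Rectified linear: each unit competes against a fixed zero activation.
    return np.maximum(z, 0.0)

def lwta(z, k):
    # Local winner-take-all: within each block of k units,
    # keep the maximum pre-activation and zero out the rest.
    z = z.reshape(-1, k)
    mask = z == z.max(axis=1, keepdims=True)
    return (z * mask).reshape(-1)

def maxout(z, k):
    # Maxout: each block of k linear units is reduced to its maximum,
    # so the output has 1/k as many units as the input.
    return z.reshape(-1, k).max(axis=1)

z = np.array([-1.0, 2.0, 0.5, 3.0, -0.2, -0.7])
print(relu(z))         # -> [0, 2, 0.5, 3, 0, 0]
print(lwta(z, k=2))    # -> [0, 2, 0, 3, -0.2, 0]
print(maxout(z, k=2))  # -> [2, 3, -0.2]
```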
---

### [Fast and Accurate Deep Network Learning by Exponential Linear Units](https://arxiv.org/abs/1511.07289)
###### `Activation`

- ELU: Exponential Linear Unit
- Formula: ![](https://i.imgur.com/ku2IJvf.png)
- Compared with other activation functions, e.g. rectified linear units (ReLUs), leaky ReLUs (LReLUs) and parametrized ReLUs (PReLUs):
    - Similarity:
        1. Can alleviate gradient vanishing via the identity for positive values.
    - Difference:
        1. In contrast to ReLUs, ELUs have negative values, which allows them to push mean unit activations closer to zero like batch normalization but with lower computational complexity.
        2. Mean shifts toward zero speed up learning by bringing the normal gradient closer to the unit natural gradient, because of a reduced bias shift effect.
        3. LReLUs and PReLUs have negative values too, but they do not ensure a noise-robust deactivation state.
        4. ELUs saturate to a negative value for smaller inputs and thereby decrease the forward-propagated variation and information.
        5. In experiments, ELUs lead not only to faster learning, but also to significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.

---

### [Group Normalization](https://arxiv.org/abs/1803.08494)
###### `Normalization`

- Group Normalization (GN) is proposed as an alternative to Batch Normalization in scenarios where a small batch size is required.
- Drawback of BN:
    - BN's error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. ![](https://i.imgur.com/7ingqpQ.png)
    - In applications that require a small batch size due to memory constraints, BN performance is limited.
- GN Basic Idea:
    - Divides the channels into groups and computes within each group the mean and variance for normalization.
    - Independent of batch size, and its accuracy is stable over a wide range of batch sizes.
- Other approaches to avoid the small-batch-size problem (a sketch comparing the normalization axes follows after this section):
    - [Layer Normalization (LN)](https://arxiv.org/abs/1607.06450)
    - [Instance Normalization (IN)](https://arxiv.org/abs/1607.08022)
- Formula: ![](https://i.imgur.com/EsbIBGh.png)
- Python (TensorFlow) Code:

```python
import tensorflow as tf

def GroupNorm(x, gamma, beta, G, eps=1e-5):
    # x: input features with shape [N, C, H, W]
    # gamma, beta: scale and offset, with shape [1, C, 1, 1]
    # G: number of groups for GN
    N, C, H, W = x.shape
    x = tf.reshape(x, [N, G, C // G, H, W])
    mean, var = tf.nn.moments(x, [2, 3, 4], keepdims=True)
    x = (x - mean) / tf.sqrt(var + eps)
    x = tf.reshape(x, [N, C, H, W])
    return x * gamma + beta
```
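Since GN is contrasted with BN, LN, and IN above, here is a plain-NumPy sketch (mirroring the TensorFlow code) in which the four methods differ only in which axes the mean and variance are computed over; the learnable gamma/beta scale-and-shift is omitted.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    # Normalize x over the given axes; all other axes keep independent statistics.
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x):     # per channel, over batch and spatial dims [N, H, W]
    return normalize(x, (0, 2, 3))

def layer_norm(x):     # per sample, over channels and spatial dims [C, H, W]
    return normalize(x, (1, 2, 3))

def instance_norm(x):  # per sample and channel, over spatial dims [H, W]
    return normalize(x, (2, 3))

def group_norm(x, G):  # per sample and channel group, over [C//G, H, W]
    N, C, H, W = x.shape
    x = x.reshape(N, G, C // G, H, W)
    x = normalize(x, (2, 3, 4))
    return x.reshape(N, C, H, W)

x = np.random.randn(2, 8, 4, 4)   # [N, C, H, W]
print(group_norm(x, G=4).shape)   # (2, 8, 4, 4); GN statistics never touch the N axis
```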
---

### [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)
###### `Normalization`

- Problem - internal covariate shift: the distribution of each layer's inputs changes during training, as the parameters of the previous layers change.
- Proposed solution: normalizing layer inputs for each training mini-batch.
- Benefits: higher learning rates, less careful initialization required.
- Internal Covariate Shift: the change in the distributions of internal nodes of a deep network in the course of training.
- Batch Normalization Transform: ![](https://i.imgur.com/JNdOZoo.png)
- Training a Batch-Normalized Network: ![](https://i.imgur.com/47SQcip.png)

---

### [Speeding up the Hyperparameter Optimization of Deep Convolutional Neural Networks](https://arxiv.org/abs/1807.07362)
###### `HYPERPARAMETER`

- Main idea: use a lower-dimensional representation of the original data to quickly identify promising areas in the hyperparameter space.
- Compares different search algorithms for hyperparameter optimization:
    - IIS (main proposed approach)
    - Random search
    - Tree of Parzen estimators (TPE)
    - Sequential model-based algorithm configuration (SMAC)
    - Genetic algorithm (GA)
- Argument: hyperparameter optimization problems usually have a low effective dimensionality: even though there is a significant number of hyperparameters, often only a subset of them has a measurable impact on the performance.
- This work focuses on CNNs.
- IIS Approach: ![](https://i.imgur.com/0LzEPGB.png)

---

### [A Walk with SGD](https://arxiv.org/pdf/1802.08770.pdf)
###### `GradientDescent`

- Study based on empirical observations.
- Studies the DNN loss surface along the trajectory of SGD.
- Discovery:
    - The loss interpolation between parameters before and after each training iteration's update is roughly convex with a minimum (valley floor) in between for most of the training.
- 'Bouncing between walls at a height' mechanism:
    - For most of the training update steps, SGD moves in valley-like regions of the loss surface by jumping from one valley wall to another at a height above the valley floor.
    - This helps SGD traverse larger distances for small batch sizes and large learning rates.
    - A small batch size injects noise, facilitating exploration. This mechanism is crucial for generalization.
- Question:
  > Currently there is little theory proving they help in improving generalization in DNNs. This raises the question of whether optimization algorithms that are designed with the goal of solving the aforementioned high dimensional problems also help in finding minima that generalize well, or put differently, what attributes allow optimization algorithms to find such good minima in the non-convex setting.
- SGD trajectory:
    1. The loss interpolation between parameters before and after each iteration's update is roughly convex with a minimum (valley floor) in between. SGD bounces off the walls of a valley-like structure at a height above the floor.
    2. Learning rate controls the height at which SGD bounces above the valley floor.
    3. Batch size controls gradient stochasticity, which facilitates exploration (visible from the larger parameter distance from initialization for small batch sizes).
    4. The valley floor along SGD's path has many ups and downs (barriers) which may hinder exploration. Thus using a large learning rate helps avoid encountering barriers along SGD's path by maintaining a large height above the valley floor (moving over the barriers instead of crossing them).
- Some discoveries from related papers:
    - The stochastic fluctuation in the stochastic differential equation simulated by SGD is governed by the ratio of learning rate to batch size.
    - Consider SGD as a diffusion process where SGD is running a Brownian motion in the parameter space. (interesting...)
- Difference:
  > Contrary to these claims, we find that interpolating the loss surface traversed by SGD on a per iteration basis suggests SGD almost never crosses any significant barriers for most of the training.

---

### [Neural Machine Translation and Sequence-to-sequence Models: A Tutorial](https://arxiv.org/abs/1703.01619)

#### Machine Translation

- Formalization:
  Source sentence: $F=f_1,\dots,f_J=f_1^{\mid F \mid}$
  Target sentence: $E=e_1,\dots,e_I=e_1^{\mid E \mid}$
  Any type of translation system can be defined as: $\hat{E}=mt(F)$
- Statistical machine translation: maximize the probability of $E$ given $F$, $P(E \mid F;\theta)$:
  $\hat{E}=\mathrm{argmax}_E \, P(E \mid F; \theta)$
  $\theta$ is learned from parallel corpora.

#### n-gram Language Models

- n-gram Model: the probability of a sentence $E=e_1^T$:
  $P(E)=P(|E|=T,e_1^T)=\prod_{t=1}^{T+1}P(e_t \mid e_1^{t-1})$
- Count-based n-gram Language Models: count the words/strings in the training data (a small sketch follows after this subsection).
  Maximum Likelihood Estimation: $P_{ML}(e_t \mid e_1^{t-1})=\frac{c_{prefix}(e_1^t)}{c_{prefix}(e_1^{t-1})}$, where $c_{prefix}(\bullet)$ is the count of the number of times this particular word string appeared at the beginning of a sentence in the training data.
  Only count within the previous $n-1$ window: $P(e_t \mid e_1^{t-1}) \approx P_{ML}(e_t \mid e_{t-n+1}^{t-1})$
  The $\theta$ in this model: $\theta_{e_{t-n+1}^t}=P(e_t \mid e_{t-n+1}^{t-1})$
- Dealing with word strings that have never appeared in the training corpus:
    - Smoothing probabilities
    - Standard: [Modified Kneser-Ney smoothing](https://arxiv.org/pdf/cmp-lg/9606011.pdf)
- Handling Unknown Words:
    - Assume a closed vocabulary
    - Interpolate with an unknown-words distribution
    - Add an \<unk\> word
- Other Topics:
    - Large-scale language modeling
    - Language model adaptation
    - Longer-distance count-based language models
    - Syntax-based language models
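A small Python sketch of the count-based maximum-likelihood estimate above, for a bigram model (n = 2) with plain dictionary counts; smoothing and \<unk\> handling are left out, and the toy corpus is made up.

```python
from collections import defaultdict

def train_bigram(corpus):
    """Maximum-likelihood bigram estimates:
    P_ML(e_t | e_{t-1}) = c(e_{t-1}, e_t) / c(e_{t-1})."""
    bigram_counts = defaultdict(int)
    context_counts = defaultdict(int)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            bigram_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return bigram_counts, context_counts

def p_ml(word, context, bigram_counts, context_counts):
    if context_counts[context] == 0:
        return 0.0  # unseen context; smoothing (e.g. Kneser-Ney) would fix this
    return bigram_counts[(context, word)] / context_counts[context]

corpus = ["the cat sat", "the cat ran", "a dog sat"]
bc, cc = train_bigram(corpus)
print(p_ml("cat", "the", bc, cc))  # 1.0
print(p_ml("sat", "cat", bc, cc))  # 0.5
```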
#### Log-linear Language Models

- Like n-gram models, calculate the probability of a particular word $e_t$ given a particular context $e_{t-n+1}^{t-1}$, but using the following procedure:
    1. Calculating features
    2. Calculating scores
    3. Calculating probabilities
- Features: $Context \rightarrow \phi(e_{t-n+1}^{t-1}) \rightarrow \mathbf{x} \in \mathbb{R}^{N}$, where $\mathbf{x}$ is a one-hot vector
- Scores: $\mathbf{s} = W\mathbf{x}+\mathbf{b}=\sum_{j:x_j\neq0}W_{\cdot,j}\,x_{j}+\mathbf{b}$, where $W \in \mathbb{R}^{|V| \times N}$ is a weight matrix and $\mathbf{b} \in \mathbb{R}^{|V|}$ is a bias vector
- Probabilities: $p_{j}=\frac{\exp(s_{j})}{\sum_{\tilde{j}}\exp(s_{\tilde{j}})} \Rightarrow \mathbf{p}=\mathrm{softmax}(\mathbf{s})$
- Model parameters learned from data:
  Loss function: $l(\varepsilon_{train}, \theta)=-\log P(\varepsilon_{train}\mid\theta)=-\sum_{E \in \varepsilon_{train}}\log P(E\mid\theta)$
  Per-word loss: $l(e_{t-n+1}^{t}, \theta)=-\log P(e_{t}\mid e_{t-n+1}^{t-1})$
  SGD update with learning rate $\eta$: $\theta \leftarrow \theta - \eta \frac{dl(e_{t-n+1}^{t}, \theta)}{d\theta}$

#### Neural Networks and Feed-forward Language Models

- A tri-gram model example (a NumPy sketch of this forward pass follows below): ![](https://i.imgur.com/SHvbbL8.png)
  $\mathbf{m} = \mathrm{concat}(M_{\cdot, e_{t-2}}, M_{\cdot,e_{t-1}})$
  $\mathbf{h}=\tanh(W_{mh}\mathbf{m}+\mathbf{b}_{h})$
  $\mathbf{s}=W_{hs}\mathbf{h}+\mathbf{b}_{s}$
  $\mathbf{p}=\mathrm{softmax}(\mathbf{s})$
  Loss function: $l = -\log(p_{e_{t}})$
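A minimal NumPy sketch of the tri-gram feed-forward model above, following the equations for $\mathbf{m}$, $\mathbf{h}$, $\mathbf{s}$, $\mathbf{p}$ directly; all sizes and the random weights are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, H = 10, 4, 8          # vocabulary size, embedding size, hidden size

M    = rng.normal(size=(D, V))      # word embedding matrix (columns = words)
W_mh = rng.normal(size=(H, 2 * D))  # hidden-layer weights
b_h  = np.zeros(H)
W_hs = rng.normal(size=(V, H))      # output (score) weights
b_s  = np.zeros(V)

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def trigram_ffn_lm(e_prev2, e_prev1):
    """p(e_t | e_{t-2}, e_{t-1}) for the feed-forward tri-gram model."""
    m = np.concatenate([M[:, e_prev2], M[:, e_prev1]])  # m = concat(M_{.,e_{t-2}}, M_{.,e_{t-1}})
    h = np.tanh(W_mh @ m + b_h)                         # h = tanh(W_mh m + b_h)
    s = W_hs @ h + b_s                                  # s = W_hs h + b_s
    return softmax(s)                                   # p = softmax(s)

p = trigram_ffn_lm(e_prev2=3, e_prev1=7)
loss = -np.log(p[2])   # per-word loss l = -log p_{e_t} for e_t = 2
print(p.sum(), loss)
```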
#### Recurrent Neural Network Language Models