---
type: slide
slideOptions:
  transition: 'slide'
---

<style>
:root {
  --r-background-color: #ffdee9;
  --r-main-font: Lato, Ubuntu;
  --r-main-font-size: x-large;
  --r-main-color: #222222;
  /* --r-heading-font-size: ; */
  --r-heading-color: #ff84ac;
  --r-code-font: Monolisa;
  --r-selection-color: #ff0077;
  --r-selection-background-color: #36e1e6aa;
}
.reveal code {
  color: #f07;
  background-color: #ff007744;
  border-radius: 10px;
}
.reveal pre code {
  color: #ccc;
  background-color: #2d2d2d;
  font-family: Hack;
  max-height: 500px;
}
.reveal {
  font-family: var(--r-main-font) !important;
}
</style>

<!-- .slide: data-background="#ff84ac" style=":#fff;" -->

# **Language-based Semantic Segmentation**

<div align=right> Samarth Bhatia </div>

---

# Open Vocabulary Scene Parsing

### What is it?

Model predictions are not limited to a fixed set of categories (`COCO`: 80 classes); instead they can come from a large open dictionary (`WordNet`: ~100,000 synsets).

----

Example: if the model has never seen `tricycle`, it should still give a plausible prediction such as `vehicle`.

They take each class in the `ADE20K` dataset and relate it to a synset (synonym set) from `WordNet`, ending up with 2019 unique synsets that form a DAG with `entity` as the common root.

----

![image-20220609200132741](https://i.ibb.co/pj25Kr0/image-20220609200132741.png =800x)
*<center> Part of the concept map created (leaves are specific objects, inner nodes are general concepts). The root is `entity`, since everything is an entity. </center>*

---

### Problem Settings

1. **Supervised**: tested on the 150 training classes; each pixel embedding is compared with all 150 concept embeddings and the highest-ranked concept is predicted.
2. **Zero-shot**: tested on unseen validation classes; concepts scoring above a threshold are taken as predictions (the threshold is determined before testing on 100 validation images).

---

### Framework overview

![image-20220609194718896](https://i.ibb.co/JdJPLMD/image-20220609194718896.png)

----

A max-margin loss is used to learn the embedding function $f(\cdot)$ that maps the concept space to the joint embedding space. They argue that since label retrieval is a ranking problem, negative labels should be introduced to push the scores of positive labels above those of negative labels.

Initially they also use a max-margin loss to learn the mapping $g(\cdot)$ from pixel feature space to the joint embedding space, but find that a softmax in the form of a triplet loss performs better:

----

$$
\begin{align}
\mathcal{L}_{image}(x_{i,j}) &= -\log\left(\dfrac{e^{S_{image}(f(y_{i,j}),\ g(x_{i,j}))}}{e^{S_{image}(f(y_{i,j}),\ g(x_{i,j}))} + \sum_{y'_{i,j}}{e^{S_{image}(f(y'_{i,j}),\ g(x_{i,j}))}}}\right) \\
\text{where } x_{i,j} &= \text{pixel features of the } (i,j)^{th} \text{ pixel} \\
y_{i,j} &= \text{label of the } (i,j)^{th} \text{ pixel} \\
y'_{i,j} &= \text{negative labels for the } (i,j)^{th} \text{ pixel}
\end{align}
$$
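
A minimal PyTorch-style sketch of this per-pixel loss, treating every other concept in the candidate set as a negative label (tensor names here are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def pixel_ranking_loss(pixel_emb, concept_emb, labels):
    """Softmax form of the per-pixel label-ranking loss (a sketch).

    pixel_emb:   (B, C, H, W) pixel embeddings g(x) in the joint space
    concept_emb: (K, C)       concept embeddings f(y) for all K candidate labels
    labels:      (B, H, W)    ground-truth concept index per pixel
    """
    # Score S_image(f(y), g(x)) between every pixel and every concept.
    scores = torch.einsum("bchw,kc->bkhw", pixel_emb, concept_emb)  # (B, K, H, W)
    # Softmax over concepts + negative log-likelihood of the true concept:
    # the positive score is pushed above the scores of the negative labels.
    return F.cross_entropy(scores, labels)
```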

----

Their **'Image Stream'** uses an adapted version of VGG-16 (modified so that the pixel embeddings have the same dimension as the word concept embeddings). In the latent space, they also fix the norm of the pixel embeddings to 30 to improve numerical stability (since pixel embeddings are the most specific concepts in the joint embedding space).

The **'Concept Stream'** is trained first, and the trained word embeddings are used as the initialization for the joint training loop.

---

### Metrics

They use standard metrics (per-pixel accuracy, mean accuracy, mIoU, weighted IoU), along with

1. open-vocabulary metrics, namely **hierarchical precision, recall and F-score**, which depend on the **depth of the word concept** in the whole concept map;
2. information content ratio, defined via $-\log(\text{probability})$, where the probability is taken as the **frequency** of the **concept and its hyponyms** in the whole dataset.

---

### Results/Conclusion

**Supervised**: They were not able to beat the baseline score of multi-class classification with the same CNN (`Softmax`). Interestingly, another baseline (`Conditional Softmax`), which was specifically designed for hierarchical classification, also scored below `Softmax`. Only standard metrics (accuracy, mean accuracy, mIoU, wIoU) were used to compare models.

----

**Zero-shot**: Here, however, they consistently outperform the baselines. They also find that using the **asymmetric** scoring function gives a significant improvement. Only hierarchical metrics and the information content ratio were used for comparison here.

----

Qualitatively, they show that where the model is **unsure** of the specific object, it correctly predicts a more **general concept**.

*For example,* in a rocking chair, the top part looks like a **chair**, so it is classified correctly; the bottom part is **not like a normal chair**, and since the model has not seen that in particular, it classifies it as **'furniture'**, which is plausible and human-like.

----

They also do a 'concept search' in the embedding space to show that although baseline models can learn specific objects equally well, when **more abstract terms are 'searched'** for in the joint embedding space, their **model is still able** to detect them in images whereas the **baseline models are not**.

![image-20220612031602792](https://i.ibb.co/cx6zg0k/image-20220612031602792.png)

----

They also show that objects like `chair` and `bench` are close in the joint embedding space, so they hypothesize that looking in the vicinity of `chair` will turn up `sittable` objects.

---

End.

---

# CLIP (Contrastive Language-Image Pretraining)

### What is it

They show that a transformer trained with a purely predictive objective (predicting the caption text) transfers poorly zero-shot. They improve on this by employing a bag-of-words objective and then a contrastive objective, showing improvements over the purely predictive objective. They pretrain a large-scale model that can perform multiple tasks.

----

![image-20220613033036647](https://i.ibb.co/NpDHy6V/image-20220613033036647.png)

---

### Contrastive Objective

![image-20220613044900004](https://i.ibb.co/6BtJrf9/image-20220613044900004.png =500x)

In a batch of $N$ (image, text) pairs, they take all possible pairings of images and texts ($N^2$) and train `CLIP` to predict which of those pairings actually occurred.

----

They do this by maximizing the agreement (via cosine similarity) of the $N$ correct pairs, and pushing away / reducing the agreement of the $N^2 - N$ negative pairs.
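
A minimal sketch of this symmetric contrastive objective, close in spirit to the pseudocode in the CLIP paper (the embedding names and fixed temperature here are illustrative; CLIP learns the temperature):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of N (image, text) pairs (a sketch).

    img_emb, txt_emb: (N, D) outputs of the image and text encoders.
    """
    # Cosine similarity = dot product of L2-normalized embeddings.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature                 # (N, N): all N^2 pairings
    targets = torch.arange(len(img_emb), device=img_emb.device)  # diagonal = true pairs
    # Cross-entropy in both directions: pull the N matched pairs together,
    # push the N^2 - N mismatched pairs apart.
    loss_images = F.cross_entropy(logits, targets)      # image -> text
    loss_texts = F.cross_entropy(logits.t(), targets)   # text -> image
    return (loss_images + loss_texts) / 2
```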

---

### Framework Overview

![image-20220613044900004](https://i.ibb.co/6BtJrf9/image-20220613044900004.png)

They train `CLIP` from scratch on their `WebImageText` dataset containing ~400 million (image, text) pairs.

----

**Image Encoder**: Because of the wide variety of architectures and designs available, they end up choosing two architectures.

- One is based on **ResNet-50**, with modifications to the layers and with **global average pooling replaced by 'attention pooling'**, which they describe as 'transformer-style' multi-head QKV attention where the query is conditioned on the global average-pooled representation of the image.
- The other is based on the recent **Vision Transformer (ViT)**, to which they make only minor changes.

----

- They argue that for the ResNet-based encoders, increasing one dimension alone (depth, width or resolution) is less beneficial than increasing all dimensions together (for the same compute budget).

**Text Encoder**: The text encoder is a transformer with some previously published modifications. They only scale the width of this encoder, as they find that `CLIP` is less sensitive to the text encoder's capacity.

----

### Training Details

**ResNets**: They train 5 models (`ResNet50`, `ResNet101`, and the "EfficientNet-style" `RN50x4`, `RN50x16`, `RN50x64`).

**ViT**: They train 3 models (`ViT-B/32`, `ViT-B/16`, `ViT-L/14`).

> The largest ResNet model, `RN50x64`, took 18 days to train on 592 V100 GPUs, while the largest Vision Transformer took 12 days on 256 V100 GPUs.

---

### Zero-shot performance

They take the `CLIP` model pretrained on `WebImageText` and test its zero-shot transfer ability on other CV datasets such as `ImageNet`, `aYahoo` and `SUN`, showing a significant improvement over `Visual N-Grams`. They also test it against a fully supervised logistic regression trained on `ResNet50` features and beat it on 16 out of 27 datasets.

They note that `CLIP` performs worse on more specialized datasets such as satellite images, lymph node tumors, traffic sign recognition, etc. Further, they compare their zero-shot results with few-shot linear probes and show that they outperform them.

----

![image-20220613054023910](https://i.ibb.co/5nyM12s/image-20220613054023910.png =x800)

---

### Discussion

They discuss natural distribution shift: deep models appear to exceed human accuracy on ImageNet, while more robust metrics show that this is not the case under shift. They also use **effective robustness** and **relative robustness**, which measure improvements in accuracy under distribution shift and out-of-distribution accuracy, respectively.

They argue that because a zero-shot model cannot exploit the patterns of a specific dataset/distribution, it empirically has more **effective robustness** than few-shot models.

They also show that the overlap between the pretraining data and the evaluation datasets is very low (3.2% on average), and that the maximum improvement in accuracy due to overlap is only 0.6%, which is in line with other large-scale pretrained models.

Other than this, they briefly discuss the societal impact and the privacy/risk implications of `CLIP`.

---

# LSeg: Language-Driven Semantic Segmentation

**Problem setting: zero-shot segmentation**

**One-line approach:** Use the text encoder of a model like `CLIP`, and train a separate visual encoder to produce pixel embeddings that are close to the label embeddings in a joint embedding space.

![image-20220617054058843](https://i.ibb.co/g9wX3bj/image-20220617054058843.png)

**Advantage: flexibility**, i.e. being able to segment different classes within the same image given a different label set. (It can also segment with a label that is close to another label in the embedding space: given `pet` as a label, it classifies the `dog` as `pet`.)

---

### Framework

They use only the text encoder of CLIP, discarding its image encoder and training their own image encoder based on `Dense Prediction Transformers (DPT)`.

<!-- ![image-20220617055000901](https://i.ibb.co/9v9ScpL/image-20220617055000901.png) -->
![image-20220618073026429](https://i.ibb.co/f20BD4v/image-20220618073026429.png)

----

$F$ is calculated as the dot product of the image embeddings $I$ and the label embeddings $T$:

$$
\begin{align}
F_{i,j,k} &= I_{i,j} \cdot T_{k} \\
I_{i,j} &\in \mathbb{R}^{C}, \quad \{i,j\} \text{ indexes pixels} \\
T_{k} &\in \mathbb{R}^{C}, \quad k \in \{1..N\} \\
F_{i,j} &\in \mathbb{R}^{N}
\end{align}
$$

They want to maximize the dot product $F_{i,j,k}$ for those pixels $\{i,j\}$ where $y_{i,j} = k$ (the ground-truth label). They do this by applying a softmax over $k$ on $F_{i,j,k}$ and taking a `CrossEntropy` loss.
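
A minimal sketch of this correlation step at inference time, assuming the pixel and label embeddings have already been computed (tensor names are illustrative):

```python
import torch

@torch.no_grad()
def lseg_predict(pixel_emb, label_emb):
    """Assign each pixel the label whose text embedding it matches best (a sketch).

    pixel_emb: (B, C, H, W) dense image embeddings I from the DPT-based encoder
    label_emb: (N, C)       text embeddings T of the current (freely chosen) label set
    returns:   (B, H, W)    predicted label index per pixel
    """
    # F_{i,j,k} = I_{i,j} . T_k : correlate every pixel with every label.
    logits = torch.einsum("bchw,nc->bnhw", pixel_emb, label_emb)  # (B, N, H, W)
    # Training applies a softmax over k and a CrossEntropy loss on these logits;
    # at test time the label set can change freely, since only label_emb changes.
    return logits.argmax(dim=1)
```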

----

For the final step, the softmaxed feature block $F$ (equivalent to the predictions) is 'spatially regularized' using a `DepthwiseBlock` (depthwise conv) or a `BottleneckBlock` (depthwise conv augmented with max-pooling), and is upsampled to the input image's resolution using bilinear interpolation.
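
A rough sketch of such a spatial-regularization step, assuming a fixed number of labels for simplicity (the paper's actual block design may differ; this only illustrates "depthwise conv + bilinear upsampling"):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseRegularizer(nn.Module):
    """Depthwise conv over per-label score maps, then upsample (a sketch, not the paper's exact block)."""

    def __init__(self, num_labels, kernel_size=3):
        super().__init__()
        # groups == channels: each label's score map is filtered independently.
        self.conv = nn.Conv2d(num_labels, num_labels, kernel_size,
                              padding=kernel_size // 2, groups=num_labels)

    def forward(self, scores, out_size):
        # scores: (B, N, h, w) softmaxed feature block F; out_size: input resolution (H, W)
        scores = self.conv(scores)
        # Bilinear interpolation back to the input image's resolution.
        return F.interpolate(scores, size=out_size, mode="bilinear", align_corners=False)
```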

**Training Details:** They use `ImageNet`-pretrained weights for the `ResNet` and `ViT` image encoders, and random initialization for the `DPT` decoder. They freeze the text encoder (the `ViT-B/32` text encoder from `CLIP`) during training.

They show results that are comparable with the 1-shot state of the art (`HSNet`) and significantly higher than previous zero-shot models on `PASCAL-5i` and `COCO-20i`. They outperform `HSNet` on `FSS-1000`.

----

They also compare different text encoders from `CLIP`. (The text encoder is always a simple `Transformer`; the difference is the image encoder it was co-trained with in the `CLIP` pretraining step.)

![image-20220618064113894](https://i.ibb.co/hBjwQWQ/image-20220618064113894.png)

---

### Qualitative Analysis

----

### Related but unseen labels

They show that `LSeg` is able to predict objects of unseen classes that are close in the embedding space to the provided labels.

![image-20220618071709334](https://i.ibb.co/WkgG4qs/image-20220618071709334.png)

They show the same behavior with hierarchical unseen labels (i.e. predicting correctly when a parent category is present in the label set instead of the specific object).

----

### Failure Cases

They mention that since `LSeg` is trained only on positive examples of classes (unlike `CLIP`, which had a contrastive objective), it can sometimes give wrong predictions. For example,

![image-20220618072010538](https://i.ibb.co/phwdVN6/image-20220618072010538.png =400x)

In this image, it predicts the `dog` as `toy` (when only `toy` and `grass` are provided), because a `dog` is probably closer to a `toy` than to `grass`, both visually and semantically.

---

# RegionCLIP

Extract regions and their text descriptions from images, and train on them contrastively with `CLIP`-style language-image pretraining.

### Need?

According to the authors, we cannot directly apply `CLIP` to image regions and expect it to work well, because there is a major domain shift, leading to unsatisfactory performance. `CLIP` is trained to **match an image with its image-level description** and does not know about the **alignment between local image regions and text descriptions of those regions**.

![image-20220704094919156](https://i.ibb.co/Z1SzN3f/image-20220704094919156.png)

---

### Problems

1. Fine-grained alignment between image regions and text is not usually available and is expensive to annotate.
2. Image-level descriptions might leave out descriptions of some objects in the image.

### Solution

Bootstrap from a pretrained vision-language model (CLIP) to fill in the missing region descriptions, and then align them with proposed regions based on a similarity metric.

![image-20220704095011074](https://i.ibb.co/D7bD0Gn/image-20220704095011074.png)

----

# Framework

![](https://i.imgur.com/ojzQ3yC.png)

----

They make region descriptions by filling 'object concepts' (from a concept pool) into prompts; a `teacher` model $\mathcal{V}_t$ (from `CLIP`) then sees which region (proposed by a pretrained Region Proposal Network, `RPN`) aligns most with each region description, and assigns the description to that region.

Once these region-text pairs are generated, the new encoder $\mathcal{V}$ can be trained on them contrastively, similar to `CLIP`'s contrastive language-image pretraining. They use `RoIAlign` to extract a region's visual features from the encoder $\mathcal{V}$; it pools regional features from the image's feature map using interpolation. $\mathcal{V}$ is initialized with the weights of $\mathcal{V}_t$ for a good start in the visual-semantic space.

----

### Details

The `CC3M` (Conceptual Captions) dataset was used for training. The region descriptions are made by filling concepts from the concept pool into prompts, e.g. `kite` is filled into the prompt `a photo of a ....` to make the description `a photo of a kite`. These are then passed through the pretrained language encoder (from `CLIP`) to get the semantic text embeddings.

Cosine similarity is used as the measure of how well a proposed region aligns with a region description, and it also drives the contrastive loss between region-text pairs, $L_{cntrst}$.

----

### Losses

In addition to the contrastive loss, they use a distillation loss $L_{dist}$, defined as

$$
L_{dist} = \frac{1}{N} \sum_{i}{L_{KL}(q_i^t, q_i)}
$$

where $q_i^t$ is a 'soft target' $= \text{softmax}_j(\text{distance}(v_i^t, l_j))$, $q_i$ is computed in the same way from the student's features, $v_i^t$ is the region's visual features from the teacher $\mathcal{V}_t$, and $v_i$ is the region's visual features from $\mathcal{V}$.

They also add the contrastive loss at image level, $L_{cntrst-img}$ (with negative samples being the texts of different images), as in `CLIP`. So the final loss is

$$
L = L_{cntrst} + L_{dist} + L_{cntrst-img}
$$
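
A rough sketch of these region-level loss terms, assuming cosine similarity as the `distance` and the teacher's best match as the pseudo-label (names and the temperature are illustrative):

```python
import torch
import torch.nn.functional as F

def region_losses(v_student, v_teacher, l_text, temperature=0.01):
    """Region-text contrastive loss plus teacher distillation, in the spirit of RegionCLIP (a sketch).

    v_student: (N, D) region features from the student encoder V (pooled with RoIAlign)
    v_teacher: (N, D) region features from the teacher encoder V_t
    l_text:    (C, D) text embeddings of the concept prompts ("a photo of a ...")
    """
    v_student = F.normalize(v_student, dim=-1)
    v_teacher = F.normalize(v_teacher, dim=-1)
    l_text = F.normalize(l_text, dim=-1)

    sim_teacher = v_teacher @ l_text.t() / temperature   # (N, C) region-vs-concept scores
    sim_student = v_student @ l_text.t() / temperature

    # The teacher assigns each region its best-matching concept as a pseudo-label.
    pseudo = sim_teacher.argmax(dim=-1)

    # L_cntrst: pull each region towards its assigned concept text, away from the others.
    l_cntrst = F.cross_entropy(sim_student, pseudo)

    # L_dist: KL divergence between the teacher's soft targets q^t and the student's q.
    q_t = F.softmax(sim_teacher, dim=-1)
    log_q = F.log_softmax(sim_student, dim=-1)
    l_dist = F.kl_div(log_q, q_t, reduction="batchmean")

    # The image-level term L_cntrst-img would be added on top of these.
    return l_cntrst + l_dist
```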

---

# Extensions to object detection and open-vocabulary object detection

They extend this framework to object detection by simply using the `RPN` to generate regions, finding which region matches the target object class the most, and outputting that region as the localization/bounding box for the object. However, no work is done on segmenting the target object.

For open-vocabulary object detection, they evaluate the model on 48 base and 17 novel categories for `COCO`, and 866 base and 337 novel categories for `LVIS` (general classes are termed base, specific object classes are termed novel).

---

![image-20220704113127572](https://i.ibb.co/w7rm0tc/image-20220704113127572.png)

---

# OPEN-SET RECOGNITION: A GOOD CLOSED-SET CLASSIFIER IS ALL YOU NEED?

**ICLR '22**

----

They show that closed-set accuracy is *highly correlated* with open-set performance.

![](https://i.imgur.com/5wOD2tk.png)

----

- They perform multiple experiments using a variety of models: `ViT`, `ResNet`, `EfficientNet`, `VGG`.
- `ViT` doesn't overfit its representation to the training classes and outperforms the other methods.

Good closed-set performance => better `OSR`

----

To enhance closed-set performance, they leverage **existing techniques** from **image recognition**:

- label smoothing
- longer training times
- better augmentations
- better LR schedules

----

They also try changing the open-set scoring rule to the `Maximum Logit Score (MLS)`. Using `MLS` gives better performance on `OSR`, but softmax normalization is better on the combined metric (`OSCR`), because softmax normalization cancels the effect of the feature norm.
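
A minimal sketch contrasting the two scoring rules (higher score = more confident the sample belongs to a known class):

```python
import torch
import torch.nn.functional as F

def open_set_score(logits, rule="mls"):
    """Open-set score from closed-set logits (a sketch).

    logits: (B, K) logits of the closed-set classifier.
    Returns one score per sample; low scores suggest an unknown (open-set) sample.
    """
    if rule == "mls":
        # Maximum Logit Score: keeps the feature-norm information.
        return logits.max(dim=-1).values
    if rule == "msp":
        # Maximum Softmax Probability: normalization cancels the feature norm.
        return F.softmax(logits, dim=-1).max(dim=-1).values
    raise ValueError(f"unknown rule: {rule}")
```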

---

# Extract Free Dense Labels from CLIP

**ECCV '22**

----

# Using CLIP features for dense prediction

- **Failure**: fine-tuning the image encoder of `CLIP` for segmentation tasks.
  - Performance is good on seen classes, but a modified `DeepLabv2` in conjunction with `CLIP`'s text encoder fails to segment novel classes.
  - Reasons:
    - The visual-language association of CLIP features should remain intact for best performance.
    - Loss of generality => additional mapper trained on seen classes.

----

# MaskCLIP

![](https://i.imgur.com/RsAgVNd.png)

**Doesn't modify the CLIP feature space**

---

# Comparative Analysis between MaskCLIP and Our Results

![](https://i.imgur.com/5ENdq3v.png)

----

## Base Class Performance

| Image | Ground Truth | Ours (PSPNet) | MaskCLIP (w/o PD and KS) | MaskCLIP (w/ PD and KS) |
| --- | --- | --- | --- | --- |
| ![](https://i.imgur.com/dpe5qPz.jpg =350x) | ![](https://i.imgur.com/Ld2JIyo.png =350x) | ![](https://i.imgur.com/H9ZoT2V.png =250x) | ![](https://i.imgur.com/lAN58VO.jpg =350x) | ![](https://i.imgur.com/xqitgmG.jpg =350x) |
| ![](https://i.imgur.com/OMOmlcs.jpg =350x) | ![](https://i.imgur.com/qG6EEYZ.png =350x) | ![](https://i.imgur.com/JC9CUti.png =250x) | ![](https://i.imgur.com/HeelpuM.jpg =350x) | ![](https://i.imgur.com/eIlusJi.jpg =350x) |

----

![](https://i.imgur.com/jqAil0t.png)

----

## Novel Class Performance

| Image | Ground Truth | Ours (PSPNet) | MaskCLIP (w/o PD and KS) | MaskCLIP (w/ PD and KS) |
|:---:|:---:|:---:|:---:|:---:|
| ![](https://i.imgur.com/qXVlgIh.jpg =300x) | ![](https://i.imgur.com/rf67Q8m.png =300x) | ![](https://i.imgur.com/0zDIV7o.png =x120) | ![](https://i.imgur.com/wCvDHXS.jpg =300x) | ![](https://i.imgur.com/UQDTHq9.jpg =300x) |
| ![](https://i.imgur.com/pr2M2sL.jpg =300x) | ![](https://i.imgur.com/vArHWAe.png =300x) | ![](https://i.imgur.com/PfKdnKl.png =x120) | ![](https://i.imgur.com/AiOtQhB.jpg =300x) | ![](https://i.imgur.com/YPRk0J0.jpg =300x) |
| ![](https://i.imgur.com/1T7tZoh.jpg =x150) | ![](https://i.imgur.com/Afs7Jwi.png =x150) | ![](https://i.imgur.com/nyD3mVJ.png =x150) | ![](https://i.imgur.com/WNi3F83.jpg =x150) | ![](https://i.imgur.com/vNvyQEo.jpg =x150) |

----

| Class | IoU | Acc | Prec |
|-------------|-------|-------|-------|
| aeroplane | 90.65 | 99.87 | 90.75 |
| bicycle | 55.04 | 94.25 | 56.95 |
| bird | 92.39 | 94.18 | 97.98 |
| boat | 52.58 | 94.06 | 54.38 |
| bottle | 56.82 | 83.66 | 63.92 |
| bus | 90.02 | 95.26 | 94.24 |
| car | 83.61 | 93.85 | 88.46 |
| cat | 84.9 | 87.19 | 97.0 |
| chair | 17.4 | 18.73 | 71.15 |
| cow | 53.38 | 64.41 | 75.72 |
| diningtable | 57.32 | 86.57 | 62.91 |

----

| Class | IoU | Acc | Prec |
|-------------|-------|-------|-------|
| dog | 79.62 | 86.45 | 90.97 |
| horse | 59.05 | 96.59 | 60.31 |
| motorbike | 71.93 | 86.76 | 80.8 |
| person | 40.78 | 43.7 | 85.93 |
| pottedplant | 59.96 | 78.03 | 72.13 |
| sheep | 66.82 | 84.0 | 76.56 |
| sofa | 50.45 | 92.7 | 52.54 |
| train | 82.8 | 94.33 | 87.13 |
| tvmonitor | 64.51 | 91.8 | 68.45 |

Summary:

| aAcc | mIoU | mAcc | mPrec |
|-------|------|-------|-------|
| 77.78 | 65.5 | 83.32 | 76.42 |

<!--

---

# Timeline

| Time | Target | Description |
|:-------------------:|:------:|:-----------:|
| 26th Oct - 10th Nov | | |
| 25th Nov - 8th Dec | | |
| 8th Dec - 22nd Dec | | |
| 22nd Dec - 6th Jan | | |
| 6th Jan - 20th Jan | | |

-->
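
For reference, a minimal sketch of how per-class IoU, Acc and Prec figures like those in the tables above can be computed from predicted and ground-truth label maps (illustrative only):

```python
import torch

def per_class_metrics(pred, gt, num_classes):
    """Per-class IoU / Acc (recall) / Prec from integer label maps (a sketch).

    pred, gt: (B, H, W) predicted and ground-truth class indices.
    """
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    idx = gt.flatten() * num_classes + pred.flatten()
    conf = torch.bincount(idx, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes).float()

    tp = conf.diag()
    iou = tp / (conf.sum(0) + conf.sum(1) - tp)   # intersection over union
    acc = tp / conf.sum(1)                        # per-class pixel accuracy (recall)
    prec = tp / conf.sum(0)                       # per-class precision
    return iou, acc, prec                         # their means give mIoU / mAcc / mPrec
```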
