# Segmentation-Assisted U-Net: Enhancing Depth Estimation with SAM

| **Name**      | **Student Number** |
|---------------|--------------------|
| Joris Weeda   | 5641551            |
| Rami Awad     | 5416892            |
| Simon Gebraad | 4840232            |

# Recommended Sources

* Segment Anything Model: https://arxiv.org/pdf/2304.02643.pdf
* High Quality Monocular Depth Estimation via Transfer Learning: https://arxiv.org/pdf/1812.11941.pdf
* NYU Depth V2 dataset: https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html

## Introduction

Depth perception is an important task in many robotics applications, like autonomous driving. Traditionally, it is accomplished using expensive or bulky sensors, like LiDAR or depth cameras. However, with the rise of deep learning, depth can be estimated from the images of a single camera: monocular depth estimation. This has the potential to reduce costs and allows depth perception to be implemented in many more devices equipped with just a single camera.

Various models have been proposed for monocular depth estimation, the main classes being CNN-based and Transformer-based. CNN-based models rely on sliding kernels to extract local image features to estimate depth. However, they are mostly limited to these local features, ignoring the rich contextual information available in the scene, which limits accuracy. The more modern Transformer-based models use the self-attention mechanism to increase the receptive field and extract both local and global information to improve depth estimation. However, Transformers are more expensive to train and introduce additional trainable parameters, and thus require large amounts of data. This increases training times and energy consumption in an era where the environmental impact of deep learning is being questioned. Hence, the need arises for a model that is lightweight whilst still incorporating contextual information to improve depth estimation accuracy.

In transfer learning, a trained model is repurposed for another task, which is inherently more efficient through reuse. Significant advancements are being made in computer vision that could be valuable for monocular depth estimation, like Segment Anything (SAM), which offers a new, powerful segmentation model. Retraining CNN-based networks with additional input from SAM could improve their accuracy by providing them with more spatial information, without significantly increasing model size and the number of trainable parameters.

In this blog, we investigate whether the input from SAM improves the depth estimation of a CNN-based model compared to the same model without SAM. We base ourselves on the work of Alhashim & Wonka (2018), who proposed a U-Net model with a DenseNet backbone. This model was selected because it has been shown to run and produce acceptable results with limited training time and computational resources. To enable easy collaboration, Google Colab was used.

## Segment anything model (SAM)

In this project, the Segment Anything Model (SAM) is used to generate masks that segment the objects in the RGB images. SAM is a pre-trained model that can automatically generate masks for the different objects in an image. It is based on the Vision Transformer (ViT) architecture, specifically the "vit_h" variant. ViT models have shown great performance in various computer vision tasks, including image classification and object detection. SAM leverages the power of ViT to perform segmentation at the pixel level.

The figure at the bottom of this section gives a short impression of the model's output. Although the masks are shown here in colour, the output of SAM can also be transformed to grayscale, i.e. a single channel. The authors of SAM specifically intend for it to be used in transfer learning, and it can be run fairly easily in online environments like Google Colab. Because computational resources were quite limited, it was decided to pre-process all images from the dataset with SAM, rather than running it online during training. This is explained in more detail in the next section.

![](https://hackmd.io/_uploads/rJbqK-JPh.png)
*Figure 1: Segment Anything Model output*
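To give an impression of how such a single-channel SAM image can be produced, the snippet below is a minimal sketch using the `segment_anything` package. The checkpoint file name, the sorting of masks by area, and the way a gray shade is assigned to each mask are our own assumptions for illustration, not necessarily the exact preprocessing used in this project.

```python
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load the pre-trained "vit_h" SAM variant (checkpoint file name is an assumption).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

def sam_to_grayscale(rgb_image: np.ndarray) -> np.ndarray:
    """Generate SAM masks for an RGB image (H, W, 3, uint8) and flatten them
    into a single-channel image, drawing each mask in a different shade of gray."""
    masks = mask_generator.generate(rgb_image)  # list of dicts with "segmentation", "area", ...
    flat = np.zeros(rgb_image.shape[:2], dtype=np.float32)
    # Draw large masks first so that smaller masks remain visible on top of them.
    for i, m in enumerate(sorted(masks, key=lambda m: m["area"], reverse=True)):
        flat[m["segmentation"]] = (i + 1) / (len(masks) + 1)  # unique gray level in (0, 1)
    return flat
```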
## Dataset

The dataset utilized for this project is the NYU Depth V2 dataset, which contains RGB images and their corresponding depth maps. The samples capture various indoor scenes with different objects and structures. This dataset is widely used for depth estimation tasks in computer vision research, and was also used by Alhashim & Wonka (2018).

For this project, a specific subset of the NYU Depth V2 dataset was used, which includes fully annotated label maps. The decision to work with this subset was driven by two primary reasons. Firstly, the full NYU Depth V2 dataset is considerably large, requiring approximately 423 GB of disk space for the raw data, whereas the labelled subset is roughly 2.8 GB while containing more information per image. Due to the limited RAM capacity of the Google Colab environment, it was not feasible to load and process the entire dataset, so the labelled subset was chosen to accommodate the memory constraints. Secondly, the subset was selected in the expectation that the available labels would be valuable for evaluating the performance of the model. By utilizing the annotated label maps, it becomes possible to assess the model's accuracy and effectiveness in depth estimation tasks.

### Preprocessing

This subset contains 1449 images. First, a set of 199 test images was randomly selected. The remaining 1250 images were then randomly split 80/20 into a training and a validation set.

Due to the limited computational resources, the dataset was first expanded with the output of SAM. The output of SAM is a dictionary containing various pieces of information, like the masks but also the confidence scores. The outputs of SAM were therefore processed by flattening all masks, in different shades of gray, into a single-channel image. This image was added to the dataset, thereby expanding it with additional information that could be useful for later use. An example of the information contained in this dataset is shown in the figure below. Note that the label map is the ground-truth, human-annotated label map, whereas the SAM image is generated by a model.

![](https://hackmd.io/_uploads/BkAoC79L3.png)
*Figure 2: Example of the images in the dataset*

With this expanded dataset, each RGB and SAM image was resized to 640 x 480, which is a requirement for the U-Net model. Like in the paper, each depth map was resized to 320 x 240. Subsequently, the RGB, SAM and depth images were normalized between 0 and 1. Then, the SAM image was stacked on top of the RGB image to obtain a 4-channel input. Of course, this last step was not done for the baseline. A minimal sketch of this preprocessing is given below.
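The sketch below illustrates the resizing, normalization, and channel stacking described above, assuming the images are available as NumPy arrays. The use of OpenCV for resizing and the normalization of the depth map by its maximum value are assumptions for illustration.

```python
import cv2
import numpy as np

def preprocess_sample(rgb, sam, depth, use_sam=True):
    """rgb: (H, W, 3) uint8, sam: (H, W) single-channel SAM image, depth: (H, W) depth map.
    Returns the network input and the target depth map."""
    # RGB and SAM images are resized to 640 x 480, the depth map to 320 x 240 (as in the paper).
    rgb = cv2.resize(rgb, (640, 480)).astype(np.float32) / 255.0   # normalize RGB to [0, 1]
    sam = cv2.resize(sam.astype(np.float32), (640, 480))
    depth = cv2.resize(depth.astype(np.float32), (320, 240))
    depth = depth / depth.max()                                    # normalization scheme is an assumption

    if use_sam:
        x = np.dstack([rgb, sam])   # (480, 640, 4): RGB + SAM channel
    else:
        x = rgb                     # (480, 640, 3): baseline input
    return x, depth
```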
## U-net model and training

A U-Net model architecture based on Alhashim & Wonka (2018) was utilized for the task of depth estimation in this project. The U-Net is a popular and effective convolutional neural network (CNN) architecture for image segmentation tasks, first proposed by Ronneberger et al. (2015). The U-Net employed in this project follows the traditional architecture with slight modifications to suit the depth estimation task. In essence, the model consists of an encoder pathway that captures the contextual information from the input RGB image and a decoder pathway that recovers the spatial information to generate the predicted depth map.

### Encoder

The encoder pathway uses a pre-trained DenseNet-169 as the backbone. DenseNet-169 is a deep CNN that has been trained on the ImageNet dataset for image classification. By using the pre-trained DenseNet-169, the model can leverage its learned features and benefit from transfer learning. In the encoder pathway, the input RGB image is passed through the DenseNet-169 backbone, resulting in 1664 feature maps of 15 x 20.

### Decoder

The extracted features from the encoder are then passed to the decoder pathway, which consists of a series of upsampling blocks that progressively recover the spatial information and refine the predictions. Each upsampling block performs bilinear upsampling to increase the resolution and concatenates the upsampled features with the corresponding features from the encoder pathway. The decoder gradually reduces the number of filters as the spatial resolution increases, reducing the model's complexity while retaining the features necessary for accurate depth estimation. The number of filters in the decoder decreases from 1664 to 832, 416, 208, and finally 104. The last layer of the decoder is a convolutional layer with a sigmoid activation function, which generates the final predicted depth map. The sigmoid activation ensures that the predicted values lie within [0, 1], representing the estimated depth values.

### Loss function

In their paper, Alhashim & Wonka (2018) propose a loss function that consists of three parts:

* **L_depth**: point-wise L1 loss on the depth values
* **L_grad**: L1 loss defined over the image gradient of the depth map
* **L_SSIM**: Structural Similarity (SSIM) loss

They reason that the model should not only learn to predict correct depth values, but also correct object boundaries, as depth maps often have distinct edges at object boundaries rather than smooth gradients. This loss function was also used in this project; its full form is written out at the end of this section.

To evaluate the model's performance, two metrics are used: accuracy and loss. The accuracy metric quantifies the degree of similarity between the predicted depth map and the ground-truth depth map (the actual depth values), and thus assesses how well the model captures the true depth information. The loss function, on the other hand, serves as the guiding mechanism during training, driving the model to minimize the discrepancy between the predicted and ground-truth depth maps.
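Following the formulation of Alhashim & Wonka (2018) as we understand it, the combined loss for a ground-truth depth map $y$ and a prediction $\hat{y}$ over $n$ pixels is:

$$
L(y, \hat{y}) = \lambda \, L_{\text{depth}}(y, \hat{y}) + L_{\text{grad}}(y, \hat{y}) + L_{\text{SSIM}}(y, \hat{y})
$$

$$
L_{\text{depth}} = \frac{1}{n}\sum_{p} \lvert y_p - \hat{y}_p \rvert, \qquad
L_{\text{grad}} = \frac{1}{n}\sum_{p} \Bigl( \bigl|\partial_x (y_p - \hat{y}_p)\bigr| + \bigl|\partial_y (y_p - \hat{y}_p)\bigr| \Bigr), \qquad
L_{\text{SSIM}} = \frac{1 - \text{SSIM}(y, \hat{y})}{2}
$$

where $\partial_x$ and $\partial_y$ denote the horizontal and vertical image gradients and $\lambda$ is a weighting factor (0.1 in the original paper).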
### Training

For the training of all models, like in the paper, the AdamW optimizer is used with a learning rate of 0.0001 and a weight decay of 1e-6. Due to limited GPU memory, the batch size first had to be reduced from 8 to 1. Reducing the batch size may negatively impact training, as the gradient estimates become noisier and generalization can suffer. However, after some optimizations in the dataloader function, the batch size could be increased to 8. We briefly compare the results between batch sizes as well.

During training, the input images were also randomly flipped horizontally as data augmentation. In the paper, the authors explain that other common augmentation techniques, like vertical flips and rotations, may not contribute to learning useful properties of depth. Hence, only horizontal flips were used.

For the baseline training, only the RGB image was provided to the model. The weights of the DenseNet encoder were initialized with the ImageNet-pretrained weights, whereas the weights of the decoder were initialized randomly. Still, all layers were trainable: the encoder was fine-tuned whilst the decoder was trained from scratch.

To evaluate the input of SAM, some modifications to the network were required. The DenseNet backbone requires an input with 3 channels, whereas adding SAM would create a 4-channel input. Two approaches were considered, illustrated in the figure below; a code sketch of the second approach follows after the figure:

1. Adding a single convolutional layer before the input to the encoder, downsampling the 4 channels to 3 channels.
2. Adding SAM after the encoder, by passing it through a small convolutional network that downsamples the (640, 480, 1) SAM input to (20, 15, C), where C is the number of channels (64 or 256). This is then concatenated with the output of the encoder, and the resulting feature map is fed into the decoder.

![](https://hackmd.io/_uploads/r1C6y-1D3.png)
*Figure 3: Different model configurations*
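As a rough illustration of the second approach, the sketch below shows how a small convolutional network could reduce the single-channel SAM image to the encoder's spatial resolution and concatenate it with the encoder output before the decoder. It is written in PyTorch under our own assumptions about the number of layers, kernel sizes, and intermediate channel counts; it is not the exact network used in the experiments.

```python
import torch
import torch.nn as nn

class SamFusion(nn.Module):
    """Downsample a (1, 480, 640) SAM image to (C, 15, 20) and concatenate it with
    the (1664, 15, 20) DenseNet-169 encoder output (approach 2 above)."""

    def __init__(self, sam_channels=256):
        super().__init__()
        # Five stride-2 convolutions reduce 480x640 to 15x20 (32x downsampling);
        # the channel progression is an assumption.
        chs = [1, 16, 32, 64, 128, sam_channels]
        layers = []
        for c_in, c_out in zip(chs[:-1], chs[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
        self.sam_net = nn.Sequential(*layers)

    def forward(self, encoder_features, sam):
        sam_features = self.sam_net(sam)                       # (B, C, 15, 20)
        return torch.cat([encoder_features, sam_features], 1)  # (B, 1664 + C, 15, 20)
```

The fused feature map (1664 + C channels) is then fed into the decoder, whose first layer must accept the enlarged channel count. Training of all variants would use the settings quoted above, e.g. `torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-6)`.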
## Results

Here we present the results of our experiments using SAM and the U-Net model for depth estimation. We evaluated the different model variations described in the previous section and analyzed their performance in terms of accuracy and loss. The following table provides an overview of each configuration and its corresponding results; each model is briefly discussed afterwards.

*Table 1: Model performance metrics (batch size 8)*

| Model | Configuration                    | Accuracy | Loss   |
| ----- | -------------------------------- | -------- | ------ |
| 1     | Baseline                         | 0.8098   | 0.1253 |
| 2     | SAM as Extra Channel             | 0.8018   | 0.1296 |
| 3     | SAM after Encoder (256 Channels) | 0.8139   | 0.1247 |
| 4     | SAM after Encoder (64 Channels)  | 0.8110   | 0.1253 |

* **Model 1 | Baseline**
  The baseline model, trained with a batch size of 8, achieved an accuracy of 0.8098 and a loss of 0.1253. This configuration serves as the reference point for comparing the performance of the other variations.

* **Model 2 | SAM as Extra Channel**
  In this variation, the SAM output was added as an extra input channel. Despite the additional information provided by the extra channel, the accuracy slightly decreased to 0.8018 and the loss increased to 0.1296 compared to the baseline. This suggests that the extra channel did not improve the depth estimation performance.

* **Model 3 | SAM after Encoder (256 Channels)**
  Feeding the SAM output in after the encoder with 256 channels yielded a slightly improved accuracy of 0.8139 and a loss of 0.1247, indicating potential for enhancing the depth estimation performance compared to the baseline. As this configuration showed the best result among all configurations, we executed a second training round, which gave similar results (accuracy 0.8137, loss 0.1258). The increased channel capacity of the input after the encoder potentially enables the model to capture more detailed features and produce more accurate depth predictions.

* **Model 4 | SAM after Encoder (64 Channels)**
  Alternatively, we explored the configuration where the output of SAM is used after the encoder with 64 channels. The model achieved an accuracy of 0.8110 and a loss of 0.1253. While the accuracy is slightly lower than that of the after-encoder configuration with 256 channels, it still demonstrates a slight improvement compared to the baseline. This suggests that even with a reduced channel capacity, the after-encoder input of SAM can capture relevant information for depth estimation with the current training configuration.

### Influence of batch size

In the initial configurations, the batch size for training was limited to 1 due to the large dataset and models combined with limited training resources. However, through improvements in code efficiency, we were able to increase the batch size to 8. By comparing the performance of the model using different batch sizes during a training session, we observed that the model struggled to generalize with a low batch size and was more susceptible to overfitting. The image below highlights the notable differences between the two batch sizes: the larger batch size not only generalizes better but also converges faster during training.

![](https://hackmd.io/_uploads/B1vUyz1P3.png)
*Figure 4: Difference in batch sizes*

### Influence of encoder complexity

To test the influence of the encoder complexity on the benefit of adding SAM, the best-performing model was also tested with a smaller encoder, namely DenseNet-121. It can be hypothesized that models with smaller encoders benefit more from the additional spatial information from SAM, as smaller encoders may be less able to extract useful spatial information on their own. The results of this evaluation are shown in the table below.

*Table 2: Comparing different encoders*

| Encoder      | Configuration                    | Accuracy  | Loss      |
| ------------ | -------------------------------- | --------- | --------- |
| DenseNet-169 | Baseline                         | 0.8098    | 0.1253    |
| DenseNet-169 | SAM after Encoder (256 Channels) | 0.8139    | 0.1247    |
|              | *Difference*                     | *+0.0041* | *-0.0006* |
| DenseNet-121 | Baseline                         | 0.7965    | 0.1270    |
| DenseNet-121 | SAM after Encoder (256 Channels) | 0.8041    | 0.1307    |
|              | *Difference*                     | *+0.0076* | *+0.0037* |

As expected, the overall performance drops slightly with a less complex encoder. However, the accuracy difference between the baseline and the SAM model increases with the simpler encoder, suggesting that the extra spatial information provided by SAM has more of an effect there.

Overall, our experimental results suggest that incorporating SAM into the U-Net depth model can slightly improve depth estimation. The configuration with SAM after the encoder using 256 channels achieved the highest accuracy among the variations, closely followed by the same configuration with 64 channels. These findings indicate that increasing the channel capacity in the after-encoder configuration slightly enhances the model's ability to capture intricate depth features. Additionally, the baseline results and the SAM-as-extra-channel model provide valuable insights into the impact of different architectural choices on depth estimation performance.
## Qualitative analysis

By visually comparing the model's output with the ground truth across the different model configurations, we seek to assess its capability to accurately capture depth information and fine-grained details. We are especially interested in whether the SAM segmentation allows for better grouping within the picture, meaning the model should be better at distinguishing objects and therefore also the depth to those objects. We show two examples with the baseline output and the outputs of the configurations with SAM as an extra channel and with SAM added after the encoder, both explained previously.

![](https://hackmd.io/_uploads/BkvmL1GD2.png =350x250)![](https://hackmd.io/_uploads/SkktKJzPn.png =350x250)
*Figure 5: Baseline and RGB image, example 1 (the closet)*

![](https://hackmd.io/_uploads/S1ZTLyfvh.png)
*Figure 6: SAM as extra channel, example 1 (the closet)*

![](https://hackmd.io/_uploads/r1NaIJMwn.png)
*Figure 7: SAM after encoder, example 1 (the closet)*

The baseline model exhibits limited representation of depth details, particularly for the closest object visible in the RGB image. However, both configurations incorporating the SAM output demonstrate significantly enhanced detail. Notably, the edges of the closet and bed appear much sharper, indicating that SAM effectively outlines these objects.

![](https://hackmd.io/_uploads/BkYJT1Mv3.png =350x250)![](https://hackmd.io/_uploads/HyTlpkGw2.png =350x250)
*Figure 8: Baseline and RGB image, example 2 (the chair)*

![](https://hackmd.io/_uploads/HyHNakGvh.png)
*Figure 9: SAM as extra channel, example 2 (the chair)*

![](https://hackmd.io/_uploads/r1SUpJMv2.png)
*Figure 10: SAM after encoder, example 2 (the chair)*

In the second example, our attention is drawn to the chair and the bookshelf on the left-hand side of the image. Once again, the baseline model has difficulty distinguishing the sections of the bookshelf and capturing the lower portion of the chair. Interestingly, the SAM output itself also struggles to represent the lower side of the chair. Despite this, both SAM configurations perform considerably well, as evidenced by the increased level of detail in the bookshelf and chair regions.

## Discussion

Though the results show some improvement in depth estimation, the difference with the baseline is small. There are various potential causes for this.

Firstly, the limited size of the dataset. To feed the SAM output into the model, extra layers were added which also require training. Unlike the DenseNet encoder, these layers have to be trained from scratch. The limited size of the dataset could mean that these layers are unable to extract the most useful features from the extra SAM input, limiting the model's ability to use the additional information. This is compounded by the limited compute power available, which restricted the size of these extra layers. For example, increasing the number of SAM channels to 512 was not possible due to limited GPU memory. Additionally, it was not possible to train on larger datasets, as training times would become very long.

Another possibility is that the choice of encoder, DenseNet-169, has limited the added benefit of SAM. DenseNet-169 is a very deep ConvNet, going from (640 x 480 x 3) to (20 x 15 x 1664).
Potentially, this allows it to extract spatial features very well on its own, making the spatial information SAM adds redundant. A simpler encoder, DenseNet-121, was also evaluated, and the added benefit of SAM indeed increased. However, it should be noted that DenseNet-121 is still a very large model, which may also have limited the benefit of SAM. Hence, future research could focus on assessing the benefit of SAM with much smaller encoders. Additionally, given the promise of the after-encoder approach for increasing accuracy, it would be interesting to test whether further increasing the size of the ConvNet applied to the SAM input also further improves results. This would probably require a larger dataset to ensure that this additional network is trained properly.

## Conclusion

In conclusion, incorporating the Segment Anything Model (SAM) into the U-Net model to enhance depth estimation showed small improvements in some of the tested variations. The configuration where SAM was applied after the encoder with 256 channels showed the best overall results, with the highest accuracy and lowest loss, and thus, despite the minor gains, surfaced as the most accurate model, outperforming the baseline.

Constraints such as limited dataset size and computational resources presented significant challenges during our study. A clear example was the need to initially reduce the batch size to circumvent GPU memory constraints; subsequent optimization allowed an increase of the batch size to 8.

Our results highlight the potential of combining SAM with a U-Net for depth estimation. The versions with SAM after the encoder showed improvements over the baseline, albeit slight ones, which indicates the combined approach's potential. In particular, the SAM input enhanced the model's capacity to predict depth by supplying useful spatial information. However, the modest improvements show that the full potential of the SAM and U-Net integration remains unrealized. Future work should investigate larger channel sizes after the encoder, experiment with other encoder configurations, and further optimize the implementation of SAM within the U-Net model. Combining this with more powerful computing capabilities and larger training datasets may further improve the model's performance and generalizability.

Visually, the analysis of the model's output across configurations also supports incorporating SAM for depth estimation. The baseline model exhibits limitations in capturing fine-grained details, particularly for objects closer to the camera. Both SAM configurations, with SAM added after the encoder and especially with SAM as an extra channel, show improved performance: the SAM-enhanced models effectively outline objects, resulting in sharper edges and increased detail compared to the baseline, with the extra-channel configuration even showing more detail in the depth image than the baseline in Figure 8. These findings support the hypothesis that SAM aids in better grouping and distinguishing objects, thereby enhancing depth estimation.

Reflecting on these results, the incorporation of SAM into the U-Net model presents opportunities for improving depth estimates. Despite the slight benefits found in this study, the improvements across several variants point towards a promising area for further investigation.
The increased benefit of SAM with a smaller encoder shows a promising direction for combining SAM with small, lightweight models, which could enable accurate depth estimation on simple hardware. With further refinement and more resources, we are optimistic that the integration of SAM and U-Net could yield significant advancements in depth estimation.

## References

Alhashim, I., & Wonka, P. (2018). High Quality Monocular Depth Estimation via Transfer Learning. https://arxiv.org/pdf/1812.11941.pdf

Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab, N., Hornegger, J., Wells, W., Frangi, A. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham. https://doi-org.tudelft.idm.oclc.org/10.1007/978-3-319-24574-4_28
