# HTR Proposal

## Motivation

* In the field of Handwritten Text Recognition (HTR), labeled data remains scarce. Achieving strong performance with transformer models therefore requires either a substantial amount of data or pre-trained models.
* HTR methodologies are not directly applicable to the Scene Text Recognition (STR) domain. The distinct challenges posed by scene text, such as varying backgrounds, fonts, and orientations, necessitate tailored approaches.
* The conventional Connectionist Temporal Classification (CTC) loss enforces monotonic alignments, which suits word-level and line-level HTR. However, this restriction limits research opportunities for paragraph-level or page-level tasks. Furthermore, languages without explicit word delimiters, such as Chinese and Japanese, pose challenges to detectors attempting to separate characters. (A minimal CTC sketch follows this list.)
* Transformer-based models often face efficiency challenges, including large parameter counts and prolonged training and inference times, as observed in models like TrOCR.
* While Transformer-based models other than TrOCR show potential, their performance on HTR tasks still falls short of expectations.
* Handwritten images contain valuable ink information surrounded by redundant background. Is it feasible to develop a mask strategy specialized for HTR that reduces this redundancy?
* Certain HTR models require an auxiliary Language Model (LM). An intriguing avenue of exploration is whether we can design models that alleviate the dependence on such a component.
* Could a single Transformer architecture replace the traditional CNN backbones in HTR applications, leading to simplified architectures with promising performance?
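To make the alignment constraint above concrete, here is a minimal sketch of line-level CTC training in PyTorch. The tensor shapes, alphabet size, and the random stand-in for encoder output are illustrative assumptions, not part of this proposal:

```python
import torch
import torch.nn as nn

# Hypothetical setup: a line image is encoded into T time steps, each
# emitting a distribution over the alphabet plus the CTC blank (index 0).
T, N, C = 100, 8, 80   # time steps, batch size, alphabet size incl. blank
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)

# Target transcriptions; CTC only marginalizes over monotonic,
# left-to-right alignments between log_probs and these labels.
targets = torch.randint(1, C, (N, 25), dtype=torch.long)   # avoid blank=0
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 25, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # gradients flow only through monotonic alignment paths
```

Because CTC sums only over monotonic, left-to-right alignment paths, it cannot express the reading-order jumps required by paragraph- or page-level input, which is exactly the restriction noted in the motivation.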
## Related work

### Handwritten Text Recognition

**1. Rethinking Text Line Recognition Models**
https://arxiv.org/pdf/2104.07787.pdf

![](https://hackmd.io/_uploads/S1bq6ndh3.png) ![](https://hackmd.io/_uploads/Bk7LD2_n2.png) ![](https://hackmd.io/_uploads/BkpaGxc2h.png)

**2. Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition**
https://arxiv.org/pdf/1903.07377.pdf
* Encoder/Decoder: LSTM / LSTM w/ Attn

![](https://hackmd.io/_uploads/B1P35n_2n.png) ![](https://hackmd.io/_uploads/Hky8Qg922.png)

**3. Gated Convolutional Recurrent Neural Networks for Multilingual Handwriting Recognition**
http://www.tbluche.com/files/icdar17_gnn.pdf
* Encoder/Decoder: GCRNN / CTC

![](https://hackmd.io/_uploads/Hy1tjhdn2.png) ![](https://hackmd.io/_uploads/B1HVNec23.png) ![](https://hackmd.io/_uploads/HyREVgcnh.png)

**4. Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition**
https://arxiv.org/pdf/2005.13044.pdf
* Transformer

![](https://hackmd.io/_uploads/SJ9Rjnd23.png) ![](https://hackmd.io/_uploads/SkDpElqhh.png) ![](https://hackmd.io/_uploads/SJwCVeqh3.png) ![](https://hackmd.io/_uploads/SJI6Bg522.png)

**5. Decoupled Attention Network for Text Recognition**
https://arxiv.org/pdf/1912.10205.pdf
* Encoder/Decoder: FCN / GRU

![](https://hackmd.io/_uploads/rJ_LyaO22.png) ![](https://hackmd.io/_uploads/r1GXUg52n.png) ![](https://hackmd.io/_uploads/B19DIecn2.png)

**6. TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models**
https://arxiv.org/pdf/2109.10282.pdf
* Transformer

![](https://hackmd.io/_uploads/r1faladn2.png) ![](https://hackmd.io/_uploads/ByEAA19hh.png) ![](https://hackmd.io/_uploads/rkseyx92n.png)

**7. A Scalable Handwritten Text Recognition System**
https://arxiv.org/pdf/1904.09150.pdf
* GRCL block

![](https://hackmd.io/_uploads/Sy1FB6O3n.png) ![](https://hackmd.io/_uploads/ryniKgq3h.png)

**8. Recurrence-free Unconstrained Handwritten Text Recognition Using Gated Fully Convolutional Network**
https://arxiv.org/pdf/2012.04961.pdf
* GFCN block

![](https://hackmd.io/_uploads/HJp3vp_nn.png) ![](https://hackmd.io/_uploads/SkYa5lq3h.png) ![](https://hackmd.io/_uploads/BkQ-jxcnh.png)

**9. DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition**
https://arxiv.org/pdf/2203.12273.pdf
* Encoder/Decoder: FCN / Transformer decoder

![](https://hackmd.io/_uploads/H1Hhd6On3.png) ![](https://hackmd.io/_uploads/HJYRsx52h.png)

**10. End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network**
https://arxiv.org/pdf/2012.03868.pdf
* Encoder/Decoder: FCN / LSTM

![](https://hackmd.io/_uploads/rkX7KTOn2.png) ![](https://hackmd.io/_uploads/H1UXyWqn3.png) ![](https://hackmd.io/_uploads/rkN4k-chn.png) ![](https://hackmd.io/_uploads/S1jteW922.png)

**11. Are Multidimensional Recurrent Layers Really Necessary for Handwritten Text Recognition?**
http://www.elvoldelhomeocell.net/pubs/jpuigcerver_icdar2017.pdf
* CNN + BLSTM

![](https://hackmd.io/_uploads/H1S89T_nn.png)

**12. Accurate, Data-Efficient, Unconstrained Text Recognition with Convolutional Neural Networks**
https://arxiv.org/pdf/1812.11894.pdf
* GFCN

![](https://hackmd.io/_uploads/SJfC5pOnh.png) ![](https://hackmd.io/_uploads/HJYS-Wcn2.png) ![](https://hackmd.io/_uploads/SJ3WGWq23.png)

**13. OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by Learning to Unfold**
https://openaccess.thecvf.com/content_CVPR_2020/papers/Yousef_OrigamiNet_Weakly-Supervised_Segmentation-Free_One-Step_Full_Page_Text_Recognition_by_learning_CVPR_2020_paper.pdf

![](https://hackmd.io/_uploads/ry05iTd32.png)

**14. Best Practices for a Handwritten Text Recognition System**
https://www.cse.uoi.gr/~sfikas/DAS2022-Retsinas-BestpracticesHTR.pdf

![](https://hackmd.io/_uploads/rJOh5JY2n.png) ![](https://hackmd.io/_uploads/H1xFSW53h.png)

**15. CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition**
https://arxiv.org/pdf/2303.09347.pdf

![](https://hackmd.io/_uploads/HyX52ktnh.png) ![](https://hackmd.io/_uploads/r1jMuW9h3.png)

**16. Attention-based Fully Gated CNN-BGRU for Russian Handwritten Text**
https://arxiv.org/pdf/2008.05373.pdf
* CNN-BGRU

![](https://hackmd.io/_uploads/rkEDTJY23.png) ![](https://hackmd.io/_uploads/B1QvY-c32.png) ![](https://hackmd.io/_uploads/HJF_KW93h.png)

**17. Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions**
https://iris.unimore.it/bitstream/11380/1204119/2/2020_ICPR_HTR_CR.pdf

![](https://hackmd.io/_uploads/rJRnAkth2.png) ![](https://hackmd.io/_uploads/SysRbfqnn.png)

**18. Mask Guided Selective Context Decoding for Handwritten Chinese Text Recognition**

![](https://hackmd.io/_uploads/rkDxOlthn.png) ![](https://hackmd.io/_uploads/H1w4uxF23.png)

**19. Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention**
https://arxiv.org/pdf/1604.03286.pdf

![](https://hackmd.io/_uploads/BkibWZt22.png)

**20. A Light Transformer-Based Architecture for Handwritten Text Recognition**
https://hal.science/hal-03685976/file/A_Light_Transformer_Based_Architecture_for_Handwritten_Text_Recognition.pdf

![](https://hackmd.io/_uploads/S1WUf-Y2n.png) ![](https://hackmd.io/_uploads/rJKYQz5nh.png)

**21. SPAN: a Simple Predict & Align Network for Handwritten Paragraph Recognition**
https://arxiv.org/pdf/2102.08742.pdf

![](https://hackmd.io/_uploads/ByO-LWF3n.png) ![](https://hackmd.io/_uploads/HJb64z52n.png)
**22. An Efficient End-to-End Neural Model for Handwritten Text Recognition**
https://arxiv.org/pdf/1807.07965.pdf

![](https://hackmd.io/_uploads/rJeAwXK2h.png)

**24. Handwriting Recognition in Low-resource Scripts Using Adversarial Learning**
https://openaccess.thecvf.com/content_CVPR_2019/papers/Bhunia_Handwriting_Recognition_in_Low-Resource_Scripts_Using_Adversarial_Learning_CVPR_2019_paper.pdf

![](https://hackmd.io/_uploads/Bk7IfaYn3.png)

**25. StackMix and Blot Augmentations for Handwritten Text Recognition**
https://arxiv.org/pdf/2108.11667.pdf

![](https://hackmd.io/_uploads/ByIqI19nh.png)

**26. PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition**
https://arxiv.org/pdf/2207.14807.pdf

![](https://hackmd.io/_uploads/S15k5M92n.png)

---

### Scene text recognition

**1. Self-supervised Implicit Glyph Attention for Text Recognition (CVPR 2023)**
https://openaccess.thecvf.com/content/CVPR2023/papers/Guan_Self-Supervised_Implicit_Glyph_Attention_for_Text_Recognition_CVPR_2023_paper.pdf

![](https://hackmd.io/_uploads/B1BRq-t23.png) ![](https://hackmd.io/_uploads/HkpyjZF23.png)

**2. Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition (recent TPAMI paper)**

![](https://hackmd.io/_uploads/Sk2u3WF33.png) ![](https://hackmd.io/_uploads/B1epFn-t3h.png)

**3. Levenshtein OCR**
https://arxiv.org/pdf/2209.03594.pdf

![](https://hackmd.io/_uploads/ByDnkztnn.png)

**4. Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition**
https://openaccess.thecvf.com/content/CVPR2021/papers/Fang_Read_Like_Humans_Autonomous_Bidirectional_and_Iterative_Language_Modeling_for_CVPR_2021_paper.pdf

![](https://hackmd.io/_uploads/r1EKeMK2n.png) ![](https://hackmd.io/_uploads/SyW9xzt2n.png) ![](https://hackmd.io/_uploads/BkEigzt32.png) ![](https://hackmd.io/_uploads/rJ2oxzF2h.png)

**5. Multi-Granularity Prediction for Scene Text Recognition**
https://arxiv.org/pdf/2209.03592.pdf

![](https://hackmd.io/_uploads/B1Gf-GF32.png) ![](https://hackmd.io/_uploads/HyxQ-zK23.png)

**6. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition**
https://arxiv.org/pdf/1507.05717.pdf

![](https://hackmd.io/_uploads/HJ4NEpYnh.png)

---

### Mask strategy

**1. Good Helper Is Around You: Attention-driven Masked Image Modeling**

![](https://hackmd.io/_uploads/Hk-BEGFh2.png) ![](https://hackmd.io/_uploads/ryb8VGt33.png)

**2. SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders**
https://proceedings.neurips.cc/paper_files/paper/2022/file/5c186016d0844767209dc36e9e61441b-Paper-Conference.pdf

![](https://hackmd.io/_uploads/HyPpVfYnn.png) ![](https://hackmd.io/_uploads/BJZ04fFh2.png)

**3. What to Hide from Your Students: Attention-Guided Masked Image Modeling**
https://arxiv.org/pdf/2203.12719.pdf

![](https://hackmd.io/_uploads/SJhaBfK2n.png) ![](https://hackmd.io/_uploads/ryEWLMt3n.png)

**4. Improving Masked Autoencoders by Learning Where to Mask**
https://arxiv.org/pdf/2303.06583.pdf

![](https://hackmd.io/_uploads/HJcq8fFhh.png) ![](https://hackmd.io/_uploads/H1Ai8GYhh.png)

**5. Hard Patches Mining for Masked Image Modeling**
https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_Hard_Patches_Mining_for_Masked_Image_Modeling_CVPR_2023_paper.pdf

![](https://hackmd.io/_uploads/BkL7vfKnn.png) ![](https://hackmd.io/_uploads/r1r8vfFh2.png)
**6. Adversarial Masking for Self-supervised Learning**
https://proceedings.mlr.press/v162/shi22d/shi22d.pdf

![](https://hackmd.io/_uploads/r1AIYMF22.png) ![](https://hackmd.io/_uploads/S1gRKzYh3.png) ![](https://hackmd.io/_uploads/H1VQqMFnn.png)

**7. Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality**
https://arxiv.org/pdf/2205.10063.pdf

![](https://hackmd.io/_uploads/H1-suMK3n.png) ![](https://hackmd.io/_uploads/rJ0d9fK32.png) ![](https://hackmd.io/_uploads/BJLt5GY2n.png) ![](https://hackmd.io/_uploads/B1Pq5MFn3.png)

**8. Evolved Part Masking for Self-Supervised Learning**
https://openaccess.thecvf.com/content/CVPR2023/papers/Feng_Evolved_Part_Masking_for_Self-Supervised_Learning_CVPR_2023_paper.pdf

![](https://hackmd.io/_uploads/rkq3hfY3n.png) ![](https://hackmd.io/_uploads/B1Nl6MF22.png)

**9. MaskGIT: Masked Generative Image Transformer**
https://arxiv.org/pdf/2202.04200.pdf

![](https://hackmd.io/_uploads/SkRxIEKnh.png)

### Datasets (Todo)

* IAM
* READ2016
* LAM
* HKR
* Synthetic data: MJSynth (MJ) https://www.robots.ox.ac.uk/~vgg/data/text/ and SynthText (ST) https://www.robots.ox.ac.uk/~vgg/data/scenetext/
* Internal data
* Scene text recognition datasets
* CASIA
* ICDAR2013 Competition Dataset
* SCUT-HCCDoc
* MTHv2

## Contributions

* **Unified recognition of Chinese and non-Chinese handwritten text:** We achieve a unified approach for recognizing both Chinese and non-Chinese handwritten text, addressing the challenges posed by different script systems.
* **Novel mask strategy for HTR:** We introduce an innovative masking strategy tailored to the HTR task, enhancing the model's ability to focus on relevant textual features (a minimal sketch of one attention-guided variant follows this list).
* **Robust performance on large- and small-scale datasets:** Our proposed model demonstrates robust performance across a wide range of dataset scales, delivering consistent accuracy on both large and small datasets.
* **Efficiency:** Our approach retains an efficient profile while delivering state-of-the-art performance, meeting the demands of real-time and resource-constrained applications.
* **Unified excellence in HTR and STR:** We showcase the versatility of our model by achieving strong performance on both Handwritten Text Recognition and Scene Text Recognition tasks, unifying the capability to address both domains.
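To illustrate the masking direction claimed above, here is a minimal sketch of attention-guided patch masking in PyTorch, in the spirit of the attention-driven masked-image-modeling papers surveyed earlier. The per-patch attention scores, the 50% mask ratio, and the multinomial sampling scheme are illustrative assumptions, not the proposal's final design:

```python
import torch

def attention_guided_mask(attn_scores: torch.Tensor, mask_ratio: float = 0.5):
    """Select patches to mask, biased toward high-attention (ink) regions.

    attn_scores: (B, N) attention of some query (e.g. a ViT CLS token)
    over N image patches. Returns a (B, N) boolean mask; True = masked.
    """
    B, N = attn_scores.shape
    num_masked = int(N * mask_ratio)
    # Sample without replacement, proportionally to attention, so that
    # informative (ink) patches are hidden more often than background.
    probs = attn_scores.softmax(dim=-1)
    idx = torch.multinomial(probs, num_masked, replacement=False)
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask

# Toy usage: 8 images, 196 patches (a 14x14 grid of 16x16-pixel patches).
scores = torch.rand(8, 196)
mask = attention_guided_mask(scores, mask_ratio=0.5)
print(mask.float().mean())  # ~0.5 of patches masked per image
```

Biasing the mask toward high-attention patches hides ink rather than background, so a reconstruction or recognition objective spends its capacity on the informative regions, which is the redundancy-reduction idea raised in the motivation.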
## Experiments

### Experimental approach

1. **Initial architecture exploration:** We begin with a foundational encoder-decoder architecture. This versatile architecture forms the basis of our investigation, applying to both Chinese and Latin script datasets and accommodating Scene Text Recognition (STR) tasks. To assess effectiveness, we first perform an in-depth comparison on the IAM dataset, a standard benchmark for handwritten text recognition, and then consolidate results from the remaining three handwritten-script datasets in a comprehensive table. (A minimal encoder-decoder sketch follows this list.)
2. **Refinement through masking strategies and network adjustments:** In pursuit of better performance, we explore various mask strategies and targeted network-architecture adjustments. The novel masking strategies and fine-tuned modifications aim to exploit the contextual information in the input data, thereby improving recognition accuracy.
3. **Ablation study and parameter analysis:** To gain deeper insight into the contributions of individual components, we conduct ablation experiments, including an in-depth analysis of the impact of the chosen mask strategies and model hyperparameters on overall performance.
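As referenced in step 1, here is a minimal sketch of one plausible encoder-decoder baseline in PyTorch: a small CNN encoder over the line image feeding an autoregressive Transformer decoder over characters. All layer sizes, the vocabulary size, and the height-pooling scheme are illustrative assumptions, not the proposal's actual architecture:

```python
import torch
import torch.nn as nn

class TinyHTR(nn.Module):
    """Toy encoder-decoder: CNN features -> Transformer decoder over chars."""
    def __init__(self, vocab_size: int, d_model: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(            # grayscale line image -> features
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),     # collapse height, keep width
        )
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, images, tgt_tokens):
        # images: (B, 1, H, W); tgt_tokens: (B, S) shifted-right transcript
        mem = self.encoder(images).squeeze(2).transpose(1, 2)   # (B, W', d)
        S = tgt_tokens.size(1)
        causal = torch.triu(torch.full((S, S), float("-inf")), diagonal=1)
        out = self.decoder(self.embed(tgt_tokens), mem, tgt_mask=causal)
        return self.head(out)                    # (B, S, vocab_size)

model = TinyHTR(vocab_size=80)
logits = model(torch.randn(2, 1, 64, 512), torch.zeros(2, 20, dtype=torch.long))
print(logits.shape)  # torch.Size([2, 20, 80])
```

Collapsing the height axis and decoding autoregressively over the width axis keeps the baseline simple while leaving room to swap in the masking and architecture refinements of step 2.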
### Experimental details

1. **Recording model efficiency:** Throughout our experiments, we document efficiency metrics, including inference time and the computational resources used during training and inference. Consistently recording these metrics gives a comprehensive picture of the trade-offs between model performance and computational cost.
2. **Statistical analysis:** For each experimental result, we conduct multiple runs and report the mean together with the standard deviation, accounting for run-to-run variability and ensuring the reliability of the reported metrics (see the aggregation sketch after this list).
3. **Visualization and interpretability:** Each experiment's results are complemented by visual representations that aid understanding and interpretation.
4. **Fair comparisons on the IAM dataset:** When evaluating on IAM, we run a series of controlled comparisons:
   * **With a language model (LM):** We compare performance with and without an integrated language model, showing the impact of contextual language information on recognition accuracy.
   * **With synthetic data:** We integrate synthetic training data to enable a fair comparison with TrOCR and other models.
   * **With a pretrained model:** We examine the benefits of using a pretrained model as an initialization point.
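A minimal sketch of the run-aggregation protocol from item 2; the five example CER values and the choice of five seeds are placeholders, not measured results:

```python
import statistics

def aggregate_runs(cers: list[float]) -> str:
    """Report mean ± sample standard deviation over repeated runs."""
    mean = statistics.mean(cers)
    std = statistics.stdev(cers)  # sample std, as reported in the tables
    return f"{mean:.2f} ± {std:.2f}"

# e.g. test CER from five training runs with different random seeds
cers = [4.92, 5.01, 4.88, 4.95, 5.10]
print(aggregate_runs(cers))  # 4.97 ± 0.09
```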
### Experimental tables

**IAM Results**

| Method | VAL CER (%) | VAL WER (%) | TEST CER (%) | TEST WER (%) | Params | Training time | Inference time | Memory |
| ------ | ----------- | ----------- | ------------ | ------------ | ------ | ------------- | -------------- | ------ |
| 1D-LSTM (no LM) \cite{11} | 3.8 [3.4-4.3] | 5.8 [5.3-6.3] | 13.5 [12.1-14.9] | 18.4 [17.4-19.5] | 9.3M | 3.8 min/epoch | | 10.5G (bs 16) |
| 1D-LSTM (word LM) \cite{11} | 2.9 [2.5-3.3] | 4.4 [3.9-4.8] | 9.2 [8.1-10.2] | 12.2 [11.4-13.2] | 9.3M | 3.8 min/epoch | | 10.5G (bs 16) |
| LSTM/LSTM w/ Attn \cite{2} | - | - | 4.87 | - | | | | |
| GCRNN (7-gram LM) \cite{3} | - | - | 3.2 | 10.5 | 725K | | | |
| Transformer \cite{4} | - | - | 7.62 | 24.54 | 100M | 202.5 s/epoch | | |
| Transformer (Synth) \cite{4} | - | - | 4.67 | 15.45 | 100M | 202.5 s/epoch | | |
| FCN / GRU \cite{5} | - | - | 6.4 | 19.6 | | | | |
| TrOCR_small (Synth) \cite{6} | - | - | 4.22 | - | 62M | | 348.4 s (8.37 sentences/s, 89.22 tokens/s) | |
| TrOCR_base (Synth) \cite{6} | - | - | 3.42 | - | 334M | | 633.7 s (4.60 sentences/s, 50.43 tokens/s) | |
| TrOCR_large (Synth) \cite{6} | - | - | 2.89 | - | 558M | | 666.8 s (4.37 sentences/s, 47.94 tokens/s) | |
| GRCL (internal data) \cite{7} | - | - | 4.0 | 10.8 | 10.6M | | | |
| GFCN \cite{8} | - | - | 7.99 | 28.61 | 1.4M | 13.75 min/epoch | 74 ms/sample | |
| FCN+LSTM \cite{10} | - | - | 4.97 | 14.31 | 2.7M | | | |
| GFCN \cite{12} | - | - | 4.9 | - | 3.4M | | | |
| OrigamiNet \cite{13} | - | - | 4.8 | - | 115.3M | | | |
| CNN+BLSTM + CTC shortcut \cite{14} | - | - | 4.62 | 15.89 | | | | |
| CSSL-MHTR \cite{15} | - | - | 4.9 | - | 10.2M | | | |
| CNN-BGRU \cite{16} | - | - | 5.79 | 15.85 | 885K | | | |
| Deform-CNN \cite{17} | - | - | 4.6 | 19.3 | | | | |
| Light Transformer \cite{20} | - | - | 5.7 | 18.86 | 6.9M | | | |
| Light Transformer (Synth) \cite{20} | - | - | 4.76 | 16.31 | 6.9M | | | |
| SPAN \cite{21} | - | - | 4.82 | 18.17 | 19.2M | | | |
| Ours | - | - | - | - | - | | | |
| Ours with Synth | - | - | - | - | - | | | |
| Ours with LM | - | - | - | - | - | | | |
| Ours with pretrained | - | - | - | - | - | | | |

**READ2016 Results**

| Method | VAL CER (%) | VAL WER (%) | TEST CER (%) | TEST WER (%) | Params | Training time | Inference time | Memory |
| ------ | ----------- | ----------- | ------------ | ------------ | ------ | ------------- | -------------- | ------ |
| CNN+BLSTM \cite{2} | - | - | 4.66 | - | | | | |
| CNN+RNN \cite{ICFHR2016} | - | - | 5.1 | 21.1 | | | | |
| FCN+LSTM \cite{10} | - | - | 4.1 | 16.29 | | | | |
| FCN / Transformer decoder \cite{9} | - | - | 4.1 | 17.64 | 7.6M | | | |
| SPAN \cite{21} | - | - | 4.56 | 21.07 | 19.2M | | | |
| Ours | | | | | | | | |

**LAM Results**

| Method | VAL CER (%) | VAL WER (%) | TEST CER (%) | TEST WER (%) | Params | Training time | Inference time | Memory |
| ------ | ----------- | ----------- | ------------ | ------------ | ------ | ------------- | -------------- | ------ |
| 1D-LSTM (no LM) \cite{11} | - | 3.7 | - | 12.3 | 9.3M | 3.8 min/epoch | | 10.5G (bs 16) |
| 1D-LSTM (w/ DefConv) \cite{17} | - | - | 3.5 | 11.6 | 9.6M | | | |
| CRNN (w/ DefConv) \cite{17} | - | - | 3.3 | 11.3 | 18.5M | | | |
| GFCN \cite{8} | - | - | 5.2 | 18.5 | 1.4M | | | |
| OrigamiNet \cite{13} | - | - | 3.0 | 11.0 | 115.3M | | | |
| CSSL-MHTR \cite{15} | - | - | 5.1 | - | 10.2M | | | |
| Transformer \cite{4} | - | - | 10.2 | 22.0 | 54.7M | | | |
| TrOCR \cite{6} | - | - | 3.6 | 11.6 | 385.0M | | | |
| Ours | | | | | | | | |

**HKR Results**

| Method | VAL CER (%) | VAL WER (%) | TEST CER (%) | TEST WER (%) | Params | Training time | Inference time | Memory |
| ------ | ----------- | ----------- | ------------ | ------------ | ------ | ------------- | -------------- | ------ |
| 1D-LSTM (no LM) \cite{11} | - | - | Test1: 43.4 / Test2: 54.7 | Test1: 76.8 / Test2: 82.9 | 9.6M | 3.8 min/epoch | | 10.5G (bs 16) |
| GCRNN \cite{3} | - | - | Test1: 16.1 / Test2: 10.1 | Test1: 59.6 / Test2: 37.4 | 728K | | | |
| G-CNN-BGRU \cite{16} | - | - | Test1: 4.13 / Test2: 6.31 | Test1: 18.91 / Test2: 23.69 | 885K | | | |
| StackMix \cite{25} | - | - | 3.49 ± 0.08 | 13.0 ± 0.3 | | | | |
| CSSL-MHTR \cite{15} | - | - | 2.9 | - | 10.2M | | | |
| Ours | - | - | | | | | | |
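For reference, the CER and WER columns above are normalized edit-distance rates; a minimal sketch of how they are typically computed follows (this is a generic implementation, not the evaluation code of any cited paper):

```python
def levenshtein(a, b) -> int:
    """Edit distance between two sequences (characters or word lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    return levenshtein(ref, hyp) / len(ref)

def wer(ref: str, hyp: str) -> float:
    return levenshtein(ref.split(), hyp.split()) / len(ref.split())

ref, hyp = "handwritten text", "handwriten test"
print(f"CER {100 * cer(ref, hyp):.1f}%  WER {100 * wer(ref, hyp):.1f}%")
# CER 12.5%  WER 100.0%
```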
### Visualization results (Todo)