# ASR progress report (Wang, Yi-Ling)

## Server
| Server | IP |
|:------:|:---------------------------------:|
| 1 | ssh lenny@140.113.170.49 -p 20203 |
| 3 | ssh lenny@140.113.170.46 -p 10201 |
| ML-public | [Click](https://drive.google.com/drive/folders/0B-10EC-cPLL1V2huNmFteUsxb2s?usp=sharing) |

## Running
- [x] [SpeechTransformer](https://github.com/foamliu/Speech-Transformer) ICASSP 2018

![](https://i.imgur.com/eH03mux.png)
![](https://i.imgur.com/Umx9pxj.png)

## Pending
- [ ] [RNN-T](https://github.com/foamliu/Speech-Transformer)

## Structure
- [ ] Transformer
- [ ] RNN-T

## Reference Code
|Name|Language|Developer|
|:--:|:------:|:-------:|
|[Speech-Transformer](https://github.com/foamliu/Speech-Transformer)|Pytorch|-|
|[Speech-Transformer](https://github.com/kaituoxu/Speech-Transformer)|Pytorch&Kaldi|-|
|[Online-Speech-Recognition(RNN-T)](https://github.com/theblackcat102/Online-Speech-Recognition?fbclid=IwAR29aeBtzC0RL2SU2a3MDFzRpsMwssLeo9-IcRL6pkaK6wSH_4z_yr98_OQ)|Pytorch|-|

## PaperWithCode
| Title | Paper | Code | Source |
| -- | -- | -- | -- |
| Longformer: The Long-Document Transformer | [Link](https://arxiv.org/abs/2004.05150) | [Pytorch](https://github.com/allenai/longformer) | --- |
| Linformer: Self-Attention with Linear Complexity | [Link](https://arxiv.org/abs/2006.04768) | [Pytorch1](https://github.com/kuixu/Linear-Multihead-Attention)、[Pytorch2](https://github.com/tatp22/linformer-pytorch) | Facebook |
| Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition | [Link](https://ieeexplore.ieee.org/document/8462506) | [Pytorch1](https://github.com/xingchensong/Speech-Transformer-plus-2DAttention)、[Pytorch2](https://github.com/sooftware/Speech-Transformer) | ICASSP 2018 |
| Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention | [Link](https://paperswithcode.com/paper/transformers-are-rnns-fast-autoregressive) | [Pytorch1](https://github.com/idiap/fast-transformers)、[Pytorch2](https://github.com/lucidrains/linear-attention-transformer) | ICML 2020 |
| Fast Transformers with Clustered Attention | [Link](https://arxiv.org/abs/2007.04825) | [Pytorch](https://github.com/idiap/fast-transformers) | --- |
| Star-Transformer | [Link](https://arxiv.org/abs/1902.09113) | [Pytorch](https://github.com/liujiarik/nlp_clip_pytorch) | NAACL 2019 |
| Lite Transformer with Long-Short Range Attention | [Link](https://arxiv.org/abs/2004.11886) | [Pytorch](https://github.com/mit-han-lab/lite-transformer) | ICLR 2020 |
| Encoding word order in complex embeddings | [Link](https://openreview.net/forum?id=Hke-WTVtwr) | [Pytorch](https://github.com/iclr-complex-order/complex-order) | ICLR 2020 |
| Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks | [Link](https://arxiv.org/abs/1810.00825) | [Pytorch](https://github.com/juho-lee/set_transformer) | ICML 2019 |
| Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | [Link](https://arxiv.org/abs/2006.03236)、[Chinese summary](https://yam.gift/2020/10/13/Paper/2020-10-13-FunnelTransformer/) | [Pytorch](https://github.com/laiguokun/Funnel-Transformer) | Google、CMU |
| Adaptive Attention Span in Transformers | [Link](https://arxiv.org/abs/1905.07799) | [Pytorch](https://github.com/facebookresearch/adaptive-span) | Facebook、ACL 2019 |
| Transformer-XL | [Link](https://arxiv.org/abs/1901.02860) | [Pytorch](https://github.com/lucidrains/memory-transformer-xl) | ACL 2019 |
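As a quick reference while comparing the efficient-attention papers above, below is a minimal sketch of the non-causal linear attention from "Transformers are RNNs" (Katharopoulos et al., 2020), using the elu(x)+1 feature map described in that paper. The `linear_attention` name, tensor shapes, and test values are illustrative and not taken from any of the linked repositories.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention sketch.

    q, k, v: (batch, seq_len, dim). Uses the feature map phi(x) = elu(x) + 1,
    so attention costs O(N * d^2) instead of the O(N^2 * d) of softmax attention.
    """
    q = F.elu(q) + 1                                   # phi(Q)
    k = F.elu(k) + 1                                   # phi(K)
    kv = torch.einsum("bnd,bne->bde", k, v)            # sum_n phi(k_n) v_n^T
    z = torch.einsum("bnd,bd->bn", q, k.sum(dim=1))    # per-query normaliser
    return torch.einsum("bnd,bde->bne", q, kv) / (z.unsqueeze(-1) + eps)

# Tiny smoke test with random tensors.
if __name__ == "__main__":
    q = torch.randn(2, 100, 64)
    k = torch.randn(2, 100, 64)
    v = torch.randn(2, 100, 64)
    print(linear_attention(q, k, v).shape)  # torch.Size([2, 100, 64])
```

The point of the factorised feature map is that the key–value summary `kv` is computed once and reused for every query, which is what makes the cost linear in sequence length.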
## Technique
|Name|Publisher|Source|
|:--:|:------:|:-------:|
|[SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition](https://arxiv.org/abs/1904.08779)|Interspeech 2019|?|
|[Semantic Mask for Transformer based End-to-End Speech Recognition](https://arxiv.org/abs/1912.03010)|Microsoft Research|?|
|[Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks](https://arxiv.org/abs/1506.03099)|Google|?|
|[State-of-the-art Speech Recognition With Sequence-to-Sequence Models](https://arxiv.org/abs/1712.01769)|Google|ICASSP 2019|
|[SYNTHESIZER: Rethinking Self-Attention in Transformer Models](https://arxiv.org/abs/2005.00743)|Google|ICLR 2021|
|[CP-GAN: Context Pyramid Generative Adversarial Network for Speech Enhancement](https://ieeexplore.ieee.org/document/9054060)|[Group meeting](https://hackmd.io/m1vXeUafRNSxJDc862tEvw)|ICASSP 2020|
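Since SpecAugment in the table above is the augmentation most directly reusable for our training runs, here is a minimal sketch of its frequency- and time-masking steps (time warping omitted), assuming a log-mel spectrogram shaped (time, mel). The `spec_augment` name and the mask-width defaults are illustrative, not the paper's exact policies.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, freq_mask_width=27,
                 num_time_masks=2, time_mask_width=100):
    """Frequency and time masking in the spirit of SpecAugment (Park et al., 2019).

    spec: np.ndarray of shape (time, mel). Masked regions are zeroed.
    The time-warping step from the paper is omitted in this sketch.
    """
    spec = spec.copy()
    n_frames, n_mels = spec.shape

    # Frequency masks: zero out up to `freq_mask_width` consecutive mel channels.
    for _ in range(num_freq_masks):
        f = np.random.randint(0, freq_mask_width + 1)
        f0 = np.random.randint(0, max(1, n_mels - f))
        spec[:, f0:f0 + f] = 0.0

    # Time masks: zero out up to `time_mask_width` consecutive frames.
    for _ in range(num_time_masks):
        t = np.random.randint(0, time_mask_width + 1)
        t0 = np.random.randint(0, max(1, n_frames - t))
        spec[t0:t0 + t, :] = 0.0

    return spec

# Example: augment a random 500-frame, 80-mel spectrogram.
if __name__ == "__main__":
    augmented = spec_augment(np.random.randn(500, 80))
    print(augmented.shape)  # (500, 80)
```

In practice this would typically be applied on the fly to each utterance's features before batching, so every epoch sees differently masked inputs.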
## Dataset
| Corpus | Language |
| ------ | -------- |
|[AI Shell](http://www.aishelltech.com/kysjcp)|Chinese|
|[Librispeech](http://www.openslr.org/12)|English|
|[WSJ](https://catalog.ldc.upenn.edu/LDC2000T43)|English|
|[Switchboard](https://www.isip.piconepress.com/projects/switchboard/)|English|

## Benchmark Result
| Corpus |
|--------|
|[Librispeech](https://paperswithcode.com/sota/speech-recognition-on-librispeech-test-clean)|

## ICML
- [ ] [Learning to Encode Position for Transformer with Continuous Dynamical Model](https://paperswithcode.com/paper/learning-to-encode-position-for-transformer?fbclid=IwAR0SQ1osziEur1dg1Df-nSEKYN4kgEkM4TaUPvxokCuKTpdjg5JihL35e6Y)
- [ ] [Improving the Gating Mechanism of Recurrent Neural Networks](https://paperswithcode.com/paper/improving-the-gating-mechanism-of-recurrent-1)
- [ ] [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://paperswithcode.com/paper/efficientnet-rethinking-model-scaling-for)
- [ ] [Stabilizing Transformers for Reinforcement Learning](https://paperswithcode.com/paper/stabilizing-transformers-for-reinforcement-1)

## Interspeech
- [ ] [Improving Transformer-based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration](https://www.semanticscholar.org/paper/Improving-Transformer-Based-End-to-End-Speech-with-Karita-Soplin/ffe1416bcfde82f567dd280975bebcfeb4892298)
- [ ] [Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition](https://www.semanticscholar.org/paper/Automatic-Spelling-Correction-with-Transformer-for-Zhang-Lei/da890cfca17afb36e034f6bb94b25ada8b6e902f)

## Git
- [ ] [Transformer-based online speech recognition system with TensorFlow 2](https://github.com/georgesterpu/Taris)

## Reference (Net)
- [ ] [Attention_reform1](https://www.mdeditor.tw/pl/pK7f/zh-tw)
- [ ] [Attention_reform2](https://my.oschina.net/u/4594481/blog/4520663)
- [ ] [ICLR 2020 Trend of Transformer](https://blog.csdn.net/hwaust2020/article/details/106454443/?utm_medium=distribute.pc_relevant.none-task-blog-title-2&spm=1001.2101.3001.4242)
- [ ] [IBM_transformer](https://developer.ibm.com/zh/technologies/deep-learning/articles/ba-lo-transformer-design-and-build-efficient-timing-models/)

## Mix
https://blog.csdn.net/qq_37236745/article/details/107352273?utm_medium=distribute.pc_relevant.none-task-blog-title-6&spm=1001.2101.3001.4242

~~Old~~

---

## ~~Related Paper~~
|Name|Institution|Source|
|----|-----------|------|
|[Listen, attend and spell: A neural network for large vocabulary conversational speech recognition](https://ieeexplore.ieee.org/document/7472621)|CMU, Google|ICASSP 2016|
|[A Comparison of Sequence-to-Sequence Models for Speech Recognition](https://research.google/pubs/pub46169/)|Google|Interspeech 2017|
|[A comparable study of modeling units for end-to-end Mandarin speech recognition](https://ieeexplore.ieee.org/document/8706661)|Didi|ISCSLP 2018*|

## ~~Reference Code~~
|Name|Language|Developer|
|:--:|:------:|:-------:|
|[LAS_Mandarin_PyTorch](https://github.com/jackaduma/LAS_Mandarin_PyTorch)|Pytorch|LAS|
|[End-to-end-ASR-Pytorch](https://github.com/Alexander-H-Liu/End-to-end-ASR-Pytorch)|Pytorch|LAS|

## ~~LAS Structure~~
![](https://i.imgur.com/OsZw05C.jpg)

## ~~Structure~~
- [ ] LAS (Listen, Attend and Spell)
{"metaMigratedAt":"2023-06-15T12:55:32.429Z","metaMigratedFrom":"Content","title":"ASR progress report(Wang, Yi-Ling)","breaks":true,"contributors":"[{\"id\":\"b73cf3a4-f139-417f-ac5e-47772fea9e7a\",\"add\":25851,\"del\":17418}]"}
    575 views