# ASR progress report (Wang, Yi-Ling)
## Server
| Server | IP |
|:------:|:---------------------------------:|
| 1 | ssh lenny@140.113.170.49 -p 20203 |
| 3 | ssh lenny@140.113.170.46 -p 10201 |
| ML-public | [Google Drive folder](https://drive.google.com/drive/folders/0B-10EC-cPLL1V2huNmFteUsxb2s?usp=sharing) |
## Running
- [x] [SpeechTransformer](https://github.com/foamliu/Speech-Transformer) (ICASSP 2018; attention sketch below)
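The core of the Speech-Transformer encoder is ordinary scaled dot-product self-attention over acoustic frames. Below is a minimal PyTorch sketch of that attention step; the 80-dim fbank input and the 512-dim projection are illustrative assumptions, not the linked repo's actual configuration.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5   # (batch, T_q, T_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    return torch.matmul(F.softmax(scores, dim=-1), v)

# Toy example: 2 utterances, 100 frames of 80-dim fbank features,
# projected to an assumed d_model of 512 before self-attention.
fbank = torch.randn(2, 100, 80)
proj = torch.nn.Linear(80, 512)
x = proj(fbank)
out = scaled_dot_product_attention(x, x, x)   # self-attention: q = k = v
print(out.shape)                              # torch.Size([2, 100, 512])
```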


## Pending
- [ ] [RNN-T](https://github.com/theblackcat102/Online-Speech-Recognition)
## Structure
- [ ] Transformer
- [ ] RNN-T (joint-network sketch below)
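For the RNN-T item above, here is a minimal sketch of the transducer layout (acoustic encoder, prediction network, joint network). All module names and sizes are assumptions for illustration and do not come from the linked repo; a real setup would embed the label history (prepended with blank) and train the joint-network logits with an RNN-T loss such as torchaudio's `RNNTLoss`.

```python
import torch
import torch.nn as nn

class TinyTransducer(nn.Module):
    """Minimal RNN-T skeleton: encoder + prediction network + joint network."""
    def __init__(self, feat_dim=80, vocab_size=32, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)      # acoustic encoder
        self.predictor = nn.LSTM(vocab_size, hidden, batch_first=True)  # label (LM-like) network
        self.joint = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Tanh(),
                                   nn.Linear(hidden, vocab_size + 1))   # +1 output for blank

    def forward(self, feats, labels_onehot):
        enc, _ = self.encoder(feats)             # (B, T, H)
        pred, _ = self.predictor(labels_onehot)  # (B, U, H)
        # Combine every (t, u) pair -> (B, T, U, vocab+1) logits for the RNN-T loss.
        joint_in = torch.cat([enc.unsqueeze(2).expand(-1, -1, pred.size(1), -1),
                              pred.unsqueeze(1).expand(-1, enc.size(1), -1, -1)], dim=-1)
        return self.joint(joint_in)

model = TinyTransducer()
logits = model(torch.randn(2, 50, 80), torch.randn(2, 10, 32))
print(logits.shape)  # torch.Size([2, 50, 10, 33])
```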
## Reference Code
|Name|Language|Developer|
|:--:|:------:|:-------:|
|[Speech-Transformer](https://github.com/foamliu/Speech-Transformer)|Pytorch|-|
|[Speech-Transformer](https://github.com/kaituoxu/Speech-Transformer)|Pytorch&Kaldi|-|
|[Online-Speech-Recognition(RNN-T)](https://github.com/theblackcat102/Online-Speech-Recognition?fbclid=IwAR29aeBtzC0RL2SU2a3MDFzRpsMwssLeo9-IcRL6pkaK6wSH_4z_yr98_OQ)|Pytorch|-|
## PaperWithCode
| Title| Paper| Code | Source |
| -- | -- | --- | --|
| Longformer: The Long-Document Transformer | [Link](https://arxiv.org/abs/2004.05150) | [Pytorch](https://github.com/allenai/longformer) | ---|
| Linformer: Self-Attention with Linear Complexity | [Link](https://arxiv.org/abs/2006.04768) | [Pytorch1](https://github.com/kuixu/Linear-Multihead-Attention)、[Pytorch2](https://github.com/tatp22/linformer-pytorch) | Facebook|
| Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition | [Link](https://ieeexplore.ieee.org/document/8462506) | [Pytorch1](https://github.com/xingchensong/Speech-Transformer-plus-2DAttention)、[Pytorch2](https://github.com/sooftware/Speech-Transformer) | ICASSP 2018 |
| Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention | [Link](https://paperswithcode.com/paper/transformers-are-rnns-fast-autoregressive) | [Pytorch1](https://github.com/idiap/fast-transformers)、[Pytorch2](https://github.com/lucidrains/linear-attention-transformer) | ICML 2020 (sketch after this table) |
| Fast Transformers with Clustered Attention | [Link](https://arxiv.org/abs/2007.04825) | [Pytorch](https://github.com/idiap/fast-transformers) | --- |
| Star-Transformer | [Link](https://arxiv.org/abs/1902.09113) | [Pytorch](https://github.com/liujiarik/nlp_clip_pytorch) | NAACL 2019 |
| Lite Transformer with Long-Short Range Attention | [Link](https://arxiv.org/abs/2004.11886) | [Pytorch](https://github.com/mit-han-lab/lite-transformer) | ICLR 2020 |
| Encoding word order in complex embeddings | [Link](https://openreview.net/forum?id=Hke-WTVtwr) | [Pytorch](https://github.com/iclr-complex-order/complex-order) | ICLR 2020 |
| Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks | [Link](https://arxiv.org/abs/1810.00825) | [Pytorch](https://github.com/juho-lee/set_transformer) | ICML 2019 |
| Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | [Link](https://arxiv.org/abs/2006.03236)、[Chinese version](https://yam.gift/2020/10/13/Paper/2020-10-13-FunnelTransformer/) | [Pytorch](https://github.com/laiguokun/Funnel-Transformer) | Google、CMU |
| Adaptive Attention Span in Transformers | [Link](https://arxiv.org/abs/1905.07799) | [Pytorch](https://github.com/facebookresearch/adaptive-span)|Facebook、ACL 2019 |
| Transformer-XL | [Link](https://arxiv.org/abs/1901.02860) | [Pytorch](https://github.com/lucidrains/memory-transformer-xl)|ACL 2019 |
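Several entries above (Linformer, "Transformers are RNNs", clustered attention) attack the quadratic cost of softmax attention. As a reference point, here is a minimal sketch of the kernelized linear attention from "Transformers are RNNs", using the paper's elu(x)+1 feature map; tensor shapes are illustrative, and the causal (autoregressive) variant would need cumulative sums over time instead of the full sum used here.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """O(T) attention: softmax(QK^T)V is approximated by phi(Q) (phi(K)^T V),
    with the feature map phi(x) = elu(x) + 1 from 'Transformers are RNNs'."""
    q = F.elu(q) + 1                              # (B, T, D), strictly positive
    k = F.elu(k) + 1
    kv = torch.einsum('btd,bte->bde', k, v)       # sum_t phi(k_t) v_t^T -> (B, D, E)
    z = 1.0 / (torch.einsum('btd,bd->bt', q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum('btd,bde,bt->bte', q, kv, z)

x = torch.randn(2, 1000, 64)                      # long sequence; cost stays linear in T
out = linear_attention(x, x, x)
print(out.shape)                                  # torch.Size([2, 1000, 64])
```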
## Technique
|Name|Publisher|Source|
|:--:|:------:|:-------:|
|[SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition](https://arxiv.org/abs/1904.08779)|Google|Interspeech 2019 (sketch below)|
|[Semantic Mask for Transformer based End-to-End Speech Recognition](https://arxiv.org/abs/1912.03010)|Microsoft Research|?|
|[Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks](https://arxiv.org/abs/1506.03099)|Google|NIPS 2015|
|[State-of-the-art Speech Recognition With Sequence-to-Sequence Models](https://arxiv.org/abs/1712.01769)|Google|ICASSP 2018|
|[SYNTHESIZER: Rethinking Self-Attention in Transformer Models](https://arxiv.org/abs/2005.00743)|Google|ICLR 2021|
|[CP-GAN: Context Pyramid Generative Adversarial Network for Speech Enhancement](https://ieeexplore.ieee.org/document/9054060)|[Group meeting](https://hackmd.io/m1vXeUafRNSxJDc862tEvw)|ICASSP 2020|
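The SpecAugment entry above boils down to random frequency and time masks on the log-mel spectrogram (the paper also uses time warping, omitted here). The sketch below uses illustrative mask widths, not the paper's LibriSpeech policy; torchaudio also provides `FrequencyMasking`/`TimeMasking` transforms for the same purpose.

```python
import torch

def spec_augment(spec, num_freq_masks=2, freq_width=27, num_time_masks=2, time_width=40):
    """Apply SpecAugment-style frequency and time masking to a (freq, time) log-mel spectrogram."""
    spec = spec.clone()
    n_freq, n_time = spec.shape
    for _ in range(num_freq_masks):                                # mask random mel-bin bands
        f = torch.randint(0, freq_width + 1, (1,)).item()
        f0 = torch.randint(0, max(1, n_freq - f), (1,)).item()
        spec[f0:f0 + f, :] = 0.0
    for _ in range(num_time_masks):                                # mask random frame spans
        t = torch.randint(0, time_width + 1, (1,)).item()
        t0 = torch.randint(0, max(1, n_time - t), (1,)).item()
        spec[:, t0:t0 + t] = 0.0
    return spec

logmel = torch.randn(80, 300)        # 80 mel bins x 300 frames
augmented = spec_augment(logmel)
print(augmented.shape)               # torch.Size([80, 300])
```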
## Dataset
| Corpus| Language |
| ------| -------- |
|[AI Shell](http://www.aishelltech.com/kysjcp)|Chinese|
|[Librispeech](http://www.openslr.org/12)|English|
|[WSJ](https://catalog.ldc.upenn.edu/LDC2000T43)|English|
|[Switchboard](https://www.isip.piconepress.com/projects/switchboard/)|English|
## Benchmark Results
| Corpus|
|-------|
|[Librispeech](https://paperswithcode.com/sota/speech-recognition-on-librispeech-test-clean)|
## ICML
- [ ] [Learning to Encode Position for Transformer with Continuous Dynamical Model](https://paperswithcode.com/paper/learning-to-encode-position-for-transformer?fbclid=IwAR0SQ1osziEur1dg1Df-nSEKYN4kgEkM4TaUPvxokCuKTpdjg5JihL35e6Y)
- [ ] [Improving the Gating Mechanism of Recurrent Neural Network](https://paperswithcode.com/paper/improving-the-gating-mechanism-of-recurrent-1)
- [ ] [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://paperswithcode.com/paper/efficientnet-rethinking-model-scaling-for)
- [ ] [Stabilizing Transformers for Reinforcement Learning](https://paperswithcode.com/paper/stabilizing-transformers-for-reinforcement-1)
## Interspeech
- [ ] [Improving Transformer-based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration](https://www.semanticscholar.org/paper/Improving-Transformer-Based-End-to-End-Speech-with-Karita-Soplin/ffe1416bcfde82f567dd280975bebcfeb4892298)
- [ ] [Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition](https://www.semanticscholar.org/paper/Automatic-Spelling-Correction-with-Transformer-for-Zhang-Lei/da890cfca17afb36e034f6bb94b25ada8b6e902f)
## Git
- [ ] [Transformer-based online speech recognition system with TensorFlow 2](https://github.com/georgesterpu/Taris)
## Reference (Net)
- [ ] [Attention_reform1](https://www.mdeditor.tw/pl/pK7f/zh-tw)
- [ ] [Attention_reform2](https://my.oschina.net/u/4594481/blog/4520663)
- [ ] [ICLR 2020 Trend of Transformer](https://blog.csdn.net/hwaust2020/article/details/106454443/?utm_medium=distribute.pc_relevant.none-task-blog-title-2&spm=1001.2101.3001.4242)
- [ ] [IBM_transformer](https://developer.ibm.com/zh/technologies/deep-learning/articles/ba-lo-transformer-design-and-build-efficient-timing-models/)
## Mix
https://blog.csdn.net/qq_37236745/article/details/107352273?utm_medium=distribute.pc_relevant.none-task-blog-title-6&spm=1001.2101.3001.4242
~~Old~~
---
## ~~Related Paper~~
|Name|Institution|Source|
|----|---------|------|
|[Listen, attend and spell: A neural network for large vocabulary conversational speech recognition](https://ieeexplore.ieee.org/document/7472621)|CMU, Google|ICASSP 2016|
|[A Comparison of Sequence-to-Sequence Models for Speech Recognition](https://research.google/pubs/pub46169/)|Google| Interspeech 2017|
|[A comparable study of modeling units for end-to-end Mandarin speech recognition](https://ieeexplore.ieee.org/document/8706661)|Didi|ISCSLP 2018|
## ~~Reference Code~~
|Name|Language|Model|
|:--:|:------:|:-------:|
|[LAS_Mandarin_PyTorch](https://github.com/jackaduma/LAS_Mandarin_PyTorch)|Pytorch|LAS|
|[End-to-end-ASR-Pytorch](https://github.com/Alexander-H-Liu/End-to-end-ASR-Pytorch)|Pytorch|LAS|
## ~~LAS Structure~~

## ~~Structure~~
- [ ] LAS(Listen, Attend and Spell)
{"metaMigratedAt":"2023-06-15T12:55:32.429Z","metaMigratedFrom":"Content","title":"ASR progress report(Wang, Yi-Ling)","breaks":true,"contributors":"[{\"id\":\"b73cf3a4-f139-417f-ac5e-47772fea9e7a\",\"add\":25851,\"del\":17418}]"}