
# Open-source ASR

- [MASR](https://github.com/ieee820/masr)
- [End2EndASR](https://github.com/gentaiscool/end2end-asr-pytorch)
- [fairseq](https://github.com/facebookresearch/fairseq)
- [ASRT](https://github.com/nl8590687/ASRT_SpeechRecognition): audio => romanized pinyin => Simplified Chinese
- [PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech)
- [kaldi](https://kaldi-asr.org/)
- [SpeechBrain](https://github.com/speechbrain/speechbrain/)

# Some ASR Colab notebooks

- [SpeechBrain](https://colab.research.google.com/drive/1mhvu5eyEzNBBhNC6sknG-kgD2ZQ6NEyI?usp=sharing#scrollTo=sKem2vASoBYd)
- [SpeechBrain ASR](https://colab.research.google.com/drive/1aFgzrUv3udM_gNJNUoLaHIm78QHtxdIz?usp=sharing#scrollTo=9To_-2fej2SA)
- [Wav2Vec2-Large-XLSR-53-tw-gpt](https://colab.research.google.com/drive/1e_z5jQHYbO2YKEaUgzb1ww1WwiAyydAj?usp=sharing#scrollTo=pfl4sIUIHLVm)

# Hugging Face

- [Wav2Vec2-Large-XLSR-53-tw-gpt](https://huggingface.co/voidful/wav2vec2-large-xlsr-53-tw-gpt)

# Dataset

- [Aishell](https://www.openslr.org/33/) (2.4 hours)
- [Mozilla](https://commonvoice.mozilla.org/zh-CN/datasets) (30 hours)

# Audio processing

### VAD

- [VAD-python](https://github.com/marsbroshok/VAD-python)
- [pyvad](https://pypi.org/project/pyvad/)

### Validation spec

- Audio format: .wav
- Sample rate: 16000 Hz
- ASR model validation
  - Use one hour of audio, split into segments of about 20 seconds each, to validate accuracy.
  - WER of 20% or lower (about 80% of each sentence recognized correctly).
- Model params: 28,768,256
- 8855MiB / 11264MiB

## Daily work

- 6/22: Originally, a raw block of text was fed directly into sentencepiece (which trains the embedding, the text segmentation, and the vocabulary). The current attempt is to word-segment the text first and then train the embedding; training is still in progress.
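
The WER target noted in the validation spec above (20% or lower) could be checked with something like the sketch below. It is only an illustration, not the evaluation script used for these notes: the function names (`edit_distance`, `error_rate`, `corpus_error_rate`) and the `unit` parameter are made up for this example. It assumes character-level scoring for Chinese transcripts (often reported as CER), since Chinese text has no spaces to split words on, and it pools errors across all segments instead of averaging per-segment rates. Note that the error rate counts insertions as well, so it is not strictly "100% minus accuracy".

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: token in ref missing from hyp
                curr[j - 1] + 1,     # insertion: extra token in hyp
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            ))
        prev = curr
    return prev[-1]


def _tokens(text, unit):
    """Split text into characters (whitespace dropped) or whitespace-separated words."""
    if unit == "char":
        return [c for c in text if not c.isspace()]
    if unit == "word":
        return text.split()
    raise ValueError("unit must be 'char' or 'word'")


def error_rate(reference, hypothesis, unit="char"):
    """Error rate for one segment: edits / number of reference tokens."""
    ref = _tokens(reference, unit)
    hyp = _tokens(hypothesis, unit)
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


def corpus_error_rate(pairs, unit="char"):
    """Pooled error rate over (reference, hypothesis) pairs, e.g. ~20 s segments."""
    total_edits = 0
    total_ref_tokens = 0
    for reference, hypothesis in pairs:
        ref = _tokens(reference, unit)
        hyp = _tokens(hypothesis, unit)
        total_edits += edit_distance(ref, hyp)
        total_ref_tokens += len(ref)
    if total_ref_tokens == 0:
        raise ValueError("no reference tokens to score against")
    return total_edits / total_ref_tokens


if __name__ == "__main__":
    # Hypothetical transcripts, made up for illustration only.
    segments = [
        ("今天天氣很好", "今天天气很好"),
        ("我們去公園", "我們去公園"),
    ]
    rate = corpus_error_rate(segments)
    print(f"error rate: {rate:.2%}, meets 20% target: {rate <= 0.20}")
```

Pooling over the whole hour keeps short segments from dominating the result; averaging the per-segment rates would be a different, equally defensible choice.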