# End-to-End ASR
### Get Data
Data Download Link: https://drive.google.com/file/d/1daFU8tPPUyhN7Fc6JUTohEfHXIn6ZDgq
Extraction: `tar zxvf DLHLP.tar.gz`. If you encounter errors like `Cannot change ownership to uid`, add `--no-same-owner`.
Data fields
### train/
- Audio files from `000001.wav` to `008000.wav`.
- Transcription text: `train/bopomo.trans.txt`
- Format: each line contains `<index><space><ㄅㄆㄇㄈ transcription>`
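
The transcription format above can be read with a few lines of Python. This is a minimal sketch, not part of the original data release: the helper names `parse_transcript_lines` and `load_split` are hypothetical, and it assumes the file is UTF-8 encoded and that each `<index>` matches the stem of the corresponding wav file (e.g. `000001` for `000001.wav`).

```python
from pathlib import Path


def parse_transcript_lines(lines):
    """Map each utterance index to its bopomofo transcription string."""
    transcripts = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first whitespace run only:
        # the index comes first, everything after it is the transcription.
        parts = line.split(maxsplit=1)
        index = parts[0]
        text = parts[1] if len(parts) > 1 else ""
        transcripts[index] = text
    return transcripts


def load_split(root, split):
    """Return (wav_path, transcription) pairs for one split, e.g. "train"."""
    split_dir = Path(root) / split
    # Assumption: the transcript file is UTF-8 encoded.
    with open(split_dir / "bopomo.trans.txt", encoding="utf-8") as f:
        transcripts = parse_transcript_lines(f)
    # Assumption: the index in the transcript equals the wav file stem.
    return [(split_dir / f"{idx}.wav", text) for idx, text in transcripts.items()]
```

For example, `load_split("DLHLP", "train")` would return the pairs for the training set, assuming the archive was extracted into a directory named `DLHLP`. The same call with `"dev"` or `"test"` works for the other splits, since they share the file layout.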
### dev/
- Audio files from `008001.wav` to `009000.wav`.
- Transcription text: `dev/bopomo.trans.txt`
- Format: each line contains `<index><space><ㄅㄆㄇㄈ transcription>`
### test/
- Audio files from `009001.wav` to `010000.wav`.
- Transcription text: `test/bopomo.trans.txt` is still attached for consistency, but all transcriptions are replaced with fake transcriptions.
- Format: each line contains `<index><space><ㄅㄆㄇㄈ fake transcription>`
`sample_submission.txt`
- `id` - an audio file id.
- `answer` - the bopomofo sequence of the corresponding wav audio file.