To run the demo:

cd /opt/kaldi/egs/sevil/
conda activate base
python ./demo/main.py
Two directories are required to preprocess the data for training and inference:
cd /opt/kaldi/egs/sevil
./prepare contains the code that downloads the data from S3 and preprocesses it. The users.txt file in this directory lists the IDs of the users whose data is extracted from S3.
./data/local must contain, once preprocessing is done, a corpus.txt file: the collection of all possible ASR inputs. We also need to update ./data/local/dict/lexicon.txt with every word token that appears in corpus.txt.
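The lexicon update described above can be sketched as follows. This is a minimal illustration, assuming corpus.txt holds one utterance per line and lexicon.txt uses Kaldi's `word phone1 phone2 ...` format; the grapheme-based placeholder pronunciation is an assumption — in a real setup you would use a g2p tool or hand-crafted pronunciations instead.

```python
# Sketch: find word tokens in corpus.txt that are missing from
# lexicon.txt and append placeholder pronunciations for them.
from pathlib import Path

def update_lexicon(corpus_path: str, lexicon_path: str) -> list[str]:
    # Collect every unique whitespace-separated token in the corpus.
    corpus_words = set()
    for line in Path(corpus_path).read_text(encoding="utf-8").splitlines():
        corpus_words.update(line.split())

    # First field of each lexicon line is the word itself.
    lexicon_lines = Path(lexicon_path).read_text(encoding="utf-8").splitlines()
    known = {l.split()[0] for l in lexicon_lines if l.strip()}

    # Append entries for words the lexicon does not cover yet.
    missing = sorted(corpus_words - known)
    with open(lexicon_path, "a", encoding="utf-8") as f:
        for word in missing:
            # Placeholder pronunciation: spell the word as graphemes.
            # Replace with real phoneme sequences (or a g2p tool).
            f.write(f"{word} {' '.join(word)}\n")
    return missing
```

Running this after preprocessing keeps lexicon.txt in sync with corpus.txt; any word left without a lexicon entry would otherwise be dropped or mapped to the unknown-word token during training.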