# Multi sample rate codec
## Step 1: Resynthesize speech
1. Create a on HuggingFace
2. Get speech data from xxx
3. Clone the `superb_main` branch of [Codec-SUPERB](https://github.com/dlion168/AudioDecBenchmark/tree/superb_main) to resynthesize speech with the codecs. Write a dataset loading script in [this folder](https://github.com/voidful/Codec-SUPERB/tree/main/SoundCodec/dataset) to load your data from disk; you may need [audiofolder](https://huggingface.co/docs/datasets/audio_load) to load it into HuggingFace dataset format. See [Example 1](https://github.com/dlion168/AudioDecBenchmark/blob/superb_main/SoundCodec/dataset/PODCAST.py) and [Example 2](https://github.com/dlion168/AudioDecBenchmark/blob/superb_main/SoundCodec/dataset/IMPROV.py). Note that datasets with "train" or "test" in their filenames may not load correctly.
4. Useful scripts
```bash
pip install -r requirements.txt
pip install git+https://github.com/voidful/AudioDec.git
pip install git+https://github.com/voidful/descript-audio-codec.git
pip install encodec
pip install git+https://github.com/voidful/FunCodec.git
CUDA_VISIBLE_DEVICES=2 python dataset_creator.py --dataset PODCAST --push_to_hub
# check that the number of loaded files matches the number of files in the folder
ls -1 | wc -l
```
5. Check the resynthesized speech quality by listening to some samples on [Emo-Codec](https://huggingface.co/Emo-Codec).
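Steps 3 and 4 above (loading a local folder with audiofolder, then comparing the loaded count against the folder as `ls -1 | wc -l` does) can be sketched in Python as follows. This is a minimal sketch, assuming all audio files sit under one local folder; the helper names and the extension set are hypothetical, and the only library call is `datasets.load_dataset("audiofolder", data_dir=..., split="train")`. Flagging paths that contain "train" or "test" is an assumption about why such files misload: audiofolder may read those keywords in file paths as split names.

```python
from pathlib import Path

# Hypothetical set of extensions treated as audio; adjust to your data.
AUDIO_EXTS = {".wav", ".flac", ".mp3", ".ogg"}
# Keywords this README warns about; audiofolder may read them as split names.
SPLIT_KEYWORDS = ("train", "test")


def iter_audio_files(root):
    """Yield every audio file under `root`, recursively, in sorted order."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTS:
            yield path


def count_audio_files(root):
    """Count audio files under `root` (a recursive, extension-aware `ls -1 | wc -l`)."""
    return sum(1 for _ in iter_audio_files(root))


def find_split_keyword_files(root):
    """Return relative paths containing "train" or "test" (case-insensitive)."""
    flagged = []
    for path in iter_audio_files(root):
        rel = path.relative_to(root).as_posix()
        if any(keyword in rel.lower() for keyword in SPLIT_KEYWORDS):
            flagged.append(rel)
    return flagged


def load_local_audio(root):
    """Load `root` with HuggingFace audiofolder and check that no file was dropped."""
    from datasets import load_dataset  # imported lazily; requires `datasets`

    # Without explicit splits, audiofolder puts every file in the "train" split.
    dataset = load_dataset("audiofolder", data_dir=root, split="train")
    expected = count_audio_files(root)
    if len(dataset) != expected:
        raise ValueError(f"loaded {len(dataset)} files, but the folder has {expected}")
    return dataset
```

Running `find_split_keyword_files` before writing the loading script shows which files may need renaming; `load_local_audio` then fails loudly instead of silently loading a partial dataset.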
**Codecs to run:**
* SpeechTokenizer
* DAC: 16k, 24k, 44k
* Encodec: 24k, all 5 models
* FunCodec: en_16k_nq32ds320, en_16k_nq32ds640, zh_en_16k_nq32ds320, zh_en_16k_nq32ds640
* AudioDec: 24k, 48k_uni
* AcademiCodec: 16k_320d_large_uni, 24k
* LanguageCodec: chinese_8nq, paper_8nq
* FAcodec

20 models in total.
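The run list above can be kept as a small Python mapping, which also makes the total easy to verify. The dictionary layout is hypothetical, as are the `"default"` placeholders for single-model codecs; the variant strings follow the list above rather than any Codec-SUPERB identifiers. The Encodec entries assume that the five 24k models are its five target bandwidths (1.5, 3, 6, 12, 24 kbps).

```python
# Hypothetical run list mirroring the README. "default" is a placeholder for
# codecs with a single model; the Encodec variants assume one model per bandwidth.
CODECS_TO_RUN = {
    "SpeechTokenizer": ["default"],
    "DAC": ["16k", "24k", "44k"],
    "Encodec": ["24k_1.5kbps", "24k_3kbps", "24k_6kbps", "24k_12kbps", "24k_24kbps"],
    "FunCodec": [
        "en_16k_nq32ds320",
        "en_16k_nq32ds640",
        "zh_en_16k_nq32ds320",
        "zh_en_16k_nq32ds640",
    ],
    "AudioDec": ["24k", "48k_uni"],
    "AcademiCodec": ["16k_320d_large_uni", "24k"],
    "LanguageCodec": ["chinese_8nq", "paper_8nq"],
    "FAcodec": ["default"],
}


def all_runs(codecs=CODECS_TO_RUN):
    """Flatten the mapping into (codec, variant) pairs, one per model to run."""
    return [(codec, variant) for codec, variants in codecs.items() for variant in variants]


if __name__ == "__main__":
    runs = all_runs()
    print(f"{len(runs)} models in total")  # prints "20 models in total"
```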
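Example `load_data()` that materializes a streamed HuggingFace dataset and resamples its audio to 16 kHz:
```python
from functools import partial

from datasets import Audio, Dataset, load_dataset


def gen_from_iterable_dataset(iterable_ds):
    # Yield every example from a streaming (iterable) dataset.
    yield from iterable_ds


def load_data():
    # Open the "original" split in streaming mode (returns an IterableDataset).
    dataset = load_dataset("Codec-SUPERB/fsd50k_synth", split="original", streaming=True)
    # Materialize the stream into a regular, map-style Dataset.
    dataset = Dataset.from_generator(partial(gen_from_iterable_dataset, dataset), features=dataset.features)
    # Decode the audio column at 16 kHz (resampled on access).
    dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
    return dataset
```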
## Step 2: Evaluate the performance of resynthesized speech
1. Run [benchmarking.py](https://github.com/dlion168/AudioDecBenchmark/blob/main/benchmarking.py) to compute all metrics, with these exceptions:
* For non-speech audio data, skip PESQ, STOI, and F0corr.
* For speech data, skip F0corr.
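The skip rules above can be encoded as a filter applied before running the benchmark. This is a sketch only: `select_metrics` is hypothetical, and `ALL_METRICS` contains the three metrics named above plus placeholder names; replace it with the metric names `benchmarking.py` actually reports.

```python
# Placeholder metric list: PESQ, STOI, and F0corr come from this README; the
# other names are hypothetical stand-ins for the rest of benchmarking.py's metrics.
ALL_METRICS = ["mel_distance", "stft_distance", "PESQ", "STOI", "F0corr"]

# Metrics to skip per data type, as stated in Step 2.
SKIP_RULES = {
    "speech": {"F0corr"},
    "audio": {"PESQ", "STOI", "F0corr"},
}


def select_metrics(data_type, metrics=ALL_METRICS):
    """Return the metrics to compute for `data_type` ("speech" or "audio")."""
    if data_type not in SKIP_RULES:
        raise ValueError(f"unknown data type: {data_type!r}")
    skipped = SKIP_RULES[data_type]
    return [m for m in metrics if m not in skipped]
```

With this placeholder list, `select_metrics("speech")` keeps PESQ and STOI, while `select_metrics("audio")` keeps only the two distance metrics.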
## Paper writing
* When modifying existing content, preserve the original version by commenting it out.
* Write your name above each paragraph you write.
* After completing each sentence, press Enter to start a new line.
* For consistency, use `Table~\ref{}` and `Figure~\ref{}` for cross-references.
* Avoid duplicated references (bibliography entries).
* When making a table, check its `\label` and avoid duplicate `\label`s.
* When drawing tables, you can use ChatGPT with detailed prompts.
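To catch the duplicated references and `\label`s mentioned above before compiling, a small script can scan the LaTeX and BibTeX sources. A minimal sketch, assuming labels are written as `\label{...}` and BibTeX entries start as `@type{key,`; the function names are hypothetical. Comments are stripped first so that original versions kept as commented-out text are not reported, but labels generated by macros are not detected.

```python
import re
from collections import Counter

LABEL_RE = re.compile(r"\\label\{([^}]*)\}")
BIBKEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
COMMENT_RE = re.compile(r"(?<!\\)%.*")  # an unescaped % starts a LaTeX comment


def strip_tex_comments(tex):
    """Drop LaTeX comments, so commented-out old versions are not counted."""
    return "\n".join(COMMENT_RE.sub("", line) for line in tex.splitlines())


def duplicates(items):
    """Return items that occur more than once, in first-seen order."""
    counts = Counter(items)
    return [item for item in counts if counts[item] > 1]


def duplicate_labels(tex):
    """Return \\label names defined more than once in uncommented LaTeX."""
    return duplicates(LABEL_RE.findall(strip_tex_comments(tex)))


def duplicate_bib_keys(bib):
    """Return BibTeX citation keys that appear in more than one entry."""
    return duplicates(BIBKEY_RE.findall(bib))
```

Usage: read each `.tex` and `.bib` file into a string and fix every name returned before recompiling.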