# Voice Generation

### BreezyVoice

https://github.com/mtkresearch/BreezyVoice

* Demo: https://huggingface.co/spaces/Splend1dchan/BreezyVoice-Playground

1. Create the environment

```bash
conda create --name breezy python=3.10
conda activate breezy
```

* Changes to `requirements.txt` (adapting it from Linux to Windows):

```text
--extra-index-url https://download.pytorch.org/whl/cu121
conformer==0.3.2
diffusers==0.32.0
gdown==5.1.0
gradio==4.32.2
grpcio==1.57.0
grpcio-tools==1.57.0
hydra-core==1.3.2
HyperPyYAML==1.2.2
inflect==7.3.1
librosa==0.10.2
lightning==2.2.4
matplotlib==3.7.5
networkx==3.1
omegaconf==2.3.0
onnxruntime-gpu==1.16.0  # Windows version
openai-whisper==20231117
protobuf==4.25
pydantic==2.7.0
rich==13.7.1
soundfile==0.12.1
tensorboard==2.14.0
torch==2.3.1
torchaudio==2.3.1
wget==3.2
fastapi==0.111.0
fastapi-cli==0.0.4
opencc-python-reimplemented
g2pw
pyarrow
datasets
```

2. Install the packages
3. Steps to run (based on https://blog.csdn.net/qq_43907505/article/details/144860826):

```bash
conda install -c conda-forge pynini=2.1.6
pip install WeTextProcessing --no-deps
pip install -r requirements.txt
pip install https://github.com/daswer123/deepspeed-windows/releases/download/13.1/deepspeed-0.13.1+cu121-cp310-cp310-win_amd64.whl
```

4. Run

```bash
python single_inference.py --content_to_synthesize "我的思考，我的聲音，還[:ㄏㄞ2]有我的形象，不知道感覺如何，科技進步實在是有夠快，唉，時代在變" --speaker_prompt_audio_path "./data/lee.wav"
```

`./data/lee.wav` is the reference recording whose voice is cloned. The `[:ㄏㄞ2]` after 還 is a Bopomofo pronunciation hint (hai, 2nd tone), forcing the reading hái instead of huán.

---

### ditto-talkinghead

https://github.com/justinjohn0306/ditto-talkinghead-windows

* Installation

```bash
git clone https://github.com/justinjohn0306/ditto-talkinghead-windows
cd ditto-talkinghead-windows
conda env create -f environment.yaml
conda activate ditto
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
```

* Additional installs required:
    * https://github.com/antgroup/ditto-talkinghead/issues/4
    * https://blog.csdn.net/qq_42681787/article/details/134577838 (TensorRT and its `.dll` files)
* Download the models: https://huggingface.co/justinjohn-03/ditto-talkinghead-windows/tree/main/ditto_trt_3090 (the command below reads them from `./checkpoints/ditto_trt_3090` and `./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl`)
* Command

```bash
conda activate ditto
python inference.py --data_root "./checkpoints/ditto_trt_3090" --cfg_pkl "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl" --audio_path "./example/audio.wav" --source_path "./example/image.png" --output_path "./tmp/result.mp4"
```
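The two tools can be chained. BreezyVoice produces speech in a cloned voice, and ditto turns that speech plus a still image into a talking-head video. Below is a minimal bash sketch of that hand-off. It passes only the BreezyVoice and ditto flags that appear in the commands in these notes, plus two real `conda run` options (`--cwd`, `--no-capture-output`) used in place of `conda activate` so it can run from a script. Everything else is an assumption: the repo locations (`BREEZY_DIR`, `DITTO_DIR`), the environment names `breezy` / `ditto`, and `BREEZY_OUT`, which must point to wherever `single_inference.py` actually saves its WAV (these notes don't say, so check the BreezyVoice README). `make_talking_head`, `run` and `abs_path` are hypothetical helper names. The sketch is written for a POSIX shell and is untested on Windows shells.

```bash
#!/usr/bin/env bash
# Sketch: BreezyVoice (text -> WAV in a cloned voice) feeding ditto-talkinghead (WAV + image -> MP4).
# ASSUMPTIONS, not taken from the notes: repo locations (BREEZY_DIR / DITTO_DIR),
# BREEZY_OUT = wherever single_inference.py saves its WAV, and env names breezy / ditto.
set -euo pipefail

# Run a command, or just print it (shell-quoted) when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%q ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Make a path absolute (relative to where this script is run),
# because each step below executes inside its own repo directory.
abs_path() {
  if [[ "$1" == /* ]]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$PWD/$1"
  fi
}

# Usage: make_talking_head TEXT SPEAKER_WAV FACE_IMAGE OUT_MP4   (hypothetical helper)
make_talking_head() {
  local text="$1" speaker_wav face_img out_mp4 breezy_out
  speaker_wav="$(abs_path "$2")"
  face_img="$(abs_path "$3")"
  out_mp4="$(abs_path "$4")"
  breezy_out="$(abs_path "${BREEZY_OUT:?set BREEZY_OUT to the WAV written by single_inference.py}")"
  local breezy_dir="${BREEZY_DIR:-./BreezyVoice}"
  local ditto_dir="${DITTO_DIR:-./ditto-talkinghead-windows}"

  # Step 1: text -> speech in the voice of SPEAKER_WAV (flags as in the BreezyVoice command).
  # `conda run --cwd` stands in for `conda activate` + `cd` so this works from a script.
  run conda run --no-capture-output --cwd "$breezy_dir" -n breezy \
    python single_inference.py \
    --content_to_synthesize "$text" \
    --speaker_prompt_audio_path "$speaker_wav"

  # Step 2: speech + still image -> talking-head video (flags as in the ditto command).
  run conda run --no-capture-output --cwd "$ditto_dir" -n ditto \
    python inference.py \
    --data_root "./checkpoints/ditto_trt_3090" \
    --cfg_pkl "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl" \
    --audio_path "$breezy_out" \
    --source_path "$face_img" \
    --output_path "$out_mp4"
}
```

For example, after sourcing the script, `DRY_RUN=1 BREEZY_OUT=path/to/breezy_output.wav make_talking_head "你好" ./lee.wav ./face.png ./result.mp4` only prints the two commands. Drop `DRY_RUN=1` to execute them. Running each step in its own conda environment keeps the two setups separate, which matters because they pin different PyTorch versions (2.3.1 for BreezyVoice, 2.5.1 for ditto).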