# BIIC x C-Media Report (Voice Conversion)

###### tags: `Internship`,`Machine Learning`
###### Date: 202207

[TOC]

## Problem Definition : Voice Conversion Deep Learning Modeling

![](https://i.imgur.com/be3M4r1.png)
![](https://i.imgur.com/yV4YGEy.png)

## Datasets

![](https://i.imgur.com/wVY3TmU.png)
![](https://i.imgur.com/Nl1J3kR.png)

## Methodology

### Overview

1. Autoencoder
    - Ada-VC: Chou, Ju-chieh, Cheng-chieh Yeh, and Hung-yi Lee. "One-shot voice conversion by separating speaker and content representations with instance normalization." arXiv preprint arXiv:1904.05742 (2019).
    - FragmentVC: Lin, Yist Y., et al. "FragmentVC: Any-to-any voice conversion by end-to-end extracting and fusing fine-grained voice fragments with attention." ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.
2. StarGAN
    - StarGAN-VC: Kameoka, Hirokazu, et al. "StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks." 2018 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2018.
    - StarGAN-VC2: Kaneko, Takuhiro, et al. "StarGAN-VC2: Rethinking conditional methods for StarGAN-based voice conversion." arXiv preprint arXiv:1907.12279 (2019).
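
The Ada-VC reference above separates speaker and content information with instance normalization. As a rough illustration only, the sketch below normalizes each feature channel over time (which removes per-utterance channel statistics) and then re-applies a speaker-dependent scale and shift, in the style of adaptive instance normalization. The function names, the toy feature values, and the `gammas`/`betas` parameters are assumptions made for this example; they are not taken from the report or from the authors' code.

```python
import math


def instance_norm(channel, eps=1e-5):
    """Normalize one channel (a list of per-frame values) over the time axis."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


def adaptive_instance_norm(features, gammas, betas, eps=1e-5):
    """Instance-normalize each channel, then scale and shift it with
    speaker-dependent parameters (one gamma and beta per channel)."""
    return [
        [g * v + b for v in instance_norm(ch, eps)]
        for ch, g, b in zip(features, gammas, betas)
    ]


# Toy "content" features: 2 channels x 4 time frames (hypothetical values).
content = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 12.0, 12.0]]

# Hypothetical per-channel statistics that a speaker encoder would predict
# from the target speaker's utterance.
gammas = [2.0, 0.5]
betas = [1.0, -1.0]

converted = adaptive_instance_norm(content, gammas, betas)
```

After this step, each channel of `converted` has a mean close to its `beta` and a standard deviation close to its `gamma`, regardless of the statistics of the input channel. That is the intuition for why the normalization strips speaker-like global information from the content features.
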
### Autoencoder – Ada-VC

![](https://i.imgur.com/H8y9gA1.png)
![](https://i.imgur.com/PqHf5em.png)

### StarGAN – VC

![](https://i.imgur.com/4WYqAcN.png)
![](https://i.imgur.com/oe2KonH.png)
![](https://i.imgur.com/Bm61rzY.png)

### StarGAN – VC2

![](https://i.imgur.com/WutGmP9.png)
![](https://i.imgur.com/iQbpw78.png)
![](https://i.imgur.com/gDbjIdB.png)
![](https://i.imgur.com/WRjJR4B.png)
![](https://i.imgur.com/HJDs00U.png)

#### Cool-down mechanism

![](https://i.imgur.com/oKOpCiY.png)

## Evaluation metrics

MOS, speaker embedding cosine similarity

![](https://i.imgur.com/c40Ghqp.png)

### Mean Opinion Score

![](https://i.imgur.com/pVpjZTY.png)
![](https://i.imgur.com/zV0C5xi.png)
![](https://i.imgur.com/aZVNLj0.png)

### Speaker embedding cosine similarity

![](https://i.imgur.com/vevJSB1.png)
![](https://i.imgur.com/0I4xdVP.png)
![](https://i.imgur.com/0CEuQ7s.png)

## Experiment Setup

![](https://i.imgur.com/LOqA7V5.png)

## Result

![](https://i.imgur.com/LrvzSlD.png)
![](https://i.imgur.com/iGEd87K.png)
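
For readers who want to reproduce the two evaluation metrics named in this report (MOS and speaker embedding cosine similarity), the following is a minimal sketch of how they are commonly computed. It assumes that speaker embeddings are already available as plain lists of floats and that MOS ratings are integers on a 1 to 5 scale. The embedding values, the ratings, and the 95% confidence interval (normal approximation) are illustrative assumptions, not results or procedures from this report.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_opinion_score(ratings):
    """Return (mean, half-width of an approximate 95% confidence interval)."""
    n = len(ratings)
    mean = sum(ratings) / n
    # Sample variance (n - 1 in the denominator).
    var = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    half_width = 1.96 * math.sqrt(var / n)
    return mean, half_width


# Hypothetical embeddings of the target speaker and of the converted utterance.
target_emb = [0.2, 0.9, -0.1]
converted_emb = [0.25, 0.8, 0.0]
sim = cosine_similarity(target_emb, converted_emb)

# Hypothetical listener ratings for one converted utterance.
mos, ci = mean_opinion_score([4, 5, 3, 4, 4])
print(f"cosine similarity = {sim:.3f}, MOS = {mos:.2f} +/- {ci:.2f}")
```

A similarity close to 1 suggests the converted utterance sits near the target speaker in embedding space, while MOS reflects perceived quality as judged by listeners. The two metrics are complementary, so a system can score well on one and poorly on the other.
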