# BIIC x C-Media Report (Voice Conversion)
###### tags: `Internship`,`Machine Learning`
###### Date: 202207
[TOC]
## Problem Definition: Voice Conversion Deep Learning Modeling


## Datasets


## Methodology
### Overview
1. Autoencoder
    - Ada-VC: Chou, Ju-chieh, Cheng-chieh Yeh, and Hung-yi Lee. "One-shot voice conversion by separating speaker and content representations with instance normalization." arXiv preprint arXiv:1904.05742 (2019).
    - FragmentVC: Lin, Yist Y., et al. "FragmentVC: Any-to-any voice conversion by end-to-end extracting and fusing fine-grained voice fragments with attention." ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.
2. StarGAN
    - StarGAN-VC: Kameoka, Hirokazu, et al. "StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks." 2018 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2018.
    - StarGAN-VC2: Kaneko, Takuhiro, et al. "StarGAN-VC2: Rethinking conditional methods for StarGAN-based voice conversion." arXiv preprint arXiv:1907.12279 (2019).
### Autoencoder – Ada-VC
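As the title of the cited Ada-VC paper indicates, the method relies on instance normalization to separate speaker and content information, and the decoder re-injects speaker information through adaptive instance normalization (AdaIN). The following is a minimal sketch of that operation only, not the paper's network: the `features` layout (a list of channels, each a list of values over time) and the `gamma`/`beta` values, which would normally come from a speaker encoder, are assumptions made for illustration.

```python
import math


def instance_norm(channel, eps=1e-5):
    """Normalize one channel to zero mean and unit variance over time."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


def adaptive_instance_norm(features, gamma, beta):
    """Apply AdaIN: per-channel instance norm, then scale and shift.

    features: list of channels, each a list of frame values (hypothetical layout).
    gamma, beta: one scale and one shift per channel (assumed to be
    predicted from the target speaker's utterance).
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta need one entry per channel")
    return [
        [g * v + b for v in instance_norm(channel)]
        for channel, g, b in zip(features, gamma, beta)
    ]


# Toy example: two channels, three frames each.
content = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]
converted = adaptive_instance_norm(content, gamma=[2.0, 1.0], beta=[0.5, -1.0])
print(converted)
```

After the call, each output channel has (approximately) the mean `beta` and standard deviation `gamma` for that channel, regardless of the statistics of the input, which is why these statistics can carry speaker identity.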


### StarGAN – VC



### StarGAN – VC2





#### Cool-down mechanism

## Evaluation metrics
Two metrics are used: Mean Opinion Score (MOS) and speaker embedding cosine similarity.

### Mean Opinion Score
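MOS is the arithmetic mean of subjective listener ratings. The sketch below shows one way it could be computed together with a 95% confidence interval. The 1 to 5 rating scale, the normal approximation for the interval, and the function name `mean_opinion_score` are assumptions for illustration, not details taken from this report.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, half_width) for a list of listener ratings.

    Assumes a 1-5 scale; half_width is the half-width of a 95%
    confidence interval under a normal approximation.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in the range 1 to 5")
    mos = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from five listeners for one converted utterance.
mos, half_width = mean_opinion_score([4, 5, 3, 4, 4])
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the mean helps when comparing systems rated by a small number of listeners.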



### Speaker embedding cosine similarity
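Cosine similarity between two speaker embeddings $a$ and $b$ is $\frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$, which ranges from -1 to 1. Higher values suggest the converted speech is closer to the target speaker in embedding space. The sketch below assumes the embeddings have already been extracted as equal-length vectors. The speaker encoder is not specified here, and the example vectors are made up.

```python
import math


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero vectors.
    return dot / max(norm_a * norm_b, eps)


# Hypothetical embeddings of a converted utterance and a target-speaker utterance.
converted_embedding = [0.2, 0.7, -0.1, 0.4]
target_embedding = [0.25, 0.6, -0.05, 0.5]
print(f"similarity = {cosine_similarity(converted_embedding, target_embedding):.3f}")
```

In practice the score would typically be averaged over many converted utterances per target speaker.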



## Experiment Setup

## Result

