# MuseGAN Paper
###### tags: `Meeting` `museGAN`
###### editor: Yu-Chun-Hung
###### paper source : [Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation](https://salu133445.github.io/bmusegan/pdf/bmusegan-ismir2018-paper.pdf)

---

## Abstract
1. Deep convolutional GAN
    * Generates music as binary-valued time-pitch matrices
2. **Hard thresholding (HT) or Bernoulli sampling (BS)**
    * Hard thresholding: binarize against a cutoff (threshold) value
    * Bernoulli sampling: binarize by sampling from a two-outcome (binomial) distribution
3. The generator and the discriminator are pretrained.
    * Using binary neurons instead of HT or BS indeed leads to better results
4. Deterministic binary neurons perform better than stochastic (= random) ones

## Introduction
1. Data: multi-track piano-roll, regardless of tempo
    * Challenging: the large number of possible active notes per time step and the involvement of multiple instruments
    * This is unlike a melody or a chord progression, which can be viewed as a sequence of note/chord events and modeled by an RNN
    * RNNs are good at learning the temporal dependencies of music, while CNNs are usually considered better at learning **local patterns**
    * **Five-level Likert scale**: the rating scheme used to score the music
2. There are several ways to improve on this prior work.
    * The binarization methods easily lead to overly fragmented notes
    * Binarization of the output of the generator G in GAN is done only at test time, not at training time (see Section 2.1 for a brief introduction of GAN).
    * Binarizing the input to the discriminator D in GAN effectively reduces the model space

## Background
$u(\cdot)$ is the unit step function (on the input range used here it behaves like a ceil function)
* DBN: deterministic binary neuron $DBN(x) = u(\sigma(x) - 0.5)$
* SBN: stochastic binary neuron $SBN(x) = u(\sigma(x) - v), v \sim U[0, 1]$

DBN is non-differentiable, and computing the exact gradient for SBN is intractable.

![](https://i.imgur.com/8hmnV7Y.png)
![](https://i.imgur.com/vTgV8ej.png)

## Proposed Model
Refiner $R$: placed between the generator and the discriminator, it refines the real-valued $\hat{x} = G(z)$ into a binary-valued $\tilde{x}$.

Pretrain G and D, and then train R with D (with G fixed).
* Refiner: real-valued to binary.
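
To make the two binary neurons from the Background section concrete, here is a minimal Python sketch of their forward passes only. The helper names (`sigmoid`, `unit_step`, `dbn`, `sbn`) and the convention $u(0) = 0$ are assumptions made for illustration and are not taken from the paper's code. Gradient estimation, which training needs because the step function is non-differentiable, is not shown.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def unit_step(a: float) -> int:
    """u(a): 1 if a > 0, else 0 (the value at a == 0 is an assumed convention)."""
    return 1 if a > 0 else 0


def dbn(x: float) -> int:
    """Deterministic binary neuron: DBN(x) = u(sigmoid(x) - 0.5)."""
    return unit_step(sigmoid(x) - 0.5)


def sbn(x: float, rng: random.Random) -> int:
    """Stochastic binary neuron: SBN(x) = u(sigmoid(x) - v), v ~ U[0, 1)."""
    v = rng.random()
    return unit_step(sigmoid(x) - v)


logits = [-2.0, -0.1, 0.1, 2.0]
rng = random.Random(0)  # seeded so the stochastic run is repeatable
print([dbn(x) for x in logits])  # -> [0, 0, 1, 1]
print([sbn(x, rng) for x in logits])
```

The DBN always returns the same bit for the same input, while the SBN fires with probability $\sigma(x)$, so repeated calls on the same input can differ.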
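Compared results:
* (a) raw predictions: produced directly by the generator, without any post-processing
* (b) pretrained (+BS), (*c*) pretrained (+HT), (d) proposed (+SBNs), (e) proposed (+DBNs): binarized versions of (a)

Training has two stages:
* Stage 1: pretrain the generator and the discriminator
* Stage 2: train and tune the refiner together with the discriminator

## Analysis
* Qualified Note Rate (QN): counts the notes that meet the length requirement; a value that is too low means the notes are too fragmented
* Polyphonicity (PP): the ratio of time steps in which multiple notes sound at the same time
* Tonal Distance (TD): a distance in tonality
* Training strategies
    * joint: pretrain G and D in the first stage, then train G and R jointly with D in the second stage
    * end-to-end: train G, R and D jointly in a single stage

## Variation
* https://towardsdatascience.com/ai-music-generation-lead-sheet-composition-and-arrangement-b984208f8519
* https://towardsdatascience.com/ai-music-generation-ii-lead-sheet-variation-1884133e4f1
* https://github.com/liuhaumin/LeadsheetVAE

Run inference from a pretrained model: `./scripts/run_inference.sh "./exp/default/" "0"`, where the final argument **"0"** selects GPU 0.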