MAAI reading

Music generation

LEARNING A LATENT SPACE OF MULTITRACK MEASURES

  • Encoder: two layers of bidirectional LSTM
  • State2Latent: two fully connected (FC) layers
  • Decoder: two layers of unidirectional LSTM (rough sketch below)
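
Not from the paper — a minimal PyTorch sketch of the architecture as described above, reading State2Latent as a VAE-style mean/log-variance head; all sizes and module names are placeholders.

```python
import torch
import torch.nn as nn

class MeasureVAE(nn.Module):
    """Sketch: 2-layer BiLSTM encoder -> 2 FC layers -> 2-layer LSTM decoder."""
    def __init__(self, vocab_size=256, emb=128, hid=256, latent=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hid, num_layers=2,
                               bidirectional=True, batch_first=True)
        # "State2Latent": two FC layers producing mean and log-variance
        self.fc_mu = nn.Linear(2 * hid, latent)
        self.fc_logvar = nn.Linear(2 * hid, latent)
        self.latent2state = nn.Linear(latent, hid)
        self.decoder = nn.LSTM(emb, hid, num_layers=2, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, tokens):                      # tokens: (B, T) int64
        x = self.embed(tokens)
        _, (h, _) = self.encoder(x)                 # h: (num_layers * 2, B, hid)
        h_top = torch.cat([h[-2], h[-1]], dim=-1)   # top layer, both directions
        mu, logvar = self.fc_mu(h_top), self.fc_logvar(h_top)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        h0 = torch.tanh(self.latent2state(z)).unsqueeze(0).repeat(2, 1, 1)
        dec_out, _ = self.decoder(x, (h0, torch.zeros_like(h0)))  # teacher forcing
        return self.out(dec_out), mu, logvar
```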

Chord2Vec: Learning Musical Chord Embeddings

  • Uses bilinear, autoregressive, and seq2seq models to embed a chord by predicting the note content of its context chords.

[ISMIR 2017] Generating Nontrivial Melodies for Music as a Service

  • Conditional VAE (conditioned on the chord progression)
  • Splits the melody/chords using a handcrafted metric

Song From PI: A Musically Plausible Network for Pop Music Generation

  • Simple hierarchy: a stacked LSTM whose higher level outputs chords and whose bottom level outputs keys.
  • Chords, drum patterns, and so on are grouped into clusters.
  • Some extensions (applications).

COSIATEC and SIATECCompress: Pattern Discovery by Geometric Compression

  • Used in MorpheuS
  • Uses a shift vector v to group patterns ({p | p ∈ D ∧ p + v ∈ D}), then repeatedly extracts the best pattern according to a handcrafted metric (toy sketch below)
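
A toy Python illustration of the shift-vector grouping (not Meredith's implementation; pattern "quality" here is just set size rather than the paper's compression-based metric):

```python
from collections import defaultdict

def translatable_patterns(D):
    """Group points of D by translation vector v: {p | p in D and p + v in D}.
    D is a set of (onset, pitch) pairs; vectors are coordinate differences."""
    D = set(D)
    groups = defaultdict(set)
    for p in D:
        for q in D:
            if p != q:
                v = (q[0] - p[0], q[1] - p[1])
                groups[v].add(p)          # p stays inside D when shifted by v
    return groups

# Toy example: a two-note motif repeated one beat later, a major third higher
D = {(0, 60), (1, 62), (1, 64), (2, 66)}
patterns = translatable_patterns(D)
best_v = max(patterns, key=lambda v: len(patterns[v]))   # stand-in for the
print(best_v, patterns[best_v])                          # handcrafted metric
```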

MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation using 1D and 2D Conditions

  • Conditional CNN GAN.
  • Uses feature matching to control the creativity (sketch below).
  • Lots of pre-processing.
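
A small sketch of the feature-matching term; the exact layers and weighting used by MidiNet aren't reproduced here, and `disc_features` is a hypothetical hook into an intermediate discriminator layer:

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(disc_features, real_batch, fake_batch):
    """disc_features: a callable returning an intermediate discriminator
    feature map for a batch (in MidiNet this would be a hidden layer of
    the CNN discriminator)."""
    f_real = disc_features(real_batch).mean(dim=0)   # batch-mean real features
    f_fake = disc_features(fake_batch).mean(dim=0)   # batch-mean fake features
    return F.mse_loss(f_fake, f_real.detach())

# This term is weighted against the usual adversarial loss; the weight acts
# as the "creativity" knob mentioned in the note above.
```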

MorpheuS: generating structured music with constrained patterns and tension

  • Tension model in the Spiral Array (sketch of the three measures below)
    • cloud diameter
    • cloud momentum
    • tensile strain
  • Combinatorial optimization with a heuristic solver (variable neighbourhood search, VNS)
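
A rough numpy sketch of how the three tension measures can be computed, assuming the notes of a window ("cloud") have already been mapped to 3-D spiral-array positions; the paper's exact weightings and windowing are not reproduced:

```python
import numpy as np

def centre_of_effect(cloud, weights=None):
    """(Weighted) mean spiral-array position of the cloud's notes."""
    return np.average(np.asarray(cloud), axis=0, weights=weights)

def cloud_diameter(cloud):
    """Largest pairwise distance between the notes' spiral-array positions."""
    cloud = np.asarray(cloud)
    diffs = cloud[:, None, :] - cloud[None, :, :]
    return np.linalg.norm(diffs, axis=-1).max()

def cloud_momentum(cloud_prev, cloud_next):
    """Distance the centre of effect moves between consecutive clouds."""
    return np.linalg.norm(centre_of_effect(cloud_next) - centre_of_effect(cloud_prev))

def tensile_strain(cloud, key_position):
    """Distance between the cloud's centre of effect and the key's position."""
    return np.linalg.norm(centre_of_effect(cloud) - np.asarray(key_position))
```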

Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription

Deep Learning for music

C-RNN-GAN: Continuous recurrent neural networks with adversarial training

Tuning Recurrent Neural Networks with Reinforcement Learning

Sequence modeling

Improved variational inference with inverse autoregressive flow

  • Normalizing flow built from inverse autoregressive transformations.
  • A form of variational inference (normalizing flow) that can handle high-dimensional data (sketch of one flow step below).
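
A minimal sketch of a single IAF step (not the paper's MADE-based architecture): an autoregressive net produces a shift m and gate σ from z, so the Jacobian of z' = σ·z + (1−σ)·m is triangular and the log-determinant is just Σ log σ.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Linear):
    """Linear layer masked so that output i depends only on inputs j < i."""
    def __init__(self, dim):
        super().__init__(dim, dim)
        self.register_buffer("mask", torch.tril(torch.ones(dim, dim), diagonal=-1))
    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

class IAFStep(nn.Module):
    """One inverse autoregressive flow step: z' = sigma * z + (1 - sigma) * m."""
    def __init__(self, dim):
        super().__init__()
        self.m_net = MaskedLinear(dim)   # shift, autoregressive in z
        self.s_net = MaskedLinear(dim)   # pre-gate, autoregressive in z

    def forward(self, z):
        m = self.m_net(z)
        sigma = torch.sigmoid(self.s_net(z) + 1.0)   # bias toward the identity map
        z_new = sigma * z + (1 - sigma) * m
        log_det = torch.log(sigma).sum(dim=-1)       # triangular Jacobian
        return z_new, log_det

z = torch.randn(8, 16)                  # a batch of latent samples
z_new, log_det = IAFStep(16)(z)         # log q(z_new|x) = log q(z|x) - log_det
```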

Learning the base distribution in implicit generative models

  • Two-stage training: an autoencoder, then a model of the encoded space (sketch below).
  • Formulas (5)(6)(7) are confusing; the definition of pϕ0(⋅) is unclear.
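
My reading of the two-stage idea as a sketch; the choice of a Gaussian base distribution and all module names are my assumptions, not necessarily the paper's:

```python
import torch
import torch.nn as nn

# Stage 1: train an ordinary autoencoder (hypothetical encoder/decoder modules).
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))
# ... optimize the reconstruction loss ||decoder(encoder(x)) - x||^2 on data x ...

# Stage 2: model the encoded space. A full-covariance Gaussian stands in for
# whatever base distribution the paper learns.
with torch.no_grad():
    codes = encoder(torch.randn(1000, 784))          # placeholder "training" codes
mean = codes.mean(dim=0)
cov = torch.cov(codes.T) + 1e-4 * torch.eye(32)      # jitter for positive definiteness
base = torch.distributions.MultivariateNormal(mean, cov)

# Sampling: draw from the learned base distribution, then decode.
samples = decoder(base.sample((16,)))
```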

Unsupervised Learning of Sequence Representation by Auto-encoders

  • Uses a seq2seq model to capture the holistic feature and a CharRNN model to capture the local feature.
  • A shared LSTM module serves as the encoder for both models and as the decoder for the CharRNN (sketch below).
  • Uses a stop signal to keep track of the time step.
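
A hedged sketch of the weight-sharing idea as I read the note; module names, sizes, and the shared readout are my assumptions:

```python
import torch
import torch.nn as nn

vocab, emb, hid = 100, 64, 128
embed = nn.Embedding(vocab, emb)
shared_lstm = nn.LSTM(emb, hid, batch_first=True)     # the shared module
seq2seq_decoder = nn.LSTM(emb, hid, batch_first=True)
readout = nn.Linear(hid, vocab)

x = torch.randint(0, vocab, (4, 20))                  # a batch of token sequences

# Seq2seq path: the shared LSTM encodes the sequence into a holistic state.
_, (h, c) = shared_lstm(embed(x))
recon_logits = readout(seq2seq_decoder(embed(x), (h, c))[0])

# CharRNN path: the same shared LSTM predicts the next token at every step.
step_out, _ = shared_lstm(embed(x))
next_logits = readout(step_out)                       # compare against x shifted by one
```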

Dilated RNN

https://github.com/umbrellabeach/music-generation-with-DL

Embedding

http://ruder.io/word-embeddings-2017/

  • "subword"
  • ConceptNet
    • ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
  • Multilingual
    • A Survey of Cross-lingual Word Embedding Models (Sebastian Ruder et al.)

http://ruder.io/word-embeddings-1/

  • Training embeddings is computationally expensive when there are a lot of elements (a large vocabulary).
  • Embeddings trained along with the model can be task-specific.
  • The second-to-last layer is effectively an embedding of the output word, but it differs from the input-layer embedding.
  • C&W model
    • Replaces the probability with a score and uses a hinge loss as the loss function
    • Uses the context to predict the score of the middle word; only previous words are taken into account.
  • Word2vec
    • no non-linearity
    • no deep structure
    • more context
    • A lot of training strategies
    • Takes both the previous and the following context.
    • CBOW
      • Using the context to predict the center word
      • No order in the context.
    • Skip-gram
      • Using the center word to predict the context (toy sketch after this list).
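
A toy skip-gram with negative sampling (not Mikolov's implementation): shallow, no non-linearity, separate input and output embedding tables scored by a dot product, which is the "output embedding" point made above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGram(nn.Module):
    def __init__(self, vocab_size, dim=100):
        super().__init__()
        self.in_emb = nn.Embedding(vocab_size, dim)    # embeddings we keep
        self.out_emb = nn.Embedding(vocab_size, dim)   # "output word" embeddings

    def forward(self, center, context, negatives):
        v = self.in_emb(center)                        # (B, dim)
        pos = (v * self.out_emb(context)).sum(-1)      # dot-product scores, (B,)
        neg = torch.bmm(self.out_emb(negatives), v.unsqueeze(-1)).squeeze(-1)  # (B, K)
        # Negative sampling: real pairs score high, sampled pairs score low.
        return -(F.logsigmoid(pos) + F.logsigmoid(-neg).sum(-1)).mean()

model = SkipGram(vocab_size=5000)
center = torch.randint(0, 5000, (32,))
context = torch.randint(0, 5000, (32,))
negatives = torch.randint(0, 5000, (32, 5))            # K = 5 negative samples
loss = model(center, context, negatives)
```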

http://ruder.io/word-embeddings-softmax/index.html

  • Addresses the overhead of the final softmax (decision) layer over a large vocabulary.
  • Sampling
  • Note: musical elements have a limited number of objects, so we don't have to accelerate the softmax layer.
  • Hierarchical Softmax (H-softmax)
    • Decomposes the flat softmax into a sequence of smaller softmaxes along a tree path (sketch below)
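
A minimal binary hierarchical softmax over a complete tree (word2vec uses a Huffman tree instead; this is just to show the O(log V) path product):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalSoftmax(nn.Module):
    """Complete binary tree over the vocabulary (V assumed a power of two).
    P(word) is a product of sigmoid decisions along the root-to-leaf path,
    so scoring one word costs O(log V) instead of O(V)."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.V = vocab_size
        self.node_emb = nn.Embedding(vocab_size - 1, dim)   # one vector per internal node

    def path(self, word):
        """Internal node ids and left/right signs from the root to the word's leaf."""
        nodes, signs = [], []
        n = self.V - 1 + word                    # heap index of the word's leaf
        while n > 0:
            parent = (n - 1) // 2
            nodes.append(parent)
            signs.append(1.0 if n == 2 * parent + 1 else -1.0)   # left child = +1
            n = parent
        nodes.reverse(); signs.reverse()
        return torch.tensor(nodes), torch.tensor(signs)

    def log_prob(self, h, word):
        nodes, signs = self.path(word)
        scores = self.node_emb(nodes) @ h        # (log V,) dot products
        return F.logsigmoid(signs * scores).sum()

hsm = HierarchicalSoftmax(vocab_size=8, dim=16)
h = torch.randn(16)
print(sum(hsm.log_prob(h, w).exp() for w in range(8)))   # ~1.0: probabilities sum to one
```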

A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning

  • Uses a TDNN and a lookup table to train the NLP model
  • The window approach might hurt long-term dependencies.
  • The embedding is trained along with the entire model (sketch below).
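
A rough sketch of the window approach with a lookup table (sizes and the tagging head are arbitrary); backprop through the task loss also updates the embeddings, which is the "trained along with the model" point:

```python
import torch
import torch.nn as nn

vocab, dim, window, n_tags = 10000, 50, 5, 10
lookup = nn.Embedding(vocab, dim)                    # the lookup table
classifier = nn.Sequential(                          # scores the center word's tag
    nn.Linear(window * dim, 128), nn.Tanh(), nn.Linear(128, n_tags))

words = torch.randint(0, vocab, (32, window))        # batch of 5-word windows
features = lookup(words).flatten(1)                  # concatenate the window: (32, 250)
logits = classifier(features)
# Backprop through `classifier` also updates `lookup`: the embeddings are
# learned jointly with the task, but only words inside the window influence
# each prediction (the long-range-dependency concern in the note).
```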