MAAI reading
===
## Music generation
#### Learning a Latent Space of Multitrack Measures
- Encoder: two layers of bidirectional LSTM
- State2Latent: two FC layers
- Decoder: two layers of unidirectional LSTM (wiring sketched below)
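A minimal PyTorch sketch of the wiring above, interpreting the two FC layers as mean/log-variance heads of a VAE; that interpretation and all sizes are assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

class MeasureVAE(nn.Module):
    """Sketch: 2-layer bi-LSTM encoder, two FC layers from state to latent,
    2-layer unidirectional LSTM decoder. All sizes are placeholders."""
    def __init__(self, n_features=128, hidden=256, latent=64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        # State2Latent: two FC layers (read here as VAE mean / log-variance heads)
        self.fc_mu = nn.Linear(2 * hidden, latent)
        self.fc_logvar = nn.Linear(2 * hidden, latent)
        self.latent2state = nn.Linear(latent, hidden)
        self.decoder = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                           # x: (batch, time, n_features)
        _, (h, _) = self.encoder(x)
        h_last = torch.cat([h[-2], h[-1]], dim=-1)  # last layer, both directions
        mu, logvar = self.fc_mu(h_last), self.fc_logvar(h_last)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        h0 = torch.tanh(self.latent2state(z)).unsqueeze(0).repeat(2, 1, 1)
        dec, _ = self.decoder(x, (h0, torch.zeros_like(h0)))  # teacher forcing
        return self.out(dec), mu, logvar
```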
#### Chord2Vec: Learning Musical Chord Embeddings
- Uses bilinear, autoregressive, and seq2seq models to embed a chord by predicting the given chord's content (its notes); a rough sketch of the seq2seq variant follows.
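A rough sketch of the seq2seq variant under my reading of the note above; the vocabulary size and the exact prediction target (the chord itself vs. a neighbouring chord) are assumptions:

```python
import torch
import torch.nn as nn

class ChordSeq2Seq(nn.Module):
    """Encode one chord's notes, decode the notes of the target chord.
    The learned note representation lives in the embedding table."""
    def __init__(self, vocab=130, dim=64):   # e.g. 128 MIDI pitches + start/end
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.enc = nn.LSTM(dim, dim, batch_first=True)
        self.dec = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, chord_notes, target_notes):    # (batch, n_notes) token ids
        _, state = self.enc(self.emb(chord_notes))
        dec_out, _ = self.dec(self.emb(target_notes), state)
        return self.out(dec_out)                      # per-step logits over pitches
```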
#### [ISMIR 2017] Generating Nontrivial Melodies for Music as a Service
- Conditional VAE (conditioned on the chord progression); the conditioning is sketched below.
- Splits the melody and chords using a handcrafted metric.
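A small sketch of one common way to implement such conditioning, concatenating a chord vector to the inputs at every step; the sizes and the GRU choice are mine, not necessarily the paper's:

```python
import torch
import torch.nn as nn

class MelodyCVAE(nn.Module):
    """Conditional VAE sketch: chord vectors are concatenated to the melody
    for the encoder and to the latent code for the decoder."""
    def __init__(self, melody_dim=128, chord_dim=24, hidden=256, latent=32):
        super().__init__()
        self.enc = nn.GRU(melody_dim + chord_dim, hidden, batch_first=True)
        self.fc_mu = nn.Linear(hidden, latent)
        self.fc_logvar = nn.Linear(hidden, latent)
        self.dec = nn.GRU(latent + chord_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, melody_dim)

    def forward(self, melody, chords):                # (batch, time, feature)
        _, h = self.enc(torch.cat([melody, chords], dim=-1))
        mu, logvar = self.fc_mu(h[-1]), self.fc_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        z_seq = z.unsqueeze(1).expand(-1, melody.size(1), -1)
        dec, _ = self.dec(torch.cat([z_seq, chords], dim=-1))
        return self.out(dec), mu, logvar
```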
#### Song From PI: a musically plausible network for pop music generation
- Simple hierarchy: stacked LSTMs, with the higher level outputting chords and the bottom level outputting keys (notes); see the sketch below.
- Chords, drum patterns, and similar elements are grouped into clusters.
- Some extensions (applications) are shown.
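A tiny sketch of the two-level idea, with the higher level's hidden output conditioning the bottom level; timescales and vocabulary sizes are placeholders, and both levels run per step here for simplicity:

```python
import torch
import torch.nn as nn

class TwoLevelRNN(nn.Module):
    """Top LSTM emits chords; its hidden states condition a bottom LSTM
    that emits keys (notes)."""
    def __init__(self, n_chords=48, n_keys=128, hidden=128):
        super().__init__()
        self.top = nn.LSTM(n_chords, hidden, batch_first=True)
        self.chord_out = nn.Linear(hidden, n_chords)
        self.bottom = nn.LSTM(n_keys + hidden, hidden, batch_first=True)
        self.key_out = nn.Linear(hidden, n_keys)

    def forward(self, chord_seq, key_seq):            # one-hot (batch, time, dim)
        top_h, _ = self.top(chord_seq)
        bottom_in = torch.cat([key_seq, top_h], dim=-1)   # condition on the top level
        bot_h, _ = self.bottom(bottom_in)
        return self.chord_out(top_h), self.key_out(bot_h)
```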
#### COSIATEC and SIATECCompress: Pattern Discovery by Geometric Compression
- Used in MorpheuS.
- Uses a shift vector $v$ to group points into patterns ($\{p \mid p\in D\land p+v\in D\}$), then extracts the best pattern (scored with a handcrafted metric) on each iteration; see the sketch below.
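The set comprehension above in a few lines of Python (a toy illustration only; SIATEC/COSIATEC additionally enumerate all shift vectors, collect occurrence sets, and score candidate patterns for compression):

```python
def mtp(D, v):
    """Maximal translatable pattern: {p | p in D and p + v in D},
    for 2-D note points p = (onset, pitch)."""
    D = set(D)
    return {p for p in D if (p[0] + v[0], p[1] + v[1]) in D}

# Toy example: a three-note figure repeated one beat later, a major third higher.
notes = {(0, 60), (1, 62), (2, 64), (1, 64), (2, 66), (3, 68)}
print(mtp(notes, (1, 4)))   # -> {(0, 60), (1, 62), (2, 64)}
```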
#### MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation using 1D and 2D Conditions
- Conditional CNN GAN.
- Uses feature matching to control creativity (loss term sketched below).
- Lots of pre-processing.
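A sketch of the feature-matching term: the generator matches the mean of an intermediate discriminator feature layer on real vs. generated batches. `D.features` and `G` in the comment are placeholders for the MidiNet discriminator and generator:

```python
import torch

def feature_matching_loss(real_feats, fake_feats):
    """Mean-feature matching between real and generated batches.
    Weighting this term against the adversarial loss trades off how
    closely the generator sticks to the real data."""
    return torch.mean((real_feats.mean(dim=0) - fake_feats.mean(dim=0)) ** 2)

# real_feats = D.features(real_batch); fake_feats = D.features(G(z))   # (batch, feat)
loss = feature_matching_loss(torch.randn(16, 128), torch.randn(16, 128))
```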
#### MorpheuS: generating structured music with constrained patterns and tension
- Tension model in Spiral Array
- cloud diameter
- cloud momentum
- tensile strain
- Combinatorial optimization with a heuristic solver (variable neighbourhood search, VNS)
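A sketch of the three tension measures, taking spiral-array positions as given 3-D points; the pitch-to-spiral-array mapping and the duration weighting of the centre of effect are omitted:

```python
import itertools
import numpy as np

def center_of_effect(cloud):
    """Centroid of a cloud of note positions (unweighted here)."""
    return np.mean(cloud, axis=0)

def cloud_diameter(cloud):
    """Largest pairwise distance within one cloud (window of notes)."""
    return max(np.linalg.norm(a - b) for a, b in itertools.combinations(cloud, 2))

def cloud_momentum(prev_cloud, cloud):
    """Distance between the centers of effect of consecutive clouds."""
    return np.linalg.norm(center_of_effect(cloud) - center_of_effect(prev_cloud))

def tensile_strain(cloud, key_position):
    """Distance between the cloud's center of effect and the key's position."""
    return np.linalg.norm(center_of_effect(cloud) - key_position)
```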
#### Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
#### Deep Learning for music
#### C-RNN-GAN: Continuous recurrent neural networks with adversarial training
#### Tuning Recurrent Neural Networks with Reinforcement Learning
## Sequence modeling
#### Improved variational inference with inverse autoregressive flow
- Normalizing flow built from inverse autoregressive transformations (one step sketched below).
- A form of variational inference (normalizing flow) that can handle high-dimensional data.
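One IAF step as a sketch; a single masked linear layer stands in for the MADE-style autoregressive networks of the paper, and the `+ 2.0` bias is just a stand-in for initializing the gates near the identity:

```python
import torch
import torch.nn as nn

class IAFStep(nn.Module):
    """m_i and s_i depend only on z_{<i}, so the Jacobian is triangular and
    the log-determinant is simply sum(log sigma)."""
    def __init__(self, dim):
        super().__init__()
        self.m = nn.Linear(dim, dim)
        self.s = nn.Linear(dim, dim)
        self.register_buffer("mask", torch.tril(torch.ones(dim, dim), diagonal=-1))

    def forward(self, z):
        m = nn.functional.linear(z, self.m.weight * self.mask, self.m.bias)
        s = nn.functional.linear(z, self.s.weight * self.mask, self.s.bias)
        sigma = torch.sigmoid(s + 2.0)           # start close to the identity map
        z_new = sigma * z + (1 - sigma) * m      # gated, numerically stable update
        log_det = torch.log(sigma).sum(dim=-1)   # correction to log q(z)
        return z_new, log_det
```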
#### Learning the base distribution in implicit generative models
- Two-stage training: first an autoencoder, then a model of the encoded space (sketched below).
- Formulas (5), (6), (7) are confusing; the definition of $p_{\phi}^0(\cdot)$ is unclear.
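My reading of the two-stage pipeline as a self-contained toy: a random linear projection stands in for the trained autoencoder, and a Gaussian fitted post hoc stands in for the learned base distribution; the paper's actual estimator is more involved than this.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 20))          # toy training data
W = rng.normal(size=(20, 4)) * 0.3          # stand-in "trained" encoder weights
encode = lambda x: x @ W                    # stage 1: autoencoder halves
decode = lambda z: z @ np.linalg.pinv(W)

codes = encode(data)                        # stage 2: model the encoded space
mu, cov = codes.mean(axis=0), np.cov(codes, rowvar=False)

def sample(n):
    z = rng.multivariate_normal(mu, cov, size=n)    # draw from the base distribution
    return decode(z)                                # map back to data space

print(sample(3).shape)                      # (3, 20)
```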
#### Unsupervised Learning of Sequence Representation by Auto-encoders
- Uses a seq2seq model to capture holistic features and a CharRNN model to capture local features.
- A shared LSTM module serves as the encoder for both models and as the decoder for the CharRNN (see the sketch below).
- Uses a stop signal to keep track of the time steps.
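A sketch of the sharing scheme as I read it: one LSTM is both the seq2seq encoder and the CharRNN; the shared output layer is an extra simplification of mine.

```python
import torch
import torch.nn as nn

class SharedSeqAE(nn.Module):
    """One LSTM serves as the seq2seq encoder and as the CharRNN;
    a second LSTM is the seq2seq decoder."""
    def __init__(self, vocab=256, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.shared = nn.LSTM(dim, dim, batch_first=True)     # encoder + CharRNN
        self.seq2seq_dec = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def charrnn(self, x):                      # local: next-token prediction
        h, _ = self.shared(self.emb(x))
        return self.out(h)

    def autoencode(self, x):                   # holistic: reconstruct the sequence
        _, state = self.shared(self.emb(x))    # summary of the whole sequence
        dec, _ = self.seq2seq_dec(self.emb(x), state)
        return self.out(dec)
```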
#### Dilated RNN
#### https://github.com/umbrellabeach/music-generation-with-DL
## Embedding
#### http://ruder.io/word-embeddings-2017/
- "subword"
- ConceptNet
- ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
- Multi-lingual
	- A Survey of Cross-lingual Word Embedding Models (Sebastian Ruder)
#### http://ruder.io/word-embeddings-1/
- Training embeddings is computationally expensive when there are a lot of elements (a large vocabulary).
- Embeddings trained along with the model can be task-specific.
- The second-to-last layer is effectively an embedding of the output words, but it differs from the embedding in the input layer.
- C&W model
	- Replaces probabilities with a score and uses a hinge (ranking) loss as the loss function
	- Uses the context to predict a score for the middle word; only previous words are taken into account
- Word2vec
	- No non-linearity
	- No deep structure
	- More context
	- Many training strategies
	- Uses both the previous and the following context (see the sketch after this list)
- CBOW
		- Uses the context to predict the center word
		- The context is treated as orderless (no word order)
- Skip-gram
		- Uses the center word to predict the context words
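A toy sketch of the two word2vec objectives, showing the points above: no hidden non-linearity, an orderless context for CBOW, and an output projection that is itself a second (output-side) embedding table. Vocabulary size and dimensions are arbitrary:

```python
import torch
import torch.nn as nn

class Word2Vec(nn.Module):
    def __init__(self, vocab, dim=100):
        super().__init__()
        self.emb_in = nn.Embedding(vocab, dim)             # input-word embeddings
        self.emb_out = nn.Linear(dim, vocab, bias=False)   # output-word embeddings

    def cbow_logits(self, context_ids):        # (batch, window) -> (batch, vocab)
        # CBOW: average the (orderless) context, predict the center word.
        return self.emb_out(self.emb_in(context_ids).mean(dim=1))

    def skipgram_logits(self, center_ids):     # (batch,) -> (batch, vocab)
        # Skip-gram: embed the center word, predict each context word from it.
        return self.emb_out(self.emb_in(center_ids))

model = Word2Vec(vocab=10000)
loss = nn.functional.cross_entropy(
    model.cbow_logits(torch.tensor([[3, 17, 42, 7]])),   # context word ids
    torch.tensor([5]))                                   # center word id
```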
#### http://ruder.io/word-embeddings-softmax/index.html
- Addresses the overhead introduced by the final softmax (decision) layer over a large vocabulary.
- Sampling (e.g., negative sampling, sketched after this list)
	- Note: musical vocabularies contain a limited number of symbols, so we may not have to accelerate the softmax layer.
- Hierarchical Softmax (H-softmax)
	- Decomposes the softmax into a sequence of smaller softmaxes (a tree over the vocabulary)
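One sampling-based alternative (word2vec-style negative sampling) as a sketch; as noted above, for small musical vocabularies the plain softmax is usually fine:

```python
import torch
import torch.nn.functional as F

def negative_sampling_loss(center_vec, target_vec, negative_vecs):
    """Score the true (center, target) pair against k sampled negatives
    instead of normalizing over the whole vocabulary."""
    pos = F.logsigmoid(center_vec @ target_vec)               # true pair
    neg = F.logsigmoid(-(negative_vecs @ center_vec)).sum()   # k negatives
    return -(pos + neg)

# Toy shapes: 100-d embeddings, 5 sampled negatives.
loss = negative_sampling_loss(torch.randn(100), torch.randn(100), torch.randn(5, 100))
```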
#### A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning
- Uses a TDNN and a lookup table (embedding layer) to train the NLP model.
- The window approach may hurt long-term dependencies.
- The embedding is trained jointly with the entire model (window approach sketched below).
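A sketch of the window approach with a jointly trained lookup table; sizes are placeholders, and the TDNN variant would replace the window MLP with a 1-D convolution over time:

```python
import torch
import torch.nn as nn

class WindowTagger(nn.Module):
    """Lookup-table embeddings for a fixed window of words around the
    target feed an MLP; words outside the window are invisible, which is
    why long-range dependencies can suffer."""
    def __init__(self, vocab=10000, dim=50, window=5, hidden=100, n_tags=10):
        super().__init__()
        self.lookup = nn.Embedding(vocab, dim)     # trained with the rest of the model
        self.mlp = nn.Sequential(
            nn.Linear(window * dim, hidden), nn.Tanh(), nn.Linear(hidden, n_tags))

    def forward(self, window_ids):                 # (batch, window) word ids
        x = self.lookup(window_ids).flatten(1)     # concatenate the window embeddings
        return self.mlp(x)

tagger = WindowTagger()
logits = tagger(torch.randint(0, 10000, (2, 5)))   # two 5-word windows
```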