MAAI reading

Music generation

LEARNING A LATENT SPACE OF MULTITRACK MEASURES

  • Encoder: two layers of bidirectional LSTM
  • State2Latent: two fully connected (FC) layers
  • Decoder: two layers of unidirectional LSTM (rough sketch below)
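
Not from the paper — a minimal PyTorch sketch of the architecture as described above, reading State2Latent as a VAE-style mean/log-variance head; all sizes and module names are placeholders.

```python
import torch
import torch.nn as nn

class MeasureVAE(nn.Module):
    """Sketch: 2-layer BiLSTM encoder -> 2 FC layers -> 2-layer LSTM decoder."""
    def __init__(self, vocab_size=256, emb=128, hid=256, latent=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hid, num_layers=2,
                               bidirectional=True, batch_first=True)
        # "State2Latent": two FC layers producing mean and log-variance
        self.fc_mu = nn.Linear(2 * hid, latent)
        self.fc_logvar = nn.Linear(2 * hid, latent)
        self.latent2state = nn.Linear(latent, hid)
        self.decoder = nn.LSTM(emb, hid, num_layers=2, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, tokens):                      # tokens: (B, T) int64
        x = self.embed(tokens)
        _, (h, _) = self.encoder(x)                 # h: (num_layers * 2, B, hid)
        h_top = torch.cat([h[-2], h[-1]], dim=-1)   # top layer, both directions
        mu, logvar = self.fc_mu(h_top), self.fc_logvar(h_top)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        h0 = torch.tanh(self.latent2state(z)).unsqueeze(0).repeat(2, 1, 1)
        dec_out, _ = self.decoder(x, (h0, torch.zeros_like(h0)))  # teacher forcing
        return self.out(dec_out), mu, logvar
```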

Chord2Vec: Learning Musical Chord Embeddings

  • Uses bilinear, autoregressive, and seq2seq models to embed a chord by predicting the note content of its context chords.

[ISMIR 2017] Generating Nontrivial Melodies for Music as a Service

  • Conditional VAE (conditioned on the chord progression)
  • Splits the melody/chords using a handcrafted metric

Song From PI: A Musically Plausible Network for Pop Music Generation

  • Simple hierarchy: a stacked LSTM whose higher level outputs chords and whose bottom level outputs keys.
  • Chords, drum patterns, and so on are grouped into clusters.
  • Some extensions (applications).

COSIATEC and SIATECCompress: Pattern Discovery by Geometric Compression

  • Used in MorpheuS
  • Uses a shift vector v to group patterns ({p | p ∈ D ∧ p + v ∈ D}), then repeatedly extracts the best pattern according to a handcrafted metric (toy sketch below)
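
A toy Python illustration of the shift-vector grouping (not Meredith's implementation; pattern "quality" here is just set size rather than the paper's compression-based metric):

```python
from collections import defaultdict

def translatable_patterns(D):
    """Group points of D by translation vector v: {p | p in D and p + v in D}.
    D is a set of (onset, pitch) pairs; vectors are coordinate differences."""
    D = set(D)
    groups = defaultdict(set)
    for p in D:
        for q in D:
            if p != q:
                v = (q[0] - p[0], q[1] - p[1])
                groups[v].add(p)          # p stays inside D when shifted by v
    return groups

# Toy example: a two-note motif repeated one beat later, a major third higher
D = {(0, 60), (1, 62), (1, 64), (2, 66)}
patterns = translatable_patterns(D)
best_v = max(patterns, key=lambda v: len(patterns[v]))   # stand-in for the
print(best_v, patterns[best_v])                          # handcrafted metric
```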

MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation using 1D and 2D Conditions

  • Conditional CNN GAN.
  • Uses feature matching to control the creativity (sketch below).
  • Lots of pre-processing.
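
A small sketch of the feature-matching term; the exact layers and weighting used by MidiNet aren't reproduced here, and `disc_features` is a hypothetical hook into an intermediate discriminator layer:

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(disc_features, real_batch, fake_batch):
    """disc_features: a callable returning an intermediate discriminator
    feature map for a batch (in MidiNet this would be a hidden layer of
    the CNN discriminator)."""
    f_real = disc_features(real_batch).mean(dim=0)   # batch-mean real features
    f_fake = disc_features(fake_batch).mean(dim=0)   # batch-mean fake features
    return F.mse_loss(f_fake, f_real.detach())

# This term is weighted against the usual adversarial loss; the weight acts
# as the "creativity" knob mentioned in the note above.
```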

MorpheuS: generating structured music with constrained patterns and tension

  • Tension model in the Spiral Array (sketch of the three measures below)
    • cloud diameter
    • cloud momentum
    • tensile strain
  • Combinatorial optimization with a heuristic solver (variable neighbourhood search, VNS)
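
A rough numpy sketch of how the three tension measures can be computed, assuming the notes of a window ("cloud") have already been mapped to 3-D spiral-array positions; the paper's exact weightings and windowing are not reproduced:

```python
import numpy as np

def centre_of_effect(cloud, weights=None):
    """(Weighted) mean spiral-array position of the cloud's notes."""
    return np.average(np.asarray(cloud), axis=0, weights=weights)

def cloud_diameter(cloud):
    """Largest pairwise distance between the notes' spiral-array positions."""
    cloud = np.asarray(cloud)
    diffs = cloud[:, None, :] - cloud[None, :, :]
    return np.linalg.norm(diffs, axis=-1).max()

def cloud_momentum(cloud_prev, cloud_next):
    """Distance the centre of effect moves between consecutive clouds."""
    return np.linalg.norm(centre_of_effect(cloud_next) - centre_of_effect(cloud_prev))

def tensile_strain(cloud, key_position):
    """Distance between the cloud's centre of effect and the key's position."""
    return np.linalg.norm(centre_of_effect(cloud) - np.asarray(key_position))
```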

Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription

Deep Learning for music

C-RNN-GAN: Continuous recurrent neural networks with adversarial training

Tuning Recurrent Neural Networks with Reinforcement Learning

Sequence modeling

Improved variational inference with inverse autoregressive flow

  • Normalizing flow built from inverse autoregressive transformations.
  • A form of variational inference (normalizing flow) that can handle high-dimensional data (sketch of one flow step below).
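
A minimal sketch of a single IAF step (not the paper's MADE-based architecture): an autoregressive net produces a shift m and gate σ from z, so the Jacobian of z' = σ·z + (1−σ)·m is triangular and the log-determinant is just Σ log σ.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Linear):
    """Linear layer masked so that output i depends only on inputs j < i."""
    def __init__(self, dim):
        super().__init__(dim, dim)
        self.register_buffer("mask", torch.tril(torch.ones(dim, dim), diagonal=-1))
    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

class IAFStep(nn.Module):
    """One inverse autoregressive flow step: z' = sigma * z + (1 - sigma) * m."""
    def __init__(self, dim):
        super().__init__()
        self.m_net = MaskedLinear(dim)   # shift, autoregressive in z
        self.s_net = MaskedLinear(dim)   # pre-gate, autoregressive in z

    def forward(self, z):
        m = self.m_net(z)
        sigma = torch.sigmoid(self.s_net(z) + 1.0)   # bias toward the identity map
        z_new = sigma * z + (1 - sigma) * m
        log_det = torch.log(sigma).sum(dim=-1)       # triangular Jacobian
        return z_new, log_det

z = torch.randn(8, 16)                  # a batch of latent samples
z_new, log_det = IAFStep(16)(z)         # log q(z_new|x) = log q(z|x) - log_det
```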

Learning the base distribution in implicit generative models

  • Two-stage training: an autoencoder, then a model of the encoded space (sketch below).
  • Formulas (5)(6)(7) are confusing; the definition of pϕ0(⋅) is unclear.
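
My reading of the two-stage idea as a sketch; the choice of a Gaussian base distribution and all module names are my assumptions, not necessarily the paper's:

```python
import torch
import torch.nn as nn

# Stage 1: train an ordinary autoencoder (hypothetical encoder/decoder modules).
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))
# ... optimize the reconstruction loss ||decoder(encoder(x)) - x||^2 on data x ...

# Stage 2: model the encoded space. A full-covariance Gaussian stands in for
# whatever base distribution the paper learns.
with torch.no_grad():
    codes = encoder(torch.randn(1000, 784))          # placeholder "training" codes
mean = codes.mean(dim=0)
cov = torch.cov(codes.T) + 1e-4 * torch.eye(32)      # jitter for positive definiteness
base = torch.distributions.MultivariateNormal(mean, cov)

# Sampling: draw from the learned base distribution, then decode.
samples = decoder(base.sample((16,)))
```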

Unsupervised Learning of Sequence Representation by Auto-encoders

  • Uses a seq2seq model to capture the holistic feature and a CharRNN model to capture the local feature.
  • A shared LSTM module serves as the encoder for both models and as the decoder for the CharRNN (sketch below).
  • Uses a stop signal to keep track of the time step.
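
A hedged sketch of the weight-sharing idea as I read the note; module names, sizes, and the shared readout are my assumptions:

```python
import torch
import torch.nn as nn

vocab, emb, hid = 100, 64, 128
embed = nn.Embedding(vocab, emb)
shared_lstm = nn.LSTM(emb, hid, batch_first=True)     # the shared module
seq2seq_decoder = nn.LSTM(emb, hid, batch_first=True)
readout = nn.Linear(hid, vocab)

x = torch.randint(0, vocab, (4, 20))                  # a batch of token sequences

# Seq2seq path: the shared LSTM encodes the sequence into a holistic state.
_, (h, c) = shared_lstm(embed(x))
recon_logits = readout(seq2seq_decoder(embed(x), (h, c))[0])

# CharRNN path: the same shared LSTM predicts the next token at every step.
step_out, _ = shared_lstm(embed(x))
next_logits = readout(step_out)                       # compare against x shifted by one
```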

Dilated RNN

https://github.com/umbrellabeach/music-generation-with-DL

Embedding

http://ruder.io/word-embeddings-2017/

  • "subword"
  • ConceptNet
    • ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
  • Multilingual
    • A Survey of Cross-lingual Word Embedding Models (Sebastian Ruder et al.)

http://ruder.io/word-embeddings-1/

  • Training embeddings is computationally expensive when there are a lot of elements (a large vocabulary).
  • Embeddings trained along with the model can be task-specific.
  • The second-to-last layer is effectively an embedding of the output word, but it differs from the input-layer embedding.
  • C&W model
    • Replaces the probability with a score and uses a hinge loss as the loss function
    • Uses the context to predict the score of the middle word; only previous words are taken into account.
  • Word2vec
    • no non-linearity
    • no deep structure
    • more context
    • A lot of training strategies
    • Takes both the previous and the following context.
    • CBOW
      • Using the context to predict the center word
      • No order in the context.
    • Skip-gram
      • Using the center word to predict the context (toy sketch after this list).
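
A toy skip-gram with negative sampling (not Mikolov's implementation): shallow, no non-linearity, separate input and output embedding tables scored by a dot product, which is the "output embedding" point made above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGram(nn.Module):
    def __init__(self, vocab_size, dim=100):
        super().__init__()
        self.in_emb = nn.Embedding(vocab_size, dim)    # embeddings we keep
        self.out_emb = nn.Embedding(vocab_size, dim)   # "output word" embeddings

    def forward(self, center, context, negatives):
        v = self.in_emb(center)                        # (B, dim)
        pos = (v * self.out_emb(context)).sum(-1)      # dot-product scores, (B,)
        neg = torch.bmm(self.out_emb(negatives), v.unsqueeze(-1)).squeeze(-1)  # (B, K)
        # Negative sampling: real pairs score high, sampled pairs score low.
        return -(F.logsigmoid(pos) + F.logsigmoid(-neg).sum(-1)).mean()

model = SkipGram(vocab_size=5000)
center = torch.randint(0, 5000, (32,))
context = torch.randint(0, 5000, (32,))
negatives = torch.randint(0, 5000, (32, 5))            # K = 5 negative samples
loss = model(center, context, negatives)
```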

http://ruder.io/word-embeddings-softmax/index.html

  • Addresses the overhead of the final softmax (decision) layer over a large vocabulary.
  • Sampling
  • Note: musical elements have a limited number of objects, so we don't have to accelerate the softmax layer.
  • Hierarchical Softmax (H-softmax)
    • Decomposes the flat softmax into a sequence of smaller softmaxes along a tree path (sketch below)
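
A minimal binary hierarchical softmax over a complete tree (word2vec uses a Huffman tree instead; this is just to show the O(log V) path product):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalSoftmax(nn.Module):
    """Complete binary tree over the vocabulary (V assumed a power of two).
    P(word) is a product of sigmoid decisions along the root-to-leaf path,
    so scoring one word costs O(log V) instead of O(V)."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.V = vocab_size
        self.node_emb = nn.Embedding(vocab_size - 1, dim)   # one vector per internal node

    def path(self, word):
        """Internal node ids and left/right signs from the root to the word's leaf."""
        nodes, signs = [], []
        n = self.V - 1 + word                    # heap index of the word's leaf
        while n > 0:
            parent = (n - 1) // 2
            nodes.append(parent)
            signs.append(1.0 if n == 2 * parent + 1 else -1.0)   # left child = +1
            n = parent
        nodes.reverse(); signs.reverse()
        return torch.tensor(nodes), torch.tensor(signs)

    def log_prob(self, h, word):
        nodes, signs = self.path(word)
        scores = self.node_emb(nodes) @ h        # (log V,) dot products
        return F.logsigmoid(signs * scores).sum()

hsm = HierarchicalSoftmax(vocab_size=8, dim=16)
h = torch.randn(16)
print(sum(hsm.log_prob(h, w).exp() for w in range(8)))   # ~1.0: probabilities sum to one
```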

A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning

  • Uses a TDNN and a lookup table to train the NLP model
  • The window approach might hurt long-term dependencies.
  • The embedding is trained along with the entire model (sketch below).
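
A rough sketch of the window approach with a lookup table (sizes and the tagging head are arbitrary); backprop through the task loss also updates the embeddings, which is the "trained along with the model" point:

```python
import torch
import torch.nn as nn

vocab, dim, window, n_tags = 10000, 50, 5, 10
lookup = nn.Embedding(vocab, dim)                    # the lookup table
classifier = nn.Sequential(                          # scores the center word's tag
    nn.Linear(window * dim, 128), nn.Tanh(), nn.Linear(128, n_tags))

words = torch.randint(0, vocab, (32, window))        # batch of 5-word windows
features = lookup(words).flatten(1)                  # concatenate the window: (32, 250)
logits = classifier(features)
# Backprop through `classifier` also updates `lookup`: the embeddings are
# learned jointly with the task, but only words inside the window influence
# each prediction (the long-range-dependency concern in the note).
```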