# Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance

## Introduction

**So, what differs between performers' interpretations of a given music score?**

* Subtle tempo changes between phrases or within phrases.
* Dynamics/Articulation/Loudness/Texture *(!?)* of each note.

Musical expressions are annotated in the score with musical terms, like *Cantabile*, *Giocoso*, or *Wondering about yourself*.

**We need to model expressive musical performance with computational methods.**

* Rule-based and probabilistic approaches in ancient times:
    * Gaussian Process
    * Kalman Filter
    * Bayesian Network
    * Conditional Random Fields
* More recent works:
    * RNN

**Then we have to define the input structure of the neural network.**

1. Use a 1D sequential structure, where notes are ordered by time and pitch.
    * Can't handle polyphony.
2. Use 2D piano rolls and regard the piano rolls as images.
    * Much higher dimensions.
    * Thirty-second notes, tuplets → high time resolution needed :(

The paper proposes a model based on a **Graph Neural Network**, and models a music score as nodes (notes) and edges (musical relations between notes).

## Networks

### Gated Graph Neural Network

* A graph: $G = (\mathbf{\nu}, \varepsilon )$
    * $\nu$ : Notes.
    * $\varepsilon$ : Edges. We define six types of edges.

![](https://i.imgur.com/Nb23JNR.jpg)

1. *Next* : Connect to the following notes.
2. *Rest* : Connect to the notes right after the rest ends.
3. *Onset* : Connect to notes that begin together.
4. *Sustain* : Connect to notes that appear between *Onset* and *Next*.
5. *Voice* : !?
6. *Slur* : !?

* Use the propagation rule of GRU.

![](https://i.imgur.com/MM0aDNY.jpg)

### Hierarchical Attention RNN

* Why use HAN?
    1. It is hard to define $h_{t}^{0}$ when several notes are played at the same time.
    2. In an RNN, $h_{t}$ includes information from the previous states $h_{0}...h_{t-1}$, while this assumption fails in the case of a GNN.
    3. Attention method saves the world.
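
To make the two building blocks so far more concrete, here is a minimal NumPy sketch: one gated propagation step over a score graph with typed edges, followed by multi-head attention pooling of the resulting note states into a single measure vector. It is a sketch under assumptions, not the paper's implementation. The GRU-style update is the standard gated graph network form and may differ in detail from the figure above, the attention is a generic context-vector attention sliced into heads, and every name (`ggnn_step`, `attention_pool`, `init_params`), the toy 4-note graph, and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_edge_types, dim, n_heads, rng):
    """Random toy weights (hypothetical; a real model would learn these)."""
    w = lambda *shape: rng.normal(scale=0.1, size=shape)
    return {
        "W_edge": w(n_edge_types, dim, dim),  # one message matrix per edge type
        "Wz": w(dim, dim), "Uz": w(dim, dim),
        "Wr": w(dim, dim), "Ur": w(dim, dim),
        "Wh": w(dim, dim), "Uh": w(dim, dim),
        "W_att": w(dim, dim), "b_att": np.zeros(dim),
        "context": w(n_heads, dim // n_heads),  # one context vector per head
    }

def ggnn_step(h, adj, p):
    """One gated propagation step.

    h:   (N, D) note states.
    adj: (E, N, N); adj[e, i, j] = 1 if note j sends a message to note i
         along edge type e.
    """
    # Sum incoming messages, using a separate weight matrix per edge type.
    a = sum(adj[e] @ h @ p["W_edge"][e] for e in range(adj.shape[0]))
    z = sigmoid(a @ p["Wz"] + h @ p["Uz"])              # update gate
    r = sigmoid(a @ p["Wr"] + h @ p["Ur"])              # reset gate
    h_new = np.tanh(a @ p["Wh"] + (r * h) @ p["Uh"])    # candidate state
    return (1.0 - z) * h + z * h_new

def attention_pool(h, p, n_heads):
    """Pool (N, D) note states into one (D,) vector, one softmax per head."""
    n, dim = h.shape
    d_head = dim // n_heads
    u = np.tanh(h @ p["W_att"] + p["b_att"]).reshape(n, n_heads, d_head)
    scores = np.einsum("nkd,kd->nk", u, p["context"])   # (N, heads)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=0, keepdims=True)     # softmax over notes
    heads = h.reshape(n, n_heads, d_head)                # slice states into heads
    pooled = np.einsum("nk,nkd->kd", alpha, heads)       # weighted sum per head
    return pooled.reshape(dim), alpha

# Toy score: notes 0 and 1 start together, note 2 follows them, note 3 follows 2.
rng = np.random.default_rng(0)
N, D, HEADS = 4, 8, 2
adj = np.zeros((2, N, N))
adj[0, 2, 0] = adj[0, 2, 1] = adj[0, 3, 2] = 1.0  # edge type 0: "next"
adj[1, 0, 1] = adj[1, 1, 0] = 1.0                 # edge type 1: "onset"

params = init_params(2, D, HEADS, rng)
h0 = rng.normal(size=(N, D))
h1 = ggnn_step(h0, adj, params)
measure_vec, alpha = attention_pool(h1, params, HEADS)
print(measure_vec.shape, alpha.shape)
```

The pooling step is what sidesteps point 1 in the list: simultaneous notes do not need an ordering, because each head just assigns every note in the measure a weight and sums.
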
![](https://i.imgur.com/yandrti.jpg)

Compute the measure vector using **multi-head attention**, which slices a normal attention model into several heads.

<div style="text-align:center" markdown="1"> <img src="https://i.imgur.com/OchvXNJ.jpg"> </div>
<p style="text-align: center"><font size="2" color="grey"><i>Attention Model.</i></font></p>

<div style="text-align:center" markdown="1"> <img src="https://i.imgur.com/SPm1frP.jpg"> </div>
<p style="text-align: center"><font size="2" color="grey"><i>Multi-head attention Model with cute equations.</i></font></p>

Use the measure vectors as inputs to a higher-level RNN to generate measure-level representations.

### Iterative Sequential Graph Network

<div style="text-align:center" markdown="1"> <img src="https://i.imgur.com/oGvl2YP.jpg"> </div>
<p style="text-align: center"><font size="2" color="grey"><i></i></font></p>

Consider the higher-level hidden state when calculating the lower-level hidden state, so the model can learn the hidden state of a note with higher-level context, like phrases or measures. Concatenate the measure-level representation with the note-level data.

## Modules

<div style="text-align:center" markdown="1"> <img src="https://i.imgur.com/NQ4BPqP.jpg"> </div>
<p style="text-align: center"><font size="2" color="grey"><i>The confusing modules.</i></font></p>

## Experiment

### Data

* **Input** : Embedded note-level features, including information about pitch, note duration, and dynamic markings.
* **Output** : Tempo, MIDI velocity, onset deviation, articulation, and features to handle the pedal.

### Evaluation

* **HAN** : Score encoder + performance encoder + performance decoder.
    * LSTM : Note-wise and voice-wise LSTMs for note-level representations.
    * HAN : Beat-level and measure-level representations.
* **BL** : Remove the HANs and the voice-wise LSTM.
* **G-HAN** : Replace the LSTMs with GGNN.
<div style="text-align:center" markdown="1"> <img src="https://i.imgur.com/yGtV3Qy.jpg"> </div>
<p style="text-align: center"><font size="2" color="grey"><i>MSE and KLD</i></font></p>

<div style="text-align:center" markdown="1"> <img src="https://i.imgur.com/kXuie1i.jpg"> </div>
<p style="text-align: center"><font size="2" color="grey"><i>Correlation.</i></font></p>

<div style="text-align:center" markdown="1"> <img src="https://i.imgur.com/BGhLcKz.jpg"> </div>
<p style="text-align: center"><font size="2" color="grey"><i>Listening test.</i></font></p>
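
For reference, two of the metrics named in the captions above can be sketched with their textbook definitions. These notes do not say exactly how the paper computed them (per feature, per piece, or otherwise), so this only illustrates the formulas. The function names and the velocity values are made up for the example and are not results from the paper.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two equal-length feature sequences."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

def pearson_r(pred, target):
    """Pearson correlation coefficient between two feature sequences."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    p = pred - pred.mean()      # center both sequences
    t = target - target.mean()
    return float((p @ t) / np.sqrt((p @ p) * (t @ t)))

# Made-up MIDI velocities for five notes (illustrative only).
human_velocity = np.array([60.0, 64.0, 70.0, 66.0, 58.0])
model_velocity = np.array([62.0, 63.0, 68.0, 67.0, 60.0])

print("MSE:", mse(model_velocity, human_velocity))
print("Pearson r:", pearson_r(model_velocity, human_velocity))
```

MSE penalizes absolute differences, while correlation only checks whether the predicted curve rises and falls with the human one, which is presumably why the two are reported separately.
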