# Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance
## Introduction
**So, what are the differences between performers' interpretations of a given music score?**
* Subtle tempo changes between phrases or within phrases.
* Dynamics/Articulation/Loudness/Texture *(!?)* of each note.
Musical expressions are annotated in the score with musical terms, like *Cantabile*, *Giocoso*, or *Wondering about yourself*.
**We need to model expressive musical performance with computational methods.**
* Earlier approaches (rule-based and probabilistic models):
    * Gaussian Process
    * Kalman Filter
    * Bayesian Network
    * Conditional Random Fields
* More recent works:
    * RNN
**Then we have to define the input structure of the neural network**
1. Use a 1D sequential structure, where notes are ordered by time and pitch.
    * Can't handle polyphony.
2. Use 2D piano rolls and treat the piano rolls as images.
    * Much higher dimensionality.
    * Thirty-second notes, tuplets → high time resolution needed :(
The paper proposes a model based on a **Graph Neural Network**, and models a music score as nodes (notes) and edges (musical relations between notes).
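
To make the piano-roll cost concrete, here is a rough back-of-the-envelope sketch. The 88-key range, the 4/4 measure, and the helper name `frames_per_quarter` are assumptions chosen for illustration, not values from the paper.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm needs Python 3.9+

def frames_per_quarter(durations):
    """Smallest number of equal frames per quarter note such that every
    duration (given in quarter notes) falls on a frame boundary."""
    return lcm(*(Fraction(d).denominator for d in durations))

# Durations in quarter notes: a thirty-second note and one note of an
# eighth-note triplet.
durations = [Fraction(1, 8), Fraction(1, 3)]
fpq = frames_per_quarter(durations)

n_pitches = 88            # assumed piano range
quarters_per_measure = 4  # assumed 4/4 time
cells_per_measure = fpq * quarters_per_measure * n_pitches
print(fpq, cells_per_measure)
```

Under these assumptions a single measure needs 24 frames per quarter note and 8448 piano-roll cells, whereas a graph needs one node per note regardless of how fine the rhythm is.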
## Networks
### Gated Graph Neural Network
* A graph: $G = (\nu, \varepsilon)$
* $\nu$ : Notes.
* $\varepsilon$ : Edges. We define six types of edges.

1. *Next* : Connect to the following notes.
2. *Rest* : Connect to the notes right after the rest ends.
3. *Onset* : Connect to notes that begin together.
4. *Sustain* : Connect to notes that begin while the current note is still sounding, i.e. between *Onset* and *Next*.
5. *Voice* : !?
6. *Slur* : !?
* Use the propagation rule of GRU to update the node states.
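
The typed edges and the GRU-style update can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: only two of the six edge types are used, biases are omitted, weights are random, and the names `ggnn_step`, `W_edge`, and the three-note toy score are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(h, adj, p):
    """One gated propagation step.

    h:   (n_notes, d) node states
    adj: (n_edge_types, n_notes, n_notes); adj[k, i, j] = 1 means
         note j sends a type-k message to note i
    p:   dict of weight matrices (assumed names, no biases)
    """
    # Aggregate neighbour messages, one weight matrix per edge type.
    a = sum(adj[k] @ h @ p["W_edge"][k] for k in range(adj.shape[0]))
    z = sigmoid(a @ p["W_z"] + h @ p["U_z"])              # update gate
    r = sigmoid(a @ p["W_r"] + h @ p["U_r"])              # reset gate
    h_tilde = np.tanh(a @ p["W_h"] + (r * h) @ p["U_h"])  # candidate state
    return (1.0 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
n_notes, d = 3, 4
EDGE_TYPES = ["next", "onset"]  # only two of the six types, for brevity

# Toy score: notes 0 and 1 begin together, note 2 follows note 0.
adj = np.zeros((len(EDGE_TYPES), n_notes, n_notes))
adj[0, 2, 0] = 1.0                 # next: note 0 -> note 2
adj[1, 0, 1] = adj[1, 1, 0] = 1.0  # onset: notes 0 and 1

p = {"W_edge": rng.normal(scale=0.1, size=(len(EDGE_TYPES), d, d))}
for name in ["W_z", "U_z", "W_r", "U_r", "W_h", "U_h"]:
    p[name] = rng.normal(scale=0.1, size=(d, d))

h = rng.normal(size=(n_notes, d))
h_new = ggnn_step(h, adj, p)
print(h_new.shape)
```

Repeating `ggnn_step` several times lets information travel along longer paths in the score graph.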

### Hierarchical Attention RNN
* Why use HAN?
    1. Hard to define $h_{t}^{0}$ when several notes are played at the same time.
    2. In an RNN, $h_{t}$ includes information from the previous states $h_{0}, \dots, h_{t-1}$, while this assumption fails in the case of a GNN.
    3. Attention method saves the world.

Compute each measure vector using **multi-head attention**.
The heads come from slicing a normal attention model.
<div style="text-align:center" markdown="1">
<img src="https://i.imgur.com/OchvXNJ.jpg">
</div>
<p style="text-align: center"><font size="2" color="grey"><i>Attention Model.</i></font></p>
<div style="text-align:center" markdown="1">
<img src="https://i.imgur.com/SPm1frP.jpg">
</div>
<p style="text-align: center"><font size="2" color="grey"><i>Multi-head attention Model with cute equations.</i></font></p>
Use measure vectors as inputs to a higher-level RNN to generate measure-level representations.
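
A rough sketch of how one measure vector could be pooled from note-level states with sliced attention heads. It assumes a HAN-style learned context vector per head; the exact equations in the figure may differ, and `measure_vector`, `W`, `b`, and `context` are hypothetical names.

```python
import numpy as np

def softmax(x, axis=0):
    x = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def measure_vector(h, W, b, context, n_heads):
    """Summarize the note states h (n_notes, d) of one measure into a (d,) vector."""
    n_notes, d = h.shape
    assert d % n_heads == 0, "feature size must split evenly into heads"
    head_dim = d // n_heads
    u = np.tanh(h @ W + b)                             # (n_notes, d)
    # Slice the feature axis into heads.
    u_heads = u.reshape(n_notes, n_heads, head_dim)
    h_heads = h.reshape(n_notes, n_heads, head_dim)
    # Score each note per head against that head's context vector.
    scores = np.einsum("nkd,kd->nk", u_heads, context)  # (n_notes, n_heads)
    alpha = softmax(scores, axis=0)                     # normalize over notes
    # Weighted sum of note states per head, then concatenate the heads.
    pooled = np.einsum("nk,nkd->kd", alpha, h_heads)    # (n_heads, head_dim)
    return pooled.reshape(d), alpha

rng = np.random.default_rng(0)
n_notes, d, n_heads = 5, 8, 2
h = rng.normal(size=(n_notes, d))
W = rng.normal(scale=0.1, size=(d, d))
b = np.zeros(d)
context = rng.normal(size=(n_heads, d // n_heads))

vec, alpha = measure_vector(h, W, b, context, n_heads)
print(vec.shape, alpha.sum(axis=0))
```

Each head can attend to different notes in the measure, and the resulting vectors would be fed one per measure into the higher-level RNN.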
### Iterative Sequential Graph Network
<div style="text-align:center" markdown="1">
<img src="https://i.imgur.com/oGvl2YP.jpg">
</div>
Consider the higher-level hidden state when calculating the lower-level hidden state, so the model can learn the hidden state of a note with higher-level context, like phrases or measures.
Concatenate the measure-level representation with the note-level data.
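
The iteration between the two levels can be sketched as below. This only illustrates the data flow under heavy simplifications: a single `tanh` layer stands in for the graph update, mean pooling stands in for attention, and a plain recurrent layer stands in for the higher-level RNN. The function name `isgn_sketch` and all weights are hypothetical.

```python
import numpy as np

def isgn_sketch(note_feats, measure_ids, n_iters, d, rng):
    """note_feats: (n_notes, f); measure_ids: (n_notes,) measure index per note.
    Assumes n_iters >= 1 and that every measure contains at least one note."""
    n_measures = int(measure_ids.max()) + 1
    f = note_feats.shape[1]
    W_note = rng.normal(scale=0.1, size=(f + d, d))
    W_meas = rng.normal(scale=0.1, size=(d, d))
    U_meas = rng.normal(scale=0.1, size=(d, d))

    measure_repr = np.zeros((n_measures, d))
    for _ in range(n_iters):
        # Concatenate each note's data with its measure's representation.
        x = np.concatenate([note_feats, measure_repr[measure_ids]], axis=1)
        note_h = np.tanh(x @ W_note)  # stand-in for the graph update
        # Pool notes into measures (mean as a stand-in for attention).
        pooled = np.stack([note_h[measure_ids == m].mean(axis=0)
                           for m in range(n_measures)])
        # Sequential pass over measures (stand-in for the higher-level RNN).
        state = np.zeros(d)
        states = []
        for m in range(n_measures):
            state = np.tanh(pooled[m] @ W_meas + state @ U_meas)
            states.append(state)
        measure_repr = np.stack(states)
    return note_h, measure_repr

rng = np.random.default_rng(0)
note_feats = rng.normal(size=(6, 3))
measure_ids = np.array([0, 0, 1, 1, 2, 2])
note_h, measure_repr = isgn_sketch(note_feats, measure_ids, n_iters=2, d=4, rng=rng)
print(note_h.shape, measure_repr.shape)
```

From the second iteration on, the note-level update sees the measure-level context computed in the previous iteration.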
## Modules
<div style="text-align:center" markdown="1">
<img src="https://i.imgur.com/NQ4BPqP.jpg">
</div>
<p style="text-align: center"><font size="2" color="grey"><i>The confusing modules.</i></font></p>
## Experiment
### Data
* **Input** : Embedded note-level features, including information about pitch, note duration, and dynamic markings.
* **Output** : Tempo, MIDI velocity, onset deviation, articulation, and features to handle the pedal.
### Evaluation
* **HAN** : Score encoder + performance encoder + performance decoder.
    * LSTM : Note-wise and voice-wise LSTMs for note-level representations.
    * HAN : Beat-level and measure-level representations.
* **BL** : Remove the HANs and the voice-wise LSTM.
* **G-HAN** : Replace the LSTMs with GGNN.
<div style="text-align:center" markdown="1">
<img src="https://i.imgur.com/yGtV3Qy.jpg">
</div><p style="text-align: center"><font size="2" color="grey"><i>MSE and KLD</i></font></p>
<div style="text-align:center" markdown="1">
<img src="https://i.imgur.com/kXuie1i.jpg">
</div><p style="text-align: center"><font size="2" color="grey"><i>Correlation.</i></font></p>
<div style="text-align:center" markdown="1">
<img src="https://i.imgur.com/BGhLcKz.jpg">
</div><p style="text-align: center"><font size="2" color="grey"><i>Listening test.</i></font></p>