# GN1 Dev Tasks

## Tasks

Please write your name next to a task if you are going to investigate it!

#### 1. Very quick (will turn on by default):

- [x] Layer norm: added, appears to speed up convergence.
- [x] Dropout: seems to degrade performance, so it is disabled for now; we can re-test when we increase the model size.
- [x] Residual connections: added an additional residual connection in the first gatconv layer.
- [x] ReLU -> ELU/SiLU/SELU/Mish/GELU?
    - Note that we are actually using ELU in the GAT, for no particular reason.
    - This needs to be changed everywhere (MLPs and GAT).
    - Also, in the `init_params` method of the gatconv class, make sure the parameters are initialised with the gain that matches the chosen activation.

#### 2. Test one-by-one:

- [x] Dense node updates (already supported, not enabled)
- [x] Add a separate value projection (already supported, not enabled)
- [ ] Promote all projections to full MLPs, and calculate attention weights with MLPs
- [x] Go deeper/narrower (3 -> 6 layers, config change)
- [x] Increase number of heads 2 -> 8
- [x] Concatenate jet pt and eta to the input of every linear transformation/MLP [!83](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/GNNJetTagger/-/merge_requests/83) -- *Dmitrii*
- [x] Add an LR scheduler (e.g. [torch.optim.lr_scheduler.OneCycleLR](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.OneCycleLR.html))
    - You might have to disable SWA in the trainer (or config file), as it may interfere with the scheduler.
    - Run with `verbose=True` to ensure it is doing what we want.
- [ ] Add label smoothing (just an argument to the loss)
- [ ] Train for more epochs (up to 200)

#### 3. Slightly more involved:

- [ ] Persistent edge features (this is not so bad)
- [ ] Persistent global features (this requires using a heterograph, which is supported but deprecated) -- *Dmitrii*: I will try to add heterograph support

## Merge requests

The main MR into main will be collected in [!81](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/GNNJetTagger/-/merge_requests/81).
Please open MRs to this branch (`svanstro/model-updates`):

- [!82](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/GNNJetTagger/-/merge_requests/82): small improvements to dropout and layernorm; should speed up convergence
- [!83](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/GNNJetTagger/-/merge_requests/83): draft version of pt/eta concatenation -- *Dmitrii*
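For the activation swap in section 1, here is a minimal sketch of one way to keep the activation choice and the matching init gain in one place, so the MLPs, the GAT and `init_params` cannot drift apart. `ACTIVATIONS`, `make_activation`, `init_gain` and `init_linear` are hypothetical helpers, not existing GNNJetTagger code. `torch.nn.init.calculate_gain` has no entry for ELU, SiLU, Mish or GELU, so falling back to the ReLU gain for those is an assumption rather than a derived value.

```python
import math

from torch import nn

# Hypothetical registry: MLPs and GAT layers both pick their activation from one config value.
ACTIVATIONS = {
    "relu": nn.ReLU,
    "elu": nn.ELU,
    "silu": nn.SiLU,
    "selu": nn.SELU,
    "mish": nn.Mish,
    "gelu": nn.GELU,
}


def make_activation(name: str) -> nn.Module:
    """Instantiate the activation module for a config name such as "silu"."""
    return ACTIVATIONS[name]()


def init_gain(name: str) -> float:
    """Gain to pass to the weight init functions for a given activation."""
    if name == "selu":
        # PyTorch's docs recommend the "linear" gain for self-normalising networks.
        return nn.init.calculate_gain("linear")
    try:
        return nn.init.calculate_gain(name)
    except ValueError:
        # No entry for elu/silu/mish/gelu; reusing the ReLU gain (sqrt(2)) is an assumption.
        return math.sqrt(2.0)


def init_linear(layer: nn.Linear, activation: str) -> None:
    """Example of what init_params could do for a single linear projection."""
    nn.init.xavier_normal_(layer.weight, gain=init_gain(activation))
    if layer.bias is not None:
        nn.init.zeros_(layer.bias)
```

Passing the same config name to `make_activation` and `init_gain` means changing the activation also changes the initialisation.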
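For the jet pt/eta concatenation task (drafted in !83), a minimal sketch of the idea on padded dense tensors. The per-jet features are broadcast over the tracks and appended to each track's input before the MLP. `JetConditionedMLP`, the `(batch, n_tracks, features)` layout and the layer sizes are assumptions for illustration. The actual model works on graphs, so there the broadcast would presumably go through the graph batching instead.

```python
import torch
from torch import nn


class JetConditionedMLP(nn.Module):
    """Hypothetical MLP whose input is track features concatenated with jet features."""

    def __init__(self, in_dim: int, jet_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + jet_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor, jet: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_tracks, in_dim); jet: (batch, jet_dim), e.g. jet_dim=2 for (pt, eta)
        jet = jet.unsqueeze(1).expand(-1, x.shape[1], -1)  # repeat jet features for every track
        return self.net(torch.cat([x, jet], dim=-1))
```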
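For the LR scheduler and label smoothing items, a self-contained sketch that uses a toy linear model and random data in place of the tagger and its dataloader. `OneCycleLR` is stepped once per batch, and logging `get_last_lr()` is a quick way to check that the schedule does what we want. The hyperparameters (`max_lr`, epochs, `label_smoothing=0.1`) are placeholder assumptions. This plain loop has no SWA, in line with the note that SWA may interfere with the scheduler.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder model and data; the real tagger and dataloader would go here.
model = nn.Linear(8, 3)
epochs, steps_per_epoch = 5, 10
max_lr = 1e-3

optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr)
# OneCycleLR warms up to max_lr, then anneals; total steps = epochs * steps_per_epoch.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=max_lr, epochs=epochs, steps_per_epoch=steps_per_epoch
)
# Label smoothing is a constructor argument of CrossEntropyLoss (PyTorch >= 1.10).
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)

lrs = []
for epoch in range(epochs):
    for step in range(steps_per_epoch):
        x = torch.randn(16, 8)
        y = torch.randint(0, 3, (16,))
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # once per batch, not once per epoch
        lrs.append(scheduler.get_last_lr()[0])
    print(f"epoch {epoch}: lr={lrs[-1]:.2e} loss={loss.item():.3f}")
```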