# 8/3

## Test stat verification

- [x] FastGCN
  <img src="https://hackmd.io/_uploads/r1t6r8Oih.png" width="600">
- [x] SALIENT (matches the paper)
  ![](https://hackmd.io/_uploads/r1yEfvds3.png)
- [x] EGCN (matches the paper)
  ![](https://hackmd.io/_uploads/BJbeBpui3.png)
- [x] GTS (5 epochs, approximately matches the paper)
  ![](https://hackmd.io/_uploads/ByRNdIds3.png)
- [x] DAG-GNN
  ![](https://hackmd.io/_uploads/Sy-Hbvdjn.png)
- [x] GANF
  <img src="https://hackmd.io/_uploads/SyByUL_on.png" width="340">

## Updates

* Enabled parameter search for all models
* Added a parameter search notebook example for FastGCN

# 7/27

Questions

1. Not sure about some SALIENT training config parameters (for documentation purposes)

## Updates

1. Finished testing multiple GPUs for SALIENT
2. Applied wandb to SALIENT
3. Debugged wandb usage
4. Documentation for SALIENT
5. wandb parameter tuning in progress

# 7/20

## Jie

Current idea is to bring together concepts from two sides:

1. architecture inspired by LLMs (GPT, BERT);
2. the graph side, i.e. DeepWalk/node2vec, to extract random walks from the graph so that we have walks that represent it.

## TODO

- table of links to notebooks and basic info
- verify accuracy

## Updates

1. Wrap the EGCN tasker
2. wandb & documentation for FastGCN, GTS, GANF, DAG-GNN
3. TODO: EGCN - tell users what to expect for the dataset
4. TODO: SALIENT wandb + documentation

## Transformer

Current way for positional encoding:

- use the distance on the graph from the first node as the position encoding
- GraphGPS is very different from our idea; it uses random walks in a different way

Check out DGI (self-supervised training for graphs).

Current approach:

- language models take chunks of text and run transformers over them
- given a graph, how do we chunk it? use random walks / node2vec (a toy sketch follows the paper list below)
- play around with the attention mask
- training objective (e.g. GPT-2 uses next-token prediction, BERT uses masking; now also consider contrastive learning)

1. Walk-Transformer, [A Self-Attention Network based Node Embedding Model (ECML 2020)](https://arxiv.org/pdf/2006.12100.pdf)
   * employs a transformer self-attention network to iteratively aggregate vector representations of nodes in random walks
   * [code](https://github.com/daiquocnguyen/Walk-Transformer)
2. GraphGPS, [Recipe for a General, Powerful, Scalable Graph Transformer (NeurIPS 2022)](https://arxiv.org/pdf/2205.12454.pdf)
   * proposes a blueprint for building graph transformers by combining modules for feature (pre)processing, local message passing, and global attention into a single pipeline
   * [blog post](https://towardsdatascience.com/graphgps-navigating-graph-transformers-c2cc223a051c)
   * [code](https://github.com/rampasek/GraphGPS)
   * comes up with its own position encodings
3. LSPE, [Graph Neural Networks with Learnable Structural and Positional Representations (ICLR 2022)](https://arxiv.org/pdf/2110.07875.pdf)
   * "A major issue with arbitrary graphs is the absence of canonical positional information of nodes, which decreases the representation power of GNNs to distinguish e.g. isomorphic nodes and other graph symmetries. An approach to tackle this issue is to introduce Positional Encoding (PE) of nodes, and inject it into the input layer, like in Transformers. Possible graph PE are Laplacian eigenvectors. In this work, we propose to decouple structural and positional representations to make easy for the network to learn these two essential properties. We introduce a novel generic architecture which we call LSPE (Learnable Structural and Positional Encodings). We investigate several sparse and fully-connected (Transformer-like) GNNs, and observe a performance increase for molecular datasets, ..."
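To make the "chunk a graph into walks" step from the Current approach list concrete, here is a minimal sketch of plain uniform random walks over an adjacency list. It is illustrative only: the toy graph, the function name, and the parameters are made up, and a real version would add the node2vec biases discussed above.

```python
import random

def random_walks(adj, walk_len=20, walks_per_node=5, seed=0):
    """Uniform random walks; each walk is a node-id sequence that a
    transformer can treat as one 'sentence' (hypothetical helper)."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                neighbors = adj[walk[-1]]
                if not neighbors:      # dead end: stop the walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# toy graph as an adjacency list {node_id: [neighbor ids]}
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walks(adj, walk_len=5, walks_per_node=2)[:3])
```

Each walk can then be tokenized (one token per node id) and fed to a GPT/BERT-style model, with the next-token or masking objective applied over node positions in the walk.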
# 7/17

## Questions

1. GTS data loading

## Project: graph transformer, diffusion model for graph generation

- both are generative
- both have graph components

## Graph transformer

- How can we scale the transformer architecture to large graphs?
- Main idea: use sequences to represent patches of a graph. Existing transformers are good at handling sequences, so we use them to process graph patches and learn representations for nodes and whole graphs.
- Phase: a PhD student is on it, exploring the idea. Nothing works out of the box yet, but there are promising ideas.
- My involvement: join the meetings with the PhD student twice a week, get papers to read to get up to speed, brainstorm and contribute ideas, and find implementation work to do, i.e. implement new ideas and run experiments with baselines and existing methods.
  - baselines would be easy to start with, but exploring ideas is welcome too
  - Monday 3pm, Thursday 2pm
- Background:
  - GPT (next-token prediction), BERT (recover masked tokens)
  - mixed results so far: using input embeddings independent of context with a linear layer for predictions, but the input embeddings are not useful enough. Want to try contextualized embeddings instead.
  - i.e. BERT's tasks are about the entire sequence, but we only want to classify individual nodes - how do we do that?
  - Q1: What is the input? Currently the inputs are random walks.
  - Q2: How do we apply pretrained models to downstream tasks? In particular we only want to predict a single node; maybe take a random walk starting from that node, then do the downstream task on that initial node.
  - look into other self-supervised training techniques (losses) for graphs
  - sequence representations: DeepWalk, node2vec
  - explore training methods/losses
- **TODO**
  - Read ___
  - identify a simple idea that works for transformers
  - literature search using `random walk` + `transformer`; focus on the last 3 years in top ML conferences

## Diffusion model

- We would like to generate large graphs.
- The model learns to map the training data to a standard probability distribution; to generate, we sample from that distribution and reverse the process to produce new data.
- Explore how to extend current image diffusion models to graphs, in particular graphs that have both structure and node features.
- Phase: no one is on it yet. Jie did some math to guide the implementation and has some idea of how to start.
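For reference, the standard image-diffusion (DDPM) formulation that would be the starting point; this is textbook notation, not anything implemented here yet:

$$
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big),
\qquad
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\,I\big),
\qquad
\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)
$$

The forward chain pushes the data toward $\mathcal{N}(0, I)$; a network $p_\theta(x_{t-1} \mid x_t)$ is trained to reverse it, and generation samples $x_T \sim \mathcal{N}(0, I)$ and runs the learned reverse chain. The open question for us is what $x_0$ should be for a graph (adjacency structure, node features, or both).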
# 7/13

## Main updates

1. Finished the SALIENT trainer
2. Finished the GANF trainer
3. EGCN cleanup

## TODO

- [ ] EGCN, GANF, GTS - consistent way to build dataloaders; expose a load function

### EGCN

- [ ] merge tasker and splitter
- [ ] have a super dataset class; the user writes a function to make the dataloader
- [x] rewrite the Trainer class

# 7/10

## Main updates

1. Finished FastGCN
2. Finished DAG_GNN
3. Changed `as model_utils` to `as utils`

### SALIENT

- [x] changed trainer method to a class (priority)

### GANF

- [x] take the best model, not the last model
- [x] changed trainer method to a class (priority)

# 7/6

### Main updates

1. Fixed all imports
2. Finished configs for all models
3. Wrote a `Trainer` class for DAG_GNN and did a sanity check
   - Consider doing a Trainer class for all codebases
4. Made a `Trainer` class for FastGCN
5. Working on the SALIENT trainer class and other minor adjustments

### General

* enable checkpoints, load the best one, run inference
* make trainer method
* change import paths

### DAG_GNN (done)

- [x] made config
- [x] made and tested trainer class

### FastGCN (done)

- [x] made trainer

### GTS

- [ ] split up the supervisor

# 7/3

EGCN (elliptic) & SALIENT (arxiv)

**Updates for GANF**

1. Changed the `Water` dataset to `GANF_Dataset` under `GraphTK/datasets/GANF_Dataset.py`. Currently using a `load_data()` function in the notebook to get Dataloaders. Import as `from GraphTK.datasets import GANF_Dataset`
2. Added a config to GANF
3. Changed `from GraphTK.models.GANF.GANF import GANF` to `from GraphTK.models import GANF`
4. Added a trainer for GANF

**Updates for FastGCN**

1. Changed `import GraphTK.utils.FastGCN.utils as fastgcn_utils` to `import GraphTK.utils.FastGCN as fastgcn_utils`
2. Changed `from GraphTK.models.FastGCN.FastGCN import FastGCN` to `from GraphTK.models import FastGCN`
3. Simplified the parameters required by FastGCN by taking data and args as input
4. TODO: make a trainer

**Configuration file conversion**

- [x] FastGCN
- [x] EGCN
- [x] SALIENT
- [x] GANF
- [x] DAG_GNN
- [x] GTS

**Known issues & questions**

~~1. Elliptic out-of-bounds issue~~
~~2. fast_sampler needs to be installed separately~~
~~3. GANF's double training loop seems to do inference at the same time?? How do I separate train/test/eval, or do I need to? - move testing outside, load the best model~~

**General guidelines**

1. Be clear to users about how to handle a new dataset
2. Make the toolkit look well integrated and well organized rather than just six separate papers

**TODO**

- [x] 1. For GANF, change `Water` to some other unified name since it overlaps with `Traffic`
- [ ] 2. For EGCN, get rid of static data splitting: combine tasker and splitter, have it return dataloaders, and pass them to the trainer
- [ ] 3. Have a super Dataset class that takes edges and nodes, pass it to a combination of splitters and taskers that return dataloaders ==> send to train
- [ ] 4. Save checkpoints in the training loop; before testing, load the checkpoint
  * for the Bayesian network, no testing for now
- [x] 5. Change `__init__.py` so importing a model doesn't repeat the model name twice
- [x] 6. Make a `trainer` folder and wrap the training loop (don't do this until we have a decision on Friday or next week)
- [x] 7. Remove the inner print statements of DAG-GNN train, keep the outer ones (edited)

**STEPS [follow this]**

- [x] 1. finish EGCN
- [x] 2. do SALIENT
- [x] 3. do configs
- [x] 4. reorganize directories/imports and utils
- [ ] 5. do a Trainer for each codebase
- [ ] 6. merge splitters and tasker for EGCN (unify dataset)
- [ ] 7. add inference
- [ ] 8. split up the GTS supervisor class
- [ ] 9. write a visualization function
- [ ] 10. ...

**Note to self**

1. GANF eval takes the last model rather than the best model; eval is already inside the trainer, which is odd
2. The GTS code is too tightly coupled

# 6/22

GTS, GANF, EvolveGCN

EvolveGCN

1. In `taskers_utils`, `get_static_sp_adj(edges, weighted)` references a `subset` that is commented out, and the relevant if/else is never used
2. Is distributed needed?

TODO:

- [x] 1. finish sbm
- [x] 2. do elliptic
- [x] 3. do SALIENT
- [x] 4. do config files and change the code accordingly
- [ ] 5. do clean-up
- [ ] 6. do documentation

# 6/19

GTS

1. Is `data/sensor_graph/adj_mx.pkl` used?

# 6/15

FastGCN & DAG-GNN

Agenda

1. Set up the package `setup.py` so it can be imported easily via an absolute path
2. Drafted model.py and utils.py for `FastGCN`
3. Made an example notebook for `FastGCN` using the above modules
4. Drafted model.py and utils.py for `DAG-GNN`
   - Since the codebase is very rushed, spent time cleaning up and removing unused variables, functions, classes, and function calls
5. The original `DAG-GNN` code uses the model as separate Encoder and Decoder objects, and the training functions are scattered all over the place; I am working on a model superclass `DAG-GNN` that wraps Encoder and Decoder so that train() and save() are called on the model class instead (see the sketch after this section).
6. Have an example notebook for `DAG-GNN` from before the modification in (5). Had it run.
7. Creating a new example notebook that uses the consolidated model class instead.

Next steps:

1. Test out DAG-GNN
2. Do 1-2 new codebases next week (the rest are less structured than FastGCN and DAG-GNN)
3. After putting together model.py and utils.py for all six, start simplifying and work on unifying/refactoring, and try to organize the data input format
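Roughly the wrapper shape I have in mind for agenda item (5), folding in the "keep the best checkpoint, not the last" idea from the 7/6-7/10 trainer notes above. This is only a sketch: the class name, the encoder/decoder call signatures, and the loss handling are placeholders, not the actual DAG-GNN or GraphTK API.

```python
import copy
import torch
import torch.nn as nn

class DAGGNNModel(nn.Module):
    """Sketch: bundle the original Encoder/Decoder plus the scattered
    training helpers behind one object with fit()/save() (hypothetical)."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module, lr: float = 3e-3):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder
        self.optimizer = torch.optim.Adam(self.parameters(), lr=lr)
        self.best_state, self.best_loss = None, float("inf")

    def fit(self, train_loader, loss_fn, epochs: int = 100):
        for _ in range(epochs):
            epoch_loss = 0.0
            for batch in train_loader:
                self.optimizer.zero_grad()
                recon = self.decoder(self.encoder(batch))   # placeholder data flow
                loss = loss_fn(recon, batch)
                loss.backward()
                self.optimizer.step()
                epoch_loss += loss.item()
            if epoch_loss < self.best_loss:                 # keep the best model, not the last
                self.best_loss = epoch_loss
                self.best_state = copy.deepcopy(self.state_dict())

    def save(self, path: str):
        torch.save(self.best_state if self.best_state is not None else self.state_dict(), path)
```

A per-codebase `Trainer` class (the 7/6-7/13 items) would follow the same pattern, with load-best and inference added on top.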
# 6/9 Meeting

- [x] 1) FastGCN (Cora)
- [x] 2) GTS (METR-LA)
- [x] 3) SALIENT (ogbn-arxiv)
- [x] 4) GANF (SWaT)
  - [x] needed the SWaT dataset; used METR-LA instead
- [x] 5) DAG-GNN (ECOLI70)
  - [ ] need ECOLI70; the repo example is synthetic data
- [x] 6) EvolveGCN (elliptic)

Note: need to `pip install pandas openpyxl` for GANF to convert the xlsx files

# 5/1, 5/28 IBM Meeting

###### tags: `IBM`

- [x] 1) FastGCN
- [x] 2) GTS
- [x] 3) SALIENT (last year)
- [x] 4) GANF (last year)
  - [x] needed the SWaT dataset; used METR-LA instead
- [x] 5) DAG-GNN (3 years ago)
- [x] 6) EvolveGCN (4 years ago)

### 1. Reproduce FastGCN on Cora [Done]

```
========================= STARTING TRAINING =========================
TRAINING INFORMATION:
[DATA] Cora dataset
[FAST] using FastGCN? True
[INF] using sampling for inference? False
[FEAT] normalized features? False
[DEV] device: cuda:0
[ITERS] performing 200 Adam updates
[LR] Adam learning rate: 0.01
[BATCH] batch size: [256, 400]
[STOP] early stopping at iteration: 50
RESULTS:
[LOSS] minimum loss: 0.17657136917114258
[ACC] maximum micro F1 testing accuracy: 87.4 %
[BATCH TIME] 0.0136 seconds
[TOTAL TIME] 0.6817 seconds
========================== ENDING TRAINING ==========================
```

### 2. Reproduce GTS [Done]

1. Change `yaml.load()` to `yaml.safe_load()` in train.py
2. `conda install scikit-learn`
3. `pip install chardet`

### 3. Reproduce SALIENT [Done]

Note: there is SALIENT++ now

1. `pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116`
2. ```
   pip install torch_geometric
   pip install torch_sparse -f https://data.pyg.org/whl/torch-2.0.1+cu116.html
   ```
3. `pip install ogb`
4. ```
   cd fast_sampler
   python setup.py install
   cd ..
   >>> import torch
   >>> import fast_sampler
   >>> help(fast_sampler)
   ```
5. `conda install prettytable -c conda-forge`
6. `pip3 install torch-scatter -f https://data.pyg.org/whl/torch-2.0.1+cu116.html`
7. Change the $HOME directory in `./example_single_machine.sh`

```
Example 4
Using 1 devices per node
Loading dataset took 0.077871107 sec
Fall into 1 node ddp
Using DDP with 1 nodes
dataset.share_memory_() took 0.041916989 sec
Performing inference on trained model at /home/skunk/gnn_reproduce/SALIENT/job_output/example_single_machine/model_0_3.pt
test (multi proc, showing main proc): 100%|████████████████████████| 48603/48603 [00:02<00:00, 23581.50it/s]
Final test accuracy is: 0.6502067773594222
Performing inference on trained model at /home/skunk/gnn_reproduce/SALIENT/job_output/example_single_machine/model_1_3.pt
test (multi proc, showing main proc): 100%|████████████████████████| 48603/48603 [00:00<00:00, 273150.48it/s]
Final test accuracy is: 0.6681069069810506
```
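One preprocessing detail before the GANF/SWaT run below: the raw SWaT export is an Excel sheet, which is why pandas + openpyxl are needed (see the 6/9 note). The conversion is roughly the following; the file names here are placeholders, not the actual dataset paths:

```python
import pandas as pd

# hypothetical file names; the real SWaT export and target CSV path will differ
df = pd.read_excel("swat_attack_v0.xlsx", engine="openpyxl")  # openpyxl handles .xlsx
df.to_csv("swat_attack_v0.csv", index=False)                  # GANF's loader reads CSV
```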
### 4. GANF [Done]

1. Need to `pip install torch numpy pandas tables`
2. `python train_traffic.py --data_dir ../GTS/data`
3. The default learning rate is too high

For SWaT:

1. Had to strip whitespace from the "Timestamp" column name
2. Need seaborn
3. Had to strip the "Timestamp" column entries, e.g.:

```
def load_water(root, batch_size, label=False):
    data = pd.read_csv(root)
    data = data.rename(columns={"Normal/Attack": "label"})
    data.label[data.label != "Normal"] = 1
    data.label[data.label == "Normal"] = 0
    data["Timestamp"] = data["Timestamp"].str.strip()  # new changes
    data["Timestamp"] = pd.to_datetime(data["Timestamp"], format="%d/%m/%Y %I:%M:%S %p", dayfirst=True)  # new changes
    ...
```

### 5. DAG-GNN [Done]

1. Using the conda `ganf` environment; had to install scipy
2. `python train.py --graph_linear_type=linear` (run from `~/gnn_reproduce/DAG-GNN/src` in the `ganf` env)

```
0.014104243387784265
Epoch: 0297 nll_train: 0.0422882881 kl_train: 0.0067333996 ELBO_loss: 0.0490216876 mse_train: 0.0084576576 shd_trian: 0.0000000000 time: 1.7723s
0.016934153207722957
Epoch: 0298 nll_train: 0.0116544022 kl_train: 0.0088980811 ELBO_loss: 0.0205524833 mse_train: 0.0023308804 shd_trian: 0.0000000000 time: 1.6559s
0.013728886614096325
Epoch: 0299 nll_train: 0.0002578203 kl_train: 0.0085573916 ELBO_loss: 0.0088152119 mse_train: 0.0000515641 shd_trian: 0.0000000000 time: 1.5567s
Optimization Finished!
Best Epoch: 0296
0.006459404180986894
Epoch: 0000 nll_train: 0.0001374057 kl_train: 0.0079273946 ELBO_loss: 0.0080648003 mse_train: 0.0000274811 shd_trian: 0.0000000000 time: 1.6371s
0.006568287570560827
Epoch: 0001 nll_train: 0.0000414879 kl_train: 0.0072836856 ELBO_loss: 0.0073251734 mse_train: 0.0000082976 shd_trian: 0.0000000000 time: 1.6612s
```

### 6. EvolveGCN

1. Using the conda `ganf` environment; had to install PyYAML, matplotlib, sklearn, scikit-learn
2. Change `load` to `safe_load` in util.py
3. Go to `data` and run `tar -xvzf sbm_50t_1000n_adj.csv.tar.gz`
4. Change `np.float` to `np.float64` in logger.py (`np.float` is deprecated)
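Both compatibility fixes are one-liners; roughly the following (the actual call sites in util.py and logger.py look a bit different, and the config file name is just an example):

```python
import yaml
import numpy as np

# util.py: yaml.load() now requires an explicit Loader on recent PyYAML; safe_load avoids that
with open("parameters_example.yaml") as f:      # hypothetical config file name
    params = yaml.safe_load(f)

# logger.py: NumPy 1.24 removed the np.float alias; np.float64 (or the builtin float) is a drop-in replacement
errors = np.array([0.42, 0.46], dtype=np.float64)
```

The train/validation log below (epoch 27) is from the run after these fixes.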
```
INFO:root:################ TRAIN epoch 27 ###################
INFO:root:TRAIN mean losses tensor(0.0956, device='cuda:0')
INFO:root:TRAIN mean errors 0.4273642897605896
INFO:root:TRAIN mean MRR 0.0 - mean MAP 0.20460759337553927
INFO:root:TRAIN tp {0: tensor(13911813, device='cuda:0'), 1: tensor(2678011, device='cuda:0')},fn {0: tensor(12223988, device='cuda:0'), 1: tensor(157179, device='cuda:0')},fp {0: tensor(157179, device='cuda:0'), 1: tensor(12223988, device='cuda:0')}
INFO:root:TRAIN measures microavg - precision 0.5726 - recall 0.5726 - f1 0.5726
INFO:root:TRAIN measures for class 0 - precision 0.9888 - recall 0.5323 - f1 0.6920
INFO:root:TRAIN measures for class 1 - precision 0.1797 - recall 0.9446 - f1 0.3020
INFO:root:TRAIN measures@10 microavg - precision 0.6155 - recall 0.0000 - f1 0.0000
INFO:root:TRAIN measures@10 for class 0 - precision 0.9931 - recall 0.0000 - f1 0.0000
INFO:root:TRAIN measures@10 for class 1 - precision 0.2379 - recall 0.0000 - f1 0.0000
INFO:root:TRAIN measures@100 microavg - precision 0.6297 - recall 0.0001 - f1 0.0003
INFO:root:TRAIN measures@100 for class 0 - precision 0.9972 - recall 0.0001 - f1 0.0002
INFO:root:TRAIN measures@100 for class 1 - precision 0.2621 - recall 0.0003 - f1 0.0005
INFO:root:TRAIN measures@1000 microavg - precision 0.6268 - recall 0.0013 - f1 0.0025
INFO:root:TRAIN measures@1000 for class 0 - precision 0.9968 - recall 0.0011 - f1 0.0022
INFO:root:TRAIN measures@1000 for class 1 - precision 0.2568 - recall 0.0026 - f1 0.0052
INFO:root:TRAIN Total epoch time: 39.48963527940214
INFO:root:################ VALID epoch 27 ###################
INFO:root:VALID mean losses tensor(0.0962, device='cuda:0')
INFO:root:VALID mean errors 0.46178218722343445
INFO:root:VALID mean MRR 0.014196249475398657 - mean MAP 0.19908882547580786
INFO:root:VALID tp {0: tensor(2247848, device='cuda:0'), 1: tensor(443241, device='cuda:0')},fn {0: tensor(2280713, device='cuda:0'), 1: tensor(28198, device='cuda:0')},fp {0: tensor(28198, device='cuda:0'), 1: tensor(2280713, device='cuda:0')}
INFO:root:VALID measures microavg - precision 0.5382 - recall 0.5382 - f1 0.5382
INFO:root:VALID measures for class 0 - precision 0.9876 - recall 0.4964 - f1 0.6607
INFO:root:VALID measures for class 1 - precision 0.1627 - recall 0.9402 - f1 0.2774
INFO:root:VALID measures@10 microavg - precision 0.6300 - recall 0.0000 - f1 0.0000
INFO:root:VALID measures@10 for class 0 - precision 1.0000 - recall 0.0000 - f1 0.0000
INFO:root:VALID measures@10 for class 1 - precision 0.2600 - recall 0.0000 - f1 0.0001
INFO:root:VALID measures@100 microavg - precision 0.6270 - recall 0.0001 - f1 0.0003
INFO:root:VALID measures@100 for class 0 - precision 1.0000 - recall 0.0001 - f1 0.0002
INFO:root:VALID measures@100 for class 1 - precision 0.2540 - recall 0.0003 - f1 0.0005
INFO:root:VALID measures@1000 microavg - precision 0.6214 - recall 0.0012 - f1 0.0025
INFO:root:VALID measures@1000 for class 0 - precision 0.9956 - recall 0.0011 - f1 0.0022
INFO:root:VALID measures@1000 for class 1 - precision 0.2472 - recall 0.0026 - f1 0.0052
INFO:root:VALID Total epoch time: 3.4931310461834073
```