# Action Plan for Affinity-VAE (Preparation for NeurIPS 2023)

## Software Engineering Actions

- [ ] Include a module to enable calculation of the similarity matrix - SOAP, FCS

### Unit tests

- [x] Test that pose can be turned off

### Configuration checks

- [x] Test that all classes in the molecule list are in the affinity matrix. The other way around is not necessary, as the affinity matrix can include all classes: the user selects the classes they want to train on, and the affinity columns for the chosen classes are read from the affinity file.

### Model

- [x] Model_a: check that pose can be turned off
- [x] Filters as well as number of channels and depth: if filters are not provided, require the channel and depth parameters and construct the filter list from them
- [ ] Refactor model_b to accept 2D inputs, as model_a does
- [ ] Model_b has batch normalisation; would model_a benefit from it? Investigate.

### Run

- [x] Add an option to choose a model (we are keeping both models)
- [x] Combine arguments received from the command line to make the invocation shorter, or create a JSON input file which contains all parameters in a key-value format (so the order of appearance does not matter)

### Data

- [ ] Update the method of metadata gathering from pandas to cuDF (CUDA DataFrames)
- [ ] Change the ProteinDataset_rotatable dataloader to a subclass of the superclass ProteinDataset. It goes under the abstract class. (This is for when we have only one image of each class and rotate it many times to create the dataset.)
- [ ] Ultimately we would like to support people with non-standard/unusual datasets. We could therefore add an abstract class which can be extended to load more specific data types such as PNGs
- [ ] Tomogram dataloader:
  - Read the tomogram in the dataloader init, create the subtomograms, save them to disc and close; then load from the saved files. (This essentially runs the script that saves all the subtomograms.)
  - For rotation which does not suffer from interpolation we can just flip by n*90 degrees randomly in different directions (a minimal sketch follows this section).
  - PyTorch transforms have in-built rotation; we can try this for random rotation.
  - Any rotation or transformation can be appended to the transforms list in data.py. (Could create an extended class.)
- [x] In the case of automatic detection in a full tomogram, we would use the labelled, annotated proteins for training and the non-labelled (only automatically detected) proteins for testing
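The interpolation-free n*90-degree rotation mentioned in the tomogram dataloader item above could look roughly like the snippet below. This is a minimal sketch, not existing code: the class name `RandomRot90` and the assumption that subtomograms arrive as `(C, D, H, W)` tensors are both illustrative.

```python
import random

import torch


class RandomRot90:
    """Rotate a 3D volume by a random multiple of 90 degrees in a random plane.

    torch.rot90 only permutes and flips voxels, so no interpolation artefacts
    are introduced. Assumes the volume is a tensor of shape (C, D, H, W).
    """

    def __call__(self, vol: torch.Tensor) -> torch.Tensor:
        k = random.randint(0, 3)                         # number of 90-degree turns
        dims = random.choice([(1, 2), (1, 3), (2, 3)])   # spatial plane to rotate in
        return torch.rot90(vol, k, dims=dims)
```

An instance of this could then be appended to the transforms list in data.py, as suggested above.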
## Model state -> plots

- [ ] Implement early stopping of the training if the log of the loss has not changed for a threshold number of epochs (a minimal sketch follows this list)
- [x] Add the ability to select a pre-trained model state to run evaluation from
- [x] For restarting training from a saved state, we would like to be able to change the training parameters, for example the learning rate
- [x] Plot appearance should be improved
  - larger fonts
  - more distinct colouring
  - more distinct symbols
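A minimal sketch of the early-stopping criterion described in the first item of the list above, stopping once the log of the (positive) loss has not improved for a given number of epochs. The class name and the `patience`/`min_delta` parameters are illustrative assumptions, not existing code.

```python
import math


class EarlyStopping:
    """Signal a stop when log(loss) has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10, min_delta: float = 1e-4):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum change in log(loss) that counts
        self.best = math.inf
        self.stale_epochs = 0

    def step(self, loss: float) -> bool:
        """Record this epoch's loss and return True if training should stop."""
        log_loss = math.log(loss)     # assumes loss > 0
        if log_loss < self.best - self.min_delta:
            self.best = log_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

Usage in the training loop would be along the lines of `if stopper.step(val_loss): break`.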
### Done

- [x] Change the dataloader return to a tuple rather than a dictionary: Marjan
- [x] Molecule list from CSV file
- [x] Run module to be separated from the affinity API; Click arguments only; contains calls to train and evaluate
- [x] Strip train, evaluate, model and dataloader of unnecessary arguments, so they are consistent for both models
- [x] The loss to be separated from the models
- [x] Unify the functions for a single forward/backward pass
- [x] Saved model includes information about: time saved / latent dimensions / pose dimensions
- [x] Save the metadata as a pkl to be able to load the latent space for evaluation investigation
- [x] Create a config.py file which holds all global variables for visualisation
- [x] Separate the computation of accuracy from its display
- [x] Separate metadata gathering from training
- [ ] Add option to randomly rotate data
- [x] Training module to be added
- [x] Separate the dynamic display from the static display
- [x] Pose set to true or not / can we set the pose dimension to zero, so that a zero pose dimension means no pose
- [x] Confusion matrix (for every n epochs) and printing validation accuracy and train accuracy during training
- [x] A nice README file that shows how to use it as an API and a command-line tool (Camila)

## Work in progress

#### Marjan

- [x] Save alphanumeric data for comparisons with other models
- [x] SHREC dataset and benchmark against existing models (the full tomogram is included here)
- [x] SOAP from atomic coordinates
- [ ] Possible other approaches for describing shape
- [x] Hardcoded affinity

#### Jola

- [x] Benchmarking number of classes vs number of samples
- [ ] Proof of where affinity is stronger: data with various resolutions/conformations should work better with Affinity-VAE in comparison with $\beta$-VAE
- [ ] Pose disentanglement
- [ ] Rotation matrices
- [x] Class certainty (distribution of clustered classes)

#### Nikolai

- [x] To provide data to Jola which can potentially be used instead of the resolution data

#### To be assigned

- [ ] Deep subspace clustering comparison: find the state-of-the-art methods and run their data through our model
- [ ] Comparison with other VAE variants

#### Suggestions from Alan

- tests and pre-commit
- one VAE model is enough
- communicate on git through issues so that they can be reviewed by the other contributors (it would be nice to have a living repository)

# Figures for paper

- Side-by-side comparison of $\beta$-VAE and Affinity-VAE: turn pose off and set $\beta = 1$ and $\gamma = 0$. This is on the resolution data.

### Summary of action required based on reviewers' comments

Here is a summary of what we need to add for each reviewer:

#### CTMg:

* tetromino
* no quantitative results
* protein and tetromino data must also show comparisons with other methods
* experiments: $\beta$-VAE, vanilla VAE, trained classifier, random network
* why is Affinity-VAE better than $\beta$-VAE? - mainly affinity with pose; it is not just a subspace clustering algorithm, it removes within-class sources of variation from clustering
* emphasize pose: it captures within-class variation (most of the variance in latent spaces actually captures within-class variation, not between-class variation, i.e. the differences that make an object what it is)
* normalising distances - we did not normalise distances but similarity values
* comparisons with the methods from the intro, and state whether we overcome the issues they pose, e.g. the need for large amounts of data
* is $z_{i,j}$ a latent representation of $x_{i,j}$? how do the indices relate?
* explain hyperparameter selection
* how are letters and numbers selected for the AN data? how are they rendered in the image? how many original samples are in figure 3?
* show how the pose component encodes pose? - already shown in the paper figure
* use symbols for train/val/test instead of opacity
* experiments on reproducibility in the appendix

#### BXNM:

* quantitative results
* novelty? is the design of this model better than [1], [2]? - show that it is, because pose discourages most within-class variation out of the latent space, and compare with them

#### R6EZ:

* requires a priori knowledge which might be unavailable in some domains and difficult to establish for large datasets - this is not correct; the affinity matrix is calculated by pairwise comparisons with a target function and thus remains completely unsupervised
* contrastive learning methods: do they reduce the impact and novelty? is it much better than the baseline?

#### xo7f:

* the difference between affinity and beta is only the affinity loss, which is heuristic - incorrect; it is also pose that, together with affinity, discourages any within-class variance from the latent representation
* how does the pose influence clustering? no corresponding discussion - it discourages within-class variation from the latent space and thus improves accuracy
* experimental set-up not described in detail, limiting reproducibility - appendices?
* statistical experimental analysis lacking in terms of widely used metrics such as NMI and ARI for the clustering task and accuracy for the classification task (a sketch of these metrics appears at the end of this document)
* comparison experiments with other state-of-the-art deep clustering methods

### Further investigations

1. Final activation function (sigmoid or ReLU): make sure the output is compatible with the input file
2. Ask Joel and Nikolai to present their data
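To address the NMI/ARI/accuracy point raised by reviewer xo7f, the widely used implementations in scikit-learn could be reported directly. The snippet below is a sketch with placeholder labels; in practice the cluster assignments and predictions would come from the Affinity-VAE latent space.

```python
from sklearn.metrics import (
    accuracy_score,
    adjusted_rand_score,
    normalized_mutual_info_score,
)

# Placeholder data: ground-truth classes, cluster assignments from clustering
# the latent space, and class predictions from a classifier on the latent space.
true_labels = [0, 0, 1, 1, 2, 2]
cluster_ids = [1, 1, 0, 0, 2, 2]       # cluster indices are permutation-invariant
predicted_labels = [0, 0, 1, 2, 2, 2]

nmi = normalized_mutual_info_score(true_labels, cluster_ids)
ari = adjusted_rand_score(true_labels, cluster_ids)
acc = accuracy_score(true_labels, predicted_labels)
print(f"NMI={nmi:.3f}  ARI={ari:.3f}  accuracy={acc:.3f}")
```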