# List of experiments

Important:

* In **bold**, the name of the project in Weights & Biases.
* The mentioned scripts are the ones to use with the most recent version of the GitLab repo; some experiments were originally run with a different script (check the run information in wandb: script and GitLab version).

# History

## Reproduce Yamil's results with BasaiaNetLite

* model: BasaiaNetLite
* scripts: `train.py`
* datasets: MIA and IMAGINA (31 and 43 patients)
* results: the model works with the MIA dataset but overfits very quickly on IMAGINA; we cannot reproduce an accuracy of 70%. I think this is due to the variability of the model: it is possible to reach 70% accuracy, but not when the results are averaged over multiple replications. Transfer learning from MIA to IMAGINA didn't help, with or without freezing the weights.
  * **MIA_pt**: 96% Acc, 99% ROC AUC with 4 replicates of cross-validation (4 CV-Rep). It uses the 31 original patients.
  * **MIA_pt43**: 96% Acc, 99% ROC AUC with 4 CV-Rep. It uses the 43 patients we had after the inclusion of the new patients.
  * **IMAGINA_31**: 62% Acc, 61% ROC AUC with 4 CV-Rep. It uses the 31 original patients.
  * **IMAGINA_43**: 53% Acc, 44% ROC AUC with 4 CV-Rep. It uses the 43 patients we had after the inclusion of the new patients.
  * **IMAGINA_43_DA8**: 55% Acc, 44% ROC AUC with 4 CV-Rep. Same as above but with data augmentation mode 8.
  * **IMAGINA_tl_nofreeze**: 42% Acc, 46% ROC AUC with 4 CV-Rep. Transfer learning from the weights of **MIA_pt** on the IMAGINA_43 dataset. No layers are frozen during training.
  * **IMAGINA_tl_freeze**: 40% Acc, 49% ROC AUC with 4 CV-Rep. Transfer learning from the weights of **MIA_pt** on the IMAGINA_43 dataset. The first 4 layers are frozen during training.

## Other CNN architectures: EfficientNet, ResNet, DenseNet

* scripts: `train.py`
* datasets: MIA and IMAGINA
* results: using other models with more parameters did not lead to better performance, and they still overfit a lot. In **DN_RS** and **EffNet** I run a random search to try to find the optimal hyperparameters of DenseNet and EfficientNet on the IMAGINA dataset. In **IMAGINA_DN** I try different optimizers, learning rates, modalities, spacings, etc. with DenseNet, without success.

## Pre-trained models: Med3D weights for ResNet

* scripts: `train.py`
* datasets: IMAGINA
* results: at first, looking at the experiments in **Med3d_TL**, one could think that the Med3D weights combined with heavy data augmentation reduced the overfitting. But with more replicates (in **Med3D_rep**) we get an average accuracy of 55% and an average ROC AUC of 57%. However, even if the performances are not good, using data augmentation still helped a little.

## Transformer architectures from scratch: ViT, CCT

* scripts: `train_vit.py` (ViT) or `train.py` (ViT and CCT). The first one exposes more ViT-specific arguments.
* architecture: number of blocks, hidden size, heads, etc., for 2D and 3D
* projects:
  * **MIA_2D**
  * **IMAGINA_VIT**
* results:
  * ViT is an interesting architecture that gives good performance on the MIA dataset and was kept for other experiments.
  * CCT is hard to adapt to 3D because small kernels lead to a huge token sequence that takes too much GPU VRAM (see the sketch after this section). Using two 7x7x7 kernels works, but I stopped using CCT in favor of ViT because it was simpler for the experiments (fewer hyperparameters).
  * When training in 2D, CCT is faster and better than ViT.
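The VRAM problem with CCT in 3D comes from the cubic growth of the token sequence with spatial resolution, combined with the quadratic cost of self-attention in the sequence length. Below is a rough back-of-the-envelope sketch of that effect; the 96³ input size and the downsampling factors are illustrative assumptions, not values taken from the repo.

```python
# Token-count arithmetic for a 3D convolutional tokenizer (CCT-style).
# Assumptions for illustration: a 96x96x96 input volume and a tokenizer that
# reduces each spatial dimension by a given total downsampling factor
# (product of conv strides and pooling).

def n_tokens(volume_side: int, downsampling: int) -> int:
    """Sequence length produced by the tokenizer."""
    return (volume_side // downsampling) ** 3

for ds in (2, 4, 8, 16):
    n = n_tokens(96, ds)
    # Self-attention activations grow roughly with n^2 per head, hence the
    # VRAM blow-up when the tokenizer keeps a fine spatial resolution
    # (small kernels / small strides).
    print(f"downsampling x{ds}: {n} tokens, ~{n**2:.1e} attention entries per head")
```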
## Many different data augmentation combinations: flip, rotation, noise, smoothing, zoom and elastic deformations

* scripts: all
* projects: **da_search**, **Med3D_TL** and many others
* results: data augmentation helps the models. I found that there can be two optimal data augmentation regimes, depending on the model and task:
  * "simple" augmentation that doesn't change the structure of the image: noise, contrast, flip (DA mode 15).
  * "heavy" augmentation that changes the data structure: rotation, zoom, elastic transforms (DA mode 9).
  * For example, with ResNet and Med3D weights heavy augmentation seemed better, while when training ViTMAE simple augmentation was better.

## Regularization and dropout to reduce overfitting

* scripts: `train.py`, `train_vit.py`
* datasets: IMAGINA
* projects: **l2dropout**
* results: L2 regularization does, to some extent, reduce overfitting, but it was not enough. Dropout didn't seem to change anything.

## Train for the prognosis task

Not a project in itself, but it has been done in other projects: multitask, meta-learning and others not on wandb.

## Multitask: prognosis and diagnosis

* scripts: `train_multitask.py` and `train_vitmae_transfer_multitask.py`
* datasets: IMAGINA
* projects: **multitask**
* results: multitasking didn't help learning, and models performing well on prognosis often performed poorly on diagnosis and vice versa.

## Transformer architectures with self-supervised pre-training: ViTMAE, SimMIM

* scripts: `train_vitmae.py`, `train_vitmae_transfer.py` and `train_vitmae_transfer_multitask.py`
* model: ViT
* datasets (self-supervised): IMAGINA, ADNI and IXI (the largest one by far, but only healthy patients)
* target tasks: IMAGINA, MIA (2D and 3D)
* finetuning method: whole model, last block or last layer
* results:
  * **SimMIM**: it doesn't work as well as ViTMAE.
  * **ViTMAE**: experiments with ViTMAE on MIA and IMAGINA, but in 2D.
  * **ViTMAE_3D**: experiments in 3D. I concluded that the best architecture was 4 blocks, 4 heads and a hidden size of 512. I trained multiple models with the MIA, IMAGINA and IXI datasets.
  * **ViTMAE_3D_Transfer** and **ViTMAE_3D_Transfer_IMAGINA**: train a ViT on IMAGINA using the weights of the self-supervised ViTMAE. Experiments with data augmentation, layers to freeze, etc.
  * I found that for the transfer learning, averaging the sequence worked better than using the class token (idea taken from the CCT model); see the sketch after this section.
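Below is a minimal sketch of the two pooling options compared above (class token vs. sequence averaging) on top of a ViT encoder. The class/module names, the assumption that the class token sits at position 0, and the dimensions are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn

class ViTClassificationHead(nn.Module):
    """Classification head on top of a ViT encoder output of shape
    (batch, n_tokens, hidden). `pooling` chooses between the class token
    and sequence averaging (the CCT-style pooling that worked better here)."""

    def __init__(self, hidden_size: int = 512, n_classes: int = 2, pooling: str = "mean"):
        super().__init__()
        self.pooling = pooling
        self.classifier = nn.Linear(hidden_size, n_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        if self.pooling == "cls":
            pooled = tokens[:, 0]        # use the class token (assumed at position 0)
        else:
            pooled = tokens.mean(dim=1)  # average over the whole token sequence
        return self.classifier(pooled)

# Toy usage: batch of 4 volumes, 216 tokens, hidden size 512 (illustrative values).
head = ViTClassificationHead(hidden_size=512, n_classes=2, pooling="mean")
logits = head(torch.randn(4, 216, 512))
```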
## Meta Learning

* scripts: `train_meta.py`, `train_proto.py`
* methods: Baseline, Baseline++, ProtoNet
* meta-training tasks: ADNI, PPMI, DD (combined or not)
* target tasks: ADNI, PPMI, DD, IMAGINA, IMAGINA prognosis (one at a time)
* models: ResNet, DenseNet, EfficientNet, ViT
* pretrained weights: only for ViT, with self-supervised training
* projects: **ProtoNet**, **meta_ADNI**, **meta_PPMI**, **meta_training**, **meta_train_VIT**, **meta_train_rep** and **DA_and_IXI**
* experiments:
  * Baseline/Baseline++ with DenseNet, ResNet, EfficientNet, ViT and BasaiaNet
  * ProtoNet with ViT and EfficientNet (a minimal sketch of the prototype classification step is given after the conclusion)
  * P>M>F method with P (pretraining) on IXI using the ViTMAE method, M: ProtoNet or Baseline/Baseline++, and F: finetuning with or without a cosine classifier
* results:
  * It looks promising, but:
    * We don't have enough datasets/tasks to train on during the meta-learning step.
    * Some tasks may be too hard to really help during the meta-learning step (DD or IMAGINA).
    * The target task (IMAGINA) is also very hard, so it is difficult to analyze the results and understand the influence of certain parameters on the performance. For example, when changing the target task to PPMI, which is easier, we can see that the pre-training step helps a lot.
    * Episodic training (ProtoNet, MAML) is hard to implement and requires a lot of GPU RAM, making these methods harder to train.
  * EfficientNet and ViT seemed to be the best architectures for meta-learning.
  * It could be worth trying to change the dataset and the self-supervised method for the pretraining step:
    * Have more images
    * Have patients with brain pathology
    * Try DINO or other self-supervised methods that work with ViT and also with the other architectures.
  * Baseline++ seemed to perform better than Baseline and ProtoNet.

## Patient confidence analysis

* scripts:
  * standard training: `utils/analyse_patients.py`
  * meta-learning: no script needed (directly in Wandb or in the results folder)
* results: I didn't explore this part much because it takes a lot of time to look at the tables and find a "logic" in the model's predictions.

# Conclusion

IMAGINA is a small dataset and a difficult task, so our models quickly overfit the training set without really learning features useful for predicting the diagnosis/prognosis. This causes poor performance on the validation/test set. In future work it could be a good idea to reduce overfitting:

* With pre-trained networks, either with self-supervised learning (at a bigger scale and/or with different methods) or with supervised training such as in Med3D.
* With meta-learning / few-shot learning, but with more tasks (and tasks that are relevant, i.e. the model learns something from them during meta-training).
* Maybe by using more classic machine learning methods, or by mixing ML and DL. This would help both diagnosis and prognosis.
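For reference, here is a minimal sketch of the ProtoNet classification step mentioned in the Meta Learning section: class prototypes are the mean support embeddings and queries are scored by their distance to the prototypes. The embedding dimension, the episode layout and the function name are illustrative assumptions, not the implementation in `train_proto.py`.

```python
import torch

def protonet_logits(support_emb: torch.Tensor,
                    support_labels: torch.Tensor,
                    query_emb: torch.Tensor,
                    n_way: int) -> torch.Tensor:
    """ProtoNet classification: each class prototype is the mean embedding of
    that class's support examples; queries are scored by the negative squared
    Euclidean distance to each prototype."""
    prototypes = torch.stack(
        [support_emb[support_labels == c].mean(dim=0) for c in range(n_way)]
    )                                           # (n_way, dim)
    dists = torch.cdist(query_emb, prototypes)  # (n_query, n_way)
    return -dists ** 2                          # higher logit = closer prototype

# Toy 2-way / 5-shot episode with 512-d embeddings (illustrative values).
support = torch.randn(10, 512)
labels = torch.tensor([0] * 5 + [1] * 5)
queries = torch.randn(6, 512)
preds = protonet_logits(support, labels, queries, n_way=2).argmax(dim=1)
```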