## Knowledge Transfer for Continual Learning

## Bi-Weekly Meeting Minutes

### 9th June 2021 (Gianma + Happy)
- After running Gianma's code, try the same on my code to get a good baseline
- Request the code for moon data sampling from Fariz
- Update the loss ablation table with statistical significance results
- Gianma to share the BLS code from Darmesh
- Update the plotting code

### 7th April 2021 (Gianma + Emti + Happy)
- Write up the experimental results part
- Meeting with the CL team
- Think of possible collaborations (scaling the current experiments to bigger architectures)

### 24th March 2021
- Repeat all the MNIST experiments:
  - Train the teacher longer to get the best performance possible
  - Use similar parameters across all experiments
- Try DSL + KD

### 10th Feb 2021 (Fariz + Gianma)
Suggested next steps to check large-scale KD (Tiny-ImageNet and CIFAR-100) with DNN2GP:
- Compute the GGN Hessian only on memorable points
  - The time taken is still large (5 days for 10,000 memorable points; minimum number of classes is 200)
  - Start from 1,000 memorable points, 5 for each class
  - Check this proposed bottleneck on MNIST
- Split the classes in the GGN Hessian update
- Encountered a singular-matrix error in the MNIST experiments

### 14th Nov 2020 (Happy + Fariz + Gianma)
- How to overcome the out-of-memory issue
- Re-do the experiment with DSL
- Re-do the experiment with DSL + memorable loss (SL)
- Implement early stopping in the code

### 11th Nov 2020 (Happy + Fariz + Gianma)
- Check the FROMP code and adapt it for the TL task
- On the 2D toy experiment:
  - Using the loss from FROMP, train with only the second part of the loss
  - Train with the full loss
- Meeting with Siddharth (11/19) to discuss how to adapt the FROMP code for transfer learning
- Meeting with the Bayesian duality members

**Kernel transfer in Knowledge Distillation**
Introduction write-up draft: Knowledge distillation has increasingly become a popular solution for model compression over the years, with a wide range of applications.

### 14th October 2020 (Happy + Fariz + GianMa)
Direction to take on Dropout and kernel transfer in Knowledge Distillation.
#### Feedback
- Landmark selection (with Bayesian duality)
- How to use the DNN2GP representation for kernel computation

### 6th October 2020 (Happy + GianMa)
- Toy experiment results for a simple FFN trained from scratch (Cats vs Dogs)
- Discussion about "Efficient Kernel Transfer in Knowledge Distillation"
- How to contribute to MCDropout?
#### Feedback
- Improve the kernel transfer computation with DNN2GP (see the sketch below)
- Simple 2D setup for the current experiment (Cats vs Dogs)
- Create a RAIDEN environment to train all Inception layers
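The feedback on improving the kernel transfer computation with DNN2GP, and the 10th Feb 2021 bottleneck of computing the GGN Hessian only on memorable points, both come down to a linearised kernel evaluated at the memorable points. The sketch below is only a rough illustration under simplifying assumptions (a scalar-output PyTorch model, ignoring the GGN output-Hessian term and any prior covariance); the helper names `per_example_jacobian` and `memorable_point_kernel` are hypothetical and not from our codebase.

```python
import torch

def per_example_jacobian(model, x):
    """Jacobian of the (scalar) model output w.r.t. all trainable parameters,
    one flattened row per example. For multi-class models, loop over logits."""
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for xi in x:
        out = model(xi.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    return torch.stack(rows)        # shape: [n_points, n_params]

def memorable_point_kernel(model, x_memorable):
    """Linearised (NTK-style) kernel K = J J^T restricted to memorable points,
    i.e. the 'compute the GGN Hessian only on memorable points' idea above.
    (Simplification: no output-Hessian weighting, no prior covariance.)"""
    J = per_example_jacobian(model, x_memorable)
    return J @ J.T                   # shape: [n_points, n_points]
```

A FROMP-style functional regulariser (the "second part of the loss" in the 11th Nov 2020 minutes) would then use a kernel of this kind to penalise drift of the network's predictions at the memorable points.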
### 8th September 2020 (Happy + GianMa + Emti)
#### Toy Experiment
- Image classification with an Inception model (Cats vs Dogs)
- Some visualisations
- Paper discussion: "Parameter-Efficient Transfer Learning for NLP"
#### Feedback
- Train a simple FFN model from scratch (Cats vs Dogs)
- Freeze some layers and transfer using Inception
- Train all Inception layers from scratch
- Compare the two networks
- Compute the function representation at each layer (see the sketch at the end of these minutes)
- Get an account for RAIDEN

### 25th August 2020
- Survey on transfer learning
- Identify key transfer learning papers
- Prepare slides and present at the reading session
#### Feedback
- Identify a specific/simple but promising transfer learning task, do some experiments and visualisations, and create benchmarks
- Identify the relationship between transfer learning and continual learning
- Think about analogies of knowledge transfer
- Look into the similarity between NN layers

### 10th August 2020 (Happy + GianMa + Emti)
- Paper discussion: "Neural Variational Inference for Text Processing"
#### Feedback
- The paper is not related to transfer learning and not relevant to the team
- Have a broad view/picture of transfer learning
- Have a broader picture of the losses used in transfer learning and continual learning

### 29th July 2020
- Decided to hold a bi-weekly meeting to discuss progress and next steps
- Discussed my current research on question answering with memory networks and KGs
- Think about solving related problem(s) but in a simpler way
  - Specifically: find work that has tried to solve the problem in a somewhat easier way
- Goal: figure out how to apply transfer learning to broad NLP problems
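As a reference for the "compute the function representation at each layer" feedback from the 8th September 2020 meeting, here is a minimal sketch of collecting per-layer representations with forward hooks. It assumes a standard PyTorch model whose leaf modules return single tensors; `layer_representations` is a hypothetical helper name, not something from our code.

```python
import torch

def layer_representations(model, x):
    """Return the output of every leaf module on a batch x, keyed by module name.
    These per-layer 'function representations' can then be compared between the
    scratch-trained FFN and the transferred Inception model."""
    reps, hooks = {}, []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:   # leaf modules only
            hooks.append(module.register_forward_hook(
                # assumes each leaf module returns a single tensor
                lambda mod, inp, out, name=name: reps.__setitem__(name, out.detach())))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return reps
```

Comparing these representations layer by layer (for instance with a similarity measure such as linear CKA, one common choice rather than anything fixed in these minutes) would be one way to approach the "similarity between NN layers" item from the 25th August 2020 feedback.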