# Meetings: Coreset Selection
## Meeting 2021-03-06
### Project description
1. Replicate one of the baselines from "Selection via Proxy" for coreset selection (50%)
    * Greedy K-Centers
2. Replicate one article (90%)
    * GLISTER: Online + Active
3. Analyze how well selected subsets generalize across different models (100%)
### Questions
* What is the scope of the testing datasets?
* Regression + classification?
* Multi-class classification?
* How many parameter tweaks?
* Noise in the dataset
* Size of dataset
* Density of dataset
* A: CIFAR-10
* A: MNIST
* What is the scope of the models we should train with the selected core-sets?
* How many models?
* What kinds of models?
* Standard: KNN, LinReg, DT, RF, LogReg
* Bayesian: Gaussian Process
* The reduction of data points achieved by the selection algorithm depends on the model it is paired with
* Find out whether the same reduced dataset selected for one model also works well for the other models (see the sketch at the end of this section)
* The GLISTER and Greedy K-Centers papers both cover active learning and coreset selection
* Only do GLISTER-Online
* Should the various performance-related approximations of GLISTER be implemented?
* Focus on the main idea, main algorithm first
* What about computing resources?
* Roland has access to Zhores Sandbox, Vladimir and Yuliya don't have any access
* What kind of algorithms do we have to compare?
* It's only Greedy K-Centers compared with GLISTER
* It is expected that G-KC performs slightly worse than GLISTER
* G-KC is a well-known algorithm and is expected to work; GLISTER, on the other hand, is new and has not yet been verified to work
* What about the code in the repos?
* It's OK to use the code from the repositories. The main body of work is to provide a test interface that lets people supply various datasets and outputs a comparison between G-KC and GLISTER
* (Yuliya) Regarding GLISTER algorithm, can we use the available codes of algorithms for comparison (FASS, BADGE)?
* (Yuliya) Regarding the GLISTER paper, should we replicate the appendix experiments?
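
A minimal sketch of the cross-model check mentioned above: train several of the listed "standard" models on a fixed core-set and evaluate them on a held-out split. The arrays `X`, `y` and the index array `coreset_idx` are placeholders (random data and a random subset); in the project they would be MNIST/CIFAR-10 features and indices produced by G-KC or GLISTER. The Gaussian Process model could be added in the same way.

```python
# Sketch: does a core-set selected once generalize across model families?
# X, y and coreset_idx are placeholders for real features/labels/indices.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))          # placeholder feature matrix
y = rng.integers(0, 10, size=2000)       # placeholder labels (10 classes)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stand-in for the indices returned by a core-set selector (here: random 10%).
coreset_idx = rng.choice(len(X_train), size=len(X_train) // 10, replace=False)

models = {
    "KNN": KNeighborsClassifier(),
    "LogReg": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    # Every model is trained on the same core-set and tested on the same split.
    model.fit(X_train[coreset_idx], y_train[coreset_idx])
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```
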
## Meeting 2021-03-13
Next steps:
* Indices with Greedy K-Centers:
* Train a ResNet on CIFAR/MNIST and extract features from its second-to-last layer
* Run the K-Centers algorithm on the feature embeddings to get core-set indices (see the sketch after this list)
* Indices with GLISTER:
* Try to get own implementation of GLISTER to work
* Build a wrapper around GLISTER
* Use GLISTER to get core-set indices
* Use selected core-sets to re-train ResNet and check performance
* Use selected core-sets to train other nets (AlexNet) and check performance
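
The following is a minimal sketch of the Greedy K-Centers path from the list above, not the exact code from the Selection via Proxy repository. It assumes a torchvision ResNet-18 (freshly initialized here as a stand-in for the network already trained on CIFAR/MNIST) and implements the greedy farthest-point heuristic directly on the penultimate-layer embeddings.

```python
# Sketch: penultimate-layer ResNet features + greedy k-centers core-set indices.
import torch
import torchvision
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Feature extractor: replace the final fc layer with Identity so the
#    forward pass returns the 512-d second-to-last-layer embedding.
#    (In the project this would be the ResNet already trained on CIFAR/MNIST.)
model = torchvision.models.resnet18(num_classes=10)
model.fc = torch.nn.Identity()
model.eval().to(device)

dataset = torchvision.datasets.CIFAR10(
    root="data", train=True, download=True, transform=transforms.ToTensor()
)
loader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=False)

with torch.no_grad():
    feats = torch.cat([model(x.to(device)).cpu() for x, _ in loader])  # (N, 512)

# 2) Greedy k-centers on the embeddings: repeatedly add the point that is
#    farthest from the currently selected centers (O(n*k) distance updates).
def greedy_k_centers(features, k, first=0):
    selected = [first]
    dist = torch.cdist(features, features[first].unsqueeze(0)).squeeze(1)
    for _ in range(k - 1):
        idx = int(torch.argmax(dist))          # farthest remaining point
        selected.append(idx)
        new_dist = torch.cdist(features, features[idx].unsqueeze(0)).squeeze(1)
        dist = torch.minimum(dist, new_dist)   # distance to nearest center
    return selected

coreset_idx = greedy_k_centers(feats, k=int(0.1 * len(dataset)))  # 10% subset

# 3) Re-train on the selected indices only (same pattern for AlexNet etc.).
coreset = torch.utils.data.Subset(dataset, coreset_idx)
```

The GLISTER path would produce an index list in the same format, so the re-training step (a `Subset` plus a standard `DataLoader`) is shared between both selectors.
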
## Meeting 2021-03-18
* What kind of functionality should the commanding front end have?
* download weights of neural networks? No
* core-set selectors:
* generate latent space dataset
* download latent space dataset
* generate subset indices
* parameters:
* selection method (K-Centers, GLISTER)
* % of dataset
* save indices list (see the hypothetical CLI sketch at the end of this section)
* How many sub-sets of the full dataset do you want to have in the report?
* 30, 50
* If Inception takes too much time to train, can we use DenseNet-121?
* Use DenseNet-121
* Measurements:
* K-Centers and Random on CIFAR-10
* with 10%, 30%, 50%
* K-Centers, GLISTER, and Random on CIFAR-10 (if we have time)
* with 10%, 30%, 50%
* 100 epochs of training for K-Centers and GLISTER on a smaller dataset (MNIST)
* with 10%, 30%, 50%
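
A hypothetical sketch of how the front end's parameters discussed above (selection method, % of dataset, saving the indices list) could be exposed as a small CLI. The flag names and the `select_indices` helper are placeholders, not the actual repository interface; only random selection is implemented here as a stand-in.

```python
# Hypothetical front-end sketch: --method, --fraction, --dataset, --out and
# select_indices() are placeholders, not the real repo CLI.
import argparse
import json
import random

DATASET_SIZES = {"cifar10": 50_000, "mnist": 60_000}  # training-set sizes

def select_indices(method, n, fraction, seed=0):
    """Stand-in selector: only 'random' is implemented here; 'k-centers' and
    'glister' would call the real selectors on the latent-space dataset."""
    k = int(fraction * n)
    if method == "random":
        return sorted(random.Random(seed).sample(range(n), k))
    raise NotImplementedError(f"{method} is handled by the real selector")

def main():
    parser = argparse.ArgumentParser(description="Core-set selection front end")
    parser.add_argument("--method", choices=["k-centers", "glister", "random"],
                        required=True, help="selection method")
    parser.add_argument("--fraction", type=float, default=0.1,
                        help="fraction of the dataset to keep (0.1, 0.3, 0.5)")
    parser.add_argument("--dataset", choices=list(DATASET_SIZES), default="cifar10")
    parser.add_argument("--out", default="indices.json",
                        help="file to save the selected indices list to")
    args = parser.parse_args()

    indices = select_indices(args.method, DATASET_SIZES[args.dataset], args.fraction)
    with open(args.out, "w") as f:
        json.dump(indices, f)

if __name__ == "__main__":
    main()
```
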
## Meeting 2021-03-20
* Final report Appendix B: Do we need to comment on every question?
* If there is some important additional information, write it down
* Plots: How do I get std/CI from one measurement?
* not necessary
* K-Centers: replication was done with the pre-activation layer
* it's fine to use the default ResNet-18
* Repository: Is it OK if we have test execution software in jupyter notebooks?
* It's OK for interfaces, still would be nice to have scripts
* Report:
* Add to introduction:
* Hyperparameter search: the network is trained many times to find the best hyperparameters, so a smaller dataset is quite important (see the sketch below)
* Neural architecture search: models are trained many, many times
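
To make the hyperparameter-search point concrete, here is a minimal, self-contained sketch using synthetic data, a toy MLP, and a random 10% `Subset` standing in for K-Centers/GLISTER indices: every configuration in the grid re-trains the model, so running each trial on the core-set instead of the full training set cuts the cost roughly by the subset fraction. All names and numbers are illustrative.

```python
# Sketch: each hyperparameter configuration trains on the core-set only.
import itertools
import torch
from torch import nn

# Tiny synthetic dataset as a stand-in for CIFAR-10 features.
X = torch.randn(5000, 32)
y = torch.randint(0, 10, (5000,))
full = torch.utils.data.TensorDataset(X, y)
coreset_idx = torch.randperm(len(full))[: len(full) // 10]   # random 10% stand-in
coreset = torch.utils.data.Subset(full, coreset_idx.tolist())

def train_and_score(lr, wd, dataset, epochs=3):
    loader = torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=True)
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=wd)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
    # Placeholder score: accuracy on the core-set (a validation split in practice).
    with torch.no_grad():
        return float((model(X[coreset_idx]).argmax(1) == y[coreset_idx]).float().mean())

# Grid search: one full training run per (lr, weight_decay) pair.
results = {(lr, wd): train_and_score(lr, wd, coreset)
           for lr, wd in itertools.product([0.1, 0.01], [0.0, 5e-4])}
print(max(results, key=results.get), results)
```
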