# SWYFT overview
## Overall logic
### DataStore
Components
- model tag
- list of DataSets
- sample method, which takes mask, returns z or (x, z) pair
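A minimal sketch of how such a DataStore could look, assuming the mask is a callable over parameter vectors z and that the store simply aggregates the samples of its DataSets (names and signatures here are illustrative, not a final SWYFT API):

```python
class DataStore:
    """Aggregates DataSets that were produced with the same model tag."""

    def __init__(self, datasets, model_tag=None):
        self.model_tag = model_tag
        self.datasets = list(datasets)

    def append(self, dataset):
        self.datasets.append(dataset)

    def sample(self, mask=None):
        """Yield stored samples whose parameters pass the mask.

        Yields (x, z) pairs when x is available, otherwise just z.
        """
        for dataset in self.datasets:
            for item in dataset:
                # Items may be (x, z) pairs or bare z (incomplete samples).
                x, z = item if isinstance(item, tuple) else (None, item)
                if mask is not None and not mask(z):
                    continue
                yield (x, z) if x is not None else z
```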
### Class DataSet
Components
- model tag
- list of (x, z)
- mask
- has to act like a dataloader
- implementation for HDF5 inputs
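A hedged sketch of the DataSet interface, written here against `torch.utils.data.Dataset` so a standard DataLoader can wrap it; the HDF5 layout (datasets named `x` and `z`) is an assumption for illustration:

```python
import h5py
import torch
from torch.utils.data import Dataset


class SwyftDataSet(Dataset):
    """List of (x, z) pairs tied to a model tag and an optional mask."""

    def __init__(self, samples, model_tag=None, mask=None):
        self.model_tag = model_tag
        self.mask = mask
        # Keep only samples whose parameters pass the mask (if any).
        self.samples = [(x, z) for x, z in samples
                        if mask is None or mask(z)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x, z = self.samples[idx]
        z = torch.as_tensor(z, dtype=torch.float32)
        if x is None:                       # incomplete sample, x not simulated yet
            return z
        return torch.as_tensor(x, dtype=torch.float32), z

    @classmethod
    def from_hdf5(cls, path, model_tag=None, mask=None):
        """Build a DataSet from an HDF5 file with 'x' and 'z' datasets."""
        with h5py.File(path, "r") as f:
            samples = list(zip(f["x"][:], f["z"][:]))
        return cls(samples, model_tag=model_tag, mask=mask)
```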
### function gen_dataset
- takes DataStore, mask, model, tag
- returns DataSet
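One way gen_dataset could be wired up, building on the DataStore and SwyftDataSet sketches above: samples stored in an existing DataStore are reused if they pass the mask, and the remainder is drawn fresh from the constrained unit hypercube. The signature details, the `n_samples`/`dim_z` arguments, and the choice to run the model inside this function (rather than a separate complete() step) are assumptions:

```python
import numpy as np


def gen_dataset(data_store, mask, model, tag, n_samples=1000, dim_z=4):
    """Build a SwyftDataSet of (x, z) pairs for the given model tag."""
    samples = []

    # 1. Reuse stored simulations that satisfy the current mask
    #    (keeping only completed (x, z) pairs).
    if data_store is not None:
        samples = [s for s in data_store.sample(mask) if isinstance(s, tuple)]

    # 2. Top up with fresh draws from the constrained unit hypercube.
    rng = np.random.default_rng()
    while len(samples) < n_samples:
        z = rng.uniform(0.0, 1.0, size=dim_z)         # unit hypercube prior
        if mask is not None and not mask(z):
            continue
        x = model(z) if model is not None else None   # None -> incomplete sample
        samples.append((x, z))

    return SwyftDataSet(samples, model_tag=tag, mask=mask)
```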
### Network training
- Network needs loop over DenseLeg components
- Training routine needs updates to handle `combinations`
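To make these two bullets concrete, a hedged sketch: one small DenseLeg-style head per marginal implied by `combinations`, a forward pass that loops over those heads, and a training routine using the usual joint-versus-shuffled contrastive loss for ratio estimation. Reading `combinations = {1: [0, 1, 2, 3]}` as "all 1-dim marginals of parameters 0-3", and all class/function names, are assumptions:

```python
from itertools import combinations as param_combinations

import torch
import torch.nn as nn
from torch.utils.data import DataLoader


class DenseLeg(nn.Module):
    """Small MLP head scoring one marginal (one subset of the parameters)."""

    def __init__(self, n_features, n_params, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features + n_params, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z_subset):
        return self.net(torch.cat([x, z_subset], dim=-1))


class MarginalNetwork(nn.Module):
    """One DenseLeg per marginal implied by `combinations`."""

    def __init__(self, n_features, combinations):
        super().__init__()
        # {1: [0, 1, 2, 3]} -> marginals (0,), (1,), (2,), (3,);
        # {2: [0, 1, 2]} would add (0, 1), (0, 2), (1, 2), etc.
        self.marginal_list = [
            combo
            for dim, params in combinations.items()
            for combo in param_combinations(params, dim)
        ]
        self.legs = nn.ModuleList(
            DenseLeg(n_features, len(combo)) for combo in self.marginal_list
        )

    def forward(self, x, z):
        # Loop over DenseLeg components, one logit (log-ratio) per marginal.
        return {combo: leg(x, z[..., list(combo)])
                for combo, leg in zip(self.marginal_list, self.legs)}


def train(network, dataset, combinations=None, n_epochs=10, lr=1e-3, batch_size=64):
    """Contrastive training: joint (x, z) labelled 1, shuffled (x, z') labelled 0."""
    # `combinations` is kept for API symmetry; the network already stores its own.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(network.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(n_epochs):
        for x, z in loader:
            z_shuffled = z[torch.randperm(len(z))]
            loss = torch.zeros(())
            for logits in network(x, z).values():            # joint pairs
                loss = loss + loss_fn(logits, torch.ones_like(logits))
            for logits in network(x, z_shuffled).values():   # marginal pairs
                loss = loss + loss_fn(logits, torch.zeros_like(logits))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```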
### Mask
- General function that takes network and returns mask function
- `mask(z)` constraint on the hyper-cube
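The mechanism by which a trained network induces a mask is not pinned down above; one plausible reading, sketched here, is to threshold the estimated log-ratios at the target observation, keeping only points whose every marginal ratio is non-negligible relative to a reference batch. The extra `x0` argument, the threshold rule, and the `epsilon` value are all assumptions:

```python
import math

import torch


def get_mask(network, x0, epsilon=1e-6, n_reference=10_000, dim_z=4):
    """Return mask(z): True where no marginal log-ratio falls below its threshold."""
    network.eval()
    with torch.no_grad():
        # Reference batch from the full unit hypercube sets per-marginal scales.
        z_ref = torch.rand(n_reference, dim_z)
        x_rep = x0.expand(n_reference, -1)
        ref = network(x_rep, z_ref)
        thresholds = {key: logits.max().item() + math.log(epsilon)
                      for key, logits in ref.items()}

    def mask(z):
        z = torch.as_tensor(z, dtype=torch.float32).reshape(1, -1)
        with torch.no_grad():
            out = network(x0.reshape(1, -1), z)
        # Keep z only if every marginal stays above its threshold.
        return all(logits.item() > thresholds[key] for key, logits in out.items())

    return mask
```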
### Sampler
- Takes a mask and returns samples from the constrained hypercube
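A minimal rejection-sampling sketch for the sampler; adequate as long as the constrained region is not a tiny fraction of the unit hypercube (otherwise a smarter proposal would be needed):

```python
import numpy as np


def sample_constrained_hypercube(mask, n_samples, dim_z, rng=None):
    """Draw points uniformly from the unit hypercube, keep those passing the mask."""
    rng = rng or np.random.default_rng()
    samples = []
    while len(samples) < n_samples:
        z = rng.uniform(0.0, 1.0, size=dim_z)
        if mask is None or mask(z):
            samples.append(z)
    return np.stack(samples)
```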
### Logic
Round 1:
1. dataSet1_inc = gen_dataset(None, None, model, tag) # incomplete (None, z)
2. dataSet1 = complete(dataSet1_inc, model)
3. combinations = {1: [0, 1, 2, 3]}
4. network1 = ..
5. train(network1, dataSet1, combinations = combinations, n_epochs)
6. post1 = get_posteriors(network1)
7. mask1 = get_mask(network1)
8. dataStore = DataStore([dataSet1])

Round 2:
1. dataSet2 = gen_dataset(dataStore, mask1, model, tag)
2. combinations = {1: [0, 1, 2, 3]}
3. network2 = ..
4. train(network2, dataSet2, combinations = combinations)
5. post2 = get_posteriors(network2)
6. mask2 = get_mask(network2)
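Putting the two rounds together, a glue-level sketch that reuses the DataStore, SwyftDataSet, gen_dataset, MarginalNetwork, train, get_mask, and sample_constrained_hypercube sketches above with a toy 4-parameter simulator. For brevity it simulates x directly inside gen_dataset instead of the separate complete() step and omits get_posteriors; everything here is illustrative scaffolding, not the actual SWYFT interface:

```python
import numpy as np
import torch


def model(z):
    """Toy simulator: noisy identity map on the four parameters."""
    return z + 0.05 * np.random.randn(len(z))


x0 = torch.full((4,), 0.5)          # target observation
combinations = {1: [0, 1, 2, 3]}    # all 1-dim marginals

# Round 1: simulate from the full unit hypercube and train.
dataset1 = gen_dataset(None, None, model, tag="toy", n_samples=1000)
network1 = MarginalNetwork(n_features=4, combinations=combinations)
train(network1, dataset1, combinations=combinations, n_epochs=10)
mask1 = get_mask(network1, x0)

# Round 2: reuse stored simulations inside the constrained region, top up, retrain.
data_store = DataStore([dataset1], model_tag="toy")
dataset2 = gen_dataset(data_store, mask1, model, tag="toy", n_samples=1000)
network2 = MarginalNetwork(n_features=4, combinations=combinations)
train(network2, dataset2, combinations=combinations, n_epochs=10)
mask2 = get_mask(network2, x0)

# Parameter samples from the round-2 constrained hypercube; posterior weights
# would come from the per-marginal ratios of network2.
posterior_z = sample_constrained_hypercube(mask2, n_samples=500, dim_z=4)
```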
# Discussion with Gilles Louppe
### What "is" SWYFT, how do we convey what it offers?
- the technique for estimating the marginals directly
- the particular method of shrinking the support via the predictions from the trained model
- sample reuse motivated by nested sampling (connected with point 2); this removes the degrees of freedom in what constitutes a "hit."
### Other meeting take-aways:
- All sequential methods suffer from many reruns. They have tried to solve this by eliminating the sequential aspect; we are trying to solve it by sample reuse.
- Louppe is working on a project where the prediction of the marginals is amortized, i.e. a NN learns how to generate marginals based on what the user gives it. He explained that this has problems because the marginals need to be consistent, i.e. $\int p(x_1, x_2) \, dx_2 = \int p(x_1, x_3) \, dx_3$. We also have this problem, but we could train specifically for them to match.
- Caching of data suffers from the many degrees of freedom in what constitutes a "hit." We partly address this by taking motivation from nested sampling.
- "What we do with Yuri is not applicable to big parameter spaces." - not sure what was meant here; worth following up.
- "Other methods have reweighting or resampling, nested approach is new"
- The algorithm is what we publish (the exception being the JMLR Machine Learning Open Source Software track).
- We demonstrate that our method is good by making it simulation-efficient. It must also be accurate! How to measure convergence is "always a problem", but he liked the MC dropout heuristic.
<!-- ## Necessary Components
- simulator(x, z)
- prior(z)
- true observation x_0
- classifier f(x, z)
- num_rounds
- simulations_per_round N
data structure of combinations should mirror data structure of networks
## Example
simulator = ...
-->