# Oralytics RLDM

## Overview

We are aiming for RLDM (website [here](https://rldm.org/call-for-abstracts/)).

## Possible Versions of this Paper

* Full Technical Paper - CS folks (UBICOMP, IEEE)
* Medical Paper - [AMIA](https://amia.org/), [SBM (~Sept.)](https://www.sbm.org/meetings/2022) (POC: Jesse Lipschitz)

## RLDM Specific Page Budget

* 1/2 page: intro
* 1.5 pages: PCS
* 1 page: case study - just an illustration of how you would set up our simulations for your specific downstream task. We have examples of concerns folks should consider. In the 4-pager, we're not here to tell them which algorithm is the best for their application.
* 0.5 pages: graphs
* 1/2 page: references

### **Interpreting Figures and Results**

* Intuition: with sufficient data per user in a highly heterogeneous user environment, one algorithm per user should do best. Similarly, if our users are heterogeneous and we have a cluster of k users who are very different from each other, then we should expect that some users won't benefit (they won't get as high a reward, because the algorithm that pools k users is being applied to them but they aren't the average user).
* However, we find in our experiment that although learning one model per user achieves high reward on average across all 75 users, if we look at the lower 25th percentile of users (people who aren't benefiting as much), algorithms that cluster users achieve higher average reward than algorithms that learn one model per user. => Learning one model per user produces variable between-user results. (A sketch of this mean vs. lower-percentile comparison appears after the TODO list below.)
* Hypothesis: bias-variance tradeoff. The individual-level algorithm is more variable (from one person to the next) but less biased; the group-level algorithm is more biased but less variable.
* There is massive variance across users: we get users that the algorithm doesn't do well on. It's counterintuitive.
* One reason learning on one person (k=1 algorithms) did not work well in Table 2 is that a single user might not visit different states very often, whereas with clustering you can leverage other people's data.
* We noticed this, we think it might be this, in future work we want to investigate blah.
* [REGRET GRAPHS] These might show that clustering does better in the beginning on average, but that as time goes on, clustering is not better on average.
* The poor performance of the 0-inflated model could be because we are not taking enough samples for the posterior sampling scheme in order to run a simulation in a reasonable amount of time. For future work, we will try different tuning parameters for the approximation scheme (i.e., running it longer or taking more samples per update time).
* For the big paper, maybe try reporting median values.

Working paper link: [here](https://www.overleaf.com/read/wvmydxsdwjvp)

### TODOs/Things I Am Still Working On

* Finish running experiments by evaluating RL algorithm candidates in every simulation environment.
* Include variants that vary the mean effect size, e.g., 0.1, 0.5, etc.
* Include variants that reduce the variance in how the effect size is chosen.
* Decide how to choose the final algorithm. If there's a conflict, bring it to the domain experts.
* Double-check the optimization for fitting the base environment model from the ROBAS 2 data. We need to make sure that this is truly because the data are heterogeneous between users and not a numerical instability issue.
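To make the mean vs. lower-25th-percentile comparison from "Interpreting Figures and Results" concrete, here is a minimal sketch of how per-user average rewards for one algorithm candidate could be summarized. The names (`summarize_candidate`, `avg_reward_per_user`, `k1_rewards`, `k5_rewards`) are hypothetical and not from our codebase; the sketch assumes we already have one average-reward number per simulated user for each candidate.

```python
# Minimal sketch (hypothetical names): summarize per-user average rewards for one
# algorithm candidate so that "good on the mean" and "good for the worst-off
# quartile of users" can be compared side by side.
import numpy as np

def summarize_candidate(avg_reward_per_user):
    """avg_reward_per_user: 1-D array, one average-reward value per simulated user."""
    rewards = np.asarray(avg_reward_per_user, dtype=float)
    p25 = np.percentile(rewards, 25)
    return {
        "mean": rewards.mean(),      # average across all users
        "p25": p25,                  # lower 25th percentile across users
        "mean_bottom_quartile": rewards[rewards <= p25].mean(),
    }

# Example usage (made-up inputs):
# summarize_candidate(k1_rewards)   # e.g., one model per user (k = 1)
# summarize_candidate(k5_rewards)   # e.g., clustering with k = 5
```

A candidate can win on `mean` while losing on `p25`, which is exactly the pattern noted above for the k = 1 algorithms.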
### Experiments

We want to evaluate a set of RL algorithm candidates against multiple simulation environment variants to see which candidate performs best (high average reward, low regret) and runs stably (can run without constant monitoring and tweaking).

We have the following RL algorithm candidate axes:

1. Base Model
   * 0-Inflated Poisson Thompson Sampler
   * Bayesian Linear Regression Thompson Sampler (see the sketch after these lists)
2. Cluster Size
   * K = 1 (one user per model)
   * K = 3
   * K = 5
   * K = EVERYONE! (for real paper)

We have the following simulation environment variants:

1. Base Model
   * 0-Inflated Poisson (Vanilla)
   * 0-Inflated Poisson (Non-Stationarity)
2. Effect Size
   * User-specific effect size (small)
   * User-specific effect size (large)
   * Context-aware and user-specific effect size (small)
   * Context-aware and user-specific effect size (large)
   * [for the real paper] Same effect size for all users (population-level effect size)
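Below is a minimal sketch of how the Bayesian Linear Regression Thompson Sampler candidate could pool data across a cluster of users. The class name `ClusterThompsonSampler`, the feature map, the binary action space, and the fixed noise variance are all assumptions made for illustration; this is not the actual Oralytics algorithm, its priors, or its reward model.

```python
# Sketch of Bayesian linear regression Thompson sampling with data pooled across a
# cluster of users (assumed: binary action, known noise variance, simple feature map).
import numpy as np

class ClusterThompsonSampler:
    def __init__(self, dim, prior_mean=None, prior_cov=None, noise_var=1.0, seed=0):
        # dim must match the feature map below, i.e. 2 * len(context).
        self.noise_var = noise_var
        self.prior_mean = np.zeros(dim) if prior_mean is None else prior_mean
        self.prior_cov = np.eye(dim) if prior_cov is None else prior_cov
        self.rng = np.random.default_rng(seed)
        # Pooled history across all users assigned to this cluster.
        self.X, self.y = [], []

    def _posterior(self):
        prior_prec = np.linalg.inv(self.prior_cov)
        if not self.X:
            return self.prior_mean, self.prior_cov
        X = np.vstack(self.X)
        y = np.asarray(self.y, dtype=float)
        post_prec = prior_prec + X.T @ X / self.noise_var
        post_cov = np.linalg.inv(post_prec)
        post_mean = post_cov @ (prior_prec @ self.prior_mean + X.T @ y / self.noise_var)
        return post_mean, post_cov

    def select_action(self, context):
        # Posterior sampling: draw one parameter vector, then act greedily w.r.t. it.
        mean, cov = self._posterior()
        theta = self.rng.multivariate_normal(mean, cov)
        # Assumed feature map: [context, action * context] for a binary action.
        feats = {a: np.concatenate([context, a * context]) for a in (0, 1)}
        return max(feats, key=lambda a: feats[a] @ theta)

    def update(self, context, action, reward):
        # Data from every user in the cluster lands in the same pooled posterior.
        self.X.append(np.concatenate([context, action * context]))
        self.y.append(reward)
```

Under this sketch, a K = 3 candidate would route all three users in a cluster through the same sampler instance, so their data contribute to one pooled posterior, while the K = 1 candidate would create a separate instance per user.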