# [ARCHIVED - PRIOR WAS TOO STRONG] PCS For RL Full Paper Initial Experiments $$p_{\text{data}} \neq p_{\theta}$$

<!--

## Cluster By Random Users

### Experiment 1: Average Reward and Lower 25th Percentile Reward

We first randomly draw 75 users with replacement from the 32 ROBAS 2 users. We compare clustering ($k = 5$) with learning one model per user ($k = 1$). We simulate $T = 140$ time-steps for each user.

#### Results: $k = 1$ does better than $k = 5$, even for the 25th percentile

Clustering ($k = 5$) did not do as well as learning one model per user ($k = 1$) for both the average reward and the average 25th percentile reward. (Before, I was calculating the average 25th percentile wrong: the cluster average has a higher 25th percentile reward, but not on an individual level.) This could be because:
1) With enough data, learning one model per user should do better.
2) Our users are really heterogeneous.

| | Average Rewards | Average Rewards | 25th Percentile Rewards | 25th Percentile Rewards |
| :--- | :----: | :----: | :----: | :----: |
| **RL Algorithms** | bayesian lin reg k = 1 | bayesian lin reg k = 5 | bayesian lin reg k = 1 | bayesian lin reg k = 5 |
| **Environment Variant** |||||
| stationary user effect | 92.233 (1.339) | 88.228 (1.063) | 60.542 (3.892) | 52.203 (3.050) |
| stationary user context effect | 87.851 (1.486) | 85.278 (1.346) | 65.526 (1.687) | 50.481 (1.377) |
| stationary population effect | 85.898 (1.970) | 85.093 (1.920) | 60.181 (0.865) | 53.412 (0.639) |
| non-stationary user effect | 103.717 (1.735) | 98.917 (1.420) | 71.182 (2.621) | 64.590 (1.057) |
| non-stationary user context effect | 94.217 (2.823) | 90.247 (2.758) | 61.186 (2.322) | 57.596 (1.682) |
| non-stationary population effect | 96.621 (2.772) | 93.479 (2.517) | 70.089 (1.534) | 67.991 (0.783) |

### Experiment 2: Regret Plots

Note: When I say average, explain what I am averaging over.

Point 1) suggests that maybe clustering does better when there are not a lot of data points per user.
But with enough data, learning one model per user ($k = 1$) should do better. So, I plotted average regret across time.

#### Results: $k = 1$ does better than $k = 5$

$k = 1$ still does better than $k = 5$ in both the between-trial violin plots and the between-user violin plots. This could be because, by time $t = 20$, there are already enough data points for $k = 1$ to do better. Our experiments update each algorithm on a weekly basis and do the first update after a week of data (14 data points per user). Even for the first 7 time steps ($t = 14$ to $t = 20$), $k = 1$ still achieves lower average regret.

#### Violin Plot (Between Trial Variance)

(Average Case) Average regret of users in a trial, across all trials. (5 (num_trials) x 75 (num_users) x 7 (timesteps))

![](https://i.imgur.com/v1rL23D.png)

#### Violin Plot (Between User Variance)

(Considering all users, is this regret for everyone? Who am I leaving behind? Consider user variance.) Maybe fix error bars: SE, not range.

Average regret of users in all trials. (375 (all users across all trials) x 7 (timesteps))

![](https://i.imgur.com/FabUbRa.png)

Note: revisit the way we sample different users per trial. Ask Susan again. Maybe use the same 75 users per trial.

### First Time Steps Plot (Between Trial Variance)

![](https://i.imgur.com/qQPq9ZN.png)

### First Time Steps Plot (Between User Variance)

![](https://i.imgur.com/yFwyxpr.png)

-->

## Cluster By Homogeneous Users

### Experiment 3: Average Reward and Lower 25th Percentile

Ideas for fixes:
1. Increase the cluster size (idea: more data with the same noise level will do better).
2. Consider one user's trajectory and just add more data: one user with $5T$ data points compared to one cluster with the same 5 users. (Sanity check, can be done with the metrics colab.)

In the previous experiment, we noticed that clustering ($k = 5$) did not do as well as learning one model per user ($k = 1$) for both the average reward and the average 25th percentile reward.
One reason could be that our users are just too heterogeneous for clustering to be effective. We therefore compare cluster size $k = 1$ with cluster size $k = 5$ using "homogeneous users" per cluster. For clustering of size $k = 5$, we first randomly draw 15 users from the 32 ROBAS 2 users. Then, for each of the 15 clusters, we repeat that user 5 times to make a cluster (75 users in total, but each of the 15 users is repeated 5 times in its cluster).

#### Results: $k = 1$ and $k = 5$ do about the same

It turns out that $k = 1$ and $k = 5$ have similar results. This makes sense because each cluster contains copies of the same user, so we should expect clustering with $k = 1$ and $k = 5$ to perform the same.

| | Average Rewards | Average Rewards | 25th Percentile Rewards | 25th Percentile Rewards |
| :--- | :----: | :----: | :----: | :----: |
| **RL Algorithms** | bayesian lin reg k = 1 | bayesian lin reg k = 5 | bayesian lin reg k = 1 | bayesian lin reg k = 5 |
| **Environment Variant** |||||
| stationary user effect | 97.484 (3.805) | 95.296 (3.376) | 69.634 (9.027) | 66.110 (8.569) |
| stationary user context effect | 90.616 (3.790) | 88.983 (3.116) | 69.366 (8.929) | 70.312 (8.153) |
| stationary population effect | 86.832 (4.897) | 84.512 (4.295) | 62.442 (9.358) | 61.994 (8.781) |
| non-stationary user effect | 107.957 (3.241) | 107.356 (3.247) | 70.051 (4.379) | 71.399 (3.646) |
| non-stationary user context effect | 92.320 (7.983) | 92.265 (8.061) | 63.661 (6.813) | 62.153 (7.205) |
| non-stationary population effect | 96.214 (7.112) | 95.397 (7.152) | 70.526 (5.295) | 69.175 (4.638) |

### Experiment 4: Reward Plots

Plotted average rewards across trajectories, across users, and then across trials.
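The tables report, per environment variant, an average reward and a lower 25th percentile reward. The computation described above can be sketched as follows, assuming rewards are stored as a `(num_trials, num_users, T)` NumPy array; the function and array names, the shape, and the reading of the parenthesized values as standard errors across trials are illustrative assumptions, not the actual simulation code.

```python
import numpy as np

def summarize_rewards(rewards):
    """Summarize rewards of assumed shape (num_trials, num_users, T)."""
    # Average across the trajectory: one value per (trial, user) pair.
    per_user = rewards.mean(axis=2)                  # (num_trials, num_users)
    # Average across users, then across trials.
    per_trial = per_user.mean(axis=1)                # (num_trials,)
    avg_reward = per_trial.mean()
    # Lower 25th percentile across users within each trial, then
    # averaged across trials (an individual-level metric).
    p25_reward = np.percentile(per_user, 25, axis=1).mean()
    # Standard error across trials (assumed meaning of the parentheses).
    se = per_trial.std(ddof=1) / np.sqrt(rewards.shape[0])
    return avg_reward, p25_reward, se
```

Taking the percentile over per-user trajectory means, rather than over pooled rewards, keeps the 25th percentile an individual-level quantity.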
#### Results: Average Reward (Average Across Trajectory, Average Across Users)

$k = 1$ still beats $k = 5$.

Stationary Env | Non-Stationary Env
:-------------------------:|:-------------------------:
![](https://i.imgur.com/asHXXGl.png) | ![](https://i.imgur.com/yFBHdxM.png)
![](https://i.imgur.com/KMM65tm.png) | ![](https://i.imgur.com/NntPEIG.png)
![](https://i.imgur.com/Wc9JxvV.png) | ![](https://i.imgur.com/C12fg9R.png)

#### Results: Sum of Rewards (Sum of Trajectory, Average Across Users)

$k = 1$ still beats $k = 5$.

Stationary Env | Non-Stationary Env
:-------------------------:|:-------------------------:
![](https://i.imgur.com/23ylYdk.png) | ![](https://i.imgur.com/O8Fshrw.png)
![](https://i.imgur.com/KH8VjGw.png) | ![](https://i.imgur.com/a8HZjjM.png)
![](https://i.imgur.com/497GfDy.png) | ![](https://i.imgur.com/MHiFtml.png)

### Experiment 5: Regret Plots

Plotted average regret across trajectories, across users, and then across trials.

#### Results: Average Across Trajectories, Average Across Users, Average Across Trials

$k = 5$ mean beats $k = 1$ mean.

Stationary Env | Non-Stationary Env
:-------------------------:|:-------------------------:
![user_eff_stat](https://i.imgur.com/AYxUYJs.png) | ![user_eff_non_stat](https://i.imgur.com/UgieSEt.png)
![pop_eff_stat](https://i.imgur.com/drmf01U.png) | ![pop_eff_non_stat](https://i.imgur.com/T4xLXG9.png)
![cont_user_eff_stat](https://i.imgur.com/LI4QEY7.png) | ![cont_user_eff_non_stat](https://i.imgur.com/1v2XbgS.png)

#### Results: Average Across Users, Average Across Trials

Stationary Env | Non-Stationary Env
:-------------------------:|:-------------------------:
![](https://i.imgur.com/XjUGxiX.png) | ![](https://i.imgur.com/4Aan7WT.png)
![](https://i.imgur.com/Cmu406N.png) | ![](https://i.imgur.com/KT3IGCj.png)
![](https://i.imgur.com/xCZRPBd.png) | ![](https://i.imgur.com/Sq1SZ2x.png)

<!-- ### How can we deal with between trial and between user variances?
- If we had only cluster size 1 algorithms:
  - Sample a user from the population
  - Run algorithm 1 on that user
  - Run algorithm 2 on that user
  - Report regret over time
  - Look at between-user variance
  - (There is no between-trial variance)
- 75 users, 5 trials -> 375 users
- If we have clusters
-->
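The regret curves in Experiment 5 can carry either between-trial or between-user error bars. Both computations can be sketched as follows, assuming regret is stored as a `(num_trials, num_users, T)` NumPy array; the names and shape are illustrative assumptions about the simulation output, not its actual interface.

```python
import numpy as np

def regret_error_bars(regret):
    """Mean regret curve with two kinds of standard-error bands.

    `regret` has assumed shape (num_trials, num_users, T).
    """
    num_trials, num_users, T = regret.shape
    mean_curve = regret.mean(axis=(0, 1))                        # (T,)
    # Between-trial SE: spread of the per-trial, user-averaged curves.
    per_trial = regret.mean(axis=1)                              # (num_trials, T)
    se_trial = per_trial.std(axis=0, ddof=1) / np.sqrt(num_trials)
    # Between-user SE: pool every user curve across all trials.
    per_user = regret.reshape(num_trials * num_users, T)
    se_user = per_user.std(axis=0, ddof=1) / np.sqrt(num_trials * num_users)
    return mean_curve, se_trial, se_user
```

Plotting `mean_curve` with `se_trial` answers "how stable is the result across repeated trials", while `se_user` answers "is this regret typical for every user, or is someone being left behind".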