October 5, 2023 Brainstorm Outline

# Uncertainty-Informed Predictions of Psychophysiological ("Mind-Body") State

My goal for this brainstorm is for us to collaboratively identify exciting yet reasonable (given your Stat/RL experience) postdoctoral research directions to explore. More tangibly, an immediate priority for me is a planned NIH K99/R00 application (deadline Feb. 12, 2024). Although it is very rare to win a K99 during the first year of a postdoc, I see this initial submission as a golden opportunity to map out and really think through the core directions I will pursue during my time here. The brainstorm will accordingly start at a higher level to motivate, provide background, and map out a few overarching aims for my proposed research. We will then dive into ideas related to my proposed first aim to discuss mid-level details.

### Motivation and Background

![](https://hackmd.io/_uploads/SyBhXGog6.png)

Numerous mental and physical illnesses involve episodic occurrences that can degrade quality of life. These episodes often take place outside of the clinic, away from the health professionals who would otherwise be able to "sense and react" to the affected individual's symptoms. Just-in-time adaptive interventions (JITAIs) and other envisioned closed-loop systems have the potential to mitigate symptoms during everyday life. However, the methods needed to map raw sensor/survey measurements to estimates of an individual's psychophysiological (mind-body) state are limited. The strategies studied in this lab for intervening according to changes in an individual's state also need advancement (one could also act proactively to change the individual's state in some desirable way).

### Relevant PhD Research

During my PhD, I focused on the estimation side of the "closed loop" (follow the arrows in the figure above for a sense of why "sense and react" systems are often referred to as closed-loop systems). Namely, I designed methods to map physiological signals measured from various endpoints of the autonomic nervous system (e.g., the cardiovascular and respiratory systems) to changes in a person's acute (i.e., short-term rather than chronic) stress state. To then evaluate a potential stress-reducing intervention, transcutaneous vagus nerve stimulation (tVNS), in patients with posttraumatic stress disorder, I applied input-output modeling techniques to characterize the effects of traumatic memory-induced symptoms and of tVNS on an individual's stress state. A key finding was that although trauma recall symptoms have stronger and faster effects on stress, tVNS counteracts those effects within tens of seconds, making it a potentially viable closed-loop intervention to reduce stress *during* trauma recall. Work remains to actually study whether closed-loop tVNS would be effective.
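To make "input-output modeling" concrete, here is a minimal sketch of one simple form such a model can take: a first-order ARX model regressing a stress index on its own past value plus trauma-recall and tVNS input signals, fit by least squares. This is an illustrative toy under assumed names and an assumed model order, not the actual models from my PhD work.

```python
import numpy as np

def fit_first_order_arx(stress, u_recall, u_tvns):
    """Toy first-order ARX fit (illustrative, not the actual PhD model):
        s[t] = a * s[t-1] + b1 * u_recall[t-1] + b2 * u_tvns[t-1] + e[t]
    where `stress` is a sampled stress index and `u_recall` / `u_tvns`
    are 0/1 indicators of trauma recall and tVNS stimulation.
    Returns the least-squares coefficients [a, b1, b2]."""
    stress, u_recall, u_tvns = map(np.asarray, (stress, u_recall, u_tvns))
    X = np.column_stack([stress[:-1], u_recall[:-1], u_tvns[:-1]])
    y = stress[1:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

In a toy like this, comparing the fitted input coefficients (and, in higher-order variants, their implied impulse responses) is one way to contrast the strength and speed of trauma-recall effects versus tVNS effects.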
### (*Need Feedback and Ideas*) Proposed Research Aims for K99/R00

1. Estimate changes in physiological stress, and the uncertainty in these estimates, using methods conducive to reinforcement learning for JITAIs
2. Fuse the passively and continuously estimated physiological stress state with sparsely observed psychological stress survey data, without treating the surveys as ground truth
3. Design and validate an RL algorithm that balances the tradeoff between added confidence in the stress estimate and the user burden of requesting survey responses (pilot data for validation can be collected using R00 funding)

### (*Need Feedback and Ideas*) Designing General Value Functions to Estimate Stress State and State Estimation Confidence in RL Settings

The way we estimate changes in stress and feed that information back to the closed-loop intervention should be conducive to the controller/RL agent in charge. [General value functions (GVFs)](https://dl.acm.org/doi/10.5555/2031678.2031726) are an appealing framework for predictive representations of various quantities in reinforcement learning settings. Unlike alternative predictive representations such as predictive state representations (PSRs), which are defined for sequences of state-agnostic actions (open loop), GVFs can be defined for policies with state-dependent actions, i.e., for *closed-loop* policies.

What differentiates GVFs from the value functions you all know and love is flexibility in both the discount factor and the signal whose expected discounted sum they estimate. The "discount factor" can now be a sequence of multiplicative factors $\{\gamma_{t+i}\}_{i=1}^{\infty}$ rather than a constant $\gamma$, and this sequence can be state- and action-dependent (i.e., $\gamma_{t+1} \equiv \gamma(A_t, S_t, S_{t+1})$). The signal, called a "cumulant" $C_t$, does not need to be the reward $R_t$. Put precisely, the return for a GVF is $G_t = \sum_{i=1}^{\infty}\left[\left(\prod_{j=1}^{i-1}\gamma_{t+j}\right)C_{t+i}\right]$ instead of $G_t = \sum_{i=1}^{\infty}\gamma^{i-1} R_{t+i}$.

#### Key Motivator

Present GVF learning and design methods may be suboptimal for our problem setting.
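To ground the GVF return above, here is a minimal sketch, assuming linear function approximation and on-policy sampling, of (i) computing the generalized return from per-step cumulant and continuation ("discount") sequences and (ii) a TD(0)-style GVF update. All names (`gvf_return`, `td0_gvf_update`, the feature vectors `x_t`) are illustrative, not an established API or a committed design.

```python
import numpy as np

def gvf_return(cumulants, continuations):
    """Generalized return G_t = sum_i (prod_{j=1}^{i-1} gamma_{t+j}) * C_{t+i},
    given finite arrays of cumulants C_{t+1..t+N} and continuation factors
    gamma_{t+1..t+N} (state/action-dependent, so supplied per step)."""
    G, discount = 0.0, 1.0
    for c, g in zip(cumulants, continuations):
        G += discount * c   # C_{t+i} is weighted by the product of *earlier* gammas
        discount *= g       # gamma_{t+i} only discounts later cumulants
    return G

def td0_gvf_update(w, x_t, x_tp1, c_tp1, gamma_tp1, alpha=0.05):
    """One on-policy linear TD(0) update toward the GVF target:
    delta = C_{t+1} + gamma_{t+1} * v(S_{t+1}) - v(S_t)."""
    delta = c_tp1 + gamma_tp1 * (w @ x_tp1) - (w @ x_t)
    return w + alpha * delta * x_t
```

As one possible (purely illustrative) choice in our setting, taking the cumulant to be a momentary physiological stress index with a continuation factor near 1 would make the GVF predict discounted future stress under the current intervention policy; the point is only to show the flexibility that the cumulant/continuation choices provide.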