# Successor Feature Sets with Options
## 11/11/2020 ***********************
* Think about the backup fully
* Think about the how to compute this efficiently
* High-level idea:
* $B_i^{\pi_k}B_i^{\pi}$ takes us from root to $i$ - leaf to $j$ leaf
* Grafting our old trees onto our new trees
Suppose we have leaves indexed by $i\in \{1,\ldots,k\}$ for the initial policy tree $\pi$
Each leaf has a matrix $B_i^\pi$
Suppose we graft policies $\pi_i$ onto leaves $i$
Suppose $\pi_i$ has leaves indexed by $(i,j)$ for $j\in\{1,\ldots,k_i\}$
Each leaf has a matrix $B^{\pi_i}_j$
Then the overall tree will have $\sum_{i=1}^k k_i$ leaves, indexed by $(i, j)$ with $i\in\{1,\ldots,k\}$ and $j\in\{1,\ldots, k_i\}$
Associated with leaf $(i,j)$ is matrix $B^{\pi_i}_jB_i^\pi$
Base cases: a one step policy tree with root action $a$ has one leaf per observation with $B_o = \gamma T_{ao}$; an empty policy tree has a single leaf with $B=I$.
$A^{\pi} = A^{\pi'} + \sum_{i=1}^k A^{\pi_k} B_i^{\pi_k}B_i^{\pi}$
<span style="color:grey">
## 11/11/2020 ***********************
* TODO
* Write down the recusion $B_i^{\pi'} = \gamma^{t_i} \prod_{s=1}^{t_i} T_{a^i_so^i_s}$
* Write down the inefficient backup equation
* <s> Clean up the proof </s>
## 10/21/2020 ***********************
Base case: (empty tree is linear)
$\phi^{\pi}(q) = \mathbb E_{\pi}[ \vec{0}]$
(vacuous truth)
Base case: (tree with one node is linear)
$\phi^{\pi}(q) = \mathbb E_{\pi} [ f(s,a) ]$
$\phi^{\pi}(q) = F_{a(\pi)} q$
Inductive case: Assume we are given a tree $\pi$, and we have labeled subtrees $\pi_1, \pi_2, \ldots, \pi_k$ within $\pi$. Assume that none of the $\pi_i$ contains any other $\pi_j$. Write $\pi'$ for the result of removing all of the subtrees $\pi_i$ from $\pi$. Assume that neither $\pi'$ nor any of $\pi_1,\ldots,\pi_k$ are equal to the entire original tree $\pi$.
By induction, we know that the feature functions $\phi^{\pi_1}(q), \ldots, \phi^{\pi_k}(q)$ are linear in $q$, as is the feature function $\phi^{\pi'}(q)$.
For each of the subtrees $\pi_i$, write $a^i_1, o^i_1, a^i_2, o^i_2, \ldots, a^i_{t_i}, o^i_{t_i}$ for the path from the root of $\pi$ down to $\pi_i$. Here $t_i$ is the depth of the root of $\pi_i$ within $\pi$. And, write $q'_i$ for the predictive state if we reach $\pi_i$.
$\phi^{\pi}(q_t) = \phi^{\pi'}(q_t) + \sum_{i=1}^k \gamma^{t_i}
\left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') P(o^i_s \mid q_{s},\text{do}~a^i_s) \right)\phi^{\pi_k}(q'_i)$
Since policy trees are determinstic, $\left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') \right) = 1$
$\phi^{\pi}(q_t) = \phi^{\pi'}(q_t) + \sum_{i=1}^k \gamma^{t_i} P(o^i_s \mid q_{s},\text{do}~a^i_s) \phi^{\pi_k}(q'_i)$
We can compute $q'_{i} = \left[ \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t \right] / \left[ \prod_{s=1}^{t_i}P(o^i_s \mid q_{s},\text{do}~a^i_s)\right]$
By the inductive hypothesis $\phi^{\pi_1}(q) = A^{\pi_1}q, \ldots, \phi^{\pi_k} =A^{\pi_k}q$ and $\phi^{\pi'}(q) = A^{\pi'}q$. Substituting these expressions in and substituting for the $q_{t_i}$ expression above.
$\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i}
P(o^i_s \mid q_{s},\text{do}~a^i_s) A^{\pi_k}q'_i$
$\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i} A^{\pi_k}q'_i$
$\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i} A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$
$\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$
We can observe that the RHS is a linear function of $q_t$, which completes our inductive proof of linearity.
Becasue of linearity, there exists a matrix $A^{\pi}$ such that $\phi^{\pi}(q) = A^{\pi}(q)$. With this notation
$A^{\pi}{q_t} = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$
Because the above equation must hold for any predictive state q_t, we get
$A^{\pi} = A^{\pi'} + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right)$
## 10/20/2020 ***********************
Base case: (empty tree is linear)
$\phi^{\pi}(q) = \mathbb E_{\pi}[ \vec{0} + \gamma \sum_{o} \vec{0}]$
(vacuous truth)
Base case: (tree with one node is linear)
$\phi^{\pi}(q) = \mathbb E_{\pi} [ f(s,a) + \gamma \sum_{o} \vec{0} ]$
$\phi^{\pi}(q) = \mathbb E_{\pi} [ f(s,a) ]$
$\phi^{\pi}(q) = F_{a(\pi)} q$
Inductive case: Assume we are given a tree $\pi$, and we have labeled subtrees $\pi_1, \pi_2, \ldots, \pi_k$ within $\pi$. Assume that none of the $\pi_i$ contains any other $\pi_j$. Write $\pi'$ for the result of removing all of the subtrees $\pi_i$ from $\pi$. Assume that neither $\pi'$ nor any of $\pi_1,\ldots,\pi_k$ are equal to the entire original tree $\pi$.
By induction, we know that the feature functions $\phi^{\pi_1}(q), \ldots, \phi^{\pi_k}(q)$ are linear in $q$, as is the feature function $\phi^{\pi'}(q)$.
For each of the subtrees $\pi_i$, write $a^i_1, o^i_1, a^i_2, o^i_2, \ldots, a^i_{t_i}, o^i_{t_i}$ for the path from the root of $\pi$ down to $\pi_i$. Here $t_i$ is the depth of the root of $\pi_i$ within $\pi$. And, write $q'_i$ for the predictive state if we reach $\pi_i$.
$\phi^{\pi}(q_t) = \phi^{\pi'}(q_t) + \sum_{i=1}^k \gamma^{t_i}
\left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') P(o^i_s \mid q_{s},\text{do}~a^i_s) \right)\phi^{\pi_k}(q'_i)$
[see note * below]
We can compute $q'_{i} = \left[ \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t \right] / \left[ \prod_{s=1}^{t_i}P(o^i_s \mid q_{s},\text{do}~a^i_s)\right]$
[use notation $q'_i$ below]
By the inductive hypothesis $\phi^{\pi_1}(q) = A^{\pi_1}q, \ldots, \phi^{\pi_k} =A^{\pi_k}q$ and $\phi^{\pi'}(q) = A^{\pi'}q$. Substituting these expressions in and substituting for the $q_{t_i}$ expression above.
$\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i}
\left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') P(o^i_s \mid q_{s},\text{do}~a^i_s) \right) A^{\pi_k}q_{t_i}$
$\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i} \left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') \right) \left(\prod_{s=1}^{t_i} P(o^i_s \mid q_{s},\text{do}~a^i_s) \right) A^{\pi_k}q_{t_i}$
$\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i} \left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') \right) A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$
Since policy trees are determinstic, $\left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') \right) = 1$
[use this to simplify at * above]
$\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$
We can observe that the RHS is a linear function of $q_t$, which completes our inductive proof of linearity.
Becasue of linearity, there exists a matrix $A^{\pi}$ such that $\phi^{\pi}(q) = A^{\pi}(q)$. With this notation
$A^{\pi}{q_t} = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$
Because the above equation must hold for any predictive state q_t, we get
$A^{\pi} = A^{\pi'} + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right)$
===
so we need to keep track of $B_i^{\pi'} = \gamma^{t_i} \prod_{s=1}^{t_i} T_{a^i_so^i_s}$ for each leaf of $\pi'$
---
## 10/13/2020 ***********************
Base case: (empty tree is linear)
$\phi^{\pi}(q) = \mathbb E_{\pi}[ \vec{0} + \gamma \sum_{o} \vec{0}]$
(vacuous truth)
Base case: (tree with one node is linear)
$\phi^{\pi}(q) = \mathbb E_{\pi} [ f(s,a) + \gamma \sum_{o} \vec{0} ]$
$\phi^{\pi}(q) = \mathbb E_{\pi} [ f(s,a) ]$
$\phi^{\pi}(q) = F_{a(\pi)} q$
Inductive case: Assume we are given a tree $\pi$, and we have labeled subtrees $\pi_1, \pi_2, \ldots, \pi_k$ within $\pi$. Assume that none of the $\pi_i$ contains any other $\pi_j$. Write $\pi'$ for the result of removing all of the subtrees $\pi_i$ from $\pi$. Assume that neither $\pi'$ nor any of $\pi_1,\ldots,\pi_k$ are equal to the entire original tree $\pi$.
By induction, we know that the feature functions $\phi^{\pi_1}(q), \ldots, \phi^{\pi_k}(q)$ are linear in $q$, as is the feature function $\phi^{\pi'}(q)$.
For each of the subtrees $\pi_i$, write $a^i_1, o^i_1, a^i_2, o^i_2, \ldots, a^i_{t_i}, o^i_{t_i}$ for the path from the root of $\pi$ down to $\pi_i$. Here $t_i$ is the depth of the root of $\pi_i$ within $\pi$.
$\phi^{\pi'}(q_t) + \sum_{i=1}^k \gamma^{t_i} P(reach \pi_i)\phi^{\pi_k}(q \text{ if we reach i})$
q if we reach $i$ = $u^{T}\left[\prod_{s=1}^{t_i} T_{a^i_so^i_s}\right] q_t / P($ if we reach $i)$
---
$q_{t+1} = T_{ao} q_t / P(o\mid q_t, \text{do}~a)$
$P(o\mid q_t, \text{do}~a) q_{t+1} = T_{ao} q_t$
$T_{a'o'} P(o\mid q_t, \text{do}~a) q_{t+1} = T_{a'o'} T_{ao} q_t$
$P(o\mid q_t, \text{do}~a) T_{a'o'} q_{t+1} = T_{a'o'} T_{ao} q_t$
$P(o\mid q_t, \text{do}~a) [P(o'\mid q_{t+1}, \text{do}~a') q_{t+2}] = T_{a'o'} T_{ao} q_t$
$P(o\mid q_t, \text{do}~a) P(o'\mid q_{t+1}, \text{do}~a') u^T q_{t+2} = u^T T_{a'o'} T_{ao} q_t$
$P(o\mid q_t, \text{do}~a) P(o'\mid q_{t+1}, \text{do}~a') = u^T T_{a'o'} T_{ao} q_t$
---
$\phi_\psi(q_{t})= \mathbb E_{\pi}[ \phi^{\pi'}(q_{t}) + \gamma \sum_{o} \phi^{\pi(o)}(q_{t+1})]$
$\phi_\psi(q_t)= \mathbb E_{\pi}[ \phi^{\pi(a)}(q_t)] + \gamma \mathbb E_{\pi}[ \sum_{o} \phi^{\pi(o)}(q_{t+1}) ]$
$\phi_\psi(q_t)= P(a | \pi) \phi^{\pi(a)}(q_t) + \gamma P(o\mid q_t, \mathrm{do}~a) \sum_{o} \phi^{\pi(o)}(q_{t+1})$
$\phi_\psi(q_t) = P (a | \pi) \phi^{\pi(a)}(q_t) + \gamma \sum_{o \in k} \phi^{\pi(o)}_{o}(q_t+1) + \gamma \sum_{o \notin k} \phi^{\pi(o)}(q_t+1)$
**[$\sum_{o \notin k} \phi^{\pi(o)}(q_t+1) = \vec{0}$ (vacuous truth)]**
$\phi_\psi(q_t) = P (a | \pi) \phi^{\pi(a)}(q_t) + \gamma \sum_{o \in k} \phi^{\pi(o)}_o(q_t+1)$
**[inductive assumption $\phi^{\pi}_{1}, ......, \phi^{\pi}_{k}$ are linear in $q$]**
let $\mu q_t = \sum_{o \in k} \phi^{\pi(o)}_o(q_t+1)$
$\phi_\psi(q_t) = P (a | \pi) \phi^{\pi(a)}(q_t) + \gamma \mu q_t$
**[substitute $\phi^{\pi(a)}q_t = A^{\pi(a)} q_t$]**
$\phi_\psi(q_t) = P (a | \pi) A^{\pi(a)} q_t + \gamma \mu q_t$
$\phi_\psi(q_t) = \left[ P (a | \pi) A^{\pi(a)} + \gamma \mu \right] q_t$
**[Instead of the induction assumptio I can also substitute $q_t+1 = q_t T_ao /P( o | q_t, do a)$]**
## 10/2/2020 ***********************
**Base case: H $<$ t** (essentially im trying to say that there are no expected features after the options ends)
$\phi^{\pi}(q_{t}) = \mathbb E_\pi \left[ \vec{0} \right] = \sum_{a} P(a|\pi) * \vec{0} = \vec{0}$
**Base case: H $=$ t** (trying to figure out how to write the immediate feature)
$\phi^{\pi}(q_{t}) = \mathbb E_\pi \left[ \phi^{\pi(a_{t},o{t})}(q_{t}) \right]$
$\phi^{\pi}(q_{t}) = \sum_{a,o} P(a \mid \pi) P(o \mid q_{t}, \mathrm{do}~ a) \left[ \phi^{\pi(a_{t}, o_{t})}(q_{t}) \right]$
$A^{\pi}(q_{t})=\sum_{a,o} P(a \mid \pi) P(o \mid q_{t}, \mathrm{do}~ a) \left[ A^{\pi(a,o)}(q_{t}) \right]$
***Inductive case: H $>$ t***
$\phi^{\pi(q_{t+1})} = \mathbb E_\pi \left[ \phi^{\pi(a_{t+1},o_{t+1})} (q_{t+1}) + \gamma \phi^{\pi(a_{t+1}, o_{t+1})}(q_{t+2}) \right]$
$\phi^{\pi(q_{t+1})} = \sum_{a,o} P(a \mid \pi) P(o \mid q_{t}, \mathrm{do}~ a) \left[ \phi^{\pi(a_{t+1}, o_{t+ 1})}(q_{t+1}) + \gamma \phi^{\pi(a_{t+1}, o_{t+1})}(q_{t+2}) \right]$
$\phi^{\pi(q_{t+1})}=\sum_{a,o} P(a \mid \pi) P(o \mid q_{t}, \mathrm{do}~ a) \left[ \phi^{\pi(a_{t+1}, o_{t+ 1})}(q_{t+1}) + \gamma \phi^{\pi(a_{t+1}, o_{t+1})} \frac{T_{ao} q_{t+1}}{ P(o \mid q_{t}, \mathrm{do}~ a)} \right]$
$A^{\pi}(q_{t+1}) =\sum_{a,o} P(a \mid \pi) P(o \mid q_{t}, \mathrm{do}~ a) \left[ A^{\pi(a_{t+ 1}, o_{t+1})}(q_{t+1}) + \gamma A^{\pi(a_{t+1}, o_{t+1})} \frac{T_{ao} q_{t+1}}{ P(o \mid q_{t}, \mathrm{do}~ a)} \right]$
$A^{\pi}(q_{t+1}) = \sum_{a,o} P(a \mid \pi) \left[ P(o \mid q_{t}, \mathrm{do}~ a) + \gamma T_{ao}\right] A^{\pi(a_{t+1}, o_{t+1})}(q_{t+1})$
## Related work
* Predictive State Representations with Options (https://web.eecs.umich.edu/~baveja/Papers/Wolfe_ICML06.pdf)
</span>