Successor Feature Sets with Options

# Successor Feature Sets with Options ## 11/11/2020 *********************** * Think about the backup fully * Think about the how to compute this efficiently * High-level idea: * $B_i^{\pi_k}B_i^{\pi}$ takes us from root to $i$ - leaf to $j$ leaf * Grafting our old trees onto our new trees Suppose we have leaves indexed by $i\in \{1,\ldots,k\}$ for the initial policy tree $\pi$ Each leaf has a matrix $B_i^\pi$ Suppose we graft policies $\pi_i$ onto leaves $i$ Suppose $\pi_i$ has leaves indexed by $(i,j)$ for $j\in\{1,\ldots,k_i\}$ Each leaf has a matrix $B^{\pi_i}_j$ Then the overall tree will have $\sum_{i=1}^k k_i$ leaves, indexed by $(i, j)$ with $i\in\{1,\ldots,k\}$ and $j\in\{1,\ldots, k_i\}$ Associated with leaf $(i,j)$ is matrix $B^{\pi_i}_jB_i^\pi$ Base cases: a one step policy tree with root action $a$ has one leaf per observation with $B_o = \gamma T_{ao}$; an empty policy tree has a single leaf with $B=I$. $A^{\pi} = A^{\pi'} + \sum_{i=1}^k A^{\pi_k} B_i^{\pi_k}B_i^{\pi}$ <span style="color:grey"> ## 11/11/2020 *********************** * TODO * Write down the recusion $B_i^{\pi'} = \gamma^{t_i} \prod_{s=1}^{t_i} T_{a^i_so^i_s}$ * Write down the inefficient backup equation * <s> Clean up the proof </s> ## 10/21/2020 *********************** Base case: (empty tree is linear) $\phi^{\pi}(q) = \mathbb E_{\pi}[ \vec{0}]$ (vacuous truth) Base case: (tree with one node is linear) $\phi^{\pi}(q) = \mathbb E_{\pi} [ f(s,a) ]$ $\phi^{\pi}(q) = F_{a(\pi)} q$ Inductive case: Assume we are given a tree $\pi$, and we have labeled subtrees $\pi_1, \pi_2, \ldots, \pi_k$ within $\pi$. Assume that none of the $\pi_i$ contains any other $\pi_j$. Write $\pi'$ for the result of removing all of the subtrees $\pi_i$ from $\pi$. Assume that neither $\pi'$ nor any of $\pi_1,\ldots,\pi_k$ are equal to the entire original tree $\pi$. By induction, we know that the feature functions $\phi^{\pi_1}(q), \ldots, \phi^{\pi_k}(q)$ are linear in $q$, as is the feature function $\phi^{\pi'}(q)$. For each of the subtrees $\pi_i$, write $a^i_1, o^i_1, a^i_2, o^i_2, \ldots, a^i_{t_i}, o^i_{t_i}$ for the path from the root of $\pi$ down to $\pi_i$. Here $t_i$ is the depth of the root of $\pi_i$ within $\pi$. And, write $q'_i$ for the predictive state if we reach $\pi_i$. $\phi^{\pi}(q_t) = \phi^{\pi'}(q_t) + \sum_{i=1}^k \gamma^{t_i} \left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') P(o^i_s \mid q_{s},\text{do}~a^i_s) \right)\phi^{\pi_k}(q'_i)$ Since policy trees are determinstic, $\left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') \right) = 1$ $\phi^{\pi}(q_t) = \phi^{\pi'}(q_t) + \sum_{i=1}^k \gamma^{t_i} P(o^i_s \mid q_{s},\text{do}~a^i_s) \phi^{\pi_k}(q'_i)$ We can compute $q'_{i} = \left[ \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t \right] / \left[ \prod_{s=1}^{t_i}P(o^i_s \mid q_{s},\text{do}~a^i_s)\right]$ By the inductive hypothesis $\phi^{\pi_1}(q) = A^{\pi_1}q, \ldots, \phi^{\pi_k} =A^{\pi_k}q$ and $\phi^{\pi'}(q) = A^{\pi'}q$. Substituting these expressions in and substituting for the $q_{t_i}$ expression above. $\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i} P(o^i_s \mid q_{s},\text{do}~a^i_s) A^{\pi_k}q'_i$ $\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i} A^{\pi_k}q'_i$ $\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i} A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$ $\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$ We can observe that the RHS is a linear function of $q_t$, which completes our inductive proof of linearity. Becasue of linearity, there exists a matrix $A^{\pi}$ such that $\phi^{\pi}(q) = A^{\pi}(q)$. With this notation $A^{\pi}{q_t} = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$ Because the above equation must hold for any predictive state q_t, we get $A^{\pi} = A^{\pi'} + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right)$ ## 10/20/2020 *********************** Base case: (empty tree is linear) $\phi^{\pi}(q) = \mathbb E_{\pi}[ \vec{0} + \gamma \sum_{o} \vec{0}]$ (vacuous truth) Base case: (tree with one node is linear) $\phi^{\pi}(q) = \mathbb E_{\pi} [ f(s,a) + \gamma \sum_{o} \vec{0} ]$ $\phi^{\pi}(q) = \mathbb E_{\pi} [ f(s,a) ]$ $\phi^{\pi}(q) = F_{a(\pi)} q$ Inductive case: Assume we are given a tree $\pi$, and we have labeled subtrees $\pi_1, \pi_2, \ldots, \pi_k$ within $\pi$. Assume that none of the $\pi_i$ contains any other $\pi_j$. Write $\pi'$ for the result of removing all of the subtrees $\pi_i$ from $\pi$. Assume that neither $\pi'$ nor any of $\pi_1,\ldots,\pi_k$ are equal to the entire original tree $\pi$. By induction, we know that the feature functions $\phi^{\pi_1}(q), \ldots, \phi^{\pi_k}(q)$ are linear in $q$, as is the feature function $\phi^{\pi'}(q)$. For each of the subtrees $\pi_i$, write $a^i_1, o^i_1, a^i_2, o^i_2, \ldots, a^i_{t_i}, o^i_{t_i}$ for the path from the root of $\pi$ down to $\pi_i$. Here $t_i$ is the depth of the root of $\pi_i$ within $\pi$. And, write $q'_i$ for the predictive state if we reach $\pi_i$. $\phi^{\pi}(q_t) = \phi^{\pi'}(q_t) + \sum_{i=1}^k \gamma^{t_i} \left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') P(o^i_s \mid q_{s},\text{do}~a^i_s) \right)\phi^{\pi_k}(q'_i)$ [see note * below] We can compute $q'_{i} = \left[ \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t \right] / \left[ \prod_{s=1}^{t_i}P(o^i_s \mid q_{s},\text{do}~a^i_s)\right]$ [use notation $q'_i$ below] By the inductive hypothesis $\phi^{\pi_1}(q) = A^{\pi_1}q, \ldots, \phi^{\pi_k} =A^{\pi_k}q$ and $\phi^{\pi'}(q) = A^{\pi'}q$. Substituting these expressions in and substituting for the $q_{t_i}$ expression above. $\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i} \left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') P(o^i_s \mid q_{s},\text{do}~a^i_s) \right) A^{\pi_k}q_{t_i}$ $\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i} \left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') \right) \left(\prod_{s=1}^{t_i} P(o^i_s \mid q_{s},\text{do}~a^i_s) \right) A^{\pi_k}q_{t_i}$ $\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i} \left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') \right) A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$ Since policy trees are determinstic, $\left(\prod_{s=1}^{t_i} P(a^i_s \mid \pi') \right) = 1$ [use this to simplify at * above] $\phi^{\pi}(q_t) = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$ We can observe that the RHS is a linear function of $q_t$, which completes our inductive proof of linearity. Becasue of linearity, there exists a matrix $A^{\pi}$ such that $\phi^{\pi}(q) = A^{\pi}(q)$. With this notation $A^{\pi}{q_t} = A^{\pi'}q_t + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right) q_t$ Because the above equation must hold for any predictive state q_t, we get $A^{\pi} = A^{\pi'} + \sum_{i=1}^k \gamma^{t_i}A^{\pi_k} \left(\prod_{s=1}^{t_i} T_{a^i_so^i_s} \right)$ === so we need to keep track of $B_i^{\pi'} = \gamma^{t_i} \prod_{s=1}^{t_i} T_{a^i_so^i_s}$ for each leaf of $\pi'$ --- ## 10/13/2020 *********************** Base case: (empty tree is linear) $\phi^{\pi}(q) = \mathbb E_{\pi}[ \vec{0} + \gamma \sum_{o} \vec{0}]$ (vacuous truth) Base case: (tree with one node is linear) $\phi^{\pi}(q) = \mathbb E_{\pi} [ f(s,a) + \gamma \sum_{o} \vec{0} ]$ $\phi^{\pi}(q) = \mathbb E_{\pi} [ f(s,a) ]$ $\phi^{\pi}(q) = F_{a(\pi)} q$ Inductive case: Assume we are given a tree $\pi$, and we have labeled subtrees $\pi_1, \pi_2, \ldots, \pi_k$ within $\pi$. Assume that none of the $\pi_i$ contains any other $\pi_j$. Write $\pi'$ for the result of removing all of the subtrees $\pi_i$ from $\pi$. Assume that neither $\pi'$ nor any of $\pi_1,\ldots,\pi_k$ are equal to the entire original tree $\pi$. By induction, we know that the feature functions $\phi^{\pi_1}(q), \ldots, \phi^{\pi_k}(q)$ are linear in $q$, as is the feature function $\phi^{\pi'}(q)$. For each of the subtrees $\pi_i$, write $a^i_1, o^i_1, a^i_2, o^i_2, \ldots, a^i_{t_i}, o^i_{t_i}$ for the path from the root of $\pi$ down to $\pi_i$. Here $t_i$ is the depth of the root of $\pi_i$ within $\pi$. $\phi^{\pi'}(q_t) + \sum_{i=1}^k \gamma^{t_i} P(reach \pi_i)\phi^{\pi_k}(q \text{ if we reach i})$ q if we reach $i$ = $u^{T}\left[\prod_{s=1}^{t_i} T_{a^i_so^i_s}\right] q_t / P($ if we reach $i)$ --- $q_{t+1} = T_{ao} q_t / P(o\mid q_t, \text{do}~a)$ $P(o\mid q_t, \text{do}~a) q_{t+1} = T_{ao} q_t$ $T_{a'o'} P(o\mid q_t, \text{do}~a) q_{t+1} = T_{a'o'} T_{ao} q_t$ $P(o\mid q_t, \text{do}~a) T_{a'o'} q_{t+1} = T_{a'o'} T_{ao} q_t$ $P(o\mid q_t, \text{do}~a) [P(o'\mid q_{t+1}, \text{do}~a') q_{t+2}] = T_{a'o'} T_{ao} q_t$ $P(o\mid q_t, \text{do}~a) P(o'\mid q_{t+1}, \text{do}~a') u^T q_{t+2} = u^T T_{a'o'} T_{ao} q_t$ $P(o\mid q_t, \text{do}~a) P(o'\mid q_{t+1}, \text{do}~a') = u^T T_{a'o'} T_{ao} q_t$ --- $\phi_\psi(q_{t})= \mathbb E_{\pi}[ \phi^{\pi'}(q_{t}) + \gamma \sum_{o} \phi^{\pi(o)}(q_{t+1})]$ $\phi_\psi(q_t)= \mathbb E_{\pi}[ \phi^{\pi(a)}(q_t)] + \gamma \mathbb E_{\pi}[ \sum_{o} \phi^{\pi(o)}(q_{t+1}) ]$ $\phi_\psi(q_t)= P(a | \pi) \phi^{\pi(a)}(q_t) + \gamma P(o\mid q_t, \mathrm{do}~a) \sum_{o} \phi^{\pi(o)}(q_{t+1})$ $\phi_\psi(q_t) = P (a | \pi) \phi^{\pi(a)}(q_t) + \gamma \sum_{o \in k} \phi^{\pi(o)}_{o}(q_t+1) + \gamma \sum_{o \notin k} \phi^{\pi(o)}(q_t+1)$ **[$\sum_{o \notin k} \phi^{\pi(o)}(q_t+1) = \vec{0}$ (vacuous truth)]** $\phi_\psi(q_t) = P (a | \pi) \phi^{\pi(a)}(q_t) + \gamma \sum_{o \in k} \phi^{\pi(o)}_o(q_t+1)$ **[inductive assumption $\phi^{\pi}_{1}, ......, \phi^{\pi}_{k}$ are linear in $q$]** let $\mu q_t = \sum_{o \in k} \phi^{\pi(o)}_o(q_t+1)$ $\phi_\psi(q_t) = P (a | \pi) \phi^{\pi(a)}(q_t) + \gamma \mu q_t$ **[substitute $\phi^{\pi(a)}q_t = A^{\pi(a)} q_t$]** $\phi_\psi(q_t) = P (a | \pi) A^{\pi(a)} q_t + \gamma \mu q_t$ $\phi_\psi(q_t) = \left[ P (a | \pi) A^{\pi(a)} + \gamma \mu \right] q_t$ **[Instead of the induction assumptio I can also substitute $q_t+1 = q_t T_ao /P( o | q_t, do a)$]** ## 10/2/2020 *********************** **Base case: H $<$ t** (essentially im trying to say that there are no expected features after the options ends) $\phi^{\pi}(q_{t}) = \mathbb E_\pi \left[ \vec{0} \right] = \sum_{a} P(a|\pi) * \vec{0} = \vec{0}$ **Base case: H $=$ t** (trying to figure out how to write the immediate feature) $\phi^{\pi}(q_{t}) = \mathbb E_\pi \left[ \phi^{\pi(a_{t},o{t})}(q_{t}) \right]$ $\phi^{\pi}(q_{t}) = \sum_{a,o} P(a \mid \pi) P(o \mid q_{t}, \mathrm{do}~ a) \left[ \phi^{\pi(a_{t}, o_{t})}(q_{t}) \right]$ $A^{\pi}(q_{t})=\sum_{a,o} P(a \mid \pi) P(o \mid q_{t}, \mathrm{do}~ a) \left[ A^{\pi(a,o)}(q_{t}) \right]$ ***Inductive case: H $>$ t*** $\phi^{\pi(q_{t+1})} = \mathbb E_\pi \left[ \phi^{\pi(a_{t+1},o_{t+1})} (q_{t+1}) + \gamma \phi^{\pi(a_{t+1}, o_{t+1})}(q_{t+2}) \right]$ $\phi^{\pi(q_{t+1})} = \sum_{a,o} P(a \mid \pi) P(o \mid q_{t}, \mathrm{do}~ a) \left[ \phi^{\pi(a_{t+1}, o_{t+ 1})}(q_{t+1}) + \gamma \phi^{\pi(a_{t+1}, o_{t+1})}(q_{t+2}) \right]$ $\phi^{\pi(q_{t+1})}=\sum_{a,o} P(a \mid \pi) P(o \mid q_{t}, \mathrm{do}~ a) \left[ \phi^{\pi(a_{t+1}, o_{t+ 1})}(q_{t+1}) + \gamma \phi^{\pi(a_{t+1}, o_{t+1})} \frac{T_{ao} q_{t+1}}{ P(o \mid q_{t}, \mathrm{do}~ a)} \right]$ $A^{\pi}(q_{t+1}) =\sum_{a,o} P(a \mid \pi) P(o \mid q_{t}, \mathrm{do}~ a) \left[ A^{\pi(a_{t+ 1}, o_{t+1})}(q_{t+1}) + \gamma A^{\pi(a_{t+1}, o_{t+1})} \frac{T_{ao} q_{t+1}}{ P(o \mid q_{t}, \mathrm{do}~ a)} \right]$ $A^{\pi}(q_{t+1}) = \sum_{a,o} P(a \mid \pi) \left[ P(o \mid q_{t}, \mathrm{do}~ a) + \gamma T_{ao}\right] A^{\pi(a_{t+1}, o_{t+1})}(q_{t+1})$ ## Related work * Predictive State Representations with Options (https://web.eecs.umich.edu/~baveja/Papers/Wolfe_ICML06.pdf) </span>

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.