# Writing - OORL - Attention version
## Note: Symbols (04/2021)
- Environment
- $\mathbb{L}$ = object library
- $N = |\mathbb{L}|$ = number of objects in the library
	- $K$ $(2 \le K \le N)$ = number of objects in each scene (fixed across scenes)
- Model (GNN + self-attention)
	- $s_t \in \mathbb{R}^{50 \times 50}$ = state image input at time $t$
	- $m_t \in \mathbb{R}^{K \times 5 \times 5}$ = object masks / feature maps for the $K$ objects (assuming a $10\times 10$ filter)
- $m_k \in \mathbb{R}^{5 \times 5}$ is the feature map of object $k \in [K]$
- **(We are still deciding if the mask should have $K$ or $N$ objects.)**
- $z_t \in \mathbb{R}^{K \times D}$, where $D$ is the feature dimension for each object
- $z_k \in \mathbb{R}^{D}$ is the embedding for each object
- $\mathbf{Q}, \mathbf{K}, \mathbf{V}$ = query, key, and value matrices in self-attention
- Element-wise example: $\text{key}_k = \text{MLP}_{\mathbf{K}}(m_k) \in \mathbb{R}^N$ (or $\mathbf{k}_k$)
- Every key or query is $N$-dimensional
	- Conceptually, it is useful to think of $N$ keys and values, one per library object; in the implementation, however, we do not care about the embeddings (locations) of invisible objects (i.e., the unchosen objects in the library).
- $\mathbf{Q} \in \mathbb{R}^{K \times N}$ has $K$ visible objects of $N$-dimensional query embeddings
- **(For key and value, we are still deciding the dimension for $K$ or $N$ objects.)**
- $\mathbf{K} \in \mathbb{R}^{N \times N}$ may have all $N$ objects of $N$-dimensional key embeddings
- $\mathbf{V} \in \mathbb{R}^{N \times D}$ may have all $N$ objects of $D$-dimensional value embeddings (the same dimension as $z_k$)
- $\text{index}_i, i \in [N]$
- $\text{key}_n, \text{value}_n, n \in [N]$
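The shapes above can be checked with a minimal NumPy sketch. The MLP encoders are stubbed with random linear maps, and the key/value matrices for all $N$ library objects are filled with random values; all of these are illustrative assumptions, not the actual model:

```python
import numpy as np

# Assumed sizes (illustrative only)
N, K, D = 10, 3, 16              # library size, visible objects, feature dim

rng = np.random.default_rng(0)
z = rng.normal(size=(K, D))      # z_t: embeddings of the K visible objects

# Stand-in for the query MLP (a hypothetical linear map)
W_q = rng.normal(size=(D, N))
Q = z @ W_q                      # (K, N): one N-dim query per visible object

Kmat = rng.normal(size=(N, N))   # keys for all N library objects
V = rng.normal(size=(N, D))      # values for all N library objects

# Scaled dot-product attention: each visible object attends over the library
logits = Q @ Kmat.T / np.sqrt(N)                  # (K, N)
attn = np.exp(logits - logits.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)           # row-wise softmax
out = attn @ V                                    # (K, D): same shape as z_t

assert out.shape == (K, D)
```

Note the asymmetry this makes explicit: queries exist only for the $K$ visible objects, while keys and values may cover the full library, so the attention output stays $K \times D$ regardless of that choice.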
## Self-attention
- Background / Current case
	- We aim to study compositional generalization, which requires the policy mapping $\pi: \mathcal{S} \rightarrow \Delta(\mathcal{A})$ to be equivariant w.r.t. object replacement.
	- Thus, to achieve equivariant policies, we first study the equivariance property of the (deterministic) transition model $T: \mathcal{S} \times \mathcal{A} \rightarrow \mathcal{S}$.
- For object-oriented environments, graph neural networks (GNNs) are commonly used to model the object-factorized dynamics by considering objects as graph nodes and objects' interactions as edges connecting objects.
	- It is known that fully-connected GNNs (i.e. assuming all objects interact) are equivariant w.r.t. the labeling of nodes.
	- Since each scene contains $K$ objects, an object-factorized GNN is equivariant w.r.t. the permutation symmetry group $S_K$.
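The permutation equivariance of a fully-connected message-passing layer can be verified numerically. Below is a minimal sketch, assuming the node/edge MLPs are replaced by fixed linear maps (an assumption for illustration; any shared per-node/per-edge function would behave the same way):

```python
import numpy as np

rng = np.random.default_rng(1)
K, D = 4, 8                          # objects in the scene, feature dim
W_node = rng.normal(size=(D, D))     # stand-in for the node-update MLP
W_edge = rng.normal(size=(D, D))     # stand-in for the edge/message MLP

def gnn_layer(z):
    """One fully-connected message-passing step: each node aggregates
    messages from all other nodes with shared parameters."""
    edge_out = z @ W_edge
    msgs = edge_out.sum(axis=0, keepdims=True) - edge_out  # sum over j != i
    return z @ W_node + msgs

z = rng.normal(size=(K, D))
perm = rng.permutation(K)

# Equivariance: permuting the input nodes permutes the output the same way
lhs = gnn_layer(z[perm])
rhs = gnn_layer(z)[perm]
assert np.allclose(lhs, rhs)
```

The check passes because the sum aggregation is permutation-invariant and all node/edge parameters are shared across objects; this is exactly the $S_K$ equivariance claimed above.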
- Motivation
	- Although an object-factorized GNN can be permutation equivariant, it may fail to be equivariant to the object replacement symmetry $S_K \times S_N$.
- Motivating results
	- We show illustrative results below
- Illustrative Examples
- (my object examples)
- Idea - Self-attention version
- .
- Implementation
- Examples