---
title: F-formation research - UET AI lab
---
**Available data**.
* Upstream task data
* Personality data
* 3D position and pose data
* F-formation labels
**Problems**.
* The amount of data is limited
* Occluded people still have position labels, so their positions may be mapped to the wrong people
**Survey**.
* Multitask learning.
* *Possible tasks*. Detection, tracking, human pose estimation, and head pose estimation
* *Ideas*.
* Train upstream tasks then downstream task
* Train upstream tasks and downstream task simultaneously
* Multi-dataset training
* *Model*. The model outputs a positional encoding and an object embedding; nearby object embeddings are then grouped into an F-formation, e.g. via KMeans
* Embedding learning.
* *Task-specific embedding*. Train with downstream tasks
* *Task-invariant embedding model*. Self-supervised learning
* *Hybrid model*. Hybrid of two approaches above
* Generative modeling.
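The embed-then-cluster idea above can be sketched as follows. Assume a (hypothetical) multitask model has already produced one embedding per detected person; grouping nearby embeddings with a minimal hand-rolled KMeans then yields candidate F-formation groups. All shapes and values here are illustrative.

```python
import numpy as np

def kmeans_groups(embeddings, k, iters=20, seed=0):
    """Minimal Lloyd's KMeans over per-person embeddings.

    embeddings: (n_people, d) array from a hypothetical multitask model.
    Returns an (n_people,) label array; people sharing a label form one
    candidate F-formation group.
    """
    rng = np.random.default_rng(seed)
    centers = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(iters):
        # assign each embedding to its nearest center
        dists = np.linalg.norm(embeddings[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        # recompute centers; keep the old center if a cluster empties
        for j in range(k):
            if (labels == j).any():
                centers[j] = embeddings[labels == j].mean(axis=0)
    return labels

# two well-separated mock groups of people
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
labels = kmeans_groups(emb, k=2)
```

In practice KMeans needs the number of groups `k`, which is unknown at test time; a density-based method or a distance threshold on embeddings would avoid that, but the notes name KMeans, so it is used here.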
**Ideas**.
* To be robust and deployable, we must not use 3D-related data, e.g. 3D positions, 3D poses, camera calibration coefficients, etc.
$\to$ All of the data given in SALSA is therefore unusable
* Include temporal information via LSTM, attention, or 3D convolution
$\to$ No data for this
* Apply some trick to obtain "more data"
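One such trick (an assumption, not spelled out in the notes): since the pairwise distances used below are invariant to rigid transforms of the scene, each training sample can be augmented by random rotation, translation, and mirroring of the 2D positions without changing its F-formation labels.

```python
import numpy as np

def augment_positions(pos, rng):
    """Random rigid transform plus optional mirror of (n, 2) positions.

    Pairwise distances (and hence F-formation labels) are preserved,
    so every augmented copy is an extra training sample for free.
    """
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    out = pos @ rot.T + rng.uniform(-1, 1, size=2)  # rotate + translate
    if rng.random() < 0.5:                          # mirror across the y-axis
        out[:, 0] = -out[:, 0]
    return out

rng = np.random.default_rng(0)
pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
aug = augment_positions(pos, rng)

# pairwise distance matrices before and after augmentation
d_orig = np.linalg.norm(pos[:, None] - pos[None], axis=-1)
d_aug = np.linalg.norm(aug[:, None] - aug[None], axis=-1)
```

Note that mirroring flips the sign of angular quantities, so if signed angular distances are used as features, the mirror step should be dropped or the angles negated accordingly.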
**Final**.
* *Step 1*. Use a generative model to produce Gaussians over pairwise distance and pairwise angular distance
* *Step 2*. For each vertex pair, derive two Gaussians, one for being in the same F-formation and one for not, then compare the two probabilities
* *Contrastive learning*. Input triples of (anchor, positive, negative)
* *Input*.
* *Edge*. Distance, angular distance
* *Vertex*.
* *Sampling method*. Use hard sampling
* *Step 3*. Some additional optimization algorithm
* *Loss function for F-formation*.
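Step 2's comparison can be sketched as a likelihood-ratio test over a single pairwise feature. The Gaussian parameters below are hand-picked placeholders; in the plan above they would come from the generative model of Step 1, and the angular-distance feature would be handled the same way.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a 1D Gaussian N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def same_formation(dist, in_params=(0.9, 0.4), out_params=(3.0, 1.0)):
    """Step 2 for one vertex pair: compare the 'same F-formation' and
    'different F-formation' Gaussians and keep the likelier hypothesis.
    Parameters are illustrative stand-ins for the generative model's output.
    """
    p_in = gauss_pdf(dist, *in_params)
    p_out = gauss_pdf(dist, *out_params)
    return p_in > p_out

near = same_formation(1.0)   # a close pair
far = same_formation(4.0)    # a distant pair
```

A pairwise decision like this only links vertices; Step 3's "additional optimization" would still be needed to turn consistent pairwise links into whole groups (e.g. connected components or graph clustering).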
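For the contrastive step, a standard choice consistent with the (anchor, positive, negative) triples above is a margin-based triplet loss; the notes do not fix a loss, so this numpy sketch, with an assumed margin and toy 2D embeddings, is only one plausible instantiation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss: pull the positive (same F-formation)
    toward the anchor, push the negative at least `margin` farther away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])   # anchor embedding
p = np.array([0.1, 0.0])   # positive: sample from the same F-formation
n = np.array([3.0, 0.0])   # negative: sample from another group
loss = triplet_loss(a, p, n)
```

Hard sampling then means choosing, per anchor, the negatives with the smallest `d_neg` (and/or positives with the largest `d_pos`), since easy triples give zero loss and no gradient.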