---
title: F-formation research - UET AI lab
---

**Available data**.

* Upstream task data
* Personality data
* 3D position and pose data
* F-formation labels

**Problems**.

* The amount of data is limited
* Occluded people still have position annotations, so their positions may be assigned to the wrong people

**Survey**.

* Multitask learning.
    * *Possible tasks*. Detection, tracking, human pose estimation, and head pose estimation
    * *Ideas*.
        * Train upstream tasks first, then the downstream task
        * Train upstream tasks and the downstream task simultaneously
        * Multi-dataset training
    * *Model*. The model outputs a positional encoding and an object embedding; nearby object embeddings then correspond to an F-formation group, e.g. via KMeans
* Embedding learning.
    * *Task-specific embedding*. Train with downstream tasks
    * *Task-invariant embedding*. Self-supervised learning
    * *Hybrid model*. Hybrid of the two approaches above
* Generative modeling.

**Ideas**.

* To be robust and deployable, we must not rely on 3D-related data, e.g. 3D positions, 3D poses, camera calibration coefficients, etc. $\to$ all of the given SALSA data is unusable
* Include temporal information via LSTM, attention, or 3D convolution $\to$ no data for this
* Apply some trick to obtain "more data"

**Final**.

* *Step 1*. Use a generative model to produce Gaussians over pairwise distance and pairwise angular distance
* *Step 2*. For each vertex pair, derive two Gaussians (in-F-formation or not), then compare the two probabilities
    * *Contrastive learning*. Input triplets of (anchor, positive, negative)
    * *Input*.
        * *Edge*. Distance, angular distance
        * *Vertex*.
    * *Sampling method*. Use hard sampling
* *Step 3*. Some additional optimization algorithm
* *Loss function for F-formation*.
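The grouping step mentioned under the multitask-learning model (close object embeddings form one F-formation group) could be sketched as below. This is a minimal NumPy-only k-means, assuming the model has already produced one embedding vector per person; the embeddings here are hypothetical toy data.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each embedding to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned embeddings
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical per-person embeddings: two well-separated groups.
emb = np.array([[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
                [5.0, 5.1], [5.1, 4.9]])
groups = kmeans(emb, k=2)
# People sharing a label are assigned to the same F-formation.
```

In practice the number of groups `k` is unknown per frame, so a method that does not fix `k` in advance (e.g. density-based clustering) may be a better fit; k-means is used here only to illustrate the idea.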
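Step 2's comparison of two Gaussians for a vertex pair could look like the sketch below: one Gaussian over pairwise distance for "same F-formation", one for "different", and the pair is classified by whichever density is larger. The means and standard deviations here are made-up placeholders for what the generative model in Step 1 would output.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical parameters a generative model might produce for the
# pairwise-distance edge feature (in metres):
IN_MU, IN_SIGMA = 1.0, 0.4     # people in the same group stand close
OUT_MU, OUT_SIGMA = 3.5, 1.5   # unrelated people stand farther apart

def same_group(distance):
    """Classify a vertex pair by comparing the two likelihoods."""
    p_in = gaussian_pdf(distance, IN_MU, IN_SIGMA)
    p_out = gaussian_pdf(distance, OUT_MU, OUT_SIGMA)
    return p_in > p_out
```

The same comparison would be repeated for the angular-distance edge feature, and the per-feature likelihoods combined (e.g. multiplied, under an independence assumption) before thresholding.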
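The contrastive-learning ingredient with (anchor, positive, negative) triplets and hard sampling could be sketched as follows; this is a standard triplet margin loss on embedding vectors, not necessarily the exact loss intended for the F-formation objective, and the vectors below are toy data.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: pull the positive closer to the
    anchor than the negative by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def hardest_negative(anchor, negatives):
    """Hard sampling: the negative closest to the anchor violates the
    margin most and therefore gives the most informative gradient."""
    dists = np.linalg.norm(negatives - anchor, axis=1)
    return negatives[dists.argmin()]

anchor = np.array([0.0, 0.0])
positive = np.array([0.0, 0.1])
negatives = np.array([[1.0, 0.0], [5.0, 5.0]])
neg = hardest_negative(anchor, negatives)
loss = triplet_loss(anchor, positive, neg)
```

Mining the hardest negative within each batch (rather than sampling at random) is the usual way to keep the loss from collapsing to zero once most triplets are easy.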