# SEAGULL-Egocentric Perception
[TOC]
## Panoptic Segmentation
### Architecture
[Mask2Former-Swin-Base-IMG21K](https://github.com/open-mmlab/mmdetection/tree/master/configs/mask2former)
Input:
- RGB image with size 900x900
Output:
- Semantic segmentation
- Object detection results, each with a bbox, mask, and confidence score
### Results
## State Estimator
### Architecture
Preprocess:
- Ignore image frames smaller than 1000 pixels
- Enlarge the bbox by 30% on each side to include more context
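
A minimal sketch of the two preprocessing steps above, under stated assumptions: "30% on each side" is read as growing each edge by 30% of the box width/height, the 1000-pixel threshold is read as the bbox area, and the result is clipped to the image. The names `enlarge_bbox` and `keep_crop` are hypothetical and not taken from the project code.

```python
def enlarge_bbox(box, img_w, img_h, ratio=0.3):
    """Grow an (x1, y1, x2, y2) box by `ratio` of its size on every side.

    Assumption: the enlarged box is clipped to the image bounds.
    """
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * ratio  # horizontal margin added on the left and right
    dy = (y2 - y1) * ratio  # vertical margin added on the top and bottom
    return (
        max(0, int(round(x1 - dx))),
        max(0, int(round(y1 - dy))),
        min(img_w, int(round(x2 + dx))),
        min(img_h, int(round(y2 + dy))),
    )


def keep_crop(box, min_pixels=1000):
    """Assumption: a crop is kept when its bbox area is at least `min_pixels`."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1) >= min_pixels
```

For a 900x900 input, `enlarge_bbox((100, 100, 200, 300), 900, 900)` widens the box by 30 pixels horizontally and 60 pixels vertically on each side. Whether the size filter runs before or after enlargement is not stated here, so the two helpers are kept separate.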
Input:
- Object centric image
- Class number for the object
Model:
- Object category => category embedding, 8-dim
- Image features => CLIP ViT-B/32 => projection => image embedding, 128-dim
- Concatenated embeddings, (128 + 8)-dim => MLP => 128-dim => 3
- Sigmoid is applied per attribute
- Loss is computed only on attributes that actually have a GT label
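
The last two points (per-attribute sigmoid, loss only where a GT exists) can be sketched in plain Python as a masked binary cross-entropy. This is an illustration, not the project's training code: the function name `masked_bce`, the use of `None` for a missing label, and averaging over the labelled attributes are all assumptions.

```python
import math


def masked_bce(logits, targets):
    """Sigmoid + binary cross-entropy over labelled attributes only.

    logits:  one raw score per attribute (3 in the model above)
    targets: 0, 1, or None per attribute; None means "no GT" (assumption)
    """
    total, count = 0.0, 0
    for z, t in zip(logits, targets):
        if t is None:
            continue  # attributes without a GT contribute nothing
        # Numerically stable form of -[t*log(s(z)) + (1-t)*log(1-s(z))]
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
        count += 1
    # Assumption: mean over labelled attributes; 0.0 if nothing is labelled
    return total / count if count else 0.0
```

With this masking, an object that only has a GT for one attribute produces gradients for that output alone, and the scores predicted for the other attributes are ignored.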
Stats:
- Metric:
  - Acc
  - Recall
  - F1
- Cat:
  - Per attribute

`python Ultimate_Evaluator.py --model_path /scratch/chaijy_root/chaijy2/jiayipan/outputs/model-mlm32-all-weights-open/AOA_Checkpoints/checkpoint_step270000.pth --log_path /scratch/chaijy_root/chaijy2/jiayipan/outputs/model-mlm32-all-weights-open/AOA_Results/checkpoint_step270000.pth`
### Results
#### Val-seen
```
====isDirty====
TP: 16667, TN: 19273, FP: 1122, FN: 439
Precision: 0.937
Recall: 0.974
Accuracy: 0.958
Balanced Accuracy: 0.960
F1: 0.955
====isFilledWithWater====
TP: 15431, TN: 13763, FP: 4133, FN: 1972
Precision: 0.789
Recall: 0.887
Accuracy: 0.827
Balanced Accuracy: 0.828
F1: 0.835
====isToggled====
TP: 9175, TN: 17014, FP: 610, FN: 7205
Precision: 0.938
Recall: 0.560
Accuracy: 0.770
Balanced Accuracy: 0.763
F1: 0.701
```
#### Val-unseen
```
====isDirty====
TP: 48546, TN: 63073, FP: 2400, FN: 1657
Precision: 0.953
Recall: 0.967
Accuracy: 0.965
Balanced Accuracy: 0.965
F1: 0.960
====isFilledWithWater====
TP: 50297, TN: 43598, FP: 8792, FN: 6210
Precision: 0.851
Recall: 0.890
Accuracy: 0.862
Balanced Accuracy: 0.861
F1: 0.870
====isToggled====
TP: 30965, TN: 60583, FP: 3093, FN: 27988
Precision: 0.909
Recall: 0.525
Accuracy: 0.747
Balanced Accuracy: 0.738
F1: 0.666
```
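
The reported figures follow from the confusion counts using the standard definitions. The sketch below recomputes them for one attribute; the helper name `binary_metrics` is hypothetical and this is not the code in `Ultimate_Evaluator.py`.

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Val-unseen isToggled counts from the block above
m = binary_metrics(tp=30965, tn=60583, fp=3093, fn=27988)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

For isToggled, the gap between precision and recall comes from the large FN count relative to FP.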
## Depth Estimator
Taken from [HLSM](https://github.com/valtsblukis/hlsm)