# SEAGULL-Egocentric Perception

[TOC]

## Panoptic Segmentation

### Architecture

[Mask2Former-Swin-Base-IMG21K](https://github.com/open-mmlab/mmdetection/tree/master/configs/mask2former)

Input:
- RGB image with size 900x900

Output:
- Semantic segmentation
- Object detection results, each with bbox/mask/confidence score

### Results

## State Estimator

### Architecture

Preprocess:
- Ignore image frames smaller than 1000 pixels
- Enlarge the BBox by 30% on each side to include more context

Input:
- Object-centric image
- Class number for the object

Model:
- Object category => category embedding, 8 dim
- Image feature => CLIP ViT-B/32 => projection => image embedding, 128 dim
- Two embeddings (128 + 8) dim => MLP => 128 dim => 3 (one output per attribute)
- For each attribute, sigmoid
- Loss is only calculated on attributes that actually have a GT

Stats:
- Metric:
  - Acc
  - Recall
  - F1
- Cat:
  - Per attribute

`python Ultimate_Evaluator.py --model_path /scratch/chaijy_root/chaijy2/jiayipan/outputs/model-mlm32-all-weights-open/AOA_Checkpoints/checkpoint_step270000.pth --log_path /scratch/chaijy_root/chaijy2/jiayipan/outputs/model-mlm32-all-weights-open/AOA_Results/checkpoint_step270000.pth`

### Results

#### Val-seen

```
====isDirty====
TP: 16667, TN: 19273, FP: 1122, FN: 439
Precision: 0.937
Recall: 0.974
Accuracy: 0.958
Balanced Accuracy: 0.960
F1: 0.955
====isFilledWithWater====
TP: 15431, TN: 13763, FP: 4133, FN: 1972
Precision: 0.789
Recall: 0.887
Accuracy: 0.827
Balanced Accuracy: 0.828
F1: 0.835
====isToggled====
TP: 9175, TN: 17014, FP: 610, FN: 7205
Precision: 0.938
Recall: 0.560
Accuracy: 0.770
Balanced Accuracy: 0.763
F1: 0.701
```

#### Val-unseen

```
====isDirty====
TP: 48546, TN: 63073, FP: 2400, FN: 1657
Precision: 0.953
Recall: 0.967
Accuracy: 0.965
Balanced Accuracy: 0.965
F1: 0.960
====isFilledWithWater====
TP: 50297, TN: 43598, FP: 8792, FN: 6210
Precision: 0.851
Recall: 0.890
Accuracy: 0.862
Balanced Accuracy: 0.861
F1: 0.870
====isToggled====
TP: 30965, TN: 60583, FP: 3093, FN: 27988
Precision: 0.909
Recall: 0.525
Accuracy: 0.747
Balanced Accuracy: 0.738
F1: 0.666
```

## Depth Estimator

From [HLSM](https://github.com/valtsblukis/hlsm)
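
As a cross-check on the State Estimator results, the per-attribute metrics can be recomputed from the reported TP/TN/FP/FN counts. The snippet below is a minimal sketch, not the code in `Ultimate_Evaluator.py`: the function name `binary_metrics` is hypothetical, and it assumes Balanced Accuracy is the mean of recall and specificity (the usual definition). It does not guard against empty classes (division by zero).

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute binary classification metrics from confusion counts.

    Hypothetical helper for illustration; assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "Precision": precision,
        "Recall": recall,
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Assumed definition: average of the two per-class recalls.
        "Balanced Accuracy": (recall + specificity) / 2,
        "F1": 2 * precision * recall / (precision + recall),
    }


# Counts copied from the Val-seen isDirty results.
m = binary_metrics(tp=16667, tn=19273, fp=1122, fn=439)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With the Val-seen `isDirty` counts, this reproduces the reported values (0.937, 0.974, 0.958, 0.960, 0.955). The same call works for the other attributes and for Val-unseen; for `isToggled`, the large FN count is what pulls recall down to 0.560 while precision stays high.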