>[!Caution]
>***Reached the character limit; part two is at the link below:***
>https://hackmd.io/@NYTCEE/Bks7KxioWl
# Note
Where to find different policies?
```
/mnt/train-data-1-hdd/naomi/lehome-challenge/.venv/lib/python3.11/site-packages/lerobot/policies
```
Where to find task descriptions?
```
/mnt/train-data-1-hdd/naomi/lehome-challenge/scripts/utils/parser.py
```
Where to modify video recording?
```
/mnt/train-data-1-hdd/naomi/lehome-challenge/scripts/utils/eval_utils.py
/mnt/train-data-1-hdd/naomi/lehome-challenge/scripts/utils/evaluation.py
```
```python
import logging
import os
from typing import Dict, List

import cv2
import numpy as np
import torch

# Assumption: the repo may use its own logger (see logger.py below).
logger = logging.getLogger(__name__)


def save_videos_from_observations(
    all_episode_frames: Dict[str, List[np.ndarray]],
    save_dir: str,
    episode_idx: int,
    success: torch.Tensor,
    fps: int = 30,
) -> None:
    """Save captured frames as MP4 videos."""
    # Sort the videos into success/ or failure/ by episode outcome.
    if success.item():
        target_dir = os.path.join(save_dir, "success")
    else:
        target_dir = os.path.join(save_dir, "failure")
    os.makedirs(target_dir, exist_ok=True)
    # One video per camera key; dots in the key become underscores.
    for key, frames in all_episode_frames.items():
        if len(frames) == 0:
            continue
        h, w, c = frames[0].shape
        out_path = os.path.join(
            target_dir, f"episode{episode_idx}_{key.replace('.', '_')}.mp4"
        )
        writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        for frame in frames:
            # Frames are stored as RGB; OpenCV writes BGR.
            frame_bgr = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
            writer.write(frame_bgr)
        writer.release()
        logger.info(f"Saved video: {out_path}")
```
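
Because the function above sorts each episode's videos into `success/` and `failure/`, a rough success count can be recovered from disk after a run. A minimal sketch, assuming the `episode{idx}_{key}.mp4` naming shown above and that episode indices are unique within one `save_dir` (not verified against the eval script); `count_saved_episodes` and `success_rate` are hypothetical helpers. Call `count_saved_episodes("<your save_dir>")` and pass the result to `success_rate`.

```python
import os
import re
from typing import Dict

# Matches names written by save_videos_from_observations,
# e.g. "episode3_<camera_key>.mp4" -> episode index 3.
EPISODE_RE = re.compile(r"^episode(\d+)_.*\.mp4$")


def count_saved_episodes(save_dir: str) -> Dict[str, int]:
    """Count distinct episodes per outcome folder.

    Each episode writes one MP4 per camera key, so files are grouped
    by episode index before counting.
    """
    counts: Dict[str, int] = {}
    for outcome in ("success", "failure"):
        folder = os.path.join(save_dir, outcome)
        episodes = set()
        if os.path.isdir(folder):
            for name in os.listdir(folder):
                match = EPISODE_RE.match(name)
                if match:
                    episodes.add(int(match.group(1)))
        counts[outcome] = len(episodes)
    return counts


def success_rate(counts: Dict[str, int]) -> float:
    """Success rate in percent; 0.0 if nothing was saved."""
    total = counts["success"] + counts["failure"]
    return 100.0 * counts["success"] / total if total else 0.0
```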
Where to find the logger?
```
/mnt/train-data-1-hdd/naomi/lehome-challenge/source/lehome/lehome/utils/logger.py
```
Environment
NVIDIA GeForce RTX 5090
# Start
```
cd /mnt/train-data-1-hdd/naomi/lehome-challenge
source .venv/bin/activate
```
>[!Important]
>Please make sure you run ==evaluation== on the CPU (``--device cpu``), since GPU is not supported.
# Training
>[!Caution]
>Please make sure you have already modified each ``yaml`` file
>and set the video backend to ``pyav``:
> ```
> dataset:
>   repo_id: repo_act
>   root: Datasets/example/four_types_merged
>   video_backend: pyav
> ```
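
Since every training config must carry `video_backend: pyav`, a quick pre-flight check can catch a forgotten file before a long run. A minimal sketch, assuming the configs are plain YAML with a top-level `dataset` mapping as above; PyYAML (`yaml.safe_load`) is an assumed dependency, imported lazily so the check itself works on any already-parsed dict. Run `check_config_dir("configs")` from the repository root.

```python
from pathlib import Path
from typing import Any, Dict, List


def video_backend_problem(cfg: Dict[str, Any]) -> str:
    """Return "" if dataset.video_backend is 'pyav', otherwise a reason."""
    dataset = cfg.get("dataset")
    if not isinstance(dataset, dict):
        return "missing 'dataset' section"
    backend = dataset.get("video_backend")
    if backend != "pyav":
        return f"video_backend is {backend!r}, expected 'pyav'"
    return ""


def check_config_dir(config_dir: str = "configs") -> List[str]:
    """Scan *.yaml files and report the ones without the pyav backend."""
    import yaml  # assumption: PyYAML is installed in the .venv

    problems = []
    for path in sorted(Path(config_dir).glob("*.yaml")):
        cfg = yaml.safe_load(path.read_text()) or {}
        reason = video_backend_problem(cfg)
        if reason:
            problems.append(f"{path.name}: {reason}")
    return problems
```

The `--dataset.video_backend pyav` flag in the commands below should override the file value anyway, so this mainly guards runs started without that flag.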
## ACT
```
lerobot-train --config_path=configs/train_act.yaml --dataset.video_backend pyav
```
## Diffusion
>[!Warning]
>The organizer updated this file last week:
>``lehome-challenge/configs/train_dp.yaml``
```
policy:
  type: diffusion
  device: cuda
  push_to_hub: false
  # Avoid or increase the crop size, as the small default crop dimensions
  # may restrict the field of view and lead to task failure.
  crop_shape: null
  crop_is_random: false
```
```
lerobot-train --config_path=configs/train_dp.yaml --dataset.video_backend pyav
```
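
To see why the comment warns about the default crop, compare the crop area with the camera frame. A rough sketch: the `(84, 84)` default is what I recall for lerobot's `DiffusionConfig.crop_shape` (check your installed version), the 480×640 frame is a hypothetical resolution rather than the LeHome camera size, and it assumes no resize happens before the crop.

```python
from typing import Optional, Tuple


def crop_area_fraction(
    crop_shape: Optional[Tuple[int, int]], image_shape: Tuple[int, int]
) -> float:
    """Fraction of an (H, W) image kept by an (h, w) crop.

    crop_shape=None means no cropping, so the full frame is kept.
    """
    if crop_shape is None:
        return 1.0
    h, w = crop_shape
    H, W = image_shape
    if h > H or w > W:
        raise ValueError("crop must fit inside the image")
    return (h * w) / (H * W)


if __name__ == "__main__":
    frame = (480, 640)  # hypothetical camera resolution
    print(f"(84, 84) crop keeps {crop_area_fraction((84, 84), frame):.1%}")
    print(f"crop_shape: null keeps {crop_area_fraction(None, frame):.0%}")
```

Under these assumptions an 84×84 crop keeps only about 2.3% of the frame, which is consistent with the organizer's advice to set `crop_shape: null`.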
## SmolVLA
```
lerobot-train --config_path=configs/train_smolvla.yaml --dataset.video_backend pyav
```
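
The training commands in this note differ only in the config file and, for π₀.₅ and XVLA, the GPU index. A small launcher sketch that builds the same `lerobot-train` invocation from the flags already used above; `build_train_cmd` and `run_training` are hypothetical helpers.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple


def build_train_cmd(
    config_name: str, gpu: Optional[int] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for e.g. config_name="train_act"."""
    argv = [
        "lerobot-train",
        f"--config_path=configs/{config_name}.yaml",
        "--dataset.video_backend",
        "pyav",
    ]
    env = dict(os.environ)
    if gpu is not None:
        # Same effect as prefixing the shell command with CUDA_VISIBLE_DEVICES=<gpu>.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return argv, env


def run_training(config_name: str, gpu: Optional[int] = None) -> int:
    """Launch training; run from the repo root with .venv activated."""
    argv, env = build_train_cmd(config_name, gpu)
    return subprocess.run(argv, env=env, check=False).returncode
```

For example, `run_training("train_pi05", gpu=1)` mirrors the π₀.₅ command below.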
## π₀.₅
Create a new ``yaml`` file (``train_pi05.yaml``) under ``/mnt/train-data-1-hdd/naomi/lehome-challenge/configs``
```
CUDA_VISIBLE_DEVICES=1 lerobot-train --config_path=configs/train_pi05.yaml --dataset.video_backend pyav
```
In ``train_pi05.yaml``, please add these lines under ``policy``:
```
policy:
  type: pi05
  device: cuda
  push_to_hub: false
  gradient_checkpointing: true   # <-- New
  freeze_vision_encoder: true    # <-- New
  train_expert_only: true        # <-- New
```
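
If you prefer to generate `train_pi05.yaml` from an existing config instead of editing it by hand, a recursive merge of the three new keys into the `policy` section is enough. A sketch only: `deep_merge` and `PI05_OVERRIDES` are hypothetical names, and writing the result back to YAML (e.g. with PyYAML) is left out.

```python
import copy
from typing import Any, Dict

# The three keys marked "New" in the pi05 policy section above.
PI05_OVERRIDES: Dict[str, Any] = {
    "policy": {
        "gradient_checkpointing": True,
        "freeze_vision_encoder": True,
        "train_expert_only": True,
    }
}


def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with overrides applied; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```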
## XVLA
Create a new ``yaml`` file (``train_xvla.yaml``) under ``/mnt/train-data-1-hdd/naomi/lehome-challenge/configs``
```
CUDA_VISIBLE_DEVICES=0 lerobot-train --config_path=configs/train_xvla.yaml --dataset.video_backend pyav
```
In ``train_xvla.yaml``, please add these lines under ``policy``:
```
policy:
  type: xvla
  device: cuda
  push_to_hub: false
  freeze_vision_encoder: true           # <-- New
  freeze_language_encoder: true         # <-- New
  train_policy_transformer: true        # <-- New
  train_soft_prompts: true              # <-- New
  resize_imgs_with_padding: [384, 384]  # <-- New
  max_len_seq: 1024                     # <-- New
  florence_config:                      # <-- New
    vision_config: {}                   # <-- New
    text_config: {}                     # <-- New
```
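
Because these keys are typed by hand, a typo may only surface once `lerobot-train` parses the config. A sketch that compares a `policy` section against the fields of a config dataclass; `ToyXVLAConfig` is a hypothetical stand-in, and swapping in the real XVLA config class from `lerobot/policies` is an assumption, since its exact module path is not checked here.

```python
import dataclasses
from typing import Any, Dict, List


def unknown_keys(config_cls: type, section: Dict[str, Any]) -> List[str]:
    """List keys in a YAML section that are not fields of config_cls."""
    valid = {f.name for f in dataclasses.fields(config_cls)}
    # Assumption: "type" selects the policy class rather than being a field.
    return sorted(k for k in section if k != "type" and k not in valid)


@dataclasses.dataclass
class ToyXVLAConfig:
    """Hypothetical stand-in; replace with the real XVLA config class."""
    device: str
    push_to_hub: bool
    freeze_vision_encoder: bool
    max_len_seq: int
```

For example, a misspelled `freeze_vision_encodr` key would be reported, while correctly spelled keys pass.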
# Evaluation
When evaluating ``pants``, first select a specific GPU by prefixing your command with, for example, ``CUDA_VISIBLE_DEVICES=1``.
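
When running many garment evaluations, generating the command avoids copy-paste slips such as a `--dataset_root` left pointing at another garment's dataset. A sketch that reproduces only flags used in this note; `build_eval_cmd` is hypothetical, and deriving `--dataset_root` as `Datasets/example/<garment>_merged` is an assumed convention based on the commands below. Launch it with `subprocess.run(argv, env=env)`.

```python
import os
from typing import Dict, List, Optional, Tuple

GARMENT_TYPES = ("top_long", "top_short", "pant_long", "pant_short")


def build_eval_cmd(
    policy_path: str,
    garment_type: str,
    gpu: Optional[int] = None,
    num_episodes: int = 5,
    task_description: Optional[str] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) mirroring the scripts.eval commands in this note."""
    if garment_type not in GARMENT_TYPES:
        raise ValueError(f"unknown garment type: {garment_type}")
    argv = [
        "python", "-m", "scripts.eval",
        "--policy_type", "lerobot",
        "--policy_path", policy_path,
        "--garment_type", garment_type,
        # Assumed convention: one merged dataset per garment type.
        "--dataset_root", f"Datasets/example/{garment_type}_merged",
        "--num_episodes", str(num_episodes),
        "--enable_cameras",
        "--device", "cpu",  # the policy runs on the CPU during evaluation
        "--save_video",
    ]
    if task_description is not None:
        argv += ["--task_description", task_description]
    env = dict(os.environ)
    if gpu is not None:
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return argv, env
```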
## Conclusion Graph - Success Rate
I am not going to evaluate the remaining DP (Diffusion Policy) configurations, since I expect their results would not beat the other models.
**Evaluation Videos**
https://huggingface.co/datasets/IntelligentDecisionLab/Evaluation_Videos_Lehome_Challenge/tree/main
**Policies**
https://huggingface.co/nytcee/lehome_other_policies/tree/main
| **Model** | **Policy (Trained On)** | **Dataset (Task Given)** | **Evaluated Garment Type** | **Success Rate** |
|:--------:|:--------:|:-------------:|:-------------:|:--------------------:|
| ACT | top_long_merged | top_long | top_long | 66.67% |
| ACT | top_short_merged | top_short | top_short | 48.33% |
| ACT | pant_long_merged | pant_long | pant_long | 38.33% |
| ACT | record_pant_long_release_10 | pant_long | pant_long |1.67%|
| ACT | pant_short_merged | pant_short | pant_short | 81.67% |
| ACT | four_types_merged | top_long | top_long | 36.67% |
| ACT | four_types_merged | top_short | top_short | 23.33% |
| ACT | four_types_merged | pant_long | pant_long | 26.67% |
| ACT | four_types_merged | pant_short | pant_short | 73.33% |
| ACT | four_types_merged | four_types_merged | pant_short | 75.00% |
| ACT | four_types_merged_combined_act | four_types_merged | top_long | 71.67% |
| ACT | four_types_merged_combined_act | four_types_merged | top_short | 40.00% |
| ACT | four_types_merged_combined_act | four_types_merged | pant_long | 25.00% |
| ACT | four_types_merged_combined_act | four_types_merged | pant_short | 86.67% |
| DP | top_long_merged | top_long | top_long | 3.33% |
| DP | top_short_merged | top_short | top_short | 1.67% |
| DP | pant_long_merged | pant_long | pant_long | Not evaluated |
| DP | pant_short_merged | pant_short | pant_short | Not evaluated |
| DP | four_types_merged | top_long | top_long | Not evaluated |
| DP | four_types_merged | top_short | top_short | Not evaluated |
| DP | four_types_merged | pant_long | pant_long | Not evaluated |
| DP | four_types_merged | pant_short | pant_short | Not evaluated |
| SmolVLA | top_long_merged | top_long | top_long | 60.00%, 73.33% |
| SmolVLA | top_short_merged | top_short | top_short | 43.33% |
| SmolVLA | pant_long_merged | pant_long | pant_long | 46.67% |
| SmolVLA | pant_short_merged | pant_short | pant_short | 78.33% |
| SmolVLA | four_types_merged | top_long | top_long | 53.33% |
| SmolVLA | four_types_merged | top_short | top_short | 11.67% |
| SmolVLA | four_types_merged | pant_long | pant_long | 41.67% |
| SmolVLA | four_types_merged | pant_short | pant_short | 76.67% |
| SmolVLA | four_types_merged_combined_smolvla | four_types_merged | top_long | 65.00% |
| SmolVLA | four_types_merged_combined_smolvla | four_types_merged | top_short | 23.33% |
| SmolVLA | four_types_merged_combined_smolvla | four_types_merged | pant_long | 41.67% |
| SmolVLA | four_types_merged_combined_smolvla | four_types_merged | pant_short | 86.67% |
| π₀.₅ | four_types_merged | top_long | top_long | 3.33% |
| π₀.₅ | four_types_merged | top_short | top_short | ?% |
| XVLA | four_types_merged | top_long | top_long | 1.33% |
| XVLA | four_types_merged | top_short | top_short | 0.00% |
| XVLA | four_types_merged | pant_long | pant_long | 3.33% |
| XVLA | four_types_merged | pant_short | pant_short | 28.33% |
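
The logged runs below use 60 episodes each (12 garments × 5 episodes), so every point estimate carries noticeable sampling noise. A sketch of the Wilson score interval, a standard binomial confidence interval, to judge whether two rows really differ; this is an added analysis, not part of the challenge's evaluation script.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial success rate."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # 40/60 = 66.67% (ACT, top_long_merged); 43/60 = 71.67% (ACT, combined).
    for k in (40, 43):
        lo, hi = wilson_interval(k, 60)
        print(f"{k}/60 = {k / 60:.2%}  95% CI [{lo:.1%}, {hi:.1%}]")
```

For 40/60 and 43/60 the intervals (roughly 54% to 77% and 59% to 81%) overlap heavily, so a 5-point gap at this sample size is weak evidence on its own.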
## Best Performance Graph
| **Evaluated Garment Type** | **Success Rate** | **Model** | **Policy (Trained On)** | **Dataset (Task Given)** |
|:--------:|:--------:|:-------------:|:-------------:|:--------------------:|
| top_long | 71.67% | ACT | four_types_merged_combined_act | four_types_merged |
| top_short | 48.33% | ACT | top_short_merged | top_short |
| pant_short | 86.67% | ACT | four_types_merged_combined_act | four_types_merged |
| pant_long | 46.67% | SmolVLA | pant_long_merged | pant_long |
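
The best-performance table can be re-derived mechanically from the summary table, which also catches rows missed when updating it by hand. A sketch over a subset of rows copied from the summary table (single-result rows only); `best_per_garment` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# (model, policy, garment_type, success_rate_percent), copied from the summary table.
Row = Tuple[str, str, str, float]

ROWS: List[Row] = [
    ("ACT", "top_short_merged", "top_short", 48.33),
    ("SmolVLA", "top_short_merged", "top_short", 43.33),
    ("ACT", "pant_long_merged", "pant_long", 38.33),
    ("ACT", "four_types_merged_combined_act", "pant_long", 25.00),
    ("SmolVLA", "pant_long_merged", "pant_long", 46.67),
    ("SmolVLA", "four_types_merged_combined_smolvla", "pant_long", 41.67),
    ("XVLA", "four_types_merged", "pant_long", 3.33),
]


def best_per_garment(rows: List[Row]) -> Dict[str, Row]:
    """Keep the highest-success row per garment type (first one wins on ties)."""
    best: Dict[str, Row] = {}
    for row in rows:
        garment, rate = row[2], row[3]
        if garment not in best or rate > best[garment][3]:
            best[garment] = row
    return best


if __name__ == "__main__":
    for garment, (model, policy, _, rate) in best_per_garment(ROWS).items():
        print(f"{garment}: {rate:.2f}% ({model}, {policy})")
```

On these rows, `pant_long` comes out as SmolVLA trained on `pant_long_merged` at 46.67%.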
## ACT
### 1. top_long_merged
Garment Type: ==top_long==
Success Rate: 66.67%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/act_top_long/checkpoints/last/pretrained_model --garment_type "top_long" --dataset_root Datasets/example/top_long_merged --num_episodes 5 --enable_cameras --device cpu --save_video
```
```bash
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - ============================================================
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - ============================================================
2026-03-20 15:38:16 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-20 15:38:16 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-20 15:38:16 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-20 15:38:16 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-20 15:38:16 - scripts.utils.eval_utils - INFO - Average Return: 149.39 ± 56.78
2026-03-20 15:38:16 - scripts.utils.eval_utils - INFO - Success Rate: 66.67%
2026-03-20 15:38:16 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - ============================================================
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - ============================================================
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Seen_0: Success Rate = 100.00%, Avg Return = 129.53
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Seen_1: Success Rate = 80.00%, Avg Return = 142.69
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Seen_2: Success Rate = 100.00%, Avg Return = 112.16
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Seen_3: Success Rate = 80.00%, Avg Return = 106.93
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Seen_4: Success Rate = 80.00%, Avg Return = 144.72
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Seen_5: Success Rate = 100.00%, Avg Return = 125.89
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Seen_6: Success Rate = 40.00%, Avg Return = 163.18
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Seen_7: Success Rate = 20.00%, Avg Return = 162.84
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Seen_8: Success Rate = 60.00%, Avg Return = 163.69
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Seen_9: Success Rate = 80.00%, Avg Return = 148.88
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Unseen_0: Success Rate = 40.00%, Avg Return = 234.12
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Top_Long_Unseen_1: Success Rate = 20.00%, Avg Return = 158.07
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - ============================================================
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-20 15:38:16 - scripts.utils.evaluation - INFO - ============================================================
[1786.440s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(lehome) nytcee@idlab1:/mnt/train-data-1-hdd/naomi/lehome-challenge$
```
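
The per-garment lines in these logs are enough to recompute the overall rate and to split seen from unseen garments. A minimal parser sketch, assuming the exact `<Garment>: Success Rate = X%, Avg Return = Y` format shown above; `parse_per_garment` and `summarize` are hypothetical helpers. For the first `top_long_merged` run above this gives 66.67% overall (matching the logged value), 74.00% on seen garments and 30.00% on unseen ones.

```python
import re
from statistics import mean
from typing import Dict, List, Tuple

LINE_RE = re.compile(
    r"(?P<name>\w+_(?:Seen|Unseen)_\d+): Success Rate = (?P<rate>[\d.]+)%, "
    r"Avg Return = (?P<ret>[\d.]+)"
)


def parse_per_garment(log_text: str) -> Dict[str, Tuple[float, float]]:
    """Map garment name -> (success rate %, average return)."""
    return {
        m["name"]: (float(m["rate"]), float(m["ret"]))
        for m in LINE_RE.finditer(log_text)
    }


def _mean_or_nan(xs: List[float]) -> float:
    return mean(xs) if xs else float("nan")


def summarize(per_garment: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Mean success rate overall and for seen/unseen garments.

    With the same number of episodes per garment (5 here), the overall
    mean equals the logged success rate over all 60 episodes.
    """
    seen = [r for name, (r, _) in per_garment.items() if "_Seen_" in name]
    unseen = [r for name, (r, _) in per_garment.items() if "_Unseen_" in name]
    return {
        "overall": _mean_or_nan(seen + unseen),
        "seen": _mean_or_nan(seen),
        "unseen": _mean_or_nan(unseen),
    }
```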
Re-run on 2026-04-08 with ``--model_name``:
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/act_top_long/checkpoints/last/pretrained_model --garment_type "top_long" --dataset_root Datasets/example/top_long_merged --num_episodes 5 --enable_cameras --device cpu --save_video --model_name "act_top_long"
```
```bash
2026-04-08 17:45:48 - scripts.utils.evaluation - INFO - Episode 5/5: Return=133.52, Length=303, Success=True
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - ============================================================
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Overall Summary
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - ============================================================
2026-04-08 17:45:49 - scripts.utils.eval_utils - INFO - ==================================================
2026-04-08 17:45:49 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-04-08 17:45:49 - scripts.utils.eval_utils - INFO - ==================================================
2026-04-08 17:45:49 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-04-08 17:45:49 - scripts.utils.eval_utils - INFO - Average Return: 146.32 ± 58.32
2026-04-08 17:45:49 - scripts.utils.eval_utils - INFO - Success Rate: 66.67%
2026-04-08 17:45:49 - scripts.utils.eval_utils - INFO - ==================================================
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - ============================================================
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - ============================================================
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Seen_0: Success Rate = 80.00%, Avg Return = 167.60
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Seen_1: Success Rate = 80.00%, Avg Return = 141.07
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Seen_2: Success Rate = 80.00%, Avg Return = 131.90
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Seen_3: Success Rate = 80.00%, Avg Return = 103.99
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Seen_4: Success Rate = 80.00%, Avg Return = 140.38
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Seen_5: Success Rate = 60.00%, Avg Return = 198.81
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Seen_6: Success Rate = 40.00%, Avg Return = 141.46
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Seen_7: Success Rate = 40.00%, Avg Return = 132.97
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Seen_8: Success Rate = 80.00%, Avg Return = 117.59
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Seen_9: Success Rate = 60.00%, Avg Return = 179.30
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Unseen_0: Success Rate = 60.00%, Avg Return = 176.69
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Top_Long_Unseen_1: Success Rate = 60.00%, Avg Return = 124.11
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - ============================================================
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-04-08 17:45:49 - scripts.utils.evaluation - INFO - ============================================================
[1481.177s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(lehome) nytcee@idlab1:/mnt/train-data-1-hdd/naomi/lehome-challenge$
```
### 2. top_short_merged
Garment Type: ==top_short==
Success Rate: 48.33%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/act_top_short/checkpoints/last/pretrained_model --garment_type "top_short" --dataset_root Datasets/example/top_short_merged --num_episodes 5 --enable_cameras --device cpu --save_video
```
```bash
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - ============================================================
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - ============================================================
2026-03-25 17:23:14 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-25 17:23:14 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-25 17:23:14 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-25 17:23:14 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-25 17:23:14 - scripts.utils.eval_utils - INFO - Average Return: 165.36 ± 78.49
2026-03-25 17:23:14 - scripts.utils.eval_utils - INFO - Success Rate: 48.33%
2026-03-25 17:23:14 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - ============================================================
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - ============================================================
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Seen_0: Success Rate = 60.00%, Avg Return = 172.87
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Seen_1: Success Rate = 80.00%, Avg Return = 140.37
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Seen_2: Success Rate = 40.00%, Avg Return = 225.67
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Seen_3: Success Rate = 60.00%, Avg Return = 185.84
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Seen_4: Success Rate = 100.00%, Avg Return = 151.75
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Seen_5: Success Rate = 20.00%, Avg Return = 182.13
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Seen_6: Success Rate = 20.00%, Avg Return = 179.29
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Seen_7: Success Rate = 80.00%, Avg Return = 129.93
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Seen_8: Success Rate = 60.00%, Avg Return = 136.36
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Seen_9: Success Rate = 0.00%, Avg Return = 197.94
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Unseen_0: Success Rate = 0.00%, Avg Return = 120.33
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Top_Short_Unseen_1: Success Rate = 60.00%, Avg Return = 161.78
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - ============================================================
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-25 17:23:14 - scripts.utils.evaluation - INFO - ============================================================
[2145.548s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(lehome) nytcee@idlab1:/mnt/train-data-1-hdd/naomi/lehome-challenge$
```
### 3. pant_long_merged
Garment Type: ==pant_long==
Success Rate: 38.33%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/act_pant_long/checkpoints/last/pretrained_model --garment_type "pant_long" --dataset_root Datasets/example/top_long_merged --num_episodes 5 --enable_cameras --device cpu --save_video
```
```bash
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - ============================================================
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - ============================================================
2026-03-23 15:23:42 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-23 15:23:42 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-23 15:23:42 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-23 15:23:42 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-23 15:23:42 - scripts.utils.eval_utils - INFO - Average Return: 146.13 ± 75.45
2026-03-23 15:23:42 - scripts.utils.eval_utils - INFO - Success Rate: 38.33%
2026-03-23 15:23:42 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - ============================================================
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - ============================================================
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Seen_0: Success Rate = 60.00%, Avg Return = 115.54
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Seen_1: Success Rate = 40.00%, Avg Return = 148.42
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Seen_2: Success Rate = 20.00%, Avg Return = 121.39
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Seen_3: Success Rate = 20.00%, Avg Return = 169.37
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Seen_4: Success Rate = 80.00%, Avg Return = 111.72
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Seen_5: Success Rate = 60.00%, Avg Return = 161.09
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Seen_6: Success Rate = 40.00%, Avg Return = 127.06
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Seen_7: Success Rate = 60.00%, Avg Return = 121.18
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Seen_8: Success Rate = 20.00%, Avg Return = 180.28
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Seen_9: Success Rate = 0.00%, Avg Return = 224.17
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Unseen_0: Success Rate = 40.00%, Avg Return = 169.25
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Pant_Long_Unseen_1: Success Rate = 20.00%, Avg Return = 104.10
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - ============================================================
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-23 15:23:42 - scripts.utils.evaluation - INFO - ============================================================
[4193.763s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(lehome) nytcee@idlab1:/mnt/train-data-1-hdd/naomi/lehome-challenge$
```
### 3'. record_pant_long_release_10
Garment Type: ==pant_long==
Success Rate: 1.67%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/act_record_pant_long10/checkpoints/last/pretrained_model --garment_type "pant_long" --dataset_root Datasets/example/record_pant_long_release_10/001 --num_episodes 5 --enable_cameras --device cpu --save_video
```
```bash
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - ============================================================
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Overall Summary
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - ============================================================
2026-04-01 18:00:57 - scripts.utils.eval_utils - INFO - ==================================================
2026-04-01 18:00:57 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-04-01 18:00:57 - scripts.utils.eval_utils - INFO - ==================================================
2026-04-01 18:00:57 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-04-01 18:00:57 - scripts.utils.eval_utils - INFO - Average Return: 108.05 ± 8.74
2026-04-01 18:00:57 - scripts.utils.eval_utils - INFO - Success Rate: 1.67%
2026-04-01 18:00:57 - scripts.utils.eval_utils - INFO - ==================================================
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - ============================================================
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - ============================================================
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Seen_0: Success Rate = 20.00%, Avg Return = 125.14
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Seen_1: Success Rate = 0.00%, Avg Return = 107.35
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Seen_2: Success Rate = 0.00%, Avg Return = 107.11
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Seen_3: Success Rate = 0.00%, Avg Return = 107.65
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Seen_4: Success Rate = 0.00%, Avg Return = 107.23
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Seen_5: Success Rate = 0.00%, Avg Return = 107.09
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Seen_6: Success Rate = 0.00%, Avg Return = 107.26
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Seen_7: Success Rate = 0.00%, Avg Return = 108.28
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Seen_8: Success Rate = 0.00%, Avg Return = 97.93
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Seen_9: Success Rate = 0.00%, Avg Return = 107.33
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Unseen_0: Success Rate = 0.00%, Avg Return = 107.14
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Pant_Long_Unseen_1: Success Rate = 0.00%, Avg Return = 107.10
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - ============================================================
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-04-01 18:00:57 - scripts.utils.evaluation - INFO - ============================================================
```
### 4. pant_short_merged
Garment Type: ==pant_short==
Success Rate: 81.67%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/act_pant_short/checkpoints/last/pretrained_model --garment_type "pant_short" --dataset_root Datasets/example/pant_short_merged --num_episodes 5 --enable_cameras --device cpu --save_video --task_description "fold the garment on the table"
```
```bash
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:08:22 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:08:22 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 16:08:22 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:08:22 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 16:08:22 - scripts.utils.eval_utils - INFO - Average Return: 147.16 ± 79.23
2026-03-30 16:08:22 - scripts.utils.eval_utils - INFO - Success Rate: 81.67%
2026-03-30 16:08:22 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Seen_0: Success Rate = 80.00%, Avg Return = 134.31
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Seen_1: Success Rate = 100.00%, Avg Return = 95.25
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Seen_2: Success Rate = 100.00%, Avg Return = 103.08
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Seen_3: Success Rate = 100.00%, Avg Return = 133.69
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Seen_4: Success Rate = 100.00%, Avg Return = 89.74
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Seen_5: Success Rate = 80.00%, Avg Return = 153.98
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Seen_6: Success Rate = 80.00%, Avg Return = 175.08
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Seen_7: Success Rate = 100.00%, Avg Return = 126.57
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Seen_8: Success Rate = 80.00%, Avg Return = 184.69
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Seen_9: Success Rate = 100.00%, Avg Return = 134.43
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Unseen_0: Success Rate = 0.00%, Avg Return = 238.20
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Pant_Short_Unseen_1: Success Rate = 60.00%, Avg Return = 196.90
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 16:08:22 - scripts.utils.evaluation - INFO - ============================================================
[1630.885s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(lehome) nytcee@idlab1:/mnt/train-data-1-hdd/naomi/lehome-challenge$
```
### 5-1. four_types_merged
Garment Type: ==top_long==
Success Rate: 36.67%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/act_four_types/checkpoints/last/pretrained_model --garment_type "top_long" --dataset_root Datasets/example/top_long_merged --num_episodes 5 --enable_cameras --device cpu --save_video --task_description "fold the garment on the table"
```
```bash
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:10:27 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:10:27 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 16:10:27 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:10:27 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 16:10:27 - scripts.utils.eval_utils - INFO - Average Return: 125.46 ± 28.60
2026-03-30 16:10:27 - scripts.utils.eval_utils - INFO - Success Rate: 36.67%
2026-03-30 16:10:27 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Seen_0: Success Rate = 60.00%, Avg Return = 126.16
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Seen_1: Success Rate = 40.00%, Avg Return = 135.99
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Seen_2: Success Rate = 20.00%, Avg Return = 90.02
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Seen_3: Success Rate = 0.00%, Avg Return = 146.80
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Seen_4: Success Rate = 80.00%, Avg Return = 119.67
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Seen_5: Success Rate = 40.00%, Avg Return = 117.49
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Seen_6: Success Rate = 40.00%, Avg Return = 129.86
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Seen_7: Success Rate = 20.00%, Avg Return = 114.18
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Seen_8: Success Rate = 0.00%, Avg Return = 123.83
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Seen_9: Success Rate = 80.00%, Avg Return = 137.26
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Unseen_0: Success Rate = 60.00%, Avg Return = 151.00
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Top_Long_Unseen_1: Success Rate = 0.00%, Avg Return = 113.29
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 16:10:27 - scripts.utils.evaluation - INFO - ============================================================
[2806.861s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
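The Overall Summary numbers can be reproduced from the Per-Garment lines. Twelve garments times `--num_episodes 5` gives the 60 episodes reported. Every per-garment rate is a multiple of 20% (one episode out of five), so 36.67% corresponds to 22 successes out of 60. With equal episodes per garment, the overall average return is just the mean of the twelve per-garment averages. The sketch below is a hypothetical helper (not part of `scripts.utils`) that parses those lines and checks this. It assumes each garment ran exactly 5 episodes. The ± spread in the log presumably comes from per-episode returns, which the per-garment lines do not contain, so it is not recomputed here.

```python
import re
from statistics import mean

# Matches per-garment summary lines, e.g.
# "Top_Long_Seen_0: Success Rate = 60.00%, Avg Return = 126.16"
PER_GARMENT_RE = re.compile(
    r"(?P<name>\w+_(?:Seen|Unseen)_\d+): Success Rate = (?P<sr>[\d.]+)%, "
    r"Avg Return = (?P<ret>[\d.]+)"
)


def parse_per_garment(log_text: str) -> dict[str, tuple[float, float]]:
    """Map each garment name to (success rate in %, average return)."""
    return {
        m["name"]: (float(m["sr"]), float(m["ret"]))
        for m in PER_GARMENT_RE.finditer(log_text)
    }


def overall_summary(per_garment: dict[str, tuple[float, float]],
                    episodes_per_garment: int = 5):
    """Recompute the 'Overall Summary' block.

    Assumes every garment ran the same number of episodes (--num_episodes),
    so the pooled success rate and pooled mean return equal plain means.
    """
    rates = [sr for sr, _ in per_garment.values()]
    # Convert each percentage back to a count of successful episodes.
    successes = round(sum(r / 100 * episodes_per_garment for r in rates))
    total = len(per_garment) * episodes_per_garment
    avg_return = mean(ret for _, ret in per_garment.values())
    return successes, total, 100 * successes / total, avg_return


# Per-garment lines from the ACT top_long run above (log prefixes stripped).
ACT_TOP_LONG = """\
Top_Long_Seen_0: Success Rate = 60.00%, Avg Return = 126.16
Top_Long_Seen_1: Success Rate = 40.00%, Avg Return = 135.99
Top_Long_Seen_2: Success Rate = 20.00%, Avg Return = 90.02
Top_Long_Seen_3: Success Rate = 0.00%, Avg Return = 146.80
Top_Long_Seen_4: Success Rate = 80.00%, Avg Return = 119.67
Top_Long_Seen_5: Success Rate = 40.00%, Avg Return = 117.49
Top_Long_Seen_6: Success Rate = 40.00%, Avg Return = 129.86
Top_Long_Seen_7: Success Rate = 20.00%, Avg Return = 114.18
Top_Long_Seen_8: Success Rate = 0.00%, Avg Return = 123.83
Top_Long_Seen_9: Success Rate = 80.00%, Avg Return = 137.26
Top_Long_Unseen_0: Success Rate = 60.00%, Avg Return = 151.00
Top_Long_Unseen_1: Success Rate = 0.00%, Avg Return = 113.29
"""

succ, total, rate, avg = overall_summary(parse_per_garment(ACT_TOP_LONG))
print(f"{succ}/{total} episodes = {rate:.2f}%, average return {avg:.2f}")
```

The same functions work on the full log text, since the regex skips the timestamp and logger prefixes.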
### 5-2. four_types_merged
Garment Type: ==top_short==
Success Rate: 23.33%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/act_four_types/checkpoints/last/pretrained_model --garment_type "top_short" --dataset_root Datasets/example/top_short_merged --num_episodes 5 --enable_cameras --device cpu --save_video --task_description "fold the garment on the table"
```
```bash
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:54:45 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:54:45 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 16:54:45 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:54:45 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 16:54:45 - scripts.utils.eval_utils - INFO - Average Return: 166.60 ± 64.32
2026-03-30 16:54:45 - scripts.utils.eval_utils - INFO - Success Rate: 23.33%
2026-03-30 16:54:45 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Seen_0: Success Rate = 60.00%, Avg Return = 137.98
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Seen_1: Success Rate = 40.00%, Avg Return = 147.84
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Seen_2: Success Rate = 20.00%, Avg Return = 203.04
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Seen_3: Success Rate = 20.00%, Avg Return = 204.27
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Seen_4: Success Rate = 20.00%, Avg Return = 171.49
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Seen_5: Success Rate = 20.00%, Avg Return = 181.01
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Seen_6: Success Rate = 20.00%, Avg Return = 143.01
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Seen_7: Success Rate = 20.00%, Avg Return = 197.67
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Seen_8: Success Rate = 40.00%, Avg Return = 129.36
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Seen_9: Success Rate = 0.00%, Avg Return = 160.80
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Unseen_0: Success Rate = 0.00%, Avg Return = 108.31
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Top_Short_Unseen_1: Success Rate = 20.00%, Avg Return = 214.45
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 16:54:45 - scripts.utils.evaluation - INFO - ============================================================
[2265.725s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
### 5-3. four_types_merged
Garment Type: ==pant_long==
Success Rate: 26.67%
```
CUDA_VISIBLE_DEVICES=0 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/act_four_types/checkpoints/last/pretrained_model --garment_type pant_long --dataset_root Datasets/example/pant_long_merged --num_episodes 5 --enable_cameras --device cpu --save_video --task_description "fold the garment on the table"
```
```bash
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:05:27 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:05:27 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 16:05:27 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:05:27 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 16:05:27 - scripts.utils.eval_utils - INFO - Average Return: 131.46 ± 48.84
2026-03-30 16:05:27 - scripts.utils.eval_utils - INFO - Success Rate: 26.67%
2026-03-30 16:05:27 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Seen_0: Success Rate = 20.00%, Avg Return = 135.92
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Seen_1: Success Rate = 20.00%, Avg Return = 147.19
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Seen_2: Success Rate = 0.00%, Avg Return = 196.68
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Seen_3: Success Rate = 0.00%, Avg Return = 124.05
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Seen_4: Success Rate = 60.00%, Avg Return = 120.40
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Seen_5: Success Rate = 60.00%, Avg Return = 152.00
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Seen_6: Success Rate = 60.00%, Avg Return = 110.75
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Seen_7: Success Rate = 40.00%, Avg Return = 171.71
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Seen_8: Success Rate = 40.00%, Avg Return = 85.65
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Seen_9: Success Rate = 20.00%, Avg Return = 117.20
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Unseen_0: Success Rate = 0.00%, Avg Return = 107.72
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Pant_Long_Unseen_1: Success Rate = 0.00%, Avg Return = 108.26
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 16:05:27 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30T08:05:27Z [3,163,944ms] [Error] [omni.kit.renderer.plugin] advanceCurrentFrame: backbuffers are not initialized!
[3167.120s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
### 5-4. four_types_merged
Garment Type: ==pant_short==
Success Rate: 73.33%
```bash
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:47:36 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:47:36 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 16:47:36 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:47:36 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 16:47:36 - scripts.utils.eval_utils - INFO - Average Return: 169.12 ± 104.57
2026-03-30 16:47:36 - scripts.utils.eval_utils - INFO - Success Rate: 73.33%
2026-03-30 16:47:36 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Seen_0: Success Rate = 20.00%, Avg Return = 261.11
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Seen_1: Success Rate = 60.00%, Avg Return = 139.19
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Seen_2: Success Rate = 60.00%, Avg Return = 154.08
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Seen_3: Success Rate = 60.00%, Avg Return = 244.64
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Seen_4: Success Rate = 100.00%, Avg Return = 90.65
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Seen_5: Success Rate = 100.00%, Avg Return = 124.82
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Seen_6: Success Rate = 100.00%, Avg Return = 134.19
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Seen_7: Success Rate = 100.00%, Avg Return = 120.90
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Seen_8: Success Rate = 100.00%, Avg Return = 139.04
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Seen_9: Success Rate = 80.00%, Avg Return = 202.69
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Unseen_0: Success Rate = 0.00%, Avg Return = 309.14
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Pant_Short_Unseen_1: Success Rate = 100.00%, Avg Return = 109.01
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 16:47:36 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30T08:47:36Z [1,715,523ms] [Error] [omni.kit.renderer.plugin] advanceCurrentFrame: backbuffers are not initialized!
[1718.476s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
Garment Type: ==pant_short==
Success Rate: 75.00%
**This run (2026-03-24) uses `Datasets/example/four_types_merged` as `--dataset_root`.**
```
python -m scripts.eval --policy_type lerobot --policy_path outputs/train/act_four_types/checkpoints/last/pretrained_model --garment_type pant_short --dataset_root Datasets/example/four_types_merged --num_episodes 5 --enable_cameras --device cpu --save_video --task_description "fold the garment on the table"
```
```bash
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - ============================================================
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - ============================================================
2026-03-24 16:48:23 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-24 16:48:23 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-24 16:48:23 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-24 16:48:23 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-24 16:48:23 - scripts.utils.eval_utils - INFO - Average Return: 174.04 ± 116.45
2026-03-24 16:48:23 - scripts.utils.eval_utils - INFO - Success Rate: 75.00%
2026-03-24 16:48:23 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - ============================================================
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - ============================================================
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Seen_0: Success Rate = 60.00%, Avg Return = 208.73
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Seen_1: Success Rate = 60.00%, Avg Return = 160.85
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Seen_2: Success Rate = 60.00%, Avg Return = 175.25
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Seen_3: Success Rate = 80.00%, Avg Return = 244.97
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Seen_4: Success Rate = 80.00%, Avg Return = 153.13
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Seen_5: Success Rate = 100.00%, Avg Return = 123.77
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Seen_6: Success Rate = 100.00%, Avg Return = 132.39
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Seen_7: Success Rate = 100.00%, Avg Return = 120.52
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Seen_8: Success Rate = 80.00%, Avg Return = 176.73
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Seen_9: Success Rate = 80.00%, Avg Return = 152.84
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Unseen_0: Success Rate = 20.00%, Avg Return = 264.66
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Pant_Short_Unseen_1: Success Rate = 80.00%, Avg Return = 174.69
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - ============================================================
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-24 16:48:23 - scripts.utils.evaluation - INFO - ============================================================
[1400.317s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
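The two pant_short runs differ by a single episode: 44/60 (73.33%) versus 45/60 (75.00%) successes. A quick sanity check is the normal-approximation (Wald) standard error of the difference between two proportions. This sketch treats every episode as an independent Bernoulli trial, which is an assumption, since both runs reuse the same twelve garments.

```python
import math


def diff_in_proportions(k1: int, n1: int, k2: int, n2: int,
                        z: float = 1.96):
    """Return (p2 - p1, standard error, approximate 95% interval).

    Wald / normal approximation; assumes independent Bernoulli episodes.
    """
    p1, p2 = k1 / n1, k2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return diff, se, (diff - z * se, diff + z * se)


# pant_short: 44/60 successes (73.33%) vs 45/60 (75.00%, four_types_merged root)
diff, se, (lo, hi) = diff_in_proportions(44, 60, 45, 60)
print(f"diff = {diff:.2%}, SE = {se:.2%}, 95% CI = [{lo:.2%}, {hi:.2%}]")
```

With a standard error of roughly 8 percentage points, a 1.67-point gap is well inside the noise of a 60-episode evaluation, so these two runs cannot be told apart on success rate alone.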
## DP
### 1. top_long_merged
Garment Type: ==top_long==
Success Rate: 3.33%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/dp_top_long/checkpoints/last/pretrained_model --garment_type "top_long" --dataset_root Datasets/example/top_long_merged --num_episodes 5 --enable_cameras --device cpu --save_video
```
```bash
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 18:27:17 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 18:27:17 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 18:27:17 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 18:27:17 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 18:27:17 - scripts.utils.eval_utils - INFO - Average Return: 132.56 ± 42.45
2026-03-30 18:27:17 - scripts.utils.eval_utils - INFO - Success Rate: 3.33%
2026-03-30 18:27:17 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Seen_0: Success Rate = 0.00%, Avg Return = 119.13
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Seen_1: Success Rate = 0.00%, Avg Return = 122.30
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Seen_2: Success Rate = 0.00%, Avg Return = 152.56
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Seen_3: Success Rate = 0.00%, Avg Return = 117.27
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Seen_4: Success Rate = 20.00%, Avg Return = 159.20
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Seen_5: Success Rate = 0.00%, Avg Return = 121.11
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Seen_6: Success Rate = 0.00%, Avg Return = 127.83
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Seen_7: Success Rate = 0.00%, Avg Return = 127.10
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Seen_8: Success Rate = 0.00%, Avg Return = 139.34
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Seen_9: Success Rate = 0.00%, Avg Return = 124.18
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Unseen_0: Success Rate = 20.00%, Avg Return = 126.73
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Top_Long_Unseen_1: Success Rate = 0.00%, Avg Return = 153.93
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 18:27:17 - scripts.utils.evaluation - INFO - ============================================================
```
### 2. top_short_merged
Garment Type: ==top_short==
Success Rate: 1.67%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/dp_top_short/checkpoints/last/pretrained_model --garment_type "top_short" --dataset_root Datasets/example/top_short_merged --num_episodes 5 --enable_cameras --device cpu --save_video
```
```bash
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 19:47:11 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 19:47:11 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 19:47:11 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 19:47:11 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 19:47:11 - scripts.utils.eval_utils - INFO - Average Return: 155.60 ± 42.16
2026-03-30 19:47:11 - scripts.utils.eval_utils - INFO - Success Rate: 1.67%
2026-03-30 19:47:11 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Seen_0: Success Rate = 0.00%, Avg Return = 141.89
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Seen_1: Success Rate = 0.00%, Avg Return = 154.78
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Seen_2: Success Rate = 0.00%, Avg Return = 134.35
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Seen_3: Success Rate = 0.00%, Avg Return = 184.61
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Seen_4: Success Rate = 0.00%, Avg Return = 192.06
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Seen_5: Success Rate = 0.00%, Avg Return = 145.66
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Seen_6: Success Rate = 0.00%, Avg Return = 180.04
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Seen_7: Success Rate = 0.00%, Avg Return = 168.89
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Seen_8: Success Rate = 0.00%, Avg Return = 107.79
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Seen_9: Success Rate = 0.00%, Avg Return = 136.18
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Unseen_0: Success Rate = 0.00%, Avg Return = 144.96
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Top_Short_Unseen_1: Success Rate = 20.00%, Avg Return = 175.98
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 19:47:11 - scripts.utils.evaluation - INFO - ============================================================
```
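The DP success rates correspond to 2/60 (3.33%) and 1/60 (1.67%) successful episodes. At counts this small the normal-approximation (Wald) interval extends below 0%, so the sketch below swaps in the Wilson score interval, a standard alternative that stays within [0, 1]. Episodes are again treated as independent trials, which is an assumption.

```python
import math


def wald_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval; can fall outside [0, 1] near 0 or 1."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


for label, k in [("DP top_long", 2), ("DP top_short", 1)]:
    lo, hi = wilson_interval(k, 60)
    wlo, whi = wald_interval(k, 60)
    print(f"{label}: {k}/60, Wilson [{lo:.1%}, {hi:.1%}], "
          f"Wald [{wlo:.1%}, {whi:.1%}]")
```

For 2/60 the Wilson 95% interval is roughly 0.9% to 11.4%, while the Wald interval's lower end is negative, which is not a meaningful success rate.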
## SmolVLA
### 1. top_long_merged
**First try**
Garment Type: ==top_long==
Success Rate: 60.00%
```
python -m scripts.eval --policy_type lerobot --policy_path outputs/train/smolvla_top_long/checkpoints/last/pretrained_model --garment_type "top_long" --dataset_root Datasets/example/top_long_merged --num_episodes 5 --enable_cameras --device cpu --save_video
```
```bash
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 13:13:54 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 13:13:54 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 13:13:54 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 13:13:54 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 13:13:54 - scripts.utils.eval_utils - INFO - Average Return: 150.57 ± 61.54
2026-03-30 13:13:54 - scripts.utils.eval_utils - INFO - Success Rate: 60.00%
2026-03-30 13:13:54 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Seen_0: Success Rate = 60.00%, Avg Return = 175.18
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Seen_1: Success Rate = 60.00%, Avg Return = 174.73
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Seen_2: Success Rate = 80.00%, Avg Return = 136.29
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Seen_3: Success Rate = 60.00%, Avg Return = 169.51
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Seen_4: Success Rate = 20.00%, Avg Return = 171.30
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Seen_5: Success Rate = 80.00%, Avg Return = 160.42
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Seen_6: Success Rate = 60.00%, Avg Return = 145.60
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Seen_7: Success Rate = 20.00%, Avg Return = 177.75
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Seen_8: Success Rate = 100.00%, Avg Return = 119.43
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Seen_9: Success Rate = 100.00%, Avg Return = 111.39
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Unseen_0: Success Rate = 60.00%, Avg Return = 140.01
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Top_Long_Unseen_1: Success Rate = 20.00%, Avg Return = 125.24
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 13:13:54 - scripts.utils.evaluation - INFO - ============================================================
[1434.170s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
**Second try**
Garment Type: ==top_long==
Success Rate: 73.33%
```bash
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:52:26 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 15:52:26 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 15:52:26 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 15:52:26 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 15:52:26 - scripts.utils.eval_utils - INFO - Average Return: 143.85 ± 63.58
2026-03-30 15:52:26 - scripts.utils.eval_utils - INFO - Success Rate: 73.33%
2026-03-30 15:52:26 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Seen_0: Success Rate = 80.00%, Avg Return = 144.03
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Seen_1: Success Rate = 100.00%, Avg Return = 136.50
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Seen_2: Success Rate = 100.00%, Avg Return = 106.64
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Seen_3: Success Rate = 100.00%, Avg Return = 130.64
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Seen_4: Success Rate = 60.00%, Avg Return = 173.30
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Seen_5: Success Rate = 60.00%, Avg Return = 143.77
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Seen_6: Success Rate = 60.00%, Avg Return = 131.30
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Seen_7: Success Rate = 40.00%, Avg Return = 231.09
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Seen_8: Success Rate = 80.00%, Avg Return = 171.17
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Seen_9: Success Rate = 100.00%, Avg Return = 112.65
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Unseen_0: Success Rate = 80.00%, Avg Return = 134.99
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Top_Long_Unseen_1: Success Rate = 20.00%, Avg Return = 110.12
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 15:52:26 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30T07:52:26Z [2,910,003ms] [Error] [omni.kit.renderer.plugin] advanceCurrentFrame: backbuffers are not initialized!
[2912.794s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
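Each top_long summary mixes ten `Seen` and two `Unseen` garments, so the overall rate is dominated by the seen ones. Below is a hypothetical helper that splits the per-garment success rates by group. It assumes the `<Type>_<Seen|Unseen>_<index>` naming shown in the logs and an equal episode count per garment, so a plain mean per group equals the pooled rate.

```python
import re
from statistics import mean

# Captures the Seen/Unseen group and the success rate of each per-garment line.
SPLIT_RE = re.compile(
    r"_(?P<split>Seen|Unseen)_\d+: Success Rate = (?P<sr>[\d.]+)%"
)


def split_success(log_text: str) -> dict[str, float]:
    """Mean success rate (%) for the Seen and Unseen garment groups."""
    groups: dict[str, list[float]] = {"Seen": [], "Unseen": []}
    for m in SPLIT_RE.finditer(log_text):
        groups[m["split"]].append(float(m["sr"]))
    return {split: mean(rates) for split, rates in groups.items() if rates}


# Per-garment lines from the SmolVLA top_long second try (prefixes stripped).
SMOLVLA_TOP_LONG_TRY2 = """\
Top_Long_Seen_0: Success Rate = 80.00%, Avg Return = 144.03
Top_Long_Seen_1: Success Rate = 100.00%, Avg Return = 136.50
Top_Long_Seen_2: Success Rate = 100.00%, Avg Return = 106.64
Top_Long_Seen_3: Success Rate = 100.00%, Avg Return = 130.64
Top_Long_Seen_4: Success Rate = 60.00%, Avg Return = 173.30
Top_Long_Seen_5: Success Rate = 60.00%, Avg Return = 143.77
Top_Long_Seen_6: Success Rate = 60.00%, Avg Return = 131.30
Top_Long_Seen_7: Success Rate = 40.00%, Avg Return = 231.09
Top_Long_Seen_8: Success Rate = 80.00%, Avg Return = 171.17
Top_Long_Seen_9: Success Rate = 100.00%, Avg Return = 112.65
Top_Long_Unseen_0: Success Rate = 80.00%, Avg Return = 134.99
Top_Long_Unseen_1: Success Rate = 20.00%, Avg Return = 110.12
"""

print(split_success(SMOLVLA_TOP_LONG_TRY2))
```

On the second try this gives 78% for the seen garments and 50% for the unseen ones. With only two unseen garments (10 episodes), the unseen figure is very noisy.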
### 2. top_short_merged
Garment Type: ==top_short==
Success Rate: 43.33%
```
python -m scripts.eval --policy_type lerobot --policy_path outputs/train/smolvla_top_short/checkpoints/last/pretrained_model --garment_type "top_short" --dataset_root Datasets/example/top_short_merged --num_episodes 5 --enable_cameras --device cpu --save_video --task_description "fold the garment on the table"
```
```bash
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 12:16:46 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 12:16:46 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 12:16:46 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 12:16:46 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 12:16:46 - scripts.utils.eval_utils - INFO - Average Return: 174.95 ± 63.70
2026-03-30 12:16:46 - scripts.utils.eval_utils - INFO - Success Rate: 43.33%
2026-03-30 12:16:46 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Seen_0: Success Rate = 80.00%, Avg Return = 160.29
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Seen_1: Success Rate = 0.00%, Avg Return = 183.32
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Seen_2: Success Rate = 60.00%, Avg Return = 189.17
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Seen_3: Success Rate = 80.00%, Avg Return = 155.24
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Seen_4: Success Rate = 100.00%, Avg Return = 151.91
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Seen_5: Success Rate = 60.00%, Avg Return = 146.43
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Seen_6: Success Rate = 60.00%, Avg Return = 166.27
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Seen_7: Success Rate = 20.00%, Avg Return = 205.83
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Seen_8: Success Rate = 40.00%, Avg Return = 178.53
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Seen_9: Success Rate = 0.00%, Avg Return = 196.72
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Unseen_0: Success Rate = 0.00%, Avg Return = 152.94
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Top_Short_Unseen_1: Success Rate = 20.00%, Avg Return = 212.81
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 12:16:46 - scripts.utils.evaluation - INFO - ============================================================
[1894.876s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(lehome) nytcee@idlab1:/mnt/train-data-1-hdd/naomi/lehome-challenge$
```
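The overall "Success Rate" line is consistent with averaging the 12 per-garment lines. Each garment ran `--num_episodes 5`, so Total Episodes = 12 × 5 = 60, and 26 successes / 60 = 43.33% here. Below is a minimal sketch that pulls the per-garment lines out of a saved log and recomputes the overall number as a sanity check. The regex is written against the log format shown above, and the equal-episodes-per-garment assumption is mine, not something `scripts.utils.evaluation` documents.

```python
import re
from typing import Dict

# Matches per-garment lines such as
# "... INFO - Top_Short_Seen_0: Success Rate = 80.00%, Avg Return = 160.29"
GARMENT_LINE = re.compile(r"(\w+_(?:Seen|Unseen)_\d+): Success Rate = ([\d.]+)%")


def parse_per_garment(log_text: str) -> Dict[str, float]:
    """Return {garment_name: success_rate_percent} for every per-garment line."""
    return {name: float(rate) for name, rate in GARMENT_LINE.findall(log_text)}


def overall_success_rate(rates: Dict[str, float], episodes_per_garment: int = 5) -> float:
    """Recompute the overall rate, assuming every garment ran the same number of episodes."""
    # Convert each percentage back to a success count (e.g. 80% of 5 -> 4).
    successes = sum(round(r / 100 * episodes_per_garment) for r in rates.values())
    total = episodes_per_garment * len(rates)
    return 100.0 * successes / total
```

Usage: `overall_success_rate(parse_per_garment(open("eval.log").read()))`, where `eval.log` is a hypothetical file you redirected the evaluation output into.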
### 3. pant_long_merged
Garment Type: ==pant_long==
Success Rate: 46.67%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/smolvla_pant_long/checkpoints/last/pretrained_model --garment_type "pant_long" --dataset_root Datasets/example/pant_long_merged --num_episodes 5 --enable_cameras --device cpu --save_video
```
```bash
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 14:11:24 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 14:11:24 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 14:11:24 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 14:11:24 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 14:11:24 - scripts.utils.eval_utils - INFO - Average Return: 150.02 ± 77.52
2026-03-30 14:11:24 - scripts.utils.eval_utils - INFO - Success Rate: 46.67%
2026-03-30 14:11:24 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Seen_0: Success Rate = 0.00%, Avg Return = 145.34
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Seen_1: Success Rate = 60.00%, Avg Return = 135.79
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Seen_2: Success Rate = 40.00%, Avg Return = 217.18
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Seen_3: Success Rate = 20.00%, Avg Return = 128.73
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Seen_4: Success Rate = 80.00%, Avg Return = 105.31
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Seen_5: Success Rate = 60.00%, Avg Return = 159.65
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Seen_6: Success Rate = 60.00%, Avg Return = 139.22
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Seen_7: Success Rate = 100.00%, Avg Return = 74.78
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Seen_8: Success Rate = 20.00%, Avg Return = 257.23
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Seen_9: Success Rate = 60.00%, Avg Return = 198.19
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Unseen_0: Success Rate = 60.00%, Avg Return = 114.55
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Pant_Long_Unseen_1: Success Rate = 0.00%, Avg Return = 124.24
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 14:11:24 - scripts.utils.evaluation - INFO - ============================================================
[1798.167s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
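The per-garment summary mixes 10 `Seen_*` and 2 `Unseen_*` garments into one overall rate. Splitting them makes generalization easier to read. For the pant_long run above, the Seen mean is 50% and the Unseen mean is 30%. Below is a minimal sketch that computes that split from a log. It assumes the same line format as above and that every garment ran the same number of episodes, so a plain mean of the percentages is valid.

```python
import re
from statistics import mean
from typing import Dict, List

# Captures (garment prefix, Seen/Unseen, index, success rate) from per-garment lines.
GARMENT_LINE = re.compile(r"(\w+)_(Seen|Unseen)_(\d+): Success Rate = ([\d.]+)%")


def split_seen_unseen(log_text: str) -> Dict[str, float]:
    """Average the per-garment success rates separately for Seen and Unseen garments."""
    groups: Dict[str, List[float]] = {"Seen": [], "Unseen": []}
    for _prefix, split, _idx, rate in GARMENT_LINE.findall(log_text):
        groups[split].append(float(rate))
    # A plain mean is only valid because each garment ran the same number of episodes.
    return {split: mean(rates) for split, rates in groups.items() if rates}
```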
### 4. pant_short_merged
Garment Type: ==pant_short==
Success Rate: 78.33%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/smolvla_pant_short/checkpoints/last/pretrained_model --garment_type "pant_short" --dataset_root Datasets/example/pant_short_merged --num_episodes 5 --enable_cameras --device cpu --save_video
```
```bash
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 14:32:45 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 14:32:45 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 14:32:45 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 14:32:45 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 14:32:45 - scripts.utils.eval_utils - INFO - Average Return: 161.85 ± 93.60
2026-03-30 14:32:45 - scripts.utils.eval_utils - INFO - Success Rate: 78.33%
2026-03-30 14:32:45 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Seen_0: Success Rate = 80.00%, Avg Return = 178.29
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Seen_1: Success Rate = 100.00%, Avg Return = 94.67
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Seen_2: Success Rate = 80.00%, Avg Return = 177.03
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Seen_3: Success Rate = 80.00%, Avg Return = 163.91
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Seen_4: Success Rate = 100.00%, Avg Return = 109.62
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Seen_5: Success Rate = 80.00%, Avg Return = 177.64
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Seen_6: Success Rate = 80.00%, Avg Return = 184.10
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Seen_7: Success Rate = 100.00%, Avg Return = 129.64
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Seen_8: Success Rate = 100.00%, Avg Return = 137.33
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Seen_9: Success Rate = 100.00%, Avg Return = 145.50
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Unseen_0: Success Rate = 0.00%, Avg Return = 263.37
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Pant_Short_Unseen_1: Success Rate = 40.00%, Avg Return = 181.12
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 14:32:45 - scripts.utils.evaluation - INFO - ============================================================
[1131.520s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(lehome) nytcee@idlab1:/mnt/train-data-1-hdd/naomi/lehome-challenge$
```
### 5-1. four_types_merged
Garment Type: ==top_long==
Success Rate: 53.33%
```
CUDA_VISIBLE_DEVICES=0 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/smolvla_four_types/checkpoints/last/pretrained_model --garment_type "top_long" --dataset_root Datasets/example/top_long_merged --num_episodes 5 --enable_cameras --device cpu --save_video
```
```bash
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:36:29 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:36:29 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 16:36:29 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:36:29 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 16:36:29 - scripts.utils.eval_utils - INFO - Average Return: 156.06 ± 62.76
2026-03-30 16:36:29 - scripts.utils.eval_utils - INFO - Success Rate: 53.33%
2026-03-30 16:36:29 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Seen_0: Success Rate = 80.00%, Avg Return = 140.91
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Seen_1: Success Rate = 40.00%, Avg Return = 177.24
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Seen_2: Success Rate = 60.00%, Avg Return = 149.45
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Seen_3: Success Rate = 80.00%, Avg Return = 122.37
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Seen_4: Success Rate = 60.00%, Avg Return = 139.22
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Seen_5: Success Rate = 80.00%, Avg Return = 131.60
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Seen_6: Success Rate = 20.00%, Avg Return = 195.51
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Seen_7: Success Rate = 0.00%, Avg Return = 184.03
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Seen_8: Success Rate = 80.00%, Avg Return = 157.97
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Seen_9: Success Rate = 60.00%, Avg Return = 167.62
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Unseen_0: Success Rate = 60.00%, Avg Return = 175.40
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Top_Long_Unseen_1: Success Rate = 20.00%, Avg Return = 131.34
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 16:36:29 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30T08:36:29Z [2,194,849ms] [Error] [omni.kit.renderer.plugin] advanceCurrentFrame: backbuffers are not initialized!
[2198.124s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(lehome) nytcee@idlab1:/mnt/train-data-1-hdd/naomi/lehome-challenge$
```
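Sections 5-1 to 5-4 evaluate the same `smolvla_four_types` checkpoint and change only `--garment_type` and `--dataset_root`. Below is a minimal sketch that builds those four commands and optionally runs them one after another. The flags are copied from the commands in this note. The `<garment_type>_merged` dataset naming, running sequentially rather than in separate terminals, and the helper names are all my assumptions. Note that 5-1 and 5-2 omit `--task_description` while 5-3 and 5-4 pass it, so the sketch leaves it off by default.

```python
import os
import shlex
import subprocess
import sys
from typing import List, Optional

# Values copied from the commands in sections 5-1 to 5-4; adjust to your checkout.
POLICY_PATH = "outputs/train/smolvla_four_types/checkpoints/last/pretrained_model"
GARMENT_TYPES = ["top_long", "top_short", "pant_long", "pant_short"]


def build_eval_command(garment_type: str, num_episodes: int = 5,
                       task_description: Optional[str] = None) -> List[str]:
    """Assemble one scripts.eval invocation with the flags used in this note (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "scripts.eval",  # same interpreter as the active .venv
        "--policy_type", "lerobot",
        "--policy_path", POLICY_PATH,
        "--garment_type", garment_type,
        # Assumes the <garment_type>_merged naming used for every dataset here.
        "--dataset_root", f"Datasets/example/{garment_type}_merged",
        "--num_episodes", str(num_episodes),
        "--enable_cameras",
        "--device", "cpu",
        "--save_video",
    ]
    if task_description is not None:
        cmd += ["--task_description", task_description]
    return cmd


def run_all(gpu: str = "1", task_description: Optional[str] = None,
            dry_run: bool = True) -> List[List[str]]:
    """Print (and, unless dry_run, execute) one evaluation per garment type, sequentially."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    commands = [build_eval_command(g, task_description=task_description)
                for g in GARMENT_TYPES]
    for cmd in commands:
        print(f"CUDA_VISIBLE_DEVICES={gpu} {shlex.join(cmd)}")
        if not dry_run:
            # check=True stops the loop as soon as one evaluation exits non-zero.
            subprocess.run(cmd, env=env, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

Run it from the repository root with `.venv` activated. With `dry_run=True` it only prints the commands, so you can check them before launching the simulator.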
### 5-2. four_types_merged
Garment Type: ==top_short==
Success Rate: 11.67%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/smolvla_four_types/checkpoints/last/pretrained_model --garment_type "top_short" --dataset_root Datasets/example/top_short_merged --num_episodes 5 --enable_cameras --device cpu --save_video
```
```bash
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:02:21 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:02:21 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 16:02:21 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:02:21 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 16:02:21 - scripts.utils.eval_utils - INFO - Average Return: 184.97 ± 63.13
2026-03-30 16:02:21 - scripts.utils.eval_utils - INFO - Success Rate: 11.67%
2026-03-30 16:02:21 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Seen_0: Success Rate = 60.00%, Avg Return = 129.87
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Seen_1: Success Rate = 0.00%, Avg Return = 159.10
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Seen_2: Success Rate = 20.00%, Avg Return = 216.97
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Seen_3: Success Rate = 40.00%, Avg Return = 240.06
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Seen_4: Success Rate = 20.00%, Avg Return = 265.71
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Seen_5: Success Rate = 0.00%, Avg Return = 206.01
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Seen_6: Success Rate = 0.00%, Avg Return = 161.92
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Seen_7: Success Rate = 0.00%, Avg Return = 153.62
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Seen_8: Success Rate = 0.00%, Avg Return = 184.40
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Seen_9: Success Rate = 0.00%, Avg Return = 140.49
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Unseen_0: Success Rate = 0.00%, Avg Return = 170.04
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Top_Short_Unseen_1: Success Rate = 0.00%, Avg Return = 191.50
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 16:02:21 - scripts.utils.evaluation - INFO - ============================================================
[3659.861s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(lehome) nytcee@idlab1:/mnt/train-data-1-hdd/naomi/lehome-challenge$
```
### 5-3. four_types_merged
Garment Type: ==pant_long==
Success Rate: 41.67%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/smolvla_four_types/checkpoints/last/pretrained_model --garment_type "pant_long" --dataset_root Datasets/example/pant_long_merged --num_episodes 5 --enable_cameras --device cpu --save_video --task_description "fold the garment on the table"
```
```bash
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:23:47 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 15:23:47 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 15:23:47 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 15:23:47 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 15:23:47 - scripts.utils.eval_utils - INFO - Average Return: 143.45 ± 67.44
2026-03-30 15:23:47 - scripts.utils.eval_utils - INFO - Success Rate: 41.67%
2026-03-30 15:23:47 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Seen_0: Success Rate = 0.00%, Avg Return = 121.01
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Seen_1: Success Rate = 40.00%, Avg Return = 133.03
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Seen_2: Success Rate = 60.00%, Avg Return = 166.87
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Seen_3: Success Rate = 20.00%, Avg Return = 133.58
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Seen_4: Success Rate = 60.00%, Avg Return = 139.52
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Seen_5: Success Rate = 40.00%, Avg Return = 175.26
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Seen_6: Success Rate = 100.00%, Avg Return = 119.45
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Seen_7: Success Rate = 20.00%, Avg Return = 194.48
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Seen_8: Success Rate = 20.00%, Avg Return = 128.64
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Seen_9: Success Rate = 40.00%, Avg Return = 167.91
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Unseen_0: Success Rate = 60.00%, Avg Return = 151.98
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Pant_Long_Unseen_1: Success Rate = 40.00%, Avg Return = 89.71
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 15:23:47 - scripts.utils.evaluation - INFO - ============================================================
[2595.787s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(lehome) nytcee@idlab1:/mnt/train-data-1-hdd/naomi/lehome-challenge$
```
### 5-4. four_types_merged
Garment Type: ==pant_short==
Success Rate: 76.67%
```
CUDA_VISIBLE_DEVICES=1 python -m scripts.eval --policy_type lerobot --policy_path outputs/train/smolvla_four_types/checkpoints/last/pretrained_model --garment_type "pant_short" --dataset_root Datasets/example/pant_short_merged --num_episodes 5 --enable_cameras --device cpu --save_video --task_description "fold the garment on the table"
```
```bash
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Overall Summary
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:21:20 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 15:21:20 - scripts.utils.eval_utils - INFO - Evaluation Results Summary
2026-03-30 15:21:20 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 15:21:20 - scripts.utils.eval_utils - INFO - Total Episodes: 60
2026-03-30 15:21:20 - scripts.utils.eval_utils - INFO - Average Return: 163.73 ± 103.17
2026-03-30 15:21:20 - scripts.utils.eval_utils - INFO - Success Rate: 76.67%
2026-03-30 15:21:20 - scripts.utils.eval_utils - INFO - ==================================================
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Per-Garment Summary
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Seen_0: Success Rate = 60.00%, Avg Return = 165.78
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Seen_1: Success Rate = 80.00%, Avg Return = 117.49
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Seen_2: Success Rate = 100.00%, Avg Return = 95.85
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Seen_3: Success Rate = 60.00%, Avg Return = 241.39
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Seen_4: Success Rate = 100.00%, Avg Return = 134.53
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Seen_5: Success Rate = 80.00%, Avg Return = 177.31
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Seen_6: Success Rate = 100.00%, Avg Return = 132.45
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Seen_7: Success Rate = 100.00%, Avg Return = 129.04
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Seen_8: Success Rate = 60.00%, Avg Return = 259.84
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Seen_9: Success Rate = 100.00%, Avg Return = 131.02
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Unseen_0: Success Rate = 40.00%, Avg Return = 197.52
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Pant_Short_Unseen_1: Success Rate = 40.00%, Avg Return = 182.49
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - ============================================================
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - Evaluation completed successfully
2026-03-30 15:21:20 - scripts.utils.evaluation - INFO - ============================================================
[1835.825s] Simulation App Shutting Down
/home/nytcee/.local/share/uv/python/cpython-3.11.14-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(lehome) nytcee@idlab1:/mnt/train-data-1-hdd/naomi/lehome-challenge$
```
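To compare the single-type checkpoints (sections 2 to 4) with the `four_types` checkpoint (sections 5-2 to 5-4), the overall success rates can be put side by side. The sketch below only uses numbers copied from the logs above. top_long is left out because its single-type result is not in this part of the note. With 60 episodes per run, one episode is worth about 1.67 percentage points. The pant_short gap (-1.66) is therefore a single episode and likely within noise, while top_short (-31.66) is a large drop.

```python
from typing import Dict, Tuple

# Overall success rates (%) copied from the logs in sections 2-4 and 5-2 to 5-4.
SINGLE_TYPE = {"top_short": 43.33, "pant_long": 46.67, "pant_short": 78.33}
FOUR_TYPES = {"top_short": 11.67, "pant_long": 41.67, "pant_short": 76.67}


def compare(single: Dict[str, float],
            merged: Dict[str, float]) -> Dict[str, Tuple[float, float, float]]:
    """Return {garment: (single_type, four_types, difference)} in percentage points."""
    return {
        g: (single[g], merged[g], round(merged[g] - single[g], 2))
        for g in single if g in merged
    }


if __name__ == "__main__":
    print(f"{'garment':<12}{'single':>8}{'four':>8}{'diff':>8}")
    for garment, (s, m, d) in compare(SINGLE_TYPE, FOUR_TYPES).items():
        print(f"{garment:<12}{s:>8.2f}{m:>8.2f}{d:>+8.2f}")
```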