# 🤖 Robot Decision & Command System

- **Workspace:** `robot_ws`
- **Main Package:** `decision_maker`
- **Helper Package:** `object_query`
- **Status:** Active / Development

## 1. Project Structure

Based on the current source code, the system is organized into modular Python nodes and interface packages.

```text
data/
├── params.npz
└── semantic.json
src/
├── decision_maker/                  # Main Orchestrator Package
│   ├── decision_maker/              # Python Source Code
│   │   ├── __init__.py
│   │   ├── audio_command_node.py    # 🎤 Voice Input (Whisper)
│   │   ├── cancel_command_node.py   # 🛑 Safety/Abort Input
│   │   ├── decision_maker_node.py   # 🧠 The Main Brain
│   │   ├── nl_command_node.py       # ⌨️ Text Input (Terminal)
│   │   ├── scenario_library.py      # 📚 Logic & Translation
│   │   ├── command_types.py         # Data Classes
│   │   ├── nl_planner.py            # Planner Utils
│   │   ├── mock_grasp_server.py     # Simulation Helper
│   │   └── mock_nav_server.py       # Simulation Helper
│   ├── launch/                      # Launch Files
│   ├── resource/
│   ├── test/
│   ├── package.xml
│   └── setup.py
│
├── decision_maker_interfaces/       # Custom Action Definitions
│
├── object_query/                    # Semantic Map Service
│   ├── object_query/
│   │   ├── object_query_server.py   # 🗺️ 3D Object Database
│   │   └── object_query_client.py
│   └── ...
│
└── object_query_interfaces/         # Custom Service Definitions
```

---

## 2. System Architecture

The system follows a **Producer-Consumer** pattern. Three distinct input nodes produce commands on the `/manual_command` topic. The Decision Maker consumes them, translates user intent into primitives via the **Scenario Library**, and executes them safely.

```mermaid
graph TD
    %% Define Styles
    classDef input fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
    classDef logic fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
    classDef hardware fill:#fff3e0,stroke:#ef6c00,stroke-width:2px;
    classDef safety fill:#ffebee,stroke:#c62828,stroke-width:2px;
    classDef slam fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px;

    %% --- STAGE 0: PERCEPTION & SLAM ---
    subgraph Perception_Layer [Stage 0: Perception & Mapping]
        direction TB
        Cam[RGB-D Camera] ==>|Stream| SGS_SLAM(SGS-SLAM Server)
        SGS_SLAM -->|Generate| MapData[(Map Data .npz)]
        MapData -.->|Read| QueryServ[object_query_server]
    end

    %% --- STAGE 1: INPUTS ---
    subgraph Input_Layer [Stage 1: Input Interfaces]
        direction TB
        Mic[Microphone] -->|Raw Audio| AudioNode(audio_command_node)
        AudioNode -->|Text| AudioCmd[/manual_command/]
        Keyboard[Terminal Input] -->|stdin| TextNode(nl_command_node)
        TextNode -->|Text| AudioCmd
        StopKey[Enter Key] -->|stdin| CancelNode(cancel_command_node)
        CancelNode -->|Signal| CancelCmd[/cancel_command/]
    end

    %% --- STAGE 2: THE BRAIN ---
    subgraph Brain_Layer [Stage 2: Decision Maker Node]
        direction TB
        AudioCmd --> Listener(Command Listener)
        CancelCmd -->|High Priority| Abort(Abort Logic)
        Listener -->|Lookup Phrase| Registry{Scenario Registry}
        Registry -.->|Import| Lib[scenario_library.py]
        Lib -->|Return Plan| Primitives[List: goto, grasp, place]
        Primitives --> Queue[Command Queue]
        Abort -.->|Clear| Queue
        Queue -->|Dequeue| Executor[Execution Thread]
        Abort -.->|Stop| Executor

        %% Object Query Interaction
        Executor -- "Query: 'bottle'" --> QueryServ
        QueryServ -->|Return: XYZ| Executor
    end

    %% --- STAGE 3: HARDWARE EXECUTION ---
    subgraph Hardware_Layer [Stage 3: Hardware Action]
        Executor -- "goto:pose" --> Nav2[Nav2 Stack]
        Executor -- "grasp/place" --> Arm[Manipulation Stack]
    end

    %% Feedback Loop
    Executor -.->|/task_status| TextNode

    %% Apply Styles
    class Mic,AudioNode,Keyboard,TextNode input;
    class Listener,Registry,Lib,Primitives,Queue,Executor logic;
    class Nav2,Arm hardware;
    class StopKey,CancelNode,Abort,CancelCmd safety;
    class SGS_SLAM,MapData,QueryServ,Cam slam;
```
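The producer side of this pattern is intentionally thin: each input node only needs to publish a text command on `/manual_command`. The minimal sketch below illustrates the idea, assuming the topic carries `std_msgs/String`; the actual input nodes may use a different message type.

```python
# Minimal producer sketch (assumes /manual_command carries std_msgs/String).
import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class ManualCommandProducer(Node):
    """Publishes a single text command, mimicking an input node."""

    def __init__(self):
        super().__init__('manual_command_producer')
        self.pub = self.create_publisher(String, '/manual_command', 10)

    def send(self, text: str):
        msg = String()
        msg.data = text
        self.pub.publish(msg)
        self.get_logger().info(f'Sent command: "{text}"')


def main():
    rclpy.init()
    node = ManualCommandProducer()
    node.send('give me the bottle')
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```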
---

## 3. Stage 0: SLAM Explore

The system initially explores the environment before accepting any commands. This part is still under development.

## 4. Stage 1: Input Interfaces

The system provides three ways to interact with the robot, defined by the files in `decision_maker/`.

### A. Audio Input (`audio_command_node.py`)
* **Role:** Hands-free operation using OpenAI Whisper.
* **Logic:** Continuously buffers **5-second chunks** of audio, transcribes them, removes non-alphanumeric characters, and publishes to `/manual_command`.
* **Latency:** High (~1-3 s due to transcription).

### B. Text Input (`nl_command_node.py`)
* **Role:** Debugging and testing logic without speaking.
* **Logic:** Non-blocking terminal interface. Reads input from `sys.stdin`, publishes to `/manual_command`, and listens to `/task_status` for feedback.
* **Latency:** Instant.

### C. Safety Input (`cancel_command_node.py`)
* **Role:** Emergency Abort.
* **Logic:** A lightweight node monitoring a dedicated terminal. Pressing **Enter** immediately sends a signal to `/cancel_command`, causing the Brain to clear its queue and stop all motors.

---

## 5. Stage 2: Scenario Logic (`scenario_library.py`)

This file acts as the **Semantic Translator**. It decouples the robot's *actions* from the user's *intent*.

### A. The Registry Pattern
We do not hardcode logic inside the main node. Instead, we map phrases to functions:

```python
# scenario_library.py
SCENARIO_REGISTRY = {
    "give me":     give_item,    # -> ["goto:kitchen", "grasp:item", "place:user"]
    "go home":     park_robot,   # -> ["goto:home"]
    "clean table": clean_table,  # -> ["goto:table", "grasp:trash", "place:bin"]
}
```

### B. 🧠 LLM Extensibility
This module is the designated injection point for Generative AI.

* **Current State:** Deterministic (Dictionary Lookup).
* **Future State:** The `SCENARIO_REGISTRY` can be replaced by an LLM agent (using `nl_planner.py`) that accepts the text prompt and world context to generate the list of primitives dynamically.

---

## 6. Stage 3: Execution Engine (`decision_maker_node.py`)

The Brain executes the primitives generated by the Scenario Library.

### A. Threading & Safety
* **Sensor Thread:** Runs `rclpy.spin()` to keep subscribers (like Cancel) active.
* **Execution Thread:** Uses **Blocking I/O** (waiting for action results) to ensure tasks happen sequentially.

### B. Supported Primitives
The internal executor understands these string formats (a parsing sketch follows the list):

1. **`goto:x,y,theta`** - Sends a Nav2 goal.
2. **`grasp:object_name`** - Queries `object_query_server`, then grasps.
3. **`place:location`** - Triggers the placement sequence.
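To make the primitive format concrete, here is a hedged sketch of how such strings could be parsed before dispatching to the corresponding action clients. The `Primitive` class and `parse_primitive` helper are illustrative names, not taken from the actual node.

```python
# Illustrative only: a minimal parser for the primitive string format
# ("goto:x,y,theta", "grasp:object_name", "place:location").
from dataclasses import dataclass


@dataclass
class Primitive:
    verb: str        # "goto", "grasp", or "place"
    args: list[str]  # e.g. ["1.5", "2.0", "0.0"] or ["bottle"]


def parse_primitive(text: str) -> Primitive:
    """Split 'verb:arg1,arg2,...' into its components."""
    verb, _, payload = text.partition(':')
    verb = verb.strip().lower()
    if verb not in ("goto", "grasp", "place"):
        raise ValueError(f"Unknown primitive: {text!r}")
    args = [a.strip() for a in payload.split(',') if a.strip()]
    return Primitive(verb, args)


# Example: parse_primitive("goto:1.5,2.0,0.0") -> Primitive(verb='goto', args=['1.5', '2.0', '0.0'])
```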
### C. Safety Timeouts & Logic
Strict timeouts are enforced to prevent the robot from "freezing" if a hardware action hangs or becomes unreachable. The timeout logic works on two levels: **Global Batch** and **Atomic Action**.

#### The Logic Process
The `_execute_batch` function implements a "Watchdog" timer that runs alongside the sequential execution of primitives.

```mermaid
graph TD
    Start([Start Batch]) --> InitTime[Record start_time]
    InitTime --> CheckList{More Primitives?}

    CheckList -- Yes --> CheckTime{Elapsed > 150s?}
    CheckTime -- "Yes (Timeout)" --> Cancel[TRIGGER ABORT]
    Cancel --> PubCancel[Publish /cancel_command]
    PubCancel --> Stop[Return False]

    CheckTime -- "No (Safe)" --> Exec[Execute Next Primitive]
    Exec --> ActionWait{Action Client Wait}
    ActionWait -- Success --> CheckList
    ActionWait -- Action Timeout --> Cancel

    CheckList -- No --> Finish([Task Complete])

    style Cancel fill:#f96,stroke:#333,stroke-width:2px
```

1. **Initialization:** When a command batch starts (e.g., "bring apple"), the system records `start_time = time.time()`.
2. **Interval Check:** Before starting *any* new primitive (goto, grasp, etc.), the system calculates `elapsed = time.time() - start_time`.
3. **Global Abort:** If `elapsed > 150.0 s`, the system internally calls `_send_cancel()`. This publishes to `/cancel_command`, ensuring that even if the logic layer stops, the hardware layer also receives a distinct stop signal.
4. **Atomic Wait:** Individual ROS actions (like Nav2) are called with `spin_until_future_complete(timeout=X)`. If a specific action fails to report back within its own limit (e.g., 15 s for navigation), the batch is immediately aborted.

#### Timeout Configuration Table

| Scope | Duration | Logic Implementation | Action on Timeout |
| :--- | :--- | :--- | :--- |
| **Total Batch** | **150.0 s** | `time.time() - batch_start_time` checked before every step. | Sequence cancelled. Robot stops. |
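The watchdog described above fits in a few lines. The sketch below is a simplified reconstruction of that logic as a standalone function: in the real node this lives in the `_execute_batch` and `_send_cancel` methods, so the injected `execute_primitive` and `send_cancel` callables are illustrative stand-ins.

```python
# Simplified reconstruction of the batch watchdog logic; not the node's exact code.
import time
from typing import Callable, Iterable

BATCH_TIMEOUT_S = 150.0  # global budget for one command batch


def execute_batch(
    primitives: Iterable[str],
    execute_primitive: Callable[[str], bool],  # blocks until the action finishes or times out
    send_cancel: Callable[[], None],           # publishes to /cancel_command
) -> bool:
    start_time = time.time()                   # 1. Initialization
    for primitive in primitives:
        elapsed = time.time() - start_time     # 2. Interval check before every step
        if elapsed > BATCH_TIMEOUT_S:          # 3. Global abort
            send_cancel()                      # hardware layer gets its own stop signal
            return False
        if not execute_primitive(primitive):   # 4. Atomic wait (per-action timeout inside)
            send_cancel()
            return False
    return True                                # task complete
```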
---

## 7. Object Query Service

**Package:** `object_query`
**File:** `object_query_server.py`

This node is a dependency for the `grasp:` primitive.

* **Input:** Loads `.npz` (point cloud) and `.json` (semantic labels).
* **Service:** Provides `object_query_interfaces/srv/ObjectQuery`.
* **Logic:** Returns the centroid $(x, y, z)$ of the requested object label.

---

## 8. Usage

To run the full system, use separate terminals.

### 1. Start the Infrastructure (Object DB)
```bash
ros2 run object_query object_query_server
```

### 2. Start the Brain (Decision Maker)
```bash
ros2 run decision_maker decision_maker_node
```
*Note: If you don't have hardware attached, launch the mock action servers instead:*
*`ros2 launch decision_maker mock_actions.launch.py`*

### 3. Start an Input Interface
**Option A (Voice):**
```bash
ros2 run decision_maker audio_command_node
```
**Option B (Text):**
```bash
ros2 run decision_maker nl_command_node
```

### 4. Start the Safety Switch
```bash
ros2 run decision_maker cancel_command_node
```

---

## 9. Troubleshooting

:::info
**Simulation Mode**
The source tree also contains `decision_maker_node_isaacsim.py`. When running in Isaac Sim, launch this node instead of the standard `decision_maker_node.py` so that coordinate frames match the simulation environment.
:::

:::warning
**Microphone Conflicts**
If `audio_command_node` crashes with `PortAudioError`, ensure `nl_command_node` or other applications (Zoom/Teams) are not blocking the audio device.
:::

# 🤖 SGS-SLAM ROS 2 Streaming Guide

## Project Overview

This guide describes how to run the Streaming Gaussian Splatting SLAM (SGS-SLAM) system using a ROS 2 Action Server (`run_gs_slam_server`) and a dedicated client (`run_gs_slam_client`).

The system handles high-frequency camera streams by decoupling data reception from the heavy SLAM computation. It uses dynamic topic-subscription throttling to manage processing lag and ensures data integrity through synchronized disk I/O.

## 🏗️ System Architecture & Process Diagram

The `StreamingGSSLAMServer` uses a multi-threaded Producer-Consumer architecture to bridge ROS 2 topics with the file-based SLAM algorithm.

### Process Flow Diagram

```mermaid
graph TD
    %% ROS Layer
    subgraph ROS_Layer [ROS 2 Callbacks]
        RGB[/rgb topic/] -->|Resize 1200x680| Matcher
        Depth[/depth topic/] -->|Float->UInt16 & Buffer| Matcher
        Matcher{Timestamp Match} -->|Pair Found| IO_Queue[I/O Queue]
    end

    %% Disk Layer
    subgraph Disk_Layer [Async Disk Writer]
        IO_Queue -->|Dequeue| Writer(Disk Writer Thread)
        Writer -->|Write .jpg| TmpRGB[tmp/frames/]
        Writer -->|Write .png| TmpDepth[tmp/depths/]
        Writer -.->|Sync Confirmation| Flag[Completed Set]
    end

    %% SLAM Layer
    subgraph Logic_Layer [Main SLAM Loop]
        Check{Files Ready?} -->|Yes| Launch[Launch Subprocess]
        Launch -->|python slam_copy.py| Subprocess(SLAM Core)
        Subprocess -->|Generate| Checkpoint[.npz / .ply Map]

        %% Lag Control Logic
        Subprocess -.->|Lag > 300| Unsub(Unsubscribe Topics)
        Subprocess -.->|Lag < 150| Resub(Resume Subscription)
    end

    TmpRGB --> Check
    TmpDepth --> Check
    Flag --> Check
```

## ⚙️ The Mapping Process

The mapping process is performed incrementally, frame by frame, to allow real-time ingestion of streaming data.

1. **Initialization:**
   - The server cleans up the temporary directories (`tmp/frames`, `tmp/depths`).
   - It publishes to `/slam_start_signal` to tell the client (Dummy Node) to start streaming.
2. **Streaming & Buffering:**
   - Incoming RGB images are resized to **1200x680**.
   - Incoming depth images are converted from **meters (Float32)** to **millimeters (UInt16)**.
   - Data is written to disk asynchronously via a separate thread to avoid blocking the ROS callbacks.
3. **Incremental Optimization:**
   - The main loop monitors the disk for fully written frame pairs.
   - **Frame 0:** The SLAM subprocess starts from scratch with `SLAM_LOAD_CHECKPOINT=False`.
   - **Frame N:** The subprocess loads the "latest" checkpoint from the previous step (`SLAM_LOAD_CHECKPOINT=True`), integrates the new frame, optimizes the 3D Gaussians, and saves the state.
4. **Completion:**
   - Once the client signals `data_transfer_done` and the queue is empty, the server waits for the final subprocess to exit and returns the result.

## 📂 .npz File Generation

The system does not generate the final map file directly in the ROS node's Python code. Instead, the node acts as a wrapper that triggers the generation via the core SLAM library (see the sketch after this list):

1. **Trigger:** The `StreamingGSSLAMServer` executes a Python subprocess for every frame:
   ```bash
   python scripts/slam_copy.py configs/replica/slam.py
   ```
2. **Environment Variables:** The server passes context via environment variables:
   - `SLAM_START_FRAME`: Tells the script which specific frame to process.
   - `SLAM_CHECKPOINT_NAME`: Set to `'latest'` to ensure continuous mapping.
3. **Output:** The `slam_copy.py` script performs the Gaussian Splatting optimization. It saves the intermediate and final map representations (typically **.npz** or **.ply** files) into the `experiments/` directory defined in the config.
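The wrapper pattern in the list above can be illustrated with a short sketch. This is not the server's actual code: the script path and environment variable names come from the description above, while the `run_slam_step` helper and the use of `subprocess.run` are assumptions made for illustration.

```python
# Illustrative sketch of the per-frame wrapper call; not the server's exact code.
import os
import subprocess


def run_slam_step(frame_idx: int) -> int:
    """Launch one incremental SLAM optimization step as a subprocess."""
    env = os.environ.copy()
    env['SLAM_START_FRAME'] = str(frame_idx)     # which frame to process
    env['SLAM_CHECKPOINT_NAME'] = 'latest'       # continue from the latest map
    env['SLAM_LOAD_CHECKPOINT'] = 'False' if frame_idx == 0 else 'True'

    # The actual .npz/.ply output is produced by the core SLAM script itself.
    result = subprocess.run(
        ['python', 'scripts/slam_copy.py', 'configs/replica/slam.py'],
        env=env,
        check=False,
    )
    return result.returncode


# Example: run_slam_step(0) starts from scratch; run_slam_step(1) resumes from 'latest'.
```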
---

## Prerequisites & Setup

Before running the commands, ensure the following steps are completed in **both terminals**:

0. Navigate to the workspace:
   ```bash
   cd /mnt/HDD3/Sabrina/catkin_ws
   ```
1. **Activate the Conda Environment:**
   ```bash
   conda activate slam
   ```
2. **Source the ROS Workspace:**
   ```bash
   source install/setup.bash
   ```

**Note:** Ensure the camera node is running or prepared to run via the client, as SGS-SLAM requires active RGB and Depth topics.

## Execution Steps

Open **two separate terminal windows** and execute the commands below in their respective terminals.

### Terminal 1: Start the SLAM Action Server

The server waits for the client goal, manages temporary image storage, and launches the core SGS-SLAM optimization subprocesses per frame.

```bash
export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtiff.so.5.7.0" && ros2 run semantic_gs_slam_ros run_gs_slam_server
```

### Terminal 2: Start the Client and Data Streamer

The client sends the action goal to initiate the process and simultaneously acts as the **Dummy Node** that streams the RGB-D image sequence.

```bash
export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtiff.so.5.7.0" && ros2 run semantic_gs_slam_ros run_gs_slam_client
```

## 💾 Result Location

Upon completion of the SLAM process (when all frames are processed), the final reconstructed Gaussian Splatting map and experiment logs will be saved to the following directory:

```text
/mnt/HDD3/Sabrina/catkin_ws/src/SGS_SLAM/experiments/streamimg_test_ros
```

# 🤖 Object Query Server Node

- **Package:** `object_query`
- **Node Name:** `object_query_server`
- **Status:** Active

## 1. Overview

The **ObjectQueryServer** is a ROS 2 node designed to serve as a semantic database for the robot. It acts as a bridge between the robot's perception data and its navigation and task planning systems.

It loads a pre-built 3D semantic map (typically from Gaussian Splatting or point cloud data), calculates the centroid of labeled objects, and provides a service interface for other nodes to query the location of specific objects by name.

## 2. Dependencies

### System
- **ROS 2** (Humble/Jazzy)
- **Python 3.10+**

### Libraries
- `numpy`
- `json`, `os` (Python standard library)

### Custom Interfaces
- `object_query_interfaces` (must be built in the workspace)

---

## 3. ROS 2 Interfaces

### 📡 Parameters

| Parameter Name | Type | Default Value | Description |
| :--- | :--- | :--- | :--- |
| `map_path` | `string` | `robot_ws/data/params.npz` | Absolute path to the `.npz` file containing 3D point data. |
| `semantic_path` | `string` | `robot_ws/data/semantic.json` | Absolute path to the `.json` file containing class labels. |

### 🛠 Services

**Service Name:** `/object_query`
**Type:** `object_query_interfaces/srv/ObjectQuery`

This service accepts an object name (case-insensitive) and returns its 3D coordinates.

#### Request Definition
```python
string name   # e.g., "chair", "cup"
```

#### Response Definition
```python
bool found                      # True if object exists in DB
geometry_msgs/Point position    # (x, y, z) centroid
string message                  # Status message
```

### 📢 Publishers

**Topic:** `/object_list`
**Type:** `std_msgs/msg/String`

Publishes the entire database of loaded objects as a JSON-formatted string upon startup. This is useful for debugging or verifying which objects were successfully loaded.

**Example Payload:**
```json
{
  "chair": [1.5, 2.0, 0.0],
  "table": [3.2, 1.1, 0.5],
  "monitor": [-1.0, 0.5, 1.2]
}
```

---

## 4. Input Data Format

The node requires two specific files to function correctly (a script that generates a minimal compatible pair is shown at the end of this section).

### A. Map File (`.npz`)
A `numpy` archive containing the 3D data. It must contain the following keys:

- **`means3D`**: Array of shape `(N, 3)` containing XYZ coordinates.
- **`semantic_ids`**: Array of shape `(N,)` containing integer IDs corresponding to classes.

### B. Semantic File (`.json`)
A JSON file mapping integer IDs to human-readable class names.

**Schema:**
```json
{
  "segmentation": [
    { "id": 1, "class": "chair" },
    { "id": 2, "class": "table" }
  ]
}
```
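To make the expected input format concrete, the snippet below generates a tiny compatible pair of files. It is a synthetic example for testing only; the point values and output file names are placeholders.

```python
# Generate a minimal synthetic map/semantic pair matching the documented format.
import json
import numpy as np

# Two tiny point clusters: semantic ID 1 ("chair") and ID 2 ("table").
means3D = np.array([
    [1.4, 2.0, 0.0], [1.6, 2.0, 0.0],   # chair points -> centroid (1.5, 2.0, 0.0)
    [3.1, 1.1, 0.5], [3.3, 1.1, 0.5],   # table points -> centroid (3.2, 1.1, 0.5)
])
semantic_ids = np.array([1, 1, 2, 2])

np.savez('params.npz', means3D=means3D, semantic_ids=semantic_ids)

with open('semantic.json', 'w') as f:
    json.dump({
        "segmentation": [
            {"id": 1, "class": "chair"},
            {"id": 2, "class": "table"},
        ]
    }, f, indent=2)
```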
---

## 5. Usage

### Running via CLI
You can run the node directly using `ros2 run`. Override the parameters if your data is not in the default hardcoded path.

```bash
ros2 run object_query object_query_server --ros-args \
  -p map_path:="/path/to/your/data/params.npz" \
  -p semantic_path:="/path/to/your/data/semantic.json"
```

### Running via Launch File
It is recommended to use a launch file to handle the paths cleanly.

```python
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    return LaunchDescription([
        Node(
            package='object_query',
            executable='object_query_server',
            name='object_query_server',
            output='screen',
            parameters=[{
                'map_path': '/home/user/data/scene_01.npz',
                'semantic_path': '/home/user/data/classes.json'
            }]
        )
    ])
```

### Testing the Service
To manually test whether the node is working, use the ROS 2 CLI:

```bash
ros2 service call /object_query object_query_interfaces/srv/ObjectQuery "{name: 'chair'}"
```

---

## 6. Notes & Troubleshooting

:::info
**Calculation Logic:** The node calculates the **centroid** (mean position) of all points belonging to a specific semantic ID. It currently does not account for the object's orientation or bounding box dimensions.
:::

:::warning
**Common Errors:**
1. **"Map file not found":** Double-check the absolute paths provided in the parameters.
2. **"Found: False":** The query is case-insensitive, but the spelling must match the JSON class name exactly.
:::

## 7. Theory of Operation

### A. Object Position Computation

The node determines the position of an object (e.g., a "chair") by calculating its geometric **centroid** from the raw point cloud data. This happens once, during the initialization phase (`load_semantic_map`).

The algorithm follows these steps:

1. **Data Loading:** The node loads `means3D` (an array of XYZ coordinates for every point in the map) and `semantic_ids` (an array of integer labels corresponding to those points).
2. **Semantic Masking:** For every unique object class found in the JSON file (e.g., `id: 1` -> "chair"), the node creates a **boolean mask**. This mask isolates only the points in the 3D map that belong to that specific ID.

   $$P_{chair} = \{ p \in \text{means3D} \mid \text{semantic\_id}(p) = 1 \}$$

3. **Centroid Calculation:** The node calculates the arithmetic mean of these isolated points along the X, Y, and Z axes. This yields a single 3D coordinate representing the center of the object cluster.

   $$C_{x,y,z} = \frac{1}{N} \sum_{i=1}^{N} P_i$$

   *Where $N$ is the total number of points belonging to that object class.*

4. **Storage:** The calculated coordinate is stored in the `self.object_db` dictionary with the object name (lowercased) as the key.

### B. Object Listing Process

The Object Listing process provides downstream nodes (or the user) with a complete catalog of available objects immediately after the map is processed (a condensed sketch of steps A and B follows below).

1. **Database Serialization:** Once `object_db` is fully populated with names and centroid coordinates, the node uses Python's `json.dumps()` to serialize the dictionary into a string.
2. **Publishing:** This JSON string is wrapped in a `std_msgs/String` message and published to the `/object_list` topic.
   * **Why JSON?** A JSON string lets the node publish a variable number of objects without needing a custom dynamic-array message definition, and it can be easily parsed by other Python or C++ nodes using standard JSON libraries.
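As a reference for the two processes above, here is a condensed, hedged sketch of the map-loading and listing logic. It follows the description in sections 7.A and 7.B but is not a copy of the node's source; the standalone function signature is illustrative.

```python
# Condensed sketch of the centroid computation and /object_list payload;
# follows the documented behavior, not the node's exact implementation.
import json
import numpy as np


def load_semantic_map(map_path: str, semantic_path: str) -> dict:
    """Return {object_name: [x, y, z]} centroids from the map/semantic files."""
    data = np.load(map_path)
    means3d = data['means3D']            # (N, 3) XYZ coordinates
    semantic_ids = data['semantic_ids']  # (N,) integer class IDs

    with open(semantic_path) as f:
        classes = json.load(f)['segmentation']

    object_db = {}
    for entry in classes:
        mask = semantic_ids == entry['id']      # boolean mask for this class
        if not mask.any():
            continue                            # class not present in the map
        centroid = means3d[mask].mean(axis=0)   # arithmetic mean along X, Y, Z
        object_db[entry['class'].lower()] = centroid.tolist()
    return object_db


# The /object_list payload is simply the serialized database, e.g.:
# msg.data = json.dumps(load_semantic_map('params.npz', 'semantic.json'))
```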