# 🤖 Robot Decision & Command System
- **Workspace:** `robot_ws`
- **Main Package:** `decision_maker`
- **Helper Package:** `object_query`
- **Status:** Active / Development
## 1. Project Structure
Based on the current source code, the system is organized into modular Python nodes and interface packages.
```text
data/
├── params.npz
├── semantic.json
src/
├── decision_maker/ # Main Orchestrator Package
│ ├── decision_maker/ # Python Source Code
│ │ ├── __init__.py
│ │ ├── audio_command_node.py # 🎤 Voice Input (Whisper)
│ │ ├── cancel_command_node.py # 🛑 Safety/Abort Input
│ │ ├── decision_maker_node.py # 🧠 The Main Brain
│ │ ├── nl_command_node.py # ⌨️ Text Input (Terminal)
│ │ ├── scenario_library.py # 📚 Logic & Translation
│ │ ├── command_types.py # Data Classes
│ │ ├── nl_planner.py # Planner Utils
│ │ ├── mock_grasp_server.py # Simulation Helper
│ │ └── mock_nav_server.py # Simulation Helper
│ ├── launch/ # Launch Files
│ ├── resource/
│ ├── test/
│ ├── package.xml
│ └── setup.py
│
├── decision_maker_interfaces/ # Custom Action Definitions
│
├── object_query/ # Semantic Map Service
│ ├── object_query/
│ │ ├── object_query_server.py # 🗺️ 3D Object Database
│ │ └── object_query_client.py
│ └── ...
│
└── object_query_interfaces/ # Custom Service Definitions
```
---
## 2. System Architecture
The system follows a **Producer-Consumer** pattern. Three distinct input nodes produce commands: the voice and text nodes publish to the `/manual_command` topic, while the cancel node publishes abort signals to `/cancel_command`. The Decision Maker consumes them, translates user intent into primitives via the **Scenario Library**, and executes them safely.
```mermaid
graph TD
%% Define Styles
classDef input fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
classDef logic fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
classDef hardware fill:#fff3e0,stroke:#ef6c00,stroke-width:2px;
classDef safety fill:#ffebee,stroke:#c62828,stroke-width:2px;
classDef slam fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px;
%% --- STAGE 0: PERCEPTION & SLAM ---
subgraph Perception_Layer [Stage 0: Perception & Mapping]
direction TB
Cam[RGB-D Camera] ==>|Stream| SGS_SLAM(SGS-SLAM Server)
SGS_SLAM -->|Generate| MapData[(Map Data .npz)]
MapData -.->|Read| QueryServ[object_query_server]
end
%% --- STAGE 1: INPUTS ---
subgraph Input_Layer [Stage 1: Input Interfaces]
direction TB
Mic[Microphone] -->|Raw Audio| AudioNode(audio_command_node)
AudioNode -->|Text| AudioCmd[/manual_command/]
Keyboard[Terminal Input] -->|stdin| TextNode(nl_command_node)
TextNode -->|Text| AudioCmd
StopKey[Enter Key] -->|stdin| CancelNode(cancel_command_node)
CancelNode -->|Signal| CancelCmd[/cancel_command/]
end
%% --- STAGE 2: THE BRAIN ---
subgraph Brain_Layer [Stage 2: Decision Maker Node]
direction TB
AudioCmd --> Listener(Command Listener)
CancelCmd -->|High Priority| Abort(Abort Logic)
Listener -->|Lookup Phrase| Registry{Scenario Registry}
Registry -.->|Import| Lib[scenario_library.py]
Lib -->|Return Plan| Primitives[List: goto, grasp, place]
Primitives --> Queue[Command Queue]
Abort -.->|Clear| Queue
Queue -->|Dequeue| Executor[Execution Thread]
Abort -.->|Stop| Executor
%% Object Query Interaction
Executor -- "Query: 'bottle'" --> QueryServ
QueryServ -->|Return: XYZ| Executor
end
%% --- STAGE 4: HARDWARE EXECUTION ---
subgraph Hardware_Layer [Stage 4: Hardware Action]
Executor -- "goto:pose" --> Nav2[Nav2 Stack]
Executor -- "grasp/place" --> Arm[Manipulation Stack]
end
%% Feedback Loop
Executor -.->|/task_status| TextNode
%% Apply Styles
class Mic,AudioNode,Keyboard,TextNode input;
class Listener,Registry,Lib,Primitives,Queue,Executor logic;
class Nav2,Arm hardware;
class StopKey,CancelNode,Abort,CancelCmd safety;
class SGS_SLAM,MapData,QueryServ,Cam slam;
```
---
## 3. Stage 0: SLAM Exploration
The system first explores and maps the environment before accepting any commands. This stage is still under development.
## 4. Stage 1: Input Interfaces
The system provides three ways to interact with the robot, defined by the files in `decision_maker/`.
### A. Audio Input (`audio_command_node.py`)
* **Role:** Hands-free operation using OpenAI Whisper.
* **Logic:** Continuously buffers **5-second chunks** of audio, transcribes them, removes non-alphanumeric characters, and publishes to `/manual_command`.
* **Latency:** High (~1-3s due to transcription).
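For illustration, a minimal sketch of this capture → transcribe → publish loop, assuming the node uses the `sounddevice` and `openai-whisper` packages (model size, cleanup rules, and lower-casing are assumptions, not the node's confirmed behavior):
```python
import re
import sounddevice as sd
import whisper
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

SAMPLE_RATE = 16000   # Whisper expects 16 kHz mono float32
CHUNK_SECONDS = 5.0   # buffer length described above


class AudioCommandNode(Node):
    def __init__(self):
        super().__init__("audio_command_node")
        self.pub = self.create_publisher(String, "/manual_command", 10)
        self.model = whisper.load_model("base")   # model size is an assumption

    def capture_and_publish(self):
        # Record a 5-second chunk and block until it is complete.
        audio = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                       samplerate=SAMPLE_RATE, channels=1, dtype="float32")
        sd.wait()
        result = self.model.transcribe(audio.flatten(), fp16=False)
        # Keep only alphanumeric characters and spaces, as described above.
        text = re.sub(r"[^a-zA-Z0-9 ]", "", result["text"]).strip().lower()
        if text:
            self.pub.publish(String(data=text))
            self.get_logger().info(f"Published command: {text}")


def main():
    rclpy.init()
    node = AudioCommandNode()
    try:
        while rclpy.ok():
            node.capture_and_publish()
    finally:
        rclpy.shutdown()
```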
### B. Text Input (`nl_command_node.py`)
* **Role:** Debugging and testing logic without speaking.
* **Logic:** Non-blocking terminal interface. Reads typed input from `sys.stdin`, publishes it to `/manual_command`, and listens to `/task_status` for feedback.
* **Latency:** Instant.
### C. Safety Input (`cancel_command_node.py`)
* **Role:** Emergency Abort.
* **Logic:** A lightweight node monitoring a dedicated terminal. Hitting "Enter" immediately sends a signal to `/cancel_command`, causing the Brain to clear its queue and stop all motors.
---
## 5. Stage 2: Scenario Logic (`scenario_library.py`)
This file acts as the **Semantic Translator**. It decouples the robot's *actions* from the user's *intent*.
### A. The Registry Pattern
We do not hardcode logic inside the main node. Instead, we map phrases to functions:
```python
# scenario_library.py
SCENARIO_REGISTRY = {
    "give me":     give_item,    # -> ["goto:kitchen", "grasp:item", "place:user"]
    "go home":     park_robot,   # -> ["goto:home"]
    "clean table": clean_table,  # -> ["goto:table", "grasp:trash", "place:bin"]
}
```
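For illustration, a registry lookup might look like the sketch below; `plan_from_text` is a hypothetical helper name, not necessarily the node's real function:
```python
def plan_from_text(text: str) -> list[str]:
    """Map a raw user phrase to a list of primitives, or [] if unknown."""
    text = text.lower().strip()
    for phrase, handler in SCENARIO_REGISTRY.items():
        if phrase in text:            # simple substring match
            return handler()          # e.g. ["goto:kitchen", "grasp:item", "place:user"]
    return []

# plan_from_text("please give me the bottle")
# -> ["goto:kitchen", "grasp:item", "place:user"]
```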
### B. 🧠 LLM Extensibility
This module is the designated injection point for Generative AI.
* **Current State:** Deterministic (Dictionary Lookup).
* **Future State:** The `SCENARIO_REGISTRY` can be replaced by an LLM Agent (using `nl_planner.py`) that accepts the text prompt and world context to generate the list of primitives dynamically.
---
## 6. Stage 3: Execution Engine (`decision_maker_node.py`)
The Brain executes the primitives generated by the Scenario Library.
### A. Threading & Safety
* **Sensor Thread:** Runs `rclpy.spin()` to keep subscribers (like Cancel) active.
* **Execution Thread:** Uses **Blocking I/O** (waiting for action results) to ensure tasks happen sequentially.
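A minimal sketch of this two-thread layout, assuming a `DecisionMakerNode` class with a blocking `run_executor_loop()` method (both names are illustrative):
```python
import threading
import rclpy


def main():
    rclpy.init()
    node = DecisionMakerNode()   # hypothetical node class from decision_maker_node.py

    # Sensor thread: keeps subscriptions (/manual_command, /cancel_command) responsive.
    spin_thread = threading.Thread(target=rclpy.spin, args=(node,), daemon=True)
    spin_thread.start()

    # Execution thread (here: the main thread): dequeues primitives and blocks
    # on each action result so tasks run strictly one after another.
    try:
        node.run_executor_loop()
    finally:
        rclpy.shutdown()
```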
### B. Supported Primitives
The internal executor understands these string formats:
1. **`goto:x,y,theta`** - Sends Nav2 goal.
2. **`grasp:object_name`** - Queries `object_query_server`, then grasps.
3. **`place:location`** - Triggers placement sequence.
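A sketch of how the executor might dispatch these strings; the helper methods (`_send_nav_goal`, `_query_object`, `_send_grasp_goal`, `_send_place_goal`) are hypothetical, and named targets such as `goto:kitchen` would need an extra location lookup that is not shown:
```python
def execute_primitive(self, primitive: str) -> bool:
    """Illustrative dispatcher for the string primitives listed above."""
    action, _, arg = primitive.partition(":")
    if action == "goto":
        x, y, theta = (float(v) for v in arg.split(","))   # "goto:x,y,theta"
        return self._send_nav_goal(x, y, theta)            # hypothetical Nav2 client call
    if action == "grasp":
        position = self._query_object(arg)                 # hypothetical /object_query call
        return position is not None and self._send_grasp_goal(position)
    if action == "place":
        return self._send_place_goal(arg)                  # hypothetical placement call
    self.get_logger().warn(f"Unknown primitive: {primitive}")
    return False
```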
### C. Safety Timeouts & Logic
Strict timeouts are enforced to prevent the robot from "freezing" if a hardware action hangs or its action server becomes unreachable. The timeout logic works on two levels: **Global Batch** and **Atomic Action**.
#### The Logic Process
The `_execute_batch` function implements a "Watchdog" timer that runs alongside the sequential execution of primitives.
```mermaid
graph TD
Start([Start Batch]) --> InitTime[Record start_time]
InitTime --> CheckList{More Primitives?}
CheckList -- Yes --> CheckTime{Elapsed > 150s?}
CheckTime -- "Yes (Timeout)" --> Cancel[TRIGGER ABORT]
Cancel --> PubCancel[Publish /cancel_command]
PubCancel --> Stop[Return False]
CheckTime -- "No (Safe)" --> Exec[Execute Next Primitive]
Exec --> ActionWait{Action Client Wait}
ActionWait -- Success --> CheckList
ActionWait -- Action Timeout --> Cancel
CheckList -- No --> Finish([Task Complete])
style Cancel fill:#f96,stroke:#333,stroke-width:2px
```
1. **Initialization:** When a command batch starts (e.g., "bring apple"), the system records `start_time = time.time()`.
2. **Interval Check:** Before starting *any* new primitive (Goto, Grasp, etc.), the system calculates `elapsed = time.time() - start_time`.
3. **Global Abort:** If `elapsed > 150.0s`, the system internally calls `_send_cancel()`. This publishes to `/cancel_command`, ensuring that even if the logic layer stops, the hardware layer also receives a distinct stop signal.
4. **Atomic Wait:** Individual ROS actions (like Nav2) are called with `spin_until_future_complete(..., timeout_sec=X)`. If a specific action fails to report back within its own limit (e.g., 15s for Nav2), the batch is immediately aborted.
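A minimal sketch of the watchdog pattern described above; the method and helper names mirror the text (`_execute_batch`, `_send_cancel`) but the body is illustrative, not the actual implementation:
```python
import time

BATCH_TIMEOUT_S = 150.0   # global batch budget (see the table below)


def _execute_batch(self, primitives: list[str]) -> bool:
    """Illustrative method of the decision maker node."""
    start_time = time.time()
    for primitive in primitives:
        # Global abort: checked before every new primitive starts.
        if time.time() - start_time > BATCH_TIMEOUT_S:
            self._send_cancel()          # publishes to /cancel_command
            return False
        # Atomic wait: execute_primitive() blocks on the action result and
        # enforces its own per-action timeout (e.g. ~15 s for a Nav2 goal).
        if not self.execute_primitive(primitive):
            self._send_cancel()
            return False
    return True
```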
#### Timeout Configuration Table
| Scope | Duration | Logic Implementation | Action on Timeout |
| :--- | :--- | :--- | :--- |
| **Total Batch** | **150.0s** | `time.time() - batch_start_time` checked before every step. | Sequence cancelled. Robot stops. |
| **Atomic Action** | Per action (e.g., **15.0s** for a Nav2 goal) | `spin_until_future_complete(..., timeout_sec=X)` on each action call. | Batch aborted immediately. |
---
## 7. Object Query Service
**Package:** `object_query`
**File:** `object_query_server.py`
This node is a dependency for the `grasp:` primitive.
* **Input:** Loads `.npz` (point cloud) and `.json` (semantic labels).
* **Service:** Provides `object_query_interfaces/srv/ObjectQuery`.
* **Logic:** Returns the centroid $(x, y, z)$ of the requested object label.
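For reference, a minimal synchronous client sketch (roughly what `object_query_client.py` is expected to do; the exact client code is an assumption):
```python
import rclpy
from rclpy.node import Node
from object_query_interfaces.srv import ObjectQuery


def query_object(name: str):
    """Return the geometry_msgs/Point centroid for `name`, or None if not found."""
    rclpy.init()
    node = Node("object_query_demo_client")
    client = node.create_client(ObjectQuery, "/object_query")
    client.wait_for_service()

    request = ObjectQuery.Request()
    request.name = name
    future = client.call_async(request)
    rclpy.spin_until_future_complete(node, future, timeout_sec=5.0)

    response = future.result()
    node.destroy_node()
    rclpy.shutdown()
    if response is not None and response.found:
        return response.position
    return None
```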
---
## 8. Usage
To run the full system, use separate terminals.
### 1. Start the Infrastructure (Object DB)
```bash
ros2 run object_query object_query_server
```
### 2. Start the Brain (Decision Maker)
```bash
ros2 run decision_maker decision_maker_node
```
*Note: If no hardware is available, use the mock action servers (`mock_nav_server.py`, `mock_grasp_server.py`) via the launch file:*
*`ros2 launch decision_maker mock_actions.launch.py`*
### 3. Start an Input Interface
**Option A (Voice):**
```bash
ros2 run decision_maker audio_command_node
```
**Option B (Text):**
```bash
ros2 run decision_maker nl_command_node
```
### 4. Start the Safety Switch
```bash
ros2 run decision_maker cancel_command_node
```
---
## 9. Troubleshooting
:::info
**Simulation Mode**
The package also includes `decision_maker_node_isaacsim.py`. When running in Isaac Sim, launch this node instead of the standard `decision_maker_node.py` so that the coordinate frames match the simulation environment.
:::
:::warning
**Microphone Conflicts**
If `audio_command_node` crashes with `PortAudioError`, ensure that no other node instance or application (e.g., Zoom or Teams) is holding the audio device.
:::
# 🤖 SGS-SLAM ROS 2 Streaming Guide
## Project Overview
This guide describes how to run the Streaming Gaussian Splatting SLAM (SGS-SLAM) system using a ROS 2 Action Server (`run_gs_slam_server`) and a dedicated client (`run_gs_slam_client`).
The system handles high-frequency camera streams by decoupling data reception from the heavy SLAM computation. It utilizes dynamic topic subscription throttling to manage processing lag and ensures data integrity through synchronized disk I/O.
## 🏗️ System Architecture & Process Diagram
The `StreamingGSSLAMServer` uses a multi-threaded Producer-Consumer architecture to bridge ROS 2 topics with the file-based SLAM algorithm.
### Process Flow Diagram
```mermaid
graph TD
%% ROS Layer
subgraph ROS_Layer [ROS 2 Callbacks]
RGB[/rgb topic/] -->|Resize 1200x680| Matcher
Depth[/depth topic/] -->|Float->UInt16 & Buffer| Matcher
Matcher{Timestamp Match} -->|Pair Found| IO_Queue[I/O Queue]
end
%% Disk Layer
subgraph Disk_Layer [Async Disk Writer]
IO_Queue -->|Dequeue| Writer(Disk Writer Thread)
Writer -->|Write .jpg| TmpRGB[tmp/frames/]
Writer -->|Write .png| TmpDepth[tmp/depths/]
Writer -.->|Sync Confirmation| Flag[Completed Set]
end
%% SLAM Layer
subgraph Logic_Layer [Main SLAM Loop]
Check{Files Ready?} -->|Yes| Launch[Launch Subprocess]
Launch -->|python slam_copy.py| Subprocess(SLAM Core)
Subprocess -->|Generate| Checkpoint[.npz / .ply Map]
%% Lag Control Logic
Subprocess -.->|Lag > 300| Unsub(Unsubscribe Topics)
Subprocess -.->|Lag < 150| Resub(Resume Subscription)
end
TmpRGB --> Check
TmpDepth --> Check
Flag --> Check
```
## ⚙️ The Mapping Process
The mapping process is performed incrementally, frame-by-frame, to allow for real-time streaming data ingestion.
1. **Initialization:**
- The server cleans up temporary directories (`tmp/frames`, `tmp/depths`).
- It publishes a `/slam_start_signal` to tell the client (Dummy Node) to start streaming.
2. **Streaming & Buffering:**
- Incoming RGB images are resized to **1200x680**.
- Incoming Depth images are converted from **Meters (Float32)** to **Millimeters (UInt16)** (see the conversion sketch after this list).
- Data is written to disk asynchronously via a separate thread to avoid blocking the ROS callbacks.
3. **Incremental Optimization:**
- The main loop monitors the disk for fully written frame pairs.
- **Frame 0:** The SLAM subprocess starts from scratch with `SLAM_LOAD_CHECKPOINT=False`.
- **Frame N:** The subprocess loads the "latest" checkpoint from the previous step (`SLAM_LOAD_CHECKPOINT=True`), integrates the new frame, optimizes the 3D Gaussians, and saves the state.
4. **Completion:**
- Once the client signals `data_transfer_done` and the queue is empty, the server waits for the final subprocess to exit and returns the result.
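A sketch of the per-frame conversions from step 2, assuming `cv_bridge` and OpenCV handle the image decoding (the real callbacks additionally perform timestamp matching and queuing):
```python
import numpy as np
import cv2
from cv_bridge import CvBridge

bridge = CvBridge()
TARGET_SIZE = (1200, 680)   # width x height used by the server


def convert_rgb(msg):
    rgb = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    return cv2.resize(rgb, TARGET_SIZE)                             # written later as .jpg


def convert_depth(msg):
    depth_m = bridge.imgmsg_to_cv2(msg, desired_encoding="32FC1")   # meters, float32
    depth_mm = np.nan_to_num(depth_m, nan=0.0) * 1000.0             # -> millimeters
    return np.clip(depth_mm, 0, 65535).astype(np.uint16)            # written later as .png
```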
## 📂 .npz File Generation
The ROS node does not generate the final map file itself. Instead, it acts as a wrapper that triggers the generation via the core SLAM library:
1. **Trigger:** The `StreamingGSSLAMServer` executes a Python subprocess for every frame:
```bash
python scripts/slam_copy.py configs/replica/slam.py
```
2. **Environment Variables:** The server passes context via environment variables:
- `SLAM_START_FRAME`: Tells the script which specific frame to process.
- `SLAM_CHECKPOINT_NAME`: Set to `'latest'` to ensure continuous mapping.
3. **Output:** The `slam_copy.py` script performs the Gaussian Splatting optimization. It saves the intermediate and final map representations (typically **.npz** or **.ply** files) into the `experiments/` directory defined in the config.
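Putting these steps together, the per-frame subprocess launch might look roughly like the sketch below; only the command line and the environment variable names documented above are taken from the source, while the wrapper function itself is hypothetical:
```python
import os
import subprocess


def run_slam_step(frame_idx: int) -> int:
    """Launch one incremental SLAM optimization step as a subprocess."""
    env = os.environ.copy()
    env["SLAM_START_FRAME"] = str(frame_idx)
    env["SLAM_LOAD_CHECKPOINT"] = "False" if frame_idx == 0 else "True"
    env["SLAM_CHECKPOINT_NAME"] = "latest"
    proc = subprocess.run(
        ["python", "scripts/slam_copy.py", "configs/replica/slam.py"],
        env=env,
    )
    return proc.returncode
```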
---
## Prerequisites & Setup
Before running the commands, ensure the following steps are completed in **both terminals**:
0. Navigate to the workspace:
```bash
cd /mnt/HDD3/Sabrina/catkin_ws
```
1. **Activate Conda Environment:**
```bash
conda activate slam
```
2. **Source ROS Workspace:**
```bash
source install/setup.bash
```
**Note:** Ensure the camera node is running, or ready to be started via the client, as SGS-SLAM requires active RGB and Depth topics.
## Execution Steps
Open **two separate terminal windows** and execute the commands below in their respective terminals.
### Terminal 1: Start the SLAM Action Server
The server waits for the client goal, manages temporary image storage, and launches the core SGS-SLAM optimization subprocesses per frame.
```bash
export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtiff.so.5.7.0" && ros2 run semantic_gs_slam_ros run_gs_slam_server
```
### Terminal 2: Start the Client and Data Streamer
The client sends the action goal to initiate the process and simultaneously acts as the **Dummy Node** to stream the RGB-D image sequence.
```bash
export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtiff.so.5.7.0" && ros2 run semantic_gs_slam_ros run_gs_slam_client
```
## 💾 Result Location
Upon completion of the SLAM process (when all frames are processed), the final reconstructed Gaussian Splatting map and experiment logs will be saved to the following directory:
```text
/mnt/HDD3/Sabrina/catkin_ws/src/SGS_SLAM/experiments/streamimg_test_ros
```
# 🤖 Object Query Server Node
- **Package:** `object_query`
- **Node Name:** `object_query_server`
- **Status:** Active
## 1. Overview
The **ObjectQueryServer** is a ROS 2 node designed to serve as a semantic database for the robot. It acts as a bridge between the robot's perception data and its navigation or task planning systems.
It loads a pre-built 3D semantic map (typically from Gaussian Splatting or Point Cloud data), calculates the centroid of labeled objects, and provides a service interface for other nodes to query the location of specific objects by name.
## 2. Dependencies
### System
- **ROS 2** (Humble/Jazzy)
- **Python 3.10+**
### Libraries
- `numpy`
- `json` (standard library)
- `os` (standard library)
### Custom Interfaces
- `object_query_interfaces` (Must be built in the workspace)
---
## 3. ROS 2 Interfaces
### 📡 Parameters
| Parameter Name | Type | Default Value | Description |
| :--- | :--- | :--- | :--- |
| `map_path` | `string` | `robot_ws/data/params.npz` | Absolute path to the `.npz` file containing 3D point data. |
| `semantic_path` | `string` | `robot_ws/data/semantic.json` | Absolute path to the `.json` file containing class labels. |
### 🛠 Services
**Service Name:** `/object_query`
**Type:** `object_query_interfaces/srv/ObjectQuery`
This service accepts an object name (case-insensitive) and returns its 3D coordinates.
#### Request Definition
```text
string name # e.g., "chair", "cup"
```
#### Response Definition
```text
bool found # True if object exists in DB
geometry_msgs/Point position # (x, y, z) centroid
string message # Status message
```
### 📢 Publishers
**Topic:** `/object_list`
**Type:** `std_msgs/msg/String`
Publishes the entire database of loaded objects as a JSON-formatted string upon startup. This is useful for debugging or verifying which objects were successfully loaded.
**Example Payload:**
```json
{
"chair": [1.5, 2.0, 0.0],
"table": [3.2, 1.1, 0.5],
"monitor": [-1.0, 0.5, 1.2]
}
```
---
## 4. Input Data Format
The node requires two specific files to function correctly.
### A. Map File (`.npz`)
A `numpy` archive containing the 3D data. It must contain the following keys:
- **`means3D`**: Array of shape `(N, 3)` containing XYZ coordinates.
- **`semantic_ids`**: Array of shape `(N,)` containing integer IDs corresponding to classes.
### B. Semantic File (`.json`)
A JSON file mapping integer IDs to human-readable class names.
**Schema:**
```json
{
"segmentation": [
{ "id": 1, "class": "chair" },
{ "id": 2, "class": "table" }
]
}
```
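A quick sanity check of both files before starting the node (a sketch assuming the default filenames from the `data/` folder):
```python
import json
import numpy as np

data = np.load("data/params.npz")
means3d = data["means3D"]            # expected shape (N, 3)
semantic_ids = data["semantic_ids"]  # expected shape (N,)
assert means3d.ndim == 2 and means3d.shape[1] == 3
assert semantic_ids.shape[0] == means3d.shape[0]

with open("data/semantic.json") as f:
    id_to_class = {e["id"]: e["class"] for e in json.load(f)["segmentation"]}
print(f"{means3d.shape[0]} points, {len(id_to_class)} classes: {sorted(id_to_class.values())}")
```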
---
## 5. Usage
### Running via CLI
You can run the node directly using `ros2 run`. Ensure you override the parameters if your data is not in the default hardcoded path.
```bash
ros2 run object_query object_query_server --ros-args \
-p map_path:="/path/to/your/data/params.npz" \
-p semantic_path:="/path/to/your/data/semantic.json"
```
### Running via Launch File
It is recommended to use a launch file to handle the paths cleanly.
```python
from launch import LaunchDescription
from launch_ros.actions import Node
def generate_launch_description():
    return LaunchDescription([
        Node(
            package='object_query',
            executable='object_query_server',
            name='object_query_server',
            output='screen',
            parameters=[{
                'map_path': '/home/user/data/scene_01.npz',
                'semantic_path': '/home/user/data/classes.json'
            }]
        )
    ])
```
### Testing the Service
To manually test if the node is working, use the ROS 2 CLI:
```bash
ros2 service call /object_query object_query_interfaces/srv/ObjectQuery "{name: 'chair'}"
```
---
## 6. Notes & Troubleshooting
:::info
**Calculation Logic:**
The node calculates the **centroid** (mean position) of all points belonging to a specific semantic ID. It currently does not account for the object's orientation or bounding box dimensions.
:::
:::warning
**Common Errors:**
1. **"Map file not found":** Double-check the absolute paths provided in the parameters.
2. **"Found: False":** The query is case-insensitive, but the spelling must match the JSON class name exactly.
:::
## 7. Theory of Operation
### A. Object Position Computation
The node determines the position of an object (e.g., a "chair") by calculating its geometric **centroid** from the raw point cloud data. This process occurs once during the initialization phase (`load_semantic_map`).
The algorithm follows these steps:
1. **Data Loading:**
The node loads `means3D` (an array of XYZ coordinates for every point in the map) and `semantic_ids` (an array of integer labels corresponding to those points).
2. **Semantic Masking:**
For every unique object class found in the JSON file (e.g., `id: 1` -> "chair"), the node creates a **boolean mask**. This mask isolates only the points in the 3D map that belong to that specific ID.
$$P_{chair} = \{ p \in \text{means3D} \mid \text{semantic\_id}(p) = 1 \}$$
3. **Centroid Calculation:**
The node calculates the arithmetic mean of these isolated points along the X, Y, and Z axes. This results in a single 3D coordinate representing the center of the object cluster.
$$C_{x,y,z} = \frac{1}{N} \sum_{i=1}^{N} P_i$$
*Where $N$ is the total number of points belonging to that object class.*
4. **Storage:**
This calculated coordinate is stored in the `self.object_db` dictionary with the object name (lowercased) as the key.
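A NumPy sketch of steps 2-4; the function and variable names are illustrative rather than the node's actual code:
```python
import numpy as np


def build_object_db(means3d: np.ndarray,
                    semantic_ids: np.ndarray,
                    id_to_class: dict[int, str]) -> dict[str, list[float]]:
    """Return {object_name: [x, y, z]} centroids for every labeled class."""
    object_db = {}
    for class_id, class_name in id_to_class.items():
        mask = semantic_ids == class_id      # step 2: boolean mask for this class
        points = means3d[mask]
        if points.shape[0] == 0:
            continue                         # class listed in JSON but absent from the map
        centroid = points.mean(axis=0)       # step 3: arithmetic mean along X, Y, Z
        object_db[class_name.lower()] = centroid.tolist()   # step 4: lowercased key
    return object_db
```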
### B. Object Listing Process
The Object Listing process is designed to provide downstream nodes (or the user) with a complete catalog of available objects immediately after the map is processed.
1. **Database Serialization:**
Once the `object_db` is fully populated with names and centroid coordinates, the node uses Python's `json.dumps()` method to serialize the dictionary into a string format.
2. **Publishing:**
This JSON string is wrapped in a `std_msgs/String` message and published to the `/object_list` topic.
* **Why JSON?** Using a JSON string allows the node to publish a variable number of objects without needing a custom dynamic array message definition. It can be easily parsed by other Python or C++ nodes using standard JSON libraries.
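A sketch of the serialization and publishing step, assuming the object database has already been built as in section A (the helper function is illustrative):
```python
import json
from std_msgs.msg import String
from rclpy.node import Node


def publish_object_list(node: Node, object_db: dict) -> None:
    """Serialize the object database and publish it once on /object_list."""
    pub = node.create_publisher(String, "/object_list", 10)
    msg = String()
    msg.data = json.dumps(object_db)   # e.g. {"chair": [1.5, 2.0, 0.0], ...}
    pub.publish(msg)
```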