# Object Detection Documentation
This document describes the files inside the `object_detection` folder of the repo.
# detector.py
### What It Is
Establishes an `ObjectDetectorNode` class that serves as the parent class for all detectors.
### Parameters
`self.bridge`: a utility to convert ROS2 Image messages to OpenCV images, and back.
`tss`: an Approximate Time Synchronizer used to match messages from the subscribed topics when they arrive out of order. It is initialized with:
- `queue_size`: how many incoming messages from each subscriber the synchronizer keeps in memory while trying to form timestamp-matched tuples. When new messages come in, the oldest are dropped.
- `acceptable_delay`: the maximum timestamp difference allowed between messages for them to be matched together.
When `tss` finds a matching tuple, it calls `synced_callback()`.
`self.target_size`: the size of the output image.
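As an illustration of how `acceptable_delay` matching works, here is a toy sketch of the matching rule (not the real `message_filters` algorithm, and the function name is hypothetical):

```python
def match_pairs(rgb_stamps, depth_stamps, acceptable_delay):
    """Toy illustration: pair each RGB timestamp with the closest depth
    timestamp, accepting the pair only if the gap is within the delay."""
    pairs = []
    for r in rgb_stamps:
        closest = min(depth_stamps, key=lambda d: abs(d - r))
        if abs(closest - r) <= acceptable_delay:
            pairs.append((r, closest))
    return pairs

# 0.20 has no depth stamp within 0.05 s, so it is not matched
print(match_pairs([0.00, 0.10, 0.20], [0.01, 0.12, 0.50], acceptable_delay=0.05))
```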
### Methods
`synced_callback(self, rgb_msg, depth_msg)`:
* Converts the incoming `rgb_msg` and `depth_msg` ROS2 Image messages to OpenCV images.
* Calls the `inference()` method to detect objects in the image.
* Takes the labelled image (the result generated by the object's inference method) and converts it back into a ROS2 Image message.
* Publishes the image on the `/mask` topic.
`inference()`: a method intended to be overridden; its behaviour depends on the subclass.
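The parent/subclass contract above can be sketched without the ROS plumbing (conversion and publishing are stubbed out, and `ThresholdDetector` is a hypothetical subclass, not one from the repo):

```python
import numpy as np

class ObjectDetectorNode:
    """Sketch of the parent-class contract; ROS plumbing omitted."""

    def synced_callback(self, rgb, depth):
        # The real node converts ROS Images to OpenCV first, and converts
        # the result back to a ROS Image before publishing it on /mask.
        mask = self.inference(rgb, depth)
        return mask

    def inference(self, rgb, depth):
        raise NotImplementedError("subclasses override this")

class ThresholdDetector(ObjectDetectorNode):
    def inference(self, rgb, depth):
        # Toy detector: mark bright pixels as 1.0
        return (rgb.mean(axis=-1) > 128).astype(np.float32)

node = ThresholdDetector()
frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[0, 0] = 255  # one bright pixel
mask = node.synced_callback(frame, np.zeros((2, 2)))
print(mask)
```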
# color_detector.py
### What It Is:
This class is a subclass of `ObjectDetectorNode` from `detector.py`.
A detector class that detects red pixels in a BGR image.
### Methods
```inference()```:
* Converts the input BGR image to HSV.
* Defines HSV ranges intended to capture red pixels.
* Creates a binary mask for each range, then ORs the two together.
* Normalizes the combined mask to 0/1 values by integer-dividing by 255.
* Shows the mask and the RGB image with `cv2.imshow()` for debugging purposes.
* Returns the mask as float32.
# example_detector.py
### What It Is:
A sample class you can use to make your own detectors!
# gate_detector.py
### What It Is:
A Detector class used to run a [YOLO](https://docs.ultralytics.com/) inference model on an RGB image and post-process the results.
### Methods
- Runs a YOLO inference model on the RGB image.
- Creates an empty label image (a NumPy array).
- Goes through the results of the inference model, finds the coordinates of each object, and draws a bounding box around each identified object.
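A minimal sketch of the label-image step, assuming hypothetical box coordinates rather than a real Ultralytics results object (the real code draws boxes on the image; here each box region is simply filled with a class id):

```python
import numpy as np

def label_from_boxes(shape, boxes):
    """Start from an empty label image and mark each bounding box.
    `boxes` is a list of (x1, y1, x2, y2, class_id) tuples -- the real
    code reads these from the YOLO results object."""
    label = np.zeros(shape, dtype=np.uint8)
    for x1, y1, x2, y2, cls in boxes:
        label[y1:y2, x1:x2] = cls  # fill the box region with the class id
    return label

label = label_from_boxes((4, 4), [(1, 1, 3, 3, 7)])
print(label)
```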
# manual_inference.py
### What It Is:
A class that uses an ONNX inference utility to run inference on an image. Likely a predecessor of `onnx_segmentation_detector.py`.
### Methods
`__init__()`: The class initializes by verifying the ONNX model path, setting up an inference session, and setting variables according to the model's output format (2 outputs: raw_preds + proto, or 3 outputs: det, coefs, proto).
`preprocess()`:
- Loads an image as an OpenCV image.
- Resizes it while preserving its aspect ratio, and pads it to make it square (letterboxing).
- Converts the color format of the image.
- Adds a batch dimension (the first dimension in a tensor, which tells the model how many images are included in the current input).
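The resize-and-pad arithmetic can be sketched as follows (values only; the actual file also performs the resize and padding, and its rounding and padding-split details may differ):

```python
def letterbox_params(w, h, target=640):
    """Compute the letterbox scale and padding for a w x h image."""
    scale = target / max(w, h)        # shrink so the longer side fits
    new_w, new_h = round(w * scale), round(h * scale)
    pad_x = (target - new_w) // 2     # horizontal padding per side
    pad_y = (target - new_h) // 2     # vertical padding per side
    return scale, new_w, new_h, pad_x, pad_y

# A 1280x720 frame is halved to 640x360, then padded 140 px top and bottom
print(letterbox_params(1280, 720))
```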
`postprocess()`:
- Unpacks the tensor shapes (removes the batch dimension).
- Removes low-confidence detections.
- Takes the shared prototype (proto) mask features (generic pieces the model uses to create a mask for a detected object) and the coefficients (coefs), and uses them to create one mask for each detected object.
- This produces logits: unbounded values that represent the raw output of the model.
- The logits are then converted to values in the 0-1 range using a [sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) function.
- Mask values above a threshold are set to 255, the rest to 0. This gives a binary result that tells us whether a pixel is part of an object or not.
- The method then undoes the letterboxing and scaling done in `preprocess()`, and draws a bounding box and a coloured mask on the original image for each detected object. This image is the result.
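The proto-times-coefficients, sigmoid, threshold chain can be sketched with NumPy (shapes and values are illustrative, not the model's real outputs):

```python
import numpy as np

def masks_from_protos(coefs, proto, mask_threshold=0.5):
    """Combine shared prototype features with per-detection coefficients,
    squash the resulting logits with a sigmoid, then binarize."""
    # coefs: (n_dets, k); proto: (k, h, w) -> logits: (n_dets, h, w)
    logits = np.einsum("nk,khw->nhw", coefs, proto)
    probs = 1.0 / (1.0 + np.exp(-logits))          # sigmoid -> 0..1
    return np.where(probs > mask_threshold, 255, 0).astype(np.uint8)

proto = np.zeros((1, 2, 2), dtype=np.float32)
proto[0, 0, 0] = 4.0                               # one strong positive feature
masks = masks_from_protos(np.array([[1.0]]), proto)
print(masks[0])
```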
`infer()`:
- Runs the pre-processing step to get the tensor fed into the model, the original BGR image, and the scale and padding factors.
- Feeds the tensor into the model.
- Checks which type of output the model returns:
  - If 2 outputs (raw_preds, proto): the method decodes the bounding boxes, class logits, and mask coefficients.
  - If 3 outputs: nothing special needs to be done; everything is already decoded.
- The method then runs post-processing on the model results.
# onnx_segmentation_detector.py (Himmat, Peter)
### What It Is
**ONNX** (Open Neural Network Exchange) is a portable format for ML models (.onnx).
`OnnxSegmentationDetector` is a subclass of **ObjectDetectorNode** that loads an ONNX segmentation model and uses it to label the synchronized RGB + depth frames.
### How it is used
**Subscribers**: /rgb, /depth → continuous stream of frames
**Services**: /change_model, /set_inference_camera
**Publishes to**: /mask
**Pipeline**
1. `/rgb` + `/depth` messages arrive.
2. `ApproximateTimeSynchronizer` matches frames by timestamp.
3. `synced_callback()` is called automatically when frames are synchronized:
    - `imgmsg_to_cv2()` converts the ROS Image messages → NumPy arrays.
    - `inference(rgb, depth)` (subclass method) runs the ONNX model: it calls `preprocess()`, `session.run()`, and `postprocess_onnx()` (prepare data → pass through model → postprocess), then returns a NumPy array (the segmentation mask).
    - `cv2_to_imgmsg()` converts the NumPy mask → a ROS Image message.
    - `publish()` sends the labeled mask image to the `/mask` topic.
### Methods and Parameters
- **`__init__(self)`**
Initializes parameters and sets up the node environment.
**1. Model + Node Configuration Parameters**
- `model_path` : path or filename of the ONNX model (e.g., `"gate.onnx"`)
- `conf_threshold` : minimum confidence score to accept detections (default: 0.4)
- `mask_threshold` : threshold for turning mask logits into binary masks (default: 0.3)
- `input_size` : model input image size (default: 640 × 640)
- `top_k` : limits how many detections to keep (default: 5; `-1` means no limit)
- `providers` : hardware execution providers for ONNX Runtime (e.g., `"CUDAExecutionProvider"`)
- `debug` : toggles debug logging and visualization (default: `True`)
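How a `top_k` limit is typically applied can be sketched as follows (a hypothetical helper for illustration; the file's actual filtering code may differ):

```python
import numpy as np

def keep_top_k(scores, top_k):
    """Return the indices of the highest-confidence detections,
    in descending score order; top_k == -1 means keep everything."""
    order = np.argsort(scores)[::-1]   # descending by confidence
    if top_k != -1:
        order = order[:top_k]
    return order

# Keep only the two strongest detections
print(keep_top_k(np.array([0.9, 0.2, 0.7, 0.5]), top_k=2))
```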
**2. Services**
- `/change_model` : allows switching the ONNX model dynamically via `change_model_callback()`
- `/set_inference_camera` : enables/disables inference or switches between front/bottom cameras via `set_inference_camera_callback()`
**3. Logging / Runtime Info**
- Logs the ONNX providers being used
- Logs confirmation that the segmentation detector initialized successfully
- Logs key model settings (model path, confidence threshold, and input size)
- **`__del__(self)`**
Cleans up OpenCV windows when the node shuts down (if debug mode is enabled).
- **`change_model_callback(self, request, response)`**
Handles requests from other nodes to switch the loaded ONNX model.
- **`set_inference_camera_callback(self, request, response)`**
Handles service requests to enable/disable inference or switch between front/bottom cameras.
- **`load_model(self)`**
Loads the ONNX model into an `onnxruntime.InferenceSession`, identifies whether it’s a 2-output or 3-output model, and logs model info.
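The 2-output vs 3-output distinction can be sketched as follows (a hypothetical helper; the real method inspects the `onnxruntime.InferenceSession`'s outputs):

```python
def classify_model_outputs(output_names):
    """Decide how to handle the model based on how many outputs it has."""
    if len(output_names) == 2:
        return "raw_preds + proto (needs decoding)"
    if len(output_names) == 3:
        return "det + coefs + proto (already decoded)"
    raise ValueError("unexpected number of model outputs")

print(classify_model_outputs(["output0", "output1"]))
```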
- **`apply_depth_masking(self, img, depth)`**
Optionally tints pixels farther than 10 meters blue to help the model ignore distant areas.
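A minimal sketch of the idea, assuming the tint simply overwrites far pixels with blue (the real method may blend a tint instead of replacing the pixel):

```python
import numpy as np

def apply_depth_masking(img, depth, max_dist=10.0):
    """Mark pixels whose depth exceeds max_dist (in meters) blue so the
    model can learn to ignore distant areas."""
    out = img.copy()
    far = depth > max_dist
    out[far] = (255, 0, 0)  # pure blue in BGR
    return out

img = np.zeros((1, 2, 3), dtype=np.uint8)
depth = np.array([[2.0, 15.0]])  # second pixel is beyond 10 m
print(apply_depth_masking(img, depth))
```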
- **`preprocess(self, img, depth=None)`**
Resizes and letterboxes the image, converts BGR→RGB, normalizes pixel values, and prepares the tensor input for the ONNX model.
- **`postprocess_onnx(self, det, coefs, proto, orig_shape, scale, pad_x, pad_y)`** —
Converts raw ONNX model outputs into a *segmentation mask*.
Filters low-confidence detections, reconstructs and thresholds object masks, and combines them into a labeled image.
- **`inference(self, rgb, depth)`**
Calls `preprocess()` → runs the ONNX model (`session.run()`) → calls `postprocess_onnx()` → returns the segmentation mask as a NumPy array.
- **`main(args=None)`**
ROS 2 entry point: initializes `rclpy`, creates the node, spins (keeps it running), and cleans up on shutdown.