# Object Detection Documentation
This document describes the files inside the `object_detection` folder of the repo.
# detector.py
### What It Is
Establishes an `ObjectDetectorNode` class that serves as the parent class for all detectors.
### Parameters
`self.bridge`: a utility to convert ROS2 Image messages to OpenCV images, and back.
`tss`: an Approximate Time Synchronizer used to match messages from the subscribed topics when they arrive out of order. It is initialized with:
- `queue_size`: how many incoming messages from each subscriber the synchronizer keeps in memory while trying to form timestamp-matched tuples. When new messages come in, the oldest are dropped.
- `acceptable_delay`: the maximum timestamp difference allowed between messages for them to be matched together.
When `tss` finds a matching tuple, it calls `synced_callback()`.
`self.target_size`: the size of the output image.
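As an illustration of how `acceptable_delay` matching works, here is a toy sketch of the matching rule (not the real `message_filters` algorithm, and the function name is hypothetical):

```python
def match_pairs(rgb_stamps, depth_stamps, acceptable_delay):
    """Toy illustration: pair each RGB timestamp with the closest depth
    timestamp, accepting the pair only if the gap is within the delay."""
    pairs = []
    for r in rgb_stamps:
        closest = min(depth_stamps, key=lambda d: abs(d - r))
        if abs(closest - r) <= acceptable_delay:
            pairs.append((r, closest))
    return pairs

# 0.20 has no depth stamp within 0.05 s, so it is not matched
print(match_pairs([0.00, 0.10, 0.20], [0.01, 0.12, 0.50], acceptable_delay=0.05))
```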
### Methods
`synced_callback(self, rgb_msg, depth_msg)`:
* Converts the incoming `rgb_msg` and `depth_msg` ROS2 Image messages to OpenCV images.
* Calls the `inference()` method to detect objects in the image.
* Takes the labelled image (the result generated by the object's inference method) and converts it back into a ROS2 Image message.
* Publishes the image on the `/mask` topic.
`inference()`: a method intended to be overridden; its behaviour depends on the subclass.
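The parent/subclass contract above can be sketched without the ROS plumbing (conversion and publishing are stubbed out, and `ThresholdDetector` is a hypothetical subclass, not one from the repo):

```python
import numpy as np

class ObjectDetectorNode:
    """Sketch of the parent-class contract; ROS plumbing omitted."""

    def synced_callback(self, rgb, depth):
        # The real node converts ROS Images to OpenCV first, and converts
        # the result back to a ROS Image before publishing it on /mask.
        mask = self.inference(rgb, depth)
        return mask

    def inference(self, rgb, depth):
        raise NotImplementedError("subclasses override this")

class ThresholdDetector(ObjectDetectorNode):
    def inference(self, rgb, depth):
        # Toy detector: mark bright pixels as 1.0
        return (rgb.mean(axis=-1) > 128).astype(np.float32)

node = ThresholdDetector()
frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[0, 0] = 255  # one bright pixel
mask = node.synced_callback(frame, np.zeros((2, 2)))
print(mask)
```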
# color_detector.py
### What It Is:
This class is a subclass of `ObjectDetectorNode` from `detector.py`.
A detector class that detects red pixels in a BGR image.
### Methods
```inference()```:
* Converts the input BGR image to HSV.
* Defines HSV ranges intended to capture red pixels.
* Creates a binary mask for each range, then ORs the two together.
* Normalizes the combined mask to 0/1 values by integer-dividing by 255.
* Shows the mask and the RGB image with `cv2.imshow()` for debugging purposes.
* Returns the mask as float32.
# example_detector.py
### What It Is:
A sample class you can use to make your own detectors!
# gate_detector.py
### What It Is:
A Detector class used to run a [YOLO](https://docs.ultralytics.com/) inference model on an RGB image and post-process the results.
### Methods
- Runs a YOLO inference model on the RGB image.
- Creates an empty label image (a NumPy array).
- Goes through the results of the inference model, finds the coordinates of each object, and draws a bounding box around each identified object.
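A minimal sketch of the label-image step, assuming hypothetical box coordinates rather than a real Ultralytics results object (the real code draws boxes on the image; here each box region is simply filled with a class id):

```python
import numpy as np

def label_from_boxes(shape, boxes):
    """Start from an empty label image and mark each bounding box.
    `boxes` is a list of (x1, y1, x2, y2, class_id) tuples -- the real
    code reads these from the YOLO results object."""
    label = np.zeros(shape, dtype=np.uint8)
    for x1, y1, x2, y2, cls in boxes:
        label[y1:y2, x1:x2] = cls  # fill the box region with the class id
    return label

label = label_from_boxes((4, 4), [(1, 1, 3, 3, 7)])
print(label)
```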
# manual_inference.py
### What It Is:
A class that uses an ONNX inference utility to run inference on an image. Likely a predecessor of `onnx_segmentation_detector.py`.
### Methods
`__init__()`: The class initializes by verifying the ONNX model path, setting up an inference session, and setting variables according to the model's output format (2 outputs: raw_preds + proto, or 3 outputs: det, coefs, proto).
`preprocess()`:
- Loads an image as an OpenCV image.
- Resizes it while preserving its aspect ratio, and pads it to make it square (letterboxing).
- Converts the color format of the image.
- Adds a batch dimension (the first dimension in a tensor, which tells the model how many images are included in the current input).
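The resize-and-pad arithmetic can be sketched as follows (values only; the actual file also performs the resize and padding, and its rounding and padding-split details may differ):

```python
def letterbox_params(w, h, target=640):
    """Compute the letterbox scale and padding for a w x h image."""
    scale = target / max(w, h)        # shrink so the longer side fits
    new_w, new_h = round(w * scale), round(h * scale)
    pad_x = (target - new_w) // 2     # horizontal padding per side
    pad_y = (target - new_h) // 2     # vertical padding per side
    return scale, new_w, new_h, pad_x, pad_y

# A 1280x720 frame is halved to 640x360, then padded 140 px top and bottom
print(letterbox_params(1280, 720))
```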
`postprocess()`:
- Unpacks the tensor shapes (removes the batch dimension).
- Removes low-confidence detections.
- Takes the shared prototype (proto) mask features (generic pieces the model uses to create a mask for a detected object) and the coefficients (coefs), and uses them to create one mask for each detected object.
- This produces logits: unbounded values that represent the raw output of the model.
- The logits are then converted to values in the 0-1 range using a [sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) function.
- Mask values above a threshold are set to 255, the rest to 0. This gives a binary result that tells us whether a pixel is part of an object or not.
- The method then undoes the letterboxing and scaling done in `preprocess()`, and draws a bounding box and a coloured mask on the original image for each detected object. This image is the result.
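The proto-times-coefficients, sigmoid, threshold chain can be sketched with NumPy (shapes and values are illustrative, not the model's real outputs):

```python
import numpy as np

def masks_from_protos(coefs, proto, mask_threshold=0.5):
    """Combine shared prototype features with per-detection coefficients,
    squash the resulting logits with a sigmoid, then binarize."""
    # coefs: (n_dets, k); proto: (k, h, w) -> logits: (n_dets, h, w)
    logits = np.einsum("nk,khw->nhw", coefs, proto)
    probs = 1.0 / (1.0 + np.exp(-logits))          # sigmoid -> 0..1
    return np.where(probs > mask_threshold, 255, 0).astype(np.uint8)

proto = np.zeros((1, 2, 2), dtype=np.float32)
proto[0, 0, 0] = 4.0                               # one strong positive feature
masks = masks_from_protos(np.array([[1.0]]), proto)
print(masks[0])
```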
`infer()`:
- Runs the pre-processing step to get the tensor fed into the model, the original BGR image, and the scale and padding factors.
- Feeds the tensor into the model.
- Checks which type of output the model returns:
  - If 2 outputs (raw_preds, proto): the method decodes the bounding boxes, class logits, and mask coefficients.
  - If 3 outputs: nothing special needs to be done; everything is already decoded.
- The method then runs post-processing on the model results.
# onnx_segmentation_detector.py (Himmat, Peter)
### What It Is
**ONNX** (Open Neural Network Exchange) is a portable format for ML models (.onnx).
`OnnxSegmentationDetector` is a subclass of **ObjectDetectorNode** that loads an ONNX segmentation model and uses it to label the synchronized RGB + depth frames.
### How it is used
**Subscribers**: /rgb, /depth → continuous stream of frames
**Services**: /change_model, /set_inference_camera
**Publishes to**: /mask
**Pipeline**
1. `/rgb` + `/depth` messages arrive.
2. `ApproximateTimeSynchronizer` matches frames by timestamp.
3. `synced_callback()` is called automatically when frames are synchronized:
    - `imgmsg_to_cv2()` converts the ROS Image messages → NumPy arrays.
    - `inference(rgb, depth)` (subclass method) runs the ONNX model: it calls `preprocess()`, `session.run()`, and `postprocess_onnx()` (prepare data → pass through model → postprocess), then returns a NumPy array (the segmentation mask).
    - `cv2_to_imgmsg()` converts the NumPy mask → a ROS Image message.
    - `publish()` sends the labeled mask image to the `/mask` topic.
### Methods and Parameters
- **`__init__(self)`**
Initializes parameters and sets up the node environment.
**1. Model + Node Configuration Parameters**
- `model_path` : path or filename of the ONNX model (e.g., `"gate.onnx"`)
- `conf_threshold` : minimum confidence score to accept detections (default: 0.4)
- `mask_threshold` : threshold for turning mask logits into binary masks (default: 0.3)
- `input_size` : model input image size (default: 640 × 640)
- `top_k` : limits how many detections to keep (default: 5; `-1` means no limit)
- `providers` : hardware execution providers for ONNX Runtime (e.g., `"CUDAExecutionProvider"`)
- `debug` : toggles debug logging and visualization (default: `True`)
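How a `top_k` limit is typically applied can be sketched as follows (a hypothetical helper for illustration; the file's actual filtering code may differ):

```python
import numpy as np

def keep_top_k(scores, top_k):
    """Return the indices of the highest-confidence detections,
    in descending score order; top_k == -1 means keep everything."""
    order = np.argsort(scores)[::-1]   # descending by confidence
    if top_k != -1:
        order = order[:top_k]
    return order

# Keep only the two strongest detections
print(keep_top_k(np.array([0.9, 0.2, 0.7, 0.5]), top_k=2))
```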
**2. Services**
- `/change_model` : allows switching the ONNX model dynamically via `change_model_callback()`
- `/set_inference_camera` : enables/disables inference or switches between front/bottom cameras via `set_inference_camera_callback()`
**3. Logging / Runtime Info**
- Logs the ONNX providers being used
- Logs confirmation that the segmentation detector initialized successfully
- Logs key model settings (model path, confidence threshold, and input size)
- **`__del__(self)`**
Cleans up OpenCV windows when the node shuts down (if debug mode is enabled).
- **`change_model_callback(self, request, response)`**
Handles requests from other nodes to switch the loaded ONNX model.
- **`set_inference_camera_callback(self, request, response)`**
Handles service requests to enable/disable inference or switch between front/bottom cameras.
- **`load_model(self)`**
Loads the ONNX model into an `onnxruntime.InferenceSession`, identifies whether it’s a 2-output or 3-output model, and logs model info.
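The 2-output vs 3-output distinction can be sketched as follows (a hypothetical helper; the real method inspects the `onnxruntime.InferenceSession`'s outputs):

```python
def classify_model_outputs(output_names):
    """Decide how to handle the model based on how many outputs it has."""
    if len(output_names) == 2:
        return "raw_preds + proto (needs decoding)"
    if len(output_names) == 3:
        return "det + coefs + proto (already decoded)"
    raise ValueError("unexpected number of model outputs")

print(classify_model_outputs(["output0", "output1"]))
```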
- **`apply_depth_masking(self, img, depth)`**
Optionally tints pixels farther than 10 meters blue to help the model ignore distant areas.
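A minimal sketch of the idea, assuming the tint simply overwrites far pixels with blue (the real method may blend a tint instead of replacing the pixel):

```python
import numpy as np

def apply_depth_masking(img, depth, max_dist=10.0):
    """Mark pixels whose depth exceeds max_dist (in meters) blue so the
    model can learn to ignore distant areas."""
    out = img.copy()
    far = depth > max_dist
    out[far] = (255, 0, 0)  # pure blue in BGR
    return out

img = np.zeros((1, 2, 3), dtype=np.uint8)
depth = np.array([[2.0, 15.0]])  # second pixel is beyond 10 m
print(apply_depth_masking(img, depth))
```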
- **`preprocess(self, img, depth=None)`**
Resizes and letterboxes the image, converts BGR→RGB, normalizes pixel values, and prepares the tensor input for the ONNX model.
- **`postprocess_onnx(self, det, coefs, proto, orig_shape, scale, pad_x, pad_y)`** —
Converts raw ONNX model outputs into a *segmentation mask*.
Filters low-confidence detections, reconstructs and thresholds object masks, and combines them into a labeled image.
- **`inference(self, rgb, depth)`**
Calls `preprocess()` → runs the ONNX model (`session.run()`) → calls `postprocess_onnx()` → returns the segmentation mask as a NumPy array.
- **`main(args=None)`**
ROS 2 entry point: initializes `rclpy`, creates the node, spins (keeps it running), and cleans up on shutdown.