Lego Detection Project

# Detecting and Recognizing Lego Bricks

**Authors:**
- Cristian Cutitei 5336767
- Bakul Jangley 6055826
- Tjerk van der Weij 4858999

**GitHub code:** https://github.com/bakuljangley/legoBricksClassifier.git

# Introduction

LEGO is a popular toy enjoyed by people of all ages, fostering creativity and imagination. Over time, LEGO bricks tend to end up in unsorted piles. Making an inventory of these bricks is time-consuming and tedious, which takes away from the fun of building. We simplify the sorting process by introducing a computer vision algorithm that automatically detects and recognizes LEGO bricks in an image of a pile of bricks. This enables users to efficiently locate the specific bricks needed for their building projects. In future work, the system could recommend LEGO constructions that can be built with the currently available bricks. In this project, we focus on the detection and recognition part.

## Current Approaches

Various approaches have already been introduced to tackle the LEGO brick detection and recognition problem. Existing solutions explore the use of MobileNet, ResNet and InceptionV3 combined with semantic segmentation using GrabCut [[1](https://github.com/kirill-sidorchuk)] or train a custom CNN from scratch [[2](https://ladvien.com/lego-deep-learning-classifier/)]. Other approaches make use of Mask R-CNN and Faster R-CNN with synthetic data to train the model [[3](https://www.mdpi.com/1424-8220/23/4/1898)]. Another approach is to use end-to-end networks capable of both regression (localization) and classification, such as RetinaNet-50 and YOLOv5 [[4](https://mostwiedzy.pl/en/publication/hierarchical-2-step-neural-based-lego-bricks-detection-and-labeling,155119-1)]. Current models are not tested for occlusion and are usually trained to recognize single bricks.
We aim to design a model that is robust to occlusion and able to identify individual LEGO bricks in a crowded image containing many such objects.

## Problem Definition

Contemporary models are not tested for occlusion and are usually trained to recognize single bricks. The models that can handle images of a collection of bricks have low accuracy on individual LEGO bricks when the bricks occlude each other. In this project we aim to simplify the process of identifying and localizing individual LEGO bricks in an image of a collection of bricks. We also investigate how occlusion and lighting conditions affect the performance of computer-vision-based deep learning models.

## Research Questions

Our project explores the following research questions:

#### What is an appropriate model for object classification regarding the problem of recognizing LEGO bricks?

In this study, we aim to determine which neural network architecture performs best on the classification of a LEGO dataset. We examine which modern architecture achieves the highest accuracy in classifying LEGO pieces and how the architectures compare in terms of training time and computational efficiency.

#### What is the effect of occlusion, poor lighting conditions and cluttered backgrounds on the performance of our model?

To ensure the robustness and real-world applicability of the classification models, it is essential to evaluate their performance under challenging conditions. This study specifically examines how well the models handle occlusion, poor lighting conditions and cluttered backgrounds when classifying LEGO pieces.

#### How does the color of the objects affect the accuracy of image recognition models?

While the use of grayscale images simplifies image recognition tasks by reducing computational complexity and focusing on texture and shape, the absence of color information can impact the performance of classification models.
This study explores how the inherent colors of LEGO pieces, when converted to grayscale, affect the accuracy and robustness of image recognition models.

# Methodology

We use a two-stage model for classification and regression (localization). The classification task is performed by a neural network trained on a dataset of LEGO pieces. The regression task is achieved using computer vision techniques for edge detection, which propose bounding boxes on images with multiple LEGO bricks; the proposed regions are then fed into the classifier.

## Datasets

**Object Recognition Dataset:** The B200C LEGO Classification Dataset [[5]](https://www.kaggle.com/datasets/ronanpickell/b200c-lego-classification-dataset) consists of a collection of 800,000 coloured images belonging to 200 classes. The dataset features images captured from different vantage points and against a variety of backgrounds.

**Object Detection Dataset:** The B200 LEGO Detection Dataset [[6]](https://www.kaggle.com/datasets/ronanpickell/b100-lego-detection-dataset) is fully synthetic and attempts to mimic photo-realism as closely as possible. It consists of 800,000 coloured images of LEGO bricks belonging to the same 200 classes as the object recognition dataset. It features various backgrounds, which provide an additional challenge for object detection. Furthermore, the clusters of LEGO bricks are often crowded, with bricks occluding each other.

The size of the dataset and its number of classes ensure that we have sufficient data to train an image recognition model. The varied backgrounds and occlusion in the detection dataset also ensure that our data is realistic and true to the problem statement.

**Augmented Dataset:** To study how model performance varies with occlusion and lighting conditions, we created a new dataset by modifying the object recognition dataset. We randomly added a black box to images to occlude the object, and changed the lighting conditions.
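
The two augmentations can be sketched with NumPy as follows. This is a minimal illustration rather than the code used in the project: the function names, the box size (`box_frac`), the choice of a random box position, and the multiplicative brightness factor are all assumptions.

```python
import numpy as np

def occlude(image, box_frac=0.3, rng=None):
    """Return a copy of `image` with one black box at a random position.

    `box_frac` (hypothetical parameter) is the box side length as a
    fraction of the image height and width.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    bh, bw = max(1, int(h * box_frac)), max(1, int(w * box_frac))
    # Pick the top-left corner so the box lies fully inside the image.
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))
    out = image.copy()
    out[top:top + bh, left:left + bw] = 0
    return out

def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor` and clip to the uint8 range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)
```

A factor above 1 brightens the image and a factor below 1 darkens it; the input array is left unmodified in both functions.
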
<div style="display:flex; justify-content:space-around;">
  <div style="text-align:center;">
    <img src="https://hackmd.io/_uploads/rJHMwaHSR.jpg" alt="Image 1" style="width:100%;">
    <p>Original Image</p>
  </div>
  <div style="text-align:center;">
    <img src="https://hackmd.io/_uploads/rJVGDpSrC.jpg" alt="Image 2" style="width:65%;">
    <p>Augmented Brightness</p>
  </div>
</div>
<div style="display:flex; justify-content:space-around;">
  <div style="text-align:center;">
    <img src="https://hackmd.io/_uploads/BJyJ_aHHR.jpg" alt="Image 1" style="width:100%;">
    <p>Original Image</p>
  </div>
  <div style="text-align:center;">
    <img src="https://hackmd.io/_uploads/H14AD6Br0.jpg" alt="Image 2" style="width:100%;">
    <p>Occluded Image</p>
  </div>
</div>
<p style="text-align: center;">Figure 1: Original images and their augmented versions (brightness and occlusion)</p>

## Object Recognition

We performed the classification task using two types of models: a custom neural network and pretrained ResNet models.

### The LeNet CNN

The LeNet class implements a simple and efficient CNN for image classification. It features two convolutional layers, each followed by a ReLU activation and a max pooling layer to extract features, and two fully connected layers with a final Log-Softmax activation to output class log-probabilities.

### The ResNet

Residual Networks (ResNets) are a type of deep neural network architecture that introduces skip connections, or "residual connections", to address challenges in training deep networks. In deep networks, gradients can become very small during backpropagation, leading to slow or stalled training. Residual connections provide an alternate pathway for gradients to flow, helping to maintain gradient magnitudes and making it easier to train very deep networks. A skip connection skips one or more layers and performs identity mapping by shortcutting the original input to the output of these layers.
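
As a rough illustration of a skip connection, the NumPy sketch below adds the input back to the output of a small two-layer transform. The function names, the use of dense layers, and the weight shapes are assumptions made for illustration; actual ResNet blocks use convolutions and batch normalization.

```python
import numpy as np

def relu(z):
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Two-layer transform with the input added back to its output.

    x:  (n, d) batch of inputs
    w1: (d, h) and w2: (h, d) weights, so the output has the shape of x
    """
    fx = relu(x @ w1) @ w2  # the learned transform of the input
    return fx + x           # skip connection: identity shortcut

# Usage: with all-zero weights the transform contributes nothing,
# so the block passes its input through unchanged.
x = np.arange(6.0).reshape(2, 3)
y = residual_block(x, np.zeros((3, 4)), np.zeros((4, 3)))
```

Because the shortcut is a plain addition, the block only has to learn a correction to the identity rather than the full mapping.
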
Mathematically, if the input is $x$, the output after a few layers (say two layers, for simplicity) without a skip connection would be $F(x)$. With a skip connection, the output becomes:

$$y = F(x) + x$$

<div style="text-align:center;">
  <img src="https://hackmd.io/_uploads/HkbjZ5rSA.png" alt="Resized Image" width="400">
  <p>Figure 2: Skip Connections in a Residual Network</p>
</div>

The identity mapping ensures that the gradient can flow directly through the skip connection, which makes very deep networks easier to train. The skip connections thus help to propagate gradients back through the network, effectively mitigating the vanishing gradient problem.

The ResNet variants differ in network depth:

- **ResNet-18:** A shallower network that is computationally efficient and faster to train.
- **ResNet-34:** Provides a balance between depth and performance, capturing more complex features than ResNet-18.
- **ResNet-50:** A deeper network capable of learning more intricate patterns, though it requires more computational resources.

By combining pretrained weights from these networks with a fully connected layer for classification, we aim to create a model that captures a wide range of features, from simple to complex, improving overall performance.

## Object Detection

From an image with many LEGO bricks, we need to generate proposal regions and then feed them into the classifier. We therefore use bounding boxes to specify the spatial locations of interest, in our case the locations of the bricks. Edge detection [[7]](https://learnopencv.com/edge-detection-using-opencv/) is a computer vision technique used to identify the boundaries of objects or regions within an image. To obtain bounding boxes, we used OpenCV's Canny edge detection, as it is a robust and flexible technique.

### Canny Edge Detection

Canny edge detection is a multi-stage algorithm used to identify edges in an image using gradient-based methods and thresholding.
It involves:

1. Noise Reduction: Smooth the image using a Gaussian blur to reduce noise and make edge detection more reliable.
1. Gradient Calculation: Compute the intensity gradients of the image to find regions with high intensity changes, indicating potential edges.
1. Non-Maximum Suppression: Thin the edges to one-pixel-wide lines by suppressing non-maximum gradient values, leaving only the strongest edges.
1. Double Thresholding: Classify pixels as strong edges, weak edges, or non-edges based on two threshold values. Strong edges are kept, weak edges are kept only if they are connected to strong edges, and non-edges are discarded.
1. Edge Tracking by Hysteresis: Finalize the edge detection by connecting weak edges to strong edges if they are adjacent, ensuring continuity and robustness of the detected edges.

However, simple edge detection proved too inaccurate to draw the correct bounding boxes. Therefore, we tried adaptive thresholding to better localize individual LEGO bricks.

![canny_edge_detection](https://hackmd.io/_uploads/Hyf6EcBBA.png)
<p style="text-align: center;">Figure 3: Canny Edge Detection</p>

### Adaptive Thresholding

Adaptive thresholding is an image processing technique that converts a grayscale image into a binary image. We used it to try to enhance edge detection. Unlike global thresholding, which uses a single threshold value for the entire image, adaptive thresholding determines the threshold for smaller regions or neighborhoods within the image. It is particularly effective for images with varying lighting conditions or non-uniform illumination: by considering local pixel intensity variations, it can preserve details in different parts of the image that might be lost with a single global threshold. The figure below visualizes the result of applying adaptive thresholding to the original image.
As can be seen, this technique also picked up more of the cluttered background. As a result, the bounding boxes were not accurate.

![adaptive_thresholding](https://hackmd.io/_uploads/HkjZH5SBA.png)
<p style="text-align: center;">Figure 4: Adaptive Thresholding</p>

### Blob Detection

Blob detection is a computer vision technique for detecting regions in an image that differ in properties such as brightness or color from their surroundings. These regions, known as "blobs", can correspond to objects, parts of objects, or areas of interest in the image. Blob detection identifies regions based on significant differences in intensity (for grayscale images) or color (for color images) compared to the surrounding pixels. The results of the blob detection algorithm, which were very inaccurate, are presented below. We therefore returned to our original edge detection method to see whether it could be altered and fine-tuned to give better results.

<div style="text-align:center;">
  <p>Blob Detection</p>
  <img src="https://hackmd.io/_uploads/B1YXY9BBC.png" alt="Resized Image" width="250">
  <p style="text-align: center;">Figure 5: Blob Detection</p>
</div>

### Canny edge detection with morphological operations

In the end, morphological operations were used to better separate individual LEGO bricks. To enhance the edges and make them more prominent, dilation and erosion operations are applied successively. Erosion [[8]](https://www.geeksforgeeks.org/python-opencv-morphological-operations/) removes pixels from the boundaries of foreground regions, thinning them. Dilation [[8]](https://www.geeksforgeeks.org/python-opencv-morphological-operations/) does the opposite: it adds pixels to the boundaries of foreground regions, thickening them. Fine-tuning the parameters of the edge detection also gave better results.
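
To make the two morphological operations concrete, here is a minimal NumPy sketch of binary dilation and erosion with a 3x3 square structuring element. In practice OpenCV's `cv2.dilate` and `cv2.erode` would be used; the helper names, the fixed 3x3 element, and the treatment of pixels outside the image as background are assumptions of this sketch.

```python
import numpy as np

def _neighborhoods(mask):
    """Stack the nine 3x3-shifted views of a zero-padded binary mask."""
    padded = np.pad(mask.astype(bool), 1, constant_values=False)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def dilate(mask):
    """A pixel becomes foreground if ANY pixel in its 3x3 window is set."""
    return _neighborhoods(mask).any(axis=0)

def erode(mask):
    """A pixel stays foreground only if ALL pixels in its 3x3 window are set."""
    return _neighborhoods(mask).all(axis=0)

# An edge with a one-pixel gap: dilation followed by erosion closes the gap.
edge = np.zeros((7, 7), dtype=bool)
edge[3, [1, 2, 4, 5]] = True
closed = erode(dilate(edge))
```

Applying dilation and then erosion in this order bridges small breaks in an edge map, which helps each brick outline form a single closed contour.
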
Contours are identified using `cv2.findContours` and then filtered by area thresholds to remove noise and keep only significant objects.

### Bounding Box Estimation

Of all the techniques we explored, Canny edge detection with morphological operations [[9]](https://medium.com/@rajdeepsingh/a-quick-reference-for-bounding-boxes-in-object-detection-f02119ddb76b) performed best and was used in the final model. First, the input image is preprocessed by converting it to grayscale and applying a Gaussian blur to smooth the image and reduce noise, which improves edge detection. Edge detection is then performed with the Canny algorithm to identify the significant edges in the image, from which the bounding boxes are derived. Bounding boxes are drawn around the filtered contours on the input image, providing a visual representation of the detected objects. For each detected object, a cutout is extracted from the original image based on the bounding box coordinates. These cutouts are then passed to the classification model.

<div style="text-align:center;">
  <img src="https://hackmd.io/_uploads/HyMVGarBA.png" alt="Resized Image" width="600">
  <p>Figure 6: Object Detection Algorithm </p>
</div>

### Regular Noise Removal from Background

While developing the edge detection algorithm, we observed that it got "stuck" on edges present in some of the backgrounds. Since some of those edges were regular in size, they could easily be confused with rectangles. This was a problem because LEGO bricks also look like rectangles, and it led to a large overestimate of the number of bricks in these images. Notice that the term used to describe the background noise was "regular".
That gave us enough of a clue as to the solution: <strong> Frequency Removal from Magnitude Spectrum </strong>

Here is an example of the images that presented this type of noise:

<div style="display:flex; justify-content:space-around;">
  <div style="text-align:center;">
    <img src="https://hackmd.io/_uploads/rys6TEur0.png" alt="Image 1" style="width:90%;">
  </div>
  <div style="text-align:center;">
    <img src="https://hackmd.io/_uploads/r1opaVurA.png" alt="Image 2" style="width:90%;">
  </div>
</div>
<p style="text-align: center;">Figure 7: Images with noise</p>
<br>

Comparing the Fourier spectrum of one of these images with that of an image with a smooth background, the "regular pattern" is immediately visible.

![fourier_0](https://hackmd.io/_uploads/BJSm8S_SC.png) ![fourier_7](https://hackmd.io/_uploads/BkSX8rdSA.png)
<p style="text-align: center;">Figure 8: Fourier Transforms</p>

By removing the "specks" away from the origin of the magnitude spectrum (right image), we essentially remove the line pattern in the rug. The frequencies are removed automatically: blobs whose intensity exceeds a threshold, derived heuristically from the average intensity, are suppressed, while the origin is left untouched.

<div style="text-align:center;">
  <img src="https://hackmd.io/_uploads/r1R2iSdS0.jpg" alt="Image 2" style="width:40%;">
</div>

![comparison](https://hackmd.io/_uploads/ryXVcBOrA.jpg)
<p style="text-align: center;">Figure 9: Comparison between reconstructed and original image</p>

The regularity of the pattern is removed and what remains is random noise, which is easier to filter.

# Results

## Model Selection for Image Recognition

### The LeNet CNN

The LeNet model had very poor accuracy on the test images (< 50%) and seemed too shallow to learn features from the input.
### The ResNet

We trained a ResNet-34 model without converting the images to grayscale to investigate the effect of color on model performance.

#### Table 1: Performance metrics of ResNet-34 trained on RGB images

| Model | Accuracy | Precision | F1 Score |
|------------|----------|-----------|----------|
| ResNet 34 | 0.8860 | 0.8930 | 0.8764 |

Interestingly, the accuracy of the model drops slightly compared to the grayscale model (0.8860 versus 0.9059 for the grayscale ResNet-34 in Table 2). Color may be adding noise or unnecessary artifacts to the input, since it is not part of the class features.

We trained ResNet 18, 34 and 50 models on the original object recognition dataset by using pretrained weights and adding a fully connected layer for classification. The images were pre-processed by resizing them to the standard ResNet input size (224x224), converting them to grayscale, and normalizing them with the ImageNet mean and standard deviation.

#### Table 2: Performance Metrics of ResNet Models on Object Recognition

| Model | Accuracy | Precision | F1 Score |
|------------|----------|-----------|----------|
| ResNet 18 | 0.8240 | 0.7837 | 0.7874 |
| ResNet 34 | 0.9059 | 0.9089 | 0.8981 |
| ResNet 50 | 0.9344 | 0.9344 | 0.9329 |

The ResNet-34 and ResNet-50 both offer good accuracy and precision, but the ResNet-34 model has faster inference and training.

<table>
  <tr>
    <th>ResNet - 34 Confusion Matrix</th>
    <th>ResNet - 50 Confusion Matrix</th>
  </tr>
  <tr>
    <td><img src="https://hackmd.io/_uploads/r1X4f-vH0.png" alt="Image 1" style="width: 300px;"/></td>
    <td><img src="https://hackmd.io/_uploads/H174zWvB0.png" alt="Image 2" style="width: 300px;"/></td>
  </tr>
</table>
<p style="text-align: center;">Figure 10: Confusion Matrices of ResNet-34 and ResNet-50</p>

The models misclassify objects of similar shape or geometry. For the two classes shown below, for example, we suspect that the model cannot differentiate between the two instances because the images are converted to grayscale and the background is noisy.
These misclassifications decrease, but are still present, with the ResNet-50 compared to the ResNet-34, suggesting that additional depth alone does not resolve them.

<div style="display:flex; justify-content:space-around;">
  <div style="text-align:center;">
    <img src="https://hackmd.io/_uploads/ryC77-vSA.png" alt="Image 1" style="width:70%;">
    <p>Class 29</p>
  </div>
  <div style="text-align:center;">
    <img src="https://hackmd.io/_uploads/SkCQQZvSR.png" alt="Image 2" style="width:70%;">
    <p>Class 7</p>
  </div>
</div>
<p style="text-align: center;">Figure 11: Misclassification of objects</p>

We selected the ResNet-34 for further experiments because the accuracy and precision gains of the ResNet-50 over the ResNet-34 are modest (under 3 percentage points) and the ResNet-34 requires less training time in the subsequent experiments.

## Effect of occlusion, poor lighting conditions and cluttered backgrounds on the performance

We trained a ResNet-34 model on the original dataset and compared its performance on the original dataset and on the modified datasets with occlusion and brightness changes.

#### Table 3: Performance metrics of ResNet-34 trained on original dataset

| Dataset | Accuracy | Precision | F1 Score |
|------------ |----------|-----------|----------|
| Original | 0.9059 | 0.9089 | 0.8981 |
| Occluded | 0.6467 | 0.8084 | 0.6830 |
| Brightness Variations | 0.8661 | 0.8697 | 0.8587 |

The metrics suggest that a model trained on a non-occluded dataset can handle changes in lighting conditions to a degree, since shape information is not lost, but that accuracy drops markedly on occluded instances (from 0.9059 to 0.6467). If we train a ResNet-34 model on the occluded dataset, the model performs significantly better in testing.
#### Table 4: Performance metrics of ResNet-34 trained on occluded dataset

| Dataset | Accuracy | Precision | F1 Score |
|------------ |----------|-----------|----------|
| Original | 0.8878 | 0.8709 | 0.8631 |
| Occluded | 0.8785 | 0.8584 | 0.8538 |
| Brightness Variations | 0.8640 | 0.8581 | 0.8402 |

## Final Detection and Recognition Model

The complete detection and recognition model is shown below. Objects in the image are localized by the edge detection and recognized by our ResNet-34 model.

![pipeline](https://hackmd.io/_uploads/BkxKQHdSC.jpg)
<p style="text-align: center;">Figure 12: Final pipeline</p>

We ran three sets of tests with this pipeline. The first used the ground truth annotations from the B200C dataset to crop the large image containing the pile of bricks, ran our ResNet model on each crop, and compared the predictions with the labels from the dataset. The tables below show the overall and random-sample performance of our model when trained on the original and occluded datasets.
#### Table 5: Overall and random sample performance of our model on B200C Dataset when trained on Original dataset

| Dataset | Sample | Accuracy | Precision | F1 Score |
| -------- | --- | -------- | --------- | -------- |
| Original | Overall | 0.8609 | 0.8728 | 0.8531 |
| Original | 34 | 0.8625 | 0.8813 | 0.8528 |
| Original | 61 | 0.8775 | 0.8816 | 0.8645 |
| Original | 86 | 0.8325 | 0.8402 | 0.8157 |
| Original | 139 | 0.8525 | 0.8552 | 0.8342 |

#### Table 6: Overall and random sample performance of our model on B200C Dataset when trained on Occluded dataset

| Dataset | Sample | Accuracy | Precision | F1 Score |
| -------- | --- | -------- | --------- | -------- |
| Occluded | Overall | 0.8649 | 0.8458 | 0.8417 |
| Occluded | 9 | 0.8625 | 0.8366 | 0.8344 |
| Occluded | 81 | 0.8525 | 0.8227 | 0.8236 |
| Occluded | 133 | 0.8800 | 0.8612 | 0.8550 |
| Occluded | 188 | 0.8550 | 0.8412 | 0.8305 |

The results on the B200C dataset are remarkably close to the original model results, suggesting that the model is robust to the aspect ratio changes caused by resizing each bounding box to a square.

The second test evaluated the proposed image segmentation. We took the bounding boxes from our solution and matched each with the closest intersecting box from the dataset itself. This gave an accuracy of 0.87. The IOU test gave a poorer result of 0.63. This is to be expected: some of our bounding boxes span multiple bricks, and most of our bounding boxes span the brick edge to edge, whereas the dataset boxes also include some background.

Finally, we ran the pipeline end to end. We measured accuracy by comparing the bounding boxes we found with the closest ones in the dataset annotations, as above, and obtained an accuracy of 0.71. This is lower than in the first test because the errors from misclassification and incorrect segmentation accumulate.
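
For reference, the intersection over union (IOU) of two axis-aligned boxes can be computed as in the sketch below. The `(x1, y1, x2, y2)` corner format and the function name are assumptions for illustration and need not match the annotation format of the dataset.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A tight 6x6 box centered inside a looser 10x10 annotation.
tight_vs_loose = iou((2, 2, 8, 8), (0, 0, 10, 10))
```

In this example the prediction covers the object region fully, yet the score is only 36/100 because the annotation includes a margin around it. This is the effect described above for edge-to-edge boxes compared against annotations that also contain background.
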
# Discussion

### Research Questions

#### What is an appropriate model for object classification regarding the problem of recognizing LEGO bricks?

As the results show, we found ResNet-34 to be an appropriate model for the recognition of LEGO bricks. The accuracy and precision gains of the ResNet-50 over the ResNet-34 were modest, and the ResNet-34 comes with faster inference and shorter training time. Furthermore, the accuracy and precision of ResNet-18 were not satisfactory, nor were the results of the LeNet model.

#### What is the effect of occlusion, poor lighting conditions and cluttered backgrounds on the performance of our model?

The ResNet-34 model trained on the original dataset performs well when tested with lighting variations, but it has very poor results on the occluded dataset. If we train a ResNet-34 model on the occluded dataset, the model performs significantly better in testing and in the final pipeline. This suggests that although the model can adapt to lighting variations, it will not be able to handle occlusion unless it is trained specifically on data featuring occluded objects.

#### How does the color of the objects affect the accuracy of image recognition models?

Surprisingly, our research showed that the accuracy of the model drops slightly when it is trained on RGB images rather than grayscale images. Color may be adding noise or unnecessary artifacts to the input, which degrades performance. However, a more in-depth analysis is needed in the future, as color information normally helps an object detection model's performance.

### Limitations

1. Our approach converts all images to grayscale. CNNs have trouble handling data imbalances caused by different recording conditions. Color invariance can address this problem by ignoring color variations, but it also removes useful color information, reducing the model's ability to distinguish between specific classes.
All the ResNet models confuse the same classes; a color imbalance within these classes may be the cause.
1. The bounding box estimation struggles to achieve a high intersection over union (IOU) with the ground truth; this could be improved through better parameter tuning during image processing. We found that although the proposed algorithm agrees with the ground truth annotations on the number (and location) of detected objects, the IOU comparison is less favourable. This may be because the bounding box does not align perfectly with the ground truth when part of the object is occluded or when background noise is present.

### Future work and Potential Improvements

1. Alternatively, ***selective search*** can be used to generate region proposals, with the additional advantages of multi-scale proposals and efficiency.
1. An end-to-end model such as YOLO or R-CNN may be used to perform the bounding box estimation and object classification tasks together. We tried training a YOLOv5 model; however, due to technical errors and a potentially long training time, we could not make sufficient progress in time.
1. After detecting the LEGO bricks in a pile, a model may be used to suggest LEGO constructions that can be built with the assortment of recognized bricks. A commercial platform called Brickit [[10]](https://brickit.app) already does this.

## Conclusion

To conclude, our research showed that we were able to successfully recognize LEGO bricks by training a ResNet-34 model on the B200C LEGO Classification Dataset. We chose the ResNet-34 model for object recognition as it offered a good balance between depth and performance. For object detection, OpenCV's Canny edge detection was fine-tuned to give more accurate bounding boxes. Regular background noise in the images of the B200 LEGO Detection Dataset was dealt with by applying Frequency Removal from Magnitude Spectrum.
Also, after training on the occluded dataset, our model still performed well with an accuracy of over 85%.

## References

[1] kirill-sidorchuk - Overview. (n.d.). GitHub. https://github.com/kirill-sidorchuk

[2] A LEGO classifier – CNN and elbow grease. (2019, August 30). Ladvien’s Lab. https://ladvien.com/lego-deep-learning-classifier/

[3] Vidal, J., Vallicrosa, G., Martí, R., & Barnada, M. (2023). Brickognize: Applying Photo-Realistic Image Synthesis for Lego Bricks Recognition with Limited Data. Sensors, 23(4), 1898. https://doi.org/10.3390/s23041898

[4] Boiński, T. (2021). Hierarchical 2-step neural-based LEGO bricks detection and labeling. Bridge of Knowledge - Your Knowledge Portal. https://mostwiedzy.pl/en/publication/hierarchical-2-step-neural-based-lego-bricks-detection-and-labeling,155119-1

[5] B200C LEGO Classification Dataset. (2021, August 9). https://www.kaggle.com/datasets/ronanpickell/b200c-lego-classification-dataset

[6] B200 LEGO Detection Dataset. (2024, March 16). https://www.kaggle.com/datasets/ronanpickell/b100-lego-detection-dataset

[7] LearnOpenCV Team. (2024, February 5). Edge Detection using OpenCV. LearnOpenCV. https://learnopencv.com/edge-detection-using-opencv/

[8] GeeksforGeeks. (2023, January 3). Python OpenCV Morphological Operations. GeeksforGeeks. https://www.geeksforgeeks.org/python-opencv-morphological-operations/

[9] Singh, R. (2024, January 18). A quick reference for bounding boxes in object detection. Medium. https://medium.com/@rajdeepsingh/a-quick-reference-for-bounding-boxes-in-object-detection-f02119ddb76b

[10] Brickit - Build new things from your good old bricks. (n.d.). https://brickit.app/