# computer-vision-valorant

[Justin Luu](https://github.com/justinluu2311) | [Arjun Vilakathara](https://github.com/avilakathara) | [Remi Lejeune](https://github.com/Remi-Lejeune)

## Introduction

Computer vision, especially image segmentation and object detection, is a rapidly evolving field with potential applications across various industries. In recent years, significant improvements have been made in algorithms and training methodologies, building on existing techniques to enhance both the accuracy and efficiency of image segmentation tasks.

As a team interested in CV (Computer Vision), we have been intrigued by the recent surge in AI (Artificial Intelligence) applications and advancements. Our curiosity led us to explore the intersection of AI and gaming, specifically how AI technologies could affect players' experiences in tactical FPS (First-Person Shooter) games like Valorant.

The primary goal of aiming in games like Valorant is precision and speed, two factors that can significantly impact gameplay. Traditionally, aimbots have relied on extracting data directly from the game or server to pinpoint the locations of opposing players. These methods are effective for cheating, but our interest here is not to optimize how to cheat, but to see whether something similar can be built using deep learning computer vision techniques.

In our exploration, we assess how these computer vision techniques can be used to develop an aimbot. The focus is on evaluating the processing speed, the quality of the image segmentation, and its effectiveness within the game. We aim to develop an aimbot that mimics how a human would play, and the first step is to delineate player models from the background in the game's video feed. In this blog, we describe how we trained different models to perform object detection in the FPS game Valorant. We provide an analysis of the results and determine which model is most suitable for agent detection in Valorant.

## Models

Multiple computer vision models are analysed in this blog, each with its own specificity. First, there is YOLO (You Only Look Once), a state-of-the-art object detection model. Then there is FastSAM (Fast Segment Anything Model), an image segmentation model based on SAM (Segment Anything) that has been modified to run up to 50x faster. Finally, there is RT-DETR (Real Time Detection Transformer), an object detection model like YOLO, but one that uses a transformer encoder.

### YOLO (You Only Look Once)

![image](https://hackmd.io/_uploads/SycBKtirC.png)
*Figure 1: YOLOv9 Architecture*

YOLO [1] is a popular object detection model known for its speed and accuracy. Unlike traditional object detection systems that apply the model to an image at multiple locations and scales, YOLO applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. YOLO is capable of detecting multiple objects in real time, making it highly efficient for tasks that require quick processing, such as autonomous driving. Despite its strengths, YOLO has some limitations: the model may struggle with detecting small objects in close proximity [2]. The YOLO model used for this research is YOLOv9c, the latest available YOLO model at the time of this research.
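As an illustration of this single-pass design, the minimal sketch below runs one image through the Ultralytics YOLO wrapper that is also used later in this project. It is illustrative only and not part of our pipeline; the image path is a placeholder.

```
from ultralytics import YOLO

# Pretrained YOLOv9c weights, before any fine-tuning on Valorant data.
model = YOLO("YOLOv9c.pt")

# One forward pass over the whole image returns boxes, confidences and class ids.
results = model("path/to/frame.jpg")
for box in results[0].boxes:
    print(box.xyxy, box.conf, box.cls)
```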
### FastSAM (Fast Segment Anything Model)

![image](https://hackmd.io/_uploads/SkOt6YoHA.png)
*Figure 2: FastSAM Architecture*

FastSAM [3] is an advanced image segmentation model designed for rapid and accurate segmentation tasks. It builds upon SAM [4] (Segment Anything) and has been optimized to run up to 50x faster. It excels at identifying objects within an image, producing precise segmentations that can be used for various applications, such as medical imaging, autonomous driving, and video analysis.

### RT-DETR (Real Time Detection Transformer)

![image](https://hackmd.io/_uploads/Hy81AFiSC.png)
*Figure 3: RT-DETR Architecture*

RT-DETR (Real-Time Detection Transformer) [5] is an advanced object detection model that utilizes a transformer encoder architecture, as opposed to traditional convolutional neural network (CNN)-based models like YOLO. Its architecture involves dividing an image into a grid of patches, which are then processed through a series of transformer encoder layers. These layers enable the model to learn relationships between different parts of the image, allowing it to accurately detect and classify objects. In this way, RT-DETR addresses some of the limitations of grid-based models like YOLO by providing more precise localization and classification of objects.

## Dataset

Datasets of annotated Valorant images are readily available on the internet. The first dataset we tried is the [valorant-object-detection2](https://universe.roboflow.com/kwan-li-jqief/valorant-object-detection2/dataset/7) dataset, composed of 1416 training images. However, when models were trained on this dataset, they performed poorly. Looking more closely at the data, we found that some of it was poorly labeled for the use case of this project, since we need the whole body covered by a bounding box, not just parts of the chest area:

![image](https://hackmd.io/_uploads/BydiK0oHR.jpg)
*Figure 4: Example of 'bad' annotation*

This meant that another dataset had to be chosen. The newly selected dataset is the [Santyasa Image Dataset](https://universe.roboflow.com/alfin-scifo/santyasa/dataset/4), with 2975 training images. A review of this dataset inspired confidence in its annotations, as no obviously faulty ones similar to those in the previous dataset were identified. The data is divided into three parts: a training set, a validation set, and a testing set. The training process required the dataset to be in YOLO format, which consists of a folder of images and a folder of text files containing the annotations associated with the images.
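As a quick sanity check on this layout, the short sketch below lists images that have no matching label file. It assumes the usual YOLO convention that every image in `train/images` has a same-named `.txt` file in `train/labels`; the paths are placeholders.

```
from pathlib import Path

# Placeholder paths: point these at the unpacked dataset split.
images_dir = Path("train/images")
labels_dir = Path("train/labels")

# Every image should have a matching <name>.txt annotation file.
missing = [img.name for img in images_dir.glob("*.jpg")
           if not (labels_dir / f"{img.stem}.txt").exists()]
print(f"{len(missing)} images without a label file")
```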
The labels and annotations in a text file for **segmentations** look as follows:

```
0 0.49038460850715637 0.47836539149284363 0.48798078298568726 0.48076921701431274 0.48798078298568726 0.48317307233810425 0.48317307233810425 0.48798078298568726 0.48317307233810425 0.49038460850715637 0.48076921701431274 0.4927884638309479 0.48076921701431274 0.4951923191547394 0.47836539149284363 0.4975961446762085 0.47836539149284363 0.5192307829856873 0.4759615361690521 0.5216346383094788 0.4759615361690521 0.5288461446762085 0.47836539149284363 0.53125 0.47836539149284363 0.5408653616905212 0.4759615361690521 0.5432692170143127 0.4759615361690521 0.5600961446762085 0.47836539149284363 0.5625 0.47836539149284363 0.567307710647583 0.48076921701431274 0.567307710647583 0.48076921701431274 0.5625 0.48798078298568726 0.5552884340286255
```

The labels and annotations in a text file for **bounding boxes** look as follows:

```
1 0.5360576923076923 0.4375 0.03365384615384615 0.040865384615384616
0 0.5360576923076923 0.5552884615384616 0.10817307692307693 0.3004807692307692
```

In both formats, the first number is the class of the annotated object. For segmentations, the remaining numbers are the normalized coordinates of the points that make up the polygon; for bounding boxes, they are the normalized centre coordinates followed by the width and height of the box. The dataset has two classes, 0 and 1, where 0 is the body and 1 is the head of the agent.
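To make the format concrete, here is a small parsing sketch. It is for illustration only; the training pipeline reads these files through Ultralytics.

```
def parse_label_line(line):
    """Parse one line of a YOLO-format label file.

    Bounding-box lines carry 5 numbers: class, x_center, y_center, width, height,
    all normalised to [0, 1]. Segmentation lines carry the class followed by an
    arbitrary number of normalised (x, y) polygon points.
    """
    fields = line.split()
    cls = int(fields[0])                      # 0 = body, 1 = head
    numbers = [float(v) for v in fields[1:]]
    if len(numbers) == 4:                     # bounding box
        x_center, y_center, width, height = numbers
        return cls, "box", (x_center, y_center, width, height)
    # Segmentation polygon: pair the remaining numbers into (x, y) points.
    points = list(zip(numbers[0::2], numbers[1::2]))
    return cls, "polygon", points
```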
Here are a few example images from the dataset:

| ![dwdw](https://hackmd.io/_uploads/Sy_OcthS0.jpg) | ![dww](https://hackmd.io/_uploads/r1eljY2r0.jpg) |
|--------|---------|
| ![efe](https://hackmd.io/_uploads/H1oKjtnSR.jpg) | ![fg](https://hackmd.io/_uploads/ryf-2K3HC.jpg) |

*Figure 5: Example images from the Santyasa dataset*

We then created a dataset called the [Edge Case Dataset](https://universe.roboflow.com/justin-3f4xp/valorant-agents-pl9nm/dataset/1), made to replicate situations considered edge cases in the game, for example where only the head is visible, or only part of the body is visible. The Santyasa Image Dataset does contain a few of these images, but the Edge Case Dataset was made for evaluation purposes only, since it uses very difficult images. Here are a few example images from this dataset:

| ![Screenshot-6-_png.rf.5201dfd3912c2ef5c62a9d5cab881ada](https://hackmd.io/_uploads/BJ1NSnhBC.jpg) | ![Screenshot-7-_png.rf.c6686d58c6d628eaf574480571e863a3](https://hackmd.io/_uploads/ByUVr2nSR.jpg) |
|--------|---------|
| ![Screenshot-15-_png.rf.4d1d9afe4fd4f92bead448de19beff87](https://hackmd.io/_uploads/S1K4S2nBR.jpg) | ![Screenshot-10-_png.rf.8455b5912363482b8c17c3dcec2a672b](https://hackmd.io/_uploads/BJpVH32rA.jpg) |

*Figure 6: Example images from the Edge Case Dataset*

Finally, to showcase the performance of the models in real time, a video clip was recorded. The video runs at 30 FPS (Frames Per Second) and is used both to show the detection models running in real time and to calculate their inference time.

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/vb26pUwg6Hc?si=6mu8sJ3JjbE5oKGL" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

*Video 1: Video clip used to test the real-time performance of the models*

## Equipment

Training was done on equipment provided by Kaggle: a Kaggle GPU P100 with 16GB of memory. Within these specs, the following parameters were used for training:

- FastSAM-s: epochs=100, batch=16, imgsz=640
- YOLOv9c: epochs=100, batch=32, imgsz=640
- RT-DETR: epochs=100, batch=32, imgsz=640

The experiments were run on an RTX 3060 Ti with 8GB of memory.

## Training

As mentioned, the project utilized YOLO, FastSAM, and RT-DETR. This section provides a detailed breakdown of the training process, focusing on the YOLO model for illustrative purposes.

#### Dependencies

- Wandb (Weights & Biases): used for experiment tracking and visualizing model performance. Secrets management provided by kaggle_secrets was used to securely access the wandb API key.
- Ultralytics: the project leverages the Ultralytics suite for training, validating, and testing the YOLO, FastSAM, and RT-DETR models. These models are part of the Ultralytics ecosystem, which offers straightforward methods for training and testing.

#### Loading the models

Initially, the project aimed to train models from scratch. However, it soon became clear that this approach required extensive computational resources and time. To address this, transfer learning was employed: a technique that significantly accelerates training by fine-tuning models pre-trained on a large, generic dataset for a more specific task. Pre-trained versions of all models were fine-tuned on the chosen dataset (the Santyasa Image Dataset). The models were loaded as follows. For YOLO, the model was initialized with pre-trained weights ("YOLOv9c.pt"), and similar steps were taken for the other models.

```
from ultralytics import YOLO
from ultralytics import RTDETR

model_yolo = YOLO("YOLOv9c.pt")
model_fastsam = YOLO("FastSAM-s.pt")
model_rtdetr = RTDETR("rtdetr-l.pt")
```

#### Preparing the data

The Ultralytics training pipeline requires details about the dataset in a data.yaml file, which links to the dataset containing the annotated images for object detection. This file outlines the dataset's structure, path, classes, and other configurations required for training. For example:

```
train: ../train/images
val: ../valid/images
test: ../test/images

nc: 2
names: ['1', '2']
```

#### Training using parameters

The models were trained for 100 epochs, with a batch size of 16 for FastSAM and 32 for the other two models. The image size was set to 640x640 pixels. The model and its best iterations were saved for later evaluation and inference.

```
model.train(data="data.yaml", epochs=100, batch=16, imgsz=640, save=True, device='0')
```

The chosen parameters, and the inconsistency in batch size, were dictated by computational resource limitations. FastSAM was trained with a smaller batch size than YOLO and RT-DETR because it used too much memory otherwise. The goal was to use as much of the GPU as possible, given the limited training time. As a result of this choice, FastSAM has the advantage of being able to learn more from each individual example, so it is expected to have an advantage in the metrics.
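Putting the pieces of this section together, a condensed sketch of the training setup could look like the following. The wandb secret name is an assumption (use whatever name the key was stored under in Kaggle), and in practice each model was trained in its own run rather than one loop.

```
import wandb
from kaggle_secrets import UserSecretsClient
from ultralytics import YOLO, RTDETR

# Experiment tracking: the wandb API key is stored as a Kaggle secret.
# The secret name "wandb_api_key" is an assumption.
wandb.login(key=UserSecretsClient().get_secret("wandb_api_key"))

# The three fine-tuned models with the batch sizes listed under Equipment.
runs = [
    (YOLO("YOLOv9c.pt"), 32),
    (YOLO("FastSAM-s.pt"), 16),   # smaller batch: FastSAM runs out of memory at 32
    (RTDETR("rtdetr-l.pt"), 32),
]

for model, batch in runs:
    model.train(data="data.yaml", epochs=100, batch=batch, imgsz=640,
                save=True, device="0")
```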
## Experiment

For the experiments, the YOLO, FastSAM, and RT-DETR models were tested to assess their effectiveness on the test subset of the Santyasa Image Dataset and on the Edge Case Dataset created specifically for this project. Additionally, the models were evaluated on selected videos to measure average computation time and detection accuracy.

#### Testing on Datasets

The experiments began by deploying all three models on the test set of the Santyasa Image Dataset and the Edge Case Dataset. This approach was designed to compare the performance of the models in recognizing and detecting objects.

```
model = YOLO("chosen model, e.g. 'best_weights_fastsam_final.pt'")
# model = RTDETR("best_weights_rtdetr_final.pt")  # use for RT-DETR
metrics = model.val(conf=0.5, data="path/to/testdata.yaml", split="test")
```

#### Testing on Video Data

After offline testing, the models were applied to a series of pre-selected videos. The primary goal was to evaluate the models under more dynamic conditions, reflecting potential real-world applications. Each model's average computation time per frame and detection efficacy were recorded.

```
# Run experiment for real-time testing on videos
import time
import cv2

cap = cv2.VideoCapture("path/to/video.mp4")
inference_times = []

# Loop through the video frames
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()
    if not success:
        break

    # Run tracking with the chosen model, persisting tracks between frames
    start_time = time.time()
    results = model.track(frame, persist=True, conf=0.5)
    end_time = time.time()
    inference_times.append(end_time - start_time)

    # Visualize the results on the frame
    annotated_frame = results[0].plot()

    # Display the annotated frame
    cv2.imshow("Real-Time Tracking", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

## Results

### Metrics

#### Precision

Precision indicates the accuracy of the predictions made by the model, specifically the proportion of positive identifications that were actually correct. It is calculated as the number of true positive detections divided by the total number of elements labeled as positive (true positives + false positives). It is particularly important when the cost of a false positive is high.

Precision = $\frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$

#### Recall

Recall measures the ability of the model to find all the relevant cases (true positives) within a dataset. It is the proportion of actual positives that were correctly identified, i.e. the fraction of actual objects that were detected by the model. It is an important metric when it is important to capture as many positives as possible.

Recall = $\frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$

#### F1-score

The F1-score is the harmonic mean of precision and recall, providing a single score that balances both concerns in one number. It is particularly useful for our use case since we need to compare multiple models.

F1 Score = $2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

#### MAP (Mean Average Precision)

In object detection, AP (Average Precision) measures the accuracy of the model in detecting objects of a particular class, integrating over a precision-recall curve. MAP is the average of AP scores across all classes or over different IoU (Intersection over Union) thresholds.

IoU is a measure used to determine the accuracy of a predicted bounding box. It calculates the ratio of the intersection area between the predicted bounding box and the ground truth bounding box to their union area. A higher IoU indicates a more accurate prediction. MAP50 and MAP50-95 were used, where the numbers indicate the IoU threshold values.

IoU = $\frac{\text{Overlap of Predicted Box and Labeled Box}}{\text{Union of Predicted Box and Labeled Box}}$

MAP50 = $\text{Average of APs calculated at IoU threshold of 0.50 across all classes}$

MAP50-95 = $\text{Average of APs calculated at IoU thresholds from 0.50 to 0.95 across all classes}$
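As a concrete illustration of these definitions, the small sketch below computes precision, recall, F1 and IoU from raw counts and box coordinates. It is for illustration only; in practice the Ultralytics `model.val()` call reports all of these metrics directly.

```
def precision(tp, fp):
    # Fraction of predicted positives that are correct.
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    # Fraction of actual positives that were found.
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if p + r else 0.0

def iou(box_a, box_b):
    # Boxes given as (x1, y1, x2, y2) corner coordinates.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0
```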
### Offline Evaluation

In this section, the results of the offline evaluation are presented. Class "0" is the body, class "1" is the head, and class "all" refers to the aggregated results over all images.

| **Model** | **Class** | **Images** | **Instances** | **Precision** | **Recall** | **MAP50** | **MAP50-95** | **F1-score** |
|-------------|-----------|------------|---------------|---------|-----------|-----------|--------------|--------------|
| **YOLOv9c** | all | 404 | 881 | 0.871 | 0.418 | 0.653 | 0.304 | 0.565 |
| | 0 | 399 | 461 | 0.891 | 0.675 | 0.795 | 0.417 | 0.569 |
| | 1 | 365 | 420 | 0.85 | 0.162 | 0.51 | 0.192 | 0.272 |
| | | | | | | | | |
| **RT-DETR** | all | 404 | 881 | 0.769 | 0.602 | 0.639 | 0.265 | 0.675 |
| | 0 | 399 | 461 | 0.89 | 0.827 | 0.876 | 0.412 | 0.881 |
| | 1 | 365 | 420 | 0.648 | 0.377 | 0.403 | 0.118 | 0.477 |
| | | | | | | | | |
| **FastSAM** | all | 404 | 622 | **0.9** | **0.664** | **0.791** | **0.491** | **0.764** |
| | 0 | 365 | 452 | 0.92 | 0.763 | 0.853 | 0.592 | 0.834 |
| | 1 | 154 | 170 | 0.881 | 0.565 | 0.73 | 0.389 | 0.688 |

*Table 1: Performance metrics achieved by the models on the Santyasa Image Dataset*

As shown in Table 1, YOLO scores the worst of the three models on recall, which means it detects fewer true positives than FastSAM and RT-DETR. RT-DETR scores the worst on precision, indicating that it has more false positives than the other models. Overall, FastSAM yields the best scores on all metrics for the Santyasa Image Dataset and is therefore the best model on this dataset.

**YOLOv9c results**

| ![F1_curve](https://hackmd.io/_uploads/B15YTFiH0.png) | ![P_curve](https://hackmd.io/_uploads/By5YaYsrC.png) |
|--------|---------|
| ![PR_curve](https://hackmd.io/_uploads/Bk9YaYsS0.png) | ![R_curve](https://hackmd.io/_uploads/H1qKpYiHR.png) |

*Figure 7: Performance metrics achieved by YOLOv9c on the test set*

In Figure 7, the F1-Confidence, Recall-Confidence and Precision-Confidence curves suggest that the best confidence threshold lies around 0.65.

![yolo_predictions](https://hackmd.io/_uploads/HJqlWa2BR.jpg)
*Figure 8: Predictions made by YOLOv9c*

![yolo_labels](https://hackmd.io/_uploads/rJMG-pnB0.jpg)
*Figure 9: Ground truth of the predictions by YOLOv9c*

**RT-DETR results**

| ![F1_curve](https://hackmd.io/_uploads/r1WwTYsrR.png) | ![P_curve](https://hackmd.io/_uploads/HJZP6FirR.png) |
|--------|---------|
| ![PR_curve](https://hackmd.io/_uploads/BkZDTFoSR.png) | ![R_curve](https://hackmd.io/_uploads/SkWwTYiSA.png) |

*Figure 10: Performance metrics achieved by RT-DETR on the test set*

In Figure 10, the F1-Confidence, Recall-Confidence and Precision-Confidence curves suggest that the best confidence threshold lies around 0.7.
**Predictions**

![rtdetr_predictions](https://hackmd.io/_uploads/SkMYx63r0.jpg)
*Figure 11: Predictions made by RT-DETR*

**Ground truth**

![rtdetr_labels](https://hackmd.io/_uploads/SJnpgpnS0.jpg)
*Figure 12: Ground truth of the predictions by RT-DETR*

**FastSAM results**

| ![BoxF1_curve](https://hackmd.io/_uploads/rkcvRYiS0.png) | ![BoxP_curve](https://hackmd.io/_uploads/ry5DRYsSC.png) |
|--------|---------|
| ![BoxPR_curve](https://hackmd.io/_uploads/HJ5wRKirA.png) | ![BoxR_curve](https://hackmd.io/_uploads/SJx9wRYjrR.png) |

*Figure 13: Performance metrics achieved by FastSAM on the test set*

In Figure 13, the F1-Confidence, Recall-Confidence and Precision-Confidence curves suggest that the best confidence threshold lies between 0.7 and 0.8.

**Predictions**

![fastsam_predictions](https://hackmd.io/_uploads/BkvHe6nSC.jpg)
*Figure 14: Predictions made by FastSAM*

**Ground truth**

![fastsam_labels](https://hackmd.io/_uploads/r1ewlp3HR.jpg)
*Figure 15: Ground truth of the predictions made by FastSAM*

**Edge cases evaluation**

| **Model** | Class | Images | Instances | P | R | mAP50 | mAP50-95 |
|-------------|-------|--------|-----------|-------|--------|-------|----------|
| **YOLOv9c** | all | 12 | 19 | 0 | 0 | 0 | 0 |
| | | | | | | | |
| **RT-DETR** | all | 12 | 19 | 0.225 | **0.307** | **0.161** | 0.0497 |
| | 0 | 11 | 11 | 0.288 | 0.364 | 0.162 | 0.0553 |
| | 1 | 8 | 8 | 0.162 | 0.25 | 0.16 | 0.0441 |
| | | | | | | | |
| **FastSAM** | all | 12 | 22 | **0.362** | 0.0357 | 0.153 | **0.0763** |
| | 0 | 11 | 14 | 0.723 | 0.0714 | 0.305 | 0.153 |
| | 1 | 8 | 8 | 0 | 0 | 0 | 0 |

*Table 2: Performance metrics achieved by the models on the Edge Case Dataset*

On the Edge Case Dataset, FastSAM has a significantly lower recall than RT-DETR. This indicates that FastSAM detects only a very small portion (3.57%) of all true positives, missing 96.43% of them. RT-DETR detects about 30.7% of all true positives in the dataset and misses about 69.3% of them. YOLOv9c appears to detect nothing at all in the Edge Case Dataset. The precision of the models is also low. RT-DETR has a precision score of 0.225, so only 22.5% of the detections classified as positive by RT-DETR are true positives. FastSAM has a significantly higher precision score than RT-DETR (0.362), which shows that 36.2% of FastSAM's positive detections are correct.

**YOLO edge case predictions**

![yolo_edge_cases](https://hackmd.io/_uploads/rJfrb63rA.jpg)
*Figure 16: YOLOv9c edge case predictions*

YOLO does not detect any objects in the Edge Case Dataset.

**FastSAM edge case predictions**

![fastsam_edge_cases](https://hackmd.io/_uploads/S1oUba3BA.jpg)
*Figure 17: FastSAM edge case predictions*

In Figure 17, FastSAM cannot detect the agent in the first image even though it is fully visible to the naked eye. It cannot detect any objects in the foggy environment in the third picture either. However, it is able to detect an object in the second picture, where part of the body is visible in a normal environment.

**RT-DETR edge case predictions**

![rtdetr_edge_cases](https://hackmd.io/_uploads/rJxCZa3rA.jpg)
*Figure 18: RT-DETR edge case predictions*

In Figure 18, RT-DETR has detected two false positives in the first image and one false positive in the second image. RT-DETR is able to detect the agent in the foggy environment as well as the agent of which only a part of the body is visible in a normal environment.
**Ground truth Edge Case Dataset**

![edge_cases_labels](https://hackmd.io/_uploads/SkGeGanrC.jpg)
*Figure 19: Ground truth of the predictions in Figures 16, 17 and 18*

### Real-time evaluation

The inference times for all the models are very low. YOLO struggles with detecting the heads of the agents, but it has the lowest number of false positives. FastSAM has more false positives than YOLO, but it has better detection, since it detects the heads more often than YOLO. RT-DETR has the best ability to detect the heads, but it has significantly more false positives than YOLO or FastSAM.

Video of YOLO:

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/Tj21M-y_cIM?si=og4QlEhWEUA9yzyd" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

*Video 2: Results of YOLO*

Video of FastSAM:

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/R3svfqkTfow?si=P4wSDRGD2-ILXWtR" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

*Video 3: Results of FastSAM*

Video of RT-DETR:

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/UWS17TGn6A8?si=RZLrEWjPMJP9dMS5" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

*Video 4: Results of RT-DETR*

| | YOLOv9c | FastSAM | RT-DETR |
|------|---------|---------|---------|
| **mean** | 43.4ms | **33.8ms** | 73.8ms |
| **std** | 1.47ms | **0.28ms** | 1.01ms |

*Table 3: Mean and standard deviation of inference times for each model*

From Table 3, it can be seen that FastSAM stands out as the superior model for applications requiring fast and consistent inference times, making it potentially more suitable for real-time object detection tasks. YOLOv9c and RT-DETR, with their longer inference times, are slower and therefore the worse choice in this setting.
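The numbers in Table 3 follow directly from the per-frame timings collected in the video loop shown earlier; a minimal sketch of that summary step:

```
import numpy as np

def summarise(inference_times):
    # inference_times: per-frame timings (in seconds) collected in the video loop.
    mean_ms = np.mean(inference_times) * 1000
    std_ms = np.std(inference_times) * 1000
    print(f"mean: {mean_ms:.1f} ms, std: {std_ms:.2f} ms")
```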
## Discussion

The reason why all models perform badly on the Edge Case Dataset could be that the dataset they were trained on does not contain many of the "complex" scenarios where only parts of an agent are visible to the model. A few examples of this can be seen in Figures 16, 17 and 18: RT-DETR has a few false positives in the first picture, while FastSAM and YOLO detect nothing at all.

An interesting case can be observed in the first picture, where an agent has their back turned to the player. Even though the agent is clearly visible, it is not detected by any of the models. We think the reason might be that the dataset only contains pictures of agents seen from the front.

RT-DETR has better recall but worse precision. This means it is more comprehensive in detecting objects, but at the cost of accuracy. FastSAM has better precision but worse recall: it is more reliable when it claims to detect an object, but it misses many more actual objects. From the findings documented in Tables 1, 2 and 3, it is clear that FastSAM is the best-performing model of the three in terms of performance metrics, inference time and edge case detection.

## Conclusion

In this study, we conducted a comparative analysis of three object detection models (YOLO, FastSAM, and RT-DETR) to identify the most effective model for detecting agents in the FPS game "Valorant". Each model was trained and tested using the Santyasa Image Dataset. They were also tested on a specially designed Edge Case Dataset to challenge their robustness in atypical scenarios.

The evaluation revealed that FastSAM consistently outperformed the other models in terms of inference speed and demonstrated superior performance on the Santyasa Image Dataset. FastSAM provided the most reliable detections among the three models. On the other hand, YOLO failed to detect any agents within the Edge Case Dataset, highlighting a significant limitation in handling complex scenarios. RT-DETR detected agents in both datasets but performed less effectively than FastSAM, especially on the Santyasa Image Dataset.

These findings suggest that FastSAM is the preferred model for object detection tasks within "Valorant" under normal conditions. However, the performance of all models in complex or atypical scenarios was underwhelming, indicating a need for further refinement to enhance their robustness and reliability.

## Limitations

The scope and effectiveness of the project faced significant constraints due to limited computational resources, restricted training and testing time, and challenges in dataset creation. These limitations affected the amount of data that could be processed and the extent to which the models could be optimized.

#### Computational Power Constraints

The available computational resources were insufficient to fully exploit the capabilities of the YOLO, FastSAM, and RT-DETR models. High-performance computing environments are typically required to train and fine-tune such sophisticated models efficiently. However, the project had to operate with relatively weak processing power, which prolonged training durations and affected the overall throughput of data processing. This limitation was particularly challenging during training phases that demanded intensive computation and data handling.

#### Restricted Training and Testing Time

The project was allocated approximately 30 hours of training time per week, totaling around 180 hours for the entire duration of the project. Although this initially seemed sufficient, it turned out to be less than expected. For example, training RT-DETR a single time took 20 hours. If a model required more training time than was available on the GPU that week, we had to wait until the quota reset the following week, which meant that some hours were lost this way. This severely limited the amount of experimentation and optimization that could be conducted.

#### Difficulties with FastSAM

A considerable amount of time, approximately two weeks (50 hours of GPU time), was spent attempting to understand and implement the FastSAM training and validation code available in the [official FastSAM repository](https://github.com/CASIA-IVA-Lab/FastSAM?tab=readme-ov-file). The intent was to use the provided code to train the FastSAM model. Despite following the setup instructions accurately, the training process ran into continuous errors related to dataset formatting. Even after resolving these issues, the model often failed to detect anything, indicating that it had not been trained correctly. Various dataset formats and training parameters were adjusted in an effort to resolve the training issues.
Attempts were made to train the model in different environments, including Google Colab and local machines, but these were unsuccessful. This ongoing struggle led to the exploration of Ultralytics as a resource for training FastSAM. However, further complications arose when it was discovered that the FastSAM constructor did not include a .train() method, making training via this approach impossible.

```
from ultralytics import FastSAM

model = FastSAM("FastSAM-s.pt")
```

After many trials and an analysis of the FastSAM code, which revealed that YOLO model weights were used with the YOLO constructor for its transfer learning, it was decided to train the pretrained FastSAM using the YOLO constructor instead, just as was done in the original training. Although in theory the training is done just as it is in the FastSAM code, we cannot guarantee its equivalence. This approach was a deviation from the original goal, which was to use the FastSAM training method directly.

#### Challenges in Creating a High-Quality Dataset

Although the dataset used for training contains 2975 training images, it is far too small. Valorant as a game is very dynamic, with many characters and lighting scenarios. Capturing a significant number of such scenarios and situations would require potentially tens of thousands of images. Creating such a large, high-quality, annotated dataset would be impractical given the project's time constraints. The inability to develop a large dataset that could be trusted to be well annotated limited the training potential of the models and subsequently narrowed the project's capacity to achieve potentially higher detection accuracy.

#### Constraints on Model Selection

Due to limited computational resources, the team was motivated to choose scaled-down versions of each model, designed specifically to minimize GPU load. Attempts to use larger models were stopped by the necessity of operating with very small batch sizes (batch size 1 or 2) to avoid exceeding the available GPU memory. Training with such small batch sizes would have resulted in training times exceeding the allocated GPU usage duration. These scaled-down models are advantageous in terms of reduced computational demands, but they generally provide less robustness and lower performance than their full-sized counterparts. This limitation not only could have impacted the precision and efficiency of object detection, but also restricted the project's capacity to explore more advanced capabilities that might be offered by more powerful models.

#### Limited Scope of the Project

These computational, time, data creation, and model selection limitations restricted the scope of the project. While the original intent was to explore the boundaries of object detection within specific environments, the resource constraints meant that the project could not be executed to the extent the group wished for. The limited training and testing time, coupled with the FastSAM issues, the challenges in dataset creation, and the forced choice of scaled-down models, limited the depth of analysis and refinement of the models, leading to a narrower exploration of their potential capabilities.

## Future works

YOLO, RT-DETR and FastSAM are not yet good enough to be used, since their performance is subpar on the Edge Case Dataset. To improve the performance of the models, we would need to train them on more data that covers more scenarios. Data augmentation could also be used to create synthetic variations of the existing dataset, increasing its diversity and helping the models generalize better to unseen edge cases. FastSAM should be trained on a more powerful GPU so that all the models can be trained with the same batch size. It would be even better if FastSAM could be trained directly.
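As a rough illustration of the data augmentation idea above, the Ultralytics trainer exposes augmentation hyperparameters that could be pushed harder than the defaults. The values below are illustrative assumptions, not tuned settings.

```
from ultralytics import YOLO

model = YOLO("YOLOv9c.pt")

# Illustrative augmentation settings (assumed values, not tuned):
model.train(
    data="data.yaml",
    epochs=100,
    batch=32,
    imgsz=640,
    hsv_v=0.5,    # stronger brightness jitter for dark or foggy scenes
    degrees=5.0,  # small random rotations
    scale=0.6,    # zoom in/out, simulating partially visible agents
    fliplr=0.5,   # horizontal flips
    mosaic=1.0,   # mosaic augmentation stitches four training images together
)
```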
## GitHub

https://github.com/Remi-Lejeune/computer-vision-valorant

## References

[1] Wang, Chien-Yao, et al. "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information." arXiv:2402.13616 (2024).

[2] Li, Yongjun, et al. "YOLO-ACN: Focusing on Small Target and Occluded Object Detection." IEEE Access (2020). doi:10.1109/ACCESS.2020.3046515.

[3] Zhao, Xu, et al. "Fast Segment Anything." (2023).

[4] Kirillov, Alexander, et al. "Segment Anything." Proceedings of the IEEE/CVF International Conference on Computer Vision (2023).

[5] Lv, Wenyu, et al. "DETRs Beat YOLOs on Real-time Object Detection." (2023).
