# CROD: Design Document

## Introduction

### Description

This project aims to create a machine learning model capable of identifying the cards a Clash Royale player has on the board at any given time.

### Scope

The model is trained on a set of images from numerous Clash Royale battles. The main goal is to identify and classify all distinct classes defined by the game's card collection. In other words, the project implements a multiple-object detection and localization algorithm.

### Methods

Numerous research projects have tackled object detection, including YOLO, Detectron, DPM, R-CNN, MultiBox, MultiGrasp, OverFeat, Fast R-CNN, and Mask R-CNN. These algorithms take different approaches:

- Some detection systems repurpose a classifier for an object and evaluate it at various locations and scales in a test image.
- Others use a sliding-window approach, where the classifier is run at every window location.
- Others first generate potential bounding boxes through region proposals, and then run a classifier (typically a CNN) on the proposed boxes.

In most of these algorithms, classification is followed by post-processing of the boxes, such as refining the frames or eliminating duplicate detections.

One of the most prominent approaches to object detection is YOLO (an acronym for You Only Look Once). It uses a stack of convolutional layers followed by fully connected layers to predict multiple bounding boxes and their class probabilities. Detection is framed as a single regression problem: the network trains on full images and directly optimizes detection performance. More specifically, the algorithm divides an input image into an S×S grid. Each grid cell predicts B bounding boxes. Each box has a confidence score that reflects two things: how confident the model is that the box contains an object, and how accurate it thinks the box is. Each bounding box therefore consists of 5 predictions: x, y, w, h, and confidence.
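
To make the grid formulation concrete, the Python sketch below does three things. It computes the shape of the prediction tensor, S×S×(B·5 + C), where C is the number of classes. It splits one cell's prediction vector into its boxes and its class probabilities. It also finds which cell is responsible for an object: in YOLO, this is the cell that contains the object's center. This is an illustration under stated assumptions, not the project's model. S = 7 and B = 2 are the values from the original YOLO paper. `NUM_CLASSES = 20` is a hypothetical placeholder for the number of card classes. The helper names and the per-cell value ordering are invented for this example.

```python
from typing import List, Tuple

# Hypothetical settings for illustration only: S=7 and B=2 follow the original YOLO
# paper; NUM_CLASSES is a placeholder, since this document does not fix the card count.
S = 7
B = 2
NUM_CLASSES = 20

Box = Tuple[float, ...]  # one box: (x, y, w, h, confidence)


def output_shape(s: int, b: int, c: int) -> Tuple[int, int, int]:
    """Shape of the prediction tensor: each of the s*s cells emits b boxes
    of 5 values each, plus a single set of c class probabilities."""
    return (s, s, b * 5 + c)


def split_cell(cell: List[float], b: int, c: int) -> Tuple[List[Box], List[float]]:
    """Split one cell's vector into its b boxes and its c class probabilities.

    Assumed layout (a convention of this sketch, not fixed by YOLO):
    [box_1 (5 values), ..., box_b (5 values), class probabilities (c values)].
    """
    if len(cell) != b * 5 + c:
        raise ValueError(f"expected {b * 5 + c} values, got {len(cell)}")
    boxes: List[Box] = [tuple(cell[i * 5:(i + 1) * 5]) for i in range(b)]
    class_probs = cell[b * 5:]
    return boxes, class_probs


def responsible_cell(cx: float, cy: float, s: int) -> Tuple[int, int]:
    """(row, col) of the grid cell containing an object's center.

    (cx, cy) is the center normalized to [0, 1] relative to the whole image.
    """
    col = min(int(cx * s), s - 1)  # clamp so that cx == 1.0 maps to the last column
    row = min(int(cy * s), s - 1)  # clamp so that cy == 1.0 maps to the last row
    return row, col


if __name__ == "__main__":
    print(output_shape(S, B, NUM_CLASSES))  # (7, 7, 30)
    cell = [0.5, 0.5, 0.2, 0.3, 0.9, 0.4, 0.6, 0.1, 0.1, 0.2] + [0.0] * NUM_CLASSES
    boxes, probs = split_cell(cell, B, NUM_CLASSES)
    print(len(boxes), len(probs))  # 2 20
    print(responsible_cell(0.55, 0.12, S))  # (0, 3)
```

With these placeholder values the network emits 7 × 7 × 30 = 1470 numbers per image. For CROD, the last dimension would grow with the number of card classes.
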
The x and y coordinates represent the center of the box relative to the bounds of its grid cell, while w and h (width and height) are predicted relative to the whole image. When an object is present, the target confidence equals the intersection over union (IoU) between the predicted box and the ground truth; when the cell contains no object, the confidence should be zero. Only one set of class probabilities is predicted per grid cell, regardless of the number of boxes B.

## Methodology

### Dataset

As stated before, the dataset consists of a set of images from numerous Clash Royale battles. These images must have a standard shape ...

### Preprocessing

### Model

### Hyperparameter tuning

## Results

### Train-Test-Validate

## References