## CA326 - Third Year Project

# Technical Specification Document

## Project Title:
## "The Creation, Design and Implementation of Artificial Intelligence Systems in Image Processing and Object Recognition"

## Authors:
## James O'Neill - 17391756
## Paul Davin - 15574747
********
## Table of Contents
0. [__Table of Contents__](#0-Table-of-Contents)
1. [__Introduction__](#1-Introduction)
- 1.1 Overview
- 1.2 Related Documents
- 1.3 Terminology/Abbreviations
- 1.4 Document Overview
2. [__System Architecture__](#2-System-Architecture)
- 2.1 System Requirements/Constraints
- 2.2 3rd Party Dependencies
- 2.3 System Architecture Diagram
3. [__High-Level Design__](#3-High-Level-Design)
- 3.1 Purpose of section
- 3.2 Use Cases
- 3.2.1 Passing one image directly
- 3.2.2 Passing one image through website
- 3.3 Datasets
- 3.4 Performance
- 3.5 Error Handling
- 3.6 Testing
- 3.7 Data Flow Diagram
4. [__Problems and Resolutions__](#4-Problems-and-Resolutions)
- 4.1 Purpose of this section
- 4.2 Initial Plan
- 4.3 Issues with OCR
- 4.4 Google Vision
5. [__Installation Guide__](#5-Installation-Guide)
- 5.1 Website
- 5.2 Running our Models
6. [__References__](#6-References)
- 6.1 References
********
## 1. Introduction
### 1.1 Overview
Our project is to create an Automatic Number Plate Recognition (ANPR) engine, which will be given an image of a vehicle and will return the plate's registration number if it is visible. To demonstrate this, we will also create a website where a user can upload an image and have our ANPR engine return the registration number if one is present.
### 1.2 Related Documents
For more information which is not contained within this document, please see the below documents.
| **Document** | **Link** | **Description**|
| --- | --- | --- |
| Functional Spec | link | The functional specification for this project, written in November. There has been some deviation from this. |
| Testing Document | link | A markdown document outlining some of the testing done throughout the development cycle. |
| User Guide | link | A short guide on using our website. |
| Video Walkthrough | link | A video walkthrough, showing the website in use. |
### 1.3 Terminology/Abbreviations
| **Term** | **Abbr.** | **Definition** |
| --- | --- | --- |
| Convolutional Neural Network | CNN | A subset of neural networks most commonly used to analyze visual imagery. |
| Artificial Intelligence | AI | Technology that carries out tasks that would traditionally require human intelligence. |
| Automatic Number Plate Recognition | ANPR | Technology that uses AI to automatically read number plates. |
| K Nearest Neighbour Algorithm | KNN | A simple, easy-to-implement supervised machine learning algorithm used to solve classification and regression problems. |
| Dataset | DS | The sample space from which our CNN will learn. |
| Optical Character Recognition | OCR | Technology used to identify individual characters in an image. |
| Google Vision | GV | Google's OCR engine. |
### 1.4 Document Overview
This document will cover the various technical aspects of the project. Below is a short description of what each section will cover.
### Section 2 System Architecture
This section will outline the different libraries that the system depends on to function. It will explain what hardware requirements there are for the system to run, and any limitations the system has. A system architecture diagram is also included.
### Section 3 High-level Design
This section will go into the overall design of the system, and explain different aspects of the system such as its performance, how it handles errors, the different use cases for the system, and the datasets that were used to train the system. It also contains a data flow diagram to illustrate the usual flow of data through the system.
### Section 4 Problems and Resolutions
This section is to outline the different problems that arose during the development cycle. It will highlight the changes made from the initial design to the final one, and explain the reasons for these changes.
### Section 5 Installation Guide
This section will outline how to use the system on a device. It will also explain how to load the different models we have created and use them on an Ubuntu 18.04 device.
### Section 6 References
This section contains the references for this document.
********
## 2. System Architecture
### 2.1 System Requirements/Constraints
The system is self-hosted on our own machines using Ngrok, so it is only available while the host machine is running. Usage is also constrained by the Google Vision OCR engine, which can only scan a limited number of images each month before charges are incurred. The system has only been tested on Ubuntu 18.04, with Python 3.7 and the library versions listed in section 5.2.
### 2.2 3rd Party Dependencies
There are a number of libraries that this system depends on to work, as listed below. Some of these libraries were used during the development and testing of the system and may not be necessary to run the final system. A sample requirements file is sketched after the list.
- scikit-learn
- scikit-image
- NumPy
- Pillow
- ImageAI
- Matplotlib
- Flask
- google.cloud (Google Cloud Vision client)
- OpenCV
- TensorFlow
- Keras
- Werkzeug
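For convenience, these dependencies could be captured in a single requirements file. The sketch below is an assumption based on the install commands in section 5.2: only the TensorFlow and scikit-learn versions are recorded in this document, so the remaining packages are left unpinned, and the PyPI names are our best mapping from the library names above.
```
# requirements.txt (sketch; only the two pinned versions appear in this document)
tensorflow==1.14.0
scikit-learn==0.22.1
scikit-image
numpy
Pillow
imageai
matplotlib
Flask
google-cloud-vision
opencv-python
keras
Werkzeug
```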
### 2.3 System Architecture Diagram

********
## 3. High-Level Design
### 3.1 Purpose of section
The purpose of this section is to provide a broad overview of the system design, including use cases and the identification/explanation of each component involved in the system.
### 3.2 Use Cases
#### 3.2.1 Passing one image directly
**Description -** This use case covers passing an image directly to the system, without using the webpage to upload it. The image will be of a vehicle with the license plate visible and of sufficient quality. The response will be a string consisting of the registration number on the license plate. This will be used by the webpage to pass images to the system, and for testing purposes to ensure the system is functioning correctly.
**Criticality -** This use case is essential for the project, as this is how every image will be processed by the system. The other use cases depend on this being functional, so it is the highest priority.
**Technical Issues -** The main technical issues with this use case relate to integrating the different components of the system and ensuring they work together correctly.
**Dependencies with other requirements -** This requirement has no dependencies on other use cases; the webpage (section 3.2.2) will depend on it to access our system. A minimal test harness is sketched below.
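As a minimal illustration of this use case, the engine can be invoked directly from the command line. The sketch below assumes the `extract()` function shown in section 3.5 is importable from the `deploy` module referenced there; the harness itself is hypothetical.
```
# Hypothetical harness for passing one image directly to the engine,
# assuming extract() from deploy.py (see section 3.5) is importable.
import sys
from deploy import extract

if __name__ == "__main__":
    # prints the registration number, or an error message if no plate was found
    print(extract(sys.argv[1]))
```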
#### 3.2.2 Passing one image through website
**Description -** This use case will be the main one for our system, and how we will demonstrate the project. A user will navigate to our public webpage, where they will be able to upload an image (subject to the same restrictions as outlined in section 3.2.1). The registration number will then be displayed on a results page.
**Criticality -** This use case is essential for our project, as without it there would be no functionality on the webpage, leaving us unable to demonstrate progress without using the command line.
**Technical Issues -** The technical issues with this part of the project relate to hosting the website and backend. Many online hosting services limit things such as app size and access to system resources, which will affect our system. One solution is to host the system ourselves.
**Dependencies with other requirements -** This requirement depends on 3.2.1, which it uses to send the image to the system. A sketch of exercising the upload without a browser follows below.
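The upload flow can also be exercised without a browser, which is how it was tested with Postman (see section 3.6). The sketch below does the same with the `requests` library; the local URL and route are assumptions, while the form field name `file` matches the handler shown in section 3.5.
```
# Hypothetical upload test, mirroring the website's upload form.
# The URL is an assumption; the field name "file" matches deploy.upload().
import requests

with open("car.jpg", "rb") as f:
    response = requests.post("http://localhost:5000/", files={"file": f})
print(response.text)  # the rendered results page
```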
### 3.3 Datasets
Datasets were composed by taking samples from the below website:
https://ai.stanford.edu/~jkrause/cars/car_dataset.html
This dataset contains over 16,000 images; we randomly selected images from it for training, testing and validation, as sketched below.
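As an illustration, such a split could be produced as below; this is only a sketch, as the exact ratios, seed and directory layout we used are not recorded in this document.
```
# Sketch: randomly split the downloaded images into training, testing
# and validation sets. The 80/10/10 ratio and seed are assumptions.
import os
import random

images = sorted(os.listdir("car_ims"))  # hypothetical dataset directory
random.seed(42)
random.shuffle(images)

n = len(images)
train = images[:int(n * 0.8)]
test = images[int(n * 0.8):int(n * 0.9)]
validation = images[int(n * 0.9):]
print(len(train), len(test), len(validation))
```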
### 3.4 Performance
**<center>Successful</center>**
***Cache disabled:***
The below screenshot is the network log of a successful upload with cache disabled. As you can see, the runtime for finding the plate and extracting the text was 1.88 seconds.

***Cache enabled:***
With cache enabled, we see a very significant 22.3% decrease in runtime. This was the second time loading the page, so the HTML and CSS had already been cached and did not need to be fetched again. As the runtime was already quite low, any difference in load time is significant in proportion to the overall process.

***Successful waterfall:***
Here we see a full waterfall report of a successful upload with cache disabled. The runtime is 1.47 seconds.

**Initial runtimes**
The below image shows that the same image as displayed above initially took over 20 seconds to process. This was because the AI model was being loaded at request time rather than when the website was first accessed. By loading the model at website access time, the process runtime improved by 93%. A sketch of this change is shown after the screenshot.

**<center>Blank - Empty POST request</center>**
The below is the runtime on an empty POST request. The exception handler catches the upload before it enters the extraction function. This is covered in further detail in section 3.5 - Error handling.

**<center>Unsuccessful - No plate detected</center>**
**Cache disabled:**
With cache disabled, the runtime is just over 750ms. This optimisation was made by exiting the find() function (called by extract()) early if there is no plate. More details on this are in section 3.5 - Error Handling.

**Cache enabled:**
When cache is enabled, the runtime drops below 700ms.

**Initial runtimes:**
In the initial solution, this took over 13 seconds to run. This, again, was because the trained model was only being loaded at request time. The initial solution also displayed the uploaded image alongside the result; the processing and formatting of the image needed to display it added significantly to the runtime, as shown below.

**<center>Unsuccessful - Plate detected but no text</center>**
**Cache disabled:**
With cache disabled, the runtime is just shy of 2 seconds. This flow is consistently as slow as, if not slower than, the successful flow because it has to run through all of the code before returning the error.

**Cache enabled:**
With cache enabled, the runtime decreases by about 200ms.

**Initial runtime:**
This flow was not supported in our initial solution; it was added in the optimisation update.
### 3.5 Error Handling
**Purpose of this section:**
This section explains how the system handles errors: an empty POST request, an invalid file type, no license plate visible in the uploaded image, and a plate whose text cannot be read.
**Empty POST request:**
If no image is uploaded and the submit button is pressed, then the function will follow the flow of being given an image with no valid number plate. This is handled by the following piece of code, found in `deploy.upload()`:
```
try:
    if request.method == 'POST':
        file = request.files['file']               # grab the uploaded file
        filename = secure_filename(file.filename)  # sanitise the filename
        file.save(filename)
        print("file received")
        return render_template("results.html", input = extract(filename))
except:
    # an empty upload raises an exception and falls through to the
    # same "no plate detected" flow
    print("error in upload")
    return render_template("results.html", input = extract(filename))
```
**File types:**
Only JPG and PNG formats are supported. Rather than writing back-end code to handle an invalid file type, the system instead prevents the user from selecting an invalid file type. This is handled in index.html:
```
<input class="upload-file" accept="image/*" type="file" name="file" id="file" data-multiple-caption="{count} files selected" multiple/>
```
**No plate detected in uploaded image**:
If an image is uploaded but the software does not detect a number plate, an `IndexError` is thrown when trying to reference `detections[0]["box_points"]`, as `detections` has a length of 0. To handle this, the function returns `None`. This is utilised in `deploy.extract()` when the `find()` result is assigned to the `coordinates` variable: if it is `None`, the software exits the function early and tells the user that no plate was detected. This not only improves the user experience, it also increases efficiency significantly (as mentioned in section 3.4).
```
# exception handler at the end of find() to catch no plate detected
try:
    print("Plate detected")
    return detections[0]["box_points"]
except IndexError:
    return None
```
And in `deploy.extract()`:
```
coordinates = find(img)
if coordinates:
    image = Image.open(img)
    # crop the plate with a small margin: 5% of the image width, 2% of its height
    plate = image.crop((coordinates[0]-(image.size[0]*.05), coordinates[1]-(image.size[1]*.02), coordinates[2]+(image.size[0]*.05), coordinates[3]+(image.size[1]*.02)))
    plate.save("plate.jpg")
    print("Plate extracted")
    return recognize_license_plate("plate.jpg")
else:
    return "No plate detected, please ensure that you have uploaded a clear image where the plate is clearly visible."
```
**Plate detected in image, but no text detected:**
The only other flow for an upload is when a user uploads an image, the software finds a plate, but cannot make out the text. In this case, the user is advised to upload a better quality image as there is a plate, but the text isn't recognisable by the software. This is handled by the below code:
```
try:
    plate_num = str(texts[0])
    plate_description = plate_num[plate_num.index("description"):plate_num.index("bounding_poly")]
    raw_num = plate_description[plate_description.index(':')+3:len(plate_description)-4].replace('\n',' ')
except IndexError:
    raw_num = "Your plate was found, but we could not make out the text. Please upload a better quality image."
return raw_num
```
As is the case with not detecting a plate, not detecting text also throws an `IndexError`; to maintain consistency across the platform, it has been handled similarly. For context, a sketch of where `texts` comes from is given below.
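The `texts` list above comes from Google Vision. The sketch below shows roughly how `recognize_license_plate()` could obtain it using the `google.cloud` client; this is written against the pre-2.0 client API, and our real function may differ in detail.
```
# Sketch of the Google Vision call behind recognize_license_plate(),
# written against the pre-2.0 google-cloud-vision client API.
import io
from google.cloud import vision

def recognize_license_plate(path):
    client = vision.ImageAnnotatorClient()  # needs GOOGLE_APPLICATION_CREDENTIALS set
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)
    texts = client.text_detection(image=image).text_annotations  # the list parsed above
    try:
        plate_num = str(texts[0])
        plate_description = plate_num[plate_num.index("description"):plate_num.index("bounding_poly")]
        raw_num = plate_description[plate_description.index(':')+3:len(plate_description)-4].replace('\n', ' ')
    except IndexError:
        raw_num = "Your plate was found, but we could not make out the text. Please upload a better quality image."
    return raw_num
```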
### 3.6 Testing
Due to the nature of our system and its use of machine learning, much of the testing was done while training our models. In addition to this, we also ran our own tests on these models as verification and for our own records. This section covers the testing performed on the final system and its components. A significant amount of testing was also done on components that did not make it into the final system; information on these can be found in the testing document linked in section 1.2.
#### Plate Extraction
| | Test 1 | Test 2 | Test 3 |
| -------- | -------- | -------- | ------ |
| Images Tested | 50 | 50 | 300 |
| Results | 50% | 85% | 95% |
| Model Used | 1 | 2 | 2 |
| Minimum % Probability | 50% | 35% | 33% |
#### Sample Result Image

#### Comments:
Test 1 used the first model we generated, created after about 16 hours of training on a CPU. We tested at this point, while the loss was still fairly high, in order to check the progress of the training. This was done as we were still quite new to NN training and were unsure when to stop training to avoid overfitting.
Tests 2 and 3 used our second model, which seemed to perform the best of all the models we had generated. The minimum percentage probability is the lowest confidence level at which the system will still consider a detection correct. We found that 33% gave the best results while minimizing false positives.
#### Conclusion:
After the second and third tests we were quite happy with the results, and felt that we could proceed to the next stage of our project, marking plate extraction as complete.
Other testing included sending requests to the backend with Postman and the neural network's built-in validation during training. Unsuccessful tests, such as the training of the character extraction and recognition neural networks, are explained in section 4. A sketch of a detection-rate harness follows below.
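As an illustration, a detection rate like the results above could be computed with a small harness such as the sketch below, assuming the `find()` function from section 3.5 and a directory of test images; it is not our actual test code.
```
# Sketch: measure the percentage of test images in which a plate is
# detected, assuming find() (section 3.5) returns None on failure.
import os

def detection_rate(directory):
    images = sorted(os.listdir(directory))
    detected = sum(1 for name in images if find(os.path.join(directory, name)) is not None)
    return detected / len(images) * 100

print("{:.0f}% of images had a plate detected".format(detection_rate("test_images")))
```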
#### Character Recognition using Google Vision
| | Test 1 |
| -------- | -------- |
| Images Tested | 300 |
| Results | 97% |
#### Comments:
We set up Google Vision and used the same test images that we used for our previous OCR tests. One issue encountered was that the Google Vision OCR would pick up some very small characters on license plates and their surrounds; however, we didn't consider this a failure if it still correctly picked up the rest of the plate.
#### Conclusion:
We were very happy with these results, and decided to move forward with Google Vision as our OCR engine. We would have liked to run more tests, but we were limited in how much we could use Google Vision before incurring charges.
#### Complete System on Ngrok
| | Test 1 |
| -------- | -------- |
| Images Tested | 50 |
| Average Runtime | 3 seconds |
#### Comments:
This test was timed from when the upload button was clicked until the result was displayed. It was carried out while the site was live, from a different machine on a different LAN than the one hosting the website.
#### Conclusion:
Using Ngrok to host on our own machines allowed us to implement our optimizations using multi-threading. This resulted in much better performance, far under our 5 second goal. We decided that we were happy with these results and would use Ngrok as our hosting solution. A sketch of the threading setup is shown below.
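As an illustration, Flask's built-in server can handle requests on multiple threads, and Ngrok then tunnels the local port to a public URL; the port number below is an assumption, and our actual launch code may differ.
```
# Sketch: run the Flask app with multi-threaded request handling,
# then expose it publicly with Ngrok (run separately): ngrok http 5000
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, threaded=True)
```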
### 3.7 Data Flow Diagram

********
## 4. Problems and Resolutions
### 4.1 Purpose of this Section
This section covers some of the problems that we encountered during development and how we resolved them. We will look at our initial plan for the system and the functionality we wanted to achieve, and then at how the system changed from this initial plan while preserving the original functionality as much as possible.
### 4.2 Initial Plan
Our initial plan was to use a combination of two OCR engines that we developed ourselves. The first was a neural net trained to classify characters in an image and return the results with a confidence level. The second was an OCR engine using the k-NN algorithm, which would also classify a character in an image passed to it and return a confidence level. The system would then combine these two estimates and their confidence levels to give greater accuracy than either system alone; a sketch of such a combination scheme follows below.
Both of these systems required the license plate to be extracted from the original image beforehand, but this did not change from our initial design to the final implementation.
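Since this combination stage was never implemented, the sketch below is only an assumption of how the two estimates could have been merged, with each engine returning a (character, confidence) pair.
```
# Hypothetical combination of the two OCR estimates; the rule itself
# is an assumption, as this stage was never built.
def combine(cnn_result, knn_result):
    cnn_char, cnn_conf = cnn_result
    knn_char, knn_conf = knn_result
    if cnn_char == knn_char:
        # both engines agree: keep the higher confidence
        return cnn_char, max(cnn_conf, knn_conf)
    # engines disagree: trust whichever engine is more confident
    return (cnn_char, cnn_conf) if cnn_conf >= knn_conf else (knn_char, knn_conf)
```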
### 4.3 Issues with OCR
**K-NN Implementation**
Our implementation of an OCR engine using k-NN used the scikit-learn library for Python. At first it looked very promising: we successfully trained multiple models, first only on numbers and then on both letters and numbers. We tested it on images of characters and it performed with over 80% accuracy. Extremely happy with this early success, we thought that k-NN would be the main OCR engine for our system. However, we then noticed that most of our test images were taken from a very shallow angle, and the ones it failed on were from a steeper angle. We trained a new neural net to identify characters in an image (without classifying them) and used this to obtain more varied images of characters from steeper angles; our models performed very poorly on these.
After doing further research we found that k-NN was simply not suitable for images of characters taken from extreme angles. Even projects online, such as some found on GitHub, which claimed "success" in reading license plates using k-NN, only ever showed the system working with ideal images taken from extremely shallow angles. As we had set out to create a system which could read plates in difficult conditions, and despite our initial success with k-NN, we finally decided that our models simply didn't meet the requirements we had set out. A sketch of the training procedure is given below.
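For reference, the sketch below shows how such a k-NN character classifier could be trained to match the prediction code in section 5.2. The directory layout, k value and test split are assumptions; the HOG parameters and the scikit-learn 0.22.1 joblib location are taken from this document.
```
# Sketch: train a k-NN character classifier on HOG features, mirroring
# the prediction code in section 5.2. Directory layout and k=5 are assumptions.
import os
from skimage.io import imread
from skimage import color, transform
from skimage.feature import hog
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.externals import joblib  # location matches scikit-learn 0.22.1

features, labels = [], []
for label in sorted(os.listdir("chars")):  # one sub-directory per character class
    for name in os.listdir(os.path.join("chars", label)):
        img = imread(os.path.join("chars", label, name))
        img = transform.resize(img, (50, 50))
        features.append(hog(color.rgb2gray(img), orientations=8, pixels_per_cell=(10, 10), cells_per_block=(5, 5)))
        labels.append(label)

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("accuracy:", knn.score(X_test, y_test))
joblib.dump(knn, "knn_model.pkl")  # produces a .pkl model like those in section 5.2
```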
**NN Implementation**
To begin creating a CNN for OCR we built up a dataset of 400 images of plates and went through the long process of labeling every individual character in these images. We then trained this using Google Colab for 60 epochs, and got the loss under 6. We were worried that our dataset was too small to get the results we wanted, but we decided that it was big enough to use as a proof of concept before we invested more time into creating a full dataset.
After training was completed, we tested this new model using both images it had already seen and images that were new to it. It fell far short of our already low expectations. While it did successfully classify many of the characters in an image, it also produced many incorrect classifications; on any sample image it usually found more than 10 times the number of characters actually present. We increased the required confidence percentage for an identified character to be considered correct. For our first NN we had this set to 35% and rarely got false positives, but with this NN we set it to 95% and still encountered huge numbers of incorrectly identified characters: about twice as many characters as were actually in the image.
After these tests it was clear that our dataset was simply too small. This made sense, as we had 36 labels to train on: one for every letter and digit. This large number of labels meant that, despite having many photos with many characters in each, the total number of examples of an individual label was quite low. Given the huge number of work hours required to generate even a small dataset, we decided it was unreasonable to create a dataset large enough to train the NN to classify characters with the accuracy we desired in the time we had allotted.
Despite these setbacks we were quite happy with what we learned. Although the new network didn't work, it showed that the approach was possible with our code; we simply didn't have the time required to generate a big enough dataset.
### 4.4 Google Vision
After the issues we ran into creating our own OCR engine, we began looking at other options. We first set up Tesseract and tested it on our extracted license plates. The results did not inspire much confidence in Tesseract, and after a bit more research we found that it was geared more towards scanned document text than license plates.
After this we did a lot of research into different options, and the only one that looked like it would work for us was Google Vision. We tested it on 10 images and, with a 100% success rate, we were very happy and relieved. We went on to test with more images; the only issue encountered was that, in higher quality images, extremely small characters present on the license plate surrounds were also recognized.
With this success we decided to use Google Vision as our OCR engine, and started integrating it into the rest of our system. The one downside of this solution was the monthly limit on the number of images we could scan, so we would have to be mindful not to go over this limit while testing.
********
## 5. Installation Guide
### 5.1 Website
To use the website, simply navigate to the web address [here](https://www.regplatefinder.com/) and select the browse button to choose an image (on mobile you will be given the option to use the camera to take one). Once the image is selected, press the upload button and you will be brought to a results page displaying your license plate number.
### 5.2 Running our Models
On our GitLab repository, we have made available some of the models that we trained during the development of this system. This section is a brief walkthrough of how to install the necessary dependencies and run the models to see their performance. Images to test these models on are not provided. This system has only been tested on Ubuntu 18.04, with Python 3.7 and the library versions specified below.
Ubuntu 18.04 comes with Python 3 preinstalled. Install the pip package manager with the following command:
```
sudo apt install python3-pip
```
After this has been successfully completed, run the following commands to install necessary libraries.
```
pip3 install tensorflow==1.14.0
```
```
pip3 install opencv-python
```
```
pip3 install keras
```
```
pip3 install imageai --upgrade
```
```
pip3 install scikit-learn==0.22.1
```
```
pip3 install Pillow
```
After the libraries are installed, download the models you wish to use and store them in a reasonable place.
Run the following code in order to run the KNN models. The KNN models are a .pkl filetype. The correct file paths will need to be entered in the code below.
```
import numpy as np
import os
import scipy.ndimage
from skimage.io import imread
from skimage.feature import hog
from skimage import data, color, exposure, transform
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.externals import joblib

knn = joblib.load('Path to KNN model you wish to use')

def predict_char(char):
    char = imread(char) #read in image
    char = transform.resize(char, (50,50)) #resize image
    char = hog(color.rgb2gray(char), orientations=8, pixels_per_cell=(10, 10), cells_per_block=(5, 5)) #get features of image
    probabilities = knn.predict_proba(char.reshape(1,-1))[0] #get probabilities of predictions
    return knn.predict(char.reshape(1,-1))[0] #return prediction

for things in sorted(os.listdir("Path to directory for test images")):
    print(predict_char("Path to directory for test images"+things))
```
Run the following code to run a CNN model. The CNN models are a .h5 filetype. In addition to the model, you will also need the correct JSON file, which will be provided along with the model. The correct file paths will need to be entered in the code below.
```
from imageai.Detection.Custom import CustomObjectDetection
from skimage.io import imread
import matplotlib.pyplot as plt
import matplotlib.patches as patches

path = "path to test images"

def find(image, i):
    detector = CustomObjectDetection()
    detector.setModelTypeAsYOLOv3()
    detector.setModelPath("Path to model") #set model
    detector.setJsonPath("Path to Json file") #set json
    detector.loadModel() #load model
    detections = detector.detectObjectsFromImage(input_image="{}{}.jpg".format(image,i), output_image_path="Path to output directory/{}.jpg".format(i), minimum_percentage_probability=35) #get detections over 35% probability
    for detection in detections:
        print(detection["name"], " : ", detection["percentage_probability"], " : ", detection["box_points"]) #print detection label, probability, and location

for i in range(0, "Number of images"): #iterate through files to test
    print(find(path,i), i)
```
********
## 6. References
### 6.1 References
SciPy - https://www.scipy.org/
Pillow - https://python-pillow.org/
Google Vision - https://cloud.google.com/vision
Ngrok - https://ngrok.com/
KNN - https://towardsdatascience.com/scanned-digits-recognition-using-k-nearest-neighbor-k-nn-d1a1528f0dea
ImageAI - https://github.com/OlafenwaMoses/ImageAI
Flask - https://palletsprojects.com/p/flask/
Dataset - "3D Object Representations for Fine-Grained Categorization", Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei. 4th IEEE Workshop on 3D Representation and Recognition (3dRR-13), at ICCV 2013, Sydney, Australia, Dec. 8, 2013.