# Investigate how to improve EKYC API performance
## Problem:
- How to maximize GPU utilization?
+ Running a single model per GPU may be inefficient.
+ Running multiple models on a single GPU will not automatically run them concurrently to maximize GPU utilization.
- Enabling Real-Time and Batch Inference:
+ There are two types of inference. If the application must respond to the user in real time, inference must also complete in real time; because latency matters, such a request cannot sit in a queue and be batched with other requests. If there is no real-time requirement, requests can be batched together to increase GPU utilization and throughput (a minimal micro-batching sketch follows below).
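
Between these two extremes sits dynamic (micro-)batching: hold each incoming request for at most a few milliseconds so it can share one GPU call with its neighbours. The sketch below is purely illustrative and not part of the EKYC codebase; `run_model`, the batch size of 16, and the 5 ms delay are assumed values.

```python
import queue
import threading
import time

MAX_BATCH_SIZE = 16      # largest batch handed to the GPU (assumed value)
MAX_QUEUE_DELAY = 0.005  # wait at most 5 ms for more requests (assumed value)

requests = queue.Queue() # (payload, reply_queue) tuples pushed by the API layer

def run_model(batch):
    # Placeholder for the real forward pass over a list of images.
    return [f"result-for-{item}" for item in batch]

def batching_worker():
    while True:
        payload, reply = requests.get()          # block until one request arrives
        batch, replies = [payload], [reply]
        deadline = time.monotonic() + MAX_QUEUE_DELAY
        # Collect more requests until the batch is full or the delay expires.
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                payload, reply = requests.get(timeout=remaining)
            except queue.Empty:
                break
            batch.append(payload)
            replies.append(reply)
        # One GPU call serves the whole batch; each caller gets its own result back.
        for reply, result in zip(replies, run_model(batch)):
            reply.put(result)

threading.Thread(target=batching_worker, daemon=True).start()
```

Each API handler would put its payload together with a private `queue.Queue()` onto `requests` and block on that private queue for the answer; the extra latency any caller pays is bounded by `MAX_QUEUE_DELAY`. TensorRT Inference Server (Solution 1) provides this behaviour server-side via dynamic batching.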
## Solution:
1. NVIDIA TensorRT Inference Server
+ [Video demo](https://www.youtube.com/watch?v=1DUqD3zMwB4&feature=youtu.be)
+ Summary:
* Without TensorRT Inference Server (benchmark charts are shown in the video demo): performance with 1,200 image-classification requests, then with an increasing number of requests.
=> Bottleneck: 5,000 images/s
* Deployed on TensorRT Inference Server
==> Bottleneck increases to 15,000 images/s
+ How to apply TensorRT Inference Server (a model-repository sketch follows below):
* Deploying and Scaling AI Applications with the NVIDIA TensorRT Inference Server on Kubernetes: [video](https://www.youtube.com/watch?v=SekmR9YH4xQ)
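
The server addresses both problems from the section above through its per-model configuration: `instance_group` runs several copies of a model on one GPU, and `dynamic_batching` batches compatible requests on the server side. Below is a sketch of one model-repository entry; the model name `ekyc_face`, the batch sizes, and the queue delay are assumptions for illustration, not values taken from the videos.

```
models/
└── ekyc_face/
    ├── 1/
    │   └── model.plan                  # TensorRT engine for version 1
    └── config.pbtxt

# config.pbtxt (illustrative values)
name: "ekyc_face"
platform: "tensorrt_plan"
max_batch_size: 32
instance_group [
  { count: 2, kind: KIND_GPU }          # two model instances share one GPU
]
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 5000    # wait up to 5 ms to form a batch
}
```

The server loads every model in the repository and serves them over HTTP/gRPC; the Kubernetes video then covers deploying and scaling those server instances.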
2. Leverage Redis to batch requests efficiently (a sketch of the pattern follows the links below)
+ [Video demo of the solution](https://youtu.be/1uoHYcMZ7nc)
+ GitHub example project: https://github.com/shanesoh/deploy-ml-fastapi-redis-docker
+ GitHub example 2: https://github.com/stix121/keras-rest-api
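
Both example repositories follow the same pattern, sketched below with assumed names (`image_queue`, `BATCH_SIZE`, a Keras-style `model.predict`): the web endpoint pushes each request onto a shared Redis list and polls for its result, while a separate model worker pops requests in batches and runs a single prediction per batch.

```python
import json
import time
import uuid

import numpy as np
import redis

db = redis.Redis(host="localhost", port=6379)

IMAGE_QUEUE = "image_queue"   # assumed key name, batch size and poll interval
BATCH_SIZE = 32
POLL_INTERVAL = 0.05

def enqueue_request(image: np.ndarray) -> str:
    """API side: push one request onto the shared queue and return its id."""
    request_id = str(uuid.uuid4())
    db.rpush(IMAGE_QUEUE, json.dumps({"id": request_id, "image": image.tolist()}))
    return request_id

def fetch_result(request_id: str):
    """API side: poll until the worker has stored a prediction for this id."""
    while True:
        raw = db.get(request_id)
        if raw is not None:
            db.delete(request_id)
            return json.loads(raw)
        time.sleep(POLL_INTERVAL)

def model_worker(model):
    """Worker side: pop up to BATCH_SIZE requests, predict once, store results."""
    while True:
        raw_items = db.lrange(IMAGE_QUEUE, 0, BATCH_SIZE - 1)
        if not raw_items:
            time.sleep(POLL_INTERVAL)
            continue
        db.ltrim(IMAGE_QUEUE, len(raw_items), -1)    # drop the items just taken
        payloads = [json.loads(item) for item in raw_items]
        batch = np.array([p["image"] for p in payloads], dtype="float32")
        predictions = model.predict(batch)            # one GPU call for the batch
        for payload, pred in zip(payloads, predictions):
            db.set(payload["id"], json.dumps(np.asarray(pred).tolist()))
```

In the linked examples the images are typically base64-encoded rather than serialized as JSON lists, and the worker runs in its own container so the web tier (FastAPI/Flask) and the GPU worker can be scaled independently.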
# References
1. [Easily Deploy Deep Learning Models in Production](https://medium.com/dataseries/easily-deploy-deep-learning-models-in-production-13db48071578)
2. [Deploy Machine Learning Models with Keras, FastAPI, Redis and Docker](https://medium.com/analytics-vidhya/deploy-machine-learning-models-with-keras-fastapi-redis-and-docker-4940df614ece)