# AnyoneAI - Project 3

# Project 3: Flask ML API

## Resources

**Videos from AnyoneAI:**

- Docker tutorial by Lisandro ([Link](https://clipchamp.com/watch/bCA81WGefvf)), Luca has the second part
- Docker tutorial by Manu ([Link](https://we.tl/t-DdYPzYqmfA))
- Sprint_3 tutorial by Lisandro 1 ([Link](https://we.tl/t-xgzR0gADWZ))
- Sprint_3 tutorial by Lisandro 2 ([Link](https://we.tl/t-5KhS0lnCSR))
- Sprint_3 tutorial by Lisandro 3 ([Link](https://we.tl/t-Zx1VPc8cRu))
- Manu explains the whole project ([Link](https://we.tl/t-aGWxJxZIk1))

**Other:**

- Docker RAM usage on Windows, in Spanish ([Link](https://jonaser.dev/consumo-de-memoria-ram-en-windows-con-docker/))
- Part 1 — End to End Machine Learning Model Deployment Using Flask ([Medium](https://medium.com/geekculture/part-1-end-to-end-machine-learning-model-deployment-using-flask-1df8920da9c3))
- Part 2 — End to End Machine Learning Model Deployment Using Flask ([Medium](https://medium.com/geekculture/part-2-end-to-end-machine-learning-model-deployment-using-flask-a73c977221ee))
- Part 3 — End to End Machine Learning Model Deployment Using Flask ([Medium](https://medium.com/geekculture/part-3-end-to-end-machine-learning-model-deployment-using-flask-43639a64a9db))
- How To Deploy Machine Learning Model with Docker Container ([Medium](https://audhiaprilliant.medium.com/how-to-deploy-python-flask-application-with-docker-c12089ba3cd1))
- Deploying your deep learning model using Flask and Docker ([Medium](https://medium.com/@limwz.daniel/deploying-your-deep-learning-model-using-flask-and-docker-c05a6d1d96a5))
- The Annotated ResNet-50 ([Medium](https://towardsdatascience.com/the-annotated-resnet-50-a6c536034758))
- How to Train Your ResNet: The Jindo Dog ([Medium](https://medium.com/analytics-vidhya/how-to-train-your-resnet-the-jindo-dog-50551117381d))
- Image Classification API Creation using Tensorflow, Flask, MongoDB ([Medium](https://bhashkarkunal.medium.com/image-classification-api-creation-using-tensorflow-flask-mongodb-61a53835e62d))

## Installation

To run the services using compose:

**In Terminal (WSL):**

- Change directory to `Project_3`
  > `cd /mnt/c/Users/Matías/Documents/GitHub/anyoneai/Sprint_3/Project_3`
- Copy `.env.original` to a new file named `.env`
  > `cp .env.original .env`
- Run all the `build`s and compose every part of our architecture
  > `docker-compose up --build -d`

![](https://hackmd.io/_uploads/rJg-y_642.png)
![](https://hackmd.io/_uploads/SJE4kd6E3.png)
![](https://hackmd.io/_uploads/S1pSkupE3.png)

- To stop the services
  > `docker-compose down`

![](https://hackmd.io/_uploads/S1dF1XaNn.png)
![](https://hackmd.io/_uploads/H1Ki1XTN3.png)

File `docker-compose.yml`:

- First service is `api`
  - Image name: `flask_api`
  - Container name: `ml_api`
  - ...
  - Depends on: `redis` and `model`
- Second service is `redis`
  - Image: `redis:6.2.6`
- Third service is `model`
  - Image name: `ml_service`
  - ...
  - Depends on: `redis`

![](https://hackmd.io/_uploads/SyuhyE6E3.png)

The architecture consists of three services:

**Flask API:** Built from the `flask_api` image, it runs a Flask application and depends on the Redis service and the ML Service. It is accessible on port 80 of the host, which is mapped to port 5000 of the container, and it mounts the `./feedback` and `./uploads` directories as volumes inside the container.

**Redis:** Uses the `redis:6.2.6` image and provides a Redis database. The Flask API service depends on it.

**ML Service:** Built from the `ml_service` image, it also depends on the Redis service and mounts the `./uploads` directory as a volume inside the container.

Note: the `UID` and `GID` variables set the user and group ownership of the containers' processes so that they match the host user's permissions.
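For reference, here is roughly what a `docker-compose.yml` matching the description above could look like. This is a sketch reconstructed from these notes, not the project's actual file: the build contexts, the container-side volume paths, and the `version` key are assumptions.

```yaml
version: "3.2"

services:
  api:
    image: flask_api
    container_name: ml_api
    build:
      context: ./api          # assumed build context
      args:
        UID: ${UID}
        GID: ${GID}
    ports:
      - "80:5000"             # host port 80 -> container port 5000
    depends_on:
      - redis
      - model
    volumes:                  # container-side paths are assumptions
      - ./uploads:/src/uploads
      - ./feedback:/src/feedback

  redis:
    image: redis:6.2.6

  model:
    image: ml_service
    build:
      context: ./model        # assumed build context
      args:
        UID: ${UID}
        GID: ${GID}
    depends_on:
      - redis
    volumes:
      - ./uploads:/src/uploads
```

The `UID`/`GID` build args are the values loaded from the `.env` file copied earlier.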
# Project

## `Model` folder

Inside this module, complete:

1. The `predict()` function in the `model/ml_service.py` file. Then run the tests corresponding to this module and check that they pass.
2. Then, go for the `classify_process()` function, also in the `model/ml_service.py` file.

- File `model/ml_service.py`

### ResNet-50

> **ResNet-50**, with its deep architecture and skip connections, has achieved impressive performance on various computer vision tasks, including image classification, object detection, and image segmentation. It has become a widely adopted and influential model in the field of deep learning.

The `preprocess_input` and `decode_predictions` functions are utilities provided by Keras in the `tensorflow.keras.applications.resnet50` module for use with the ResNet50 model. Here's what they do:

1. `preprocess_input`:
   - Purpose: Prepares an image for input to the ResNet50 model.
   - Usage: It takes a NumPy array representing an image and applies preprocessing operations such as mean subtraction and channel-wise normalization. These operations ensure the image is preprocessed the same way the original ResNet50 model was trained on the ImageNet dataset.
   - Input: A NumPy array representing an image, with shape (height, width, channels), where channels typically represent RGB values.
   - Output: A preprocessed NumPy array ready to be used as input to the ResNet50 model.

2. `decode_predictions`:
   - Purpose: Decodes the predictions made by the ResNet50 model into human-readable class labels.
   - Usage: Given a set of predictions generated by the ResNet50 model, this function maps the predicted class indices to their corresponding class labels.
   - Input: A NumPy array representing the predictions from the ResNet50 model, typically with shape (batch_size, num_classes).
   - Output: A list of tuples, where each tuple contains the ImageNet class ID, the class name, and the probability of the prediction. By default, `decode_predictions` returns the top 5 predictions.

Both functions simplify working with the ResNet50 model: they make it easier to preprocess input images and to interpret the model's predictions.

### Dockerfile

- File `model/Dockerfile`

The Dockerfile consists of multiple stages that build different aspects of the application.

In the first stage (`base`), it starts from the `python:3.8.13` base image. It creates a non-root user, sets the `PYTHONPATH` and `PATH` environment variables, adds the `requirements.txt` file, and installs the dependencies using `pip3`. Then it copies the contents of the current directory (`./`) into the `/src/` directory of the container.

The second stage (`test`) uses the `base` stage as its starting point and runs the `pytest` command to execute the tests located in the `/src/tests` directory.

The third stage (`build`) again starts from the `base` stage. Its `ENTRYPOINT` instruction sets the command executed when the container runs: the `ml_service.py` script with `python3`.

By using multiple stages, you can separate the build process into logical steps and optimize the final image by discarding unnecessary files and dependencies from earlier stages.
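To make the stages easier to picture, here is a minimal sketch of that layout. It is reconstructed from the description above and from the `addgroup --gid $GID app` line visible in a build error later in these notes, so the exact paths and flags of the real `model/Dockerfile` may differ.

```dockerfile
FROM python:3.8.13 AS base

# Create a non-root user whose UID/GID can be matched to the host user
ARG UID
ARG GID
RUN addgroup --gid $GID app && \
    adduser --uid $UID --gid $GID --disabled-password --gecos "" app

# Assumed values: make /src importable and user-installed tools callable
ENV PYTHONPATH=/src
ENV PATH=$PATH:/home/app/.local/bin

ADD requirements.txt .
RUN pip3 install -r requirements.txt

COPY ./ /src/
WORKDIR /src
USER app

# Stage built only when testing: docker build --target test
FROM base AS test
RUN ["pytest", "-v", "/src/tests"]

# Final stage: launch the ML service
FROM base AS build
ENTRYPOINT ["python3", "/src/ml_service.py"]
```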
# Views

![](https://hackmd.io/_uploads/Hk0aQOIH3.png)
![](https://hackmd.io/_uploads/Skd1NdLB2.png)
![](https://hackmd.io/_uploads/SJuiEuLSn.png)

The diagram illustrates the flow of data and control between the different components:

1. The user interacts with the web browser and accesses the frontend.
2. The Flask application handles incoming requests and delegates them to the appropriate routes.
3. The routes module contains several routes (`index`, `display_image`, `predict`, `feedback`) that handle specific URLs and request methods.
4. The utilities module provides various helper functions.
5. The model API communicates with an external model service to obtain predictions.
6. The `middleware` module performs some processing or filtering before passing the request to the Flask application.
7. The Model Service is an external service that handles the machine learning model and provides prediction capabilities.

Please note that the diagram is a high-level representation and doesn't include every detail of the code implementation. It focuses on illustrating the overall structure and flow of data.

### views.predict()

Here's a line-by-line explanation of the code:

```python
@router.route("/predict", methods=["POST"])
def predict():
```

This defines a route `/predict` with the `POST` HTTP method and associates it with the `predict()` function, which handles the logic for this route.

```python
rpse = {"success": False, "prediction": None, "score": None}
```

A dictionary `rpse` is created with initial values indicating failure (`"success": False`) and no prediction or score available.

```python
if "file" in request.files and utils.allowed_file(request.files["file"].filename):
```

This condition checks whether a file named "file" exists in the `request.files` object and whether its extension is allowed, using the `allowed_file()` function from the `utils` module.

```python
file = request.files["file"]
```

The file object is assigned to the variable `file` for further processing.

```python
file_hash = utils.get_file_hash(file)
```

The `get_file_hash()` function from the `utils` module generates a unique hash for the file based on its content. The resulting hash is stored in `file_hash`.

```python
dst_filepath = os.path.join(current_app.config["UPLOAD_FOLDER"], file_hash)
```

The destination file path is created by joining the `UPLOAD_FOLDER` path (obtained from the Flask application's configuration) with `file_hash`. This path represents where the uploaded file will be saved.

```python
if not os.path.exists(dst_filepath):
    file.save(dst_filepath)
```

If the destination file does not already exist, the `file` is saved to `dst_filepath` using the `save()` method.

```python
prediction, score = model_predict(file_hash)
```

The `model_predict()` function is called with `file_hash` as its argument. It is responsible for sending the file to the machine learning model for prediction and returning the predicted class (`prediction`) and the confidence score (`score`).

```python
rpse["success"] = True
rpse["prediction"] = prediction
rpse["score"] = score

return jsonify(rpse)
```

If the file was valid and a prediction was obtained, the values in the `rpse` dictionary are updated accordingly.
`"success"` is set to `True`, and the `prediction` and `score` values are assigned from the results obtained from `model_predict()`. The updated `rpse` dictionary is then converted to a JSON response using `jsonify()` and returned. ```python return jsonify(rpse), 400 ``` If the file was not provided or was not valid, the default `rpse` dictionary with failure values is returned as a JSON response with an HTTP status code of 400 (Bad Request). # Utils The MD5 hashing algorithm is used in the `get_file_hash` function to generate a unique identifier for a file based on its content. Here's why it's used: 1. **Uniqueness**: MD5 hashes are highly unlikely to collide, meaning that different files are very unlikely to produce the same hash. This property allows us to generate a unique filename based on the file content. 2. **Consistency**: The MD5 algorithm always produces the same hash for the same input. This ensures that if the same file is uploaded multiple times, it will always result in the same hash and therefore the same filename. This consistency is important for tracking and managing files. 3. **Security**: MD5 is a cryptographic hash function, but it is considered to be weak for security purposes due to its vulnerability to collision attacks. However, in this context, the primary purpose of using MD5 is not security but rather generating a unique identifier for files. It is not used for security-critical operations like password storage or data integrity checks. In the provided code, the MD5 hash is used to generate a new filename for the uploaded file by appending the file hash with its original extension. This ensures that files with the same content will have the same filename, allowing for efficient storage and retrieval. Additionally, the original filename is not used directly, which helps prevent any potential security risks or conflicts with existing filenames. It's worth noting that if security is a concern in your application, it's recommended to use stronger hashing algorithms like SHA-256 instead of MD5. --- ## 1. Install a VirtualEnv for Local Run **In Terminal WSL:** - In the `'model'` and `'api'` folder: - Install `pipenv` unsing `pip` > `pip3 install pipenv` - Activate VirtualEnv > `python3 -m pipenv shell` - Deactivate > `exit` - Install from `pipfile` or `requirements.txt` > `pipenv install` **Errors:** - AttributeError: module 'collections' has no attribute 'MutableMapping' ([Link](https://stackoverflow.com/questions/70943244/attributeerror-module-collections-has-no-attribute-mutablemapping)) ## 2. Install Redis - Installing Redis ([Link](https://redis.io/docs/getting-started/installation/)) - Install Redis on Linux ([Link](https://redis.io/docs/getting-started/installation/install-redis-on-linux/)) **In Terminal WSL:** - `sudo apt install lsb-release` - `curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg` - `echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list` - `sudo apt-get update` - `sudo apt-get install redis` ## 3. 
## 3. Run Local

**In Terminal WSL:**

- Flask frontend
  > `python3 app.py`

![](https://hackmd.io/_uploads/ryZ_apPH2.png)

**In another Terminal WSL:**

- ML service
  > `python3 ml_service.py`

**In another Terminal WSL:**

- Start Redis
  > `sudo service redis-server start`
- Stop Redis
  > `sudo service redis-server stop`
- `redis-cli`

![](https://hackmd.io/_uploads/HJOtaTvBn.png)

### Code

- Code `ml_service.py` and `middleware.py`
- Connect to the local Redis
  > `db = redis.Redis(host="127.0.0.1", port=6379, db=0)`
- Open `172.21.58.135:5000/` (the address printed by `python3 app.py`)

![](https://hackmd.io/_uploads/B1Rc7RvSn.png)

- After you `Submit`

![](https://hackmd.io/_uploads/S1ju7Awrn.png)

- You can solve it by doing this in the **(model) Terminal WSL:**
  - Delete the `uploads` directory and all its contents, including any subdirectories and files within it, without asking for confirmation.
    > `rm -rf uploads`
  - Create a symbolic link named `uploads` pointing to `../api/static/uploads`, so the model service reads the same files the API saves under `static/uploads`.
    > `ln -s ../api/static/uploads`
  - `ll`

![](https://hackmd.io/_uploads/BJqZLCvSn.png)

- Works! But... I can't display it in the UI.

![](https://hackmd.io/_uploads/rJ6GPCwS2.png)

- **Error solved:** you have to change the Redis wait times.
  - `api/settings.py` and `model/settings.py`

    ```
    API_SLEEP = 5.05
    ```

  - `model/ml_service`

    ```
    _, msg = db.brpop(settings.REDIS_QUEUE, timeout=10.0)
    ```

## 4. Run Docker

- Code `ml_service.py` and `middleware.py`
- Connect to Redis inside Docker

  ```
  db = redis.Redis(
      host=settings.REDIS_IP,
      port=settings.REDIS_PORT,
      db=settings.REDIS_DB_ID
  )
  ```

**In Terminal WSL:**

- Show running containers
  > `docker ps`

![](https://hackmd.io/_uploads/S10gRkOSh.png)

- Kill containers
  > `docker kill ml_api project_3-model-1 project_3-redis-1`
- Show all containers, including stopped ones
  > `docker ps -a`

![](https://hackmd.io/_uploads/ByaLyg_H2.png)

- Remove all stopped containers
  > `docker container prune`
- Docker compose up
  > `docker compose up`
- Show logs
  > `docker container logs ml_api`
- Open a bash shell inside the container
  > `docker exec -it ml_api bash`
- Stop and remove the containers, networks, and volumes created by the Docker Compose project defined in the current directory's compose file. Additionally, it deletes the Docker images associated with the services, including the ones without actively running containers.
  > `docker-compose down --volumes --rmi all`
- Useful when you make changes to your Dockerfile or any dependency and need to rebuild the Docker images. It ensures that your images are up to date and reflect the latest changes in your project. It's typically used before `docker-compose up` so the containers run the latest images for your services.
  > `docker-compose build`
- Useful when you want to start all the containers defined in your compose file. It provides an easy way to bring up a multi-container application and ensures that the services are properly connected and working together.
  > `docker-compose up`
- **Use this**
  > `docker-compose up --build`

## 5. Test Model

- Test the `model`

  ```
  $ cd model/
  $ docker build -t model_test --progress=plain --target test .
  ```

This command builds a Docker image tagged `model_test` from the Dockerfile in the current directory. The `--progress=plain` flag displays plain-text progress output during the build. The `--target test` flag tells Docker to build only up to the stage named `test` (and the stages it depends on), skipping any later stages in the Dockerfile.
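For context, the `test` stage simply runs whatever test suite lives under `model/tests`. A minimal sketch of such a test, assuming the `predict()` contract described later in these notes (an image filename in, a `(class_name, pred_probability)` tuple out); the filename and the exact assertions are illustrative, not the project's actual suite:

```python
# tests/test_model.py (sketch)
import ml_service


def test_predict_returns_class_and_score():
    # "dog.jpeg" is a placeholder: any image already present in the
    # uploads/ folder would do
    class_name, pred_probability = ml_service.predict("dog.jpeg")

    assert isinstance(class_name, str)
    assert 0.0 <= pred_probability <= 1.0
```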
**Error 1°:**

- When I run
  > `docker build -t model_test --progress=plain --target test .`

  ```
  docker build -t test_model --progress=plain --target test .
  Sending build context to Docker daemon  48.13kB
  Step 1/15 : FROM python:3.8.13 as base
   ---> a08150c12a68
  Step 2/15 : ARG UID
   ---> Using cache
   ---> 680c73051129
  Step 3/15 : ARG GID
   ---> Using cache
   ---> 2bdeb9cd18c3
  Step 4/15 : RUN addgroup --gid $GID app
   ---> Running in 2879e0735ae3
  Value "app" invalid for option gid (number expected)
  ```

**Solution:**

- The `UID` and `GID` build args were empty, so pass them explicitly:
  > `docker build -t test_model --progress=plain --build-arg UID=501 --build-arg GID=570 --target test .`

**Error 2°:**

- The `uploads/` directory doesn't exist, so you have to create it in the `model` folder.
- I couldn't find a way to create it automatically if it doesn't exist. (Using `os.makedirs(UPLOAD_FOLDER, exist_ok=True)` in `settings.py` would handle both this error and the next one.)

![](https://hackmd.io/_uploads/BkGa4ybS3.png)

**Error 3°:**

```
#14 2.977 ==================================== ERRORS ====================================
#14 2.977 _____________________ ERROR collecting tests/test_model.py _____________________
#14 2.977 tests/test_model.py:3: in <module>
#14 2.977     import ml_service
#14 2.977 ml_service.py:7: in <module>
#14 2.977     import settings
#14 2.977 settings.py:9: in <module>
#14 2.977     os.makedirs(UPLOAD_FOLDER)
#14 2.977 /usr/local/lib/python3.8/os.py:223: in makedirs
#14 2.977     mkdir(name, mode)
#14 2.977 E   FileExistsError: [Errno 17] File exists: 'uploads/'
```

- The `uploads/` directory already exists, so you have to delete it from the `model` folder.

## 6. Test API

```
cd api/
docker build -t flask_api_test --progress=plain --build-arg UID=501 --build-arg GID=570 --target test .
```

![](https://hackmd.io/_uploads/S1BKOrYB3.png)

## 7. Integration end-to-end

**Terminal WSL:**

- Bring the Docker services up first.
- Go to the project root
  > `cd Project_3`
- Install the test requirements
  > `pip3 install -r tests/requirements.txt`
- Run
  > `python3 tests/test_integration.py`

![](https://hackmd.io/_uploads/SJaWBJorh.png)

## 8. Stress testing with *Locust*

For this task, you must complete the file `locustfile.py` in the `stress_test` folder. Make sure to create at least one test for:

- the `index` endpoint.
- the `predict` endpoint.

(A sketch of such a file is shown at the end of this section.)

### Test scaled services

You can easily launch more instances of a particular service with `--scale SERVICE=NUM` when running the `docker-compose up` command (see [here](https://docs.docker.com/compose/reference/up/)). Scale the `model` service to 2 or more instances and check the performance with Locust. Write a short report detailing the hardware specs of the server used to run the service and compare the results obtained for different numbers of simulated users and deployed instances.

- ~~Run `docker-compose up --build` first~~
- Create the file
  > `stress_test/requirements.txt`
- Change directory
  > `cd stress_test`
- Install it
  > `pip3 install -r requirements.txt`

**Terminal WSL:**

- Open `Docker Desktop`
- Run two `model` containers at once
  > `docker compose up --scale model=2`

**In another Terminal WSL:**

- Run
  > `locust`
- Change to `http://localhost:8089/` if you can't open `http://0.0.0.0:8089/`

![](https://hackmd.io/_uploads/Hy2NuPYr3.png)
![](https://hackmd.io/_uploads/B1qWZpf8h.png)

- Charts/Reports

![](https://hackmd.io/_uploads/rk5LzTGIn.png)
![](https://hackmd.io/_uploads/HyY9oxlPn.png)
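As mentioned above, here is a minimal `locustfile.py` sketch covering both required endpoints. The paths and the `"file"` multipart field follow the views walkthrough earlier in these notes; `"dog.jpeg"` is a placeholder image you would place next to the locustfile.

```python
from locust import HttpUser, between, task


class APIUser(HttpUser):
    # Wait 1-5 seconds between tasks, roughly like a human user
    wait_time = between(1, 5)

    @task
    def index(self):
        self.client.get("/")

    @task
    def predict(self):
        # The "file" field name is what views.predict() reads
        # from request.files
        with open("dog.jpeg", "rb") as f:
            self.client.post("/predict", files={"file": f})
```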
## 9. Optional Batch

- Optional `ml_service`

Here's an explanation of the `predict_batch(image_names)` function, with comments added to the code:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import decode_predictions, preprocess_input
from tensorflow.keras.preprocessing import image

import settings


def predict_batch(image_names):
    """
    Load the images corresponding to the image names received, then run
    our ML model to get predictions for the whole batch.

    Parameters
    ----------
    image_names : list of str
        Image filenames.

    Returns
    -------
    outputs : list of dict
        One dict per image, with the predicted class name ("class_name")
        and the corresponding confidence score ("pred_prob").
    """
    print("Launching ML service BATCH...")

    x_batches = []  # List to store the preprocessed images
    for image_name in image_names:
        # Get the image from the UPLOAD_FOLDER
        path_image = settings.UPLOAD_FOLDER + "/" + image_name
        img = image.load_img(path_image, target_size=(224, 224))

        # Convert the PIL image to a NumPy array
        x = image.img_to_array(img)

        # Append the preprocessed image to the batch list
        x_batches.append(x)

    # Stack the list into a single (batch_size, 224, 224, 3) array
    nx_batches = np.stack(x_batches, axis=0)
    print(f"batch_image: {nx_batches.shape}")

    # Scale pixel values of the batch
    x_batches = preprocess_input(nx_batches)

    # Make predictions for the whole batch in a single call
    # (`model` is the ResNet50 instance loaded at module level in ml_service.py)
    preds = model.predict(x_batches)
    print("number_of_predictions: ", len(preds))

    outputs = []
    for pred in preds:
        class_pred = {}

        # decode_predictions expects a batch dimension
        batch_pred = np.expand_dims(np.array(pred), axis=0)

        # Decode the prediction and extract class name and probability
        res_model = decode_predictions(batch_pred, top=1)[0]
        class_pred["class_name"] = res_model[0][1]
        # Cast to a native float so the value can be JSON-serialized later
        class_pred["pred_prob"] = round(float(res_model[0][2]), 4)
        outputs.append(class_pred)

    return outputs
```

This function takes a list of image names as input and performs the following steps for each image:

1. Load the image from the corresponding folder based on its name.
2. Resize it to the target size and convert it to a NumPy array.
3. Add the preprocessed image to a batch list.
4. Convert the batch list to a NumPy array.
5. Scale the pixel values of the batch.
6. Use the ML model to make predictions on the whole batch of images.
7. Iterate over the predictions and extract the predicted class name and probability for each image.
8. Store the class name and probability in a dictionary and append it to the `outputs` list.
9. Return the `outputs` list containing the predictions for all images in the batch.

The function also includes print statements for debugging, showing the shape of the image batch and the number of predictions made.

- If you want to run this, switch the service to `classify_process_batch()` (a sketch of it follows at the end of these notes).

![](https://hackmd.io/_uploads/HJsvmCzU2.png)

---

- Install the Docker extension to see containers in VS Code.
- Inspecting the queue with `redis-cli`:
  - `redis-cli`
  - `LPUSH list_demo "img_1"`
  - `LRANGE list_demo -5 -1`

![](https://hackmd.io/_uploads/ry7Z2vpBn.png)
![](https://hackmd.io/_uploads/H1H50_pr3.png)
![](https://hackmd.io/_uploads/H18WkKaSn.png)
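Going back to `classify_process_batch()` from section 9: the notes above only show it in a screenshot, so here is a rough sketch of what such a consumer loop could look like. Everything specific here is an assumption made for illustration: the job schema (`{"id": ..., "image_name": ...}`), the batch size of 10, and the `settings.SERVER_SLEEP` pause. It reuses the `predict_batch()` shown above and the `db` Redis connection from `ml_service.py`.

```python
import json
import time


def classify_process_batch():
    """Sketch: consume queued jobs in batches and store their results."""
    while True:
        # Pop up to 10 pending jobs (the batch size is an arbitrary choice)
        jobs = []
        for _ in range(10):
            msg = db.lpop(settings.REDIS_QUEUE)
            if msg is None:
                break
            # Assumed job schema: {"id": ..., "image_name": ...}
            jobs.append(json.loads(msg))

        if jobs:
            # Run the whole batch through the model in a single call
            outputs = predict_batch([job["image_name"] for job in jobs])

            # Store each result under its job id so the API can fetch it
            for job, out in zip(jobs, outputs):
                db.set(
                    job["id"],
                    json.dumps(
                        {"prediction": out["class_name"], "score": out["pred_prob"]}
                    ),
                )

        # Sleep briefly so we don't hammer Redis when the queue is empty
        time.sleep(settings.SERVER_SLEEP)
```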