
# Credit Risk Analysis: a service capable of predicting the credit scores of people based on financial transactional information
## Authors
- [Pedro Del Bianco](https://github.com/pedroodb) (Tutor)
- [Oscar Amarilla](https://github.com/Oscar-Amarilla) (Collaborator)
- [Yessid Rios](https://github.com/YessidR) (Collaborator)
- [Fabian Moreno](https://github.com/FabianmorenoDS) (Collaborator)
- [Matias Gomez](https://github.com/tutecode) (Collaborator)
- [Christian Suca](https://github.com/christian6000) (Collaborator)
## Table of Contents
- [Executive Summary](#Executive-Summary)
- [Overview](#Overview)
- [Dataset Access and Structure](#Dataset-Access-and-Structure)
- [Run Notebooks locally](#Run-Notebooks-locally)
- [Run FastAPI App with Docker](#Run-FastAPI-App-with-Docker)
## Executive Summary
**Industries:** *fintech*, *banks*, *logistics*, *delivery apps*, *insurtech*, *many others*
**Technologies and Tools:** *Supervised Learning*, *Deep Learning*, *HTTP APIs (FastAPI)*, *Scikit-learn*, *Pandas*, *Numpy*, *TensorFlow*, *Docker*, *Redis*, *HTML*, *CSS*.
**Difficulty level:** ⭐⭐⭐ (3/5)
The fintech ecosystem has experienced rapid growth in recent years and established itself as a key actor in meeting the demands and needs of financial consumers. Growth was fueled by increasing demand for financial services not provided by the traditional financial sector, and by increased demand for digital financial services during the COVID-19 pandemic. In the US, companies that have taken the lead in this industry include [Stripe](https://stripe.com/docs/radar/risk-evaluation), [Affirm](https://www.affirm.com/business/blog/alternative-underwriting), [Brex](https://www.brex.com/), and [Marqeta](https://www.marqeta.com/platform/riskcontrol), among others. In the Latam region, Mercado Libre's payments unit [Mercado Pago](https://www.mercadopago.com.ar/), companies like [Nubank](https://nubank.com.br/en/), [Creditas](https://www.creditas.com/), [d-local](https://dlocal.com/), [Clip](https://clip.mx/), [Ualá](https://www.uala.com.ar/), [Clara](https://www.clara.com/), and more recent ones like [Pomelo](https://pomelo.la/en/) are growing fast and in constant need of data analysis and modeling for problems like credit risk analysis, fraud detection, customer churn prediction, or behavioral models to predict untimely payments.
Credit risk modeling is one of the most common applications of machine learning within this industry; its goal is to use financial data to predict default risk. When a business or individual applies for a loan, the lender must evaluate whether the applicant can reliably repay the loan principal and interest. The machine learning model learns from data (such as firm information, financial statements, previous transactions, and previous credit history) and can accurately predict the repayment probability for a given loan applicant.
This type of solution has very wide applicability not only across fintech but also across many other business sectors and industries (logistics, banks, delivery apps, freight cargo companies, insurtech, etc.) and could be easily adapted to other "risk estimation" business challenges.
## Overview
In a nutshell, this project will result in an API service backed by a machine learning model (or several models) that accepts a financial profile and predicts a credit risk score for that profile. The individual components for this project are very similar to what you have implemented in the last three projects. You can reuse as much code as you want to leave yourself more time for working on the core dataset and model training.
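As a rough sketch of that shape (not the required implementation), a minimal FastAPI endpoint could look like the following; the model path and the three profile fields are hypothetical placeholders, since the real dataset has 54 variables:
```python
# Minimal FastAPI sketch; model path and profile fields are hypothetical placeholders
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.pkl")  # placeholder path to your trained credit model

class FinancialProfile(BaseModel):
    # Hypothetical features; the real dataset has 54 variables per record
    monthly_income: float
    age: int
    months_in_residence: int

@app.post("/prediction")
def predict(profile: FinancialProfile):
    features = [[profile.monthly_income, profile.age, profile.months_in_residence]]
    score = model.predict_proba(features)[0][1]  # estimated probability of default
    return {"credit_risk_score": float(score)}
```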
## Deliverables
**Goal:** The main objective of this project is to build a service capable of predicting the credit scores of people based on financial transactional information. To properly test how this model would behave in a real environment, we will also ask you to propose and run a simulation of your model: feed it each profile in the test dataset, ask whether it would grant credit to that person, and then evaluate whether a bank or fintech using that model would end up making or losing money.
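One way to run that simulation, as a rough sketch: assume a fixed profit per repaid loan and a fixed loss per default (both figures below are made up for illustration), grant credit whenever the model predicts a good payer, and total the outcome over the test set:
```python
# Hypothetical profit/loss simulation; gain and loss figures are illustrative only
import numpy as np

def simulate_profit(y_true, y_pred, gain_per_good_loan=100.0, loss_per_default=500.0):
    """y_true: 1 = client defaulted, 0 = repaid. y_pred: 1 = model predicts default."""
    granted = (y_pred == 0)  # lend only when the model predicts no default
    profit = np.sum(granted & (y_true == 0)) * gain_per_good_loan
    losses = np.sum(granted & (y_true == 1)) * loss_per_default
    return profit - losses

# Example: would the bank make or lose money with this model?
y_true = np.array([0, 0, 1, 0, 1])
y_pred = np.array([0, 0, 1, 0, 0])
print(simulate_profit(y_true, y_pred))  # 3 * 100.0 - 1 * 500.0 = -200.0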
In order to graduate from the ML Developer Career, you have to approve the Main Deliverables. You are also welcome to complete the Optional Deliverables if you want to continue to add experience and build your portfolio, although those are not mandatory.
**Main Deliverables:**
1. Exploratory Dataset Analysis (EDA) Jupyter notebooks and dataset
2. Scripts used for data pre-processing and data preparation
3. Training scripts and trained models. Description of how to reproduce results
4. The model trained for a credit score prediction
5. A simulation of your model making predictions on a testing dataset, documentation about the results and the simulation process
6. API with a basic UI interface for demo (upload user's transactional data and return a score prediction)
7. Everything must be Dockerized and ready to be deployed
**Additional Optional Deliverables:**
1. Service should be properly secured with token-based authentication
2. Ability to retrain the model "online" with new data added by users
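For optional deliverable 1, a minimal token check in FastAPI might look like this sketch (the token value and response are placeholders; a production service would use something like OAuth2/JWT with a secret store):
```python
# Hypothetical token-based auth sketch for FastAPI
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
API_TOKEN = "change-me"  # placeholder; load from an env variable or secret store

def verify_token(authorization: str = Header(...)):
    # Expect an "Authorization: Bearer <token>" header on every request
    if authorization != f"Bearer {API_TOKEN}":
        raise HTTPException(status_code=401, detail="Invalid or missing token")

@app.get("/prediction", dependencies=[Depends(verify_token)])
def predict():
    return {"credit_risk_score": 0.42}  # placeholder response
```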
## Dataset Access and Structure
The dataset to build this model (or models) can be found in S3. To access it you will have to use the boto3 library; below are the key and secret key that you can use for read-only access.
- **Dataset:** ...
- **Key:** ...
- **Secret:** ...
**Note:**
- The dataset has 54 variables per record and 50,000 rows of training data.
- Given its size, the dataset can be downloaded and experimented with on your local development system.
- Confidentiality notice: please don't upload it to any other services or public drives, as this is a private dataset.
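Since access goes through **boto3**, here is a minimal download sketch (bucket and object names are placeholders; plug in the key and secret provided above):
```python
# Minimal boto3 download sketch; bucket and object names are placeholders
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="<KEY>",         # read-only key provided above
    aws_secret_access_key="<SECRET>",  # read-only secret provided above
)
s3.download_file(
    "anyoneai-datasets",                             # bucket name (placeholder)
    "credit-data-2010/PAKDD2010_Modeling_Data.txt",  # object key (placeholder)
    "./PAKDD2010_Modeling_Data.txt",                 # local destination
)
```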
### Guide to Download dataset
1. To access the dataset you will have to use the **boto3** library with the key and secret provided above for read-only access.
2. Install **AWS CLI** using this [guide](https://docs.aws.amazon.com/es_es/cli/v1/userguide/cli-chap-install.html).
3. Then configure AWS with the key and secret using the command `aws configure`.
4. You'll be prompted to enter your access key, secret access key, default region, and output format; region and output format can be left as None.
5. Finally, use the AWS copy command to copy to a local directory: `aws s3 cp <s3_url> <local_destination>`
**For example using this code:**
```bash
aws s3 cp --recursive s3://mys3url/ ./local-folder/
```
You will have this structure in your folder:
- LeaderBoard_Data.zip
- Leaderboard_Submission_Example.zip
- PAKDD2010_Leaderboard_Submission_Example.txt
- PAKDD2010_Modeling_Data.txt
- PAKDD2010_Prediction_Data.txt
- 'PAKDD-2010 training data.zip'
- PAKDD2010_VariablesList.XLS
- Prediction_Data.zip
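A hedged sketch of loading the modeling data with pandas — assuming the `.txt` file is tab-separated with no header row (check `PAKDD2010_VariablesList.XLS` for the real column names and separator):
```python
# Assumes tab-separated values, no header, non-UTF-8 encoding; verify before relying on it
import pandas as pd

df = pd.read_csv(
    "PAKDD2010_Modeling_Data.txt",
    sep="\t",
    header=None,
    encoding="latin-1",  # assumption: PAKDD files often use a non-UTF-8 encoding
    low_memory=False,
)
print(df.shape)  # expected: 50,000 rows x 54 columns
```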
**References:**
You can analyze the state of the art and do your own research on the subject. Below you will find a couple of good references:
- [A comparative study on machine learning models combining with outlier detection and balanced sampling methods for credit scoring - Hongyi Qian, Shen Zhang, Baohui Wang, Lei Peng, Songfeng Gao, You Song](https://arxiv.org/abs/2112.13196)
- [Sequential Deep Learning for Credit Risk Monitoring with Tabular Financial Data - Jillian M. Clements, Di Xu, Nooshin Yousefi, Dmitry Efimov](https://arxiv.org/abs/2012.15330)
## Run Notebooks locally
- First, run the notebooks in the `src/notebooks` folder:
- Import dataset / Exploratory Data Analysis: `eda.ipynb`
- Encoding and Model: `model.ipynb`
> In the **helper_functions** folder you can find all the helper functions used by the notebooks
### Install a VirtualEnv
**Terminal PowerShell/WSL:**
- If first time:
- on **Linux** > `pip3 install virtualenv`
- on **Windows** > `pip install virtualenv`
- Make `VirtualEnv`:
- on **Linux/Mac** > `python3 -m virtualenv env`
- on **Windows** > `python -m virtualenv env`
- Activate the `env` virtual environment
- on **Linux/Mac** > `source env/bin/activate`
- on **Windows** > `./env/Scripts/activate`
- Install libraries from **requirements.txt** >
- on **Linux/Mac** >`pip3 install -r ./requirements.txt`
- on **Windows** > `pip install -r ./requirements.txt`
## Run FastAPI App with Docker
- First, make sure `Docker` is running
**Terminal PowerShell:**
0. Change directory to `/credit-risk-analysis` project > `cd credit-risk-analysis`
1. Build services from `docker-compose.yml` > `docker-compose build`
2. Up services from `docker-compose.yml` > `docker-compose up`
**Endpoints:**
1. Open the `Home` page > `http://localhost:8000`
2. Open the `index` page > `http://localhost:8000/index`
3. Open the `prediction` page > `http://localhost:8000/prediction`
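Once the services are up, a quick smoke test from another terminal (the exact response depends on your implementation):
```
# Should return the Home page HTML if the service is running
curl http://localhost:8000
```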
# Team name / company name ideas
- Stonkskable
# Project ideas
- Part 1 — End to End Machine Learning Model Deployment Using Flask ([Medium](https://medium.com/geekculture/part-1-end-to-end-machine-learning-model-deployment-using-flask-1df8920da9c3))
- Part 2 — End to End Machine Learning Model Deployment Using Flask ([Medium](https://medium.com/geekculture/part-2-end-to-end-machine-learning-model-deployment-using-flask-a73c977221ee))
- Part 3 — End to End Machine Learning Model Deployment Using Flask ([Medium](https://medium.com/geekculture/part-3-end-to-end-machine-learning-model-deployment-using-flask-43639a64a9db))
# Presentation ideas
- Anyone AI [presentation](https://docs.google.com/presentation/d/1-Qw1GZgRSdzvAyKzAzCitjIoRC_D2Tzn/edit?rtpof=true&sd=true)
- Introduction (first slide)
- Agenda (second slide)
Start with a presentation about the team (us), agenda, introduction, and a project overview (description of the project).
- 1st slide -> project title, names, photos?
- Ideas for the project description: show the situation with banks and loans (using screenshots), then present our solution.
- Create a story about the problem and explain why the software is necessary.
- CREATIVITY IS IMPORTANT...
- 25-30 slides (standard for any project).
- Requirements: what we do/did (be technical, not boring).
- Whom are we directing the project to / who is the audience?


Agenda Presentation:
- First slide: start the presentation with the team. Can you make a list of creative names? We are Oscar, Fabian, Matias, Christian, and Yessid. Can you describe them as different superheroes of Machine Learning and Data Science?
Certainly! Here's a list of creative superhero names for your team members, along with a brief description of their unique abilities in the realm of Machine Learning and Data Science:
Oscar, the Data Dynamo:
Possessing an incredible knack for data manipulation and analysis, Oscar wields his extraordinary powers of feature engineering and data preprocessing. With a single stroke of his keyboard, he transforms raw data into valuable insights, uncovering hidden patterns and trends.
Fabian, the Algorithm Alchemist:
Fabian possesses an uncanny ability to concoct powerful algorithms that can conquer complex problems. With his magical coding skills, he brews up innovative solutions, weaving together decision trees and random forests to unlock the secrets hidden within the data.
Matias, the Neural Knight:
Matias is a master of deep learning and neural networks. His proficiency in constructing and training intricate architectures empowers him to unravel the deepest layers of data. With his neural sword, he slices through complexity, enabling accurate predictions and classification.
Christian, the Visualization Virtuoso:
Armed with his visualization superpowers, Christian has the ability to transform complex data into captivating visual stories. His enchanting charts and interactive dashboards help stakeholders understand complex concepts at a glance, making data insights accessible to all.
Yessid, the Model Magician:
Yessid possesses an extraordinary talent for modeling and optimization. His magical touch fine-tunes machine learning models, achieving exceptional performance and accuracy. With a wave of his wand, he conjures up ensemble methods and hyperparameter tuning to create truly powerful predictions.
These superheroes of Machine Learning and Data Science come together as an unstoppable team, combining their unique abilities to tackle the challenges of credit risk analysis. They are a force to be reckoned with, dedicated to harnessing the power of data and machine learning to transform the financial landscape.
Feel free to use these creative superhero names and descriptions to introduce your team members and add an engaging touch to your presentation.
0. Team introduction: creative superhero names for Oscar, Fabian, Matias, Christian, and Yessid, describing them as different superheroes of Machine Learning and Data Science.
1. Project Description: Credit Risk Analysis.
2. Scope and Main Deliverables.
3. Challenges.
4. Exploratory Data Analysis
5. Models
6. Hyperparameters tuning
7. Metrics
8. The architecture of the services (API, ML service, Redis)
9. Live Demo.
10. Conclusion and next steps
# Data collection
## Install AWS CLI
If you need to connect to AWS using the AWS CLI, you can configure it with your AWS credentials to authenticate and access AWS resources. Here's how you can set up AWS CLI and use it to interact with S3:
1. **Install AWS CLI**: If you haven't installed AWS CLI yet, follow the official documentation specific to your operating system to install it. You can find installation instructions here: [Installing the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
2. **Configure AWS CLI**: After installing AWS CLI, you'll need to configure it with your AWS credentials. Open a command prompt or terminal and run the following command:
```
aws configure
```
It will prompt you to enter your AWS Access Key ID, AWS Secret Access Key, default region name, and default output format. Enter the corresponding values based on your AWS account and preferences.
3. **Download files from S3**: Once AWS CLI is configured, you can use the `aws s3 cp` command to download files from S3. Here's an example:
```
aws s3 cp s3://anyoneai-datasets/credit-data-2010/credit-data.csv ./credit-data.csv
```
Replace `s3://anyoneai-datasets/credit-data-2010/credit-data.csv` with the S3 object's location and `./credit-data.csv` with the destination file path on your local system.
This command will download the file from the specified S3 bucket and store it locally.
Make sure that you have the necessary permissions to access the S3 bucket and read the desired objects.
By using the AWS CLI, you don't need to write code to interact with S3 directly. The AWS CLI provides a convenient command-line interface for working with AWS services.
# Structure
Credit risk analysis is an important task in the financial industry, and machine learning techniques can be valuable in automating and enhancing this process.
To get started, let's go through the typical steps involved in a credit risk analysis project:
1. **Data Collection**: Gather the necessary data for your analysis. This may include historical credit data, financial statements, customer information, and economic indicators. Ensure that the data is reliable and relevant to your analysis.
2. **Data Preprocessing**: Clean and preprocess the data to handle missing values, outliers, and inconsistencies. This step may involve data cleaning, data transformation, feature scaling, and encoding categorical variables.
3. **Feature Selection/Engineering**: Select relevant features that can potentially impact credit risk. You can also create new features based on domain knowledge or feature engineering techniques to improve the predictive power of your models.
4. **Model Selection**: Choose appropriate machine learning algorithms for credit risk analysis. Some popular models for this task include logistic regression, decision trees, random forests, gradient boosting methods (such as XGBoost or LightGBM), and neural networks.
5. **Model Training**: Split your data into training and validation sets. Train your chosen models on the training data and evaluate their performance using appropriate evaluation metrics (such as accuracy, precision, recall, F1-score, or area under the ROC curve).
6. **Model Evaluation**: Compare the performance of different models and select the best-performing one(s) based on your evaluation metrics. Consider factors like interpretability, computational complexity, and business requirements when choosing the final model.
7. **Model Deployment**: Once you have selected a model, deploy it in a production environment. Ensure that the model is well-integrated with existing systems and monitor its performance over time.
8. **Model Interpretation**: It's crucial to understand and interpret the predictions made by your model. This will help you explain the results to stakeholders and comply with regulatory requirements, such as explainability of AI models in certain industries.
9. **Model Maintenance and Monitoring**: Continuously monitor the performance of your deployed model and update it as needed. Monitor for concept drift, changes in data distribution, or changes in the risk landscape.
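As a minimal sketch of steps 5 and 6 above — assuming a pandas DataFrame `df` with numeric features and a binary `default` target column (both hypothetical names):
```python
# Hypothetical end-to-end sketch: split, train, evaluate with ROC AUC
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X = df.drop(columns=["default"])  # "default" is a placeholder target name
y = df["default"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
val_scores = model.predict_proba(X_val)[:, 1]
print("Validation ROC AUC:", roc_auc_score(y_val, val_scores))
```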
Remember, the success of your credit risk analysis project depends on the quality of your data, the choice of appropriate models, and continuous evaluation and improvement of your models. It's also essential to adhere to regulatory guidelines and industry best practices throughout the process.
# Part 2 — End to End Machine Learning Model Deployment Using Flask
## How to build a user interface for loan approval prediction app and set up the backend using flask
After developing a robust machine learning model using XGBoost in the [previous part](https://medium.com/geekculture/part-1-end-to-end-machine-learning-model-deployment-using-flask-1df8920da9c3), in this article we will focus our attention on the development of the user interface and the backend logic. Thus, we will discuss HTML, CSS, and the Python logic using Flask. All of the necessary files can be downloaded so you can follow this tutorial on your own computer.
Good luck! Let’s scroll down.
---
### Our motivation
As data scientists, we've developed a machine learning model for loan approval prediction. Now, we must create a friendly user interface for the loan approval prediction app, which users will use to automatically make an initial decision about a customer's loan request. Fortunately, many websites provide HTML templates for free, so as developers we don't even need to design the HTML and CSS from scratch. For this tutorial, we use a template from [colorlib](https://colorlib.com/wp/template/colorlib-regform-7/).
For the tutorial, you can first download the following repo. It also includes the documentation of the previous article — the notebook and pickle files.
> Github repo - [Loan approval prediction app](https://github.com/audhiaprilliant/loan-approval-prediction)
### Project initiation
If you have read the previous article, you might notice that some files were not yet available, like the HTML and CSS files for the user interface. These files are included in this tutorial in the appropriate folders.
- `static/*` — contains files that will be used for the templates. It can be CSS files, fonts, images, JavaScript files, etc. Note that inside the static folder, we can create another folder (single or multiple folders), for instance in our case there is a folder for CSS, fonts, and images.
- `static/css/*` — store all CSS files to customize the web page theme, design, layout, etc.
- `static/fonts/*` — has two folders, namely `material-icon` and `poppins`. The former contains a `css` folder (configuration for the font style) and a `fonts` folder (the font files used on our web page); the latter contains only a list of font files for the web page.
- `static/images/*` — is a folder used for storing images used in our web pages, starting from the icon file as a logo to images as a part of contents or illustration.
- `template/*` — is intended for storing the templates, which are HTML files. We create 3 templates for our web page, each with its own purpose in providing information to users.
- `template/index.html` — is like a home page; in our case it is a loan approval form in which users fill in the necessary questions for customers who want to request a loan.
- `template/prediction.html` — shows the result of the loan approval prediction: whether the customer who requests a loan is granted or not. Therefore, this page appears only after a user has completed the form in `index.html`.
- `template/error.html` — is triggered if something unexpected happens in our script. Instead of showing a 404 error page, it's nice to have this error page so a user can proceed to the main page.
- `app.py` — the Python script that runs the loan approval prediction app. This file is created in this tutorial.
```
LOAN APPROVAL PREDICTION APP
├── bin
│ └── xgboostModel.pkl
├── data
│ ├── columns_set.json
│ ├── loan_test.csv
│ └── loan_train.csv
├── notebook
│ └── Machine Learning Model Dev.ipynb
├── static
│ ├── css
│ │ └── style.css
│ ├── fonts
│ │ ├── material-icon
│ │ │ ├── css
│ │ │ │ ├── material-design-iconic-font.css
│ │ │ │ └── material-design-iconic-font.min.css
│ │ │ └── fonts
│ │ │ └── Material-Design-Iconic-Font.tff
│ │ └── poppins
│ │ └── poppins-v5-latin-300
│ └── images
│ ├── favicon.ico
│ ├── signin-image.jpg
│ └── signup-image.jpg
├── template
│ ├── error.html
│ ├── index.html
│ └── prediction.html
├── .gitignore
├── app.py
├── README.md
└── requirements.txt
```
### Build user interface for web application
As mentioned in the previous section, we create 3 HTML files: the home page, the prediction page, and the error page. Each has a specific purpose in our app, but for this tutorial the two main pages to focus on are `index.html` and `prediction.html`.
First, the `index.html` file. In `line 26`, you can see that the HTML form tag has two main attributes, `action` and `method`.
- `action` — determines where the data input by the user will be sent. In this case, we send the data to the `/prediction` route, so we declare `action="/prediction"` in the form tag. Alternatively, we can specify the location using `url_for`.
- `method` — the HTTP method used to send the data. By default it is `GET`; in this case we use the `POST` method, so we declare `method="POST"`. You can read more about HTTP request methods [here](https://www.freecodecamp.org/news/http-request-methods-explained/).
To map the data coming from `index.html` via the `POST` method to the `app.py` file, we must specify the `name` attribute on the HTML input tag. To understand, look at the snippet.
```
<!-- index.html -->
<form action="/prediction" method="POST">
  <!-- Name -->
  <div class="form-group">
    <label for="name"></label>
    <input type="text" name="name_user" placeholder="Your Name">
  </div>
  .
  .
  .
</form>
```
The data input by users in the HTML input tag with an attribute of `name="name_user"` will be mapped in `app.py` as follows.
```
# app.py
@app.route('/prediction', methods = ['POST'])
def predict():
    if request.method == 'POST':
        name = request.form['name_user']
        .
        .
        .
```
To show the result of the data processed from `index.html`, the syntax in `line 119` renders the `prediction.html` file and passes the `prediction` parameter.
To inspect the `index.html` file, you can download it using the following link.
> [HTML file](https://raw.githubusercontent.com/audhiaprilliant/loan-approval-prediction/main/template/index.html) for a loan approval form

The `prediction.html` file shows the prediction of the customer's loan approval status. In `line 32`, the prediction statement appears with the value passed via the `prediction` parameter.
> [HTML file](https://raw.githubusercontent.com/audhiaprilliant/loan-approval-prediction/main/template/prediction.html) for a prediction page

A page for showing the loan approval prediction (Image by Author)
In `error.html`, there is just a hypertext reference to navigate users back to the main page (`index.html`) if an error happens.
> [HTML file](https://raw.githubusercontent.com/audhiaprilliant/loan-approval-prediction/main/template/error.html) for the error page shown if an error happens.

A page for the condition when the errors happen (Image by Author)
### Set up the backend of a web application using flask
The `app.py` file has about a hundred lines of code to run the application. We will walk through it line by line — the logic, required inputs, expected outputs, and how this code builds the application.
In `line 1–15`, we import the necessary packages for the loan approval prediction app's `app.py` file. These packages are as follows.
- `flask` — a web application framework written in Python. It has a lot of functionality to develop a web application with only using a few lines of codes in Python.
- `pandas` — this package is useful in data frame manipulation. Inside the core Python script, we will do some data manipulation to produce the desired output for predictions.
- `numpy` — often used to help `pandas` with data frame manipulation. In some cases, data manipulation with `numpy` can save computational cost compared to `pandas`; here, however, `numpy` serves only as a helper.
- `logging` — monitors every task in our script; for instance, if an error happens somewhere in our script, a log message is emitted so we can directly and easily trace and debug the error. In web development, having a logging system saves time when debugging.
- `joblib` — used for loading a machine learning model in a pickle file.
- `json` — commonly used in Python to parse, read, and write a JSON file. In this tutorial, it only has a single purpose which is to load a column schema of `column_set.json` to map the data that has been input by a user via a loan approval form.
- `sys` — provides functions and variables to Python runtime env.
- `os` — this package helps us interact with the operating system, such as how to change directory, create a new folder, remove folders, create a list of folders or files within a specific folder, etc.
`Line 18` stores the root project directory in a variable named `current_dir`. This is the directory where the `app.py` file is located, which we refer to as `/` (the root).
After importing the necessary packages, our first task is to create a Flask object by running the syntax in `line 21`. In this syntax, we declare the directories for both *static* and *template*.
In `line 24–25`, we add a logger to our `app.py` script. Read more about logging [here](https://docs.python.org/3/library/logging.html).
`Line 28–37` is a function that helps us predict the loan approval request; we must pass the data frame via the `data` argument. In `line 30`, we specify the directory where the machine learning model is located, which is the `bin` folder. Next, we join the model directory with the root using the syntax in `line 32`. In `line 36`, after the model has been loaded into the Python environment, it predicts the loan status.
In the template folder we have 3 HTML files; one of them is `index.html`, which will serve the root URL. The function of `app.route()` is to map a function to a specific URL. In this case, `line 40–42` maps the `home()` function, which renders the `index.html` file, to the root URL (`/`).
In general, `line 46–124` processes the data input by the user in the loan approval form and passes the result to a new page, `prediction.html`. A detailed description of the syntax is listed below.
- `Line 47` filters the data coming from the loan approval form so that only `POST` requests are processed. Otherwise, the script below will not run and the user is directed to the error notification page (`error.html`).
- `Line 49–60` maps the data from the loan approval form into specific variables; for instance, the input with `name="name"` in `index.html` is stored in a variable named `name`.
- In `line 64–69`, we load the data schema from `columns_set.json`. It is used to map the data parsed in `line 49–60`. This schema is stored in a variable named `schema_cols`.
- The syntax in `line 73–80` parses the value of the dependents column into a specific key in `schema_cols`.
- `Line 82–89` is similar to `line 73–80`, but parses the value of the property area column into a specific key in `schema_cols`.
- Columns other than dependents and property area are parsed directly into specific keys in `schema_cols`. That is carried out by the syntax in `line 91–100`.
- From the data in `schema_cols`, we build a data frame using the syntax in `line 103–106`. It produces a data frame with only a single row, named `df`.
- In `line 110`, we predict the loan status of the customer whose data is stored in `df` using the Python function `ValuePredictor`. The prediction is stored in a variable named `result` — it is only a value of either 1 or 0.
- With the result in hand, we show it on the prediction page, which states whether the loan has been granted by the bank or not. If the result equals 1, the loan is granted; otherwise it is rejected — the statement is stored in a variable named `prediction`. That is what `line 113–116` does.
- Finally, after all values have been mapped, `line 119` passes the `prediction` variable to the prediction page.
```python
# Flask
from flask import Flask, render_template, request
# Data manipulation
import pandas as pd
# Matrices manipulation
import numpy as np
# Script logging
import logging
# ML model
import joblib
# JSON manipulation
import json
# Utilities
import sys
import os

# Current directory
current_dir = os.path.dirname(__file__)

# Flask app
app = Flask(__name__, static_folder = 'static', template_folder = 'template')

# Logging
app.logger.addHandler(logging.StreamHandler(sys.stdout))
app.logger.setLevel(logging.ERROR)

# Function
def ValuePredictor(data: pd.DataFrame):
    # Model name
    model_name = 'bin/xgboostModel.pkl'
    # Directory where the model is stored
    model_dir = os.path.join(current_dir, model_name)
    # Load the model
    loaded_model = joblib.load(open(model_dir, 'rb'))
    # Predict the data
    result = loaded_model.predict(data)
    return result[0]

# Home page
@app.route('/')
def home():
    return render_template('index.html')

# Prediction page
@app.route('/prediction', methods = ['POST'])
def predict():
    if request.method == 'POST':
        # Get the data from the form
        name = request.form['name']
        gender = request.form['gender']
        education = request.form['education']
        self_employed = request.form['self_employed']
        marital_status = request.form['marital_status']
        dependents = request.form['dependents']
        applicant_income = request.form['applicant_income']
        coapplicant_income = request.form['coapplicant_income']
        loan_amount = request.form['loan_amount']
        loan_term = request.form['loan_term']
        credit_history = request.form['credit_history']
        property_area = request.form['property_area']
        # Load the template of the JSON file containing the column names
        schema_name = 'data/columns_set.json'
        # Directory where the schema is stored
        schema_dir = os.path.join(current_dir, schema_name)
        with open(schema_dir, 'r') as f:
            cols = json.loads(f.read())
        schema_cols = cols['data_columns']
        # Parse the categorical columns
        # Column of dependents
        try:
            col = 'Dependents_' + str(dependents)
            if col in schema_cols.keys():
                schema_cols[col] = 1
        except Exception:
            pass
        # Column of property area
        try:
            col = 'Property_Area_' + str(property_area)
            if col in schema_cols.keys():
                schema_cols[col] = 1
        except Exception:
            pass
        # Parse the numerical columns
        schema_cols['ApplicantIncome'] = applicant_income
        schema_cols['CoapplicantIncome'] = coapplicant_income
        schema_cols['LoanAmount'] = loan_amount
        schema_cols['Loan_Amount_Term'] = loan_term
        schema_cols['Gender_Male'] = gender
        schema_cols['Married_Yes'] = marital_status
        schema_cols['Education_Not Graduate'] = education
        schema_cols['Self_Employed_Yes'] = self_employed
        schema_cols['Credit_History_1.0'] = credit_history
        # Convert the JSON schema into a single-row data frame
        df = pd.DataFrame(
            data = {k: [v] for k, v in schema_cols.items()},
            dtype = float
        )
        # Debug: inspect the column dtypes
        print(df.dtypes)
        # Create a prediction
        result = ValuePredictor(data = df)
        # Determine the output
        if int(result) == 1:
            prediction = 'Dear Mr/Mrs/Ms {name}, your loan is approved!'.format(name = name)
        else:
            prediction = 'Sorry Mr/Mrs/Ms {name}, your loan is rejected!'.format(name = name)
        # Return the prediction
        return render_template('prediction.html', prediction = prediction)
    else:
        # Something went wrong (non-POST request): show the error page
        return render_template('error.html')

if __name__ == '__main__':
    app.run(debug = True)
```
### Run and test the application
The loan approval prediction app development is now complete, and the next step is application testing. We are going to run this app from a terminal. First, open your terminal (Linux) or CMD on Windows and export the `FLASK_APP` environment variable with our main Python script name — `app`.
If `FLASK_APP` has not been set, Flask automatically tries to use an `app` or `wsgi` Python file as the application.
```
$ export FLASK_APP=app
```
Setting `FLASK_ENV` to development enables debug mode, which will help us improve the application. However, in production this environment variable must be set to production.
```
$ export FLASK_ENV=development
```
To run the application in development mode, we can use the `flask run` command in the terminal. This command finds the app automatically because we've set the `FLASK_APP` and `FLASK_ENV` environment variables.
```
$ flask run
```

This is how our loan approval prediction app works!

# Part 3 — End to End Machine Learning Model Deployment Using Flask
> How to deploy a flask application to Heroku via Heroku CLI and git
After creating a user interface for loan approval prediction in the previous article, the app can be tested on our local computer to check for errors and expected results. However, our goal has yet to be reached because the app is not deployed: it can only run on the developer's computer, which is not efficient for business.
So, in this article, we will deploy our loan approval prediction app on Heroku — a platform as a service (PaaS). This article covers how to register, the prerequisites, and the setup for deployment.
----------
### Our motivation
To optimize the operational activity of determining whether a customer who applies for a loan is granted or not, as data scientists we understand that non-technical users shouldn't have to undertake technical activities. Thus, the application must be deployed on a server or in the cloud, and non-technical users given access to it. With this implementation, loan approval verification becomes more efficient and operational costs are reduced significantly.
This is the last part of the series of loan approval prediction projects. Heroku is selected as the online platform for application deployment. In a real case, we should consider other options like DigitalOcean, Amazon Web Services (AWS), Google Cloud Platform (GCP), etc.
> Github repo - [Loan approval prediction app](https://github.com/audhiaprilliant/loan-approval-prediction)

### Prerequisites
To follow the tutorial thoroughly, we should install `git` on our local computer and sign up for `Heroku`. Practically we can deploy a Python app using GitHub by configuring GitHub integration in the Deploy tab in the Heroku dashboard. However, in this article, we will try to demonstrate how to deploy a Python app using `git`.
- `git` — open-source version control system that enables us to track and record app or script development. Developers around the world widely use it. Using this software, we can also collaborate with people just like writing in Google Docs, Google Slides and so on
- `Heroku` — platform as a service (PaaS) that can be used to run applications in the cloud. Heroku is based on [Amazon Web Services (AWS)](https://aws.amazon.com/). Developers can use Heroku for free, often to deploy personal projects or minimum viable product (MVP) for product pitches. Before using Heroku, you must [sign up](https://signup.heroku.com/). However, for business needs, we must upgrade the pricing
### Project initiation
There are two additional files that must be included in the LOAN APPROVAL PREDICTION APP folder: `Procfile` and `runtime.txt`. You may notice that we use gunicorn for the Python app deployment. It is a Python WSGI HTTP server needed to run the Python app, because a traditional web server, like Apache, cannot run it directly.
> Note — WSGI is an acronym of Web Server Gateway Interface
- `Procfile` — a process file: a single file that declares the command executed by the Heroku app on startup. It is required for a Heroku app and contains only one line, with the following format
```
web: gunicorn module_name_where_app_instance_exists:name_of_app_instance
```
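For this app, where `app.py` creates a Flask instance named `app`, the Procfile would presumably be:
```
web: gunicorn app:app
```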
- `runtime.txt` — a text file that specifies the Python version used to run the scripts. If we don't create this file, the deployment may hit an error related to the Python version
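For example, pinning Python 3.9 (the exact patch version here is illustrative):
```
python-3.9.16
```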
```
LOAN APPROVAL PREDICTION APP
├── bin
│ └── xgboostModel.pkl
├── data
│ ├── columns_set.json
│ ├── loan_test.csv
│ └── loan_train.csv
├── notebook
│ └── Machine Learning Model Dev.ipynb
├── static
│ ├── css
│ │ └── style.css
│ ├── fonts
│ │ ├── material-icon
│ │ │ ├── css
│ │ │ │ ├── material-design-iconic-font.css
│ │ │ │ └── material-design-iconic-font.min.css
│ │ │ └── fonts
│ │ │ └── Material-Design-Iconic-Font.tff
│ │ └── poppins
│ │ └── poppins-v5-latin-300
│ └── images
│ ├── favicon.ico
│ ├── signin-image.jpg
│ └── signup-image.jpg
├── template
│ ├── error.html
│ ├── index.html
│ └── prediction.html
├── .gitignore
├── app.py
├── Procfile
├── README.md
├── requirements.txt
└── runtime.txt
```
### Install git on the local computer
For Windows users, go to the [git download page](https://git-scm.com/downloads) and install it on your local computer. To open git, right-click on any folder and the `Git Bash Here` option will appear (it opens the git bash terminal). If you are a Linux user, open the terminal and run the following commands.
```
$ sudo apt-get update
$ sudo apt-get install git
```
To ensure that the installation was successful, run the `git --version` command. If the git version appears, git is installed.
```
$ git --version
```
### Install Heroku CLI on a local computer
After installing git, we need to install the Heroku CLI by running the following command in the terminal (or, for Windows users, in the git bash terminal). It will help us connect the local git repository with remote Heroku.
```
$ curl https://cli-assets.heroku.com/install.sh | sh
```
Just like with git, to ensure that the installation succeeded, we can run the `heroku --version` command. The Heroku version will appear if the installation was successful.
```
$ heroku --version
```
### Login to Heroku via CLI
Remember that we signed up for Heroku in the previous section. By running the following command, you will be asked to open a browser to log in, just hit enter.
```
$ heroku login
```

After the browser opens, type your username & password to log in. Now, the Heroku commands we run in the terminal will affect the remote Heroku.

> The Heroku interface after logging in to Heroku via CLI (Image by Author)
### Deploy Python app in Heroku via CLI
For the application on Heroku, we should create a unique app name that becomes the app address, in the form `http://[app name].herokuapp.com/`. If another user already uses our app name, we are asked to enter another (unique) name.
```
$ heroku create -a loan-prediction-form
```
Once the previous command runs successfully, the following output will appear in our terminal. It generates two things:
- Initial application address
- Empty remote repository on Heroku

Output from Heroku CLI after creating a new app (Image by Author)
Our next task is to create a local repository of our application script, in the LOAN APPROVAL PREDICTION APP folder. By running the following command on the terminal, we create an empty git local repository which generates a `.git` folder for git operation of versioning and collaboration.
```
$ git init
```

Output from git after initialization (Image by Author)
Since we initialized git in an existing project, all files and folders are still untracked, sitting in the working directory. The command `git add .` moves all the untracked files and folders into the staging area (tracked). Then, the `git commit` command records them in the local repository.
```
$ git add . && git commit -m "Initial commit"
```
As the remote and local repositories are already created, we must make a *connection* between them, so that once we modify files in the local repository, we can easily push the modifications to the remote repository and update the application.
```
$ heroku git:remote -a loan-prediction-form
```

Output from Heroku CLI after creating Heroku git remote (Image by Author)
The `git remote -v` command shows the remote repositories used for fetching and pushing the changes made in local or remote repositories. If the project is also connected to GitHub, we can find the GitHub repository using this command.
`$ git remote -v`

Output from git indicating that the project has connected with Heroku git remote (Image by Author)
The final task is to push the existing project to the remote repository on Heroku. The `git push` command also triggers the application deployment on Heroku. If we watch the output in the terminal, once the existing project is pushed to Heroku, the deployment begins by first checking the Python version written in `runtime.txt`.
For the free tier, Heroku limits storage (max 500 MB) and the number of deployed applications (up to 5 apps). If our app size exceeds the soft limit (300 MB), boot time will be affected.
```
$ git push heroku master
```

Sample output indicating that the deployment in Heroku is successful (Image by Author)
# How to Deploy Machine Learning Model with Docker Container
Basic understanding of Docker and how to run a Python Flask application as a Docker container
Do you want to simplify your Python Flask application deployment and ensure consistent performance across different environments? Look no further! Dive into the world of Docker and learn how to deploy your Flask application with ease. Unleash the power of containerization and create portable, reproducible, and efficient deployments.
In this article, we’ll walk you through the entire process, from setting up Docker to deploying your Flask application. Discover the game-changing benefits of Docker for Flask deployment and elevate your application development to new heights.
### Introduction to Docker
A Docker container is a lightweight and portable software package that contains everything needed to run a piece of software, including the code, runtime, system tools, and libraries, in a self-sufficient, isolated environment. It is created from a Docker image, which is a snapshot of a pre-configured system that can be run on any machine with Docker installed.
Think of a Docker container as a virtualized, standalone executable package that encapsulates an application and all its dependencies, allowing it to run consistently across different environments without being affected by the underlying host system. Containers are isolated from each other and from the host system, which provides consistency and reproducibility in software development and deployment.
Let me describe the Docker container with the analogy for children.
Imagine you have a toy box with lots of different toys, such as action figures, cars, and building blocks. Each toy has its own special features, like the color, shape, and how it moves. Now, imagine you want to play with your toys at your friend’s house, but you can’t take your entire toy box with you. Instead, you decide to pack a small bag with just a few toys that you need for your playdate.
In a similar way, a Docker container is like a small bag that holds a special toy (i.e., an application) and everything it needs to run, such as its own set of instructions (like how the toy should be played with), tools (like batteries or tools needed to assemble the toy), and accessories (like extra parts or add-ons for the toy). Just like you can easily carry the small bag of toys to your friend’s house and play with it there, Docker containers can be easily moved and run on different computers without having to carry around all the extra stuff that the toy (or application) doesn’t need.
### Docker functionalities
Docker containers offer a wide range of functionalities that make them a popular choice for modern software development and deployment. Some of the key functionalities of Docker containers include:
Isolation — Containers provide an isolated environment for running applications, which means that the dependencies, libraries, and configurations of one container do not interfere with other containers or the host system
Portability — Docker containers are portable, which means that they can be easily moved and deployed across different systems, environments, and platforms without modification
Efficiency — Containers are lightweight and efficient, as they share the host system’s OS kernel and only package the application-specific dependencies
Flexibility — Containers provide flexibility in terms of packaging and versioning of applications and dependencies
Reproducibility — Docker containers allow for reproducible builds, where the application and its dependencies are packaged into a single container image with a defined configuration
Scalability — Containers are designed for scalability, as they can be easily replicated and scaled horizontally to handle varying levels of workload
Easy Deployment — Docker containers simplify the deployment process by encapsulating all the dependencies and configurations needed to run an application into a single container image
### Docker components
To become familiar with Docker containers, you should understand the basic components of the Docker environment. The following list covers only the components that you will deal with in this article.
Docker Image — a snapshot of a pre-configured system that contains all the dependencies, libraries, and configurations needed to run an application
Docker Container — a lightweight, standalone, and executable package created from a Docker image. It encapsulates an application and all its dependencies, including the runtime, libraries, and configurations, in a self-contained environment
Docker Registry — a central repository for storing and sharing Docker container images. Docker Hub is the default public Docker registry provided by Docker, and it hosts a vast collection of Docker images contributed by the Docker community
### Prerequisites
Before running a containerized Python app using Docker, there are a few prerequisites that you need to have in place:
Docker Hub Account (optional) — If you want to store and share your Docker images with others, you may need to create an account on Docker Hub (https://hub.docker.com/). Docker Hub is the default public Docker registry that hosts Docker images, and it allows you to push and pull Docker images to and from the registry
Python App Code — You need to have the code for your Python app ready. This includes all the dependencies, libraries, and configurations required to run your app. Make sure your app code is organized and ready to be packaged into a Docker image
Dockerfile — You need to have a Dockerfile that defines the configuration for building the Docker image of your Python app. The Dockerfile specifies the base image, application code, dependencies, and other configurations required for running the app in a container
Docker CLI (Command-Line Interface) — You should be familiar with the Docker CLI to interact with Docker Engine and manage Docker containers. This includes commands for building Docker images, running containers, managing volumes and networks, and monitoring containerized apps
Once you have these prerequisites in place, you can use Docker to create a Docker image of your Python app, run it as a container, and deploy it in any environment that supports Docker, making your app portable and scalable.
### How to install Docker
- Update the package index and install prerequisites
```
sudo apt update
sudo apt install apt-transport-https ca-certificates curl software-properties-common
```
- Add Docker repository
```
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
```
- Install Docker
```
sudo apt update
sudo apt install docker-ce
```
- Start and enable Docker service
```
sudo systemctl start docker
sudo systemctl enable docker
```
- Verify Docker installation
```
docker --version
```
That’s it! You now have Docker installed and ready to use on your Ubuntu machine. You can run Docker commands to create and manage Docker containers, build Docker images, and run containerized applications.
### Setting up the folders and files for the Flask application
Here’s a description of the files in the “LOAN APPROVAL PREDICTION APP” directory:
bin — contains a single file named xgboostModel.pkl, a pickled machine learning model file. It likely contains a trained XGBoost model for loan approval prediction
data — contains three files: columns_set.json, loan_test.csv, and loan_train.csv
static — contains three subdirectories: css (CSS files used for styling the web application), fonts (font files used in the web application, including the material-icon and poppins font families), and images (image files used in the web application, including favicon.ico, signin-image.jpg, and signup-image.jpg)
template — contains three HTML files: error.html (displays an error page for the web application), index.html (represents the main page or home page of the web application), and prediction.html (represents the page for displaying loan prediction results)
app.py — the Python script that contains the main application logic for the web application, including handling HTTP requests, processing data, and running the machine learning model for loan approval prediction
requirements.txt — a text file listing the Python dependencies or packages required for running the web application
### Setting up the Dockerfile and building the Docker image
To set up a Docker container for the “LOAN APPROVAL PREDICTION APP” files provided, you can follow these steps.
Firstly, create a Dockerfile. Create a Dockerfile in the same directory where our “LOAN APPROVAL PREDICTION APP” files are located. Here’s an example of a Dockerfile that you can use as a starting point:
```
# Use a base image
FROM python:3.9
# Set the working directory
WORKDIR /app
# Copy the app files into the container
COPY . /app
# Install dependencies
RUN pip install --no-cache-dir -r /app/requirements.txt
# Expose the necessary port
EXPOSE 5000
# Start the application
CMD ["python", "app.py"]
```
*This Dockerfile uses the official Python 3.9 image as the base image, sets the working directory to `/app`, copies the files from the current directory into the container, installs the dependencies from `requirements.txt` file, exposes port 5000 (assuming your app runs on this port), and starts the `app.py` script as the main command to run the application*
The next step is building the Docker image. Navigate to the directory where your Dockerfile is located in the terminal, and run the following command to build the Docker image:
```
docker build -t loan-prediction-app .
```
*This command builds the Docker image using the Dockerfile in the current directory and tags it with the name `loan-prediction-app`*
### Running the Docker container
Once the Docker image is built, you can run a Docker container from it using the following command:
```
docker run -p 5000:5000 loan-prediction-app
```
*This command runs a Docker container from the “loan-prediction-app” image and maps port 5000 of the host to port 5000 of the container, assuming your app runs on port 5000*
After the container is running, you should be able to access the Loan Prediction App by navigating to `http://localhost:5000` in your web browser.
### Pushing Docker image to Docker Hub
To push the Docker image *“loan-prediction-app”* to Docker Hub, follow these steps:
- Login to Docker Hub. Open terminal and login to Docker Hub using the `docker login` command. You'll need to provide your Docker Hub username and password when prompted. For example:
```
docker login
```
- Tag your Docker container. Before pushing your Docker container to Docker Hub, you need to tag it with the appropriate Docker Hub repository name. The basic syntax for tagging a Docker container is as follows:
```
docker tag loan-prediction-app [docker_hub_username]/loan-prediction-app:[tag]
```
Replace `[docker_hub_username]` with your Docker Hub username, and `[tag]` with an optional tag name for the image (e.g., `latest`, `v1.0`, etc.). For example:
```
docker tag loan-prediction-app my_username/loan-prediction-app:latest
```
- Push the Docker container. After tagging your Docker container, you can push it to Docker Hub using the `docker push` command. The basic syntax for pushing a Docker container is as follows:
```
docker push [docker_hub_username]/loan-prediction-app:[tag]
```
Replace `[docker_hub_username]` with your Docker Hub username, and `[tag]` with the tag name that you used during the tagging step. For example:
```
docker push my_username/loan-prediction-app:latest
```
Wait for the push to complete. Docker will upload the Docker container to Docker Hub, which may take some time depending on the size of the container and your internet connection speed. Once the push is complete, you’ll see a success message indicating that the container has been pushed to Docker Hub.
After following these steps, your ***“loan-prediction-app”*** Docker container should be pushed to Docker Hub and available for others to download and use.