# How to Over-engineer Your Application Infrastructure and "Lifecycle"
:::info
:warning: This article only represents the views of me, an indie software engineer who likes to explore new things. Take it with a grain of salt!
:::
Have you ever found yourself building applications with so many different dependencies and environments that you end up -- subconsciously -- creating a microservices-like infrastructure?
At the beginning you might say *"ah yes, adding 1 or 2 more services to do tasks x and y is OK"*, and suddenly you are in the middle of a bunch of containers you can barely manage, let alone the CI/CD, releases, and all the tedious stuff that follows.
Relax, I've experienced that. In fact, that's the sole motive that drove me to write this article. It will guide you on how to *orchestrate* your application throughout its lifecycle by leveraging automation.
>*note: This is meant for small to medium-sized apps. Anything beyond that should consider an orchestration platform such as Docker Swarm or Kubernetes.*
<br>
## :computer: System Infrastructure
Before we begin with the guide itself, I will briefly explain the infrastructure this article is based on.
My project runs on Node.js; it has a frontend service, a backend service, a database, and a search engine service. The rough design looks like the diagram below.
<br>

<p style="text-align:center;font-weight:bold">
System Infrastructure
</p>
<br>
**For simplicity**, let's discard the service containers outside the dash-bordered area above and focus on the inside.
The application lifecycle is pretty standard:
development and testing -> staging -> production (release)
Though it seems simple, each stage involves fairly complex tasks where you have to follow software engineering best practices and principles.
Now that you've got a hint of what the system infrastructure looks like, let's move on to the next step, **containerization**.
<br>
<br>
## :whale: Containerization
When you create an app that is meant to be used publicly or remotely, the usual idea is to host it somewhere in the "cloud" (I know serverless architectures exist, but we won't talk about them). The problem, of course, is how to ensure the app can run on the cloud -- or rather, I'd say, on a *Virtual Private Server* (VPS) -- regardless of the OS, language-specific runtime, and other dependencies on that VPS.
This is the core idea of containerization: you wrap your app inside an isolated environment that runs an operating system of your choice and/or one compatible with your app. You can think of it like running a virtual machine on your computer, but with less resource consumption (and fewer capabilities).
Arguably the most popular go-to containerization tool is **Docker**. Please [install](https://docs.docker.com/engine/install/) Docker on your computer and read a bit about [Docker](https://docs.docker.com/get-started/overview/) before going further.
<br>
### Fundamental Docker

<p style="text-align:center;font-style:italic;">
docker diagram taken from bytebytego
</p>
<br>
There are some terms we have to understand first:
1. **image**: a packaged filesystem containing the OS layers and the application to be run
2. **container**: an isolated environment in which an image runs
3. **host**: in this context, the computer that runs the Docker service
4. **docker registry**: the place where built images are stored and distributed
5. **Dockerfile**: the file containing the configuration and commands used to build an image
To containerize your app, you first need to actually build an image for it. To build the image, you need a **Dockerfile**, which tells the Docker service how to build it. Below is an example.
```dockerfile=
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN yarn install --production
CMD ["node", "src/index.js"]
EXPOSE 3000
```
The topmost statement, `FROM`, tells the Docker service where to pull the base image from. In this case, Docker will pull the publicly available `node` image with the tag `18-alpine`. On the next line, we tell Docker to set `/app` as our working directory. `COPY` is self-explanatory; it takes two arguments, `COPY [SOURCE_PATH] [DESTINATION_PATH]`, where `SOURCE_PATH` is relative to your computer (the docker host, or more precisely the build context) and `DESTINATION_PATH` is a path inside the image.
Next there is the `RUN` command, which runs the given command in a shell. `CMD` is similar to `RUN`; the difference is that `RUN` executes during the image build, whereas `CMD` executes when your container starts. Lastly, there is `EXPOSE`, which declares the container port your app listens on (the actual publishing of that port to the host is done with `--publish` at run time).
To build an image using this Dockerfile, we can then run:
```shell
docker build -t [TAG_NAME] [BUILD_CONTEXT_PATH]
```
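Note that the path argument is the build context; Docker looks for a file named `Dockerfile` inside it by default (use `-f` to point at a different file). As a concrete, hypothetical example, with the Dockerfile above sitting in the current directory:
```shell
# Build an image from the Dockerfile in the current directory (.)
# and tag it as my-app:latest (hypothetical name)
docker build -t my-app:latest .
```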
Once the image is built, you can create and run a container for it using:
```shell
docker run --detach --name [CONTAINER_NAME] --publish [HOST_PORT]:[CONTAINER_PORT] [IMAGE_TAG_NAME]
```
The `--detach` option tells Docker to run the container in the background so that we can keep using the current shell for other activities. The `--publish` option works like port forwarding: it forwards connections from `HOST_PORT` to `CONTAINER_PORT` so that we can access our app from our computer (the host).
Yay! You can now containerize and run your own app. If any of those steps fail, you will have to debug where things went wrong; it is usually related to a bad Dockerfile configuration or a mismatched port mapping :crossed_fingers:
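As a quick sanity check, here is what that looks like end to end with hypothetical names and the port from the example Dockerfile:
```shell
# Run the container in the background, publishing host port 3000 to container port 3000
docker run --detach --name my-app --publish 3000:3000 my-app:latest

# Check that it is running, then look at its logs and hit the published port
docker ps --filter name=my-app
docker logs my-app
curl http://localhost:3000
```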
There is actually one more fundamental Docker command, `docker push`. It is used when you want to publish your built images to a registry such as Docker Hub. I won't discuss it here, as it would make the article even longer.
<br>
### Advanced Docker
Now that you understand the basic usage of Docker, things are pretty simple from here on. It gets even easier when you use `docker-compose`. Let's recall the steps:
1. Build the image based on our configuration
2. Create the container for our image and run it
Doing all of this is now trivial, but writing a **configuration** that is minimal and still works is a different story. When we have many containers to run, storage and resources get scarce, so we want (or need) to make each container as small as possible while keeping roughly the same performance.
That is only one problem; when you delve deeper into Docker you will face more, such as volumes, networking, and security. We will discuss these concisely below.
#### Creating minimal image
There are two approaches to this:
1. Use minimal OS image
2. Create your own configuration
The first approach is suitable when you don't want to over-complicate your build process. You can just pull a base image built on a small distribution that still has decent development support, such as `alpine` or the slim Debian-based variants. The drawback of using the most minimal existing images is that your image may be missing some dependencies or libraries needed by your application, so you will still have to do some debugging in the Dockerfile configuration.
> The second approach is a rabbit hole. I suggest you don't meddle too much or too deep with this option from the beginning; there is a threshold where your effort is no longer worth the reduction in image size. Touch some grass, OK?
The second approach requires you to understand the fundamentals of both your base image and your app. Here we still use a minimal existing image such as `alpine`, but we choose what to include in and what to omit from our final image. To achieve this from a Dockerfile, we can use the fancy-sounding "**multi-stage build**", where we basically chain image builds and copy artifacts from a previously built stage.
Let's look at an example.
```dockerfile=
FROM node:20-alpine as builder
ENV NODE_ENV build
USER node
WORKDIR /home/node
COPY package.json .
COPY yarn.lock .
RUN yarn install --frozen-lockfile
COPY --chown=node:node . .
RUN yarn prisma generate \
    && yarn build \
    && yarn --production
# ---
FROM node:20-alpine
ENV NODE_ENV production
RUN apk add curl
USER node
WORKDIR /home/node
COPY --from=builder --chown=node:node /home/node/package.json ./
COPY --from=builder --chown=node:node /home/node/yarn.lock ./
COPY --from=builder --chown=node:node /home/node/node_modules/ ./node_modules/
COPY --from=builder --chown=node:node /home/node/dist/ ./dist/
HEALTHCHECK --interval=45s --timeout=10s \
    CMD curl -f http://localhost:3000 || exit 1
EXPOSE 3000
CMD ["node", "dist/main.js"]
```
So in the Dockerfile above there are two stages: build and production. The first `FROM` statement marks the beginning of the build stage, which builds the app as usual. The next stage (the second `FROM` statement) uses pretty much the same OS image, but it copies only the **important** built scripts and binaries from the `builder` artifacts, making the final image more space efficient.
The actual size difference really depends on your app and base image. Just remember: know when to stop "optimizing" and ship your app faster!
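To see whether the multi-stage build actually paid off, comparing image sizes is usually enough (image name is hypothetical):
```shell
# List local images (and their sizes) for the my-app repository
docker image ls my-app

# Or print the size (in bytes) of a single image
docker image inspect my-app:latest --format '{{.Size}}'
```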
#### Using Volumes
Your host computer has its own filesystem with the usual folders, and so does the container. In Docker you can designate a host folder (or file) to be referenced and used from inside the container; this is called `mounting` a volume. Any change that happens inside the container is also applied to the actual folder (volume) on your host.
To mount a volume, you can use this option on your `docker run` command:
```shell
-v [PATH_TO_HOST_VOLUME]:[PATH_TO_CONTAINER_VOLUME]
```
If you need to make sure the host volume cannot be changed from inside the container, you can tell Docker to mount it read-only by appending the `ro` flag, as shown below:
```shell
-v [PATH_TO_HOST_VOLUME]:[PATH_TO_CONTAINER_VOLUME]:ro
```
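Putting both together, a hypothetical setup that persists an uploads folder and mounts a config file read-only could look like this (paths and names are made up):
```shell
# Persist uploads to the host and mount a config file read-only
docker run --detach --name my-app \
  -v /srv/my-app/uploads:/app/uploads \
  -v /srv/my-app/config.json:/app/config.json:ro \
  my-app:latest
```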
#### Docker Network
I won't discuss the Docker daemon's internal plumbing (the docker sock) here, but rather inter-container networking. As you know, a container is isolated from its outside environment, so by default there is no convenient way for one container to reach another.
Fortunately, Docker comes with a solution: `docker network`. Simply put, you can create a Docker network and use it as a bridge that lets containers communicate. You can achieve this by running the following command:
```shell
docker network create --attachable --driver bridge [NETWORK_NAME]
```
`--attachable` means that any standalone container that "wants" to join can attach itself to the network. The `--driver bridge` option tells Docker that this network bridges (manages) communication between containers on the same host.
To attach a container to a specific network, you can use the following option with `docker run`:
```shell
--net [NETWORK_NAME]
```
Now your containers can communicate with each other as long as they are attached to the same network. For example, if containers A and B are both attached to a network named `AB_CHANNEL`, container B can reach container A by using A's container name as the hostname, i.e. `CONTAINER_A_NAME:DESTINATION_CONTAINER_PORT` in the connection string. The protocol and URI depend on the application. A hypothetical end-to-end example is sketched below.
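Here is a rough sketch with made-up image and container names (it also assumes `curl` is available inside the `my-web` image):
```shell
# Create a user-defined bridge network
docker network create --attachable --driver bridge ab_channel

# Attach two containers to it
docker run --detach --name container-a --net ab_channel my-api:latest
docker run --detach --name container-b --net ab_channel my-web:latest

# From inside container-b, container-a is reachable by its container name
docker exec container-b curl -s http://container-a:3000
```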
#### Docker Healthcheck
When you containerize your app and run it in the background, it becomes harder and more tedious to know whether the app inside the container is (still) running fine. For this we can leverage Docker's `HEALTHCHECK` instruction.
```dockerfile=
HEALTHCHECK --interval=45s --timeout=10s \
    CMD curl -f http://localhost:3000 || exit 1
EXPOSE 3000
CMD ["node", "dist/main.js"]
```
Using the same Dockerfile as the example, here we can see the `HEALTHCHECK` instruction in use. The configuration above tells Docker to periodically run the command passed to `CMD`, namely `curl -f http://localhost:3000`. If the request fails, the shell executes the latter command, `exit 1`, which marks the check as failed (and, after enough consecutive failures, the container as unhealthy). The interval and timeout can be customized, as you can see above.
The output would look like this (notice the health status in the STATUS column):
```shell
CONTAINER ID   IMAGE                      COMMAND                  CREATED        STATUS                  PORTS                                       NAMES
d117a6f3ee9b   REDACTED/pegon-be:latest   "docker-entrypoint.s…"   10 hours ago   Up 10 hours (healthy)   0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   pegon-be-prod
ab13eaad07a7   containrrr/watchtower      "/watchtower pegon-b…"   12 hours ago   Up 12 hours (healthy)   8080/tcp                                    watchtower
```
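Besides `docker ps`, you can also query the health status of a single container directly (this only works for containers that define a `HEALTHCHECK`; the container name below is the one from the example output):
```shell
# Print just the health status: starting, healthy, or unhealthy
docker inspect --format '{{.State.Health.Status}}' pegon-be-prod

# Dump the full health log (the last few check results) as JSON
docker inspect --format '{{json .State.Health}}' pegon-be-prod
```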
<br>
If you have reached this point, you have basically learned the basic-to-intermediate concepts of Docker. I recommend that you actually try to dockerize and containerize your own app so you ***actually*** get all of these fancy things.
<br>
<br>
## :gear: Automating Your App Lifecycle
Now that we have learned about containerization, we will learn how to handle all of these containers throughout the lifecycle I mentioned at the beginning.
### Continuous Integration / Continuous Delivery

<p style="text-align:center;font-style:italic;">
CI/CD Pipeline taken from <a href="https://katalon.com/resources-center/blog/ci-cd-introduction" target="_blank">Katalon</a>
</p>
Continuous Integration / Continuous Delivery, or CI/CD, is just a fancy term for *"automatically apply changes to your app running in production, with additional steps"*. CI/CD is critically important, especially when you are doing agile development, because you have to focus on developing the app in a fast-paced environment. CI/CD helps you, say, build and containerize your app and deploy it automatically, without you yourself running all of the tedious Docker commands. Ha! Pretty neat, right?
There are many CI/CD frameworks and services -- Jenkins, for example, but that one is rather complex. For starters, let's use the easiest and most popular ones, such as GitHub Actions or the GitLab CI/CD pipeline. In this example we'll be using GitLab CI/CD.
<br>
<br>
### Gitlab CI
GitLab offers its CI/CD pipeline for free up to a certain "degree"; you have to upgrade your account by subscribing to one of their plans for capabilities beyond that. For a project, you can access the CI/CD panel from the settings like this (see highlighted):

There you can configure all CI/CD-related settings.
In this article I won't talk much about GitLab CI itself, because that would be out of scope; rather, we will discuss how to leverage this free CI/CD pipeline.
To be able to use the GitLab pipeline, we need two things:
* A service that runs our jobs for each specified stage (the runner)
* A script that configures the stages and jobs
The service mentioned in the first point is called a runner, and the script mentioned in the second point is `.gitlab-ci.yml`. To install and configure a runner for a given project, you can refer directly to the official GitLab guidance [here](https://docs.gitlab.com/runner/install/); a rough sketch of the registration command is shown below.
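For reference, registering a runner with the Docker executor looks roughly like the following. Treat this as a sketch: the values are placeholders, and depending on your GitLab version the flow may use the newer authentication token instead of the legacy registration token, so follow the official docs above.
```shell
# Register a runner that executes jobs inside Docker containers
# (URL, token, and description are placeholders from your project's CI/CD settings)
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.com/" \
  --registration-token "$REGISTRATION_TOKEN" \
  --executor docker \
  --docker-image docker:latest \
  --description "my-project-runner"
```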
The `.gitlab-ci.yml` is just a basic YAML file, similar to `docker-compose.yml`. It basically looks like this:
```yaml=
build-job:
  stage: build
  script:
    - echo "Hello, $GITLAB_USER_LOGIN!"

test-job1:
  stage: test
  script:
    - echo "This job tests something"

test-job2:
  stage: test
  script:
    - echo "This job tests something, but takes more time than test-job1."
    - echo "After the echo commands complete, it runs the sleep command for 20 seconds"
    - echo "which simulates a test that runs 20 seconds longer than test-job1"
    - sleep 20

deploy-prod:
  stage: deploy
  script:
    - echo "This job deploys something from the $CI_COMMIT_BRANCH branch."
  environment: production
```
*note: I won't discuss optimization, advanced gitlab-ci configuration, etc. too much, because that is not my expertise, nor is it part of my usual CI/CD pipeline*
<br>
<br>
### Application Lifecycle
There are many conventions, paradigms, and principles regarding this matter. Personally, I don't care that much ***as long as*** we stay consistent with the chosen convention and are able to develop solutions that deal with its drawbacks (I know this sounds abstract).
For this article, we are going to use what I call 3-branch-driven CI/CD, with the branches `Development`, `Staging`, and `Production`.
1. **Development**: This branch is the environment for any changes related to development. Tests are also run at this stage.
2. **Staging**: This branch serves as a kind of proxy between **development** and **production**. Tests and static code analysis may be run at this stage. If they succeed, the app is deployed to a temporary server.
3. **Production**: This branch serves the main app that is ready for public use. Only changes that have passed staging land on this branch.
To support this convention, we will additionally be using the stages explained below.
#### 1. Testing Stage
As the name states, this stage runs tests, typically unit tests and so on. If any test fails, the stage should (and always should) fail and exit, which makes the whole pipeline fail. It is best to use a testing framework that can generate results in XML or HTML format so that you can parse them or do post-processing, perhaps for data visualization.
Below is an example `.gitlab-ci.yml`:
```yaml=
Test:
  stage: test
  image: node:16-alpine
  before_script:
    - npm install
    - echo DATABASE_URL=$DATABASE_URL >> .env
    - echo DATABASE_CLIENT=$DATABASE_CLIENT >> .env
    - echo APP_KEYS=$APP_KEYS >> .env
    - echo NODE_ENV=development >> .env
  script:
    - npm run test
  after_script:
    - rm -rf *.env
  artifacts:
    when: always
  coverage: '/Total.*?([0-9]{1,3})%/'
  only:
    - dev
    - staging
```
For this stage the runner uses the `node:16-alpine` image. To begin with, let's focus on just three things: `before_script`, `script`, and `after_script`.
> Don't be fooled by the naming (before and after): these are executed in strict order, meaning the commands in `before_script` finish before `script` runs, and so on. If you need an interactive task or a task with a session, you have to do it all in one place (for example, if inside `script` you want to run command 'A' in an SSH session, you can't open the SSH connection in `before_script`; it has to be done in `script`).
The `before_script` here echoes key=value pairs of secrets into a `.env` file that will be used as environment variables; `script` then runs the tests themselves, and when it finishes, `after_script` deletes the `.env` file. The `artifacts` property is used when you want to store something, e.g. a file containing test results; you can use that artifact in subsequent stages of the same pipeline for any purpose. Lastly, there is the `only` property, which means "only run this job if there are changes on either the dev or staging branch".
#### 2. Static Code Analysis Stage
This stage is **critically important** if you are building apps that handle **trades or transactions**, **contain sensitive data**, or have **high user retention**, because static code analysis with a well-known tool can give you insight into bad practices, including known security vulnerabilities in some frameworks, so we can avoid releasing a "bad" product to production.
As of writing this article, I still haven't implemented this myself. I did use SonarQube, but I just didn't like the experience of using it, and I'm still looking for something better. If you know a good one, please leave a comment and tell me!
#### 3. Staging Stage
When you have a new set of features that has passed the previous stages, you can use this stage to integrate it into the current version of the app and see how the app behaves with the new changes.
There are many ways to implement this. What I do is simply have the runner connect to my temporary server over SSH and then run `docker run` or `docker compose up -d`.
Below is the example
```yaml=
Staging:
  stage: staging
  image: node:16-alpine
  before_script:
    - chmod 400 $SSH_KEY
    - apk update && apk add openssh-client
    - ssh -o StrictHostKeyChecking=no -i $SSH_KEY_STAGING $GCP_USERNAME@$GCP_STATIC_IP
    - |
      if [ ! -d "pegon-be" ]; then
        git clone -b staging $SSH_GIT_REPO
      fi
    - cd pegon-be
    - docker compose down
    - git pull origin staging
    - rm -f .env && rm -f meilisearch.env
  script:
    - echo DATABASE_URL=$DATABASE_URL >> .env
    - echo DATABASE_CLIENT=$DATABASE_CLIENT >> .env
    - echo APP_KEYS=$APP_KEYS >> .env
    - echo NODE_ENV=$NODE_ENV >> .env
    - echo ADMIN_JWT_SECRET=$ADMIN_JWT_SECRET >> .env
    - docker compose up -d --build
  after_script:
    - docker system prune -f
  only:
    - staging
```
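Depending on how your runner handles the `ssh` session, you may find it more predictable to wrap the whole remote sequence in a single `ssh` invocation instead of separate script lines. A hypothetical sketch, reusing the variables from the job above (adjust to your own setup):
```shell
# Run the whole remote deployment in one SSH session on the staging server
ssh -o StrictHostKeyChecking=no -i "$SSH_KEY_STAGING" "$GCP_USERNAME@$GCP_STATIC_IP" 'bash -s' <<'EOF'
cd pegon-be
git pull origin staging
docker compose up -d --build
docker system prune -f
EOF
```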
#### 4. Building Stage
This stage is pretty straightforward: we only need to tell the runner to build the Docker image and push it to our specified repository and registry. The script roughly looks like this:
```yaml=
Build_and_Publish:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  before_script:
    - echo $DOCKER_PASSWORD | docker login -u $REGISTRY_USER --password-stdin docker.io
  script:
    - ls
    - docker build -t $REGISTRY_USER/$IMAGE_NAME:$IMAGE_TAG .
    - docker push $REGISTRY_USER/$IMAGE_NAME:$IMAGE_TAG
  tags:
    - dind
  only:
    - main
```
>DO NOT, I repeat, DO NOT inject your SECRETS directly into the image at build time. This is bad security practice, because your secrets will end up baked into the image in the registry, where god-knows-what will happen to them.
There is something different here: you can see that we declare a service needed to run this job, namely `docker:dind`, which stands for "Docker in Docker". I honestly don't know why running Docker inside Docker has to be handled differently; perhaps it has something to do with virtualization or Docker internals. For now, let's just take it for granted.
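In line with the warning above, one simple way to keep secrets out of the image is to pass them only at run time, for example via an env file that lives on the server (file and image names below are hypothetical):
```shell
# Secrets stay on the host in production.env and are injected into the
# container only at run time, never baked into the image
docker run --detach --name my-app --env-file ./production.env my-app:latest
```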
#### 5. Production Deployment
Usually you would include this as another job/stage in your pipeline, but right now I am using a different method: [Watchtower](https://containrrr.dev/watchtower/), a service that automatically pulls and updates your running containers when changes are detected in the registry. Under the hood it works like a cron job that watches a container and its image tag. To learn more, please refer to the [docs](https://containrrr.dev/watchtower/usage-overview/).
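Based on the Watchtower docs, a minimal way to run it and limit it to a single container looks roughly like this (the container name is the one from the earlier example; adapt as needed):
```shell
# Watchtower needs the Docker socket to watch and restart containers;
# the trailing argument limits it to the pegon-be-prod container only
docker run --detach --name watchtower \
  -v /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower \
  pegon-be-prod
```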
In my case, I pair Watchtower with Discord, an instant messaging platform. Watchtower can then send me a notification through a Discord webhook whenever there is a new event, like this:

Voila! We have basically finished our automation. You can now rest assured, and touch some grass :green_heart:
<br>
<br>
## Conclusion
Containerizing and automating your app (and its lifecycle) is always a great idea. It might seem complex and tedious at the beginning, but it will save you a lot of effort throughout the software development life cycle. The tools and steps elaborated above are not absolute, nor are they the industry-approved best practice; therefore, we as software engineers still have to follow the consensus of the project's stakeholders.