
/dev/log for cloud-1 (1st attempt)

Week 1

Sandbox provisioning

One of the tools that was suggested is Ansible, an automation tool (something like a remote command executor). As with any new tech that involves infra, I made a virtual machine as a sandbox to experiment with it.

I also found this book which I will be using as my guide.


I also installed Ansible according to the book; this one command handled everything for us:

sudo pip install ansible

The only prerequisites were python and pip.

Vagrant (server) setup

To have something to act as my "server" to deploy applications on, the guide suggested using Vagrant as a starter, and so I did. Too bad the book was a bit outdated on the versioning and I had to improvise some bits myself.


We did get the VM running and we are able to SSH into it. It is important to know the SSH credentials because they will be used by Ansible to access the machines. Here are the SSH credentials that I obtained:

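The screenshot is gone, but vagrant ssh-config is the command that prints them. For a default VirtualBox box the output looks roughly like this (the port and key path may differ on your machine):

$ vagrant ssh-config
Host default
  HostName 127.0.0.1
  User vagrant
  Port 2222
  IdentityFile .vagrant/machines/default/virtualbox/private_key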

Before we can test the connectivity between Ansible and our VM (or any server), we need to first let Ansible know about our server. We can do that by creating an inventory file.

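The screenshots are lost, so here is a minimal sketch of what the two files looked like, assuming Vagrant's default forwarded SSH port (2222) and the generated key path from above (the host alias testserver is my own naming):

# hosts
[webservers]
testserver ansible_host=127.0.0.1 ansible_port=2222 ansible_user=vagrant ansible_private_key_file=.vagrant/machines/default/virtualbox/private_key

# ansible.cfg
[defaults]
inventory = ./hosts
host_key_checking = False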

The first file is the hosts file, which defines the hostnames as well as the SSH credentials. The second file is the config file, which tells Ansible how to behave.

Once everything seems good, we can use this command to test the connectivity:

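The missing screenshot showed Ansible's ad-hoc ping module; assuming the inventory sketch above, the command is:

ansible testserver -m ping

A successful run replies with "pong", confirming that Ansible can SSH in and execute Python on the target.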

Writing a playbook

So now that my Ansible instance can connect to my VM, it's time to use that connection to run some actual commands. With that said, we define the imperative instructions in a YAML file, which follows a certain format that constitutes it as a playbook.

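Since the playbook screenshot is gone too, here is a minimal sketch of what it would have contained, based on the description below (not the exact file):

---
- name: Configure webserver with nginx
  hosts: webservers
  become: True
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        update_cache: yes

    - name: Copy nginx config file
      copy:
        src: files/nginx.conf
        dest: /etc/nginx/sites-available/default

    - name: Enable configuration
      file:
        dest: /etc/nginx/sites-enabled/default
        src: /etc/nginx/sites-available/default
        state: link

    - name: Copy index.html
      copy:
        src: files/index.html
        dest: /usr/share/nginx/html/index.html

    - name: Restart nginx
      service:
        name: nginx
        state: restarted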

Here, Ansible will SSH into every machine in the webservers group and then install and configure nginx using builtin Ansible modules. To run this playbook, use the command ansible-playbook playbook.yml.

Once the command completes, you should be able to access the webpage on localhost


TLS configuration

To support HTTPS in production, we would need a certificate from a certificate authority such as Let's Encrypt. For dev, we can generate our own self-signed certificate with openssl. I ran this command

openssl req -x509 -nodes -days 3650 -newkey rsa:2048 \
    -subj /CN=localhost \
    -keyout files/nginx.key -out files/nginx.crt

to generate a new self-signed cert, and changed the playbook to:

---
- name: Configure webserver with nginx and tls
  hosts: webservers
  become: True
  vars:
    key_file: /etc/nginx/ssl/nginx.key
    cert_file: /etc/nginx/ssl/nginx.crt
    conf_file: /etc/nginx/sites-available/default
    server_name: localhost
  tasks:
    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        update_cache: yes
        cache_valid_time: 3600

    - name: Create directories for ssl certificates
      ansible.builtin.file:
        path: /etc/nginx/ssl
        state: directory

    - name: Copy TLS key
      ansible.builtin.copy:
        src: files/nginx.key
        dest: "{{ key_file }}"
        owner: root
        mode: '0600'
      notify: restart nginx

    - name: Copy TLS certificate
      ansible.builtin.copy:
        src: files/nginx.crt
        dest: "{{ cert_file }}"
      notify: restart nginx

    - name: Copy nginx config file
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: "{{ conf_file }}"
      notify: restart nginx

    - name: Enable configuration
      ansible.builtin.file:
        dest: /etc/nginx/sites-enabled/default
        src: "{{ conf_file }}"
        state: link
      notify: restart nginx

    - name: Copy index.html
      ansible.builtin.template:
        src: templates/index.html.j2
        dest: /usr/share/nginx/html/index.html
        mode: '0644'

  handlers:
    - name: restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted

This unlocks a lot of new features such as handlers, templating, fact variables and so on. At this point, we should be able to connect to our website over HTTPS.


And there we go

Week 2

Requirement gathering and infrastructure planning

According to the subject, we gathered the following requirements:

  • High frontend server availability
    • Fault tolerance
    • Load balancing and autoscaling
    • CDN
  • Database and data persistence
    • Data survives container restarts
    • Data survives machine destruction
    • Automated backups / snapshots?
  • Firewall to allow only my IP to SSH
  • phpMyAdmin container needs to be installed on the frontend server or a standalone server
  • docker-compose to contain all container build instructions
  • TLS support (Let's Encrypt in WordPress or an nginx wrapper?)
  • Ansible playbook to redeploy containers (I don't think this is needed)

I don't know if that's correct, but there's only one way to find out, I guess.

Solution discovery

There are many cloud providers out there which can do this for free, with the most notorious ones being AWS and GCP. No Azure because it's Windows-based, and no Scaleway because I'm not French.

After comparing the free tier plans, I noticed that GCP has more flexibility but fewer features than AWS. Since this is a short-term project, I gave AWS the edge, since flexibility isn't going to give me an advantage if I don't plan to maintain my project or scale it further.

We need something to host our containers, so I plan to use EC2 instances for general-purpose computing.

For load balancing with auto-scaling, we can use ELB to provide the routing logic to an EC2 Auto Scaling group.

For the CDN, AWS has CloudFront, which is needed to improve load times of static files and offers security features such as AWS Shield Standard and WAF.

To make sure I don't have any data loss, I will store my MySQL data in an EBS volume, which is isolated from the EC2 instances and recoverable after an EC2 instance is destroyed.

My initial plan is like so

DB instance creation

While creating your first EC2 instance, you need to create a new key pair to be able to SSH into it. Here it is on the EC2 instance creation screen

For now, I'm going to allow internet access to the SSH port for debugging. In production, this will live inside a VPC and will only be exposed to the frontend servers.

I am able to SSH into the DB instance using the command below, where ssh/jng.pem is the private key I got from the AWS console when I created the key pair.

ssh -i ssh/jng.pem ec2-user@ec2-52-64-169-82.ap-southeast-2.compute.amazonaws.com

However, the VM is currently barren: no git or docker, only yum and python3. This might be a good time to write an Ansible playbook to bootstrap the EC2 instance.

Playbook to bootstrap the database EC2 instance

I need to create a hosts file to tell Ansible which server to connect to; the contents are below.

[database]
dbserver ansible_host=<REDACTED> ansible_user=ec2-user ansible_private_key_file=../ssh/jng.pem

I also made an ansible.cfg to provide context to the ansible application

[defaults]
inventory = ./dbhosts
host_key_checking = False

Sanity check to test connectivity:

ansible database -m ping

Create a playbook with the contents

# playbooks/bootstrap.yaml
---
- name: Bootstrap a database EC2 instance
  hosts: database
  become: True
  
  tasks:
    - name: Install Git package
      yum:
        name: git
        state: present

To run the playbook, run ansible-playbook playbooks/bootstrap.yaml

After running, you can SSH into the machine manually to verify that git is indeed installed
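
Alternatively, an ad-hoc command can do the same verification without a manual SSH session:

ansible database -a "git --version"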

I proceeded to write a docker-compose.yml with the instructions to start a MySQL container.

version: "3.9"
services:
  mysql:
    image: mysql:latest
    volumes:
      - ./mysql_data:/var/lib/mysql # change this to EBS soon
    environment:
      - MYSQL_ROOT_PASSWORD=password

and I modified the playbook to install Docker and docker-compose, and to run the container:

# playbooks/bootstrap-db.yaml
---
- name: Bootstrap a database EC2 instance
  hosts: database
  become: True
  
  tasks:
    - name: Install Git package
      yum:
        name: git
        state: present

    - name: Install Docker package
      yum:
        name: docker
        state: present

    - name: Run and enable docker
      service:
        name: docker
        state: started
        enabled: true

    - name: Install or upgrade docker-compose
      get_url:
        url: "https://github.com/docker/compose/releases/download/v2.21.0/docker-compose-linux-x86_64"
        dest: /usr/local/bin/docker-compose
        mode: 'a+x'
        force: yes
      when: docker_compose_current_version is not defined or docker_compose_current_version is version(docker_compose_version, '<')

    - name: Clone repo with build instructions
      git:
        repo: https://github.com/neosizzle/cloud-1.git
        dest: /home/ec2-user/cloud-1
        clone: yes
        update: yes

    - name: Creates mysql data directory
      file:
        path: /home/ec2-user/cloud-1/docker/mysql_data
        state: directory

    - name: Run docker-compose up on docker/docker-compose.yml
      command: docker-compose up -d mysql
      args:
        chdir: /home/ec2-user/cloud-1/docker

And now we should be able to see the container running in the EC2 instance
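
A quick way to verify is another ad-hoc command (or SSH in and run docker ps directly):

ansible database -a "docker ps" --become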

Auto backup

To simulate a hardware failure, let's make the instance inaccessible by disabling its network interface:

sudo ifconfig enX0 down

Now, to make sure this incident gets detected, we can use CloudWatch to monitor the EC2 instance status and generate alerts if something is not right (in this case, our instance being unreachable).

With this in place, we now want a task to run once an alert appears. For the first part (a task to run), we can use Lambda functions, which are code that runs in the cloud and is triggered by events happening in your infra, which covers the second part (once an alert appears).

Here is an example of a bootstrap Lambda function, which just prints hello world for the time being. I want to configure this to run once an alert is generated by my CloudWatch alarm.
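
The screenshot is gone; a minimal Python handler along those lines would be (my reconstruction, not the exact code):

# lambda_function.py
import json

def lambda_handler(event, context):
    # placeholder body: just prove the trigger works
    print("hello world")
    # log the incoming event so we can inspect what CloudWatch sends us
    print(json.dumps(event))
    return {"statusCode": 200}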

To make my CloudWatch events detectable by the Lambda function, I need to make an EventBridge rule where the source is CloudWatch and the target is the Lambda function I made.
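
The event pattern for such a rule would look roughly like this (a sketch; in practice you can also filter on specific alarm names):

{
  "source": ["aws.cloudwatch"],
  "detail-type": ["CloudWatch Alarm State Change"],
  "detail": {
    "state": {
      "value": ["ALARM"]
    }
  }
}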

And once the two are ready, I added that EventBridge rule as a trigger to my Lambda function.

Let's goo, looks like I got a hit.

Fuckup 1

You can't have one alarm for multiple instances together, which means every time a new instance is created, a new alarm must be created. This is tedious. I'm just going to reboot the instance on failure instead.

And as we can see, the error state resolved itself

And the database container started up.

phpmyadmin and Security groups

So now we will create an EC2 instance to run phpmyadmin, with the compose file below

# ...
phpmyadmin:
  image: phpmyadmin/phpmyadmin
  environment:
    PMA_HOST: 172.31.0.82 # Put the private ip address for the db instance here
    PMA_PORT: 3306
    PMA_USER: root
    PMA_PASSWORD: password
  ports:
    - 80:80
  volumes:
    - /sessions
  restart: always

We will also create an ansible playbook to automate the provisioning of the instance, with the contents below

# playbooks/bootstrap-phpmyadmin.yaml
---
- name: Bootstrap a phpmyadmin EC2 instance
  hosts: phpmyadmin
  become: True

  tasks:
    - name: Install Git package
      yum:
        name: git
        state: present

    - name: Install Docker package
      yum:
        name: docker
        state: present

    - name: Run and enable docker
      service:
        name: docker
        state: started
        enabled: true

    - name: Install or upgrade docker-compose
      get_url:
        url: "https://github.com/docker/compose/releases/download/v2.21.0/docker-compose-linux-x86_64"
        dest: /usr/local/bin/docker-compose
        mode: 'a+x'
        force: yes
      when: docker_compose_current_version is not defined or docker_compose_current_version is version(docker_compose_version, '<')

    - name: Clone repo with build instructions
      git:
        repo: https://github.com/neosizzle/cloud-1.git
        dest: /home/ec2-user/cloud-1
        clone: yes
        update: yes

    - name: Run docker-compose up on docker/docker-compose.yml
      command: docker-compose up -d phpmyadmin
      args:
        chdir: /home/ec2-user/cloud-1/docker

When all is said and done, we should be able to connect to the public IP of the phpmyadmin instance

And as we might have noticed, there seems to be an error with the configuration for the MySQL server connectivity. This is because we are trying to connect to our database instance at 172.31.0.82, which is not exposed to other instances.

After I added a security group rule to allow the phpmyadmin instance through, I should be able to view the data now.
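
I did this in the console, but an equivalent AWS CLI call would look roughly like this (the group IDs are placeholders):

# allow MySQL (3306) from the phpmyadmin instance's security group
aws ec2 authorize-security-group-ingress \
    --group-id sg-DBSERVER \
    --protocol tcp --port 3306 \
    --source-group sg-PHPMYADMIN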

Wordpress

This is getting a bit repetitive: we will create a docker-compose section and an Ansible playbook to provision a WordPress container.

There will be new directories created:

  • ./nginx-conf will contain the files needed for our nginx webserver configuration
  • ./ssl will contain our self-signed SSL credentials
  • ./wordpress_data will contain the WordPress installation files
# docker-compose.yml
# ---
wordpress:
  image: wordpress:latest
  volumes:
    - ./wordpress_data:/var/www/html # change this to EBS soon
  environment:
    WORDPRESS_DB_HOST: 172.31.0.82:3306 # Put the private ip address for the db instance here
    WORDPRESS_DB_USER: wordpress
    WORDPRESS_DB_PASSWORD: wordpresspassword
    WORDPRESS_DB_NAME: wordpress

webserver:
  depends_on:
    - wordpress
  image: nginx:latest
  volumes:
    - ./wordpress_data:/var/www/html
    - ./nginx-conf:/etc/nginx/conf.d
    - ./ssl:/etc/nginx/ssl
  ports:
    - "80:80"
    - "443:443"
# playbooks/bootstrap-wp.yaml
---
- name: Bootstrap a wordpress EC2 instance
  hosts: wp
  become: True

  tasks:
    - name: Install Git package
      yum:
        name: git
        state: present

    - name: Install Docker package
      yum:
        name: docker
        state: present

    - name: Run and enable docker
      service:
        name: docker
        state: started
        enabled: true

    - name: Install or upgrade docker-compose
      get_url:
        url: "https://github.com/docker/compose/releases/download/v2.21.0/docker-compose-linux-x86_64"
        dest: /usr/local/bin/docker-compose
        mode: 'a+x'
        force: yes
      when: docker_compose_current_version is not defined or docker_compose_current_version is version(docker_compose_version, '<')

    - name: Clone repo with build instructions
      git:
        repo: https://github.com/neosizzle/cloud-1.git
        dest: /home/ec2-user/cloud-1
        clone: yes
        update: yes

    - name: Creates wordpress_data directory
      file:
        path: /home/ec2-user/cloud-1/docker/wordpress_data
        state: directory

    - name: Creates ssl directory
      file:
        path: /home/ec2-user/cloud-1/docker/ssl
        state: directory

    - name: Run command to generate self signed cert
      command: openssl req -x509 -newkey rsa:4096 -nodes -out cert.pem -keyout key.pem -days 365
      args:
        chdir: /home/ec2-user/cloud-1/docker/ssl

    - name: Run docker-compose up on docker/docker-compose.yml
      command: docker-compose up -d wordpress webserver
      args:
        chdir: /home/ec2-user/cloud-1/docker
# docker/nginx-conf/default.conf
server {
    listen 80;
    server_name _; # public ip of wordpress server

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name _; # public ip of wordpress server

    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;

    location / {
        proxy_pass http://wordpress;
    }
}

Of course, don't forget to change the security group for the database instance to allow your WordPress instance to connect.

And with the configuration above, we get these

This isn't exactly correct, since the CSS should be loaded as well. Upon inspecting the network tab, it turns out there are some hostname errors.

Our nginx server just reverse-proxies all requests to the WordPress container. While the nginx container itself knows where to route to, the client does not. Hence we are unable to get things like fonts and CSS.

Then I remembered that for our inception project, we copied the pre-installed files into nginx's static folder and just used that as the root instead.

For the PHP files, we will route them to WordPress's PHP-FPM as a CGI controller.

Hence, we will change our docker-compose like so:

# ...
wordpress:
  image: wordpress:php8.2-fpm
  volumes:
    - ./wordpress_data:/var/www/html # change this to EBS soon
  environment:
    WORDPRESS_DB_HOST: 172.31.0.82:3306 # Put the private ip address for the db instance here
    WORDPRESS_DB_USER: wordpress
    WORDPRESS_DB_PASSWORD: wordpresspassword
    WORDPRESS_DB_NAME: wordpress
  restart: always

which only exposes the FastCGI application on port 9000. We will handle the static files on the nginx side. We change the nginx config to:

# docker/nginx-conf/default.conf
server {
    listen 80;
    server_name _; # public ip of wordpress server

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name _; # public ip of wordpress server

    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;

    root /var/www/html/;

    # declare index files to serve
    index index.html index.htm index.nginx-debian.html index.php;

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
    }

    location ~ \.php$ {
        fastcgi_split_path_info ^(.+\.php)(/.+)$; # regex to split path route
        fastcgi_pass wordpress:9000; # proxy to fastcgi server running on another container
        fastcgi_index index.php; # index file to serve
        include fastcgi_params; # include other vital default fastcgi parameters
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; # path on container to obtain all php scripts
        fastcgi_param PATH_INFO $fastcgi_path_info; # another vital fastcgi param which I have no idea what it is yet
    }
}

And voilà, looks better now.


Autoscaling and load balancing

Before we create an auto scaling group, a launch template needs to be made to tell new EC2 instances how to provision themselves.

It consists of just the initial configuration of the EC2 instance (OS to use, security rules) and a userdata script. The userdata script is a shell script that is run as root to install and run stuff.

The user data script that I used for my launch template is:

#!/bin/bash
# Update the instance and install Docker
yum update -y
yum install -y git docker

# Start and enable Docker service
service docker start
chkconfig docker on

# Install Docker Compose dependencies
yum install -y gcc libffi-devel python3-devel openssl-devel libxcrypt-compat

# Install libcrypt package
yum install -y libcrypt

# Install Docker Compose
curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

# Clone the repository
git clone https://github.com/neosizzle/cloud-1.git /home/ec2-user/cloud-1

# Create necessary directories
mkdir /home/ec2-user/cloud-1/docker/wordpress_data
mkdir /home/ec2-user/cloud-1/docker/ssl

# Generate SSL certificate
cd /home/ec2-user/cloud-1/docker/ssl
openssl req -x509 -nodes -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -subj "/C=MY/ST=Selangor/L=Kuala Lumpur"

# Start Docker Compose
cd /home/ec2-user/cloud-1/docker
docker-compose down
docker-compose up -d wordpress webserver

I launched a test EC2 instance and got it to work, but it's not detecting any sessions because of the absence of load balancing and a domain name.

To get it to work, I have to create an auto scaling group that specifies the minimum number of instances I want. It also creates a Target Group that groups all our AWS instances and specifies the routed destinations (port 443 of those instances).

Then, I will need to make two load balancers, for HTTP and HTTPS traffic. The HTTP load balancer will redirect to the HTTPS one, and the HTTPS load balancer shall distribute traffic to the target group.

Then, I would need to create a default SSL certificate key pair for the load balancers that are listening on HTTPS, and register it with ACM:

openssl req -x509 -nodes -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365

Everything looks OK, except that for some reason my WordPress can't access wp-admin and just redirects me to the login page even after a successful login. I suspect it's something to do with my nginx.

I applied the fix like so; it's also a good time to test the auto scaling feature now.

# docker/nginx-conf/default.conf
server {
    listen 80;
    server_name _; # public ip of wordpress server

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name _; # public ip of wordpress server

    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;

    root /var/www/html/;

    # declare index files to serve
    index index.html index.htm index.nginx-debian.html index.php;

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
    }

    location ~ \.php$ {
        fastcgi_split_path_info ^(.+\.php)(/.+)$; # regex to split path route
        fastcgi_pass wordpress:9000; # proxy to fastcgi server running on another container
        fastcgi_index index.php; # index file to serve
        include fastcgi_params; # include other vital default fastcgi parameters
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; # path on container to obtain all php scripts
        fastcgi_param PATH_INFO $fastcgi_path_info; # another vital fastcgi param which I have no idea what it is yet
        fastcgi_param PATH_TRANSLATED $document_root$fastcgi_path_info; # I was missing this param
    }
}

Actually it was not fixed until I enabled this option in my target group settings

And it turns out my instances get recreated if I stop them and the target group auto-updates, which is nice.




Auto scaling and Load balancing verification

To test the load balancing feature, I will display the docker logs for the nginx containers side by side and make sure both of them are getting hits when I make a request to the load balancer's DNS.
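
(On each instance, something like the following, run from the compose directory:)

cd /home/ec2-user/cloud-1/docker
sudo docker-compose logs -f webserver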

As you can see, both servers get hits when I make a request, hence validating the load balancer.

demo here


I also made an alarm to alert the system when any of the EC2 instances in the auto scaling group is down, and added a dynamic scaling policy which is triggered by the alarm. Here are the activity logs to show that it's working.

Notifications

I created an SNS topic to allow my infra to send out emails, and I attached it to the CloudWatch alarm like so.

And once an instance fails, here is the email

Week 3

CDN

I set up AWS CloudFront to act as a CDN for my system.

I also set up WAF on CloudFront with default settings, which alone does not do much; however, if properly configured with other services like AWS Shield, it can protect against common attacks like DDoS, SQL injection, XSS and much more.


This is unfortunate; I guess I'll need to reach out to support for this matter. Guess I'll have to wait for now.

Solved within 2 hours, but I reread the PDF and found out I might not need it lol.

Security

I have changed all the SSH rules to only allow my own IP, and that should be it.

Cost containment