Kubernetes Installation Report

Overview

written by:
modified by:

To get a better understanding of how Kubernetes works, here are the basic concepts and terms that will be mentioned several times in this documentation.

  • Docker

    Docker is an open-source containerization platform that allows developers to package applications and their dependencies into lightweight, portable containers that can be easily deployed across different environments, such as development, testing, and production.
    Docker provides a simple and consistent way to build, distribute, and run applications, making it easier to manage and deploy applications across multiple environments. It also provides tools for managing images (the base building block of containers), networking, and storage.

  • Kubernetes

    Kubernetes is an open-source platform developed by Google and now maintained by CNCF. It automates deployment, scaling, and management of containerized applications across multiple hosts. It abstracts the underlying infrastructure and provides tools for developers to focus on writing code and operations teams to manage the infrastructure and applications in a consistent and reliable way.

    To learn more about Kubernetes, please refer to this documentation.

  • Containerd

    Containerd is an open-source container runtime designed to provide a low-level interface for managing containers on a host. It is lightweight and portable, making it useful for various deployment scenarios. Containerd offers a consistent API for managing container images, containers, and their associated resources. One of its key features is its support for the Open Container Initiative (OCI) runtime and image specification, which ensures standardized container image building, sharing, and running across different container runtimes and platforms.

  • CNI (Container Network Interface)

    CNI plugins are responsible for setting up the network namespace, configuring network interfaces, and routing traffic between containers and the external network. The CNI specification allows for multiple networking plugins to be used in a Kubernetes cluster, giving users the flexibility to choose the networking solution that best suits their needs.

    When a container is created in Kubernetes, the Kubernetes networking system invokes the CNI plugin to set up the networking for the container. The CNI plugin can be used to configure different types of network setups, such as overlay networks or network bridges, depending on the requirements of the containerized application.
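
    For illustration, here is a minimal sketch of what a CNI configuration file in /etc/cni/net.d might look like, assuming a simple bridge network with host-local IPAM (a hypothetical example; real plugins such as Flannel or Calico generate their own configuration):

    {
      "cniVersion": "0.4.0",
      "name": "mynet",
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.244.0.0/16"
      }
    }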

    ***

    This documentation covers the Docker installation on Ubuntu 20.04. In addition, there are also some Kubernetes installation steps and the errors I encountered during the process.

    Note: This documentation mostly refers to this link, with a little bit of modification for error handling.

Part I: Time Configuration

Sometimes the server time set on Ubuntu does not match our local time. It is important to synchronize it so that we can access the secured websites we will use for the installation.

  1. We start by running as the root user

    ​​​​sudo su
    


  2. Update all packages to the latest version

    ​​​​sudo apt-get update && sudo apt-get -y install aptitude
    


    Optional: Although we run as root, it is useful to still include sudo in most of the commands we run.

  3. Do a safe upgrade

    ​​​​sudo aptitude update && sudo aptitude -y safe-upgrade
    


  4. Check if swap is enabled

    ​​​​sudo swapon --show
    


    If there is no output, swap is not enabled; you can double-check with:

    ​​​​free -h
    


  5. Disable it with

    ​​​​sudo swapoff -a
    


  6. Then remove the existing swapfile

    ​​​​sudo rm /swapfile
    


    Swap is a portion of hard drive storage used to temporarily store data that no longer fits in RAM. Disabling it will be fruitful for the cgroup driver configuration in a later step.

  7. Comment out the swap entry in /etc/fstab

    ​​​​sudo sed -i 's[/swap.img[#/swap.img[' /etc/fstab
    

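    You can verify the entry is now commented out (this assumes your swap file is /swap.img, as in the sed command above):

    grep swap /etc/fstab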

  8. Install the Ubuntu ntp package

    ​​​​sudo aptitude -y install ntp systemd-timesyncd
    


    Note: You can also do the installation one by one to ensure all the packages are properly set up. The NTP package will be used to synchronize the server's time with our local timezone.
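
    If the timezone itself is wrong, you can set it with timedatectl; Asia/Jakarta below is only an example, so substitute your own timezone:

    sudo timedatectl set-timezone Asia/Jakarta
    timedatectl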

  9. Restart the ntp daemon

    ​​​​sudo systemctl restart ntp
    


  10. Check the status of the ntp daemon with

    ​​​​sudo systemctl status ntp
    ​​​​ntpq -p
    


  11. Make sure the time matches your local time

    ​​​​date
    


    If the time has not updated to the current time, DO NOT continue; update/change the time manually first!
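
    A minimal sketch of a manual fix (the date string is only an example; replace it with the actual current time):

    sudo date -s "2023-01-10 09:00:00"
    sudo hwclock --systohc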

Part II: Install Container Runtime

  1. If your server is behind a proxy, you can set it up first in /etc/environment (optional)

    export HTTP_PROXY=http://proxy-ip:proxy-port
    export HTTPS_PROXY=http://proxy-ip:proxy-port

    If you are using Wireguard as your VPN, use the proxy IP and port from your Wireguard setup.

  2. Install dependencies
    ​​​​sudo aptitude -y install apt-transport-https ca-certificates curl software-properties-common socat jq httpie git sshpass bash-completion
    

Docker Runtime

  1. Add docker's official GPG key

    ​​​​curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    

  2. Add docker's repository

    ​​​​sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
    

  3. Update the package cache

    ​​​​sudo aptitude update
    

  4. To install a specific version of Docker Engine, start by listing the available versions in the repository

    ​​​​apt-cache madison docker-ce | awk '{ print $3 }'
    

  5. Select the desired version and install it. Here we pin version 19.03.15; the exact version string below is an assumption based on the usual Ubuntu focal naming, so copy the real value from the madison output in the previous step

    VERSION_STRING=5:19.03.15~3-0~ubuntu-focal
    sudo apt-get install docker-ce=$VERSION_STRING docker-ce-cli=$VERSION_STRING containerd.io
    

    You can also choose a newer version if you wish

  6. Fix the package version so that a distribution upgrade leaves the package at the correct version

    ​​​​sudo aptitude hold docker-ce docker-ce-cli containerd.io
    
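    To check that the installation succeeded, you can run Docker's standard test image:

    sudo docker --version
    sudo docker run hello-world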

Containerd Runtime

  1. Load two modules in the current running environment and configure them to load on boot

    ​​​​sudo modprobe overlay
    ​​​​sudo modprobe br_netfilter
    

  2. Add modules to config

    ​​​​cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
    ​​​​overlay
    ​​​​br_netfilter
    ​​​​EOF
    
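    You can verify that both modules are loaded:

    lsmod | grep -e overlay -e br_netfilter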

  3. Configure required sysctl to persist across system reboots

    ​​​​cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
    ​​​​net.bridge.bridge-nf-call-iptables  = 1
    ​​​​net.ipv4.ip_forward                 = 1
    ​​​​net.bridge.bridge-nf-call-ip6tables = 1
    ​​​​EOF
    

  4. Apply the sysctl parameters to the current running environment without a reboot

    ​​​​sudo sysctl --system
    
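    Verify the parameters took effect:

    sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward net.bridge.bridge-nf-call-ip6tables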

  5. Install containerd packages

    ​​​​sudo apt-get update 
    ​​​​sudo apt-get install -y containerd
    

  6. Create a containerd configuration file

    ​​​​sudo mkdir -p /etc/containerd
    ​​​​sudo containerd config default | sudo tee /etc/containerd/config.toml
    

  7. Edit the configuration in /etc/containerd/config.toml. Find the following section:

    ​​​​    ...
    ​​​​      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    

  8. Around line 112, change the value for SystemdCgroup from false to true.

    ​​​​    ...
    ​​​​    SystemdCgroup = true 
    ​​​​    ...
    
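    Equivalently, assuming SystemdCgroup appears only once in the file, you can make the change non-interactively:

    sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml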

  9. Restart containerd with the new configuration

    ​​​​sudo systemctl restart containerd
    
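    Confirm that containerd came back up healthy:

    sudo systemctl status containerd --no-pager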

Part III: Install Kubernetes Components

  1. If you are behind a proxy, set the proxies:

    ​​​​$ cat <<EOF >> $HOME/.bashrc
    ​​​​export HTTP_PROXY="http://proxy-ip:proxy-port"
    ​​​​export HTTPS_PROXY="http://proxy-ip:proxy-port"
    ​​​​export NO_PROXY="localhost,127.0.0.1,192.168.8.218,10.244.0.0/16,10.96.0.1,10.96.0.10"
    ​​​​EOF
    ​​​​$ source $HOME/.bashrc
    

    Proxy configurations in this documentation are all optional; you can adjust them if they are needed in a later step (see Part V).

    Make sure that the following IP address (or ranges) are part of the NO_PROXY list:

    • IP address of the server (192.168.8.218 in the above example)
    • 10.244.0.0/16: address range of Flannel CNI (if you use Calico or Weave as CNI, adapt accordingly)
    • 10.96.0.1 and 10.96.0.10: default private addresses (Cluster IP) for Kubernetes and kube-dns services
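
    You can check which proxy variables are currently active in your shell with:

    env | grep -i proxy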
  2. Add the kubernetes repo:

    ​​​​curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    

    If you get output as below, you can go on.

    ​​​​OK
    
  3. Add the Kubernetes repository to the apt sources list

    ​​​​echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
    

  4. Update the apt repository:

    ​​​​sudo aptitude update
    

  5. List all available versions of Kubernetes

    ​​​​curl -s https://packages.cloud.google.com/apt/dists/kubernetes-xenial/main/binary-amd64/Packages | grep Version | awk '{print $2}' 
    

  6. Install kubernetes command line tools version 1.23.12

    ​​​​VERSION=1.23.12-00
    ​​​​sudo aptitude -y install kubectl=$VERSION kubelet=$VERSION kubeadm=$VERSION
    
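    Verify the tools are installed at the pinned version:

    kubectl version --client
    kubeadm version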

  7. Hold the packages so that a distribution upgrade does not also upgrade the command line tools

    ​​​​sudo aptitude hold kubelet kubeadm kubectl
    

  8. The images can now be pulled

    ​​​​sudo kubeadm config images pull
    

Part IV: Initiate Kubernetes Deployment

  1. Now create the Kubernetes cluster. In the command below, modify the apiserver-advertise-address; this should be the IP address of your server:

    $ sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --kubernetes-version=v1.23.12 --apiserver-advertise-address=<host-ip>

    After a while, you should see the following line in the log, indicating that the above command was successful:

    Your Kubernetes control-plane has initialized successfully!

    Do not continue if you don't see this message!

Failing Preflight Check

If you encounter an error like the one below

I0110 01:44:44.249623    3162 version.go:256] remote version is much newer: v1.26.0; falling back to: stable-1.25
[init] Using Kubernetes version: v1.25.5
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR Port-6443]: Port 6443 is in use
        [ERROR Port-10259]: Port 10259 is in use
        [ERROR Port-10257]: Port 10257 is in use
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
        [ERROR Port-10250]: Port 10250 is in use
        [ERROR Port-2379]: Port 2379 is in use
        [ERROR Port-2380]: Port 2380 is in use
        [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher    

you need to reset everything first and remove the existing directories

sudo kubeadm reset 
sudo rm -r /var/lib/etcd

Then you can execute kubeadm init again.

  2. Finally, create a config file for kubernetes in the home directory and set the correct permissions:

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    export KUBECONFIG=$HOME/.kube/config
    cat <<EOF >> $HOME/.bashrc
    export KUBECONFIG=$HOME/.kube/config
    EOF
    source $HOME/.bashrc

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf
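
You can now verify that kubectl can reach the cluster:

kubectl get nodes

The control-plane node will likely report NotReady until a CNI is deployed in the next part.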

Part V: Deploy CNI

Flannel

  1. Install Flannel

    kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

  2. We need to remove the taint on the master node of the cluster so that it can schedule pods (including the coredns and Flannel pods). Note the trailing dash, which removes the taint:

    kubectl taint nodes --all node-role.kubernetes.io/master-

  3. The cluster should now be up and running

    $ kubectl cluster-info
    Kubernetes master is running at https://192.168.x.x:6443
    KubeDNS is running at https://192.168.x.x:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
    To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

  4. Optionally, install bash completion for kubernetes

    sudo aptitude -y install bash-completion
    echo "source <(kubectl completion bash)" >> $HOME/.bashrc
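
You can confirm that the Flannel and CoreDNS pods come up (depending on the manifest version, Flannel lands in either the kube-system or the kube-flannel namespace):

kubectl get pods --all-namespaces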

Calico

  1. Download the Calico networking manifest for the Kubernetes API datastore

    curl https://raw.githubusercontent.com/projectcalico/calico/v3.24.5/manifests/calico.yaml -O

  2. Apply the manifest using the following command

    kubectl apply -f calico.yaml

  3. The cluster should now be up and running

    $ kubectl cluster-info
    Kubernetes control plane is running at https://192.168.45.71:6443
    CoreDNS is running at https://192.168.45.71:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

    To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

When you are behind a proxy, it is possible that you will face an issue such as the one below:

Failed to create pod sandbox: rpc error: code = Unknown de80e9528307d6921db8610aa8acd30d4f9ff9299a385": plugin type="calico" failed (add): error getting ClusterInformation: Get ormations/default": Service Unavailable

This is because Calico traffic, including the pod CIDR, will be sent through the proxy. To solve the issue, follow these steps:

  1. Uninstall kubernetes
  2. Set NO_PROXY in /etc/systemd/system/containerd.service.d/http-proxy.conf

    [Service]
    Environment="HTTP_PROXY=<proxy-ip>:port"
    Environment="HTTPS_PROXY=<proxy-ip>:port"
    Environment="NO_PROXY=localhost,127.0.0.1,10.0.0.0/8"

In the above case we use 10.0.0.0/8 since we used 10.244.0.0/16 when doing kubeadm init. Change it according to your CIDR.

  3. Reboot
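
If you prefer not to do a full reboot, reloading systemd and restarting containerd should also pick up the new drop-in:

sudo systemctl daemon-reload
sudo systemctl restart containerd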

Part VI: Taints

If your pod does not get scheduled, your controller probably has a taint on it that prevents it from scheduling pods. Let's untaint it.

$ kubectl describe node <nodename> | grep Taints
$ kubectl taint nodes --all node-role.kubernetes.io/master-

You can run kubectl get nodes to find your node name
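
If the node is tainted, the describe output will show something like this (illustrative output):

Taints:             node-role.kubernetes.io/master:NoSchedule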

Part VII: Join node

To join the cluster, a node should already have kubelet, kubeadm, kubectl, and a container runtime on board. If it doesn't, you can repeat the steps from Configuration-and-Additional-Packages
up until the Install Kubernetes Components section. Then continue with the steps below:

  1. Generate a token

    $ kubeadm token generate
    hiyiz8.j7uyt9s11w7oioe4

  2. Create the token and print the join command, including the CA cert hash

    $ kubeadm token create hiyiz8.j7uyt9s11w7oioe4 --print-join-command
    kubeadm join 192.168.45.71:6443 --token hiyiz8.j7uyt9s11w7oioe4 --discovery-token-ca-cert-hash sha256:603fb24e0077feeee4cceb7274cd4b18f4d2bae674f9830fbbeb7e584ea8be44

  3. Then run the printed kubeadm join command on the node

If the node status is still NotReady after a while, it means something is wrong with the node.

NAME         STATUS     ROLES           AGE   VERSION
controller   Ready      control-plane   75m   v1.25.4
edge         NotReady   <none>          56m   v1.25.4

To solve this, do these steps:
1. Drain the node (note that --ignore-daemonsets and --delete-local-data are drain flags, not delete flags)
kubectl drain <node_name> --ignore-daemonsets --delete-local-data
2. Delete the node
kubectl delete node <node_name>
3. Go to the worker node, uninstall kubernetes, and then clean all existing directories and files
4. Reboot the worker node
5. Do the kubeadm join again

  4. Finally, add a role to the node

    kubectl label node <node_name> node-role.kubernetes.io/worker=worker
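
Verify the label:

kubectl get nodes

The worker should now show worker in its ROLES column.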

Part VIII: Install Helm

The installation of Helm, the package manager for Kubernetes, is quite straightforward:

$ curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > get_helm.sh
$ chmod 700 get_helm.sh
$ ./get_helm.sh --version v3.8.2
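
A quick sanity check that Helm is installed and on the PATH:

$ helm version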

Appendix

Uninstall Kubernetes

  1. Stop and uninstall the kubernetes cluster:

    sudo kubeadm reset

  2. Then clean all existing directories and files (the vxlan.calico commands apply only if you used Calico as the CNI):

    rm -rf $HOME/.kube/config*
    sudo ip link set vxlan.calico down
    sudo ip link delete vxlan.calico
    sudo rm -rf /var/lib/cni/
    sudo rm -rf /etc/cni/net.d

  3. Reboot

    sudo reboot

Note: if you already have helm charts, these need to be removed first.

Side Notes

These notes describe the errors that I encountered during the installation process, along with their solutions.

Unable to add Kubernetes repo (solved)

When I tried to add Kubernetes repo as below

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

It took a long time until an error notification came up.

I suspect this is due to the proxy setting mentioned in the former step. I have tried other combinations (i.e. deleting the Flannel CNI IP address, deactivating and reactivating the VPN tunnel, and changing the listening port). However, the result came up with the same error.

Solution: The solution was provided by Fransiscus Bimo.

Failing preflight check (solved)

I tried to create the Kubernetes cluster using the following command (Part IV, step 1)

$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --kubernetes-version=v1.23.12 --apiserver-advertise-address=192.168.8.218

However, the output only showed the error messages below

[init] Using Kubernetes version: v1.23.12
[preflight] Running pre-flight checks
        [WARNING HTTPProxy]: Connection to "https://192.168.8.218" uses proxy "http://10.6.0.29:59005". If that is not intended, adjust your proxy settings
        [WARNING HTTPProxyCIDR]: connection to "10.96.0.0/12" uses proxy "http://10.6.0.29:59005". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
        [WARNING HTTPProxyCIDR]: connection to "10.244.0.0/16" uses proxy "http://10.6.0.29:59005". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

I tried to remove the existing folder and restart using the commands below, but the output still came out with the same error message.

sudo kubeadm reset 
sudo rm -r /var/lib/etcd

I have also re-checked the proxy IPs that I adjusted earlier in Part II, step 1 and Part III, step 1

export HTTP_PROXY=http://proxy-ip:proxy-port
export HTTPS_PROXY=http://proxy-ip:proxy-port
$ cat <<EOF >> $HOME/.bashrc
export HTTP_PROXY="http://proxy-ip:proxy-port"
export HTTPS_PROXY="http://proxy-ip:proxy-port"
export NO_PROXY="localhost,127.0.0.1,<server-ip>.218,10.244.0.0/16,10.96.0.1,10.96.0.10"
EOF
$ source $HOME/.bashrc
  1. Server IP: 192.168.8.218 (provided earlier by Bimo)
  2. Proxy IP: 10.6.0.29 (as shown in part II, step 1)
  3. Proxy port: 59005 (by the current time I write this note)

It has already been added as mentioned. Nevertheless, it still shows the same error message.

Solution: Remove all the proxy configurations and try to run the command again; the warning notifications disappear, but the error still occurs. It turns out I had not configured the cgroup driver. So, by setting the container runtime and the kubelet to use the systemd cgroup driver, we can reset and restart the process with the following commands:

sudo mkdir -p /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF
sudo systemctl enable docker
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo kubeadm reset
sudo kubeadm init
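
You can confirm that Docker is now using the systemd cgroup driver:

docker info | grep -i cgroup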

Now the Kubernetes cluster has been successfully created.