Installing OKD with Assisted Installer

Getting started

Assisted Installer is a web service that helps install OCP/OKD using the UPI (user-provisioned infrastructure) method. The service generates a discovery ISO; hosts booted from it register with the centralized service and accept commands from it.
This method prevents the most common mistakes (e.g. network misconfiguration, missing DNS records) and visualizes the installation process. Unlike the IPI (installer-provisioned infrastructure) method, Assisted Installer relies on user-provided infrastructure, which gives more options when it comes to architecture choices.

Running Assisted Service

git clone https://github.com/openshift/assisted-service/

Customize service URLs:

[vrutkovs@centos8stream assisted-service]$ git diff
diff --git a/deploy/podman/okd-configmap.yml b/deploy/podman/okd-configmap.yml
index 0fc523cf..206ab5a5 100644
--- a/deploy/podman/okd-configmap.yml
+++ b/deploy/podman/okd-configmap.yml
@@ -3,7 +3,7 @@ kind: ConfigMap
 metadata:
   name: config
 data:
-  ASSISTED_SERVICE_HOST: 127.0.0.1:8090
+  ASSISTED_SERVICE_HOST: 34.118.84.65:8090
   ASSISTED_SERVICE_SCHEME: http
   AUTH_TYPE: none
   DB_HOST: 127.0.0.1
@@ -16,7 +16,7 @@ data:
   DUMMY_IGNITION: "false"
   ENABLE_SINGLE_NODE_DNSMASQ: "false"
   HW_VALIDATOR_REQUIREMENTS: '[{"version":"default","master":{"cpu_cores":4,"ram_mib":16384,"disk_size_gb":120,"installation_disk_speed_threshold_ms":10,"network_latency_threshold_ms":100,"packet_loss_percentage":0},"worker":{"cpu_cores":2,"ram_mib":8192,"disk_size_gb":120,"installation_disk_speed_threshold_ms":10,"network_latency_threshold_ms":1000,"packet_loss_percentage":10},"sno":{"cpu_cores":8,"ram_mib":16384,"disk_size_gb":120,"installation_disk_speed_threshold_ms":10}}]'
-  IMAGE_SERVICE_BASE_URL: http://127.0.0.1:8888
+  IMAGE_SERVICE_BASE_URL: http://34.118.84.65:8888
   IPV6_SUPPORT: "true"
   LISTEN_PORT: "8888"
   NTP_DEFAULT_SERVER: ""
@@ -24,8 +24,8 @@ data:
   POSTGRESQL_PASSWORD: admin
   POSTGRESQL_USER: admin
   PUBLIC_CONTAINER_REGISTRIES: 'quay.io'
-  SERVICE_BASE_URL: http://127.0.0.1:8090
+  SERVICE_BASE_URL: http://34.118.84.65:8090

These URLs must be reachable from the nodes so that the agent can report its status to the server and fetch commands from it.
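
The same change can be scripted instead of edited by hand. The following is a minimal sketch, not part of the assisted-service repository: the function name set_service_address is made up for illustration, and it assumes the three keys and the ports 8090 and 8888 appear exactly as in the diff above.

```shell
# Hypothetical helper: point the three service URLs in the configmap at an
# address the nodes can reach. Key names and ports are taken from the diff above.
set_service_address() {
  addr="$1"
  file="$2"
  tmp="$(mktemp)" || return 1
  # Each expression is anchored to the two-space indent, so SERVICE_BASE_URL
  # does not also match IMAGE_SERVICE_BASE_URL.
  sed -E \
    -e "s|^(  ASSISTED_SERVICE_HOST: ).*|\1${addr}:8090|" \
    -e "s|^(  IMAGE_SERVICE_BASE_URL: ).*|\1http://${addr}:8888|" \
    -e "s|^(  SERVICE_BASE_URL: ).*|\1http://${addr}:8090|" \
    "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, set_service_address 34.118.84.65 deploy/podman/okd-configmap.yml would reproduce the diff shown above; all other keys are left untouched.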

If you're running in a disconnected environment, also update the FCOS URLs in OS_IMAGES and the image pullspecs in RELEASE_IMAGES and OKD_RPMS_IMAGES.

Let's run it:

[vrutkovs@centos8stream assisted-service]$ make deploy-onprem OKD=true
podman play kube --configmap deploy/podman/okd-configmap.yml deploy/podman/pod.yml
Pod:
3e7ae70150dd78758e59131516c356ee63797d06addce81e51ce84904918573c
Containers:
8ec9753fa785b890cfa64f4f0430ee1f0638e1f97cad0cb28caf5a935af57a70
bc9d7aed1dd98116ba4879d27fee00a8e15257274a594681caba90fabaace9eb
17e97906412439f51167afa55c56f95bea2378bcbca196a799078b3d503e3da1
caa66751a213ed0a35bb578838fcee7d59392a48bbcb66d3aaf9ed2fbd5eaf6d

./hack/retry.sh 90 2 "curl -f http://127.0.0.1:8090/ready"
curl -f http://127.0.0.1:8090/ready
curl: (56) Recv failure: Connection reset by peer
> failed with exit code 56, waiting 2 seconds to retry...

Once the health checks pass, ASSISTED_SERVICE_HOST can be opened in the browser.
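
The readiness wait in the output above (retry a curl against /ready until it succeeds) can be sketched as a small retry loop. This is an illustrative stand-in, not the repository's hack/retry.sh: the function name retry and its argument order are assumptions, and unlike the script above it takes the command as separate arguments rather than one quoted string.

```shell
# Hypothetical helper: retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between failed attempts.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, waiting ${delay} seconds to retry..." >&2
    n=$((n + 1))
    sleep "$delay"
  done
}
```

With this in place, retry 90 2 curl -f http://127.0.0.1:8090/ready waits for the service in the same spirit as the make target does.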


Discovering hosts

Click "Create new cluster" and fill in the form:


Click "Next" below, then click the "Add Hosts" button:

We're using the "minimal" image, which pulls the rootfs from builds.fedoraproject.org and has an SSH key embedded.
Click "Generate Discovery ISO".


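The dialog exposes the discovery ISO used to boot every node. Note that no Ignition config is needed to boot it.

Let's download the ISO and create a VM with at least 8 cores and 16 GB of RAM (the command below allocates 16 vCPUs and 16500 MiB):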

$ wget -O discovery_image_okd.iso 'http://34.118.84.65:8888/images/7ebf7402-e5c2-4332-a90c-5a007034b322?arch=x86_64&type=minimal-iso&version=4.9'
--2022-02-12 15:58:50--  http://34.118.84.65:8888/images/7ebf7402-e5c2-4332-a90c-5a007034b322?arch=x86_64&type=minimal-iso&version=4.9
Connecting to 34.118.84.65:8888... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94828544 (90M) [application/octet-stream]
Saving to: ‘discovery_image_okd.iso’

discovery_image_okd.iso                         100%[======================================================================================================>]  90.44M   375MB/s    in 0.2s    

2022-02-12 15:58:50 (375 MB/s) - ‘discovery_image_okd.iso’ saved [94828544/94828544]

$ sudo mv discovery_image_okd.iso /var/lib/libvirt/images
$ sudo virt-install   --autostart   --virt-type=kvm   --name master   --memory 16500   --vcpus=16   --cdrom=/var/lib/libvirt/images/discovery_image_okd.iso   --disk path=/var/lib/libvirt/images/master.qcow2,size=150,bus=virtio,format=qcow2   --events on_reboot=restart   --boot hd,cdrom   --noautoconsole
WARNING  No operating system detected, VM performance may suffer. Specify an OS with --os-variant for optimal results.

Starting install...

Domain is still running. Installation may be in progress.
You can reconnect to the console to complete the installation process.
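
The VM sizing can be checked against the service's hardware validation before booting. The sketch below assumes this single-VM cluster is validated against the "sno" entry of HW_VALIDATOR_REQUIREMENTS shown in the configmap earlier; the function and variable names are made up for illustration.

```shell
# Minimums copied from the "sno" entry of HW_VALIDATOR_REQUIREMENTS.
SNO_MIN_CPU_CORES=8
SNO_MIN_RAM_MIB=16384
SNO_MIN_DISK_GB=120

# Hypothetical helper: succeeds only if all three values meet the minimums.
# Arguments: CPU cores, RAM in MiB, disk size in GB.
meets_sno_requirements() {
  cores="$1"
  ram_mib="$2"
  disk_gb="$3"
  [ "$cores" -ge "$SNO_MIN_CPU_CORES" ] &&
    [ "$ram_mib" -ge "$SNO_MIN_RAM_MIB" ] &&
    [ "$disk_gb" -ge "$SNO_MIN_DISK_GB" ]
}
```

The values passed to virt-install above (16 vCPUs, 16500 MiB, 150 GB disk) satisfy the check: meets_sno_requirements 16 16500 150 returns success.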

Wait for the machine to request an address:

$ sudo journalctl -b -f -u libvirtd
-- Logs begin at Fri 2022-01-28 10:33:32 UTC. --
Feb 12 15:59:33 centos8stream dnsmasq-dhcp[29740]: DHCPOFFER(virbr0) 192.168.122.97 52:54:00:85:af:dd
Feb 12 15:59:33 centos8stream dnsmasq-dhcp[29740]: DHCPDISCOVER(virbr0) 52:54:00:85:af:dd
Feb 12 15:59:33 centos8stream dnsmasq-dhcp[29740]: DHCPOFFER(virbr0) 192.168.122.97 52:54:00:85:af:dd
Feb 12 15:59:33 centos8stream dnsmasq-dhcp[29740]: DHCPREQUEST(virbr0) 192.168.122.97 52:54:00:85:af:dd
Feb 12 15:59:33 centos8stream dnsmasq-dhcp[29740]: DHCPACK(virbr0) 192.168.122.97 52:54:00:85:af:dd
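
Rather than reading the address off the log by eye, it can be extracted from the DHCPACK line. This is a minimal sketch that assumes the dnsmasq log format shown above (DHCPACK(interface), then the IP, then the MAC); the function name leased_ip is hypothetical.

```shell
# Hypothetical helper: read dnsmasq log lines on stdin and print the IP
# address from the last DHCPACK line that mentions the given MAC address.
leased_ip() {
  mac="$1"
  awk -v mac="$mac" '
    {
      for (i = 1; i <= NF - 2; i++)
        if ($i ~ /^DHCPACK\(/ && $(i + 2) == mac)
          ip = $(i + 1)
    }
    END { if (ip != "") print ip }
  '
}
```

Usage, with the MAC address from the log above: sudo journalctl -b -u libvirtd | leased_ip 52:54:00:85:af:dd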

Update the hosts entries, signal dnsmasq to reload them, and SSH into the node:

$ sudo vi /etc/hosts && sudo pkill -SIGHUP dnsmasq && ssh core@master.okd.this.host
**  **  **  **  **  **  **  **  **  **  **  **  **  **  **  **  **  ** **  **  **  **  **  **  **
This is a host being installed by the OpenShift Assisted Installer.
It will be installed from scratch during the installation.

The primary service is agent.service. To watch its status, run:
sudo journalctl -u agent.service

To view the agent log, run:
sudo journalctl TAG=agent
**  **  **  **  **  **  **  **  **  **  **  **  **  **  **  **  **  ** **  **  **  **  **  **  **
Fedora CoreOS 34.20210626.3.1
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

[systemd]
Failed Units: 1
  selinux.service
[core@localhost ~]$ 
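
The manual hosts-file edit from the step above can also be done non-interactively. The sketch below is one possible way to do it, assuming a plain "IP hostname" line is all that is needed; the function name set_host_entry is made up for illustration.

```shell
# Hypothetical helper: set_host_entry IP HOSTNAME [FILE]
# Removes any existing non-comment line that maps HOSTNAME, then appends
# "IP HOSTNAME". FILE defaults to /etc/hosts.
set_host_entry() {
  ip="$1"
  name="$2"
  file="${3:-/etc/hosts}"
  tmp="$(mktemp)" || return 1
  awk -v name="$name" '
    /^[ \t]*#/ { print; next }
    { for (i = 2; i <= NF; i++) if ($i == name) next; print }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  printf '%s %s\n' "$ip" "$name" >> "$tmp"
  # Write through the existing file so its owner and permissions are kept.
  cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Run from a root shell, set_host_entry 192.168.122.97 master.okd.this.host followed by pkill -SIGHUP dnsmasq has the same effect as the interactive edit; dnsmasq re-reads /etc/hosts when it receives SIGHUP.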

This is a plain FCOS host with kubelet and CRI-O installed in an overlayfs layer:

[core@master ~]$ rpm-ostree status
State: idle
Deployments:
● ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 34.20210626.3.1 (2021-07-14T14:49:01Z)
                    Commit: 252fffde6f56d183a3c51c05a0c602b61011f6cb4de23a58313ba3b0023dc360
              GPGSignature: Valid signature by 8C5BA6990BDB26E19F2A1A801161AE6945719A39
$ ls -l /usr/bin/kubelet /usr/bin/crio
-rwxr-xr-x. 1 root root  73593408 Feb  7 06:58 /usr/bin/crio
-rwxr-xr-x. 1 root root 121561792 Feb  3 16:10 /usr/bin/kubelet

These binaries are extracted from the okd-rpms image; see journalctl -b -u okd-overlay.service for details.

This VM now runs the Assisted Installer agent:

[core@master ~]$ sudo systemctl status agent
● agent.service
     Loaded: loaded (/etc/systemd/system/agent.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/agent.service.d
             └─wait-for-okd.conf
     Active: active (running) since Sat 2022-02-12 16:02:11 UTC; 4min 50s ago
    Process: 1559 ExecStartPre=/usr/local/bin/agent-fix-bz1964591 quay.io/ocpmetal/assisted-installer-agent:latest (code=exited, status=0/SUCCESS)
    Process: 1658 ExecStartPre=podman run --privileged --rm -v /usr/local/bin:/hostbin quay.io/ocpmetal/assisted-installer-agent:latest cp /usr/bin/agent /hostbin (code=exited, status=0/SUCC>
   Main PID: 2123 (agent)
...

The Assisted Installer UI now shows the host details:

Click "Next" to continue.


On the "Networking" page, set the cluster networking settings:

Click "Next" to proceed to the review page, then click "Install cluster".

Bootstrap in place

Assisted Installer fetches the selected OKD release, extracts the installer, and runs it to generate Ignition files. It also ensures that the registered hosts can pull images for the selected release.


After the bootstrap Ignition is generated, the service passes it to the agent, which applies it on the host (without a reboot) and runs bootkube.service:

[core@master ~]$ sudo journalctl -b -u bootkube | head -n10
-- Journal begins at Sat 2022-02-12 15:59:26 UTC, ends at Sat 2022-02-12 16:16:24 UTC. --
Feb 12 16:14:28 random-hostname-74890eca-63ef-4d07-baa3-6ffe3b6e6d97 systemd[1]: Started Bootstrap a Kubernetes cluster.
Feb 12 16:14:29 random-hostname-74890eca-63ef-4d07-baa3-6ffe3b6e6d97 podman[11656]: 2022-02-12 16:14:29.187467435 +0000 UTC m=+0.145509404 container create 1eb62c1c42337c05f452a0ce5edbc219608221e37ab83a3c8b007916ad1b5305 (image=quay.io/openshift/okd@sha256:ce42e3e42c19b2d97f51221a65e1f97191b6f51f6e8552ba339dace501e29d1a, name=charming_margulis, io.openshift.release=4.9.0-0.okd-2022-01-29-035536, io.openshift.release.base-image-digest=sha256:45828d66e36c763d63c851c9c037a16c3bb2df50b26f86e5da461d8f0af225df)
Feb 12 16:14:29 random-hostname-74890eca-63ef-4d07-baa3-6ffe3b6e6d97 podman[11656]: 2022-02-12 16:14:29.259999082 +0000 UTC m=+0.218041023 container init 1eb62c1c42337c05f452a0ce5edbc219608221e37ab83a3c8b007916ad1b5305 (image=quay.io/openshift/okd@sha256:ce42e3e42c19b2d97f51221a65e1f97191b6f51f6e8552ba339dace501e29d1a, name=charming_margulis, io.openshift.release=4.9.0-0.okd-2022-01-29-035536, io.openshift.release.base-image-digest=sha256:45828d66e36c763d63c851c9c037a16c3bb2df50b26f86e5da461d8f0af225df)
...


When bootkube.service finishes, the agent packs the etcd database and manifests into Ignition, patches master.ign, and writes the FCOS image to disk, applying the modified master.ign to the host.
This allows the host to run as the first master without a running bootstrap host (the so-called "bootstrap-in-place" method).


The VM now stops and waits for the user to detach the discovery ISO and boot from disk. On boot, the host starts machine-config-daemon-firstboot to pivot into the expected OS (unlike the overlay used in the discovery ISO phase, this change needs to be persisted):

$ sudo virsh start master
Domain 'master' started

$ ssh core@master.okd.this.host
Fedora CoreOS 34.20210626.3.1
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

[core@master ~]$ sudo journalctl -b -f -u machine-config-daemon-firstboot
-- Journal begins at Sat 2022-02-12 16:23:06 UTC, ends at Sat 2022-02-12 16:24:39 UTC. --
Feb 12 16:24:08 master systemd[1]: Starting Machine Config Daemon Firstboot...
Feb 12 16:24:08 master machine-config-daemon[1543]: I0212 16:24:08.415726    1543 update.go:1897] Running: systemctl start rpm-ostreed
Feb 12 16:24:09 master machine-config-daemon[1543]: I0212 16:24:09.860076    1543 rpm-ostree.go:325] Running captured: rpm-ostree status --json
...
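
Detaching the discovery ISO before the virsh start step above is not shown in the transcript. One possible way to do it with libvirt is sketched below; it assumes the domain is named master as in the virt-install command, and the helper name cdrom_target is made up for illustration. The helper only parses the table printed by virsh domblklist --details (columns: Type, Device, Target, Source).

```shell
# Hypothetical helper: read `virsh domblklist DOMAIN --details` output on
# stdin and print the target name (e.g. sda) of the first cdrom device.
cdrom_target() {
  awk '$2 == "cdrom" { print $3; exit }'
}
```

Usage: target="$(sudo virsh domblklist master --details | cdrom_target)" and then sudo virsh change-media master "$target" --eject. Since the VM was created with --boot hd,cdrom, it may already prefer the disk once FCOS has been written to it, so treat the eject as a precaution rather than a required step.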

The Assisted Installer UI shows the host in the "Rebooting" stage, which is expected to last until the Assisted Agent is running as a Kubernetes static pod.

Finalizing the installation

After another reboot, the OKD machine-os is applied:

$ ssh core@master.okd.this.host
Fedora CoreOS 34
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

Last login: Sat Feb 12 16:24:12 2022 from 192.168.122.1
[core@master ~]$ sudo rpm-ostree status
State: idle
Deployments:
● pivot://quay.io/openshift/okd-content@sha256:7755b7626fe2316173e4ccc7723eeac2438212e98ff16388e57d10380c4a319d
              CustomOrigin: Managed by machine-config-operator
                   Version: 49.34.202201282225-0 (2022-01-28T22:29:16Z)

  fedora:fedora/x86_64/coreos/stable
                   Version: 34.20210626.3.1 (2021-07-14T14:49:01Z)
                    Commit: 252fffde6f56d183a3c51c05a0c602b61011f6cb4de23a58313ba3b0023dc360
              GPGSignature: Valid signature by 8C5BA6990BDB26E19F2A1A801161AE6945719A39
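
Whether the pivot has happened can also be checked in a script. The sketch below assumes the rpm-ostree status format shown above, where the booted deployment is marked with a leading ● and a pivoted host shows a pivot:// origin; the function name booted_is_pivot is hypothetical.

```shell
# Hypothetical helper: read `rpm-ostree status` output on stdin and succeed
# if the booted deployment (the line marked with ●) has a pivot:// origin.
booted_is_pivot() {
  grep -q '^● pivot://'
}
```

Usage: sudo rpm-ostree status | booted_is_pivot && echo "machine-os applied". Before the second reboot the booted line starts with ostree:// instead, so the check fails.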

Check that kubelet.service is running and wait for assisted-installer-controller to become active so that the installation can proceed to the next phase:

[core@master ~]$ sudo su
[root@master core]# export KUBECONFIG=/etc/kubernetes/bootstrap-secrets/kubeconfig 
[root@master core]# oc get pods -n assisted-installer -w
NAME                                     READY   STATUS    RESTARTS   AGE
assisted-installer-controller--1-ghfrj   0/1     Pending   0          12m
assisted-installer-controller--1-ghfrj   0/1     Pending   0          14m
assisted-installer-controller--1-ghfrj   0/1     ContainerCreating   0          14m
assisted-installer-controller--1-ghfrj   1/1     Running             0          14m
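
Instead of watching the pod list interactively, the wait can be scripted. This is a minimal sketch that assumes the column layout of oc get pods shown above (NAME, READY, STATUS, ...) and the assisted-installer-controller name prefix; the function name controller_running is made up for illustration.

```shell
# Hypothetical helper: read `oc get pods --no-headers` output on stdin and
# succeed if an assisted-installer-controller pod is in the Running state.
controller_running() {
  awk '
    $1 ~ /^assisted-installer-controller/ && $3 == "Running" { found = 1 }
    END { exit !found }
  '
}
```

Usage, with KUBECONFIG exported as above: until oc get pods -n assisted-installer --no-headers 2>/dev/null | controller_running; do sleep 10; done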

Voila

Shortly afterwards, the UI updates the status:

When all operators have rolled out, the UI shows the console URL and credentials:
