Airship 2 Troubleshooting Guide

Overview

This note provides a place where the Airship community can add tips, gotchas, workarounds, etc. for Airship 2. Ultimately this will be included in the Troubleshooting Guide as part of the Airship documentation suite, but we wanted to provide an easy & flexible way to gather information as Airship 2 rolls out.

The note is structured around the deployment lifecycle stages and includes (but is not limited to) potential troubleshooting areas. Please feel free to add any information for the stage or troubleshooting area. There is a free form section below for any additional troubleshooting tips as well as future topics as the platform continues to evolve.

To provide consistency & help organize the issues, we ask that you copy & use the following template.

Problem: Brief Description of the issue

Phase: Phase where the issue occurred

Error: Any error messages that were received

Probable Cause: Root cause of the issue

Solution: Fixes, workarounds or changes needed to resolve the issue

Thank you for your contributions!

Lifecycle Stages

Initialization Stage

Summary: Set up the local environment to run Airship and work with your site’s document set.
Potential troubleshooting areas: Proxy settings

Preparation Stage

Summary: Prepare the artifacts necessary to deploy and lifecycle the site.
Potential troubleshooting areas: Manifests, Image Builder - image generation

Ephemeral Lifecycle Stage

Summary: Creates a single-node in-memory Kubernetes cluster on one of the hosts to establish a physical foothold in the new environment.
Potential troubleshooting areas: Phases - Phase execution, Image Builder - Applying the ISO image, CAPI/K8s - Provisioning the ephemeral cluster

Problem: SSH to the Ephemeral Node

Phase: Deploying the Ephemeral Node

Issue: "How to SSH into the Ephemeral node"

Probable Cause: NA.

Solution: The following steps show how to SSH into the Ephemeral node.

  1. Retrieve the passwords for the deployer and root users of the Ephemeral node.
    Execute airshipctl phase render iso-cloud-init-data > /tmp/baremetal-iso.yaml
  2. Open /tmp/baremetal-iso.yaml and search for isoImage.
    You should find something like:
      isoImage:
        passwords:
          deployer: Lefo3_NidoQuhy
          root: Tope1/VoluMugo
  3. Now you can SSH to the Ephemeral node: ssh deployer@10.23.25.101 and use the deployer password.
  4. [Optional] You can switch to a more friendly shell, e.g., execute bash.
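
The password lookup in the rendered file can also be scripted. The following is a minimal sketch, not part of airshipctl: get_iso_password is a hypothetical helper, and it assumes the rendered YAML lists each user on its own line beneath a passwords: key, with unquoted values.

```shell
# Hypothetical helper (not an airshipctl command): print the password for a
# user from a rendered iso-cloud-init-data file.
# Assumption: each user appears on its own line under a "passwords:" key,
# in the form "<user>: <password>", with no quoting around the value.
get_iso_password() {
  user="$1"
  file="$2"
  awk -v user="$user" '
    /^[[:space:]]*passwords:/      { in_pw = 1; next }   # entered the passwords block
    in_pw && $1 == (user ":")      { print $2; exit }    # first matching user wins
  ' "$file"
}
```

For example, get_iso_password deployer /tmp/baremetal-iso.yaml prints the deployer password, which can then be entered at the ssh prompt. If the rendered file quotes its values, the quotes would be printed as well.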

Problem: Unable to connect to the server: x509

Phase: Deploying the Ephemeral Node

Error: "Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "Kubernetes API")"

Probable Cause: An old kubeconfig file is probably in use, and the CA certificate it contains does not match the CA certificate of the Ephemeral cluster's API server.

Solution: Delete your current kubeconfig file and have it regenerated for your current Ephemeral node/cluster deployment.
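
If you would rather keep the old file for reference than delete it outright, it can be moved aside before regeneration. This is a minimal sketch: backup_kubeconfig is a hypothetical helper, and the kubeconfig path is passed in by the caller because its location depends on your environment.

```shell
# Hypothetical helper: move a stale kubeconfig aside instead of deleting it,
# so it can still be inspected or restored later.
# Assumption: the caller knows where the kubeconfig lives; no path is hard-coded.
backup_kubeconfig() {
  cfg="$1"
  if [ -f "$cfg" ]; then
    # Timestamped suffix avoids overwriting an earlier backup.
    mv "$cfg" "$cfg.bak.$(date +%Y%m%d%H%M%S)"
  fi
}
```

After running backup_kubeconfig with the path to your kubeconfig, the original path no longer exists, so a fresh kubeconfig can be generated for the current Ephemeral cluster deployment.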

Target Cluster Lifecycle

Summary: Complete provisioning the fully realized target cluster.
Potential troubleshooting areas: Phases - Phase execution, Image Builder - Applying the QCOW image, Manifests - cloud-init, CAPI/K8s - Provisioning the target cluster

Problem: SSH to the Control Plane Node

Phase: Deploying the Control Plane Node

Issue: "How to SSH into the Control Plane node"

Probable Cause: NA.

Solution: The following steps show how to SSH into the Control Plane node.

  1. You need to retrieve the Private Key for the Control Plane node.
    Execute airshipctl phase render controlplane-ephemeral > /tmp/controlplane-ephemeral.yaml
  2. Edit the /tmp/controlplane-ephemeral.yaml and search for privateKey
  3. Copy the private key into a file, e.g., controlplane-private-key and remove all leading spaces.
  4. Restrict the permissions on your private key file.
    Execute chmod 400 controlplane-private-key
  5. Now you can SSH to the control plane node: ssh -i controlplane-private-key deployer@10.23.25.102
  6. [Optional] you can change the shell to a more friendly one, e.g., execute bash

NOTE: you can execute sudo commands in the terminal if needed.
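
Steps 3 and 4 (removing the leading spaces and restricting permissions) can be combined in a small script. This is a minimal sketch: clean_private_key is a hypothetical helper, and it assumes the key block has already been copied from the rendered YAML into a scratch file.

```shell
# Hypothetical helper: strip leading whitespace from every line of a pasted
# private key and write the result to a file readable only by its owner.
# Assumption: "$1" holds the key text copied from the rendered YAML,
# still indented; "$2" is the destination key file.
clean_private_key() {
  src="$1"
  dest="$2"
  rm -f "$dest"                              # a previous read-only copy would block the write
  sed 's/^[[:space:]]*//' "$src" > "$dest"   # remove the YAML indentation
  chmod 400 "$dest"                          # ssh rejects keys with loose permissions
}
```

For example, after pasting the key into a scratch file named raw-key, running clean_private_key raw-key controlplane-private-key produces a file that can be passed to ssh -i as in step 5.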

Target Workload Lifecycle

Summary: Delivers Kubernetes workloads to the target cluster.
Potential troubleshooting areas: Phases - Phase execution, Helm - Deploying workloads via Helm operator, Svcs/Apps - Deploying workloads in the cluster


Other Troubleshooting Tips

Feel free to add anything here that doesn't fit into a lifecycle stage. We can map it to the appropriate troubleshooting area.


Future topics:

Physical Validation

Summary: Validate the target infrastructure in the site to ensure it can properly receive a deployment and run workloads.

Sub-clusters

Summary: Information on SIP, ViNO & sub-cluster deployment

Day 2 Operations

Summary: Using Host Config Operator for Day 2 operations
