# Airship 2 Troubleshooting Guide
[TOC]
## Overview
This note provides a place where the Airship community can add tips, gotchas, workarounds, etc. for Airship 2. Ultimately this will be included in the Troubleshooting Guide as part of the Airship documentation suite, but we wanted to provide an easy & flexible way to gather information as Airship 2 rolls out.
The note is structured around the deployment lifecycle stages and includes (but is not limited to) potential troubleshooting areas. Please feel free to add any information for a stage or troubleshooting area. There is a free-form section below for additional troubleshooting tips, followed by future topics to be filled in as the platform continues to evolve.
To provide consistency & help organize the issues, we ask that you copy & use the following template.
:::info
#### Problem: Brief Description of the issue
***Phase:*** Phase where the issue occurred
***Error:*** Any error messages that were received
***Probable Cause:*** Root cause of the issue
***Solution:*** Fixes, workarounds or changes needed to resolve the issue
:::
Thank you for your contributions!
## Lifecycle Stages
### Initialization Stage
**Summary:** Set up the local environment to run Airship and work with your site’s document set.
**Potential troubleshooting areas:** Proxy settings
### Preparation Stage
**Summary:** Prepare the artifacts necessary to deploy and lifecycle the site.
**Potential troubleshooting areas:** Manifests, Image Builder - image generation
### Ephemeral Lifecycle Stage
**Summary:** Creates a single-node in-memory Kubernetes cluster on one of the hosts to establish a physical foothold in the new environment.
**Potential troubleshooting areas:** Phases - Phase execution, Image Builder - Applying the ISO image, CAPI/K8s - Provisioning the ephemeral cluster
:::info
#### Problem: SSH to the Ephemeral Node
***Phase:*** Deploying the Ephemeral Node
***Issue:*** "*How to SSH into the Ephemeral node*"
***Probable Cause:*** *NA*.
***Solution:*** The following steps show how to SSH into the Ephemeral node.
1. Retrieve the passwords for the `deployer` and `root` users of the Ephemeral node.
Execute `airshipctl phase render iso-cloud-init-data > /tmp/baremetal-iso.yaml`
2. Open `/tmp/baremetal-iso.yaml` and search for `isoImage`.
You should find something like:
```yaml=
isoImage:
  passwords:
    deployer: Lefo3_NidoQuhy
    root: Tope1/VoluMugo
```
3. Now you can SSH to the Ephemeral node: `ssh deployer@10.23.25.101` (substitute your node's IP address) and enter the `deployer` password when prompted.
4. [Optional] Switch to a friendlier shell, e.g., run `bash`.
:::
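Steps 1 to 3 above can also be scripted. The following is a minimal sketch, not part of `airshipctl`: `iso_password` is a hypothetical helper, and it assumes the rendered file lists each user as `<user>: <password>` on its own line beneath a `passwords:` key, as in the excerpt above, with no quotes or spaces in the password.

```shell
# Hypothetical helper (not an airshipctl command): print the password for one
# user from a rendered document.
# Assumption: each entry looks like "<user>: <password>" on a single line
# somewhere after a "passwords:" key, as in the excerpt above.
iso_password() {
  local user="$1" file="$2"
  awk -v user="$user" '
    # Start looking only after the "passwords:" key has been seen.
    /^[ \t]*passwords:[ \t]*$/ { in_pw = 1; next }
    # The first "<user>:" entry after that key wins.
    in_pw && $1 == (user ":") { print $2; exit }
  ' "$file"
}
```

For example, after rendering the phase to `/tmp/baremetal-iso.yaml`, running `iso_password deployer /tmp/baremetal-iso.yaml` should print the password to enter at the SSH prompt. If your rendered document quotes the values or nests them differently, the pattern would need adjusting.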
:::info
#### Problem: Unable to connect to the server: x509
***Phase:*** Deploying the Ephemeral Node
***Error:*** "*Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "Kubernetes API")*"
***Probable Cause:*** You are probably using an old **kubeconfig** file whose CA certificate does not match the CA certificate of the current Ephemeral cluster's API server.
***Solution:*** Delete your current **kubeconfig** file and have it regenerated for your current Ephemeral node/cluster deployment.
:::
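A cautious variant of the solution above is to move the old **kubeconfig** aside rather than delete it outright, so it can still be inspected later. The sketch below is only an illustration: `retire_kubeconfig` is a hypothetical helper, and the path is passed as an argument because the file's location depends on your setup (the `~/.airship/kubeconfig` path in the usage comment is an assumption).

```shell
# Hypothetical helper: rename a stale kubeconfig with a timestamp suffix and
# print the backup path. Returns 1 if the file does not exist.
retire_kubeconfig() {
  local kubeconfig="$1" backup
  if [ ! -f "$kubeconfig" ]; then
    echo "no kubeconfig at $kubeconfig" >&2
    return 1
  fi
  backup="${kubeconfig}.stale.$(date +%Y%m%d%H%M%S)"
  mv "$kubeconfig" "$backup" && echo "$backup"
}

# Usage (path is an assumption; adjust to where your kubeconfig lives):
#   retire_kubeconfig "$HOME/.airship/kubeconfig"
```

Once the stale file is out of the way, regenerate the **kubeconfig** for the current Ephemeral deployment as described in the solution.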
### Target Cluster Lifecycle
**Summary:** Complete provisioning the fully realized target cluster.
**Potential troubleshooting areas:** Phases - Phase execution, Image Builder - Applying the QCOW image, Manifests - cloud-init, CAPI/K8s - Provisioning the target cluster
:::info
#### Problem: SSH to the Control Plane Node
***Phase:*** Deploying the Control Plane Node
***Issue:*** "*How to SSH into the Control Plane node*"
***Probable Cause:*** *NA*.
***Solution:*** The following steps show how to SSH into the Control Plane node.
1. Retrieve the private key for the Control Plane node.
Execute `airshipctl phase render controlplane-ephemeral > /tmp/controlplane-ephemeral.yaml`
2. Open `/tmp/controlplane-ephemeral.yaml` and search for `privateKey`.
3. Copy the private key into a file, e.g., `controlplane-private-key` and remove all leading spaces.
4. You need to restrict the permissions to your private key file.
Execute `chmod 400 controlplane-private-key`
5. Now you can SSH to the control plane node: `ssh -i controlplane-private-key deployer@10.23.25.102`
6. [Optional] Switch to a friendlier shell, e.g., run `bash`.
>NOTE: You can run `sudo` commands in the terminal if needed.
:::
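Steps 2 to 4 above (locating the key, stripping the leading spaces, and restricting permissions) can be sketched as a small shell function. This is an illustration under stated assumptions, not an Airship tool: `extract_private_key` is a hypothetical helper, and it assumes the key is stored as a multi-line PEM block whose `BEGIN`/`END` marker lines contain nothing but indentation and the marker.

```shell
# Hypothetical helper: copy the first PEM private key block found in a rendered
# document into its own file, strip leading whitespace, and set mode 400.
# Assumption: the BEGIN/END marker lines hold only indentation plus the marker.
extract_private_key() {
  local src="$1" dest="$2"
  awk '
    /-----BEGIN [A-Z ]*PRIVATE KEY-----/ { found = 1 }
    # Inside the key block: drop the YAML indentation and emit the line.
    found { sub(/^[ \t]+/, ""); print }
    # Stop after the first complete key.
    found && /-----END [A-Z ]*PRIVATE KEY-----/ { exit }
  ' "$src" > "$dest"
  if [ ! -s "$dest" ]; then
    echo "no private key found in $src" >&2
    rm -f "$dest"
    return 1
  fi
  chmod 400 "$dest"
}
```

With the rendered file from step 1, `extract_private_key /tmp/controlplane-ephemeral.yaml controlplane-private-key` would produce the key file used by the `ssh -i` command in step 5. If the document contains more than one key, only the first is extracted, so check that it is the one you need.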
### Target Workload Lifecycle
**Summary:** Delivers Kubernetes workloads to the target cluster.
**Potential troubleshooting areas:** Phases - Phase execution, Helm - Deploying workloads via Helm operator, Svcs/Apps - Deploying workloads in the cluster
---
## Other Troubleshooting Tips
Feel free to add anything here that doesn't fit into a lifecycle stage. We can map it to the appropriate troubleshooting area.
___
## Future Topics
### Physical Validation
**Summary:** Validate the target infrastructure in the site to ensure it can properly receive a deployment and run workloads.
### Sub-clusters
**Summary:** Information on SIP, ViNO & sub-cluster deployment
### Day 2 Operations
**Summary:** Using Host Config Operator for Day 2 operations