# Adding bare metal nodes to platform vSphere
There are scenarios where a customer may need to add bare metal, platform "none", nodes to a vSphere (or any other cloud, for that matter) cluster.
## Configuration
1. Install a platform vSphere cluster
2. Download the RHCOS Live CD which aligns with the installed version of OpenShift.
3. Obtain or create a `worker.ign` file. This will be used to bootstrap the bare metal node (one way to retrieve it from the cluster is sketched after this list).
4. Boot the new bare metal host from the RHCOS Live CD.
5. Install RHCOS:
```bash=
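# Replace /dev/sdX with the target install disk and the ignition URL with wherever the worker.ign from step 3 is hosted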
coreos-installer install /dev/sdX --insecure-ignition --ignition-url=https://path-to-worker-ignition --platform=metal
```
6. Reboot the node
7. If the node fails to get a hostname, reboot the node and press `e` at the boot menu to edit the boot command line. See https://access.redhat.com/solutions/5500131 for further details.
8. Once at the emergency mode prompt:
`vi /etc/hostname`
9. Enter the desired hostname, then save and exit.
10. Press `Ctrl+D` to resume booting.
11. Approve CSRs for the node (example commands are sketched after this list).
12. Apply a taint to the node to block workloads from being scheduled (an equivalent `oc adm taint` command is sketched after this list):
```yaml=
spec:
  taints:
  - key: bare-metal/no-vmware
    value: 'true'
    effect: NoExecute
```
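One way to obtain the `worker.ign` referenced in step 3 is to pull the worker pointer ignition out of the running cluster and serve it over HTTP from a host the new node can reach. This is only a sketch; the secret name and key below are the usual defaults, so verify them against your cluster.
```bash=
# Pull the worker pointer ignition (it points the node at the machine config server)
oc extract -n openshift-machine-api secret/worker-user-data --keys=userData --to=- > worker.ign

# Serve it to the new node with an ad-hoc web server (adjust host/port for your environment)
python3 -m http.server 8080
```
For step 11, the node generates two CSRs (the client CSR first, then the serving CSR once the kubelet is up), roughly:
```bash=
# Review what is pending
oc get csr

# Approve the pending CSRs for the new node
oc adm certificate approve <csr-name>
```
The taint in step 12 can also be applied with `oc adm taint` rather than editing the node object directly, e.g. for the node used in this writeup:
```bash=
oc adm taint nodes bare-metal-worker-1 bare-metal/no-vmware=true:NoExecute
```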
Note: The [vSphere CSI driver daemonset](https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/4479e2418f38cb93b5da4df7e043aff71a20cccc/manifests/vanilla/vsphere-csi-driver.yaml#L565-L569) tolerates all taints. I was able to keep its pods off the tainted node by making the operator unmanaged and removing the tolerations.
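A rough sketch of that sequence, assuming the `ClusterCSIDriver` object for `csi.vsphere.vmware.com` accepts `managementState: Unmanaged` and that the daemonset lives in `openshift-cluster-csi-drivers` (names can vary by version):
```bash=
# Stop the storage operator from reconciling the vSphere CSI driver manifests
oc patch clustercsidriver csi.vsphere.vmware.com --type merge \
  -p '{"spec":{"managementState":"Unmanaged"}}'

# With the operator unmanaged, remove the blanket tolerations from the node daemonset
# so its pods stop landing on the tainted bare metal node
oc -n openshift-cluster-csi-drivers edit daemonset/vmware-vsphere-csi-driver-node
```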
## Results
The node `bare-metal-worker-1` is hosted on libvirt in my home lab. As can be seen below, all operators are available. The storage operator is still reporting Progressing because I had to spin it down to edit the daemonset.
```log=
oc get nodes;oc get co                                             162-0-168-192.in-addr.arpa: Tue Feb 4 13:12:43 2025
NAME                                STATUS   ROLES                         AGE    VERSION
bare-metal-worker-0                 Ready    worker                        65m    v1.31.4
bare-metal-worker-1                 Ready    worker                        43m    v1.31.4
ci-op-rvanderp-abc-kqwmr-master-0   Ready    control-plane,master,worker   4h4m   v1.31.4
ci-op-rvanderp-abc-kqwmr-master-1   Ready    control-plane,master,worker   4h4m   v1.31.4
ci-op-rvanderp-abc-kqwmr-master-2   Ready    control-plane,master,worker   4h4m   v1.31.4
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.18.0-rc.7   True        False         False      3h39m
baremetal                                  4.18.0-rc.7   True        False         False      4h1m
cloud-controller-manager                   4.18.0-rc.7   True        False         False      4h4m
cloud-credential                           4.18.0-rc.7   True        False         False      4h4m
cluster-autoscaler                         4.18.0-rc.7   True        False         False      4h1m
config-operator                            4.18.0-rc.7   True        False         False      4h2m
console                                    4.18.0-rc.7   True        False         False      3h46m
control-plane-machine-set                  4.18.0-rc.7   True        False         False      4h1m
csi-snapshot-controller                    4.18.0-rc.7   True        False         False      4h2m
dns                                        4.18.0-rc.7   True        False         False      3h50m
etcd                                       4.18.0-rc.7   True        False         False      3h59m
image-registry                             4.18.0-rc.7   True        False         False      3h49m
ingress                                    4.18.0-rc.7   True        False         False      3h50m
insights                                   4.18.0-rc.7   True        False         False      4h1m
kube-apiserver                             4.18.0-rc.7   True        False         False      3h58m
kube-controller-manager                    4.18.0-rc.7   True        False         False      3h59m
kube-scheduler                             4.18.0-rc.7   True        False         False      3h58m
kube-storage-version-migrator              4.18.0-rc.7   True        False         False      4h2m
machine-api                                4.18.0-rc.7   True        False         False      3h58m
machine-approver                           4.18.0-rc.7   True        False         False      4h2m
machine-config                             4.18.0-rc.7   True        False         False      4h
marketplace                                4.18.0-rc.7   True        False         False      4h1m
monitoring                                 4.18.0-rc.7   True        False         False      3h48m
network                                    4.18.0-rc.7   True        False         False      4h2m
node-tuning                                4.18.0-rc.7   True        False         False      43m
olm                                        4.18.0-rc.7   True        False         False      4h1m
openshift-apiserver                        4.18.0-rc.7   True        False         False      3h56m
openshift-controller-manager               4.18.0-rc.7   True        False         False      3h57m
openshift-samples                          4.18.0-rc.7   True        False         False      3h55m
operator-lifecycle-manager                 4.18.0-rc.7   True        False         False      4h1m
operator-lifecycle-manager-catalog         4.18.0-rc.7   True        False         False      4h1m
operator-lifecycle-manager-packageserver   4.18.0-rc.7   True        False         False      3h50m
service-ca                                 4.18.0-rc.7   True        False         False      4h2m
storage                                    4.18.0-rc.7   True        True          False      4h2m    VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
```
The vSphere cloud controller manager complains about the node but doesn't raise any alerts.
```log=
I0204 18:12:19.442201 1 search.go:76] WhichVCandDCByNodeID nodeID: bare-metal-worker-1
E0204 18:12:19.511542 1 datacenter.go:107] Unable to find VM by DNS Name. VM DNS Name: bare-metal-worker-1
E0204 18:12:19.511584 1 search.go:181] Error while looking for vm=bare-metal-worker-1(byName) in vc=81-84-38-10.in-addr.arpa and datacenter=cidatacenter-nested-0: No VM found
I0204 18:12:19.511607 1 search.go:186] Did not find node bare-metal-worker-1 in vc=81-84-38-10.in-addr.arpa and datacenter=cidatacenter-nested-0
I0204 18:12:19.511644 1 search.go:72] WhichVCandDCByNodeID by IP
I0204 18:12:19.511653 1 search.go:76] WhichVCandDCByNodeID nodeID: bare-metal-worker-1
E0204 18:12:19.516869 1 datacenter.go:90] Unable to find VM by IP. VM IP: bare-metal-worker-1
E0204 18:12:19.516903 1 search.go:181] Error while looking for vm=bare-metal-worker-1(byIP) in vc=81-84-38-10.in-addr.arpa and datacenter=cidatacenter-nested-0: No VM found
I0204 18:12:19.516911 1 search.go:186] Did not find node bare-metal-worker-1 in vc=81-84-38-10.in-addr.arpa and datacenter=cidatacenter-nested-0
E0204 18:12:19.516920 1 nodemanager.go:160] WhichVCandDCByNodeID failed using VM name. Err: No VM found
E0204 18:12:19.516927 1 nodemanager.go:205] shakeOutNodeIDLookup failed. Err=No VM found
E0204 18:12:19.516936 1 node_controller.go:285] Error getting instance metadata for node addresses: error fetching node by provider ID: node not found, and error by node name: node not found
```
## Action Items
- [ ] Talk to upstream about the daemonset selector
- [ ] Can we update our downstream operator to allow this to be configured?
- [ ] Can we disable CSI altogether?