# FCOS infrastructure requirements
In the *ideal* scenario, we'd like to have an OpenShift cluster for every architecture we want to support. This is unrealistic at this point, so we're currently working on a model like this:
1. Primary x86_64 pipeline running in an OpenShift cluster (in a Jenkins instance to start, though that may change eventually), ideally on bare metal with virtualization support, though nested virt would be OK too.
- We need some level of control at the OpenShift project level. E.g. need to be able to modify build configs or deployment configs easily.
- Uptime/SLE comparable to the rest of the Fedora infra.
- Ideally, we would not maintain this cluster, though we could if needed. E.g. one path forward is to just give us some hardware and we manage it ourselves.
2. From the primary pipeline, we would SSH to multi-arch machines (i.e. not part of the cluster) that have `/dev/kvm` and `podman`, ideally Fedora CoreOS or Fedora, to build artifacts.
- Ideally, we would not maintain these nodes, though we could if needed.
    - DM: if it's Fedora CoreOS, OS maintenance wouldn't really be an issue, but yeah, not having to maintain the hardware if something goes bad would be nice.
    - We'd need to configure Zincati to avoid applying updates in the middle of a pipeline build (see the config sketch after this list).
- JL: bootstrapping issue: can't be FCOS to start with if we don't yet have FCOS on multi-arch :)
- DM: well we do have dev builds that we could start with!
- Uptime/SLE comparable to the rest of the Fedora infra
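
As a rough illustration of the Zincati point above, a minimal sketch of the kind of drop-in we might use on the arch nodes, assuming they run FCOS and the `periodic` update strategy; the file name and window values are placeholders, not an agreed-upon schedule:

```sh
# Sketch only: confine Zincati-driven reboots to a maintenance window so they
# don't land in the middle of a pipeline build. Window values are illustrative.
sudo tee /etc/zincati/config.d/55-updates-strategy.toml <<'EOF'
[updates]
strategy = "periodic"

[[updates.periodic.window]]
days = [ "Sat", "Sun" ]
start_time = "22:30"
length_minutes = 60
EOF
sudo systemctl restart zincati.service
```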
Q&A:
1. Q [kevin] - In Fedora infra we only have OpenShift 3.11 and it's running on VMs. I don't think this will meet the above needs?
(we do plan to deploy a 4.x bare metal cluster to replace these, but it's not available yet)
- JL: Ouch. Hmm does it support nested virt at least? It's not ideal, but it should work for (1) until we move to bare metal. I think we checked this already in the past, but don't remember the outcome. :|
Re. versioning, OCP 4.x is preferred (because it's what we actively work on as a team), but personally wouldn't call it a dealbreaker. DM may disagree. :)
- JL: do you have a timeline re. when this bare metal 4.x cluster will be installed?
    - K: Well, we have the hardware racked and management access set up; we are just waiting for cycles to do the actual install and setup. I would hope in the next month or so?
2. Q [kevin] - Would the fedora infra openshift app constraints be ok? everything in git, etc?
    - JL: Perhaps. But IIRC the permissions there are quite restrictive. E.g. you can't `oc create -f testpod.yaml` or `oc exec` to test things. Would it be possible to loosen the RBACs for a specific project only? (See the RBAC sketch after this Q&A list.)
    - K: I suppose, but the idea was to have that production cluster very locked down so we can easily redeploy it and know nothing would be lost. We keep hoping to have a development cluster up soon in AWS, but it keeps hitting stalls. ;(
    - DM: for our purposes we have optimized for being able to start our pipelines from scratch. Fedora infra need not worry about re-deploying our pipeline if the cluster goes boom; we just need the access to create things ourselves.
3. Q [kevin] - For the arch nodes, does that need root access, or just a user who can run podman and access /dev/kvm?
    - JL: rootless podman + /dev/kvm is all we need (see the access sketch after this Q&A list).
    - K: Does it need SSH access while it's running, or just to start the job and save off results?
- JL: Dusty can definitively answer that, but I think it'd require constant SSH access. We might be able to tweak the model though if that's an issue for some reason.
- DM: The model we're using right now continuously uses the SSH connection.
4. Q [kevin] - For the arch nodes, no other config needs to be done directly on them? Just SSH and podman, and you make all your changes in containers, which you manage elsewhere?
- JL: Yup exactly! (With some expectation that the podman installed there is kept somewhat up to date with the latest release in Fedora so there isn't too big a skew.)
    - K: Is the SSH access just needed to start the jobs, or does it interactively do things?
- DM: The model we're using right now continuously uses the SSH connection.
5. Q [kevin] - How idle would the arch nodes be? Is the pipeline running things most of the time, or just sporadically? I ask this because I wonder if it's worth exploring time-sharing Koji builders (i.e., disable the builder in Koji, run FCOS pods, re-enable in Koji, and it just does both part time).
- JL: It's idle most of the time. We have development and mechanical builds going every X hours, and production builds are every two weeks. So I don't think we necessarily need dedicated builders.
6. Q [kevin] - Where does the SSH for the arch nodes come from? The OpenShift pipeline? Which means it would be better for all of them to be in the same place?
- JL: Yes, we would SSH from the pipeline to the node. I guess they don't *have* to be in the same physical place as long as there's SSH access possible.
7. Q [kevin] - any storage needs?
- JL: On the OpenShift side, we would need a PVC with let's say... 10G at least, basically just to persist Jenkins state and logs. From Jenkins, we provision pods which need a lot of ephemeral storage (e.g. let's say 30G to be safe). For the multi-arch nodes we SSH into, we'll probably need a similar amount of ephemeral storage, though I'll let DM answer that one.
    - DM: Is storage a big concern? If we could get 100G everywhere (OpenShift PV and on the multi-arch nodes under /home/), then I'd think we'd be fine. (See the PVC sketch below.)
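
Re. question 2 above, a minimal sketch of what "loosening the RBACs for a specific project only" could look like; the group and project names are hypothetical placeholders, not agreed-upon values:

```sh
# Sketch only: grant the built-in "edit" role to a group on a single namespace,
# which would allow things like `oc create -f testpod.yaml` and `oc exec` there
# without touching the rest of the cluster.
# "fedora-coreos-ci" and "fedora-coreos-pipeline" are hypothetical names.
oc adm policy add-role-to-group edit fedora-coreos-ci -n fedora-coreos-pipeline
```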
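Re. questions 3 and 4, a sketch of the unprivileged access being described on each arch node, assuming a hypothetical `builder` user and node name:

```sh
# Sketch only: an ordinary user who can run rootless podman and open /dev/kvm.
# The "builder" user and "arch-node" address are placeholders.
sudo useradd -m builder
sudo usermod -aG kvm builder   # only needed if /dev/kvm is restricted to the kvm group
# Quick check over SSH from the pipeline side:
ssh builder@arch-node 'test -w /dev/kvm && podman run --rm registry.fedoraproject.org/fedora:latest echo ok'
```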
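Re. question 7, a sketch of the persistent claim for Jenkins state/logs on the OpenShift side, using DM's 100G figure; the claim name is a placeholder:

```sh
# Sketch only: a PVC for Jenkins home (state + logs); size per the discussion
# above, name is hypothetical.
oc create -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fedora-coreos-jenkins
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
EOF
```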
Outcome:
- short term: we can use the AWS ARM instance to produce non-prod builds
- slightly less short term: ARM hardware in CentOS CI we can SSH into
- long term: OCP4 bare metal cluster in Fedora infra from which we can SSH to Koji builders... need to flesh out Koji integration
Action items:
- Dusty will go ahead with using AWS ARM hardware for now
- Kevin to reply on thread after looking at HW inventory re. whether to do prod builds in AWS or use ARM HW in cluster