# OCP/OpenShift contacts and resources
###### tags: `Design`
## Design of production-level CI system for Podified Control Plane
* Assess which currently available systems are worth investigating for the OpenStack operators
* Collaborate with QE and the development teams currently working on operators, and reach consensus on what system would work for end-to-end testing
* Work with the OpenStack Operators team to help develop CI for the initial Operator to be prototyped (someone to align with Chris Jones & team) – this will allow other DFGs to have a pattern to follow as they take on stewardship of their own Operators.
* Organization ask: not to develop two separate systems in upstream/dev/CI and QE
* Open questions for operator testing
* What defines an operator of acceptable quality
* Can Zuul check/gate still be used for testing some aspects - like individual containers
* What generic testing can be used on all operators
* What reporting system can be used
* Is DLRN still part of the system
* General CI system open questions
* Do we have a requirement to test both upstream and downstream
* Is there any system in place already in Director Operator that can be reused
* Do we have a defined test matrix of versions
* What is our timeline here
* Is Tempest still a requirement/the definitive end-level test collection
* Who do you know who works on OpenShift
* What team do they work on?
* Ronelle Landy (contact from Mike Orazi)
* Jeff Burke (see forwarded email)
* Emilien presentation (shift on stack) https://docs.google.com/presentation/d/1IwUg1AbXuGJhvrcmEhWMz_yljcDcSOWvW7-ZVMTHzg0/edit?skip_itp2_check=true&pli=1#slide=id.p
* Chandan Kumar
* Praveen Kumar - Code Ready Containers
* Aditya Konarde - OpenShift AppSRE manager
* Sinny Kumari - OpenShift (currently on maternity leave)
* Amol Kahat
* Deepak Punia - OpenShift
* Avadhoot Sagare - OpenShift Storage
* Sandeep Yadav
* Arvind Iyengar - Openshift QE
* Martin Schuppert - Podified control plane (member of Chris Jones' team)
* Jaison Raju - AM, Performance and Scale
* Do they do CI?
* Would they be willing to talk with our team?
* Add your name as contact
* Podified Control Plane - Team meeting Tuesday 1pm UTC
* https://docs.google.com/document/d/1LKkcql0VtvP4fy23uPk8yydbFT6DFOQdIjgkvivtUz8/edit#heading=h.o42u2wtpb6gl
* QE presentation: https://docs.google.com/presentation/d/1hI9XvNM7xHe89r-mWkO47YIN-WIQ08mo0a1ufdBl7-E/edit#slide=id.p
* CI flow for current OSPdO : https://docs.google.com/presentation/d/1i3kRLFrBu1yvtyLXR5Tp84kff9NfDg7ZzprrqMWkYUk/edit#slide=id.g8ebb39171e_0_13
* Shift on Stack CI ppt: https://docs.google.com/presentation/d/1IwUg1AbXuGJhvrcmEhWMz_yljcDcSOWvW7-ZVMTHzg0/edit?skip_itp2_check=true&pli=1#slide=id.p
* Notes from Jeff's response:
Essentially, we are looking for:
- what automation platforms are already in use in Red Hat (Prow?) and whether they can be used both upstream and downstream
Prow is in use by the TP and ART (OpenShift teams)
- where the teams get the hardware/resources to test
Depends on what kinds of tests you mean here. CVP has access to do some specific testing (CVP actually uses the same environment as TP and DPTP); we allow them to use clusters. Other teams have dedicated resources; some teams have both BM (Beaker) and cloud resources. That is not a CPaaS or CVP issue per se. How we handle it: once the package is built and available, we put a message on the UMB (Universal Message Bus); at that point anyone listening on the bus can react to the message.
- is anyone taking advantage of the environments available from PSI?
Services like CVP are running on the BM instance of PSI. The testing resources CVP uses are from the OCP PSI instance, so yes, we are taking advantage of them. One of the biggest limitations of PSI is that you need to have quota; the resources are assigned to you, as opposed to something like Beaker, where you can dynamically ask for hardware and release it for someone else when you are done.
* Is there anyone on your team who would be willing to come and give our team a presentation on using your current tools?
Can you give me a little more data on what you are delivering? Does it have RPMs, or are you doing a multistage build? Is it a managed service? This will help us understand the overview needed.
### Meeting notes
* dan: https://github.com/openstack-k8s-operators/osp-director-dev-tools (test platform - can drive baremetal openshift install)
* udi: https://github.com/openshift/hive/blob/master/docs/using-hive.md
### How to test an operator
* https://docs.ci.openshift.org/docs/
* https://docs.ci.openshift.org/docs/architecture/ci-operator/
* https://github.com/openshift/release/blob/master/ci-operator/jobs/openstack-k8s-operators/osp-director-operator/openstack-k8s-operators-osp-director-operator-master-presubmits.yaml
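For orientation, a minimal ci-operator config of the kind those docs describe might look like the sketch below; the file path, image tag, and test names are illustrative assumptions, not a merged config:
~~~
# Sketch of a minimal ci-operator config (such files live under
# ci-operator/config/<org>/<repo>/ in openshift/release).
# All names here are placeholders.
build_root:
  image_stream_tag:
    namespace: openshift
    name: release
    tag: golang-1.18
resources:
  '*':
    requests:
      cpu: 100m
      memory: 200Mi
tests:
- as: lint            # containerized presubmit, runs against the built src image
  commands: make lint
  container:
    from: src
- as: unit
  commands: make test
  container:
    from: src
~~~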
## Tasks
* Point of Contact
* mixed rhel - marios
* external compute - chandan
* podified - sandeep/doug
* Tasks
* how to deploy the keystone operator (see the OLM sketch below)
* how to test operator
* installation
* running it
* standard tests or framework or guide?
* workflow (upgrade)
* requirements gathering - document (rlandy)
* openshift/k8s
* upstream/downstream
* expectation of zuul inclusion/github actions
Focus on a single operator test (e.g. keystone)
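As a concrete starting point for the keystone deploy task above, installing one operator from a custom catalog on CRC/microshift would look roughly like the OLM manifests below; the index image, namespaces, and channel are placeholder assumptions:
~~~
# Hypothetical OLM manifests for installing a single operator (keystone)
# from a custom index image; all names/images are placeholders.
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: keystone-operator-index
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/<user>/keystone-operator-index:latest  # placeholder index image
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openstack
  namespace: openstack   # target namespace must exist first
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: keystone-operator
  namespace: openstack
spec:
  channel: alpha          # assumed channel name
  name: keystone-operator
  source: keystone-operator-index
  sourceNamespace: openshift-marketplace
~~~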
Testing Envs
* crc
* microshift
* sean's roles: https://github.com/SeanMooney/ansible_role_devstack/blob/master/ansible/deploy_microshift.yaml
* https://github.com/openshift/assisted-installer (not sure)
Space Requirements
* 8 GB RAM and 100 GB disk space
Where
* not upstream/openstack/opendev
* internal sf? (PSI stage)
Notes on GitHub Actions (marios):
* GitHub-native, like the lint pipeline https://github.com/openstack-k8s-operators/osp-director-dev-tools/actions
### Shift on stack (Emilien - vexx workflow) 07/07
* https://docs.google.com/presentation/d/1IwUg1AbXuGJhvrcmEhWMz_yljcDcSOWvW7-ZVMTHzg0/edit?skip_itp2_check=true&pli=1#slide=id.p
* https://github.com/openshift/release/blob/master/ci-operator/step-registry/openstack/provision/bastionproxy/openstack-provision-bastionproxy-commands.sh
* https://github.com/openshift/release/blob/master/ci-operator/step-registry/openstack/provision/bastionproxy/openstack-provision-bastionproxy-ref.yaml (see the ref sketch at the end of this section)
* https://github.com/openshift/release/blob/master/ci-operator/step-registry/openstack/conf/generateconfig/openstack-conf-generateconfig-commands.sh
* stable payloads - look into
* promotion workflows - investigate
* cluster profiles and periodics:
* https://github.com/openshift/release/search?q=openstack-vh-mecha-az0
* https://github.com/openshift/release/blob/1a076190703a870eb3874eb72a0808a9e57b5e6d/ci-operator/config/shiftstack/shiftstack-ci/shiftstack-shiftstack-ci-main__periodic-4.11.yaml
* https://docs.ci.openshift.org/docs/how-tos/onboarding-a-new-component/
* https://docs.ci.openshift.org/docs/how-tos/adding-a-cluster-profile/
* look into VPN connection with prow
* vault (creds)
* useful links
* https://prow.ci.openshift.org/
* https://search.ci.openshift.org/
* https://docs.ci.openshift.org/docs/
* https://vault.ci.openshift.org/ui/
* slack: #forum-testplatform
* required: create repo
* try to deploy shift-on-stack step registry with our creds
* #osp-podified-ci
* Workflow docs
* https://docs.google.com/presentation/d/1mAiNDjKMxyNQ4PchQV1eVt7PZp4NhCpVcKP8Tb9l5Fc/edit#slide=id.g13bee69c91d_0_1018
* https://hackmd.io/5sXwkCUYR_GWnfetm0mlvw
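For reference, the step-registry entries linked above pair a `*-commands.sh` script with a `*-ref.yaml` that declares how the step runs. The general shape, with placeholder values (see the real bastionproxy files for a concrete example):
~~~
# Sketch of a step-registry ref; all values below are placeholders.
ref:
  as: openstack-provision-bastionproxy        # step name, matches the directory layout
  from: upi-installer                         # assumed: image the step runs in
  commands: openstack-provision-bastionproxy-commands.sh
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
  credentials:                                # optional: secrets synced from Vault
  - namespace: test-credentials
    name: openstack-operators-vexxhost        # hypothetical secret name
    mount_path: /var/run/secrets/openstack
  documentation: |-
    Provisions a bastion proxy on the OpenStack cloud for the test cluster.
~~~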
## Mtg prep with Emilien (July 22nd mtg)
### What we tried so far
#### Create a repo
* We created a repo and a job config for one of our operators (mariadb).
https://github.com/openshift/release/pull/30638
#### Cluster profile
* Add OpenStack Operators on Vexxhost cluster profile
* https://github.com/Sandeepyadav93/ci-tools/commit/e6f8d0f29588169d4de354b08ff0ca21e28abb4a
* Add necessary bits for a new cluster profile: OpenStack operator
https://github.com/Sandeepyadav93/release/commit/fe52a52bfa95941f7863b4b7b2807fcbc24a06bd
#### Created a secret vault collection
* https://vault.ci.openshift.org/ui/vault/secrets/kv/show/selfservice/openstack-k8s-operators-secrets/
### Doubts
* In reference to [doc](https://docs.ci.openshift.org/docs/how-tos/onboarding-a-new-component/#granting-robots-privileges-and-installing-the-github-app)
* ~~This requires openshift-merge-robot to be an admin of the repo - Is it enough to add openshift-merge-robot as organization `OWNER`?~~
* Yes, having the bot as an org owner will be sufficient.
* ~~Is it okay to install openshift ci github app on repo before our PR merges?~~
* Yes; as for installing the app and bots: we will need to install the app prior to merge (there is a job `check-gh-automation` that will block the merge otherwise)
* ~~How can we validate the jobs which we are adding? We are adding some check jobs for the mariadb operator in the PR below, but a different set of jobs is running, which is validating the change, not the jobs which we are adding.~~
* ~~https://github.com/openshift/release/pull/30638~~
* The jobs will be rehearsed on that PR based on the pj-rehearse.openshift.io/can-be-rehearsed label; we need to give it a few minutes, and the pj-rehearse job will trigger those jobs and they will appear in the context.
* How do we decide the quota for vexx?
* https://github.com/openshift/release/blob/master/core-services/prow/02_config/generate-boskos.py#L95-L96
* How to prepare this configuration?
* https://github.com/openshift/release/blob/master/core-services/ci-secret-bootstrap/_config.yaml#L1183-L1221
* How do we properly add our secrets?
* Does what we did [here](https://vault.ci.openshift.org/ui/vault/secrets/kv/show/selfservice/openstack-k8s-operators-secrets/cloud-credentials-openstack-operators-vexxhost) with the secrets look right?
* We followed [this doc](https://docs.ci.openshift.org/docs/how-tos/adding-a-new-secret-to-ci/#add-a-new-secret)
* Do we have to add `secretsync/target-namespace` and `secretsync/target-name`? (see the sketch at the end of this section)
* How do we provide clouds.yaml from the secrets namespace?
* How do we get an AWS subdomain for our clusters?
* *.devcluster.openshift.com
* .awscred: is this secret owned by openshift-ci?
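Per the self-service secrets doc referenced above, the `secretsync/*` keys are set directly on the Vault item; a sketch of what our item could contain, rendered as YAML for readability (the secret name and all credential values are illustrative):
~~~
# Key/value pairs stored in the Vault item. The secretsync/* keys tell the
# tooling where to materialize the secret on the build clusters; every other
# key becomes part of the secret's data.
secretsync/target-namespace: test-credentials
secretsync/target-name: openstack-operators-vexxhost   # hypothetical name
clouds.yaml: |
  clouds:
    vexxhost:
      auth:
        auth_url: https://auth.example.cloud/v3   # placeholder values
        username: openstack-k8s-operators
        password: <from vault>
      region_name: ca-ymq-1
~~~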
## Quota estimation for vexxhost
* We need a **separate OpenStack project** (because nodepool is deleting floating IPs)
### Resources needed for single OCP environment
~~~
* Control node - 3 nodes: 16G x 3 = 48G, 8vcpu x 3 = 24vcpu, 120GB x 3 = 360GB (ram = 48G, cpu = 24vcpu, disk = 360GB)
* Worker - 3 nodes = (ram = 48G, cpu = 24vcpu, disk = 360GB)
* Bastion (not sure if we use a bastion node) = (ram = 4G, cpu = 2vcpu, disk = 20GB)
total = (ram = 100G, cpu = 50vcpu, disk=740GB)
~~~
* For 1 complete OCP deployment:
    * **(ram = 100G, cpu = 50vcpu, disk = 740GB)**
    * Floating IPs: **2 floating IPs per job**
Note: We need at least two full clusters for the POC. Also, we need to investigate whether reducing the number of control and worker nodes is possible.
**total full deploys = (ram = 200G, cpu = 190vcpu, disk=1500GB)**
### Plus 5 single node deployments:
* 1 node: 32GB, 8vcpu, 120GB, 2 floating ips
* Bastion node? probably not
**total single node = (ram = 160G, vcpu = 40vcpu, disk=600GB)**
### Name for project (vexxhost)
* openstack-k8s-operator
### total quota:
**total = (ram = 360G, vcpu = 230vcpu, disk=2100GB)**
## MTG Agenda with QE on 27th July
* Execution environment - prow vs jenkins
* Are there plans to adopt prow in QE?
* Does QE have anything running atm in Jenkins, using Hive etc?
* product code
* test code
* test tooling
* execution env
* reporting
* CI workflow [diagram](https://docs.google.com/document/d/1WpTPVcr19xgXUqH8Cc-nTb8q5hlZssIuswBGuMCNCsc/edit)
* Where does QE want to be included in the workflow
* Check/promotion pipelines?
## MTG Agenda with QE on 03rd August
* hive
* full deploy
* future ci-tooling
* cluster resources
pkomarov: Topics from a qe-tripleo-ci meeting:
QE agrees that, in accordance with the flow Sandeep Yadav presented:
* Each DFG would have its own operator functional gate with minimal crc+functional operator testing
* Then operators are combined and some functional integration testing is run
* The CI would be written in the same language for dev and QE - so that shift-left is easy for QE and early functional testing is available for developers. (Downstream Prow testing is still WIP, so there are no conclusive results about that yet.)
* Discussion around deploying Prow internally; Chandan pointed to the Prow enhancement below:
Proposal: Support git providers other than GitHub in prow #10146
https://github.com/kubernetes/test-infra/issues/10146
## MTG Agenda with QE on 10th August
* Sandeep
* Started a conversation with the DPTP team about deploying Prow internally
* https://coreos.slack.com/archives/CBN38N3MW/p1660060458747759
* Private repos
* vpn connection
* https://docs.ci.openshift.org/docs/how-tos/adding-a-cluster-profile#vpn-connection.
* private images
* https://docs.ci.openshift.org/docs/how-tos/external-images#mirror-private-images.
* private bucket for logs
* Spoke with the DPTP team (Jakub Guzik) - he offered a meeting with engineers from his team (most likely next week)
* Request - we write our doubts in advance and share them, so his team can answer some before the meeting and we can clear up the rest in the meeting.
* Doug
* Hive on Internal PSI
* application credentials don't work with Hive
* SNO is up + Mariadb + Keystone
* Using ~16GB of RAM atm
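For context on the Hive experiment: a ClusterDeployment on the OpenStack platform takes a clouds.yaml secret via `credentialsSecretRef`, which is where the application-credentials limitation bites. A rough sketch following the using-hive doc linked earlier, with placeholder names and versions:
~~~
# Rough Hive ClusterDeployment sketch for PSI; names/domains are placeholders.
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: osp-ci-sno
spec:
  baseDomain: example.com
  clusterName: osp-ci-sno
  platform:
    openstack:
      cloud: psi                     # cloud entry name inside clouds.yaml
      credentialsSecretRef:
        name: psi-clouds-yaml        # secret with clouds.yaml; needs password
                                     # auth, since application credentials fail
  provisioning:
    imageSetRef:
      name: openshift-v4.11          # a ClusterImageSet defined separately
    installConfigSecretRef:
      name: osp-ci-sno-install-config
  pullSecretRef:
    name: pull-secret
~~~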
## Questions/Doubts for DPTP Team(forum-testplatform team) on 17th Aug
* We asked some questions in the `forum-testplatform` Slack channel; you can check the thread [here](https://coreos.slack.com/archives/CBN38N3MW/p1660060458747759). Jakub Guzik was very kind to offer a meeting with his team to clear up our doubts.
### Discussed so far:
* **GitHub only**: Prow is dedicated to being used only with GitHub. Integration with other services like GitLab is not possible.
* [Old topic](https://github.com/kubernetes/test-infra/issues/8435)
* **Private Github Organization/Repos**:
* If we need privacy, there is a possibility to have private repos or even org.
* There are some GitHub organizations, like `openshift-priv`, which are not exposed.
* **Private bucket for logs**
* A separate bucket for logs is also possible.
* **Running VPN to RH Network**
* There are some options for running a VPN to the RH network, but it works as follows: tests continue to be scheduled on the public build clusters as they currently are, and they connect via VPN to an internal restricted environment.
* [docs](https://docs.ci.openshift.org/docs/how-tos/adding-a-cluster-profile#vpn-connection.)
* VPN configuration and credentials are provided via a Secret volume mount, to be added by the ci-operator.
* Useful links:
* https://github.com/openshift/release/blob/master/clusters/build-clusters/01_cluster/openshift/scc/README.md
* https://docs.google.com/document/d/1mPjrHVS1EvmLdq4kGhRazTpGu6xVZDyGpVAphVZhX4w/edit?resourcekey=0-KA-qXXq1J2bTR7o6Kit9Vw#heading=h.x9snb54sjlu9
* **Private Images**:
* docs: https://docs.ci.openshift.org/docs/how-tos/external-images#mirror-private-images
* **Private instance of Deck**
* Deck is the frontend for Prow: https://prow.ci.openshift.org/. We can create private instances of Deck that are secured by OAuth.
#### Doubts
* How can we configure a private bucket for logs?
*Stephen answered in chat already*:
We would have to do that for you if it became necessary. We point your jobs at a different bucket. This is really just a necessary point to having a private deck instance. I suppose you could provide us with an SA for your own GCS bucket and we could utilize that, but with QE we maintain the bucket.
* What is meant by our own Deck instance?
*Stephen answered in chat already:* The frontend for prow: https://prow.ci.openshift.org/. We can create private instances of deck that are secured by oauth.
* Do we have some existing tests/jobs which already configure a VPN to run tests? Curious to know whether this workflow (configuring a VPN in a test) is already well tested.
*jguzik and bbguimaraes answered in chat already:*
* This feature is quite new, They are the only client right now: https://github.com/openshift/release/pull/27092
* VPN connections have been in constant use in the Nutanix E2E jobs for a while now,
* e.g.: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/5798/pull-ci-openshift-installer-master-e2e-nutanix/1557021185105989632.
* Resulting test pods: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/5798/pull-ci-openshift-installer-master-e2e-nutanix/1557021185105989632/artifacts/build-resources/pods.json.
* Sandeep: Are there any security concerns in running jobs internally in RH Network from Public Prow?
* Sandeep: Will we get support from DPTP team for issues if we run jobs in internal network?
* Doug: Is there a way of avoiding public IPs when using our own cluster profile? Our own CI cluster?
* Doug: Can we test multiple PR changes within the same job?
* Doug: Do we need to always depend on openshift/release repo for our jobs definitions? Can we use private repos for these configurations?
## MTG Agenda with QE on 24th August
* FYI: the CI team has started adding/moving CI exploration details into a Confluence page.
* https://docs.engineering.redhat.com/display/OSP/Podified+Control+Plane+-+CI+Documents
* We will also move content from this HackMD to a Google doc and retire this HackMD; the Google doc link will be attached to the Confluence page.
* "Evaluate automated promotion of an operator"
* Gathered details about how devs build operator images.
* They build all the required containers using a single command
* https://github.com/openstack-k8s-operators/docs/blob/main/new_operator.md#create-operator-image-and-push-to-custom-registry
~~~
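# Illustrative values: IMAGE_TAG_BASE points at a personal quay.io repo; the
# chained make targets build and push the operator image, its bundle, and a
# catalog index in one go.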
IMAGE_TAG_BASE=quay.io/sandyada/keystone-operator VERSION=0.0.12 IMG=$IMAGE_TAG_BASE:v$VERSION make manifests build docker-build docker-push bundle bundle-build bundle-push catalog-build catalog-push
~~~
* Built and pushed to my registry: https://quay.io/user/sandyada
* Spoke with Jon Schuleter about how we currently build images in CPaaS
* https://coreos.slack.com/archives/C03MD4LG22Z/p1661180623204909
* As per Jon - we have a Dockerfile.in template in the midstream code-eng repo that is populated into the dist-git repos, and we build using Brew there.
* Migi did a lot of work on auto-substitution to take the upstream Dockerfiles and convert/translate them enough that they build in Brew.
* https://code.engineering.redhat.com/gerrit/c/osp-director-operator/+/425102
* Notes: https://docs.engineering.redhat.com/display/PRODCHAIN/OSP+director+operator
* Plans for POC
* Add a Prow job that triggers on each PR, alongside our existing lint/unit tests in check/periodic, and that:
    * Deploys an OCP cluster on vexx
    * Deploys the podified control plane on the OCP cluster
    * Runs basic service checks
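A first cut of that job in ci-operator terms might look like the stanza below; the cluster profile and workflow names are assumptions, and the commands are placeholders for whatever the deploy/check tooling ends up being:
~~~
# Sketch of a POC presubmit in ci-operator config; profile/workflow names and
# commands are assumptions, not existing definitions.
tests:
- as: podified-e2e
  steps:
    cluster_profile: openstack-operators-vexxhost   # hypothetical profile
    workflow: openshift-e2e-openstack-ipi           # assumed install workflow
    test:                                           # replaces the workflow's test steps
    - as: deploy-and-check
      from: src
      commands: |
        make deploy          # placeholder: deploy podified control plane
        make service-check   # placeholder: basic service checks
      resources:
        requests:
          cpu: 100m
          memory: 200Mi
~~~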