# OCP/OpenShift contacts and resources
###### tags: `Design`
## Design of production-level CI system for Podified Control Plane
* Assess which currently available systems are worth investigating for the OpenStack operators
* Collaborate with QE and the development teams currently working on operators, and reach consensus on what system would work for end-to-end testing
* Work with the OpenStack Operators team to help develop CI for the initial Operator to be prototyped (someone to align with Chris Jones & team) – this will allow other DFGs to have a pattern to follow as they take on stewardship of their own Operators.
* Organization ask: not to develop two separate systems in upstream/dev/CI and QE
* Open questions for operator testing
* What defines an operator of acceptable quality
* Can Zuul check/gate still be used for testing some aspects - like individual containers
* What generic testing can be used on all operators
* What reporting system can be used
* Is DLRN still part of the system
* General CI system open questions
* Do we have a requirement to test both upstream and downstream
* Is there any system in place already in Director Operator that can be reused
* Do we have a defined test matrix of versions
* What is our timeline here
* Is Tempest still a requirement/the definitive end-level test collection
* Who do you know who works on OpenShift
* What team do they work on?
* Ronelle Landy (contact from Mike Orazi)
* Jeff Burke (see forwarded email)
* Emilien presentation (shift on stack) https://docs.google.com/presentation/d/1IwUg1AbXuGJhvrcmEhWMz_yljcDcSOWvW7-ZVMTHzg0/edit?skip_itp2_check=true&pli=1#slide=id.p
* Chandan Kumar
* Praveen Kumar - Code Ready Containers
* Aditya Konarde - OpenShift AppSRE manager
* Sinny Kumari - OpenShift (currently on maternity leave)
* Amol Kahat
* Deepak Punia - OpenShift
* Avadhoot Sagare - OpenShift Storage
* Sandeep Yadav
* Arvind Iyengar - Openshift QE
* Martin Schuppert - Podified control plane (member of Chris Jones' team)
* Jaison Raju - AM, Performance and Scale
* Do they do CI?
* Would they be willing to talk with our team?
* Add your name as contact
* Podified Control Plane - Team meeting Tuesday 1pm UTC
* https://docs.google.com/document/d/1LKkcql0VtvP4fy23uPk8yydbFT6DFOQdIjgkvivtUz8/edit#heading=h.o42u2wtpb6gl
* QE presentation: https://docs.google.com/presentation/d/1hI9XvNM7xHe89r-mWkO47YIN-WIQ08mo0a1ufdBl7-E/edit#slide=id.p
* CI flow for current OSPdO : https://docs.google.com/presentation/d/1i3kRLFrBu1yvtyLXR5Tp84kff9NfDg7ZzprrqMWkYUk/edit#slide=id.g8ebb39171e_0_13
* Shift on Stack CI ppt: https://docs.google.com/presentation/d/1IwUg1AbXuGJhvrcmEhWMz_yljcDcSOWvW7-ZVMTHzg0/edit?skip_itp2_check=true&pli=1#slide=id.p
* Notes from Jeff's response:
Essentially, we are looking for:
- what automation platforms are already in use in Red Hat (Prow?) and whether they can be used both upstream and downstream
Prow is in use by the TP and ART (OpenShift teams)
- where the teams get the hardware/resources to test
Depends on what kinds of tests you mean here. CVP has access to do some specific testing (CVP actually uses the same environment as TP and DPTP); we allow them to use clusters. Other teams have dedicated resources; some teams have both BM (Beaker) and cloud resources. That is not a CPaaS or CVP issue per se. How we handle it: once the package is built and available, we put a message on the UMB (Universal Message Bus); at that point anyone listening on the bus can react to the message.
- is anyone taking advantage of the environments available from PSI?
Services like CVP are running on the BM instance of PSI. The testing resources CVP uses are from the OCP PSI instance, so yes, we are taking advantage of them. One of the biggest limitations of PSI is that you need to have quota; the resources are assigned to you, as opposed to something like Beaker, where you can dynamically ask for hardware and release it for someone else when you are done.
* Is there anyone on your team who would be willing to come and give our team a presentation on using your current tools?
Can you give me a little more data on what you are delivering? Does it have RPMs, or are you doing a multistage build? Is it a managed service? This will help us understand the overview needed.
### Meeting notes
* dan: https://github.com/openstack-k8s-operators/osp-director-dev-tools (test platform - can drive baremetal openshift install)
* udi: https://github.com/openshift/hive/blob/master/docs/using-hive.md
### How to test an operator
* https://docs.ci.openshift.org/docs/
* https://docs.ci.openshift.org/docs/architecture/ci-operator/
* https://github.com/openshift/release/blob/master/ci-operator/jobs/openstack-k8s-operators/osp-director-operator/openstack-k8s-operators-osp-director-operator-master-presubmits.yaml
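For orientation, a minimal ci-operator config of the kind those docs describe might look like the sketch below; the file path, image tag, and test names are illustrative assumptions, not a merged config:
~~~
# Sketch of a minimal ci-operator config (such files live under
# ci-operator/config/<org>/<repo>/ in openshift/release).
# All names here are placeholders.
build_root:
  image_stream_tag:
    namespace: openshift
    name: release
    tag: golang-1.18
resources:
  '*':
    requests:
      cpu: 100m
      memory: 200Mi
tests:
- as: lint            # containerized presubmit, runs against the built src image
  commands: make lint
  container:
    from: src
- as: unit
  commands: make test
  container:
    from: src
~~~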
## Tasks
* Point of Contact
* mixed rhel - marios
* external compute - chandan
* podified - sandeep/doug
* Tasks
* how to deploy the keystone operator (see the OLM sketch below)
* how to test operator
* installation
* running it
* standard tests or framework or guide?
* workflow (upgrade)
* requirements gathering - document (rlandy)
* openshift/k8s
* upstream/downstream
* expectation of zuul inclusion/github actions
Focus on a single operator test (e.g. keystone)
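As a concrete starting point for the keystone deploy task above, installing one operator from a custom catalog on CRC/microshift would look roughly like the OLM manifests below; the index image, namespaces, and channel are placeholder assumptions:
~~~
# Hypothetical OLM manifests for installing a single operator (keystone)
# from a custom index image; all names/images are placeholders.
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: keystone-operator-index
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/<user>/keystone-operator-index:latest  # placeholder index image
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openstack
  namespace: openstack   # target namespace must exist first
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: keystone-operator
  namespace: openstack
spec:
  channel: alpha          # assumed channel name
  name: keystone-operator
  source: keystone-operator-index
  sourceNamespace: openshift-marketplace
~~~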
Testing Envs
* crc
* microshift
* sean's roles: https://github.com/SeanMooney/ansible_role_devstack/blob/master/ansible/deploy_microshift.yaml
* https://github.com/openshift/assisted-installer (not sure)
Space Requirements
* 8 GB RAM and 100 GB disk space
Where
* not upstream/openstack/opendev
* internal sf? (PSI stage)
Notes on GitHub Actions (marios):
* GitHub-native, like the lint pipeline https://github.com/openstack-k8s-operators/osp-director-dev-tools/actions
### Shift on stack (Emilien - vexx workflow) 07/07
* https://docs.google.com/presentation/d/1IwUg1AbXuGJhvrcmEhWMz_yljcDcSOWvW7-ZVMTHzg0/edit?skip_itp2_check=true&pli=1#slide=id.p
* https://github.com/openshift/release/blob/master/ci-operator/step-registry/openstack/provision/bastionproxy/openstack-provision-bastionproxy-commands.sh
* https://github.com/openshift/release/blob/master/ci-operator/step-registry/openstack/provision/bastionproxy/openstack-provision-bastionproxy-ref.yaml (see the ref sketch at the end of this section)
* https://github.com/openshift/release/blob/master/ci-operator/step-registry/openstack/conf/generateconfig/openstack-conf-generateconfig-commands.sh
* stable payloads - look into
* promotion workflows - investigate
* cluster profiles and periodics:
* https://github.com/openshift/release/search?q=openstack-vh-mecha-az0
* https://github.com/openshift/release/blob/1a076190703a870eb3874eb72a0808a9e57b5e6d/ci-operator/config/shiftstack/shiftstack-ci/shiftstack-shiftstack-ci-main__periodic-4.11.yaml
* https://docs.ci.openshift.org/docs/how-tos/onboarding-a-new-component/
* https://docs.ci.openshift.org/docs/how-tos/adding-a-cluster-profile/
* look into VPN connection with prow
* vault (creds)
* useful links
* https://prow.ci.openshift.org/
* https://search.ci.openshift.org/
* https://docs.ci.openshift.org/docs/
* https://vault.ci.openshift.org/ui/
* slack: #forum-testplatform
* required: create repo
* try to deploy shift-on-stack step registry with our creds
* #osp-podified-ci
* Workflow docs
* https://docs.google.com/presentation/d/1mAiNDjKMxyNQ4PchQV1eVt7PZp4NhCpVcKP8Tb9l5Fc/edit#slide=id.g13bee69c91d_0_1018
* https://hackmd.io/5sXwkCUYR_GWnfetm0mlvw
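For reference, the step-registry entries linked above pair a `*-commands.sh` script with a `*-ref.yaml` that declares how the step runs. The general shape, with placeholder values (see the real bastionproxy files for a concrete example):
~~~
# Sketch of a step-registry ref; all values below are placeholders.
ref:
  as: openstack-provision-bastionproxy        # step name, matches the directory layout
  from: upi-installer                         # assumed: image the step runs in
  commands: openstack-provision-bastionproxy-commands.sh
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
  credentials:                                # optional: secrets synced from Vault
  - namespace: test-credentials
    name: openstack-operators-vexxhost        # hypothetical secret name
    mount_path: /var/run/secrets/openstack
  documentation: |-
    Provisions a bastion proxy on the OpenStack cloud for the test cluster.
~~~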
## Mtg prep with Emilien (July 22nd mtg)
### What we tried so far
#### Create a repo
* We created a repo and a job config for one of our operators (mariadb).
https://github.com/openshift/release/pull/30638
#### Cluster profile
* Add OpenStack Operators on Vexxhost cluster profile
* https://github.com/Sandeepyadav93/ci-tools/commit/e6f8d0f29588169d4de354b08ff0ca21e28abb4a
* Add necessary bits for a new cluster profile: OpenStack operator
https://github.com/Sandeepyadav93/release/commit/fe52a52bfa95941f7863b4b7b2807fcbc24a06bd
#### Created a secret vault collection
* https://vault.ci.openshift.org/ui/vault/secrets/kv/show/selfservice/openstack-k8s-operators-secrets/
### Doubts
* In reference to [doc](https://docs.ci.openshift.org/docs/how-tos/onboarding-a-new-component/#granting-robots-privileges-and-installing-the-github-app)
* ~~This requires openshift-merge-robot to be an admin of the repo - Is it enough to add openshift-merge-robot as organization `OWNER`?~~
* Yes, having the bot as an org owner will be sufficient.
* ~~Is it okay to install openshift ci github app on repo before our PR merges?~~
* Yes; as for installing the app and bots: we will need to install the app prior to merge (there is a job `check-gh-automation` that will block the merge otherwise)
* ~~How can we validate the jobs which we are adding? We are adding some check jobs for the mariadb operator in the PR below, but a different set of jobs is running, which is validating the change, not the jobs which we are adding.~~
* ~~https://github.com/openshift/release/pull/30638~~
* The jobs will be rehearsed on that PR based on the pj-rehearse.openshift.io/can-be-rehearsed label; we need to give it a few minutes, and the pj-rehearse job will trigger those jobs and they will appear in the context.
* How do we decide the quota for vexx?
* https://github.com/openshift/release/blob/master/core-services/prow/02_config/generate-boskos.py#L95-L96
* How to prepare this configuration?
* https://github.com/openshift/release/blob/master/core-services/ci-secret-bootstrap/_config.yaml#L1183-L1221
* How do we properly add our secrets?
* Does what we did [here](https://vault.ci.openshift.org/ui/vault/secrets/kv/show/selfservice/openstack-k8s-operators-secrets/cloud-credentials-openstack-operators-vexxhost) with the secrets look right?
* We followed [this doc](https://docs.ci.openshift.org/docs/how-tos/adding-a-new-secret-to-ci/#add-a-new-secret)
* Do we have to add `secretsync/target-namespace` and `secretsync/target-name`? (see the sketch at the end of this section)
* How do we provide clouds.yaml from the secrets namespace?
* How do we get an AWS subdomain for our clusters?
* *.devcluster.openshift.com
* .awscred: is this secret owned by openshift-ci?
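Per the self-service secrets doc referenced above, the `secretsync/*` keys are set directly on the Vault item; a sketch of what our item could contain, rendered as YAML for readability (the secret name and all credential values are illustrative):
~~~
# Key/value pairs stored in the Vault item. The secretsync/* keys tell the
# tooling where to materialize the secret on the build clusters; every other
# key becomes part of the secret's data.
secretsync/target-namespace: test-credentials
secretsync/target-name: openstack-operators-vexxhost   # hypothetical name
clouds.yaml: |
  clouds:
    vexxhost:
      auth:
        auth_url: https://auth.example.cloud/v3   # placeholder values
        username: openstack-k8s-operators
        password: <from vault>
      region_name: ca-ymq-1
~~~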
## Quota estimation for vexxhost
* We need a **separate OpenStack project** (because nodepool is deleting floating IPs)
### Resources needed for single OCP environment
~~~
* Control node - 3 nodes: 16G x 3 = 48G, 8vcpu x 3 = 24vcpu, 120GB x 3 = 360GB (ram = 48G, cpu = 24vcpu, disk = 360GB)
* Worker - 3 nodes = (ram = 48G, cpu = 24vcpu, disk = 360GB)
* Bastion (not sure if we use a bastion node) = (ram = 4G, cpu = 2vcpu, disk = 20GB)
total = (ram = 100G, cpu = 50vcpu, disk=740GB)
~~~
* For 1 complete OCP deployment:
    * **(ram = 100G, cpu = 50vcpu, disk = 740GB)**
    * Floating IPs: **2 floating IPs per job**
Note: We need at least two full clusters for the POC. Also, we need to investigate whether reducing the number of control and worker nodes is possible.
**total full deploys = (ram = 200G, cpu = 190vcpu, disk=1500GB)**
### Plus 5 single node deployments:
* 1 node: 32GB, 8vcpu, 120GB, 2 floating ips
* Bastion node? probably not
**total single node = (ram = 160G, vcpu = 40vcpu, disk=600GB)**
### Name for project (vexxhost)
* openstack-k8s-operator
### total quota:
**total = (ram = 360G, vcpu = 230vcpu, disk=2100GB)**
## MTG Agenda with QE on 27th July
* Execution environment - prow vs jenkins
* Are there plans to adopt prow in QE?
* Does QE have anything running atm in Jenkins, using Hive etc?
* product code
* test code
* test tooling
* execution env
* reporting
* CI workflow [diagram](https://docs.google.com/document/d/1WpTPVcr19xgXUqH8Cc-nTb8q5hlZssIuswBGuMCNCsc/edit)
* Where does QE want to be included in the workflow
* Check/promotion pipelines?
## MTG Agenda with QE on 03rd August
* hive
* full deploy
* future ci-tooling
* cluster resources
pkomarov: Topics from a qe-tripleo-ci meeting:
QE agrees that, in accordance with the flow Sandeep Yadav presented:
* Each DFG would have its own operator functional gate with minimal crc+functional operator testing
* Then operators are combined and some functional integration testing is run
* The CI would be written in the same language for dev and QE - so that shift-left is easy for QE and early functional testing is available for developers. (Downstream Prow testing is still WIP, so there are no conclusive results about that yet.)
* Discussion around deploying Prow internally; Chandan pointed to the Prow enhancement below:
Proposal: Support git providers other than GitHub in prow #10146
https://github.com/kubernetes/test-infra/issues/10146
## MTG Agenda with QE on 10th August
* Sandeep
* Started a conversation with the DPTP team about deploying Prow internally
* https://coreos.slack.com/archives/CBN38N3MW/p1660060458747759
* Private repos
* vpn connection
* https://docs.ci.openshift.org/docs/how-tos/adding-a-cluster-profile#vpn-connection.
* private images
* https://docs.ci.openshift.org/docs/how-tos/external-images#mirror-private-images.
* private bucket for logs
* Spoke with the DPTP team (Jakub Guzik) - he offered a meeting with engineers from his team (most likely next week)
* Request - we write our doubts in advance and share them, so his team can answer some before the meeting and we can clear up the rest in the meeting.
* Doug
* Hive on Internal PSI
* application credentials don't work with Hive
* SNO is up + Mariadb + Keystone
* Using ~16GB of RAM atm
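For context on the Hive experiment: a ClusterDeployment on the OpenStack platform takes a clouds.yaml secret via `credentialsSecretRef`, which is where the application-credentials limitation bites. A rough sketch following the using-hive doc linked earlier, with placeholder names and versions:
~~~
# Rough Hive ClusterDeployment sketch for PSI; names/domains are placeholders.
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: osp-ci-sno
spec:
  baseDomain: example.com
  clusterName: osp-ci-sno
  platform:
    openstack:
      cloud: psi                     # cloud entry name inside clouds.yaml
      credentialsSecretRef:
        name: psi-clouds-yaml        # secret with clouds.yaml; needs password
                                     # auth, since application credentials fail
  provisioning:
    imageSetRef:
      name: openshift-v4.11          # a ClusterImageSet defined separately
    installConfigSecretRef:
      name: osp-ci-sno-install-config
  pullSecretRef:
    name: pull-secret
~~~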
## Questions/Doubts for DPTP Team(forum-testplatform team) on 17th Aug
* We asked some questions in the `forum-testplatform` Slack channel; you can check the thread [here](https://coreos.slack.com/archives/CBN38N3MW/p1660060458747759). Jakub Guzik was very kind to offer a meeting with his team to clear up our doubts.
### Discussed so far:
* **GitHub only**: Prow is dedicated to being used only with GitHub. Integration with other services like GitLab is not possible.
* [Old topic](https://github.com/kubernetes/test-infra/issues/8435)
* **Private Github Organization/Repos**:
* If we need privacy, there is a possibility to have private repos or even org.
* There are some GitHub organizations, like `openshift-priv`, which are not exposed.
* **Private bucket for logs**
* A separate bucket for logs is also possible.
* **Running VPN to RH Network**
* There are some options for running a VPN to the RH network, but it works as follows: tests continue to be scheduled on the public build clusters as they currently are, and they connect via VPN to an internal restricted environment.
* [docs](https://docs.ci.openshift.org/docs/how-tos/adding-a-cluster-profile#vpn-connection.)
* VPN configuration and credentials are provided via a Secret volume mount, to be added by the ci-operator.
* Useful links:
* https://github.com/openshift/release/blob/master/clusters/build-clusters/01_cluster/openshift/scc/README.md
* https://docs.google.com/document/d/1mPjrHVS1EvmLdq4kGhRazTpGu6xVZDyGpVAphVZhX4w/edit?resourcekey=0-KA-qXXq1J2bTR7o6Kit9Vw#heading=h.x9snb54sjlu9
* **Private Images**:
* docs: https://docs.ci.openshift.org/docs/how-tos/external-images#mirror-private-images
* **Private instance of Deck**
* Deck is the frontend for Prow: https://prow.ci.openshift.org/. We can create private instances of Deck that are secured by OAuth.
#### Doubts
* How can we configure a private bucket for logs?
*Stephen answered in chat already*:
We would have to do that for you if it became necessary. We point your jobs at a different bucket. This is really just a necessary point to having a private deck instance. I suppose you could provide us with an SA for your own GCS bucket and we could utilize that, but with QE we maintain the bucket.
* What is meant by our own Deck instance?
*Stephen answered in chat already:* The frontend for prow: https://prow.ci.openshift.org/. We can create private instances of deck that are secured by oauth.
* Do we have some existing tests/jobs which already configure a VPN to run tests? Curious to know whether this workflow (configuring a VPN in a test) is already well tested.
*jguzik and bbguimaraes answered in chat already:*
* This feature is quite new, They are the only client right now: https://github.com/openshift/release/pull/27092
* VPN connections have been in constant use in the Nutanix E2E jobs for a while now,
* e.g.: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/5798/pull-ci-openshift-installer-master-e2e-nutanix/1557021185105989632.
* Resulting test pods: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/5798/pull-ci-openshift-installer-master-e2e-nutanix/1557021185105989632/artifacts/build-resources/pods.json.
* Sandeep: Are there any security concerns in running jobs internally in RH Network from Public Prow?
* Sandeep: Will we get support from DPTP team for issues if we run jobs in internal network?
* Doug: Is there a way of avoiding public IPs when using our own cluster profile? Our own CI cluster?
* Doug: Can we test multiple PR changes within the same job?
* Doug: Do we need to always depend on openshift/release repo for our jobs definitions? Can we use private repos for these configurations?
## MTG Agenda with QE on 24th August
* FYI: the CI team has started adding/moving CI exploration details into a Confluence page.
* https://docs.engineering.redhat.com/display/OSP/Podified+Control+Plane+-+CI+Documents
* We will also move content from this HackMD to a Google doc and retire this HackMD; the Google doc link will be attached to the Confluence page.
* "Evaluate automated promotion of an operator"
* Gathered details about how devs build operator images.
* They build all the required containers using a single command
* https://github.com/openstack-k8s-operators/docs/blob/main/new_operator.md#create-operator-image-and-push-to-custom-registry
~~~
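# Illustrative values: IMAGE_TAG_BASE points at a personal quay.io repo; the
# chained make targets build and push the operator image, its bundle, and a
# catalog index in one go.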
IMAGE_TAG_BASE=quay.io/sandyada/keystone-operator VERSION=0.0.12 IMG=$IMAGE_TAG_BASE:v$VERSION make manifests build docker-build docker-push bundle bundle-build bundle-push catalog-build catalog-push
~~~
* Built and pushed to my registry: https://quay.io/user/sandyada
* Spoke with Jon Schuleter about how we currently build images in CPaaS
* https://coreos.slack.com/archives/C03MD4LG22Z/p1661180623204909
* As per Jon - we have a Dockerfile.in template in the midstream code-eng repo that is populated into the dist-git repos, and we build using Brew there.
* Migi did a lot of work on auto-substitution to take the upstream Dockerfiles and convert/translate them enough that they build in Brew.
* https://code.engineering.redhat.com/gerrit/c/osp-director-operator/+/425102
* Notes: https://docs.engineering.redhat.com/display/PRODCHAIN/OSP+director+operator
* Plans for POC
* Add a Prow job that triggers on each PR, alongside our existing lint/unit tests in check/periodic, and that:
    * Deploys an OCP cluster on vexx
    * Deploys the podified control plane on the OCP cluster
    * Runs basic service checks
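A first cut of that job in ci-operator terms might look like the stanza below; the cluster profile and workflow names are assumptions, and the commands are placeholders for whatever the deploy/check tooling ends up being:
~~~
# Sketch of a POC presubmit in ci-operator config; profile/workflow names and
# commands are assumptions, not existing definitions.
tests:
- as: podified-e2e
  steps:
    cluster_profile: openstack-operators-vexxhost   # hypothetical profile
    workflow: openshift-e2e-openstack-ipi           # assumed install workflow
    test:                                           # replaces the workflow's test steps
    - as: deploy-and-check
      from: src
      commands: |
        make deploy          # placeholder: deploy podified control plane
        make service-check   # placeholder: basic service checks
      resources:
        requests:
          cpu: 100m
          memory: 200Mi
~~~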