# TripleO CI Infrastructure rework

###### tags: `Design`

This spec outlines the plan for reworking the TripleO CI infrastructure.

## Problem description

The TripleO CI infrastructure currently consists of 4 Virtual Machines (VMs) which run multiple services for our internal consumption. The deployment process is built around Ansible playbooks [1].

In the past, due to the lack of a working staging environment [2], the playbooks were not fully executed, which made it hard to test their functionality properly. Recent tests exposed multiple difficulties with reproducing a working environment.

The TripleO CI infrastructure currently cannot be reproduced using the existing Ansible playbooks.

### List of encountered problems

These are some of the issues experienced during attempts to recreate the environment. The list may not be exhaustive.

#### Risk of destroying existing infrastructure

A full run tears down the existing infrastructure [3]. This default behavior discourages use of the playbooks, because running them risks unexpected problems.

#### Passwords stored in text mode

Passwords are stored in a plain-text file [4], which needs to be populated every time it is used [5].

#### Servers aren't provisioned with playbooks

The `promoter` server should be using a CentOS 7 image [6] [7], but it was provisioned manually on `CentOS-8-x86_64-GenericCloud`. This creates a security risk, because the VM is exposed to the public internet without regular updates being applied.

#### Missing Docker configuration options

The TripleO CI infrastructure relies on Docker. Due to changes introduced by Docker Hub [8], the infrastructure often exceeds the pull rate limits. To mitigate the issue, each VM which pulls from Docker Hub needs to authenticate with a login and password. Currently only `promoter` supports Docker

#### CentOS 7 assumption of Python 2.7

Python 2.7 reached end of life on January 1, 2020 [9].
The Ansible playbooks were written with the assumption that they run on Python 2.7. The code which installs `docker-compose` [10] shares this assumption, which means it installs an incompatible release.

```
[root@rrcockpit dasm]# docker-compose --help
Traceback (most recent call last):
[...]
  File "/root/.local/lib/python2.7/site-packages/certifi/core.py", line 17
    def where() -> str:
                ^
SyntaxError: invalid syntax
```

This means that we cannot easily redeploy the code without manual intervention, due to the mismatch in packages:

```
pip install --user \
    ansible==2.9.27 Babel==0.9.6 backports.ssl-match-hostname==3.5.0.1 \
    cached-property==1.5.1 certifi==2019.11.28 cffi==1.6.0 \
    chardet==3.0.4 cloud-init==19.4 configobj==4.7.2 \
    cryptography==1.7.2 decorator==3.4.0 docker==3.7.3 \
    docker-compose==1.22.0 docker-pycreds==0.4.0 dockerpty==0.4.1 \
    docopt==0.6.2 enum34==1.0.4 ethtool==0.8 \
    functools32==3.2.3.post2 httplib2==0.18.1 idna==2.6 \
    iniparse==0.4 ipaddress==1.0.16 IPy==0.75 \
    Jinja2==2.7.2 jmespath==0.9.4 jsonpatch==1.2 \
    jsonpointer==1.9 jsonschema==2.6.0 kitchen==1.1.1 \
    MarkupSafe==0.11 mercurial==2.6.2 paramiko==2.1.1 \
    perf==0.1 pip==8.1.2 ply==3.4 \
    policycoreutils-default-encoding==0.1 prettytable==0.7.2 pyasn1==0.1.9 \
    pycparser==2.14 pycurl==7.19.0 pygobject==3.22.0 \
    pygpgme==0.3 pyinotify==0.9.4 pyliblzma==0.5.3 \
    pyserial==2.6 python-dateutil==1.5 python-dmidecode==3.10.13 \
    python-linux-procfs==0.4.9 pytoml==0.1.14 pyudev==0.15 \
    pyxattr==0.5.1 PyYAML==3.10 registries==0.1 \
    requests==2.18.4 schedutils==0.4 seobject==0.1 \
    sepolicy==1.1 setuptools==0.9.8 six==1.9.0 \
    syspurpose==1.24.51 texttable==0.9.1 urlgrabber==3.10 \
    urllib3==1.22 virtualenv==15.1.0 websocket-client==0.57.0 \
    yum-metadata-parser==1.1.4
```

### TBD - investigation

#### Downstream cockpit

#### Downstream promoter

#### Downstream toolbox

## Proposed change

### Prepare staging environment

As mentioned in the `List of encountered problems` section, our code currently cannot be safely used to recreate a working environment.
Each listed problem will require time and effort to fix in the code. Recently, a significant amount of work has been done to allow for easier deployments of the `staging` environment [11].

To deploy the staging environment, one can run:

```
ansible-playbook -vvv -i inventory/ \
    -e cloud="rhos_dev_stage" \
    -e infra_setup_repo_fetch=https://review.rdoproject.org/r/rdo-infra/ci-config \
    -e infra_setup_repo_fetch_refspec={reference_change_number} \
    provision-all.yml
```

### Update the code

Due to the nature of our existing infrastructure, one needs to be very careful with changes to existing playbooks and roles. The code is fetched and executed on the VMs every 5 minutes [12] [13]. This means that it is very easy to merge an unwanted change and break the existing workflow.

To prevent that, we need to introduce new roles and playbooks and test them in the `staging` environment. Once the `staging` environment acts as an updated version of the existing infrastructure, we may be able to remove the outdated code. This approach should prevent us from inadvertently breaking existing code.

### Proposal

Currently the infrastructure runs on CentOS 7 [14] and CentOS 8 [15]. Neither system is in active development anymore; both have been superseded by CentOS Stream 9 [16].

Instead of updating the existing code for unsupported systems, we need to move forward and implement the missing features on CentOS Stream 9. Once those changes are applied, we can recreate the infrastructure based on a secure release.

## Implementation

Some of the activities can be split into sub-tasks to streamline and parallelize the required work.
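One sub-task that could be handled independently is the Python 2.7 assumption described among the encountered problems. The following is only a minimal sketch, not code from `ci-config`: the helper name `python_version_supported` and the sample version strings are hypothetical, chosen for illustration. It shows how an installation step could inspect an interpreter version string and skip the `docker-compose` installation on anything older than Python 3.

```shell
# Hypothetical helper (not part of ci-config): returns 0 when the given
# version string denotes Python 3 or newer, and 1 otherwise.
python_version_supported() {
    version="$1"
    major="${version%%.*}"          # text before the first dot, e.g. "2" from "2.7.5"
    case "$major" in
        ''|*[!0-9]*) return 1 ;;    # reject empty or non-numeric input
    esac
    [ "$major" -ge 3 ]
}

# Illustration only: evaluate two sample version strings.
for v in 2.7.5 3.9.10; do
    if python_version_supported "$v"; then
        echo "$v: supported"
    else
        echo "$v: unsupported, skip docker-compose installation"
    fi
done
```

In a real role the version string would come from the target host rather than from a hard-coded list; how it is gathered is left open here.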
### Assignee(s):

* Primary assignee: `dasm`
* Other contributors: (help needed)

## References

[1]: https://github.com/rdo-infra/ci-config/tree/master/ci-scripts/infra-setup
[2]: https://github.com/rdo-infra/ci-config/tree/master/ci-scripts/infra-setup/inventory/group_vars/rhos_dev_stage
[3]: https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/full-run.yml#L1
[4]: https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/inventory/group_vars/vexxhost/secrets_example.yml
[5]: https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/inventory/group_vars/vexxhost/servers.yml#L54
[6]: https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/inventory/group_vars/vexxhost/servers.yml#L4
[7]: https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/inventory/group_vars/vexxhost/common.yml#L2
[8]: https://www.docker.com/blog/scaling-docker-to-serve-millions-more-developers-network-egress/
[9]: https://www.python.org/doc/sunset-python-2/
[10]: https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/setup_docker_compose/tasks/main.yaml#L12
[11]: https://review.rdoproject.org/r/q/topic:ansible-inventory
[12]: https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/base/templates/ansible-pull.sh.j2
[13]: https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/base/tasks/main.yml#L95
[14]: https://en.wikipedia.org/wiki/CentOS#CentOS_version_7
[15]: https://en.wikipedia.org/wiki/CentOS#CentOS_Stream
[16]: https://blog.centos.org/2021/12/introducing-centos-stream-9/