
# Slide 1

Hello everyone, thanks very much for having me here. I am Miroslav Vadkerti and I am the Team Lead of the Testing Farm Team. I am here today to talk about our Testing Farm service.

# Slide 2

Our mission with Testing Farm is to provide an open-source, reliable and scalable testing system as a service, which other teams can use to execute automated tests. For easy integration with our service, we provide an HTTP-based API. We scale across infrastructures, including private RH infrastructure and public clouds. Currently we focus on testing operating systems, but we also provide value outside of this use case; for example, we can run generic container-based tests.

# Slide 3

We have been in the continuous integration business since around 2016, when our legacy CI system, BaseOS CI, was born. We helped to shape RHEL Gating in its early days, and from 2017 BaseOS CI became one of the major CI systems contributing test results. In 2018 the Flexible Metadata Format (FMF) was born, an open-source library providing the base capabilities to store test metadata in Git in a flexible way. In 2019 the Test Management Tool (TMT) was created, standardizing the layout and format of test metadata in Git and providing a user-friendly interface to discover, share and run tests for RHEL, Fedora and recently also CentOS Stream. In the same year Artemis, a sub-component of Testing Farm, was born, providing advanced provisioning capabilities. By the way, there is a presentation about Artemis today at QECamp. Last year we introduced the Testing Farm service, and today it runs in production as a testing backend for various teams.

# Slide 4

The Testing Farm service is fairly simple to use. As a user, you specify the test you want to run and the environment in which it should be executed. Testing Farm then provides you with a progress report, and with the results once the tests have been executed.

# Slide 5

Let's briefly look at our API.
A user who wishes to submit a test request to Testing Farm has to be onboarded and have an API key for authorization. They then use our `requests` endpoint to submit a request with the details. Tests are defined via a Git repository. We currently support tests described in the Test Management Tool format (fmf), which is the preferred way, or in the Standard Test Interface (sti) format, a legacy Fedora CI format for describing tests. The only required parameter for the test definition is the clonable Git repo URL.

In the environments section, the user has to specify the architecture on which the test should run. I will show the HW possibilities on the next slides. Next, they need to define the operating system compose against which the tests should run. Other possible options include:

* variables and secrets, both of which are exposed as environment variables to the tests and provide a way to customize the tests. Secrets are hidden from all logs generated by Testing Farm.
* the artifacts array, which can be used to install arbitrary artifacts on the test environment, like brew builds, koji builds, module builds, copr builds or arbitrary repositories.
* a webhook URL, which Testing Farm will call via HTTP POST on state updates of the request. This can be handy to avoid polling for request updates.

For getting the status of a request we provide an endpoint which returns details about the request, such as:

* the state of the request; the end states are error and complete
* a human-readable test summary, the overall result and a link to the xunit
* a link to our artifacts storage, which provides a simple xunit viewer for viewing the results

For getting the list of supported composes we provide a composes endpoint.

# Slide 6

So what features does Testing Farm provide? For our users, we are able to fail over between infrastructures, meaning that we can move workloads to a healthy, compatible infrastructure, transparently for the user. We can cloud burst to public cloud infrastructure in case of usage spikes, which reduces wait times when things are busy. With the help of the Test Management Tool, our users gain an easy way to discover, share and develop tests, and this experience stays the same across RHEL, Fedora and CentOS Stream. Our tests can run in public or inside the RH network. We support all RHEL architectures, but not in all environments, due to the lack of an infrastructure provider (more on that on the next slide).

# Slide 7

Thanks to our service, our users can use one API endpoint to execute tests in public or inside the RH network. The artifacts storage is specific to the network, so internal results do not leak to the public. In public we support two architectures, ARM64 and x86_64, thanks to AWS. Internally we support the major infrastructure providers: Beaker, PSI OpenStack and AWS connected to the internal network, which together provide all supported architectures.

To further improve the experience, we plan to use additional infrastructures once they are available and to improve the failover and cloud-bursting scenarios. For example, for ARM we plan to use AWS connected to the internal network by default, failing over to Beaker if we reach a certain quota. In the case of ppc64 we plan to use IBM Cloud and fail over to Beaker. For s390x we have no option other than Beaker. Once ResourceHub lands, we will integrate it with our provisioner and all our users will start to use it without even needing to think about it :)

# Slide 8

On this slide you can see our current reference architecture. I won't go into great detail here, due to lack of time. For running the core services and Artemis we use OpenShift. Workers are currently deployed on AWS EC2 instances and use the Nomad scheduler to run the workloads. The architecture is expected to change, as we will also need to containerize the workers to be able to onboard our service to AppSRE.
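
The per-architecture failover plan from Slide 7 can be sketched roughly as below. This is only an illustration and not how the Testing Farm provisioner is implemented: the provider identifiers, the `PROVIDER_PLAN` table, the `pick_provider` helper and the idea of comparing a usage counter against a quota are all assumptions made for the example.

```python
# Hypothetical preference order per architecture, following the plan
# described on Slide 7. The identifiers are illustrative only.
PROVIDER_PLAN = {
    "arm64": ["aws-internal", "beaker"],  # AWS by default, Beaker as failover
    "ppc64": ["ibm-cloud", "beaker"],     # IBM Cloud by default, Beaker as failover
    "s390x": ["beaker"],                  # Beaker is the only option
}


def pick_provider(arch, usage, quota):
    """Return the first planned provider for `arch` that is below its quota.

    `usage` and `quota` map provider names to numbers. A provider that has
    no entry in `quota` is treated as unlimited. Returns None when every
    planned provider is exhausted.
    """
    for provider in PROVIDER_PLAN[arch]:
        limit = quota.get(provider)
        if limit is None or usage.get(provider, 0) < limit:
            return provider
    return None
```

With a quota of 10 on the hypothetical `aws-internal` provider, an `arm64` request is routed to AWS while usage is below 10 and to Beaker once the quota is reached.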
# Slide 9

Here are some statistics from our Testing Farm records, to give you an idea of our current scale.

# Slide 10

Hello everyone, here I'm going to talk about some of our main users and how they integrate with Testing Farm. The first example is the OSCI team, which runs Fedora CI and one of the RHEL CI systems. They use a standard Jenkins-based CI system and integrate with Testing Farm by calling the Testing Farm API in their jobs. All of the tests they run are defined via `tmt`.

The installability and component functional tests run against VMs, while rpmdeplint and rpminspect are container-based tests. The container example is interesting because all the team has to do is maintain a container image for these two tests. Then, by using the environment variables passed to the Testing Farm API, they customize these generic tests to test a specific component. This is a nice example of how the team maintains the tests and uses Testing Farm to deal with the execution.

All their Jenkins has to do is submit a request to Testing Farm and wait for it to complete. Using the webhook step plugin, they wait until Testing Farm notifies them that the status of the request has been updated. After the testing is done, their Jenkins reports the results to the message bus.

# Slide 11

Our next example is the Packit GitHub service. The Packit team runs this tool to ease the building and integration of upstream projects into Fedora and some other operating systems. For upstream project pull requests, they run a copr build which they submit to Testing Farm to test. The tests are defined directly in the repository. They also run their tests internally, inside the Red Hat network. Packit also uses webhooks to avoid polling for results.

# Slide 12

Another kind of integration that we've recently been working on is integration with the Zuul CI system. Zuul is a major CI system that is used for various things. The Zuul team recently enabled it for testing Fedora and CentOS dist-git pull requests, and we are able to integrate with Zuul to run the tests that are defined directly in the dist-git pull request. The integration here is done via an Ansible role. I linked it here so you can look at it.

# Slide 13

One of the recent examples is the Automotive Toolchain CI pipeline, which is an OpenShift pipeline-based CI system. There, Testing Farm is also used to build the image. This is currently a workaround, as building should not be done via Testing Farm, but the service which should be building the image is not ready yet, so as a quick prototype we are building and also testing the image via Testing Farm. I've added some examples of how they do it: basically they have pipeline tasks in Tekton, they call the Testing Farm API via some shell scripts, and all the tests are written in tmt. So this is an example of how you can use Testing Farm for generic things outside of its original scope.

# Slide 14

Another thing that other teams are using is custom GitHub workflow integrations. Some of the teams cannot use Packit for whatever reason, so we have onboarded them and they have implemented the testing of their GitHub pull requests directly via a GitHub workflow. There are examples here of how the Leapp team does it for testing RHEL upgrades, and of how the coreservices apps teams test container images (built from the GitHub pull requests) directly against RHEL, CentOS and Fedora. These are pretty nice examples of how teams can drop their old CI systems for something that is easier and more maintainable for them.
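
All of the integrations above follow the pattern described under Slide 5: submit a request, then wait (by polling or via a webhook) until it reaches an end state. A minimal sketch of that pattern follows. The JSON key names (`api_key`, `test`, `fmf`, `environments`, `arch`, `os`, `compose`, `notification`, `webhook`) are assumptions modeled on the description in these notes, not a confirmed schema, so check them against the actual API documentation. The helpers `build_request` and `wait_for_end_state` are hypothetical, and the HTTP call itself is left out on purpose.

```python
import time

# End states of a request, as named in the Slide 5 notes.
END_STATES = {"error", "complete"}


def build_request(api_key, git_url, arch, compose,
                  variables=None, secrets=None, artifacts=None,
                  webhook_url=None):
    """Build a request body. All key names are assumed, not confirmed."""
    environment = {"arch": arch, "os": {"compose": compose}}
    if variables:
        environment["variables"] = dict(variables)
    if secrets:
        environment["secrets"] = dict(secrets)
    if artifacts:
        environment["artifacts"] = list(artifacts)

    request = {
        "api_key": api_key,
        "test": {"fmf": {"url": git_url}},  # the Git URL is the only required test field
        "environments": [environment],
    }
    if webhook_url:
        # Called via HTTP POST on state updates, which avoids polling.
        request["notification"] = {"webhook": {"url": webhook_url}}
    return request


def wait_for_end_state(fetch_state, interval=30.0, max_polls=120,
                       sleep=time.sleep):
    """Poll `fetch_state()` until it returns an end state.

    `fetch_state` is any callable returning the current state string,
    for example a function that queries the request status endpoint.
    """
    for _ in range(max_polls):
        state = fetch_state()
        if state in END_STATES:
            return state
        sleep(interval)
    raise TimeoutError("request did not reach an end state")
```

A caller would send the dictionary from `build_request` to the `requests` endpoint as JSON and pass a status-fetching function to `wait_for_end_state`. When a webhook URL is provided, the polling loop is only needed as a fallback.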