RIOT CI Meeting 12/2020

# RIOT CI Meeting 12/2020

03.12.2020 @ 10:00 CET

Participants: Kaspar, Francisco, Martine, Leandro, Cenk, Kevin, Alex, Koen

# Agenda

- administrative / procedures / how to work:
    - general
        - painful to maintain a service without access and/or authority. How to deal with it?
        - who is responsible for what?
        - bus factor (as usual)
- Feature request/spec
- integrate HIL
    - work towards merging robot framework?
    - run on PRs?
    - make HIL boards usable for "regular" tests?
- deduplicate work? (e.g., PaperCI, Murdock/dwq, and HIL use three different ways of using physical boards, flashing, serial IO, ...)
- tooling choice. Murdock lacks some features and has a low bus factor, how do alternatives do? (possibility of replacing the Murdock "frontend" but keeping the "backend", as that is fast; e.g., use GitHub Actions to manage builds, but build using dwq on-premise)
- using `riotctrl` for tests
- other issues:
    - container update basically blocked on Doxygen issue, fix issue or build old version?

## Why are we here

Kaspar gets frustrated with getting CI updates.

Issues with admin borders.

We should formalize the CI team with a wiki page.

Come up with procedures on how to do CI-related things, and a plan or roadmap.

How to have centralized info: HTML page at the CI? HackMD?

What is on the centralized info for distributed note taking:

- Overview
- CI Team
- Roadmap
- How CI works

Something about CI should be in the RIOT documentation.

Two areas?

- RIOT documentation for static information about the CI
- HackMD for shared notes

# Agreement: HackMD to share CI notes and volatile CI data; once static, add to RIOT documentation

Maybe fall back to https://github.com/hedgedoc/hedgedoc if HackMD does not work out.

## Bus factor

We have ci.riot-os.org at HAW, which runs Murdock but is hacky, with uncommitted changes (they are backed up).

More people than just the Hamburg folks should be able to configure and reset it. How to do this?

- Jenkins

We reset Murdock once every 6 months.

Does it make sense to put work into a web interface?
The Murdock docker-compose setup requires a public endpoint, so there is no easy "launch local instance".

There are 2 points that may be needed to integrate the HiL stuff.

How to trigger HiL based on a Murdock build?

- We should namespace the web interface.
- Remove hardcoding of riot-os/riot
- Allow maintainers to access ssh, make sure permissions are set to prevent full access.
- How to manage load balancing with the Murdock master
- How to view logs from the Murdock master

Can we host logstash with docker compose for the logs?

HAW wants to harden servers.

If user based, how to manage revoking credentials?

KS: can be done with a script.

KS: Tried drone.io

AA: what about Buildkite?

Should we update the Murdock backend? It is hard to do that dynamically.

The Buildkite master is not self-hosted.

dwq has issues with containers, maybe have dwq in a container.

# Agreement: Get a tool (possibly Buildkite) that uses the Murdock backend, to reduce the bus factor risk

To understand how dwq works look at https://github.com/antirez/disque

Kaspar -> will harden dwq so that it works locally, easily.

## Feature request/spec

The following will indicate which features we want to spend time implementing and which ones we can ignore (checked means we agree that we want them).

- Frontend:
    - [x] Web interface for config and access management
      -> solved with dropping Murdock frontend
      -> Kaspar
    - [ ] Allow fallback build server when main build server is busy, i.e.
      my local computer/build server compiles if Murdock is busy
      -> Maybe solved with frontend
      -> Breaks fairness of access
      -> Postponed
    - [x] Staged CI, with one set per platform
      -> Kaspar to add a simple setup: all tests on 1 board, examples/default on all boards
    - [x] Incremental builds
      -> Not enough trust in build system to do that
      -> reduce tests to just use different configs
      -> Already done
      -> Leandro https://github.com/leandrolanzieri/RIOT/commits/dev/kconfig/parameterized_tests
    - [x] Comment on PR
      -> Repost to the same comment
      -> Helps with new contributors
      -> use postbuild hook
      -> Needs technical issues sorted
      -> How to deal with static test results
      -> Koen/Kaspar

    **LUNCH BREAK -> resume at 1pm CET**

    - [ ] Control with comments (e.g., "@ci: test tests/foo tests/blah on native, samr21")
      -> First we need to get commenting on PRs working
      -> We do want this
      -> Still require full test before merge
      -> implement in Rust (Kaspar)
      -> WIP
    - [x] Display and filter of test archive (pass/fail or benchmarks)
        - We have that for build sizes
        - How to store data
        - Looking at delta of metrics, but hard to have a reference (unless master gets rebuilt after each push, before subsequent PR builds)
        - Don't do deltas for the moment with metrics/sizes
        - JSON must be nested
        - What we have now: https://riot-graphs.snt.utwente.nl/d/Kqo15swWz/merge-statistics?orgId=1&from=now-1y&to=now
        - TODO: We should define a scheme for keeping the data
        - At least nightlies
        - Koen/Kevin
    - [ ] Separate flash/test vs build conditions (this way the test nodes don't need to have everything needed for the build)
      -> Rust
      -> Maybe think about separating out flash
      -> OnFlasher
    - [x] Support emulator tests (more than native, renode/qemu)
      -> Alex has [PR for generic emulator](https://github.com/RIOT-OS/RIOT/pull/15512)
      -> Problem is that not every aspect of the MCU is supported
      -> CPU usage is an issue (renode)
      -> Add renode/qemu to riot docker
      -> Can be done
      -> Maybe whitelist of applications
      -> Alex
    - [x] Allow external test repositories
      -> Fixed with
      frontend
      -> Might need to extend dwq to support this
      -> Kevin/Kaspar
    - [ ] Console access to HiL/CI boards or test env
      -> Kevin
      -> maybe too much security risk and/or time
    - [x] Launch builds/tests from local checkout (done for single tests, but not documented)
      -> Kaspar to document
    - [x] Prioritize builds / change order
      -> Useful for fix pull requests
      -> Depends on frontend
      -> Use [priority labels](https://github.com/RIOT-OS/RIOT/issues/10057)?
    - [ ] Choose build container per-branch or per-PR
      -> docker build version selection
      -> Specify the version on master
      -> Need to version riot docker container
      -> Depends on frontend
    - [x] Don't run static tests on Murdock
      -> Split static checks
      -> issue is rebase to master logic
      -> static checks should not affect Murdock
      -> Skip static tests in Murdock, make GitHub static checks a requirement

    **We will pick this up another day**

    10.12.2020 @ 10:30am CET

    - [x] Cut PR checks (needs squashing, commit comment length) out of the static tests (move to check-labels actions?) ([#15564](https://github.com/RIOT-OS/RIOT/pull/15564))
    - [ ] Only allow Murdock when static tests are green
      -> controversial, @kaspar030 says it changes workflow too much
        - Kaspar voiced concerns about working on the same master branch / merge commit across all CI steps / parts (possibly racy, build could run on different data than static tests, etc.)
    - [x] Parallelize static tests?
      ([#15563](https://github.com/RIOT-OS/RIOT/pull/15563))
    - [x] Static checks should not rebase, merge, or fetch on their own
    - [x] "make static-tests" should use "dist/tools/ci/static_tests.sh", drop "build_and_test.sh"
    - [x] Only compile when files changed compared to last successful result (cache `git ls-tree --full-tree -r HEAD | sha256sum` as done for test results), would skip re-build on squash or label change
      -> Martine will tell Kaspar how to do it with just git
      -> Martine: Get the Git tree's SHA from a commit: `git cat-file commit HEAD | grep "^tree"`
    - [ ] Move Murdock label check *to the end*, fetch then-current labels (allows removal of e.g. "PR needs squashing" after build has started)
      -> taken care of with the other frontend
    - [x] Capability to describe different build environments for tests / application-specific build matrix (see [#14669](https://github.com/RIOT-OS/RIOT/issues/14669))

**10 min pause -> 11:30 resume**

## Deduplicate Work

Solved with team formation and documentation.

## Using `riotctrl` for Tests

- Martine already [started for `tests/lwip`](https://github.com/RIOT-OS/RIOT/pull/14874). Still needs some fluff
- riotctrl can be responsible for app name, version, commit etc.

## Integrate HIL

RobotFW-Tests -> firmware and Python interface to RIOT, but leave PHiLIP/DUT env and robot tests in RobotFW-Tests.

How to trigger RobotFW-tests on RIOT PR?

- label -> Kevin

## Progress / tracking

pad: https://hackmd.io/XLmOIKmtTACOWTBs2U9Yzg
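
Note on the "only compile when files changed" item from the feature list: the idea of keying the build cache on the Git tree SHA could look roughly like the sketch below. It is a minimal illustration, not how Murdock does it. The function names and the `passed_trees` set are hypothetical, and only the `git cat-file commit HEAD` call comes from the notes. A squash or a label change leaves the tree unchanged, so the lookup would hit the cache and the build could be skipped.

```python
import subprocess


def tree_sha_from_commit_text(commit_text: str) -> str:
    """Return the tree SHA from the raw text of a git commit object."""
    for line in commit_text.splitlines():
        if not line:
            break  # headers end at the first blank line; the message follows
        if line.startswith("tree "):
            return line.split(" ", 1)[1].strip()
    raise ValueError("no tree header found in commit object")


def current_tree_sha(repo_dir: str = ".") -> str:
    """Tree SHA of HEAD, read via the command quoted in the notes."""
    result = subprocess.run(
        ["git", "cat-file", "commit", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return tree_sha_from_commit_text(result.stdout)


def needs_build(tree_sha: str, passed_trees: set) -> bool:
    """Hypothetical cache check: build only if this tree has not passed before."""
    return tree_sha not in passed_trees
```

A CI job could call `needs_build(current_tree_sha(), passed_trees)` before compiling and add the SHA to `passed_trees` after a successful run. How and where that set is stored is left open here.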