# EasyBuild maintainer summit 2021 (Tue-Wed Sept 28-29 2021)

## Attendees

#### Day 1

- Åke
- Alan
- Alex
- Bart
- Bob
- Caspar
- Damian
- Davide
- Fotis
- Kenneth
- Lars
- Miguel
- Mikael
- Sam
- Sebastian
- Simon

#### Day 2

- Adam
- Åke
- Alan
- Alex
- Bart
- Bob
- Fotis
- Kenneth
- Lars
- Miguel
- Mikael
- Sebastian
- Sam
- Simon

## Topics

### Missing contributions in upstream forks

- raised by Alan
- mostly due to visualisation stuff that's done very differently at JSC
  - creates problems w.r.t. contributing back
  - OpenGL wrapper around Mesa (next to X11)
    - plan was to create a PR for that (but person responsible didn't find time yet)
    - (Mikael) quite a complex bundle of stuff, could/should be broken up?
    - (Alan) some additional magic on top to ensure right stuff gets used
    - LINK TO PR?
- (Alan) also: Julia (see CSCS setup)
  - not upstream yet, also used at JSC
  - (Bart) work done by CSCS involved too much custom stuff, we ended up with this:

    ```
    # ":" means the default path, this way extension modules can append to it
    modluafooter = '''
    append_path("JULIA_DEPOT_PATH", ":")
    append_path("JULIA_LOAD_PATH", ":")
    '''
    ```

  - (Bart) there is a JuliaPackage easyblock to install Julia packages as extensions; https://github.com/easybuilders/CSCS/tree/master/easybuild/easyblocks has three Julia-related easyblocks: `julia.py`, `juliapackage.py`, and `juliabundle.py`
- (Damian) also Jupyter
- (Kenneth) why aren't these things being contributed back?
  - (Damian) partially due to differences in dependencies in local setups
    - sticking closer to upstream could help
  - people are mostly focused on what's needed for own site
  - (Bart) ComputeCanada has a couple of customized easyblocks
    - contributing those changes back takes time and effort
    - internal stuff is sometimes not "good enough" for upstream
    - example: GPU offloading in GCC, used to live in custom easyconfigs, now integrated in main `gcc.py` easyblock, so available by default in GCCcore
- (Kenneth) could pairing up with a maintainer from another site help to get stuff upstreamed?
  - Bart + CSCS for Julia?
  - Mikael + JSC for visualisation stuff?
  - once you fork, it gradually becomes more difficult to contribute back
  - "Perfect is the enemy of good"
  - (Fotis) see easybuild.experimental repository
- (Sebastian) asked people at JSC to open PRs to upstream stuff
  - nobody actually did, only got requests to help out with other stuff
  - there's definitely a barrier there...
  - people are not familiar with GitHub integration (they're more familiar with GitLab)
    - extending GitHub integration with support for GitLab could help
  - quite different toolchains at JSC
    - partially due to Parastation MPI (could be custom toolchain upstream)
    - different names/versions for standard toolchains like gompi/gomkl
    - raises the bar for easy contribution upstream
    - (Damian) differences in toolchains grew historically
      - toolchain versions are tied to JSC stages
    - (Damian) JSC could work on sticking closer to upstream toolchain definitions
      - toolchains seem to be more up-to-date upstream currently?
  - (Caspar) similar problems at SURFsara before
    - changing internal workflow helped a lot
    - open PRs more quickly, use `--from-pr` to install (don't wait for merge)
    - any reason why JSC doesn't take this approach?
      - (Damian) probably just workflow we're used to
      - new stage is an opportunity to change this and increase overlap with upstream
- (Sebastian) also thinking about custom toolchain for AMD
  - AOCC, BLIS/libFLAME, ...
- (Damian) biggest concern is easyconfigs using compiler-only toolchain like GCCcore
- (Damian) things like SciPy-bundle are handled differently at JSC to avoid using MPI toolchain
  - (Mikael) relates to "diamond" toolchains with compiler+BLAS/LAPACK subtoolchain (no MPI)
  - (Bart) FFT MPI is actually rarely used
    - (Åke) VASP?
      - (Bart) checked in detail, seems not
    - could be a separate package with only FFTW wrappers for MKL
  - (Damian) MKL is installed in two places: system + full toolchain (incl. FFTW wrappers)
    - same version, so doesn't really cause problems
- (Kenneth) can we make changes centrally to make it easier for sites to contribute back?
  - (Caspar) more structured/faster discussion on new toolchains?
    - they're very fundamental, quicker turnaround time needed there
  - (Damian) UCX is at system level at JSC (along with CUDA)
    - so using system compiler to build UCX
    - also related to Parastation, where vendor tests with OS compiler
  - (Åke) hooks are the place to implement more or less simple site-specific changes
    - could be done such that no changes are needed to easyconfig, so they can be sent upstream easily
  - (Alex) two types of divergence: easyconfigs vs toolchains/easyblocks
    - too hard to follow easyconfig differences
    - chasing down more significant changes, such as easyblock and toolchain differences, is more feasible
- (Kenneth) working group on defining common toolchains
  - try to stick to original timeframe (`<year>a` by Jan, `<year>b` by July)
  - should we decouple version of GCCcore subtoolchain from GCC version?
    - allow easier updating to different GCC while still in `develop`
    - allow easier divergence for sites that want to
    - already done at JSC to some extent with MPI with module naming scheme (OpenMPI/4.1)
    - does create some tension with reproducibility
- (Damian) plan to use FlexiBLAS for future (foss) toolchains?
  - what about ScaLAPACK and FFTW?
  - any problems with mixing BLIS for BLAS (via FlexiBLAS) vs FFTW via MKL
    - different library names, so they don't conflict
  - (Bart) planning to look into support for MKL backend for FlexiBLAS
  - (Alan) FlexiBLAS provides a lot of flexibility
    - MPI thing doesn't seem to be as mature?
    - Alan reached out to them, no response?
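
Åke's point about hooks can be illustrated with the Julia module footer quoted earlier in this section: the site-specific lines live in a hooks file, so the easyconfig itself needs no changes and can go upstream as-is. This is only a minimal sketch. It assumes a hooks file passed to EasyBuild via `--hooks` and the `parse_hook` entry point; the file name `hooks.py`, the constant name, and matching on the software name `Julia` are illustrative assumptions, not an agreed implementation.

```python
# hooks.py: hypothetical site hooks file, used with `eb --hooks=hooks.py`

# Footer copied from the notes; ":" stands for the default path,
# so extension modules can still append to it.
JULIA_MODLUAFOOTER = '''
append_path("JULIA_DEPOT_PATH", ":")
append_path("JULIA_LOAD_PATH", ":")
'''


def parse_hook(ec, *args, **kwargs):
    """Add the site-specific Lua footer for Julia after the easyconfig is parsed."""
    if ec['name'] != 'Julia':
        return
    current = ec['modluafooter'] or ''
    # add the footer only once and keep whatever the easyconfig already defines
    if 'JULIA_DEPOT_PATH' not in current:
        ec['modluafooter'] = current + JULIA_MODLUAFOOTER
```

The hook only touches the parsed easyconfig in memory, so the same easyconfig file builds unchanged at sites that do not load this hooks file.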

### Keeping up with incoming contributions

#### Easyconfig PRs

- some PRs take significantly more time
  - new contributors
  - complex software
- will most likely break our easyconfig PR record again this year
  - more than 85% are through our GitHub integration
  - about 500 open for last few months
    - more than 100 different contributors for those...so no silver bullet
- allow reviewer to make "trivial" changes to PRs to get them merged
  - fixing code style issues, adding sanity check command, ...
  - anything that doesn't have impact on how the software is installed
  - contributors usually don't mind
- auto-closing inactive PRs => stalebot GitHub Action (https://github.com/probot/stale)
  - right timeframes?
  - auto-tagging PRs as "stale"
- be stricter about PRs using system toolchain?
  - relies on OS dependencies, limited value in EB ecosystem
- active easyconfig maintainers are ~10
  - finding more people with the right skills and level of attention is _hard_
  - would it have helped to have a "rulebook" for maintainers?
    - (Sebastian...as a recently integrated maintainer) didn't feel that would be necessary
      - Had contributor experience
      - Attended biweekly meeting
      - Open to asking other maintainers questions
    - (Caspar) would be nice to have a checklist for merging PRs
      - Can take time to get back into things when you have not done maintenance for a while
      - The "rules" may change over time (cfr. CUDAcore/CUDA, Python versionsuffix)
- (xx) make the bot spew out a `--review-pr` output
  - only comparing against most similar easyconfig in develop
- deprecate toolchains older than 2019
  - make EB produce a warning
  - start closing PRs that use the toolchains
- make `--new-pr` ask the user to submit a test report?
- let maintainer mark a PR as approved for testing
  - let bot auto-test in "standard" environments
- have bot mark PRs stale
- do we need extra labels to make it easier to find PRs to work on?
  - single 'status' label, partially managed automatically by the bot?
  - extra labels to mark PR ready for auto testing by generoso?
- maintainers not using Octobox yet should start using it
  - https://octobox.io/
- CI should fail over patches with no comments on top
- `--new-pr` can/should check a couple of common things (checksum, top comment patch)
  - trigger an automatic `--check-contrib`?
    - does not do exactly the same as the CI... but we should fix that!
- test suite errors could be improved sometimes
  - see failure message when code style check fails
- auto-reply from bot with checklist of things for contributor
  - request test report after 1 day (if CI is passing)
- (Adam) Real application testing
  - sanity check commands are only scratching the surface
  - test job on generoso
  - ReFrame?
    - work is being done on a shared library of tests
  - buildtest?
  - collection of links to application benchmarks: https://github.com/boegel/scicobe (needs to be updated)
    - (Sam) more collections: https://c4science.ch/source/scitas-examples
  - would require input from application developers/experts?
  - examples:
    - working correctly
      - some cases raised by @Flamefire
    - performance issues
      - NAMD
      - TensorFlow: https://github.com/easybuilders/easybuild-easyblocks/issues/2577
  - Fotis: PRACE benchmark apps https://repository.prace-ri.eu/git/UEABS/ueabs

#### Easyblocks/framework PRs

- Reviewing is (usually) more time-consuming
- Should we have a CI check that verifies that a new feature appears in documentation?
- Need to document requirements for maintainers to know when it's ok to merge
- Would training help?
  - It's hard to find your way around
    - An overview of the structure
    - Workflow from eb command to parsing easyconfig to easyblock, etc.
  - People who could help out with this (in order of being familiar with framework): Kenneth, Alan, Bart, Åke
  - potential topics
    - workflow from eb command to installation
    - general overview
    - example of implementing new configuration option (+ test to go along with it)
    - toolchain support
    - overview of framework tests
    - session on reviewing PRs perhaps
    - implementing easyblocks: refresh of https://easybuilders.github.io/easybuild-tutorial/2021-lust/implementing_easyblocks
    - the QA system and its features
- (Alex) sometimes reviewer requests additional changes that are "out of scope" for that PR
- What about easyblocks?
  - What are the requirements for test reports of easyblock PRs?
  - You cannot (currently) ask the bot to test an easyblock PR
  - Nice to know what easyconfigs are touched by an easyblock
    - `ambertools` was a case that used the `amber` easyblock and broke when this got updated
    - bot could check this for non-generic easyblocks?
    - Have a webpage or similar function where you can enter an easyblock name, which will then generate a list of the other easyblocks that use it as their base

### Keeping up with incoming issues

- Slack is not searchable so we should keep away from using it for issues
- (Adam) issue template is a good thing
  - (Kenneth) should be optional
  - add suggestion to ask on Slack in issue template
  - Things to request in template
    - `eb --show-config`
    - name of easyconfig
    - ...
- Add to docco to use the general easybuild repo for new issues, we will move them into the correct repo
- Document better how to interpret build logs and how to find the actual problem
  - Search for `_step` etc
  - See troubleshooting in the docco
- (Kenneth) arch-tagging?
will it help?
- (Kenneth) Maintainer sprint sessions bi-weekly, on non-eb-bi-weekly weeks

### Revamp of EasyBuild documentation

- Docs not being updated when we add features (in framework in particular)
- Current syntax is RST, differences with markdown are enough to be annoying
- Workflow with readthedocs is also a bit annoying if you want to preview
- Tutorial uses mkdocs...and is in a separate repository
  - Allows for easy and instant local preview
- Should we also be hosting the docs on GitHub?
  - Fotis: GitLab does direct rendering of .rst
    - Alan: but it won't deal with the Sphinx stuff
- Move to another format is the biggest jump
  - rst-to-myst looks like a good help here
- Starting point
  - create a repo and do a page or two
  - then do a cry for help, looking for volunteers for 1 page at a time
    - Need a decent starting point
    - Should have decent CI and contribution docs in place
  - Will also need to port automated docs
- ACTION: Look for volunteers to help kickstart this

### Support and testing on non-x86_64 platforms

- ARM and POWER are secondary platforms
  - Don't hold back a PR if Intel/AMD work
  - Open an issue though for non-working archs
- Do have access to both archs (ARM through EESSI)
  - Doing these checks introduces additional latency
  - Don't have to require these, but can add the capability to the bot to request tests there
- Can we do a Gentoo-style tagging so we know what works where?
  - Keeping track of this in easyconfigs is a maintenance nightmare
  - Can use regression tests to at least document this
- Alex: what about blacklisting stuff?
  - keeping track of known issues
    - in easyconfigs? maintenance burden
    - from regression tests as part of the release in the same way that indexing is done?
      - Will delay a release
      - Could do this step afterwards as part of the docs
      - Still need the ability for EB to pick that up
  - Would documentation be a better place to keep track of known issues?
    - cfr.
FlexiBLAS trouble on POWER
- Consensus on:
  - treating Arm and POWER as secondary platforms
  - don't block PRs because of test failures on Arm/POWER
  - document known issues (on non-x86 platforms) and let `eb` pick them up and print warnings?
- Can we get some (cloud) resources to support arch testing?
  - JUSUF Cloud is perhaps an option for AMD, Sebastian will investigate
  - Can we ask vendors for hardware?
    - Admin is the challenge there
    - Cloud credits are perhaps a better option
      - Would cover ARM as well
- Fotis: How do other projects manage builds for multiple architectures?
  - see upcoming [packaging-con](https://packaging-con.org/talks.html)
- expand boegelbot to test PRs
  - POWER9 (emulated) at OSUOSL
  - Graviton2 aarch64 @ AWS (using EESSI credits)
  - aarch64 @ fosshost (to be requested)
  - AMD + GPU @ JSC (JUSUF Cloud) [Sebastian]
  - bot account @ Mikael's infrastructure?

### Outlook to EasyBuild 5.0

- start working on this in a 5.x branch?
- Opportunity to change things we are not happy with
  - HMNS has a couple of issues
    - robot can be broken if people do not have easyconfigs in the robot search path
    - building your own software on top of someone else's stack is cumbersome, you need to fiddle with `MODULEPATH`
    - HMNS is non-unique which makes handling something like `gomkl` difficult (module clashes with `foss`)
    - Non-unique names in HMNS for things like OpenBLAS (stuff installed with foss vs gomkl toolchain) => separate HMNS with one extra level for math libs or add versionsuffix for non-foss toolchains
  - (Bart) Having a bootstrap location for dependencies required for bootstrapping toolchains
  - (Bart) Kill incomplete implementation of support for .yeb easyconfigs (YAML syntax)
  - (Bart) cleanup in:
    - easyblocks (old software versions)
    - 32-bit support in framework
    - macOS support
      - Kenneth: let's not, basic functionality works
  - Deprecating Python 2.7 makes sense but dropping support right now is probably too much
    - Can also deprecate 3.5
  - What about Lmod?
    - Deprecating Lmod 7 might be a good idea
    - Can we default to use `depends_on` for dependencies (with Lmod)?
  - Drop support for ancient Tcl-only implementation?
- new features for 5.0
  - (Alex) versionless dependencies, let EB use what's available
    - (Åke) not a fan of this...
    - (Kenneth) adding the feature in framework and using it in easyconfigs in the central repo are two different things
    - Can we leverage some of the code of `--try-update-deps`?
    - (Sam) support for only specifying partial versions (`Python 3.6.*`)
      - should that be reflected in generated module file, or not?
  - (Alex) separate metadata files for easyconfigs (homepage, checksums, etc)
    - metadata and checksums should be separate files (due to updates needed in PRs)
    - (Kenneth) could help with maintenance?
    - (Mikael) will it really?
      - having things across multiple files may cause trouble in some contexts
      - example: checksum added in checksums.json, only copying easyconfig file
  - (Fotis) reduce the need for conditionals in easyblocks
    - missing feature that results in lots of if statements
    - some kind of lookup table to avoid if/else blocks
    - repetitive patterns can be an indication of a missing feature
  - (Simon) cleaning up of easyconfigs for old software versions
    - cfr. bintray cleanup
    - mostly stuff with system toolchain?

### Code-of-conduct for EasyBuild community

- Contacts
  - separate committee
  - documented
  - group contact + individual contacts

### New generoso cluster at CSCS

- Most of the details are in the slides
- Bot is currently installing in shared space
  - Should do an install in `/tmp` first
  - Should automatically do the installation in the shared space _after_ the PR is merged
    - automation is hard as there are a lot of corner cases
    - can we use singularity to do the installation in an overlay?
      - EESSI uses fuse overlay (not singularity overlay)
- would like to start testing in singularity containers
  - should create a repository where we define these test environments (hosted on GitHub)
  - will allow us to test in multiple environments/OSes
  - (Åke) have Ubuntu Focal minimal containers available
- would like to have access to logs

### Action items

- Crush the curve of open (easyconfig) PRs
  - Deprecate old toolchains (< 2019)
    - Close PRs for deprecated toolchains
  - Set up bot to tag/auto-close stale issues/PRs (https://github.com/probot/stale)
- Try to empower contributors more to make PRs ready
  - Improve errors for failing CI tests (cfr. code style for easyconfigs)
  - Make CI fail over common issues (like missing comments on top of patches)
- Make life of maintainers easier
  - document requirements to merge PRs (in different repos)
  - Make boegelbot add comment with output of `eb --review-pr` (single easyconfig)
  - auto-label with PR status (new, CI passes, ...)
- working group for migrating docs to mkdocs
  - early starting point: see `mkdocs` branch in https://github.com/easybuilders/easybuild
- expand farm of test platforms (boegelbot)
- code-of-conduct
  - Alan & Kenneth follow up
  - odd number of committee members (3?)
  - PR for code-of-conduct that all maintainers should agree on
- EasyBuild 5.0
  - project to track progress on major targets to tackle for EasyBuild 5.0
  - set up 5.x branches
- issue template for reporting bugs/questions/...
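
For the "deprecate old toolchains (< 2019)" action, the check a bot or `eb` would need is small. The following is a rough sketch only: it assumes common toolchain versions follow the `<year>a` / `<year>b` naming mentioned in the toolchain discussion, and the function names and warning text are made up for illustration. Versions that do not match the pattern (such as `system`) are deliberately left alone.

```python
import re

# Cutoff taken from the notes: toolchains older than 2019 get deprecated.
CUTOFF_YEAR = 2019

# Assumed naming scheme for common toolchain versions: 2018b, 2019a, 2021a, ...
VERSION_PATTERN = re.compile(r'^(\d{4})[ab]$')


def toolchain_year(version):
    """Return the year of a '<year>a'/'<year>b' version, or None if it does not match."""
    match = VERSION_PATTERN.match(version)
    return int(match.group(1)) if match else None


def is_deprecated(version, cutoff_year=CUTOFF_YEAR):
    """True only for versions that match the naming scheme and predate the cutoff."""
    year = toolchain_year(version)
    return year is not None and year < cutoff_year


def deprecation_warning(name, version):
    """Return a warning message for a deprecated toolchain, or None otherwise."""
    if is_deprecated(version):
        return "toolchain %s/%s is older than %d and is deprecated" % (name, version, CUTOFF_YEAR)
    return None
```

The same predicate could drive both the warning printed by `eb` and the bot's decision to flag PRs that still use those toolchains.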