# EESSI hackathon (Jan'22) - general notes

- **when**: week of 17-21 Jan 2022
- **main goal**: focused effort on various tasks in EESSI
- **expectations**:
  - joining kickoff/sync/show & tell meetings
  - spending a couple of hours that week on one or more of the outlined tasks (in group)
  - take extensive notes (to integrate into documentation later)
- registration: https://doodle.com/poll/z74qi2gcvau69kgr
- **main communication channel: EESSI Slack - join via https://eessi-hpc.org/join**

## Links

* [Dec'21 hackathon notes](https://github.com/EESSI/meetings/wiki/EESSI-hackathon-Dec'21)
* **GitHub repo for EESSI hackathon(s): https://github.com/EESSI/hackathons**
* show & tell shared slide deck: https://docs.google.com/presentation/d/1Ok5l1x1EydHg0ptsglSkPKOImPTHp_kFTvRQBDq-VjY

## Meetings

- **Mon 17 Jan 2022, 09:00 UTC**: kickoff
  - clarify expectations
  - overview of tasks
  - getting organised: who works on what, form groups
- **Wed 19 Jan 2022, 09:00 UTC**: sync
  - sync meeting notes:
  - quick progress report per group
  - briefly discuss next steps
  - notes: https://hackmd.io/9yX5_kLeTJKbStedlX2jWg
- **Fri 21 Jan 2022, 13:00 UTC**: show & tell
  - each group briefly demos/presents what they worked on
  - outline follow-up steps
  - slides: *(coming soon)*
  - recording: *(coming soon)*

## Attendees

***If you plan to actively participate in this hackathon:***

- add your name + affiliation + GitHub handle below (or ask someone to do it for you)
- feel free to pick **ONE** task you would like to work on, add your name to the list for that task (see `people working on this`)

Joining:

- Kenneth Hoste (HPC-UGent) - `@boegel`
- Thomas Röblitz (HPC-UBergen) - `@trz42`
- Bob Dröge (HPC-UGroningen) - `@bedroge`
- Martin Errenst (University of Wuppertal) - `@stderr-enst`
- Bartosz Kostrzewa (University of Bonn) - `@kostrzewa`
- Terje Kvernes (University of Oslo) - `@terjekv`
- Jacob Ziemke
- Jure Pečar `@jpecar`
- Hugo Meiland
- Frank Everdij (Delft University of Technology) -
  `@frankeverdij`
- Jörg Sassmannshausen
- Michael Hübner
- Maicon Faria (HPCNow!)

## Available infrastructure

**Please use the virtual clusters we have set up for this hackathon!**

* EESSI pilot repository is readily available
* Different CPU types supported
* Singularity is installed

### Magic Castle clusters in AWS + FENIX + Azure

* managed by Alan
* all info at https://github.com/EESSI/hackathons/tree/main/2022-01/magic_castle

### Cluster-in-the-Cloud in AWS

* managed by Kenneth
* all info at https://github.com/EESSI/hackathons/tree/main/2022-01/citc

## Communication

If you need help, contact us via the [EESSI Slack](https://eessi-hpc.slack.com) (join via https://www.eessi-hpc.org/join)

General hackathon channel: [#hackathon](https://eessi-hpc.slack.com/archives/C02NB46EK9P).

**See also task-specific channels!**

## Selected tasks & task teams

A subset of main tasks was selected for this hackathon:

* [02] **Installing software on top of EESSI**
  * task lead: Kenneth
  * participating: Kenneth, Jacob, Frank, Martin
  * notes: https://hackmd.io/sLBLV7RDQdmyYfh1rYHGSQ
  * Slack channel: [#hackathon-software_on_top](https://eessi-hpc.slack.com/archives/C02PXMGTD46)
  * Zoom: https://uib.zoom.us/j/63932793319?pwd=Ry9BQ3VUaHcwTncwSFFrdU1PWnF3QT09
  * subtasks:
    * [02.1] document how to install software on top of EESSI
      * with EasyBuild
      * manually (depends on [02.2]?)
    * [02.2] implement support in EasyBuild to install RPATH wrapper scripts along with compiler (see https://github.com/easybuilders/easybuild-framework/issues/3918)
    * [02.3] standalone script to install software on top of EESSI
* [03] **Workflow to propose additions to EESSI software stack**
  * task lead: Bob
  * participating: Bob, Kenneth, Jörg
  * notes: https://hackmd.io/6V91CHRWRtuutANPaZRVPw
  * Slack channel: [#hackathon-contribution_workflow](https://eessi-hpc.slack.com/archives/C02Q6LJBJ9J)
  * Zoom: https://uib.zoom.us/j/69105180487?pwd=YldPU2ZPYWRGV1duV2JaV082MEVJdz09
  * subtasks:
    * [03.1] Use EESSI build container for the actual software builds
    * [03.2] Monitor build/test jobs, handle failures
    * [03.3] Let the app pick up the logs and tarball(s)
    * [03.4] Reply results/failures back to PR (success/failure comment, logs as gist, ...)
    * [03.5] Support more architectures
    * [03.6] Add dependency structure between subtasks, e.g. tests should be triggered when build has succeeded
    * [03.7] Handle many more events
    * [03.8] Prepare slides for [EasyBuild User Meeting talk](https://easybuild.io/eum22/#eessi-workflow)
* [05] **GPU support**
  * task lead: Alan
  * participating: Alan, Bartek, Michael, Maicon, (Hugo?)
  * notes: https://hackmd.io/47FAwaeWRi66tdiqjy2Zvg
  * Slack channel: [#hackathon-gpu_support](https://eessi-hpc.slack.com/archives/C02Q4DJT7J7)
  * Zoom: https://uib.zoom.us/j/62193929082?pwd=em0wMUp4enorKzZIUkI5MWJqMDY2QT09
  * subtasks:
    * [05.01] Install GROMACS, OSU Microbenchmarks, TensorFlow on top of foss/2021a + CUDA for skylake + zen2
* [06] **EESSI test suite**
  * task lead: Thomas
  * participating: Thomas, Hugo
  * notes: https://hackmd.io/wx2hjHiWQnmkERSVR2-a2A
  * Slack channel: [#hackathon-test_suite](https://eessi-hpc.slack.com/archives/C02QH1XVAKT)
  * Zoom: ask Thomas if you need one
  * subtasks: collection of ideas/todos from last hackathon
    * [06.1] add tests for compat layer (e.g.
      https://github.com/EESSI/compatibility-layer/issues/152)
    * [06.2] repeatedly run existing tests at different places (hackathon resources, your HPC, your machine, ...)
    * [06.3] integrate tests in CI pipeline (may benefit from [06.1], [06.2])
    * [06.4] guidelines/cookbook on how to develop application tests
    * [06.5] document how to run test suite for CI/monitoring (likely depends on [06.3] or [06.8])
    * [06.6] develop tests for results by other tasks (e.g. GPU support)
    * [06.7] decide on resourcedir setup to handle large inputs (WRF), finish WRF and create/merge PR
    * [06.8] many more tests... ("steal" from other places, e.g. CSCS ReFrame repo)
* [07] **Monitoring**
  * task lead: Terje (?)
  * participating: Terje (?)
  * notes: https://hackmd.io/YWDG2GO5R3Sm3wS1SpvYrg
  * Slack channel: [#hackathon-monitoring](https://eessi-hpc.slack.com/archives/C02Q1S6484X)
  * Zoom: ask Thomas if you need one
  * subtasks:
    * [07.1] Finalize CVMFS monitoring
      * Should involve role-based install of filesystem-layer for any node
      * Working dashboards out of the box would be... nice.
      * <span style="color:red">**PROBLEM:**</span> [cvmfs_exporter](https://gitlab.cern.ch/cloud/cvmfs-prometheus-exporter/) only works if the CVMFS volume is mounted. On Stratum1, that ain't happening. We will need (to write?) a monitoring tool with role comprehension.
    * [07.2] Decide on naming scheme for Ansible repos and the roles shipped
    * [07.3] Should probably migrate the Ansible role repo to a collection
      * Today's solution works. Collection migration should probably be a task unto itself, depending on time.
    * [07.4] Move repos to EESSI
* [08] **Setting up a private Stratum-1**
  * task lead: ???
  * participating: ???
  * notes: https://hackmd.io/TyNaBqgkTHGjqwNZcasx5Q
  * Slack channel: [#hackathon-monitoring](https://eessi-hpc.slack.com/archives/C02Q1S6484X)
  * Zoom: ask Thomas if you need one
  * subtasks:
    * [08.1] develop ReFrame tests to verify that a Stratum 1 server is configured correctly (see https://github.com/EESSI/filesystem-layer/issues/111)
    * [08.2] set up a new Stratum 1 server, make some intentional mistakes (?), and use the tests from [08.1] to detect the misconfigurations, then fix them and rerun the tests
* [16] **Export a version of the EESSI stack to a tarball and/or container image**
  * task lead: Jure
  * participating: Jure
  * notes: https://hackmd.io/2YpzQGgUSDyTvW3ILulzwA
  * Slack channel: [#hackathon-export_software_stack](https://eessi-hpc.slack.com/archives/C02QH2H8TH7)
  * Zoom: ask Thomas if you need one
  * subtasks:
    * [16.1] investigate if/how variant symlinks can be included
    * [16.2] make script developed during last hackathon available & document its use
    * [16.3] integrate existing ReFrame tests to simplify validation of running a container elsewhere
    * [16.4] investigate if/how PIDs (persistent identifiers, e.g. DOI) could be attached to a container image

## TODOs

- shared presentation for show & tell on Friday
  - https://docs.google.com/presentation/d/1Ok5l1x1EydHg0ptsglSkPKOImPTHp_kFTvRQBDq-VjY