# INUITS - EESSI

## Lessons learned

(TODO Hafsa: what worked, what didn't work, good/bad things, ...)

* Readme was really helpful to understand the software layer and to set up the EESSI bot components.
* The regular meetings and discussions helped a lot with the progress and the notes after every meeting were also helpful to keep track of everything.
* The reviews on the PR were always detailed and clear (they were also explained in the meetings) which made it easy to make changes accordingly.
* most challenging: git/GitHub
* most learned: Git, merge conflicts, Python

---------------------------------------------------------------------------------------------------

## Next meetings

- ...

---------------------------------------------------------------------------------------------------

## Sync meeting Mon 2 Jan 2023 11:00 CET - 15:00 PKT

- OK for Hafsa, Thomas, Kenneth

### Meeting agenda

- PR #131: very close to being finished
- PR #132: looks OK
- PR #136: we should avoid repeating the same code, and implement a `retry` function instead?
- things to improve in bot
    - make sure that it's clear what the status is of a build (still running vs bot crashed, etc.)
    - support a way to ask the bot for a status update
    - status page of all bot instances

---------------------------------------------------------------------------------------------------

## Sync meeting Wed 28 Dec 2022 10:00 CET - 14:00 PKT

- OK for Hafsa, Thomas
- Kenneth unsure

### Meeting agenda

- PR #130
    - Suggest to add a comment that config is read to raise an exception early when the event handler starts. Line 120 event handler and line 577 in job manager.
    - Remove new log method from job manager
    - (Thomas) create issue about reading config multiple times (see https://github.com/EESSI/eessi-bot-software-layer/pull/130#pullrequestreview-1230996141)
- PR #131 - issue #125 (dedicated log method)
    - Do not read config in log (event handler).
Write an `__init__` method which is similar to the one for the job manager. The PyGHee needs to be initialized though.
    - Addressing failing tests "KeyError" (see https://github.com/EESSI/eessi-bot-software-layer/actions/runs/3790115473/jobs/6444477781)
        - Deposit a basic app.cfg in tests directory
            - `[job_manager]`
            - `log_path = some_relative_path` (without a leading `/`)
        - Or create that file before initializing the job manager instance.
- PR #132 - issue #20 (retry communication with GitHub if first attempt failed)
    - A few minor changes should be implemented. See PR comments.
    - Suggest to add addressed issues in PR description and mention that one case of communicating with GitHub is improved.
- issue #30 (handle connection error to GitHub gracefully)
    - pick any other communication with GitHub (e.g., tools/pr_comments.py) and add `try/except` blocks and also `tries` loop
- next issues
    - issue #110 (make log method print function name that called it)

---------------------------------------------------------------------------------------------------

## Sync meeting Wed 21 Dec 2022 10:00 CET - 14:00 PKT

- OK for Kenneth, Hafsa, Thomas

### Meeting agenda

- PR #116 merged
- PR #128 is replaced by PR #130 (for issue #112)
    - validating whether all required config settings are provided (like in Thomas' PR #85) can be done in a separate PR
    - see PR review for PR #130
    - also use `read_config` in job manager
- next issues
    - issue #125 (dedicated log method)
    - issue #110 (make log method print function name that called it)
    - issue #30 (handle connection error to GitHub gracefully)
    - issue #20 (retry communication with GitHub if first attempt failed)

---------------------------------------------------------------------------------------------------

## Sync meeting Mon 19 Dec 2022 09:00 CET - 13:00 PKT

- OK for Kenneth, Hafsa, Thomas

### Meeting agenda

- PR #116 (added comment in the PR for showing running state of the jobs)
    - only missing docstring in `process_running_jobs`
function, then good to merge
- PR #123 (improve start of the app)
    - merged!
- PR #124
    - merged!
- PR #128 for issue #112
    - see suggestions in PR review
- next issues
    - issue #125 (dedicated log method)
    - issue #110 (make log method print function name that called it)
    - issue #30 (handle connection error to GitHub gracefully)
    - issue #20 (retry communication with GitHub if first attempt failed)

---------------------------------------------------------------------------------------------------

## Sync meeting Mon 12 Dec 2022 11:00 CET - 15:00 PKT

- OK for Kenneth, Hafsa, Thomas

### Meeting agenda

- PRs
    - #116: add missing docstrings
    - #123: see PR review
    - #124: see PR review
- next issues
    - #112 (generic read_config function)
    - #125 log method

---------------------------------------------------------------------------------------------------

## Sync meeting Wed 07 Dec 2022 14:00 CET - 18:00 PKT

- OK for Kenneth, Hafsa, Thomas

### Meeting agenda

- Merged PR #83 PR comments
- PR #116 added comment in the PR for showing running state of the jobs
    - addresses issue #27
    - determines running jobs and adds row(s) to PR comment
    - missing: only add row once
        - for this the comment body needs to be scanned
- Next issues to look at
    - issue #9 improve information printed at start of event handler (and job manager)
        - for now only change the eessi-bot-software-layer code (later changes may be ported to PyGHee)
    - issue #97 (permission to trigger builds)
    - issue #112 (generic read_config function)

---------------------------------------------------------------------------------------------------

## Sync meeting Mon 05 Dec 2022 10:00 CET - 14:00 PKT

- OK for Kenneth, Hafsa, Thomas

### Meeting agenda

- EESSI monthly update meeting
- Merged PR #80 argparse
- Merged PR #84 username
- PR #83 PR comments
    - closed PR #82
    - Conflicts are resolved
    - few minor issues will be resolved (Thomas forgot to submit the review and Hafsa couldn't see them)
- issue #27 print more information about state
changes of jobs
    - when a job starts running, a line is added to the PR comment
    - 1st step: determine running jobs and log information (not harmful to log every time the job manager runs the main loop)
        - maybe need to run squeue or parse its output for 'RUNNING'
        - also consider to update/change current function for obtaining current jobs
            - seems not needed as state is already included in returned data structure
        - probably need to define a new function, for example, `determine_running_jobs(self, current_jobs)`
        - plus a second function which processes running jobs
    - 2nd step: update PR comment once
    - bonus 1: update every hour (or make update frequency configurable)
    - bonus 2: report about time left for the job
- Next issues to look at
    - issue #9 improve information printed at start of event handler (and job manager)
        - for now only change the eessi-bot-software-layer code (later changes may be ported to PyGHee)

---------------------------------------------------------------------------------------------------

## Sync meeting Wed 30 Nov 2022 10:00 CET - 14:00 PKT

- OK for Kenneth, Hafsa, Thomas

### Meeting agenda

- EESSI monthly update meeting tomorrow
- PR #80 argparse
    - fix unused import, then ready to be merged
- PR #83 PR comments
    - close PR #82
    - try to address comments in PR (more generic function, docstrings)
    - add logging output to figure out why the more generic function doesn't work
- PR #84 username
    - looks fine, can be merged as is
- Next issues to look at
    - issue #9 improve information printed at start of event handler (and job manager)
        - for now only change the eessi-bot-software-layer code (later changes may be ported to PyGHee)
    - issue #27 print more information about state changes of jobs
        - when a job starts running, a line is added to the PR comment

---------------------------------------------------------------------------------------------------

## Sync meeting Mon 28 Nov 2022 10:00 CET - 14:00 PKT

- OK for Kenneth, Hafsa, Thomas

### Meeting agenda

- PR
#80 argparse
    - https://github.com/EESSI/eessi-bot-software-layer/pull/80
    - few minor changes to make, then ready to merge PR
- draft PR #82
    - addresses issue #32, function for identifying PR comment to be updated
        - https://github.com/EESSI/eessi-bot-software-layer/issues/32
    - avoid getting commits into PR which don't belong to the PR by using the following procedure

        ```
        git clone https://github.com/YOUR_GH_ACCOUNT/eessi-bot-software-layer identify-pr-comment
        cd identify-pr-comment
        git branch identify-pr-comment
        git checkout identify-pr-comment
        ```

    - discussed draft PR
        - add parameter for search_pattern
        - move `return None` to correct indentation level
            - capturing such unintended changes could be covered by some tests?
        - rename (?) file `github.py` to `pr_comments.py` to avoid confusion with `connections/github.py`
        - add missing `import`
- until next meeting:
    - try to finish two PRs as they are
    - if time, either add tests for them or find a new issue (looked a bit into issues but didn't find an obvious one to continue)

---------------------------------------------------------------------------------------------------

## Sync meeting Thu 24 Nov 2022 10:00 CET - 14:00 PKT

- OK for Kenneth, Hafsa, Thomas

### Meeting agenda

- PR #78: should be good to merge
    - https://github.com/EESSI/eessi-bot-software-layer/pull/78
- draft PR #80 to rework option parser
    - https://github.com/EESSI/eessi-bot-software-layer/pull/80
- next issue for Hafsa: https://github.com/EESSI/eessi-bot-software-layer/issues/32

---------------------------------------------------------------------------------------------------

## Sync meeting Mon 21 Nov 2022 10:00 CET - 14:00 PKT

- OK for Kenneth, Hafsa (Thomas excused)

### Meeting agenda

- problem with bot tripping over unknown jobs fixed by cleaning up 'jobs' directory
- PR #78
    - run_cmd needs a log_file argument to make sure logging is done in correct file
    - enhance existing test for run_cmd to check log_file functionality
- issue #25
    - open draft PR
for parse_common_args
    - ArgumentParser.parse_args will fail if it finds unknown options => problem for parse_common_args...

---------------------------------------------------------------------------------------------------

## Sync meeting Wed 16 Nov 2022 10:00 CET - 14:00 PKT

- OK for Thomas, Kenneth, Hafsa

### Meeting agenda

- problems for Hafsa with connecting to the AWS cluster
    - test with another SSH server?
    - Kenneth will look into getting Hafsa a VSC account to access login.hpc.ugent.be
- PR #63 merged
- PR #70 adds flake8 check in CI (issue #64)
    - should be almost ready to go now, only small style fixes needed still
- next:
    - issue #33: follow-up by Thomas
    - issue #61: run_cmd everywhere => Hafsa
- after:
    - issue #25: dedicated parse_args function => Hafsa
        - incl. tests for the parse_args function(s)
    - issue #32: function to locate PR comment to update => Hafsa
    - issue #9: cleaner 'start' method for app => Hafsa
    - issue #20: retry communication with GitHub if it fails => Hafsa
    - issue #38: add unit tests for existing functions => Hafsa
        - this can be done by only touching files in test/ directory
        - for `task/build.py` functions:
            - should be easy: `create_pr_dir`, `get_build_env_cfg`, `create_metadata`
            - a bit more difficult (talks to GitHub): `download_pr`, `setup_pr_in_arch_job_dir`
            - more difficult (talks to Slurm): `submit_job`, `submit_build_jobs`

---------------------------------------------------------------------------------------------------

## Sync meeting Mon 14 Nov 2022 10:00 CET - 14:00 PKT

- OK for Thomas, Kenneth, Hafsa

### Meeting agenda

- working on issue #33
    - see PR #63 (fix_bug branch)
- next issues (use a separate branch!)
    - issue #25: dedicated parse_args function
    - issue #32: function to locate PR comment to update
    - issue #61: run_cmd everywhere
    - issue #64: flake8 (Thomas)

---------------------------------------------------------------------------------------------------

## Sync meeting Wed 9 Nov 2022 14:00 CET - 18:00 PKT

- OK for Thomas, Kenneth, Hafsa

### Meeting agenda

- done
    - run_cmd PR merged
    - refactoring PR merged
    - started new PR to make bot ignore non-bot jobs
- for future PRs: start a new branch from latest `main` branch

    ```
    git checkout main
    git pull origin main
    # clean up old branches
    git branch -d refactoring
    # create new branch for new PR
    git checkout -b example_branch
    git add ...
    git commit ...
    git push YOUR_FORK example_branch
    ```

- next
    - issue #25: dedicated parse_args function
    - issue #32: function to locate PR comment to update
    - issue #61: run_cmd everywhere

---------------------------------------------------------------------------------------------------

## Sync meeting Mon 7 Nov 2022 10:00 CET - 14:00 PKT

- OK for Thomas, Kenneth, Hafsa

### Meeting agenda

- follow-up on current issues + PRs
    - 1) finish PR #56 (run_cmd)
    - 2) open PR for additional refactoring (issue #53)
    - 3) open PR for issue #33: avoid that bot crashes on non-bot jobs
    - then follow up on priority:high issues
- Thomas
    - look into new PR on top of main for `deploy` support
    - new PR for deploy (+ eessi-upload-* script?)
        - or maybe to new EESSI/tools repo?
    - new PR for resubmit
    - get status from Jakob on issue #6 (error reporting)

---------------------------------------------------------------------------------------------------

## Sync meeting Thu 3 Nov 2022 11:00 CET - 15:00 PKT

- OK for Thomas, Kenneth, Hafsa

### Meeting agenda

- EESSI meeting: 14:00 CET - 18:00 PKT
- run_cmd PR
- next issues

---------------------------------------------------------------------------------------------------

## Sync meeting Wed 2 Nov 2022 10:00 CET - 14:00 PKT

- OK for Kenneth, Thomas, Hafsa

### Meeting agenda

- follow up on `run_cmd` PR (https://github.com/EESSI/eessi-bot-software-layer/pull/56)
    - code style fixes, test for `run_cmd`, suggestions in PR review
- prepare slides on progress on bot for EESSI monthly meeting (Thu 3 Nov, 14:00 CET - 18:00 PKT)
    - https://docs.google.com/presentation/d/125xb6892Sn5FY-JzDgNjTXeGDzYSpUGJzPyN_MJ0lwE
    - see (current) slides 9-10

---------------------------------------------------------------------------------------------------

## Sync meeting Fri 28 Oct 2022 10:00 CEST - 13:00 PKT

### Meeting agenda

- PR #52 is merged \o/
- code style check added
    - see https://github.com/EESSI/eessi-bot-software-layer/pull/54
- tests/CI
    - see https://github.com/EESSI/eessi-bot-software-layer/pull/55
- next issues:
    - implementation of run function (#44)
        - TODO:
            - fix code style issues
                - check locally with `flake8` command, install with `pip install`
            - add test for `run_cmd` function
                - check locally with `./test.run` (install `pytest` first with `pip install`)
            - tackle PR review remarks
    - avoid crash for non-bot jobs (#33)

---------------------------------------------------------------------------------------------------

## Sync meeting Wed 26 Oct 2022 10:00 CEST - 13:00 PKT

### Meeting agenda

- problems with smee.io being unstable
    - alternatives?
        - ngrok + localtunnel
        - see https://docs.github.com/en/developers/apps/getting-started-with-apps/setting-up-your-development-environment-to-create-a-github-app#step-1-start-a-new-smee-channel
- discuss recent PRs
    - refactor: https://github.com/EESSI/eessi-bot-software-layer/pull/52
        - PR is almost ready, see remarks
- next issues to work on:
    - implementation of dedicated run function (https://github.com/EESSI/eessi-bot-software-layer/issues/44)
- let Hafsa present progress on the bot during next EESSI meeting?

---------------------------------------------------------------------------------------------------

## Sync meeting Mon 24 Oct 2022 11:00 CEST - 14:00 PKT

### Meeting agenda

- discuss recent PRs
    - refactor: https://github.com/EESSI/eessi-bot-software-layer/pull/52
- look at next issues to work on
    - after refactor: dedicated `run_cmd` function https://github.com/EESSI/eessi-bot-software-layer/issues/44

---------------------------------------------------------------------------------------------------

## Sync meeting Wed 19 Oct 2022 11:00 CEST - 14:00 PKT

### Meeting agenda

- changes to rename event handler were done
    - PR opened https://github.com/EESSI/eessi-bot-software-layer/pull/50
- preparing to refactor, see https://github.com/EESSI/eessi-bot-software-layer/issues/36
- fixing hardcoding of `--hold` (see https://github.com/EESSI/eessi-bot-software-layer/issues/35)

---------------------------------------------------------------------------------------------------

## Sync meeting Mon 17 Oct 2022 13:30 CEST - 16:30 PKT

### Meeting agenda

- go over workflow again to develop & test the bot
- questions by Hafsa
    - Why do jobs need to be submitted with hold?
        - let event handler submit jobs, but ensure that job manager is in control over when jobs start
            - that way, job can only start running after job manager has seen it
            - plus we will have a clear record in the job manager log for each job (start/running/done)
        - workaround for lack of Slurm accounting database in CitC cluster
        - job manager can also run a single loop iteration + with a filter for a specific job ID
            - see extra command line options for job manager
- currently working on:
    - understanding the event manager log file + corresponding code
    - looking into issue #37 (rename for event manager) + other issues marked with "good first issue" label

---------------------------------------------------------------------------------------------------

## Sync meeting Fri 14 Oct 2022 14:00 CEST - 17:00 PKT

### Meeting agenda

- updates by Hafsa
- questions on README from EESSI/eessi-bot-software-layer repository
    - which value should be used for
        - cvmfs_customizations
            - should be set empty ({})
        - jobs_base_dir + jobs_ids_dir
            - $HOME can't be used, should be absolute path
- get bot working in Hafsa's account on CitC cluster
    - success, see https://github.com/Hafsa-Naeem/software-layer/pull/1
- TODO Hafsa
    - check log files (pyghee.log + eessi_bot_job_manager.log) to have a clear view what is being done
    - try to replicate the demo using a new test PR
        - use this URL: https://github.com/Hafsa-Naeem/software-layer/compare/main...EESSI:software-layer:add-CaDiCaL-9.3.0?expand=1
    - update README (https://github.com/EESSI/eessi-bot-software-layer/issues/47)

---------------------------------------------------------------------------------------------------

## Sync meeting Mon 10 Oct 2022 14:00 CEST - 17:00 PKT

### Meeting agenda

- questions by Hafsa
- things to do with Hafsa
    - set up account on CitC cluster using GitHub account
    - demo of bot with PR
- next steps
    - set up bot according to README, see https://github.com/EESSI/eessi-bot-software-layer
- notes
    - demo of bot done, see recorded
meeting
        - TODO Hafsa: try and set up the bot using documentation in README file + replicate demo
    - account for Hafsa @ CitC has been set up
        - log in with `ssh hafsa@3.250.220.9`
        - more info at https://github.com/EESSI/hackathons/tree/main/2022-01/citc
    - several issues have been opened, labels have been added, see https://github.com/EESSI/eessi-bot-software-layer/issues
        - issues with "difficulty:easy" + "good first issue" are good starting points
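
---------------------------------------------------------------------------------------------------

## Side note: common command line options (follow-up to Mon 21 Nov 2022)

The notes for 21 Nov 2022 record that `ArgumentParser.parse_args` fails when it finds unknown options, which is a problem for a shared `parse_common_args`. Below is a minimal sketch of one possible way around this, using `parse_known_args` from Python's standard `argparse` module, which returns the parsed namespace together with the list of arguments it did not recognise. The function signatures and the `--debug` and `--max-iterations` options are made up for illustration; the notes do not say how this was solved in the bot.

```python
import argparse


def parse_common_args(argv):
    """Parse options shared by all components and leave the rest untouched.

    Hypothetical signature, not taken from the bot's code base.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-d", "--debug", action="store_true",
                        help="enable debug logging (illustrative option)")
    # parse_known_args() returns (namespace, leftover_args) instead of
    # exiting with an error when it meets an option it does not know
    return parser.parse_known_args(argv)


def parse_job_manager_args(argv):
    """Parse common options first, then component-specific ones (illustrative)."""
    common, remaining = parse_common_args(argv)
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-iterations", type=int, default=0,
                        help="illustrative component-specific option")
    # parse_args() is strict here, so misspelled options are still reported
    specific = parser.parse_args(remaining)
    return common, specific


if __name__ == "__main__":
    common, specific = parse_job_manager_args(["--debug", "--max-iterations", "3"])
    print(common.debug, specific.max_iterations)
```

Keeping the second parser strict means that an option unknown to both parsers still produces an error, so the lenient first pass does not hide typos.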