# INUITS - EESSI
## Lessons learned
(TODO Hafsa: what worked, what didn't work, good/bad things, ...)
* The README was really helpful for understanding the software layer and for setting up the EESSI bot components.
* The regular meetings and discussions helped a lot with progress, and the notes taken after every meeting made it easy to keep track of everything.
* The PR reviews were always detailed and clear (and were also explained in the meetings), which made it easy to make the requested changes.
* most challenging: Git/GitHub
* most learned: Git, merge conflicts, Python
---------------------------------------------------------------------------------------------------
## Next meetings
- ...
---------------------------------------------------------------------------------------------------
## Sync meeting Mon 2 Jan 2023
11:00 CET - 15:00 PKT
- OK for Hafsa, Thomas, Kenneth
### Meeting agenda
- PR #131: very close to being finished
- PR #132: looks OK
- PR #136: we should avoid repeating the same code, and implement a `retry` function instead?
- things to improve in bot
- make sure that it's clear what the status is of a build (still running vs bot crashed, etc.)
- support a way to ask the bot for a status update
- status page of all bot instances
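
The `retry` function suggested for PR #136 could look roughly like the sketch below. This is only an illustration, not the code from the PR: the parameter names (`tries`, `delay`, `exceptions`) and the choice to re-raise the last error are assumptions.

```python
import time


def retry(func, tries=3, delay=1.0, exceptions=(Exception,)):
    """Call func() up to `tries` times and return its result.

    Hypothetical helper: names and defaults are assumptions, not taken
    from the bot's code base.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            return func()
        except exceptions as err:
            last_error = err
            # Only wait when another attempt will follow.
            if attempt < tries:
                time.sleep(delay)
    # All attempts failed: re-raise the last error that was seen.
    raise last_error
```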
---------------------------------------------------------------------------------------------------
## Sync meeting Wed 28 Dec 2022
10:00 CET - 14:00 PKT
- OK for Hafsa, Thomas - Kenneth unsure
### Meeting agenda
- PR #130
- Suggest adding a comment explaining that the config is read at this point so that an exception is raised early, when the event handler starts (line 120 in the event handler and line 577 in the job manager).
- Remove new log method from job manager
- (Thomas) create issue about reading config multiple times (see https://github.com/EESSI/eessi-bot-software-layer/pull/130#pullrequestreview-1230996141)
- PR #131
- issue #125 (dedicated log method)
- Do not read the config in log (event handler). Write an `__init__` method similar to the one for the job manager. PyGHee still needs to be initialized, though.
- Addressing failing tests "KeyError" (see https://github.com/EESSI/eessi-bot-software-layer/actions/runs/3790115473/jobs/6444477781)
- Deposit a basic app.cfg in tests directory
- `[job_manager]`
- `log_path = some_relative_path` (without a leading `/`)
- Or create that file before initializing the job manager instance.
- PR #132
- issue #20 (retry communication with GitHub if first attempt failed)
- A few minor changes should be implemented. See PR comments.
- Suggest listing the addressed issues in the PR description and mentioning that one case of communicating with GitHub is improved.
- issue #30 (handle connection error to GitHub gracefully)
- pick any other communication with GitHub (e.g., tools/pr_comments.py) and add `try/except` blocks as well as a `tries` loop
- next issues
- issue #110 (make log method print function name that called it)
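
The basic `app.cfg` suggested above for the failing tests could be generated from within a test, as in the sketch below. The section `[job_manager]` and the key `log_path` come from the notes; the helper name `write_test_config` and the default value are hypothetical.

```python
import configparser
import os


def write_test_config(directory, log_path="logs/job_manager.log"):
    """Write a minimal app.cfg into `directory` and return its path.

    Hypothetical test helper; `log_path` is deliberately relative
    (no leading '/'), as suggested in the meeting notes.
    """
    config = configparser.ConfigParser()
    config["job_manager"] = {"log_path": log_path}
    cfg_path = os.path.join(directory, "app.cfg")
    with open(cfg_path, "w") as cfg_file:
        config.write(cfg_file)
    return cfg_path
```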
---------------------------------------------------------------------------------------------------
## Sync meeting Wed 21 Dec 2022
10:00 CET - 14:00 PKT
- OK for Kenneth, Hafsa, Thomas
### Meeting agenda
- PR #116 merged
- PR #128 is replaced by PR #130 (for issue #112)
- validating whether all required config settings are provided (like in Thomas' PR #85) can be done in a separate PR
- see PR review for PR #130
- also use `read_config` in job manager
- next issues
- issue #125 (dedicated log method)
- issue #110 (make log method print function name that called it)
- issue #30 (handle connection error to GitHub gracefully)
- issue #20 (retry communication with GitHub if first attempt failed)
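
A generic `read_config` function combined with a check for required settings might be sketched as follows. This is not the implementation from PR #130 or PR #85: the default file name, the helper `find_missing_settings`, and the shape of the `required` mapping are assumptions made for illustration.

```python
import configparser


def read_config(path="app.cfg"):
    """Read the configuration file and fail early if it cannot be read."""
    config = configparser.ConfigParser()
    # ConfigParser.read() returns the list of files it could parse.
    if not config.read(path):
        raise FileNotFoundError(f"could not read configuration file {path}")
    return config


def find_missing_settings(config, required):
    """Return 'section.key' names from `required` that are absent in `config`.

    `required` maps section names to lists of keys (hypothetical layout).
    """
    missing = []
    for section, keys in required.items():
        for key in keys:
            if not config.has_option(section, key):
                missing.append(f"{section}.{key}")
    return missing
```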
---------------------------------------------------------------------------------------------------
## Sync meeting Mon 19 Dec 2022
09:00 CET - 13:00 PKT
- OK for Kenneth, Hafsa, Thomas
### Meeting agenda
- PR #116 (adds a comment to the PR showing the running state of the jobs)
- only missing docstring in `process_running_jobs` function, then good to merge
- PR #123 (improve start of the app)
- merged!
- PR #124
- merged!
- PR #128 for issue #112
- see suggestions in PR review
- next issues
- issue #125 (dedicated log method)
- issue #110 (make log method print function name that called it)
- issue #30 (handle connection error to GitHub gracefully)
- issue #20 (retry communication with GitHub if first attempt failed)
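
For issue #110, a log function that prints the name of the function that called it can be sketched with the standard `inspect` module. The signature of `log` below is an assumption and does not reflect the bot's actual log method; the standard `logging` module offers a similar feature through the `%(funcName)s` format attribute.

```python
import inspect
import sys


def log(msg, log_file=None):
    """Print `msg` prefixed with the name of the calling function.

    Hypothetical signature: `log_file` is an open file-like object,
    defaulting to stdout.
    """
    # Frame 0 is log() itself, frame 1 is whoever called log().
    caller = inspect.stack()[1].function
    line = f"[{caller}] {msg}"
    stream = log_file if log_file is not None else sys.stdout
    print(line, file=stream)
    return line
```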
---------------------------------------------------------------------------------------------------
## Sync meeting Mon 12 Dec 2022
11:00 CET - 15:00 PKT
- OK for Kenneth, Hafsa, Thomas
### Meeting agenda
- PRs
- #116: add missing docstrings
- #123: see PR review
- #124: see PR review
- next issues
- #112 (generic read_config function)
- #125 log method
---------------------------------------------------------------------------------------------------
## Sync meeting Wed 07 Dec 2022
14:00 CET - 18:00 PKT
- OK for Kenneth, Hafsa, Thomas
### Meeting agenda
- Merged PR #83 PR comments
- PR #116 adds a comment to the PR showing the running state of the jobs
- addresses issue #27
- determines running jobs and adds row(s) to PR comment
- missing: only add row once
- for this the comment body needs to be scanned
- Next issues to look at
- issue #9 improve information printed at start of event handler (and job manager)
- for now only change the eessi-bot-software-layer code (changes may be ported to PyGHee later)
- issue #97 (permission to trigger builds)
- issue #112 (generic read_config function)
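
Adding a row only once, by scanning the comment body first, could work along the lines of the sketch below. The function name `add_row_once` and the table layout used in the example are hypothetical; the real PR comment format may differ.

```python
def add_row_once(comment_body, row):
    """Append `row` to `comment_body` unless an identical line already exists."""
    lines = comment_body.splitlines()
    if row in lines:
        # The row was added earlier; leave the comment untouched.
        return comment_body
    lines.append(row)
    return "\n".join(lines)


# Example with a made-up status table.
body = "|date|job status|comment|\n|---|---|---|"
row = "|Dec 07|running|job 42 is running|"
updated = add_row_once(body, row)
unchanged = add_row_once(updated, row)
```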
---------------------------------------------------------------------------------------------------
## Sync meeting Mon 05 Dec 2022
10:00 CET - 14:00 PKT
- OK for Kenneth, Hafsa, Thomas
### Meeting agenda
- EESSI monthly update meeting
- Merged PR #80 argparse
- Merged PR #84 username
- PR #83 PR comments
- closed PR #82
- Conflicts are resolved
- a few minor issues will be resolved (Thomas forgot to submit the review, so Hafsa couldn't see the comments)
- issue #27 print more information about state changes of jobs
- when a job starts running, a line is added to the PR comment
- 1st step: determine running jobs and log information (not harmful to log every time the job manager runs the main loop)
- maybe need to run squeue or parse its output for 'RUNNING'
- also consider updating/changing the current function for obtaining current jobs
- seems not needed as state is already included in returned data structure
- probably need to define a new function, for example, `determine_running_jobs(self, current_jobs)`
- plus a second function which processes running jobs
- 2nd step: update PR comment once
- bonus 1: update every hour (or make update frequency configurable)
- bonus 2: report about time left for the job
- Next issues to look at
- issue #9 improve information printed at start of event handler (and job manager)
- for now only change the eessi-bot-software-layer code (changes may be ported to PyGHee later)
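
The two functions discussed for issue #27 might be sketched as below. They are written as plain functions rather than methods, and the data structure is an assumption: `current_jobs` is taken to map job IDs to dictionaries that carry a `state` entry, which may not match what the job manager actually returns.

```python
def determine_running_jobs(current_jobs):
    """Return the subset of `current_jobs` whose state is 'RUNNING'.

    Assumes `current_jobs` maps job IDs to dicts with a 'state' key.
    """
    return {
        job_id: job
        for job_id, job in current_jobs.items()
        if job.get("state") == "RUNNING"
    }


def process_running_jobs(running_jobs, log=print):
    """Log one message per running job (first step: only log information)."""
    for job_id in sorted(running_jobs):
        log(f"job {job_id} is running")
```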
---------------------------------------------------------------------------------------------------
## Sync meeting Wed 30 Nov 2022
10:00 CET - 14:00 PKT
- OK for Kenneth, Hafsa, Thomas
### Meeting agenda
- EESSI monthly update meeting tomorrow
- PR #80 argparse
- fix unused import, then ready to be merged
- PR #83 PR comments
- close PR #82
- try to address comments in PR (more generic function, docstrings)
- add logging output to figure out why the more generic function doesn't work
- PR #84 username
- looks fine, can be merged as is
- Next issues to look at
- issue #9 improve information printed at start of event handler (and job manager)
- for now only change the eessi-bot-software-layer code (changes may be ported to PyGHee later)
- issue #27 print more information about state changes of jobs
- when a job starts running, a line is added to the PR comment
---------------------------------------------------------------------------------------------------
## Sync meeting Mon 28 Nov 2022
10:00 CET - 14:00 PKT
- OK for Kenneth, Hafsa, Thomas
### Meeting agenda
- PR #80 argparse
- https://github.com/EESSI/eessi-bot-software-layer/pull/80
- a few minor changes to make, then ready to merge PR
- draft PR #82
- addresses issue #32, function for identifying PR comment to be updated
- https://github.com/EESSI/eessi-bot-software-layer/issues/32
- avoid getting commits into PR which don't belong to the PR by using the following procedure
```
git clone https://github.com/YOUR_GH_ACCOUNT/eessi-bot-software-layer identify-pr-comment
cd identify-pr-comment
git branch identify-pr-comment
git checkout identify-pr-comment
```
- discussed draft PR
- add parameter for search_pattern
- move `return None` to correct indentation level
- capturing such unintended changes could be covered by some tests?
- rename (?) file `github.py` to `pr_comments.py` to avoid confusion with `connections/github.py`
- add missing `import`
- until next meeting:
- try to finish two PRs as they are
- if time, either add tests for them or find a new issue (looked a bit into issues but didn't find an obvious one to continue)
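
The remarks on the draft PR (a `search_pattern` parameter and the indentation of `return None`) can be illustrated with the sketch below. It is not the code from PR #82: the function name `find_comment` is hypothetical, and comments are modelled as plain dictionaries with a `body` key instead of the objects returned by the GitHub API.

```python
import re


def find_comment(comments, search_pattern):
    """Return the first comment whose body matches `search_pattern`, else None."""
    for comment in comments:
        if re.search(search_pattern, comment["body"]):
            return comment
    # This line belongs after the loop. If it were indented into the loop
    # body, the function would give up after inspecting the first comment.
    return None
```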
---------------------------------------------------------------------------------------------------
## Sync meeting Thu 24 Nov 2022
10:00 CET - 14:00 PKT
- OK for Kenneth, Hafsa, Thomas
### Meeting agenda
- PR #78: should be good to merge
- https://github.com/EESSI/eessi-bot-software-layer/pull/78
- draft PR #80 to rework option parser
- https://github.com/EESSI/eessi-bot-software-layer/pull/80
- next issue for Hafsa: https://github.com/EESSI/eessi-bot-software-layer/issues/32
---------------------------------------------------------------------------------------------------
## Sync meeting Mon 21 Nov 2022
10:00 CET - 14:00 PKT
- OK for Kenneth, Hafsa (Thomas excused)
### Meeting agenda
- problem with bot tripping over unknown jobs fixed by cleaning up 'jobs' directory
- PR #78
- run_cmd needs a log_file argument to make sure logging is done in correct file
- enhance existing test for run_cmd to check log_file functionality
- issue #25
- open draft PR for parse_common_args
- ArgumentParser.parse_args will fail if it finds unknown options => problem for parse_common_args...
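
One possible way around the `parse_args` problem noted above is `ArgumentParser.parse_known_args`, which returns unrecognised options instead of exiting with an error. The sketch below shows the idea; the options `--debug` and `--log-file` are hypothetical and not necessarily the bot's real common options.

```python
import argparse


def parse_common_args(argv):
    """Parse options shared by all tools and hand back everything else.

    The option names are made up for illustration.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--log-file", default=None)
    # parse_known_args() returns (namespace, list_of_unparsed_arguments)
    # instead of failing on options this parser does not know about.
    known, remaining = parser.parse_known_args(argv)
    return known, remaining
```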
---------------------------------------------------------------------------------------------------
## Sync meeting Wed 16 Nov 2022
10:00 CET - 14:00 PKT
- OK for Thomas, Kenneth, Hafsa
### Meeting agenda
- problems for Hafsa with connecting to the AWS cluster
- test with another SSH server?
- Kenneth will look into getting Hafsa a VSC account to access login.hpc.ugent.be
- PR #63 merged
- PR #70 adds flake8 check in CI (issue #64)
- should be almost ready to go now, only small style fixes needed still
- next:
- issue #33: follow-up by Thomas
- issue #61: run_cmd everywhere => Hafsa
- after:
- issue #25: dedicated parse_args function => Hafsa
- incl. tests for the parse_args function(s)
- issue #32: function to locate PR comment to update => Hafsa
- issue #9: cleaner 'start' method for app => Hafsa
- issue #20: retry communication with GitHub if it fails => Hafsa
- issue #38: add unit tests for existing functions => Hafsa
- this can be done by only touching files in test/ directory
- for `task/build.py` functions:
- should be easy: `create_pr_dir`, `get_build_env_cfg`, `create_metadata`
- a bit more difficult (talks to GitHub): `download_pr`, `setup_pr_in_arch_job_dir`
- more difficult (talks to Slurm): `submit_job`, `submit_build_jobs`
---------------------------------------------------------------------------------------------------
## Sync meeting Mon 14 Nov 2022
10:00 CET - 14:00 PKT
- OK for Thomas, Kenneth, Hafsa
### Meeting agenda
- working on issue #33
- see PR #63 (fix_bug branch)
- next issues (use a separate branch!)
- issue #25: dedicated parse_args function
- issue #32: function to locate PR comment to update
- issue #61: run_cmd everywhere
- issue #64: flake8 (Thomas)
---------------------------------------------------------------------------------------------------
## Sync meeting Wed 9 Nov 2022
14:00 CET - 18:00 PKT
- OK for Thomas, Kenneth, Hafsa
### Meeting agenda
- done
- run_cmd PR merged
- refactoring PR merged
- started new PR to make bot ignore non-bot jobs
- for future PRs: start a new branch from latest `main` branch
```
git checkout main
git pull origin main
# clean up old branches
git branch -d refactoring
# create new branch for new PR
git checkout -b example_branch
git add ...
git commit ...
git push YOUR_FORK example_branch
```
- next
- issue #25: dedicated parse_args function
- issue #32: function to locate PR comment to update
- issue #61: run_cmd everywhere
---------------------------------------------------------------------------------------------------
## Sync meeting Mon 7 Nov 2022
10:00 CET - 14:00 PKT
- OK for Thomas, Kenneth, Hafsa
### Meeting agenda
- follow-up on current issues + PRs
- 1) finish PR #56 (run_cmd)
- 2) open PR for additional refactoring (issue #53)
- 3) open PR for issue #33: avoid that bot crashes on non-bot jobs
- then follow up on priority:high issues
- Thomas
- look into new PR on top of main for `deploy` support
- new PR for deploy (+ eessi-upload-* script?)
- or maybe to new EESSI/tools repo?
- new PR for resubmit
- get status from Jakob on issue #6 (error reporting)
---------------------------------------------------------------------------------------------------
## Sync meeting Thu 3 Nov 2022
11:00 CET - 15:00 PKT
- OK for Thomas, Kenneth, Hafsa
### Meeting agenda
- EESSI meeting: 14:00 CET - 18:00 PKT
- run_cmd PR
- next issues
---------------------------------------------------------------------------------------------------
## Sync meeting Wed 2 Nov 2022
10:00 CET - 14:00 PKT
- OK for Kenneth, Thomas, Hafsa
### Meeting agenda
- follow up on `run_cmd` PR (https://github.com/EESSI/eessi-bot-software-layer/pull/56)
- code style fixes, test for `run_cmd`, suggestions in PR review
- prepare slides on progress on bot for EESSI monthly meeting (Thu 3 Nov, 14:00 CET - 18:00 PKT)
- https://docs.google.com/presentation/d/125xb6892Sn5FY-JzDgNjTXeGDzYSpUGJzPyN_MJ0lwE
- see (current) slides 9-10
---------------------------------------------------------------------------------------------------
## Sync meeting Fri 28 Oct 2022
10:00 CEST - 13:00 PKT
### Meeting agenda
- PR #52 is merged \o/
- code style check added
- see https://github.com/EESSI/eessi-bot-software-layer/pull/54
- tests/CI
- see https://github.com/EESSI/eessi-bot-software-layer/pull/55
- next issues:
- implementation of run function (#44)
- TODO:
- fix code style issues
- check locally with `flake8` command, install with `pip install`
- add test for `run_cmd` function
- check locally with `./test.run` (install `pytest` first with `pip install`)
- tackle PR review remarks
- avoid crash for non-bot jobs (#33)
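
A `run_cmd` function together with a pytest-style test, as listed in the TODO above, might look roughly like the sketch below. The signature and return value are assumptions and not the implementation from the PR; the optional `log_file` parameter anticipates the later request to log to a specific file.

```python
import subprocess


def run_cmd(cmd, log_file=None):
    """Run a shell command and return (stdout, stderr, exit_code).

    Hypothetical signature; if `log_file` is given, a short record of the
    command and its exit code is appended to that file.
    """
    result = subprocess.run(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if log_file is not None:
        with open(log_file, "a") as handle:
            handle.write(f"command: {cmd}\nexit code: {result.returncode}\n")
    return result.stdout, result.stderr, result.returncode


def test_run_cmd():
    """Minimal test that pytest would pick up by its name."""
    stdout, stderr, exit_code = run_cmd("echo hello")
    assert stdout.strip() == "hello"
    assert exit_code == 0
```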
---------------------------------------------------------------------------------------------------
## Sync meeting Wed 26 Oct 2022
10:00 CEST - 13:00 PKT
### Meeting agenda
- problems with smee.io being unstable
- alternatives?
- ngrok + localtunnel
- see https://docs.github.com/en/developers/apps/getting-started-with-apps/setting-up-your-development-environment-to-create-a-github-app#step-1-start-a-new-smee-channel
- discuss recent PRs
- refactor: https://github.com/EESSI/eessi-bot-software-layer/pull/52
- PR is almost ready, see remarks
- next issues to work on:
- implementation of dedicated run function (https://github.com/EESSI/eessi-bot-software-layer/issues/44)
- let Hafsa present progress on the bot during next EESSI meeting?
---------------------------------------------------------------------------------------------------
## Sync meeting Mon 24 Oct 2022
11:00 CEST - 14:00 PKT
### Meeting agenda
- discuss recent PRs
- refactor: https://github.com/EESSI/eessi-bot-software-layer/pull/52
- look at next issues to work on
- after refactor: dedicated `run_cmd` function https://github.com/EESSI/eessi-bot-software-layer/issues/44
---------------------------------------------------------------------------------------------------
## Sync meeting Wed 19 Oct 2022
11:00 CEST - 14:00 PKT
### Meeting agenda
- changes to rename event handler were done
- PR opened https://github.com/EESSI/eessi-bot-software-layer/pull/50
- preparing to refactor, see https://github.com/EESSI/eessi-bot-software-layer/issues/36
- fixing hardcoding of `--hold` (see https://github.com/EESSI/eessi-bot-software-layer/issues/35)
---------------------------------------------------------------------------------------------------
## Sync meeting Mon 17 Oct 2022
13:30 CEST - 16:30 PKT
### Meeting agenda
- go over workflow again to develop & test the bot
- questions by Hafsa
- Why do jobs need to be submitted with hold?
- let event handler submit jobs, but ensure that job manager is in control over when jobs start
- that way, job can only start running after job manager has seen it
- + we will have a clear record in the job manager log for each job (start/running/done)
- workaround for lack of Slurm accounting database in CitC cluster
- job manager can also run a single loop iteration + with a filter for a specific job ID
- see extra command line options for job manager
- currently working on:
- understanding the event manager log file + corresponding code
- looking into issue #37 (rename for event manager) + other issues marked with "good first issue" label
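
The submit-with-hold approach explained above can be sketched as two small helpers that only build the Slurm command lines: `sbatch --hold` on the event handler side and `scontrol release` on the job manager side. The helper names and the `hold` switch are hypothetical and not taken from the bot's code.

```python
def build_submit_cmd(job_script, hold=True):
    """Build the sbatch command; with hold=True the job waits for a release."""
    cmd = ["sbatch"]
    if hold:
        # A held job stays pending until it is explicitly released.
        cmd.append("--hold")
    cmd.append(job_script)
    return cmd


def build_release_cmd(job_id):
    """Build the command the job manager would use to let a held job start."""
    return ["scontrol", "release", str(job_id)]
```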
---------------------------------------------------------------------------------------------------
## Sync meeting Fri 14 Oct 2022
14:00 CEST - 17:00 PKT
### Meeting agenda
- updates by Hafsa
- questions on README from EESSI/eessi-bot-software-layer repository
- which value should be used for
- cvmfs_customizations
- should be set empty ({})
- jobs_base_dir + jobs_ids_dir
- $HOME can't be used, should be absolute path
- get bot working in Hafsa's account on CitC cluster
- success, see https://github.com/Hafsa-Naeem/software-layer/pull/1
- TODO Hafsa
- check log files (pyghee.log + eessi_bot_job_manager.log) to get a clear view of what is being done
- try to replicate the demo using a new test PR
- use this URL: https://github.com/Hafsa-Naeem/software-layer/compare/main...EESSI:software-layer:add-CaDiCaL-9.3.0?expand=1
- update README (https://github.com/EESSI/eessi-bot-software-layer/issues/47)
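
The point about directory settings needing absolute paths (no `$HOME`) could be checked with a small helper like the one sketched below. The function name is hypothetical, and the setting names are copied from the notes above, so they may not match the exact keys in the README.

```python
import os


def find_non_absolute_dirs(settings):
    """Return the names of settings whose value is not a plain absolute path.

    Values containing '$' are rejected as well, since shell variables such
    as $HOME are not expanded when the configuration is read.
    """
    problems = []
    for name, value in settings.items():
        if "$" in value or not os.path.isabs(value):
            problems.append(name)
    return problems


# Example values are made up for illustration.
example = {
    "jobs_base_dir": "$HOME/jobs",
    "jobs_ids_dir": "/home/hafsa/jobs/ids",
}
problem_settings = find_non_absolute_dirs(example)
```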
---------------------------------------------------------------------------------------------------
## Sync meeting Mon 10 Oct 2022
14:00 CEST - 17:00 PKT
### Meeting agenda
- questions by Hafsa
- things to do with Hafsa
- set up account on CitC cluster using GitHub account
- demo of bot with PR
- next steps
- set up bot according to README, see https://github.com/EESSI/eessi-bot-software-layer
- notes
- demo of bot done, see recorded meeting
- TODO Hafsa: try and set up the bot using documentation in README file + replicate demo
- account for Hafsa @ CitC has been set up
- log in with `ssh hafsa@3.250.220.9`
- more info at https://github.com/EESSI/hackathons/tree/main/2022-01/citc
- several issues have been opened, labels have been added, see https://github.com/EESSI/eessi-bot-software-layer/issues
- issues with "difficulty:easy" + "good first issue" are good starting points