# EESSI/Azure/SURF sync meeting 20211015
## Agenda
- Update on NeIC project proposal (S4)
- EESSI Stratum-1 in Azure (Bob)
- GitHub runners for EESSI hosted in Azure VM (Bob)
- Use of Terraform (Bob)
- Zen3 build node (Kenneth)
- Some trouble due to SELinux (?)
- Work on interconnect detection support in archspec (Hugo)
- See https://github.com/archspec/archspec/pull/60
## Attendees
- Laura Redfern
- Martin Brandt
- Ivar Janmaat
- Bob Dröge
- Kenneth Hoste
- Ahmad Hesam
- Alan O'Cais
- Hugo Meiland
## Notes
- Update on NeIC project proposal (S4)
- Did not get funded
- Pretty decent score but competition was tough
- Were recommended to reapply in the next funding round (Feb. 22)
- Will use feedback to tune proposal
- Need to make a concrete connection back to users
- Need some additional Nordic partners (since that was noted)
- Other opportunities will arise soon (like EOSC calls which are currently being fine-tuned)
- Laura: Can arrange a letter of support for future bids
- EESSI Stratum-1 in Azure (Bob)
- Now part of the (latest) configuration package
- CVMFS uses the Geo API to pick a nearby server, so this Stratum-1 may not see much use since it currently sits in the US
- Hugo will test it out
- Can check with `cvmfs_config` which S1 you're talking to
```
# first make sure that CVMFS is mounted, e.g. by doing an ls:
ls /cvmfs/pilot.eessi-hpc.org
cvmfs_config stat -v pilot.eessi-hpc.org
# That should show something like:
# Connection: http://134.94.88.70/cvmfs/pilot.eessi-hpc.org through proxy DIRECT (online)
```
- (Default) GitHub runners may also be using this
- Should keep an eye on traffic, as this can be large
- Azure blob as Stratum-1 is an option that might be interesting
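
The `cvmfs_config stat -v` check above can be scripted to report only the server host. This is a minimal sketch, assuming the output contains a `Connection:` line in the form shown in these notes; the helper name `stratum1_host` is hypothetical and not part of CVMFS.

```shell
# Hypothetical helper (not part of CVMFS): reads `cvmfs_config stat -v` output
# on stdin and prints the host found in the "Connection:" line.
# Assumes the line looks like the example in these notes.
stratum1_host() {
    sed -n 's|.*Connection: https\{0,1\}://\([^/:]*\).*|\1|p'
}

# Sample line copied from the notes; on a real client you would run:
#   cvmfs_config stat -v pilot.eessi-hpc.org | stratum1_host
sample='Connection: http://134.94.88.70/cvmfs/pilot.eessi-hpc.org through proxy DIRECT (online)'
printf '%s\n' "$sample" | stratum1_host
```

Comparing the printed host against the known Stratum-1 addresses shows whether the Azure server is the one being used.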
- GitHub runners for EESSI hosted in Azure VM (Bob)
- Some of our actions exceed the 6h time limit for default runners
- CVMFS does not provide containers for some architectures (ARM + POWER), so we need to build them from source
- Created our own runners to build containers
- Only need these intermittently when the containers need updating
- Any experience with auto-scaling Kubernetes clusters for GitHub Actions workflows?
- Martin can check with SURF people working on Kubernetes & GitLab runners
- Hugo can share info on throwaway multi-node clusters used internally
- Martin: see https://docs.microsoft.com/nl-nl/azure/aks/kubernetes-action
- For multi-node application testing, Magic Castle should work well (and is more secure than Cluster-in-the-Cloud)
- support for InfiniBand and EFA
- see https://github.com/ComputeCanada/magic_castle
- Use of Terraform through API access to Azure (Bob)
- separate 'terraform' account
- Martin can probably help here, has done this
- Zen3 build node (Kenneth)
- available now for EESSI in West Europe
- Some trouble due to SELinux (?)
- Using our container inside the image throws an error
```
Singularity> mkdir /cvmfs/pilot.eessi-hpc.org/2021.06/software/linux/x86_64/amd/zen3
mkdir: cannot create directory '/cvmfs/pilot.eessi-hpc.org/2021.06/software/linux/x86_64/amd/zen3': Operation not supported
```
- Can make it work if `/tmp` is used for the overlay
- Looks like it could be related to mount options and SELinux
- Once we get this resolved, we should have a full stack in a day, given that the node is so powerful
- Work on interconnect detection support in archspec (Hugo)
- See https://github.com/archspec/archspec/pull/60
- Motivated by issues with using the interconnect, see https://github.com/EESSI/software-layer/issues/136
- Fixed by setting some environment variables
- OpenMPI should probably behave nicer
- Interconnect detection could trigger some appropriate environment variables
- Usage is about 200 euro/month, so no alarms are triggered :P
- only Stratum-1 + GitHub Actions runners
- Are there any relevant upcoming events?
- There will be another EasyBuild User meeting
- Having an end-user focussed tutorial might be a good idea
- For example, for someone building on top of the EESSI stack
- Topics:
- setting up EESSI from scratch
- usage
- building your own software on top
- Hugo: Marketplace VM image
- When will there be a production stack?
- Really hard to say
- Would need to have some monitoring in place...and someone to notify if there is something wrong
- How do we roll something back? Who can do that?
- If we have issues with a Stratum-1, who fixes it? And if we can't contact the responsible people, how do we kick it out?
- Can we use DNS to kick out Stratum-1s?
- That is possible