# AWS tech short recording (2023-05-30)
- 10-15 min per video
- prep
- scenario
- takeaways
- visuals
https://docs.google.com/presentation/d/1H2EyLcwSSMrt8b9NY75seztOegC3X60eSpGNQWybMIw
## Scenario
* Set the scene
* Compiling for the CPU microarchitecture matters (show GROMACS performance barplot)
* You can have _portable_ environments with a container runtime... but you need more than that for the P in HPC
* EESSI:
* Optimized binaries for specific CPUs that work on any Linux distribution
* Streaming scientific software from remote filesystem
* Show the EESSI graphic, and give a brief description of the layers
* compat layer: "containers without the container"
* Spin up a vanilla Amazon Linux VM in eu-west-1 (same as AWS S1) with AWS Graviton2 CPUs (c6g.2xlarge: 8 cores, 16 GB RAM)
* `ssh ec2-user@...`
* Show how empty it is (no git, low disk usage, no `module` command, no modules/scientific software, ...); quick checks are sketched after this list
* Clone the EESSI demo and walk through the installation script (cf. https://github.com/EESSI/eessi-demo); setup sketch below
* Source the init script and run `module avail`; also show that disk usage has barely changed even though a ton of software is now available
* Run a few examples and show how quick it is (run sketch below)
* Show how disk usage changes with each run (OpenFOAM is a good example here since the install is very big, but what you actually use isn't)
* TensorFlow is big-ish but has nicer output
* `cvmfs_config stat pilot.eessi-hpc.org | column -t`
* `du -sh` on compat layer (~3.5 GB) + TensorFlow (1.7 GB) + OpenFOAM (1.7 GB) + GROMACS (~150 MB)
* => ~7 GB of software in the repository (excl. deps!)
* `cvmfs_config stat` shows < 1 GB of local cache after running GROMACS + TensorFlow + OpenFOAM
* so only ~10-15% of the data is actually needed to run
* Proof-of-concept EESSI pilot repository, so limited software stack + old software versions
* Explain in more detail how EESSI all works
* show the structure under `/cvmfs/pilot.eessi-hpc.org/...` (layout sketch below)
* CVMFS visual
* Adding software to EESSI: community effort
* bot workflow (visual)
* MultiXscale
* Learn more
* EESSI open access paper
* EESSI docs
* Explain why multinode also works (fat MPI builds)
* Performance graph from EESSI paper
* https://github.com/EESSI/paper-SPE-SI-HPC/blob/main/Data/GROMACS.ipynb
* Show off how to make it available on ParallelCluster
* Show that we get the expected network performance there (OSU Micro-Benchmarks; commands below, plus a job script sketch after them)
* Show a GROMACS example on multiple nodes
* Later
* EESSI production (eessi.io)
* GPU support
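
A rough sketch of the "how empty it is" check on the fresh VM; exact messages depend on the Amazon Linux image:

```bash
# Fresh VM: confirm there is no development or scientific software yet
which git       # expect "no git in (...)" on a vanilla image
module avail    # expect "module: command not found"
df -h /         # note the baseline disk usage before EESSI
ls /cvmfs 2>/dev/null || echo "no /cvmfs yet"
```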
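
Setup sketch for the eessi-demo part; the install helper's name is an assumption, check the repo README before recording:

```bash
# Clone the demo repo; it ships helpers to install CVMFS + the EESSI config
git clone https://github.com/EESSI/eessi-demo.git
cd eessi-demo
ls scripts/
sudo ./scripts/install_cvmfs_eessi.sh   # ASSUMPTION: verify the actual script name

# Make the EESSI pilot stack available in this shell
source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
module avail    # lots of modules...
df -h /         # ...yet local disk usage has barely moved
```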
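
Run sketch for the disk-usage story; demo directory names and repository paths are from memory, so verify them on the VM:

```bash
# Run a bundled example and watch the CVMFS cache (not the disk) grow
cd GROMACS && time ./run.sh && cd ..                 # each demo dir ships a run.sh
cvmfs_config stat pilot.eessi-hpc.org | column -t    # cache usage stays well under 1 GB

# Contrast with the nominal sizes in the repository (fetched only on demand;
# du only stats metadata here, it does not download file contents)
du -sh /cvmfs/pilot.eessi-hpc.org/versions/*/compat/linux/aarch64
```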
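
Layout sketch for the "how EESSI works" part; the version and CPU-target directory names are illustrative:

```bash
# One directory per stack version
ls /cvmfs/pilot.eessi-hpc.org/versions/
# Inside a version: compat layer + software layer + init scripts
ls /cvmfs/pilot.eessi-hpc.org/versions/2021.12/
# The software layer is split per OS / architecture / CPU microarchitecture
ls /cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/aarch64/
```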
---
Useful commands:
```bash
# CVMFS client stats; -H hides columns we don't need on screen
cvmfs_config stat pilot.eessi-hpc.org | column -t -H 1,2,3,4,5,6,7,10,11,12,13,15,18,19

# OSU micro-benchmarks; pml cm routes MPI traffic through libfabric (EFA)
mpirun --mca pml cm osu_bibw
mpirun --mca pml cm osu_latency

# Replace the EESSI MPI stack with the AWS-shipped Open MPI/libfabric via LD_PRELOAD
LD_PRELOAD=/opt/amazon/openmpi/lib64/libmpi.so.40:/opt/amazon/openmpi/lib64/libopen-rte.so.40:/opt/amazon/openmpi/lib64/libopen-pal.so.40:/opt/amazon/efa/lib64/libfabric.so.1:/lib64/libefa.so.1:/lib64/libhwloc.so.5:/lib64/libevent_core-2.0.so.5:/lib64/libevent_pthreads-2.0.so.5:/lib64/libnl-3.so.200:/lib64/libnl-route-3.so.200 mpirun --mca pml cm osu_latency
```
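
Job script sketch for the multi-node GROMACS demo on ParallelCluster, assuming Slurm as the scheduler; the module name, task counts, and input file are placeholders:

```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8

# EESSI is mounted cluster-wide via CVMFS, so every node sees the same stack
source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
module load GROMACS          # placeholder: pick the exact version from `module avail`

# The fat Open MPI build inside EESSI can drive EFA through libfabric
mpirun gmx_mpi mdrun -s benchmark.tpr   # benchmark.tpr is a placeholder input
```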
## Notes
- AWS doesn't like that EESSI provides its own libfabric, bypassing the one they ship