We are looking to buy new GPU nodes for the Argus Kubernetes cluster. Chris asked us to run some checks to find the cheapest type of GPU that is viable for running AreTomo. Running AreTomo on these nodes would let us scale the tomo_align service; at the moment we can only scale using the tomo_align_iris service, which has weaker guarantees than Argus (and no SLA).
AreTomo licensing was mentioned as a possible issue, but I think this should be fine.
Test A100s and V100s - how do we test in a container?
Mihai said to temporarily install Docker on Hopper. Docker relies on the containerd daemon, whereas Singularity is daemonless.
Will running in a Singularity container make a difference compared with running directly on the node or in a Docker container? Chris says every system call is marshalled by the container process, so it could make a difference. Mihai thinks it shouldn't. Run everything through Singularity containers.
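One way to settle the container question is to time the same job under each runtime and compare medians. A minimal sketch, assuming each variant can be launched as a single command line; the helper names `time_command` and `compare` are made up for these notes, and the AreTomo arguments are deliberately left out because we have not fixed them yet.

```python
import statistics
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run cmd (a list of arguments) `repeats` times; return wall-clock seconds per run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is never timed silently.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def compare(commands, repeats=3):
    """Map each label to the median wall-clock time (seconds) of its command."""
    return {
        label: statistics.median(time_command(cmd, repeats))
        for label, cmd in commands.items()
    }
```

To use it, pass a dict such as `{"bare": [...], "singularity": ["singularity", "exec", "--nv", <image>, ...], "docker": [...]}` where each list is the full command for the same input tilt series. The image name and the AreTomo arguments are placeholders to fill in once the test data is chosen. Wall-clock time includes container start-up, which is probably what we care about for short jobs.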
Take one node out to test - Hamilton?
Test V100 on Argus
Test A100 on Hopper/Hamilton
Test other, lower-spec GPUs on Pollux
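Once the timings from the tests above are in, picking the cheapest viable GPU is a filter followed by a minimum. A rough sketch, assuming "viable" means the median AreTomo run finishes within some agreed time limit; that definition, the `GpuResult` fields and the `cheapest_viable` name are assumptions for illustration, not something agreed with Chris.

```python
from typing import NamedTuple, Optional, Sequence


class GpuResult(NamedTuple):
    name: str              # GPU model as tested
    price: float           # cost per node, in any one consistent currency
    median_seconds: float  # median AreTomo wall-clock time measured on that GPU


def cheapest_viable(
    results: Sequence[GpuResult], max_seconds: float
) -> Optional[GpuResult]:
    """Return the lowest-priced GPU whose median runtime is within max_seconds.

    Returns None when no tested GPU meets the limit.
    """
    viable = [r for r in results if r.median_seconds <= max_seconds]
    if not viable:
        return None
    return min(viable, key=lambda r: r.price)
```

The time limit is the thing to agree on first; it could also be swapped for a memory or throughput criterion without changing the shape of the function.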