---
title: Running nf-core pipelines on de.NBI SimpleVM clusters
tags: denbi,nextflow,tutorial,clusters,nf-core,bibigrid
author: James A. Fellows Yates
date: 2022-12-07
---
# Running nf-core pipelines on de.NBI SimpleVM clusters
## Introduction
This tutorial describes how you can set up nf-core pipelines (and likely any Nextflow pipeline) on the de.NBI cloud's SimpleVM cluster system using the SLURM executor and Docker.
The following instructions assume that you already have:
- a de.NBI account,
- an approved SimpleVM project with allocated resources,
- requested access to the SimpleVM-cluster interface (in beta at the time of writing),
- (optionally) requested an additional OpenStack project, if you plan on using S3-based storage
  - It is currently not possible to create S3 object storage from a SimpleVM project, therefore you must request two separate projects
Once you have these, you can open the [de.NBI cloud portal](https://cloud.denbi.de).
> 💡 Most of the instructions on this page will also likely apply to a cluster spun up with the BiBiGrid CLI
## Spin Up the Cluster
1. In the left-hand menu, select 'New Cluster'

2. Choose a name for the cluster

3. Under **master instance** (i.e., the head node) select a flavor with a large ephemeral disk to act as your 'head node'

> 🛈 At the moment, only the disk of the master node is used as the NFS disk, which means that if you want to process a lot of data, you will have to choose a flavor for the master with a large ephemeral disk (e.g. highmem medium with a 500GB ephemeral disk). This will change once using a volume as the NFS disk is offered.
4. Under **Image** select `Ubuntu20.04 de.NBI (C) Master (2022-12-05)`
- You can hover over the name to get a tool-tip of the full name

5. Under **worker instances** select the flavors for your worker nodes, and the number of these instances you wish to use
> ℹ️ The specific set-up may vary depending on the work you wish to do and what your project resource allocation is set to. There are separate tabs for standard and high-memory flavors.
> ℹ️ If you need a mixture of flavors, press 'Add worker batch' at the bottom of the section
> ❓ How to configure different partitions?

6. Press 'Start Cluster' and wait for the cluster to spin up
> ℹ️ This can take ~10 minutes
7. Once it has spun up, follow the instructions to `ssh` into the head node
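The exact command is shown in the cluster overview of the portal; as an illustrative sketch only (assuming the image's default `ubuntu` user, with placeholder key, host, and port values):
```bash
# Use the exact command the portal gives you; the values below are placeholders
ssh -i ~/.ssh/<your_private_key> -p <port> ubuntu@<host>
```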
## Making persistent storage
You can store your results files in two different ways: either on a dedicated SimpleVM volume or in S3 Object Storage. We will provide instructions for both. The latter is more involved to set up, but makes accessing results files easier, particularly viewing results from your web browser (via de.NBI's OpenStack portal).
### Volume based
To ensure that you can save your data somewhere in a more long-term manner (i.e., not on the ephemeral/unstable disk), you also need to create and attach a volume to your cluster.
1. Go to 'Volumes' in the 'Virtual Machines > Overview' section of the cloud portal

2. Press the Create & Attach Volume button at the top

3. Select your project and wait a moment for the usage of the project to load
4. Select the head node of the now spun-up cluster
5. Choose a volume name
6. Specify how many gigabytes the volume should hold
7. Press 'Create and Attach Volume'
8. Note the 'Device' name (e.g. `/dev/vdd`) on the volume overview page
9. Switch back to the terminal where you have logged into the cluster
10. Follow the instructions from the '[Create the volume file system](https://simplevm.denbi.de/wiki/simple_vm/volumes/)' section of the de.NBI wiki to make the volume available to your software (see the sketch below)
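For reference, a minimal sketch of what this typically involves (assuming the device is `/dev/vdd` and a mount point of `/vol/data`; adapt both to your set-up and prefer the wiki's exact instructions if they differ):
```bash
# Format the new volume with an ext4 filesystem (this destroys any existing data on it!)
sudo mkfs.ext4 /dev/vdd

# Create a mount point and mount the volume
sudo mkdir -p /vol/data
sudo mount /dev/vdd /vol/data

# Make the mount writable for the default user
sudo chown ubuntu:ubuntu /vol/data
```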
### S3 Object Storage
To ensure that you can save your data somewhere in a more long-term manner (i.e., not on the ephemeral/unstable disk), and you wish to access the objects more easily from outside the de.NBI network, you need to set up S3 on your OpenStack project and cluster.
The following instructions describe how to set up `s3cmd` to _manually_ interact with the object storage, _not_ Nextflow. Nextflow itself additionally needs to be configured with the instructions [below](#setting-up-the-cluster).
1. Request an OpenStack project on the de.NBI cloud portal
2. Make an Object Storage container within the OpenStack portal: Left Bar Menu > Object Store > Containers > ➕ Container

3. Create a set of Application Credentials following the instructions [here](https://cloud.denbi.de/wiki/Compute_Center/Bielefeld/#application-credentials-use-openstack-api)
- Left Bar Menu > Application Credentials > Create Application Credential

4. Download the corresponding `openrc` file
5. On your SimpleVM cluster, copy and paste the contents of the downloaded file into a new file, e.g. `openstack-s3.rc`
6. Source the resulting file
```bash
source openstack-s3.rc
```
> Note: You will have to source this file on every login! If you wish to have this happen automagically on every login, do the following
```bash
echo "source /<path>/<to>/openstack-s3.rc" >> ~/.bashrc
```
> ⚠️ Update the path above to point at your `openstack-s3.rc` file!
7. Install the OpenStack CLI (the package cannot be found via `apt` on the 20.04 image, so we use `pip` instead)
```bash
pip install python-openstackclient
echo 'export PATH="/home/ubuntu/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
```
8. To verify the application credentials, run the following command. It should complete without errors, and the output is expected to be empty.
```bash
openstack --os-identity-api-version 3 ec2 credentials list
```
9. Create new S3 credentials (if needed)
```bash
openstack --os-identity-api-version 3 ec2 credentials create
```
- **Note down the `Access` and `Secret` values**: these will be required for both the `~/.s3cfg` and `nextflow.config` configuration steps
10. Verify the credentials were created correctly
```bash
openstack --os-identity-api-version 3 ec2 credentials list
```
11. Install `s3cmd`
```bash
sudo apt install s3cmd
```
12. Configure `s3cmd` as described [here](https://cloud.denbi.de/wiki/Tutorials/ObjectStorage/#configuration-of-the-s3cmd-client)
```bash
s3cmd --configure
```
To summarise, you will need the following entries to be modified in your `~/.s3cfg` file (see also the non-interactive sketch after this list):
- `access_key = <access_key_from_openstack_credentials>`
- `bucket_location = US` (the default)
- `host_base = https://openstack.cebitec.uni-bielefeld.de:8080` (the 'S3 Endpoint' in the configure dialogue; value for Bielefeld)
- `host_bucket = %(bucket).https://S3_openstack.cebitec.uni-bielefeld.de:8080` (the 'DNS-style bucket+hostname' in the configure dialogue; value for Bielefeld)
- `use_https = True`
- `secret_key = <secret_key_from_openstack_credentials>`
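If you prefer to write the file directly rather than going through the interactive dialogue, a minimal sketch (with placeholder values to substitute from the list above) could look like this:
```bash
# A sketch only: this overwrites any existing ~/.s3cfg.
# Substitute your own keys and the endpoint values listed above for your site.
cat > ~/.s3cfg <<'EOF'
[default]
access_key = <access_key_from_openstack_credentials>
secret_key = <secret_key_from_openstack_credentials>
bucket_location = US
host_base = <S3 endpoint, as above>
host_bucket = <DNS-style bucket+hostname, as above>
use_https = True
EOF
```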
13. Test that you can see the S3 container you created earlier with
```bash
s3cmd ls
```
Useful commands for manually interacting with the bucket:
```bash
## Similar to ls -a
s3cmd la
## To check the contents of a directory on the bucket
s3cmd ls --recursive s3://<bucket_name>/<directory_name>/
## To upload a file to the bucket
s3cmd put <file> s3://<bucket_name>
## To upload a file to a directory on the bucket (existing or new)
s3cmd put <file> s3://<bucket_name>/<directory_name>/
## To upload a directory to the bucket
s3cmd put <directory>/ s3://<bucket_name>/ --recursive
## To delete a file in the bucket
s3cmd rm s3://<bucket_name>/<file>
## To delete a directory in the bucket
s3cmd rm s3://<bucket_name>/<directory>/ --recursive
```
## Setting up the cluster
> **Optional**: to add other members to the cluster so they can also log into the same infrastructure, you will need to add their public keys to the file `~/.ssh/authorized_keys`.
>
> You can get the public keys of the other users by going to the corresponding project management page of your project on the left-hand side of the de.NBI portal, scrolling down to the project members, and pressing 'public key' in the actions column.
>
> You can then copy this string and add it to the `authorized_keys` file (see the sketch below). Your colleagues should then be able to log in.
>
> In the future this functionality will be available via the de.NBI portal page.
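> As a sketch, assuming the copied key is (for example) an `ssh-ed25519` string, appending it on the head node would look like:
>
> ```bash
> # Replace the quoted string with the public key copied from the portal (placeholder shown)
> echo 'ssh-ed25519 AAAAC3...placeholder... colleague@laptop' >> ~/.ssh/authorized_keys
> ```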
Before we can run jobs, we need to install and configure Nextflow and Docker.
1. Check Docker is running correctly
```bash
docker run hello-world
```
- The `hello-world` image should be pulled and a message from Docker saying 'hello' should be printed
2. Change into the shared NFS directory with
```bash
cd /vol/spool
```
> ⚠️ All your computations from now on should be run from this directory! Your worker nodes cannot access your `$HOME` on the head node!
> ❓ Are other volumes (when you need more HDD space) accessible by workers?
3. Install Java and download the Nextflow executable
```bash
sudo apt install -y default-jre
mkdir bin/ && cd bin/
curl -s https://get.nextflow.io | bash
chmod +x nextflow
```
4. Set `NXF_HOME` to the shared NFS, and add the Nextflow executable to your path
```bash
mkdir /vol/spool/nxf_home
echo "export NXF_HOME=/vol/spool/nxf_home" >> ~/.bashrc
echo "export PATH=\"/vol/spool/bin:$PATH\"" >> ~/.bashrc
source ~/.bashrc
```
> ℹ️ Setting `NXF_HOME` to `/vol/spool` ensures that all (nf-core) pipelines are stored in a location from which any pipeline `bin/` directory scripts can be shared with the worker nodes
5. Set up a Nextflow config file. At a minimum you require the following:
```bash
echo "process.executor = 'slurm'" >> $NXF_HOME/config
echo "process.scratch = '/vol/scratch'" >> $NXF_HOME/config
echo "docker.enabled = true" >> $NXF_HOME/config
```
> ℹ️ If you have a more complicated cluster set-up, `sinfo -a -o '%P|%n|%c|%m|%l'` can give you most of the information needed to write your Nextflow config file
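For example, if your cluster has more than one SLURM partition and you want all jobs sent to a specific one, you could additionally set the standard Nextflow `process.queue` directive; the partition name below is a placeholder taken from your `sinfo` output:
```bash
# Send all jobs to a specific SLURM partition (placeholder name)
echo "process.queue = '<partition_name_from_sinfo>'" >> $NXF_HOME/config
```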
If you wish to use S3 object storage, you will additionally need to add the following to the config file.
**Note:** Do not forget to replace the access and secret keys with the values from the OpenStack credentials created earlier.
```bash
echo "aws.client.endpoint = 'https://openstack.cebitec.uni-bielefeld.de:8080'" >> $NXF_HOME/config // For Bielefeld
echo "aws.client.protocol = 'https'" >> $NXF_HOME/config
echo "aws.client.s3PathStyleAccess = true" >> $NXF_HOME/config
echo "aws.accessKey = '<access_key_from_openstack_credentials>'" >> $NXF_HOME/config
echo "aws.secretKey = '<secret_key_from_openstack_credentials>'" >> $NXF_HOME/config
```
You will likely also need to specify your cluster's absolute maximum available resources (i.e., those of the largest node).
```bash
echo "params.max_memory = '<largest_node_RAM>.GB'" >> $NXF_HOME/config
echo "params.max_cpus = <largest_node_cpu>" >> $NXF_HOME/config
echo "params.max_time = '730.h'" >> $NXF_HOME/config
```
> ℹ️ To get this information, you can run `scontrol show node`, look for the node with the largest `CPUTot` and `RealMemory` fields, and note the values there.
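For example, a quick way to pull just those two fields out of the output (a plain `grep` sketch):
```bash
# Prints e.g. CPUTot=28 and RealMemory=64000 for every node
scontrol show node | grep -oE '(CPUTot|RealMemory)=[0-9]+'
```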
We also highly recommend adding `cleanup = true` to the config, to ensure intermediate files in `work/` are removed after (and only after) a successful run. Note this will break `-resume`, so only specify it if you're comfortable with this!
```bash
echo "clean_ up = true" >> $NXF_HOME/config
```
6. Start the `slurm` service
```bash
sudo service slurmctld start
```
You can check if `slurm` is running correctly by typing the command `squeue`
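For example, both of these should return without errors once the controller is running (`sinfo -Nl` lists the per-node state):
```bash
# The job queue; an empty list (header only) is expected before any runs
squeue

# Per-node overview; workers should typically show as 'idle'
sinfo -Nl
```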
7. Pull your nf-core pipeline (e.g. nf-core/funcscan)
```bash
nextflow pull nf-core/funcscan
```
8. Open a screen/tmux session
9. Run the test profile of your nf-core pipeline with your custom config
- Using volume-based storage
```bash
cd /vol/spool/
mkdir test
cd test/
nextflow run nf-core/funcscan -profile test,docker --outdir /vol/<your_volume>/results -w /vol/spool/work
```
> ⚠️ You must make sure your `work/` directory is located in `/vol/spool`, as it needs to be accessible by all worker nodes! Your `--outdir` can point to the attached volume, as in the example above.
> ℹ️ If you get an error such as 'exceeds available memory', your custom config (with the `params.max_*` values above) is likely not being picked up; check that `NXF_HOME` is set in your current shell, or pass the config explicitly with `-c`
- Using S3-based storage
```bash
nextflow run nf-core/funcscan -profile test,docker --outdir 's3://<your_s3_container_name>/test' -w /vol/spool/work
```
> ℹ️ You can verify the test run has written to your `--outdir` by running `s3cmd la s3://<your_s3_container_name>/test`
> ℹ️ To delete the test directory you can run `s3cmd rm --recursive s3://<your_s3_container_name>/test/`
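If you want to pull the published results back down to the head node (or any machine with the same `s3cmd` configuration), a sketch using a hypothetical local target directory would be:
```bash
# Download everything under the test/ prefix into a local directory
s3cmd get --recursive s3://<your_s3_container_name>/test/ ./funcscan-test-results/
```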
## Conclusion
This tutorial has shown you how to spin up a cluster using the de.NBI cloud SimpleVM interface, and the bare-minimum configuration required to run an nf-core pipeline. It also describes how to use a volume or S3 object storage to store output data for the long term.
## FAQ and Troubleshooting
### Hanging runs and SLURM exit status of 0:53 after a restart
After a cluster reboot, you may occasionally run into problems where, when you try to run a Nextflow pipeline, the run hangs for a long time with no progress and then times out with a 'killed by the external system' error. If you get this and run `sacct`, you may see all the `nf-*` jobs with an exit status of `0:53`.
This is normally caused by the compute nodes not being able to access the `/vol/` directories because the network filesystem (NFS) is down.
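One quick way to confirm this is to check whether a worker can actually see the shared directory, e.g. with a one-off SLURM job (a sketch; it may hang if NFS is truly down):
```bash
# Runs `ls` on one worker node; a normal listing means the NFS mount is fine
srun --nodes=1 ls /vol/spool
```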
If you get this problem, you can first try:
1. Re-run the `ansible` playbook that is run in the background when you initially boot up the cluster
```bash
cd ~/playbook
ansible-playbook -v -i ansible_hosts site.yml
```
2. If it still isn't working, you can run
```bash
sudo systemctl enable nfs-kernel-server
```
3. If that still doesn't work, manually restart the NFS server
Check if the nfs-kernel-server service is active:
```
sudo service nfs-kernel-server status
```
If it's inactive, then try restarting it with
```
sudo service nfs-kernel-server restart
```
Re-run the playbook with
```
cd ~/playbook
ansible-playbook -v -i ansible_hosts site.yml
```