To connect to the codon cluster: ssh you@codon-login
You can add an entry to your ~/.ssh/config to make things easier, so that ssh codon-ext connects to the cluster.
Your ssh connection might be closed due to inactivity (after a few minutes of inactivity). To avoid this, add this to your ~/.ssh/config:
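A minimal sketch of such an entry, reusing the codon-ext alias from above; the HostName, User and keep-alive values here are assumptions, so adjust them to the login host and username you actually use:

Host codon-ext
    # assumption: replace with the real codon login hostname and your username
    HostName codon-login
    User you
    # send keep-alive packets so the connection is not dropped when idle
    ServerAliveInterval 60
    ServerAliveCountMax 120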
I had some issues with sourcing shell initialization files on the codon cluster only. It seems that when we log in to the cluster, ~/.profile is loaded, but when we start a new terminal (e.g. submit an interactive job, or start a screen), ~/.bashrc is loaded. So I needed both files to always initialize the shell with my configs. The easiest way to deal with this is to link one to the other (e.g. I did ln -s ~/.profile ~/.bashrc). As such, whenever you change either of them, the change is reflected in the other, and the issue of sometimes one being sourced and sometimes the other is solved.
/hps/software/users/iqbal
Note that the quota for /hps/software is NOT 20GB/group, but 200GB/group (the previously stated figure was a typo). Although it is fairly large, we should only store software, conda environments, etc. here. Don't store containers here, as these can be quite heavy.
/hps/nobackup/iqbal/
/nfs/research/zi (will soon be changed to /nfs/research/iqbal)
You can work on either filesystem (/hps or /nfs), although some people have reported issues running pipelines on /hps that worked just fine on /nfs.
Sometimes a filesystem gets slow because it is overloaded with users, or because it is close to full capacity. What usually happens then is that your pipelines and jobs start to run a lot slower, as in bioinformatics everything relies heavily on reading and writing data to disk. If you suspect a filesystem you are using is overloaded and slow, you can use this script to verify it: https://github.com/leoisl/test_disk_speed_in_cluster . Unfortunately, this only tells you there is an issue; it does not solve it. The solution is to wait for the slowdown to be fixed, or to migrate to another filesystem that is not overloaded.
bsub -o hello_world.o -e hello_world.e echo Hello world!
hello_world.o: the output stream (stdout) contents will be written to this file;
hello_world.e: the error stream (stderr) contents will be written to this file.
An annotated look at hello_world.o:
Naming your job: -J <jobname>
Asking for RAM (the default is 4GB, I think?): -M <amount_of_RAM_in_MB>
You can see the job name in the output of bjobs (or bjobs -w):
Asking for CPUs: -n <number_of_CPUs>. For example: bsub -o hello_world.o -e hello_world.e -J hello_world -M 1000 -n 8 bash hello_world.sh
ATTENTION!! Submitting a job asking for 1 CPU and then running your tool with more than 1 CPU (2, 4, 8, etc.) is evil. You are telling the job scheduler you need 1 CPU, but you use 8. If everyone does this, we will have workers with 50 CPUs trying to do the work of 100 or more CPUs, and everyone's jobs will run very slowly. There is no way for the job scheduler to guess how many CPUs your job will use, and if you use more CPUs than you asked for, the job scheduler won't kill your job (unlike RAM, where it will kill your job).
Better host selection: -R "select[mem>1000] rusage[mem=1000]";
Testing for filesystem errors: -E 'test -e /homes/<your_username>'
AHHH this is too much!
This is a wrapper script created by Martin that greatly simplifies submitting jobs to the cluster.
Installation: pip install git+https://github.com/sanger-pathogens/Farmpy
Usage:
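A sketch of typical usage, assuming the usual bsub.py interface of memory (in GB), a job name, and then the command to run (check bsub.py --help for the exact options of the installed version):

bsub.py 1 hello_world echo "Hello world"

This would ask for roughly 1 GB of RAM and name the job hello_world.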
Use bjobs. Most used parameters:
bjobs -w: wide output (job and host names are not truncated);
bjobs -l <job_id>: detailed information about a specific job;
bjobs -a: also shows recently finished jobs, not just pending/running ones.
This is probably the most important section. When the cluster runs well and you are submitting jobs correctly, everything runs fine. When things break (e.g. the cluster is overloaded, you have stuck jobs, you were scheduled on a bad node, or you submitted jobs incorrectly), you end up investing a lot of time to understand and fix what is happening.
Let's simulate a job that uses ~6 GB of RAM, but ask LSF for only 2 GB (in reality, heavy RAM jobs are jobs using > 100 GB, but it is hard to control memory allocation in python…):
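A sketch of the kind of submission meant here; the file/job names and the inline python allocation are illustrative (the whole command is quoted so that LSF passes it to the shell intact):

bsub -o heavy_RAM_job.o -e heavy_RAM_job.e -J heavy_RAM_job -M 2000 "python3 -c 'import time; x = bytearray(6 * 1024**3); time.sleep(30)'"

-M 2000 asks for ~2 GB (in MB), while the python one-liner tries to allocate ~6 GB.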
Checking heavy_RAM_job.o:
Once we see TERM_MEMLIMIT: job killed after reaching LSF memory usage limit., this means that our job used more memory than we asked for, so LSF killed it.
Let's now ask for 10 GB (4 GB of margin):
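Resubmitting with 10 GB could look like this (same illustrative names as above):

bsub -o heavy_RAM_job.o -e heavy_RAM_job.e -J heavy_RAM_job -M 10000 "python3 -c 'import time; x = bytearray(6 * 1024**3); time.sleep(30)'"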
It runs fine!
Successfully completed. -> your job ran fine!
Suppose you have a job that is supposed to do a lot of computation (e.g. running on 20 threads for a several days). After a full day of work, you do a bjobs -l <job_id>
and you see this:
This means that your job was "running" for a whole day, but actually executed on a CPU for just 6 seconds. In this case, your job is stuck. There are some reasons for jobs to get stuck. The most common one is getting stuck when mounting unresponsive filesystems when starting a container (see 9.3. Solving singularity stuck issue), but there are others. Your job could also be very slow if the worker node is overloaded, but this rarely happens. These are very unusual situations, but in any case, if you need to debug why your job is stuck, I would recommend logging into the worker node and checking what is happening. You can do this by:
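To find the worker node the job is running on, check the EXEC_HOST column of bjobs, e.g.:

bjobs -w <job_id>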
EXEC_HOST
is the worker node. Then you can log into the worker node:
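One way to do this (a sketch; use whatever host name EXEC_HOST showed) is to start an interactive job pinned to that worker node:

bsub -m <exec_host> -Is bash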
Here, you are in an interactive job. You can run top
, etc, to monitor what is going on in the worker node. More on interactive jobs in section 8. Interactive jobs.
When you submit a job, it first goes to the PEND state - it is waiting for LSF to find a suitable host to run it. Depending on how overloaded the cluster is, your job might take a while to get scheduled. You might want to investigate why it is not being scheduled, and maybe change some submission parameters to get things going. You can do this by inspecting the pending job details with bjobs -l <job_id>. For example, let's submit a job that requires 200 CPUs (there are no worker nodes capable of running this job in codon):
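A sketch of such a submission (file and job names are illustrative):

bsub -o pending_job.o -e pending_job.e -J pending_job -n 200 echo "Hello world"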
We can see that our job is pending:
Let's find out why:
This is saying that there are no hosts among the 187 available hosts with enough job slots (the 200 CPUs you are asking for). There might be several reasons why your job is pending, and they are all listed in the PENDING REASONS section.
If you need to kill a job, use bkill <job_id>. For example, to kill the previously pending job (or a job that you ran incorrectly):
If we run bjobs -wa, we will see that the job was killed:
To kill all of your jobs at once, you can run bkill 0.
To kill some specific jobs, but not all, you can use this handy bash function (add it to your ~/.bashrc
to always have it available):
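The actual function is not reproduced here; below is a minimal sketch of what such a helper could look like, based on the description that follows (it only prints the bkill commands for jobs whose bjobs line matches a pattern, it does not kill anything itself):

grep_bkill () {
    # print (but do not run) a bkill command for every job matching the given pattern
    bjobs -w | grep "$1" | awk '{print "bkill", $1}'
}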
For example, I have 3 short and 3 long jobs running (the job name identifies the long and the short jobs):
If I find an error with the long jobs, I can kill them all by running:
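With the sketch above, that could look like this (the job-name pattern is hypothetical):

grep_bkill long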
Note that grep_bkill doesn't actually kill your jobs, it just gives you the bkill commands to kill them. This allows you to recheck that everything is fine before actually killing them. Now we can proceed and kill the jobs:
We can check the queues that we submit jobs to using bqueues:
Jobs are submitted by default to the standard queue.
We can check the stats (number of CPUs, max mem, etc.) of each host (login and worker hosts) with lshosts:
Here we can see that we have several big memory nodes (hl-codon-bm) with 755GB or 1.4TB of RAM. You might want to cherry-pick these worker nodes to run your jobs. We can also see that the normal (non-big-mem) nodes (hl-codon-01, …) have only 376GB of RAM, but a lot more cores (96). Big mem nodes are in the bigmem queue, while most of the other nodes are in the standard queue.
Let's say we want to run a 500GB job. We won't even manage to submit it to the standard queue:
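A sketch of the kind of attempt meant (names illustrative; 500000 MB ≈ 500 GB); the submission should be refused because the request exceeds the standard queue's memory limit:

bsub -o big_mem_job.o -e big_mem_job.e -q standard -M 500000 sleep 60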
But we can submit it to the bigmem queue (now using bsub.py):
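For illustration, here is a plain-bsub equivalent of what is meant (the text uses bsub.py here; check bsub.py --help for how it exposes the queue option):

bsub -o big_mem_job.o -e big_mem_job.e -q bigmem -M 500000 sleep 60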
And it will run in one of the big mem nodes.
Interactive jobs are nice for testing stuff out on a worker node before putting everything into a script and running it. It is like a terminal session, but on a worker node. To run interactive jobs, I highly recommend adding this function to your ~/.bashrc (credits to Michael):
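The original function is not reproduced here; below is a minimal sketch of what such a helper could look like (the function name, argument order and defaults are assumptions):

# usage: interactive_job <mem_in_GB> <num_cpus>
interactive_job () {
    local mem_gb="${1:-4}"
    local cpus="${2:-1}"
    local mem_mb=$((mem_gb * 1000))
    # -Is gives an interactive shell with a pseudo-terminal on a worker node
    bsub -Is -M "$mem_mb" -n "$cpus" \
        -R "select[mem>${mem_mb}] rusage[mem=${mem_mb}] span[hosts=1]" "$SHELL"
}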
Then, to start an interactive job where we can use up to 16 GB and 4 threads, we run:
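With the sketch above, that would be:

interactive_job 16 4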
This will start a job in one of the worker nodes:
Screens are interactive sessions that live on login nodes, which you can detach from and reattach to later. In a nutshell, you can use an interactive job inside a screen to do some work and then, when you have to go home, for example, simply detach from the screen and turn off your laptop. Your screen still exists (it was not killed), and your interactive job lives inside the screen, so it was not killed either. The next day, you can simply resume your screen and carry on working in your interactive job. Without screen, as soon as you lose connection to the server (e.g. turn off the laptop) or exit the session, the interactive job is killed. Then on the next day you need to resubmit your interactive job and remember what you were doing to continue your work. You might also not have your bash history available. Screens solve this problem.
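For example, to create a screen named hello_world (the name used in the examples below):

screen -S hello_world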
As soon as you create a screen, you are in it (i.e. attached to it) and can type commands. Let's do an ls:
Let's say I finished my work today, but want to continue tomorrow. I can simply detach from the screen. To do this, type <ctrl> a and then d. You will see:
When you come back to work, you want to resume your screens, but you forgot its names and how many you have. Just list your available screens:
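This is done with:

screen -ls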
The (Detached) marker means that the screen is not attached to any terminal and can be reattached.
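To reattach to the hello_world screen from the example:

screen -r hello_world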
And we are back:
Sometimes, for example when you close your laptop without detaching the screen, it will not detach itself, and then you will have trouble resuming it. Let's see an example:
Here, the hello_world screen is still attached to another terminal, so I can't attach to it now:
What we can do is tell screen to detach the screen and resume it for us here, by adding the parameter -d:
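With the example screen above, that would be:

screen -d -r hello_world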
And we are back in the screen:
There is much more to screen than what is written here. Google is a really good resource to learn more.
And from here you can do whatever you want.
Systems manages a bunch of software through modules. If what you need is already available as a module, just use it! There is plenty of documentation on the web explaining how to use modules, but here we will just look at the most used commands, so that you can get going:
I don't have nextflow installed:
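A quick way to check (nextflow is just the example tool from the text):

which nextflow    # prints nothing (or "no nextflow in ...") when it is not on your PATH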
Search for the module and load it:
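A sketch using the standard module commands (the exact module name/version to load depends on what module avail reports on codon):

module avail nextflow    # search for matching modules
module load nextflow     # load it, using the exact name/version listed by module avail
which nextflow           # it should now be on your PATH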
Nothing is loaded:
If you use some tools frequently, load them automatically by adding a module load line to your ~/.bashrc. For example, this is mine:
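The author's actual lines are not shown here; a sketch of what this typically looks like (the module names are hypothetical, use the ones module avail gives you):

module load nextflow
module load singularity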
The codon cluster is capable of running even hundreds of thousands of jobs, but managing which job failed and has to be resubmitted, which failed due to RAM and has to be resubmitted with more RAM, which succeeded, etc. is messy and complicated. For complicated workflows, we should use workflow managers. The most commonly used in bioinformatics are snakemake (in python) and nextflow (in Groovy, a Java-like language). You can manage workflows similarly with both. For snakemake, we have an LSF profile that automatically takes care of many of the issues with LSF clusters for us. This profile was started by Michael and is maintained by him, Brice, me and others. Most importantly, we have been using it for a couple of years already on EBI clusters, including codon, so it works well on our cluster. If you need a workflow manager and choose snakemake, I can help you in case of any issues (as can Michael, Brice, etc.). For Nextflow, I think Martin or other people can help.
You can try stuff out in whatever environment, but we strongly suggest that when you run something important, for a project or a paper, you do it in an isolated environment. An easy isolated environment to set up is a conda environment. If you want even more isolation, reproducibility, and ease for external people to reproduce your results or rerun your pipeline, go for containers (Docker or Singularity). On the cluster, we just have singularity, as docker needs special permissions to run. But singularity can run docker containers, so you can create recipes for either, and you will be able to run both on the cluster through singularity. Using containers will totally remove the but-it-worked-on-my-machine-or-cluster issue.
If you use singularity to run containers, you might get some stuck jobs from time to time. By stuck I mean that the job has been running for, let's say, 1 day, but when you query its CPU time usage (with bjobs -l <job_id>), you see that it actually executed for just a few seconds. There is a high chance that singularity is simply stuck trying to mount several paths predefined by Systems. In summary, Systems wants to simplify life for everyone using singularity, so they ship a configuration in which almost every filesystem available in the codon cluster is mounted when a singularity container starts. It takes a single one of these filesystems being overloaded (or very slow for some other reason) to get all your jobs using singularity stuck. We, as a research group, don't use any special filesystems, just /nfs, /hps and /homes. Thus, you can add this to your ~/.bashrc:
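The exact snippet is not reproduced here; one way to get this effect is through Singularity's environment variables for the --contain and --bind options (whether these exact variables match the snippet meant here is an assumption, so double-check against the group's current recommendation and your Singularity version):

export SINGULARITY_CONTAIN=1                      # do not pull in the system-wide predefined mounts
export SINGULARITY_BIND="/nfs,/hps,/homes,/tmp"   # only bind the paths we actually need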
This will tell singularity to not mount any predefined paths specified by Systems, and to just mount /nfs, /hps, /homes and /tmp, which is all we normally need.
Many tools use the /tmp directory to create temporary files during their execution. In the codon cluster, /tmp is a very fast filesystem (it is local to the worker node), but it is also relatively small. There is no control over how much tmp space a job can use, so we could have 96 jobs running on a worker node, a single one of them could fill up the tmp space, and every job that needs tmp would then fail. More worrying, some tools don't clean up their tmp files if they hit an error during execution. So, in summary, the /tmp filesystem is super fast, but unreliable. If you don't want to encounter these issues anymore (they look like this: Not enough free disk space on /tmp), you can change the temporary directory by adding this to your ~/.bashrc:
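The exact snippet is not shown here; below is a sketch of the idea, assuming the usual TMPDIR convention that most tools honour. custom_temp_dir is the placeholder referenced in the notes below, and the writability check is there so that login nodes (where /hps is read-only) keep using /tmp and bsub keeps working:

# placeholder path: pick your own /hps path and create the directory first
custom_temp_dir=/hps/nobackup/iqbal/$USER/tmp
if [ -w "$custom_temp_dir" ]; then
    export TMPDIR="$custom_temp_dir"
fi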
Notes:
custom_temp_dir: change it to one of your own paths. It has to be a /hps path and the directory has to exist. It should not be a /nfs path (too slow, and no need for it to be backed up), nor a /hps/software path (it will just fill up your /hps/software and you won't be able to install anything else);
Writing to the local /tmp dir is faster than writing to /hps. If your job does a lot of I/O to the temp dir, it might actually get slower; otherwise the slowdown is negligible;
This can still break if /hps itself is full, but in that case the whole cluster goes down anyway;
The /hps filesystem is not writable from login nodes, just from worker nodes. But we need access to a writable temp dir from login nodes for several reasons, one of them being job submission: bsub creates temporary files when submitting jobs, and if the temp dir is not writable, you won't be able to submit any jobs.
This is a copy-paste from Michael Hall's instructions. All credits go to him.
The first thing we will define is some port number variables that we use in different places, to make them easy to control:
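A sketch of such variables, consistent with the ports mentioned below (only local_port is named in the original text; the other name is an assumption):

local_port=9000    # the port you will open on your own machine
remote_port=8080   # the port jupyter will listen on, on the cluster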
If you get a permission error when trying to run jupyter notebook, then try running export XDG_RUNTIME_DIR="".
Replace <PORT_NUM> in localhost:<PORT_NUM> with the value of the local_port variable you set in the beginning. For example, in the above we would change http://localhost:8080/?token=<TOKEN> to http://localhost:9000/?token=<TOKEN>.
I have surely missed stuff about using the cluster here. There are more things, but we could have these meetings from time to time, even if it is just 10 minutes to talk about something new. Using the cluster is like riding a bike: it is not much use to read and re-read this document - now that you know the basics, when the time comes that you need to run some job on the cluster, try it yourself. If it does not work, try to debug it yourself, or check if this document helps. The EBI intranet (https://intranet.ebi.ac.uk/) might help, or googling your issue. You can also always contact us on the #codon-cluster channel on Slack or, if you feel more comfortable, just message me directly with your issue and we can discuss it. I really like solving cluster issues, as I think it is really important to keep the group's workflow going, and to let others be aware of potential issues and already have a solution when they face them.
TODO: Incorporate EBI guides (search email for "EBI cluster guides").