Please add here below questions or comments or share links (edit button top left or top right):
Please write your questions above this block.
How to do Slurm array jobs when the input files have non-sequential names, like sample-1.txt, sample-55.txt, sample-111.txt?
- Put the file names into a list file and let each array task pick one line:
DATASET=$(head -n $SLURM_ARRAY_TASK_ID files.txt | tail -n 1)
In this case files.txt can be created with, for example, ls *.out > files.txt
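A minimal local sketch of the line-picking trick (file names are hypothetical; Slurm sets SLURM_ARRAY_TASK_ID inside an array job, here it is set by hand to simulate task 2):

```shell
# files.txt holds one input file name per line; the names need no
# numeric pattern at all, since we select by line number.
printf 'sample-1.txt\nsample-55.txt\nsample-111.txt\n' > files.txt
SLURM_ARRAY_TASK_ID=2   # set by Slurm in a real array job
DATASET=$(head -n "$SLURM_ARRAY_TASK_ID" files.txt | tail -n 1)
echo "$DATASET"         # prints sample-55.txt
```

In a real job script the last two lines stay the same, and the job is submitted with, e.g., sbatch --array=1-3 if files.txt has three lines.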
Persistent mounting of the project area is not working inside the JupyterHub container on the NIRD Toolkit?
How best to share data (Zarr archives) from NIRD?
Can you share files from NIRD by placing them in an ssh folder?
…
Is it possible to run longer-than-usual jobs on Betzy? The background is to run a Dask scheduler that would "run" for several weeks and schedule shorter jobs on the cluster.
- There is a preproc partition, but only for one day; see https://documentation.sigma2.no/jobs/choosing_job_types.html. You can inspect the limits with scontrol show partition=normal on Betzy.
Suggestion for future HPC systems: provide a side node to "park" schedulers, since users regularly need and ask for a scheduler that runs "outside" of Slurm (Dask, Snakemake).
Upcoming NRIS courses
Would it be beneficial to have modules for datasets/databases?
How to know what resources (time, cores, memory) to ask for a job?
Upcoming NRIS courses
Slides will be linked soon
This is not related to GPUs, but I have not been able to use sftp to access NIRD for a while. I asked for email support but there is no response yet. When I use scp, the only response is: "First: /usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin"
Is there a rule-of-thumb for deciding when it's worth using a GPU (e.g. size of matrix, number of matrix operations)?
A question related to parallel python runs. We have a lot of single-processor python scripts to do model diagnostics, which we would like to run in parallel (a lot of simultaneous runs) in a slurm job. But when submitting "mpirun python run.py", all the runs are submitted to only 1 cpu in the allocated resources.
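One possible approach (a sketch, not official NRIS advice): since the diagnostic scripts are independent single-core programs, they do not need MPI at all; a small Python driver can launch them as plain subprocesses, one per allocated core. The script name run.py and the input file names below are placeholders from the question.

```python
# Sketch: run many independent single-core scripts concurrently inside
# one Slurm allocation, without mpirun. Threads only launch and wait on
# the child processes, so the real parallelism is the children themselves.
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_one(cmd):
    """Run one script as its own OS process; return its exit code."""
    return subprocess.run(cmd).returncode

def run_all(commands, workers=None):
    # Inside a Slurm job, SLURM_CPUS_PER_TASK says how many cores we own.
    workers = workers or int(os.environ.get("SLURM_CPUS_PER_TASK",
                                            os.cpu_count() or 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))

if __name__ == "__main__":
    # e.g. eight hypothetical diagnostic runs, one per input file:
    cmds = [[sys.executable, "run.py", f"input-{i}.nc"] for i in range(8)]
    # run_all(cmds)  # uncomment inside a real job
```

The job script would then request one task with many CPUs (e.g. --ntasks=1 --cpus-per-task=32) and run this driver once with plain python, not mpirun. GNU parallel or srun with --ntasks are alternative tools for the same pattern.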
What computational tasks are more suitable to run with GPUs instead of CPUs?
Question to meeting participants: what change/improvement (small or large) would make your work on computing and storage resources easier and smoother?
I suppose it is a very complicated task to modify a CPU code into a GPU code, right?
Gromacs is available for AMD GPUs, do you know if there are any efforts porting NAMD to AMD GPUs so that it would be available on LUMI? I saw some slides from AMD about it a while ago (at a LUMI meeting actually) but haven't seen anything about it lately. (edited out the name since now we know and since we reuse this document)
- You can use the eap partition to test run on GPUs, even without having a LUMI-G allocation at the moment.
How do I get in contact with the GPU team?
…
Interconnect diagram of a LUMI-G node:
Please write your questions here:
Can you show again how to get to the page you are showing now? I am just logged in…
- https://apps.sigma2.no/packages/sigma2/jupyterhub/0.16.15/install
How to reconfigure a stopped/failed application?
Why is some memory not released after closing all the applications?
- After stopping or deleting an application it can take a couple of minutes for the service to be completely shut down. If some resource is not released, you can ask us to look at it in a support ticket. When stopping a JupyterHub service, the user services attached to it can still be running, so before stopping the hub service, go to the hub control panel and stop all running servers.
How to apply to have more resources (e.g. memory)?
- at the moment you can apply through a support ticket to either support@nris.no or contact@sigma2.no
Can I set the paths of the tensorboard in Deep learning apps?
- it is not currently configurable from the installation pages, but we have noted it as a feature request
When using persistent data storage, would it be possible to specify the home path and the JupyterLab configuration path?
- Similar to the question above: not currently configurable, but we will look at it.
Are the NorESM Diagnostic Tool and ESMValTool also included in the NIRD Toolkit?
- They are not included by default in the NIRD Toolkit. One way of adding software to the NIRD Toolkit is to build a custom Docker image that contains the software: https://documentation.sigma2.no/nird_toolkit/custom-docker-image.html
Slides: https://docs.google.com/presentation/d/1pgueQ6w8sFW4-1y3iRwiWgkypUhrlLfhEPTFSY2_Lw8/
Next course: https://documentation.sigma2.no/training/events/2022-05-best-practices-on-NRIS-clusters.html
Where to put self-installed software? Home or project folder?
Useful command to compare the time a job took with the CPU time used:
sacct -j JOB_ID -o NTasks,ReqCPUS,AllocCPUs,CPUTime,Elapsed,Timelimit,ExitCode,NodeList
seff JOB_ID
is more useful. AveCPU can also help to show whether CPUs were busy. While a job is running, you can ssh to its compute node and inspect it with top:
[sabryr@login-3.SAGA ~]$ squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
5621650 normal example- sabryr PD 0:00 1 (Priority)
[sabryr@login-3.SAGA ~]$ ssh c1-32
[sabryr@c1-32 ~]$ hostname
c1-32
[sabryr@c1-32 ~]$ top -u $USER
- then quit top by pressing "q"
- log out of compute node with "exit" and you are back on login node
To check job efficiency:
seff <JOBID>
Some places to look for bioinformatics pipelines and tools which are not installed on our servers: nextflow.io, https://singularity-hpc.readthedocs.io/en/latest/
Status of Quantum Chemistry Software VASP & Gaussian - Presentation slides
Super short feedback form. Please fill it out
Dedicated Q&A sessions with VASP?
Since Betzy and Fram have CPUs from different vendors, and we are compiling VASP ourselves: would it be possible to share a recommended compile setting for Betzy?
- -march=core-avx2 is the safest option and also performs decently. Running VASP on AMD is not straightforward, but it is easier with VASP 6 due to better support for newer GCC versions etc., plus other cleanup in the code. We will thus prioritize getting VASP 6 deployed on Betzy first. Our compilation setup, including the full stack (which also runs the tests, builds the modules, etc.), will be accessible to users.
Also, have you done any benchmark-like calculations on Betzy which you can share?
If I want to run several VASP calculations with ASE (Python package) in a single submitted job, is there a best practice in the documentation for parallelizing this nicely?
Workshops, meetings, etc. How often?
VASP tutorials. Educating new users. How often? Local training?
Dedicated channel in our Mattermost chat client?
Email list that is automatically updated?
Development of better documentation. Also in tutorial form.
Need to also be a community effort. Do users want to contribute?
Do users feel like they are part of a local VASP community?
Do you see a user need to have VASP in a container? Do you need a reproducible environment for VASP?
What is most pressing for the VASP community?
Any interest in the user community to utilize AiiDA-VASP and/or the AiiDA framework?
Does there exist an overview of the numerous quantum chemistry software packages installed on NRIS systems? Also, is there an overview of the licenses and who pays for them?
For development access to LUMI-G, should I apply through Norway’s share in next week’s deadline?
For application to LUMI-G, can I / should I enter required number of CPU hours on the GPU nodes? Are there different quotas for CPU and GPU hours?
Could we have the slides from the talk, please? Lots of useful links there.
Does the application + allocation of storage on LUMI work the same as on Saga etc.?
Problem with multi-node Gaussian jobs on Saga
File limit problem with Conda environments
- A workaround is to keep the environment inside a Singularity ext3 overlay image, which appears as a single file to the host filesystem:
singularity exec docker://ubuntu:18.04 bash -c "mkdir -p overlay/upper overlay/work && dd if=/dev/zero of=overlay.img bs=1M count=50 && mkfs.ext3 -d overlay overlay.img"
singularity shell --overlay overlay.img docker://ubuntu:18.04
Will there ever be VNC support for Betzy or LUMI, the same way it exists on Fram and Saga right now?
When running Slurm Array jobs on GPU-nodes on Saga, I don't get a "GPU usage stats:"-summary for any of the array tasks or the whole job. I only get the "GPU usage stats" when NOT running an array job - any way to get GPU stats also for arrays?
Training event first week of November:
Registration will close very soon
All the sessions will be recorded
We have all material from last course (March): https://documentation.sigma2.no/training/material.html#training-material-used-in-our-past-courses
After the Fram downtime: inter-node communication (?) with Gaussian/Linda. Sometimes it crashes out of the blue, but only for multi-node jobs. Difficult to reproduce.
Slurm environment variables: https://documentation.sigma2.no/jobs/job_scripts/environment_variables.html
If a job crashes sometimes, what can one do?
- If you suspect a faulty node, you can --exclude it in your job script. But even better for everybody else is if you report it to us, so that we can take that node out of the system and fix it.
We also discussed strategies for what to do if a Betzy job runs optimally on only half the cores: how to schedule it without paying for the unused cores and/or wasting resources.
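A minimal job-script sketch of the --exclude approach (job name, node name, and program are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --time=01:00:00
#SBATCH --exclude=c1-32
# c1-32 above is a hypothetical node you suspect is faulty; please
# still report it to support so it can be taken out and fixed.
srun ./my_program
```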
Is Sigma2 user-friendly, given that I am not a regular Linux user?
So can the supercomputers at the center be accessed by remote login, or do people have to sit physically at the computers?
Is there some kind of forum or online help where people can ask basic questions and discuss workflow issues and ongoing problems?
Question about whether we have tested Singularity performance when scaling to many nodes
I [KZ] am working on a rat genome assembly. There is a huge amount (around 1 TB) of raw data to screen against. Would it be a problem to upload such a huge amount of raw data?
- rsync is recommended for transferring files of this size/amount (it also checks the consistency of the data). See also the Research Data Archive.
Request to host https://www.iochem-bd.org/