# Using CSC HPC environment efficiently 3 / 2021

###### tags: `puhti` `mahti` `allas`

> This is the collaborative "notebook" for the "Using CSC HPC environment efficiently" course organised in August 2021 at CSC - IT Center for Science.
> [Prerequisite course page in eLena platform](https://e-learn.csc.fi/course/view.php?id=70)
> [Course page in eLena platform](https://e-learn.csc.fi/course/view.php?id=71)
> [Collection of slides and material](https://csc-training.github.io/csc-env-eff/)
> [Zoom-link for sessions](https://cscfi.zoom.us/j/63897095482?pwd=R0JMQU1sVkZKelJrYTlTMXRCR054UT09)
> [Course chat in chat.csc.fi](https://chat.csc.fi/invite/QwwNHE) (Link checked 20.08.21 at 13:40. It should accept an eDuuni-ID, which can be created with HAKA, Gmail, etc.)

:::danger
:calendar: The latest updates on the **course schedule** can be found here.

:interrobang: This is also the place to ask questions about the workshop content! We use the Zoom chat only for posting links, reporting Zoom problems and such.
:::

:::info
:bulb: **Hint:** HackMD is great for sharing information in this kind of course, as the code formatting is nice & easy with [MarkDown](https://www.markdownguide.org/basic-syntax/)! Just add 3 ticks (``` ` ```) for the ``` code blocks ```. Otherwise, it's like a Google doc: it allows simultaneous editing. There's a section for practice down there ⬇️
:::

[ToC]

## 📅 Agenda

### Day 1: Wednesday 18.8.

:::spoiler
| Time | Content |
|-------|---------|
| 8:50 | Login, check connections work, test microphone |
| 9:00 | Welcome, course practical info, learning targets |
| 9:15 | **Topic 1: Prerequisites & Connecting** |
| 9:30 | **Topic 2: Introduction to HPC Environment** |
| 9:55 | _Break_ |
| 10:05 | **Topic 3: Disk Systems** + Hands on 3.1 (15 min) |
| 10:30 | **Topic 4: Module System** + Hands on 4.1 (10 min) |
| 10:50 | _Break_ |
| 11:00 | **Topic 5: Batch Jobs** + Hands on 5.1 (25 min) |
| 11:50 | Recap + preparations for Thursday |
| 12:00 | Finish |
:::

### Day 2: Thursday 19.8.

:::spoiler
| Time | Content |
|-------|---------|
| 9:00 | **Topic 6: Understanding resource usage** |
| 9:40 | **Topic 6:** Hands on (continue 5.1 from Wednesday, if needed, then to 6.1, 10 min) |
| 9:50 | _Break_ |
| 10:00 | **Topic 6:** Hands on 6.1 continued (30 min) |
| 10:30 | **Topic 8: Installing applications** |
| 10:50 | _Break_ |
| 11:00 | **Topic 8:** Hands on 8.4 and 8.1 (50 min) |
| 11:50 | Recap + preparations for Friday |
| 12:00 | Finish |
:::

### Day 3: Friday 20.8.

:::spoiler
| Time | Content |
|-------|---------|
| 9:00 | **Topic 7: Allas lectures and demos** |
| 9:50 | _Break_ |
| 10:00 | **Topic 7:** Hands on 7.1 (30 min) |
| 10:30 | **Topic 9: Containers** |
| 11:08 | _Break_ |
| 11:18 | **Topic 9:** Hands on 9.1 + ask anything on all previous topics (30 min) |
| 11:50 | Recap + feedback + open questions + cleaning up files from course project |
| 12:00 | Finish |
:::

---

## 📝 Q & A

Your questions are answered here. We will answer them, and this document will store the answers for you for later use! :rocket:

:::info
Scroll :arrow_down: to the bottom of the page to submit a question
:::

### General and practical matters

- [x] **Q: I have difficulty pasting my questions into HackMD (here). Do you have some instructions on how to write here?**
    - A: Can you see the three icons in the top left corner, next to the HackMD text? There's a pencil, a side-by-side symbol, and an eye. In the eye view you can't edit, you are just viewing. The other two reveal the markdown (MD) version of the page, which you can edit. I find it easiest to edit with the side-by-side view.
:::info
:bulb: **Hint:** You can also apply styling from the toolbar at the top :arrow_upper_left: of the editing area.
![](https://i.imgur.com/Cnle9f9.png)
:::

- [x] **Q: Slides available after the course?**
    - A: They sure are! You will also have access to this HackMD document (save the link).
    - The slides and tutorials are available here: https://csc-training.github.io/csc-env-eff/
    - Access to eLena (and the quizzes there) may be discontinued.
    - We encourage you to share and use the material also in your own courses. The material is in git, and we welcome all feedback and edit suggestions (pull requests)!
- [x] **Q: Has the Zoom meeting already started? I'm just getting a "The meeting has not started" note.**
    - A: Make sure you have the full Zoom link when connecting! It's long and contains the password :)
- [x] **Q: Should we be able to access the slides in eLena? I can only see the first slide.**
    - A: Try navigating with the arrow keys :)

### User account, logging in to Puhti, ssh...

- [x] **Q: So many credentials... Which ones do I need and where?!?**
    - A: You need
        - **CSC credentials** to Puhti/Mahti
        - **Haka OR Virtu credentials** (=your university/institution credentials) to my.csc.fi
        - **Haka credentials** (=your university/institution credentials) to [eLena platform](https://e-learn.csc.fi/course/view.php?id=71)
        - contact us if you don't have HAKA/Virtu in use
- [x] **Q: I have forgotten my CSC username and password**
    - A: In my.csc.fi you can check your CSC username. If you have forgotten your CSC password, check this link: https://docs.csc.fi/accounts/how-to-change-password/
    - Note that if you change your password, it takes some time for the change to reach Puhti :)
- [x] **Q: Depending on the place, you say to obtain access to Puhti and Allas, or to Puhti and Mahti. Should we have all three of them (just in case)?**
    - A: On the course there is one tutorial in which you can try moving data from Puhti to Mahti.
It is an optional task so you don't necessarily need Mahti.
- [ ] **Q: Hi. I do not know how to open an ssh connection and answer the question "Quiz: ssh connection to Puhti".**
    - A: You can open an ssh connection with the `ssh` command in e.g. Terminal or MobaXterm.
    - Did you follow the [Connecting to Puhti tutorial](https://csc-training.github.io/csc-env-eff/hands-on/connecting/ssh-puhti.html)?
    - Which operating system are you using (Windows, Mac, Linux)?
    - Can you elaborate on the exact problem you encountered?
- [x] **Q: Is it possible to use two different SSH keys, one for a desktop and one for a laptop computer, to connect to Puhti/Mahti?**
    - A: Yes, it is possible.

### Technical issues

- [x] **Q: I got some error message in Terminal on a Mac. Please help!**
    - Help will come to you, but a bit more specific details are needed for a proper answer :nerd_face:
    - Here are some documented issues: [Mac FAQ HackMD](https://hackmd.io/@MatiasJJ/MacFAQ)
    - It is a work in progress, so searching [CSC Docs](https://docs.csc.fi) is recommended
- [x] **Q: Is there a way to paste text into a Linux terminal using a (Finnish) keyboard only (I mean without using right click)? I am using PuTTY. On Windows, how about `control + shift + C` for copying and `control + shift + V` for pasting? It still does not work. :(**
    - A: In PuTTY (and MobaXterm), paste works with shift + insert.
- [x] **Q: Something is wrong with question 4 of the "Quiz: Disk areas in HPC environment". I cannot click anything in the picture to give my answer.**
    - A: The idea there is to drag and drop each description to the matching disk area, so choose one of the boxes in the bottom half and drag it to the correct box in the upper half. Once you have placed all the boxes, click the "Check" button.
- [ ] **Q: The Topic 6 "Batch job resource usage" quiz is randomly giving right and wrong answers.
For example, in question 1, *Amount of requested memory was OK* is at times the correct answer and at other times incorrect.**
    - A: Just to make sure: there are three questions which all have the same options. Thus there are times when the option "*Amount of requested memory was OK*" is a correct answer and times when it is not.
    - If the behaviour differs from this, please tell us!

### Linux / OS-related questions

- [x] **Q: In 8.2.2 Installing a simple C code from source it says to use `--prefix=/path/` with the `./configure` command, but then the command is `./configure --prefix=$PROJAPPL/mcl/version-14-137`. So why is it `$PROJAPPL` instead of `/projappl`, and what does this mean?**
    - A: In the tutorial, the first step is to set an *environment variable*: `export PROJAPPL=/projappl/project_XXXX`. This defines a new variable `$PROJAPPL` which contains the path to your project in `/projappl` (the dollar sign is used in shell scripting to access the value of a variable). So you can think of `$PROJAPPL` as a kind of shortcut to your specific project directory in `/projappl`. Defining environment variables is especially useful if the stored paths are very long and cumbersome. You can of course substitute `$PROJAPPL` with `/projappl/project_XXXX` in the command, but it is good to be aware that such variables exist and are frequently used (e.g. in some scripts you may have downloaded from the internet).

### Files and data -questions

- [x] **Q: My $HOME folder has hit the limit on the maximum number of files. How can I delete the useless files efficiently?**
    - A: Try to identify the directory in your home that has an unusual number of files. The usual suspects are hidden folders such as `$HOME/.conda` or `$HOME/.cache`. It is possible that you have installed some conda packages in your home and that has caused the problem (check: `find $HOME/.conda -type f | wc -l`).
It is also possible that you have run some complex workflows that have produced a lot of log files in one of those hidden directories. In any case, try to find the folder where you have a lot of files and then check whether they are useful (or useless).
- [x] **Q: Files from projappl will not be removed after 90 days?**
    - A: No, they will not be removed from there. Removal is for `/scratch`. Even for `/scratch`, the removal protocol has been disabled for quite some time.
- [x] **Q: How to transfer files from IDA to Puhti?**
    - A: Instructions here: https://docs.csc.fi/data/ida/using_ida/
- [x] **Q: How do I use `$LOCAL_SCRATCH` and where does it store files? For me `pwd` is not showing its directory. I am using the command `cd $LOCAL_SCRATCH`.**
    - A: Only some of the compute nodes have NVMe storage (accessible through the environment variable `$LOCAL_SCRATCH`), and the actual path to it is slightly different on different nodes. Once you change directory (using `cd`) to `$LOCAL_SCRATCH`, you can use `pwd` to find the exact path on that compute node (or directly use the command `echo $LOCAL_SCRATCH` in the batch script). Make sure to request NVMe storage in the sbatch directives before using it in your script.
- [x] **Q: When using NVMe storage the files have to be copied there and back in a batch script. How to do that?**
    - A: Firstly, one has to request NVMe storage using the following sbatch directive in the batch script:
    ```
    #SBATCH --gres=nvme:100   # Reserve the local NVMe storage, here 100 GB
    ```
    - and then include copy/move commands to transfer the input files of your application to the NVMe storage area as below:
    ```
    # Create your own folder inside the local scratch folder, here named analysis_folder
    mkdir $LOCAL_SCRATCH/analysis_folder
    # Copy your input file(s); if the input is a directory, use "cp -r folder ..." instead
    cp /scratch/project_XXXX/inputfile $LOCAL_SCRATCH/analysis_folder
    cd $LOCAL_SCRATCH/analysis_folder
    # The actual computation steps are performed in that folder.
    # Once the computation is finished, move the resulting files/folders to the
    # (Lustre parallel) scratch area so that they are visible in your project folder on the login node.
    mv $LOCAL_SCRATCH/analysis_folder /scratch/project_XXXX/$USER   # assumes you have your own folder in your project under scratch
    ```
- [x] **Q: In Puhti if I do `echo $TMPDIR` it outputs `/local_scratch/myaccount`. What is the difference between this and the `$LOCAL_SCRATCH` NVMe disks?**
    - A: `$TMPDIR` is an environment variable that points to the 2900 GiB fast local storage in `/local_scratch` that exists on each *login node*. Similarly, the *compute nodes* have fast local storage (NVMe, non-volatile memory) accessible at `$LOCAL_SCRATCH`. The NVMe is, however, accessible only during the job allocation. So one has to request NVMe storage using an sbatch directive in the batch script and then copy/move data in and out. See also the answers above.
- [x] **Q: I have an application-related database of molecular property files that currently exceeds my projappl disk area quota in terms of the number of files. Naturally, I would not like these files to end up being periodically cleaned. Is there a suggested approach to pursue here? Request a higher quota on projappl? Resort to Allas or cPouta for the database location?**
    - A: I would keep the files in Allas and then copy them to the project's `/scratch` when doing computations. Scratch is not cleaned frequently, so this should be efficient. ProjAppl is not cleaned, but you might want to look for a more efficient way to store the data than separate files. Perhaps ask via servicedesk@csc.fi?

### Batch job and resources -questions

- [x] **Q: What is the correlation (if any) between CPUs and ntasks (which appeared in the bash script)? Or are they the same?**
    - A: It depends on how your code utilizes multiple compute cores. With MPI, the number of compute cores and ntasks are the same, while with OpenMP combined with MPI, the number of compute cores used is compute-cores-per-task * number-of-tasks.
With only OpenMP, set ntasks=1 and cpus-per-task=some-number-you-want. There is quite often some confusion between what is called compute cores and CPUs.
    - A: There's more about this in the extra-curriculum slide set: https://a3s.fi/CSC_training/10_speed_up_jobs.html#/what-is-mpi
- [x] **Q: So how do I know where a command in a batch script is executed?**
    - A: Submitting the job using a batch script and the `sbatch` command will execute the command/program appropriately on the compute node(s). The batch job "starts" (`pwd` would return that) in the same folder where you submitted it.
- [x] **Q: Is `srun` required at the start of every batch job? E.g. is there a difference between `srun module load` and `module load`?**
    - A: The `srun` command is used to launch a program through the batch script. It is *not* used for simple commands, such as `module load` or `csc-workspaces`, run e.g. on the login node. Do not launch any programs with `srun` on the login node.
    - The first line in the command prompt tells you which node you are currently on.
- [x] **Q: Is the requested run time a parameter for Slurm job priority?**
    - A: The requested run time is one of many parameters that are used in prioritising your job. Estimating the wall time required for your job as accurately as possible will help in getting your job running as soon as possible.
    - The effect on priority is small, but a slot that can **fit** your job is easier to find if the duration is shorter --> less queueing. That said, the _minimum_ meaningful batch job duration is 30 minutes. Shorter jobs create unnecessary overhead (queueing, scheduling, setting up, cleaning up).
- [x] **Q: How do I determine what kind of job I am running, serial or parallel?**
    - A: This is a bit tricky. It depends on the code: some codes can only run in serial, and some codes need so much computation that running them in serial is practically useless.
You need to become familiar with the code you are using so that you know how it is best used: serial or parallel.
    - In parallel, one must also know exactly how it should be used. There are several ways to run in parallel, and it is important to do it the right way for each code separately. Only when you have figured out how to run your code do you need to write the batch script so that the code runs in the correct way. Let's say you figured out you want to run in parallel with OpenMP using 8 cores. Then you consult the CSC Docs web pages on how to build a batch script for such a batch run.
    - A2: I'm not 100% sure if the question was about planning a job or analysing an already running job. If the latter, check `scontrol show job <jobid>` and look for `NumCPUs=XXX`. `XXX` will tell how many cores were allocated. The same info can be found in your batch script (`--ntasks=Y`, `--cpus-per-task=Z`). Multiple cores are not used automatically, so you must also tell your code to use them.
- [x] **Q: Which commands should be run on the login node and which in a batch job? (Example: loading modules before running the batch job or loading them inside the batch script?)**
    - A: This is a rather generic question. When you want to submit a job using a batch script, it is good practice to load all the modules required by your application in the batch script itself. In that case you just have to use the `sbatch` command on the login node to submit your batch script. Any other light-weight operations, like `pwd` and `ls`, can be done on the login node.
Note that heavier interactive commands should be run on compute nodes, too, and this is convenient via the `sinteractive` command: https://docs.csc.fi/computing/running/interactive-usage/
- [x] **Q: What are the multiple lines printed out by `sacct`: without an extension, and with the extensions `.batch`, `.extern`, and `.[0-N]`?**
    - A: These are Slurm accounting details associated with the different job preparation steps needed by the batch scheduling system (= first queueing for job allocation, then ssh to an allocated compute node, and finally executing your job, e.g. using srun, on that compute node). `<jobid>.batch` accounts for the resources needed by the batch script; `<jobid>.extern` accounts for all resources needed by the job outside of Slurm; `<jobid>.[0-N]` accounts for the different runs (or srun, if that is used).
- [x] **Q: In tutorial 9.1, when copying the batch script there are errors in the output of `sbatch test.sh`:**
    ```
    /var/spool/slurmd/job7281075/slurm_script: line 5: hugemem_longrun: command not found
    /var/spool/slurmd/job7281075/slurm_script: line 10: node.: command not found
    Hello from the container
    ```
    - A: The issue is that lines wrap when copy-pasting the batch script from the tutorial. Check that the lines in the batch script begin as expected.
- [ ] **Q: When should I turn off multithreading/hyperthreading in my Slurm script? For instance, if I need more CPU power, should I turn hyperthreading off?**
    - A: Hyperthreading is very seldom needed. I would try all other means to improve performance first, and then you can try it. With hyperthreading, threads share the same physical CPU core for computational work, so it can speed up only certain kinds of tasks. However, in this case, as in all performance improvement cases, you need to try and see. Check Topic 10, which has more discussion on how to speed up jobs.
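
As a rough sketch of the CPUs vs. ntasks answer in this section, the batch script below requests a hybrid MPI + OpenMP job and prints the resulting core count. The partition, time limit, task counts, and the names `my_module` and `my_hybrid_program` are illustrative assumptions, not course material; replace `project_XXXX` with your own project. Outside a Slurm job the `SLURM_*` variables are unset, so the script falls back to the same numbers and can be dry-run with plain `bash`.

```shell
#!/bin/bash
#SBATCH --account=project_XXXX   # placeholder: your own project
#SBATCH --partition=small        # assumption: choose the partition that fits your job
#SBATCH --time=00:30:00          # requested wall time
#SBATCH --ntasks=4               # number of MPI tasks
#SBATCH --cpus-per-task=8        # OpenMP threads (cores) per task

# Slurm sets these variables inside a job; the defaults only make a dry run possible.
ntasks=${SLURM_NTASKS:-4}
cpus_per_task=${SLURM_CPUS_PER_TASK:-8}

# Total cores = compute-cores-per-task * number-of-tasks
total_cores=$(( ntasks * cpus_per_task ))

# Tell OpenMP how many threads each task may use
export OMP_NUM_THREADS=$cpus_per_task

echo "tasks=$ntasks threads_per_task=$OMP_NUM_THREADS total_cores=$total_cores"

# Load modules inside the script, then launch the program with srun.
# Both lines are commented out because the module and program names are hypothetical.
# module load my_module
# srun ./my_hybrid_program
```

For a pure OpenMP job, set `--ntasks=1` and keep `--cpus-per-task`; for a pure MPI job, leave `--cpus-per-task` at 1 and set only `--ntasks`.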
### Allas -questions

- [x] **Q: In the tutorial https://csc-training.github.io/csc-env-eff/hands-on/allas/tutorial_allas-in-batch-jobs.html, adding `wc -l your-file-name > your-file-name.num_rows` creates a new file with the suffix `.num_rows` in Puhti and Allas. Is that the expected output, and should I enter file-name.txt in these fields?**
    - A: Running `wc -l your-file-name > your-file-name.num_rows` counts the number of lines in `your-file-name` and stores the output in a new file `your-file-name.num_rows` in your current working directory on Puhti. This file can then be transferred to Allas using e.g. the `a-put` or `rclone copy` commands. You should substitute `your-file-name` with whatever you have named your file.
- [x] **Q: An out-of-curiosity question: Is accessing Allas via python-swiftclient (and others for other languages...) allowed? Or is the idea to always bring data in with the command line tools? If allowed and possible, are there any tutorials, especially on which auth to use and/or (even better) an environment-specific authentication example? (Only had time to skim the docs during the presentation, but couldn't "instantly spot" one :-).**
    - A: See here: https://docs.csc.fi/data/Allas/using_allas/python_library/
- [x] **Q: Just to clarify, when using `a-publish` I need to have the file in Puhti, not just in Allas?**
    - A: Yes, `a-publish` applies to a file in your local environment.
- [x] **Q: Flipping will not expose any existing buckets, whereas not using flip and publishing from a bucket will expose all files in that bucket. Is this right?**
    - A: `a-flip` will create a bucket for you (it doesn't ask for the bucket name) _and_ expose it. `a-publish` will create a bucket _if_ the bucket does not exist, and make it, and all of the contents in it, public.
    - A2: If you want to make something available via http, but difficult for others to find, you can make the name of the bucket complicated.
Here's an example:
    ```
    [someuser@puhti-login3 slides]$ export bucket=`openssl rand -hex 16`
    [someuser@puhti-login3 slides]$ echo $bucket
    8a02952ee792d023860454d810b35566
    [someuser@puhti-login3 slides]$ a-publish -b $bucket 07_allas.html
    Files to be uploaded: 07_allas.html
    Bucket: 8a02952ee792d023860454d810b35566
    Processing: 07_allas.html
    Checking total size of 07_allas.html. Please wait.
    Uploading data to allas.
    Transferred: 1.627M / 1.627 MBytes, 100%, 13.360 MBytes/s, ETA 0s
    Transferred: 1 / 1, 100%
    Elapsed time: 0.1s
    Confirming upload...
    07_allas.html OK
    07_allas.html uploaded to 8a02952ee792d023860454d810b35566
    Public link: https://a3s.fi/8a02952ee792d023860454d810b35566/07_allas.html
    ```
    - A3: If you need more fine-grained access management, you have some other (a little bit more involved) options: https://docs.csc.fi/data/Allas/allas-nextcloud/

### Software and Container -questions

- [x] **Q: If I install packages in R (in an sinteractive session), do I need to re-install them the next time I use R?**
    - A: See here: https://docs.csc.fi/apps/r-env-singularity/#r-package-installations (in short, you can install R packages to /projappl. You do not need to re-install them every time, although you need to modify the library trees accordingly each time you launch R)
- [x] **Q: How can I make my own module file?**
    - A: If you want to set up some software, i.e. put the path to the executable in your `$PATH` and edit some other environment variables, it is perhaps easier to make an `alias` for it or, if it's very complicated, a script that you source to execute its contents in the current session. For example, to add the path `~/bin` to `$PATH` and change the module setup to gcc/9.1.0, add this line to your `~/.bashrc`:
        - `alias myenv='export PATH=~/bin:$PATH;module load gcc/9.1.0'`
        - Then (after logging out and back in) you can set your environment with `myenv`.
See also: https://a3s.fi/CSC_training/04_modules.html#/customizing-own-environment
    - It _is_ also possible to set up your own module "collections" or even an (additional) module path that will add to the system modules. See https://docs.csc.fi/computing/modules/#using-your-own-module-files
- [x] **Q: How can I direct Python package installations to /projappl?**
    - A: You can e.g. set up a virtual environment in `/projappl`. Then, when installing packages using `pip`, these will be installed there (make sure to activate the environment first). See also https://docs.csc.fi/apps/python/#installing-python-packages-to-existing-modules
    ```
    module load python-data
    python -m venv --system-site-packages /projappl/<my_project>/my-venv # Replace <my_project> with your project (project_xxxxxx)
    source /projappl/<my_project>/my-venv/bin/activate
    pip install my_package_to_install
    ```
- [x] **Q: Pip doesn't show any version of PyTorch. Where is it?**
    - A: There are several versions. First check docs.csc.fi/apps/ under python and python-data. Also, try `module spider pytorch`.
- [x] **Q: I want to use Python PyTorch via kubectl. What's the recommended way to use/install it on Puhti?**
    - A: Puhti doesn't have Kubernetes, so kubectl is of no use for managing PyTorch jobs there. You could do it in Rahti, i.e. use kubectl (or `oc`, i.e. OpenShift commands) to manage the jobs there. However, at the moment there are no GPUs available in Rahti. See https://docs.csc.fi/apps/python/ and https://docs.csc.fi/cloud/rahti/
- [x] **Q: Can I use Allas to set up persistent storage for my cloud instances?**
    - A: Kind of.
But there are other disk systems for read/write application data, which are also persistent (see https://docs.csc.fi/cloud/rahti/storage/persistent/). For additional security, at the end of that page we recommend a backup to Allas, and here's a tutorial on how: https://docs.csc.fi/cloud/rahti/tutorials/backup-postgres-allas/
- [ ] **Q: Is there a way to use NodeJS in Puhti?**
    - A: What would you use it for? How about cPouta or Rahti? It is possible to download and install (e.g. in /projappl) NodeJS binaries, which you can use (also in containers) once they are added to the path or mounted in the container. If you have a more specific question, please contact us via servicedesk@csc.fi

## ☃️ ICE BREAKER (HackMD -practice)

Let's learn how to use this HackMD document by answering an ice breaker question! So back to elementary school it is: "Best thing about my summer vacation was..."

**Results:** ... were nice

## ✏️ Add your questions here

Please type your questions here. We will answer them, and organise the document topically.

- [x] **Q: Have I clicked the edit mode on?**
    - A: Probably not yet.. ↖️
- [x] **Q: Should I copy-paste an old question to get started with a new one?**
    - A: A really good idea! Here's a template for you ⬇️
- [ ] **Q: **
    - A:

---

**Write your questions above this line**

The questions are moved upwards :arrow_up: into their categories when they get an answer.

---