# Using CSC HPC environment efficiently 2 / 2021

###### tags: `puhti` `mahti` `allas`

> This is the common "notebook" for the "Using CSC HPC environment efficiently" course organised in May 2021 at CSC - IT Center for Science. [Course page in e-Lena platform](https://e-learn.csc.fi/course/view.php?id=43)

:::danger
This is the place to ask questions about the workshop content! We use the Zoom chat only for posting links, reporting Zoom problems and such.
:::

:::info
:bulb: **Hint:** HackMD is great for sharing information in this kind of course, as code formatting is nice & easy with Markdown! Just add 3 ticks (``` ` ```) for the ``` code blocks ```. Otherwise, it's like a Google doc: it allows simultaneous editing.
:::

[ToC]

## Agenda

### Monday

- 8:50 - Login, check connections work, test microphone
- 9:00 - Welcome, course practical info
- 9:15 - Chapter 1: HPC Environment
- 9:30 - Chapter 2: Connecting + Hands on 2.1 (10 min)
- 9:50 - _Break_
- 10:00 - Chapter 3: Disk Systems + Hands on 3.1 (15 min)
- 10:30 - Chapter 4: Module System + Hands on 4.1
(10 min)
- 10:50 - _Break_
- 11:00 - Chapter 5: Batch Jobs + Hands on 5.1 (25 min)
- 11:50 - Recap + preparations for Tuesday
- 12:00 - Finish

### Tuesday

- 9:00 - Chapter 6: Understanding resource usage + Hands on 6.1 (15 min)
- 9:50 - _Break_
- 10:00 - Chapter 6: Hands on continued
- 10:30 - Chapter 7: Allas
- 10:50 - _Break_
- 11:00 - Chapter 7: Allas continued + Hands on 7.2 (30 min)
- 11:50 - Recap + preparations for Wednesday
- 12:00 - Finish

### Wednesday

- 9:00 - Chapter 8: Installing applications + Hands on 8.1 (15 min)
- 9:50 - _Break_
- 10:00 - Chapter 8: Hands on continued
- 10:30 - Chapter 9: Containers
- 10:50 - _Break_
- 11:00 - Chapter 9: Containers continued + Hands on 9.2 + ask anything on all previous topics (40 min)
- 11:00 - You will receive a feedback link by e-mail
- 11:50 - Recap + feedback + open questions
- 12:00 - Finish

## Code of conduct

We strive to follow the [Code of Conduct developed by The Carpentries organisation](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html) to foster a welcoming environment for everyone. In short:

- Use welcoming and inclusive language
- Be respectful of different viewpoints and experiences
- Gracefully accept constructive criticism
- Focus on what is best for the community
- Show courtesy and respect towards other community members

---

## TO DO before the course

- Please make sure you have a CSC user account, a project with Puhti and Allas as services, and have accepted the terms of use for Puhti and Allas in my.csc.fi!
- Check & test that you have a functional terminal connection (https://docs.csc.fi/computing/connecting/, graphical connection not needed)
- Complete the prerequisite skill test / mini course "New to command line? Test your skills!" https://e-learn.csc.fi/course/view.php?id=58.
(Registration key sent to you via e-mail)
- Log in to the e-Lena course page (https://e-learn.csc.fi/course/view.php?id=43) and self-register for the course (registration key sent to you via e-mail)
- Check the Zoom instructions below

## Zoom instructions

- The link to the Zoom room was sent to you via e-mail
- Please arrive 5-10 minutes early to test your microphone setup.
- Use your **full name** (Firstname Lastname)!
  ![](https://i.imgur.com/TAHEHVt.png)
- During the course, please remember to always mute your microphone when you are not speaking.
- Please use a headset in order to avoid echo (a simple phone headset is just fine).
- You can find all the controls (mic, video, chat, screen sharing) at the bottom of the Zoom window (when you bring your mouse there).
- You can use the chat box for questions and comments, but please make sure you reply to "all panelists and participants" instead of just "all panelists", which is often the default.
- If you have a spoken question/comment, please use the "raise hand" button: we will then give the floor (and microphone rights) to you.
- Note: for questions and answers about the course topics, we will be using this living document
- Break-out rooms: More info: https://support.zoom.us/hc/en-us/articles/115005769646-Participating-in-Breakout-Rooms#collapseWeb
  Breakout rooms are smaller sessions that are split off from the main Zoom meeting. They are completely isolated in terms of audio and video. The host will need to invite you to join the breakout room, after which you can click "Join" in the notification pop-up. Mouse over the number of people in the breakout room to reveal the "Join" button.
  ![](https://i.imgur.com/wRJ2gqp.png)

## :memo: Q & A

You can type your questions here. We will answer them, and this document will store the answers for you for later use! :rocket:

### General and practical matters

- [x] **Q1: I have difficulty pasting my questions into HackMD (here).
Do you have some instructions on how to write here?**
    - A: Can you see these three icons in the top left corner, next to the HackMD text? There’s a pencil, a side-by-side symbol, and an eye. In the eye view you can’t edit, you are just viewing. The other two reveal the Markdown (MD) version of the page, which you can edit. I find it easiest to edit with the side-by-side view.

    :::info
    :bulb: **Hint:** You can also apply styling from the toolbar at the top :arrow_upper_left: of the editing area.
    ![](https://i.imgur.com/Cnle9f9.png)
    :::

- [x] **Q2: I cannot access most of the slides from the course page. These are marked with an earth logo.**
    - A: Try to use the arrow keys or space bar to switch the slide!
- [x] **Q3: Slides available after the course?**
    - A: They sure are! We are developing self-learning material on these topics. You will have access to this HackMD document as well as the e-Lena course materials also after the course. The slides and exercises are available here:
        - Slides: https://a3s.fi/CSC_training/README.html#/using-csc-hpc-environment-efficiently
        - Exercises: https://csc-training.github.io/csc-env-eff/
- [x] **Q4: Has the Zoom meeting already started? I'm just getting a "The meeting has not started" note.**
    - A: Make sure you have the full Zoom link when connecting! It's long and contains the password :)
- [x] **Q5: Do we get ECTS credits from this course for our university transcripts?**
    - A: We will send a certificate by e-mail to the course participants that were present in both sessions, where we recommend 0.5 credits for this course. We are not allowed to give credits, but we can recommend them, and if you take the certificate to your university, they usually accept it :)
    - NOTE: If you participated but didn't receive the certificate, please contact our event support (event-support (at) csc.fi). We have a new process we are just starting to use, so thank you for your understanding and sorry for the possible inconvenience!
- [x] **Q6: Should I get my "own" project for the course, can't I just use a project you provide?**
    - A: The course project will be available only during the course, and before and after that you need to do the exercises in your own projects anyway :) Also, we wish to help you get past this first obstacle of setting up everything in the beginning!
- [x] **Q7: So many credentials... Which ones do I need and where?!?**
    - A: You need CSC credentials to log in to Puhti, whereas for my.csc.fi and e-Lena you can nowadays log in also with Haka credentials (=your university credentials). In my.csc.fi you can check the CSC username. If you have forgotten your password, check this link: https://docs.csc.fi/accounts/how-to-change-password/
    - Note that it takes some time for the password to update to Puhti, if you need to change it :)
- [x] **Q: How long is the CSC account for this course valid? I would like to study and play with it afterwards.**
    - A: Your CSC account is yours, but the course project is available only during the course (today still, but closed tomorrow). See how to create/join a project: https://research.csc.fi/accounts-and-projects (If you were using a training account, that is only valid during the course, but I don't think many had that case.)
- [x] **Q: How long are the course materials available online? Tutorials, slides etc.?**
    - Tutorials and slides will be there indefinitely (https://a3s.fi/CSC_training/README.html and https://csc-training.github.io/csc-env-eff/). Access to e-Lena (and the quizzes there) may be discontinued. The main material will thus remain available.

### User account, logging in to Puhti, ssh...

- [x] **Q1: I have a CSC login password as well as Haka password. I guess CSC login password should work for Puhti?**
    - A: Yes, you need CSC credentials to log in to Puhti. For my.csc.fi (and e-Lena) you can nowadays log in also with Haka credentials (=your university credentials), but to access Puhti, you need the CSC username and password.
In my.csc.fi you can check the username. If you have forgotten your password, check this link: https://docs.csc.fi/accounts/how-to-change-password/
    - Note that it takes some time for the password to update to Puhti :)
- [x] **Q2: I can see the course's project in the list of projects in my MyCSC profile but I can't access it, i.e. it doesn't show up under `csc-workspaces`.**
    - A: Did you accept the terms of use in MyCSC? See: https://docs.csc.fi/accounts/how-to-add-service-access-for-project/#member If you did, it should look like this, with the green tick marks:
      ![](https://i.imgur.com/jFW7vw4.png)
    - There might also be some delay before this acceptance information is updated to Puhti. Please try logging out and back in!
- [x] **Q3: I am trying to connect to RStudio using the SSH client. When I reach step 3 (on: https://docs.csc.fi/support/tutorials/rstudio-or-jupyter-notebooks/) I have trouble connecting to the service. The system asks for my CSC password and then states 'setsockopt IPV6_TCLASS 16: Operation not permitted'. It then asks for a password for .bullx, for which my normal password doesn't work. Any solutions?**
    - A: Which ssh client are you using? Did you successfully set up the ssh keys?
    - Q: I set up an SSH key with Puhti a week or two ago. When I get to step 3 of the RStudio docs page (create an SSH tunnel), it says to connect to an SSH terminal without logging in to Puhti yet. But I cannot access an SSH terminal without logging in to Puhti (session > SSH > remote host in MobaXterm). I managed to get RStudio to work in an interactive session, although I think it is still on a login node rather than a compute node.
    - A: In MobaXterm, for step 3 first open a local terminal: "Start local terminal", in the middle of the MobaXterm front page.
    - Q: I run the `ssh -N -L 8787:localhost:40072....` command in a new local terminal; it then asks for my passphrase and tries to open a channel, but the connection fails. Also, where do I open the local browser?
Is this in my normal Chrome, for example, or from MobaXterm?
    - A: The local browser is Chrome, Firefox or whatever you are otherwise using.
    - A: You can easily see from your command prompt whether you are on a compute node or a login node. For login nodes it is puhti-login1 or puhti-login2. For compute nodes it is something like r07c49, but the numbers might be different.
    - Q: This makes sense. Thank you.
- [x] **Q: I'm currently running Linux via Oracle's VirtualBox on Windows in order to ssh into Puhti. Are there any drawbacks to this method I should be aware of? Thanks**
    - A: I'm not aware of any issues; it should be similar to using an ssh client on the host operating system. However, please let us know if any of the exercises don't work! ssh from Linux should generally work well. Here's a link to this topic in [CSC's Introduction linux slide source](https://github.com/csc-training/linux-1/blob/master/02-Linux_on_my_own_computer.md) and the [corresponding video](https://video.csc.fi/media/0_5rr1616y).
- [x] **Q: I tried running the advanced tutorial in 2.3 but I keep getting an error. When I create the ssh tunnel in a second terminal, it first asks me for the Puhti password, which works fine, and then for the interactive session password, which always returns the same message: *Permission denied, please try again*. Any ideas on how to solve this?**
    - Which ssh client are you using? Please note that the instructions were made for MobaXterm (or the standard shell in Linux/macOS). Also, tunneling can be problematic if you're connecting via a proxy. In that case, you might need to ask your local IT support. At the bottom of the page (and in a link in the middle) there are separate instructions for PuTTY.
    - Follow up: The issue was indeed linked to the ssh key. I set up a new key and it worked. Thank you!
- [x] **Q: I have access to Puhti-FMI and Allas, but I can't request access to Puhti. How to get it?**
    - A: You can check in my.csc.fi if your project has access to Puhti.
Only the project manager can request changes to the project's service access. Check this docs page: https://docs.csc.fi/accounts/how-to-add-service-access-for-project/
    - Note, for FMI staff the project has to be either academic (i.e. have access to normal queues but not fmi and fmitest) or FMI (in which case it will have access to only those two queues). More info: https://docs.csc.fi/accounts/fmi/
- [x] **Q: In 2.2/2.3, while trying to connect to the Jupyter server via MobaXterm with the ssh command (after starting the server), it gives an error "stty: standard input: Inappropriate ioctl for device" and then asks for a password, but I can't figure out which password it is. I've tried my CSC password and the passphrase for the ssh keypair but neither works. It asks for a password for username@r07c51.bullx**
    - A: You should not need a password. I guess the ssh key setup is not correct. Can you log in normally without a password with the ssh key?
    - Q: There probably was a problem with the ssh key setup as you mentioned, since after setting the keys up again it worked. Also, to clarify, it still gives the same stty error, but no longer asks for a password and I can now connect to the notebook with my browser.

### Modules and disk areas

- [x] **Q: `echo $LOCAL_SCRATCH` returns an empty line, cannot figure out why**
    - A: `$LOCAL_SCRATCH` points to the fast local NVMe disk in the compute nodes. If you give the command on a login node, it should be empty. You can set up an interactive job and try it in there.
- [x] **Q: Are these modules available in ePouta too?**
    - A: No. ePouta will have a plain operating system and you need to install all applications.
- [x] **Q: `module list` returns the modules already loaded. Is the order / number next to each module relevant? For example Stdev was listed as 4) the first time I ran `module list`, then after `module load gromacs-env`, Stdev was listed as 1).
Is that number an indication of which module was loaded first or last?**
    - A: I have never paid any attention to the order, and am not aware of any meaning for it.

### Batch jobs

- [x] **Q: In tutorials 2.2 or 2.3 how much memory and time should be given for the interactive session? I am only planning to open the environment in a Jupyter notebook.**
    - A: That's the million dollar question :) Start with something reasonable. The default for batch jobs is 1GB per core, so that would be good (not overkill, but good for most use cases). If your job runs out of memory it will be killed, but then just ask for more memory. This is the topic for tomorrow morning.
- [x] **Q: How long is the queue usually?**
    - A: This varies a lot and is (typically) different for all queues. On Puhti, e.g. the longrun, gpu and large-memory queues can be overbooked, while Mahti currently has free resources. There are two lengths: the maximum duration that can be requested, and how long you need to wait for your job to start. Perhaps you meant the latter. As said above, this varies, but for resources that are easy to provide (few cores, not too much memory, no GPU, no NVMe) you should get the job running quickly (either immediately or within hours). If you ask for 7 days of runtime and 1.5TB of memory, you might need to queue for days. You can look at the partition table to see what kind of nodes exist (and reverse engineer from it what kind of resource requests are easy to fulfil): https://docs.csc.fi/computing/running/batch-job-partitions/#puhti-partitions
- [x] **Q: Does the priority depend only on the partition and number of jobs run, but not for example on used nodes or memory?**
    - A: There are different queues for different categories of memory and nodes. That is, *large*, *medium*, *small*, and *test* have different limits for the number of nodes, and the *large-mem* queue is for really large memory usage. The priority is different in the different queues, where *test* e.g.
has a very high priority, but is meant only for short and small tests. Your usage history also affects the priority. That is, if you continuously submit jobs, your priority is a bit lower compared to occasional users.
- [x] **Q: What are the principles and tips for estimating the memory needed for my job?**
    - A: This is a bit tricky, but you can run a test job and then use the `seff` command to see your memory efficiency. The command is `seff JOBID`, where JOBID is the number your job is given. You can get the JOBID e.g. with the command `squeue -u yourusername` while your job is running.
- [x] **Q: How are the billing units calculated?**
    - A: Check this page: https://research.csc.fi/pricing There's a billing unit calculator as well. Please also look at our other documentation page: https://docs.csc.fi/accounts/billing/ This will also be covered on Tuesday morning.
- [x] **Q: What is a good way to estimate the time Puhti or Mahti will use for my computations?**
    - A: It is best to make a short test run in the test queue. Then try to extrapolate to longer and/or larger runs. After using your code for a while one typically learns how long a job takes. Having a restart option is also very useful. If the job stops because it runs out of time, it can be restarted from where it stopped.
- [x] **Q: In tutorial 5.2 "A simple MPI job", after using `cat` on the slurm file it gives an error "The application appears to have been direct launched using "srun", but OMPI was not built with SLURM's PMI support and therefore cannot execute..." What could be the problem?**
    - A: Perhaps the job was not submitted with `sbatch`? Can you list all the commands you gave and also list the contents of your batch script?
    - Q: After trying it again after Monday's session I got it working as in the instructions.
- [x] **Q: Also the same problem with the simple MPI job, the second part of tutorial 5.2**
    - See above.
- [x] **Q: A lot of error messages in tutorial 5.2 (A simple MPI job)**
    - See above.
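
The MPI and memory answers above both come down to the same workflow: put the resource requests in a batch script, submit it with `sbatch`, and check the finished job with `seff`. Below is a minimal sketch of that workflow, not the script from tutorial 5.2. The project `project_xxxx`, the program `./my_mpi_program` and all resource values are placeholders that you would need to replace with your own.

```shell
#!/bin/bash
# Sketch only: project_xxxx, ./my_mpi_program and all resource values are
# placeholders, not values taken from the course material.

# Print a small MPI batch script to stdout.
make_batch_script() {
    cat <<'EOF'
#!/bin/bash
#SBATCH --job-name=mpi_test
#SBATCH --account=project_xxxx
#SBATCH --partition=test
#SBATCH --time=00:05:00
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=1G

srun ./my_mpi_program
EOF
}

script=$(mktemp)
make_batch_script > "$script"

if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print only the job ID.
    jobid=$(sbatch --parsable "$script")
    echo "Submitted job $jobid; after it finishes, run: seff $jobid"
else
    # Not on a Slurm system: just report where the script was written.
    echo "sbatch not found; script written to $script"
fi
```

Starting in the *test* partition with a short time limit keeps the queueing time low while you find out from `seff` how much memory and time the job really needs.
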
- [ ] **Q: In exercise 5.4 part 1, when trying to run `tree pythium` while using sinteractive, I get a message 'bash: tree: command not found'. Is this a mistake or do I have to exit the interactive partition in order to complete the exercise?**
    - A: Unfortunately the `tree` command is not available on the interactive partition. You can instead use for example `ls -laR`, which shows almost the same information in a different format. The `tree` command is available on the login nodes, though.
- [ ] **Q: In 5.4.3., while downloading data with the EnaDataGet command, error message: "No WGS set file available for AKYA02000000, format fasta"**
    - A: There currently seems to be an issue with retrieving this dataset using the EnaDataGet tool. Use the direct FTP download link instead for the purpose of this exercise. Use the following command to download the FASTA dataset: `wget ftp://ftp.ebi.ac.uk/pub/databases/ena/wgs/public/aky/AKYA02.fasta.gz`
- [x] **Q: In 5.5, Puhti's GIS data folder `/appl/data/geo/mml/dem10m/rs-tm35fin-n2000/` is not available**
    - Thanks for reporting this! The exercise has been edited to have a working path.
- [x] **Q: It is still unclear to me where the batch jobs should be started: in my home directory or in /scratch/ or somewhere else?**
    - A: They should be started in /scratch/project_xxxx. The home directory is too small.
- [x] **Q: What about Mahti? (on your slide "1 TiB year of additional space in Puhti (scratch, projappl) 50000 BU")**
    - A: All info here: https://docs.csc.fi/accounts/billing/

### Understanding batch job resource usage and efficiency

- [x] **Q: If you underestimate the required resources and the job is killed, will your job remain at the front of the queue or does it go to the back of the line?**
    - A: It must be resubmitted and goes to the 'back' of the queue. It is treated as a new job. A restart option in the code is a good idea: then the job can just continue almost from where it was stopped.
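
The restart idea mentioned above can be as simple as recording the last finished step in a file and skipping the completed steps on the next run. Here is a minimal sketch, assuming the work can be split into numbered steps. The file name `checkpoint.txt` and the `run_step` function are hypothetical, and real applications usually have their own restart mechanism that should be preferred.

```shell
#!/bin/bash
# Sketch of a restartable job. The names checkpoint.txt and run_step are
# hypothetical; run_step stands in for one unit of real work.

CHECKPOINT=${CHECKPOINT:-checkpoint.txt}
TOTAL_STEPS=${TOTAL_STEPS:-5}

run_step() {
    echo "running step $1"
}

run_from_checkpoint() {
    local last=0
    local step
    # Resume after the last completed step, if a checkpoint exists.
    if [ -f "$CHECKPOINT" ]; then
        last=$(cat "$CHECKPOINT")
    fi
    step=$((last + 1))
    while [ "$step" -le "$TOTAL_STEPS" ]; do
        run_step "$step"
        # Record progress only after the step has finished.
        echo "$step" > "$CHECKPOINT"
        step=$((step + 1))
    done
}

run_from_checkpoint
```

If the job is killed during step 4, the checkpoint still says 3, so the resubmitted job repeats step 4 and then continues instead of starting from step 1.
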
- [x] **Q: In tutorial 6.1 the file `/appl/soft/bio/course/sacct_exercise/test-a` requires sudo access or at least permissions that my user does not have**
    - A: The permissions have been fixed. Sorry and thanks for reporting!
- [x] **Q: In the e-Lena quiz for part 6 the solution to the second question states that the CPU time wasn't used efficiently because the efficiency was very high, 99.71 %. Then in the third question the solution states that the CPU time was used efficiently because the efficiency was high, 99.93 %. Why do these solutions state the opposite? Is there something else in the batch script or the seff output that explains this?**
    - Thanks for reporting! The *text* in the second quiz option is correct, but it has been erroneously flagged as a wrong answer. We'll fix this after the course (editing now would restart your quizzing...). 95% efficiency is high and good, but please remember that a busy CPU (as known to Slurm) does not mean the work is done efficiently. Please mind the caveats mentioned in the slides.
- [x] **Q: Tutorial 6.1 (original): I did `sacct -X -j 6111805`, which gave state "TIMEOUT" for subjob 1. However, subjob 1 was finished anyway. On the other hand, the result for subjob 3 was TIMEOUT too, but it was not finished (naturally due to the time limit). Why was subjob 1 with TIMEOUT finished?**
    - A: By finished, do you mean that the output text file says it finished? Perhaps exceeding the allocated time by 1 second was short enough for the job not to get killed? (Slurm only acts once it notices the overrun. Also, the job that exceeded its memory was killed only when it had exceeded the limit by 50%, so it seems Slurm is quite loaded already.)
    - Q: Yes, the output text file said so ("Subjob 1 finished."). seff also gave "00:01:01 core-walltime" both for subjob 1 with the TIMEOUT state and for subjob 2 with the COMPLETED state.
    - A: There was indeed a bit of a problem with this exercise.
We had a 1-minute reservation and the job took exactly 1 minute, so we got this confusing output. It has now been fixed. If you re-run it now you'll get a clearer output. (Most jobs should complete, with two failing: one with a timeout and one running out of memory.)
- [x] **Q: I have been reading through the R-env-singularity docs as I need to do an R analysis using CSC resources. I'm a little bit confused about when you would need a serial job or a parallel job (array, multicore, etc.), and how you would know what you need in terms of number of tasks, nodes, and cores. As an example, if I want to run 4 different models/analyses, should I do it in 4 R scripts with number of tasks = 4, cores = 8 (2 cores per task), nodes = 1, written as a parallel job? The R scripts would be written as commands under the #SBATCH lines. Does that sound like I have the right idea?**
    - A: It is possible to implement your job either as a parallel or a serial job. Please note that parallelism does not come out of the box just because more cores/tasks were reserved. The code that you write after the sbatch directives must itself be parallel. Usually, in the R environment, when you have loops in your code, you can use packages and functions such as `parallel`, `doParallel` or `mclapply` to run them in parallel. If your aim in this case is to perform some heavy computational tasks with four different models, each of which requires a very long time, you can for example put them in a loop and use one of those R parallel packages to make your code parallel. Make sure each model then actually utilises the two cores reserved per task.

### Pouta (not covered in the course)

- [x] **Q: Will all these batch processing commands, including seff, sacct and squeue, work in ePouta?**
    - A: No. In ePouta you run locally built virtual machines that typically don't have a batch job system.
### Allas

- [x] **Q: In 7.1: `a-put pythium` & getting error message: "lfs quota: cannot open '/scratch/project_2004306': Permission denied"**
    - A: Did you check in my.csc.fi that you have accepted all terms of use for this project? Do you have some other scratch folder you can use?
- [x] **Q: Is there a doc page which tells you all the different types of files Puhti can read?**
    - Puhti itself does not really read files. It is the different software that reads different types of files. A large number of software packages are installed, and you can install more yourself. So if you have files in a given format, look for software that can handle that format.

### Installing

- [x] **Q: How can I install Cargo (Rust package manager) in Puhti?**
    - A: Look for options to direct the installation to /projappl/project_xxxxx instead of the system default. Then it might work. According to a quick DuckDuckGo search, e.g. these environment variables could be used to control where stuff goes: https://rust-lang.github.io/rustup/environment-variables.html

### Containers

- [x] **Q: Why should we go for a container rather than an Anaconda environment, if we have to install conda packages? Is there any other way to install content in Singularity?**
    - A: The main downside of conda-based installations is that you end up with a lot of files. In my experience conda environments can also be "fragile", i.e. sensitive to changes in the host environment. Much of the software that is available as conda packages is also available as Docker containers. It is also possible to replicate an existing conda environment as a container. Check the tutorial. In Puhti, using a conda environment (with lots of files) can be slower than a container and puts more load on the system for all users.
    - Q: OK, thank you.
- [x] **Q: I'm doing the extra exercise 9.4 and have installed Singularity 3.8.0 on my laptop successfully.
I've made the file centos.def but when I try to build the container I get a message 'FATAL: While performing build: conveyor failed to get: neither yum nor dnf in path'. What should I do?**
    - A: Never mind, I realized that I didn't have yum or dnf installed.

## ICE BREAKER

**Results:** Finland won with 10 votes! Italy got second place with 5 votes, and then there was a three-way tie with 4 votes each for Iceland, France and Switzerland.

- Finland: 10
- Italy: 5
- Iceland: 4
- France: 4
- Switzerland: 4
- Ukraine: 3
- Sweden: 1
- Germany: 1
- Netherlands: 1
- Portugal: 1
- NaN: 1
- Never watched it!: 1
- Didn't watch: 1