# LUMI General Course
14.--17.2.2023
9:00--17:30 (CET), 10:00--18:30 (EET)
Zoom link: https://cscfi.zoom.us/j/65207108811?pwd=Mm8wZGUyNW1DQzdwL0hSY1VIMDBLQT09
:::info
Please ask your questions at [the bottom of this document](#EOF) <-- click here
:::
---
[TOC]
## General Information
- Link to this document: [https://md.sigma2.no/lumi-general-course?edit](https://md.sigma2.no/lumi-general-course?edit)
- [Schedule](#Schedule)
- Zoom link: https://cscfi.zoom.us/j/65207108811?pwd=Mm8wZGUyNW1DQzdwL0hSY1VIMDBLQT09
### Next public HPC coffee break
**22.2.23, 13:00--13:45 (CET), 14:00--14:45 (EET)**
Meet the LUMI user support team, discuss problems, give feedback or suggestions on how to improve services, and get advice for your projects.
Every last Wednesday of the month.
[Join via Zoom](https://cscfi.zoom.us/j/68857034104?pwd=UE9xV0FmemQ2QjZiQVFrbEpSSnVBQT09)
## Schedule
<em>All times CET.</em>
<table style="text-align: left;">
<tbody>
<tr>
<td colspan="2" align="center">
<b>DAY 1 – Tuesday, 14.2.2023</b>
</td>
</tr>
<tr>
<td>09:00 </td>
<td>Welcome and introduction<br>
<em>Presenters: Emmanuel Ory (LUST), Jørn Dietze (LUST), Harvey Richardson (HPE)</em>
</td>
</tr>
<tr>
<td>09:10</td>
<td>Introduction to the HPE Cray Hardware and Programming Environment
<ul>
<li>Focus on the HPE Cray EX hardware architecture and software stack.</li>
<li>Tutorial on the Cray module environment and compiler wrapper scripts.</li>
</ul>
<em>Presenter: Harvey Richardson (HPE)</em><br>
<!--
<em>Slide files: <code>/project/project_465000388/slides/HPE/01_Intro_EX_Architecture_and_PE.pdf</code> on LUMI only.</em><br>
<em>Recording: <code>/project/project_465000388/recordings/01_Intro_EX_Architecture_and_PE.mp4</code> on LUMI only.</em>
-->
</td>
</tr>
<tr>
<td>10:30</td>
<td><em>break (20 minutes)</em>
</td>
</tr>
<tr>
<td>10:45</td>
<td>First steps for running on Cray EX Hardware
<ul>
<li>Examples of using the Slurm Batch system, launching jobs on the front end and basic controls for job placement (CPU/GPU/NIC)</li>
</ul>
<em>Presenter: Harvey Richardson (HPE)</em><br>
<!--
<em>Slide file: <code>/project/project_465000388/slides/HPE/02_Running_Applications_and_Tools.pdf</code> on LUMI only.</em><br>
<em>Recording: <code>/project/project_465000388/recordings/02_Running_Applications_and_Tools.mp4</code> on LUMI only.</em>
-->
</td>
</tr>
<tr>
<td>11:20</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>12:00</td>
<td><em>lunch break (90 minutes)</em>
</td>
</tr>
<tr>
<td>13:30</td>
<td>Overview of compilers and Parallel Programming Models
<ul>
<li>An introduction to the compiler suites available, including examples of how to get additional information about the compilation process.</li>
<li>Cray Compilation Environment (CCE) and options relevant to porting and performance. CCE classic to Clang transition.</li>
<li>Description of the Parallel Programming models.</li>
</ul>
<em>Presenter: Alfio Lazzaro (HPE)</em><br>
<!--
<em>Slide files: <code>/project/project_465000388/slides/HPE/01_Intro_EX_Architecture_and_PE.pdf</code> on LUMI only.</em><br>
<em>Recording: <code>/project/project_465000388/recordings/01_Intro_EX_Architecture_and_PE.mp4</code> on LUMI only.</em>
-->
</td>
</tr>
<tr>
<td>14:30</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>15:00</td>
<td><em>break (30 minutes)</em>
</td>
</tr>
<tr>
<td>16:00</td>
<td>Scientific Libraries
<ul>
<li>The Cray Scientific Libraries for CPU and GPU execution.</li>
</ul>
<em>Presenter: Alfio Lazzaro (HPE)</em><br>
<!--
<em>Slide files: <code>/project/project_465000388/slides/HPE/01_Intro_EX_Architecture_and_PE.pdf</code> on LUMI only.</em><br>
<em>Recording: <code>/project/project_465000388/recordings/01_Intro_EX_Architecture_and_PE.mp4</code> on LUMI only.</em>
-->
</td>
</tr>
<tr>
<td>16:30</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>17:00</td>
<td>Open Questions & Answers (participants are encouraged to continue with the exercises if there are no questions)
</td>
</tr>
<tr>
<td>17:30</td>
<td><em>End of the course day</em>
</td>
</tr>
<tr>
<td colspan="2" align="center">
<b>DAY 2 – Wednesday, 15.2.2023</b>
</td>
</tr>
<tr>
<td>09:00</td>
<td>OpenACC and OpenMP offload with Cray Compilation Environment
<ul>
<li>Directive-based approach for GPU offloading execution with the Cray Compilation Environment.</li>
</ul>
<em>Presenter: Alfio Lazzaro (HPE)</em><br>
</td>
</tr>
<tr>
<td>09:45</td>
<td>Exercises: about 30 minutes</td>
</tr>
<tr>
<td>10:15</td>
<td><em>break (30 minutes)</em></td>
</tr>
<tr>
<td>10:45</td>
<td>Advanced Application Placement
<ul>
<li>More detailed treatment of Slurm binding technology and OpenMP controls.</li>
</ul>
<em>Presenter: Jean Pourroy (HPE)</em><br>
</td>
</tr>
<tr>
<td>11:30</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>12:00</td>
<td><em>lunch break (75 minutes)</em>
</td>
</tr>
<tr>
<td>13:15</td>
<td>Understanding Cray MPI on Slingshot, rank reordering and MPMD launch
<ul>
<li>High level overview of Cray MPI on Slingshot</li>
<li>Useful environment variable controls</li>
<li>Rank reordering and MPMD application launch</li>
</ul>
<em>Presenter: Harvey Richardson (HPE)</em><br>
</td>
</tr>
<tr>
<td>14:10</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>14:40</td>
<td><em>break (20 minutes)</em>
</td>
</tr>
<tr>
<td>15:00</td>
<td>Additional software on LUMI
<ul>
<li>Software policy.</li>
<li>Software environment on LUMI.</li>
<li>Installing software with EasyBuild (concepts, contributed recipes)</li>
<li>Containers for Python, R, VNC (container wrappers)</li>
</ul>
<em>Presenter: Kurt Lust (LUST)</em><br>
</td>
</tr>
<tr>
<td>16:30</td>
<td>LUMI support and LUMI documentation.
<ul>
<li>What can we help you with and what not? How to get help, how to write good support requests.</li>
<li>Some typical/frequent support questions from users on LUMI.</li>
</ul>
<em>Presenter: Jørn Dietze (LUST)</em><br>
</td>
</tr>
<tr>
<td>17:00</td>
<td>Open Questions & Answers (participants are encouraged to continue with the exercises if there are no questions)
</td>
</tr>
<tr>
<td>17:30</td>
<td><em>End of the course day</em>
</td>
</tr>
<tr>
<td colspan="2" align="center">
<b>DAY 3 – Thursday, 16.2.2023</b>
</td>
</tr>
<tr>
<td>09:00</td>
<td>Performance Optimization: Improving single-core efficiency<br/>
<em>Presenter: Alfio Lazzaro (HPE)</em><br>
</td>
</tr>
<tr>
<td>09:30</td>
<td>Debugging at Scale – gdb4hpc, valgrind4hpc, ATP, stat<br/>
<em>Presenter: Thierry Braconnier (HPE)</em><br>
</td>
</tr>
<tr>
<td>09:50</td>
<td>Exercises: about 20 minutes</td>
</tr>
<tr>
<td>10:10</td>
<td><em>break (20 minutes)</em>
</td>
</tr>
<tr>
<td>10:30</td>
<td>I/O Optimisation - Parallel I/O
<ul>
<li>Introduction to the structure of the Lustre parallel file system.</li>
<li>Tips for optimising parallel bandwidth for a variety of parallel I/O schemes. </li>
<li>Examples of using MPI-IO to improve overall application performance.</li>
<li>Advanced Parallel I/O considerations</li>
<li>Further considerations of parallel I/O and other APIs.</li>
<li>Being nice to Lustre</li>
<li>Consideration of how to avoid certain situations in I/O usage that don’t specifically relate to data movement.</li>
</ul>
<em>Presenter: Harvey Richardson (HPE)</em><br>
</td>
</tr>
<tr>
<td>11:40</td>
<td>Exercises: about 20 minutes
</td>
</tr>
<tr>
<td>12:00</td>
<td><em>lunch break (90 minutes)</em>
</td>
</tr>
<tr>
<td>13:30</td>
<td>Introduction to AMD ROCm ecosystem and HIP<br/>
<em>Presenter: George Markomanolis (AMD)</em><br/>
<!--
<em><a href="files/LUMIG_training_AMD_ecosystem_11_01_2023.pdf">Slides</a> and
<a href="https://hackmd.io/@gmarkoma/HyAx9y2ci">additional notes and exercises</a></em><br>
<em>Recording: <code>/project/project_465000388/recordings/03_Introduction_to_the_AMD_ROCmTM_ecosystem.mp4</code> on LUMI only.</em>
-->
</td>
</tr>
<tr>
<td>14:30</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>15:00</td>
<td><em>break (30 minutes)</em>
</td>
</tr>
<tr>
<td>15:30</td>
<td>Debugging<br/>
<em>Presenter: George Markomanolis (AMD)</em>
</td>
</tr>
<tr>
<td>15:55</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>16:15</td>
<td>Introduction to AMD Rocprof<br/>
<em>Presenter: George Markomanolis (AMD)</em><br>
</td>
</tr>
<tr>
<td>16:35</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>17:00</td>
<td>Open Questions & Answers (participants are encouraged to continue with the exercises if there are no questions)
</td>
</tr>
<tr>
<td>17:30</td>
<td><em>End of the course day</em>
</td>
</tr>
<tr>
<td colspan="2" align="center">
<b>DAY 4 – Friday, 17.2.2023</b>
</td>
</tr>
<tr>
<td>09:00</td>
<td>Introduction to Perftools - Perftools-lite modules
<ul>
<li>Overview of the Cray Performance and Analysis toolkit for profiling applications.</li>
<li>Demo: Visualization of performance data with Apprentice2</li>
</ul>
<em>Presenter: Alfio Lazzaro (HPE)</em><br>
</td>
</tr>
<tr>
<td>09:40</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>10:10</td>
<td><em>break (20 minutes)</em>
</td>
</tr>
<tr>
<td>10:30</td>
<td>Advanced performance analysis
<ul>
<li>Automatic performance analysis and loop work estimates with perftools</li>
<li>Communication Imbalance, Hardware Counters, Perftools API, OpenMP</li>
<li>Compiler feedback and variable scoping with Reveal</li>
</ul>
<em>Presenter: Thierry Braconnier (HPE)</em><br>
</td>
</tr>
<tr>
<td>11:30</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>12:00</td>
<td><em>lunch break (90 minutes)</em>
</td>
</tr>
<tr>
<td>13:30</td>
<td>Introduction to AMD Omnitrace<br/>
<em>Presenter: George Markomanolis (AMD)</em><br/>
<!--
<em><a href="files/LUMIG_training_AMD_ecosystem_11_01_2023.pdf">Slides</a> and
<a href="https://hackmd.io/@gmarkoma/HyAx9y2ci">additional notes and exercises</a></em><br>
<em>Recording: <code>/project/project_465000388/recordings/03_Introduction_to_the_AMD_ROCmTM_ecosystem.mp4</code> on LUMI only.</em>
-->
</td>
</tr>
<tr>
<td>13:55</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>14:15</td>
<td>Introduction to AMD Omniperf<br/>
<em>Presenter: George Markomanolis (AMD)</em><br/>
<!--
<em><a href="files/LUMIG_training_AMD_ecosystem_11_01_2023.pdf">Slides</a> and
<a href="https://hackmd.io/@gmarkoma/HyAx9y2ci">additional notes and exercises</a></em><br>
<em>Recording: <code>/project/project_465000388/recordings/03_Introduction_to_the_AMD_ROCmTM_ecosystem.mp4</code> on LUMI only.</em>
-->
</td>
</tr>
<tr>
<td>14:40</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>15:00</td>
<td><em>break (30 minutes)</em>
</td>
</tr>
<tr>
<td>15:30</td>
<td>Best practices: GPU Optimization, tips & tricks / demo<br/>
<!-- <em>Presenter: George Markomanolis (AMD)</em><br/> -->
<!--
<em><a href="files/LUMIG_training_AMD_ecosystem_11_01_2023.pdf">Slides</a> and
<a href="https://hackmd.io/@gmarkoma/HyAx9y2ci">additional notes and exercises</a></em><br>
<em>Recording: <code>/project/project_465000388/recordings/03_Introduction_to_the_AMD_ROCmTM_ecosystem.mp4</code> on LUMI only.</em>
-->
</td>
</tr>
<tr>
<td>16:30</td>
<td><b>Exercises</b>
</td>
</tr>
<tr>
<td>17:00</td>
<td>Open Questions & Answers (participants are encouraged to continue with the exercises if there are no questions)
</td>
</tr>
<tr>
<td>17:30</td>
<td><em>End of the course</em>
</td>
</tr>
</tbody>
</table>
## Slides and other material
All slides will be made accessible during the training on LUMI at `/project/project_465000388/slides`.
You need to join the training project via the link you received in the email after you signed up.
For CSC users, this involves setting up a new account via Puhuri.
Some training documents will also be published on https://lumi-supercomputer.github.io/LUMI-training-materials/4day-20230214/
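If you want to check from a login node whether the slides are already there, here is a minimal shell sketch (the path is the one given above; what you see depends on your project membership and on when the slides are uploaded, and the target directory name in the copy is just an example):

```bash
# List the slide directory of the training project
ls -l /project/project_465000388/slides

# Optionally copy everything to your own directory for offline reading
cp -r /project/project_465000388/slides ~/lumi-course-slides
```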
# Q&A
:::danger
Please always ask new questions at the end of the document.
:::
## Day 1
### Organisation & HedgeDoc
#### Ice breaker: Did you manage to join the training project? (Put an "x")
Yes: xxxxxxxxxxxxxxxxxxxxxxxxx
No: xxxxx
In case you had problems, please open a ticket [here](https://lumi-supercomputer.eu/user-support/need-help/account/)
### Other questions regarding organisation or LUMI in general
1. I managed to log onto LUMI, but after a few minutes everything "freezes" and I have to use a different terminal to log in again: is this normal? That has already happened several times since this morning, even using different login nodes.
- It depends. If it freezes forever, it may be your terminal application or an unstable connection. Shorter freezes, which can still last 30 seconds or more, are unfortunately a common problem on LUMI at the moment, caused by file system issues for which the technicians still haven't found a proper solution. Only two of the four login nodes are operating right now, I think (one is down for repair and one crashed yesterday evening and is not up again yet, at least not when I checked half an hour ago), so the load on the login nodes is also a bit higher than usual.
- uan02 seems to work a bit better
2. I do not have access to the training project yet. I opened a ticket [LUMI #1509] and it seems that it is not straightforward to activate it using my UFZ account.
- We will discuss this further in the ticket. It is more than a UFZ problem, as the fallback mechanisms don't seem to work for you either.
- Okay... Thanks
3. Will we find material in the /scratch/project_465000388 folder?
- The location of the files will be posted on here and later appear in https://lumi-supercomputer.github.io/LUMI-training-materials/4day-20230214/schedule/
4. Is LUMI up? I am not able to connect at all.
- One of the login nodes has crashed but is unfortunately still behind the generic lumi.csc.fi address, so you may land on it. Try lumi-uan01.csc.fi or lumi-uan02.csc.fi directly.
5. When I try `cd /project/project_465000388/slides/` I get `-bash: cd: project_465000388: Permission denied`.
- Do you use the account you got for the training project (in case you had a CSC account earlier)? You should see the project listed when you run `groups` (a short sketch follows after this question list).
- not in my projects list!!! I used my CSC account
- You have to use the Puhuri account you got when you set up the training project (can you please provide the link?).
- (In my case, I've tried to associate my CSC user with the training project and did not manage, because MyAccessID was not valid at the national level due to a validation problem. How can I still associate it? I have sent an email to your services and got no reply so far.) -- another user
- Here is the link: https://puhuri-portal.neic.no/invitation/5179d19292e94f458abab1a6c3117300/
- still can't reach the training project!!!
6. Which is the link to puhuri?
- You'll have to check your mail as it is a personal link.
7. Would it be possible to access the recorded videos for the sessions?
- They will be made public in the project but it might take a few days. We do some processing and cut them in pieces according to the talks. They should only be used by LUMI users though due to copyright reasons with some of the vendor-provided material.
8. Is LUMI planning to introduce MFA at some point in the near future?
- No such plans for SSH so far; it is already complicated enough and we already have enough "cannot log on" tickets... But identity providers may require it independently of LUMI when you log in to MyAccessID.
9. I read about a Lumi partition for "visualization", with Nvidia GPUs, is that meant for instance to use Jupyter notebooks?
- That service will be offered later via Open OnDemand. No date set yet, but hopefully before the summer. The nodes have only just become available and still need to be set up. Be aware though that you have to use Jupyter in a proper way or other people can break into your account via Jupyter, and that it is not meant for large amounts of interactive work, but to offer an interface to prepare batch jobs that you can then launch to the cluster. LUMI-G is the most important part of the LUMI investment so it is only normal that getting that partition working properly has the highest priority.
- That makes perfect sense
- Looking forward to it
10. There are no slides in `/project/project_465000388/slides`. Is that intentional?
- Yes. They will be uploaded during/after each session.
- Ok, thanks!
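A short sketch for question 5 above: checking from the shell whether the account you are logged in with is actually a member of the training project before changing into its directory (commands as mentioned in the answer):

```bash
# Show the Unix groups (= LUMI projects) of the account you are logged in with
groups        # alternatively: id -Gn

# If project_465000388 is listed, this should no longer give "Permission denied"
cd /project/project_465000388/slides/
```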
### Introduction to HPE Cray Hardware and Programming Environment
:::info
Slides will be made available after the session on LUMI at `/project/project_465000388/slides`.
:::
11. Once a job starts on a particular node, can we get direct access to this node (I mean while the job is running, can we interact with it, for monitoring purposes for example)?
- https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/interactive/
- See the sessions on Slurm. Currently only with `srun`, not with `ssh`, as `srun` is the only command that can guarantee your session ends up in the CPU sets of your job. (A minimal sketch is given after this question list.)
12. Why was LUSTRE chosen as the FileSystem? What others were considered?
- Almost all really big clusters run Lustre. Spectrum Scale is very expensive and BeeGFS probably doesn't have the maturity for such a cluster. And it is actually not a choice of CSC but a choice made by vendors when answering the tender. HPE Cray only offers Lustre on clusters the size of LUMI, with their own storage system which is actually a beautiful design.
- There is an ongoing discussion in the supercomputing community whether the whole concept of a global parallel file system will work in the future. There might be a scale at which it simply does not work anymore.
- I agree. And it is part of the reason why the main parallel file system is split into four. But there is currently no other affordable and sufficiently scalable technology that can also run on affordable hardware. I know a flash-based technology that claims to scale better, but just the hardware cost would be 10 times that of the current main storage. There is a reason why we bill storage on the flash file system at ten times the rate of the disk-based storage, as that is also the price difference of the system. And HPE is working on local buffers that are rumoured to be used in El Capitan, but even that is still a system that integrates with Lustre. Google for "Rabbit storage HPE" or something like that.
13. Can you use MPI MPMD to run one program on LUMI-C and another on LUMI-G including communication between the two programs?
- We have sessions on SLURM and one has MPMD in the title. Please ask such questions in the sessions about that topic. This session is not for answering all questions about all topics in the course.
- Ok, but the question is not about how but if it is possible
- Yes, it is possible, but not so well tested yet. We are interested in your experiences if you try this! (A sketch of a heterogeneous job script is given after this question list.)
- Great. Thanks!
- There is a known problem with the scheduler if you do this across partitions though. In trying to make life easier for "basic" users a decision was taken that makes MPMD more difficult. So the LUMI-C + LUMI-G scenario is currently difficult basically because those jobs have difficulties getting scheduled.
- That's too bad. Are there plans to improve it?
- If you can convince the sysadmins and technical responsible of the project... It would mean that every user has to change the way they work with LUMI so I'm afraid it is rather unlikely and will require a lot of discussion. I'm in favour though as this is also the model EuroHPC promotes via the DEEP series of projects.
- Indeed and it is one of the advantages of having two or more separate partitions.
- If you look at the EuroHPC supercomputers, they are all designed with different specialised partitions. The problem is probably that a select group of researchers and compute centres directly involved in the projects that explored this design are very aware of this but many other centres or in the case of LUMI also other groups involved in the decision process on scheduler policies are not enough aware of this way of designing applications. We do see it used by climate scientists already with codes where simulation, I/O and in-situ visualisation are collaborating but different programs, but I'm only aware of one project which asked this for LUMI-C and LUMI-G so my answer is based on what the technical responsible of the LUMI project answered about the problems that can be expected.
- Ok. Thanks a lot for the answers. I will try it in the near future so perhaps you will see another request soon :) In my case it is for multiscale molecular dynamics simulations (computational chemistry).
- I've added the request to the items to be discussed with sysadmins and technical responsibles of the project.
- Thanks!
14. Is there any difference between Trento and Milan that the user should care about?
- The only difference I know is the link to the GPUs. From the ISA point-of-view they are the same.
- The main difference seems to be in the I/O die: all 128 lanes coming out of the chip now support xGMI/Infinity Fabric, rather than only 64 of them while the other 64 only supported PCIe. I wouldn't expect many more changes, as this is a really low-volume part, only used in HPE Cray systems with the MI250X.
15. Is it possible to use RStudio Server (for interactive programming with R) on LUMI (probably as a singularity container)?
- Singularity is installed, so if you have a container, it should run.
- It might also come in Open OnDemand, a service that is still under development, but in that case it might be more to simply prepare data for a job that would then be launched or to postprocess data.
16. When will the recordings be available?
- Some days after the course. We don't have a pipeline yet to upload them immediately after the training.
- It takes some postprocessing and this requires time. We are all busy with the course so this is basically evening work and work for after the course. The place where they are stored will be announced in https://lumi-supercomputer.github.io/LUMI-training-materials/4day-20230214/schedule/
17. It seems that the On Demand service will provide things like RStudio, Jupyter, etc., but most users do not need the "basic" RStudio or Jupyter, they need a lot of additional packages with them: how will that be managed?
- Not clear as we don't have the personpower to install everything for everybody so we focus on the main components that have a lot of users. I guess local support teams will have to develop containers that we can then fit in the setup.
- Will this On Demand service allow users to run their own containers with all they need inside? (because nobody really uses bare Jupyter or Rstudio, do they?)
- We cannot answer these questions yet as the service is not being set up by us but offered by one of the LUMI partners (in this case CSC) who will do the main setup.
18. What is the typical daily KWh of LUMI?
- It has rarely run at full load I think but from what I remember its design power is around 6 MW.
19. Is there a way for users to get accurate figures about the actual electrical power consumption of particular jobs on CPUs?
- Not at the moment, and I doubt this will appear soon. It is also largely impossible, as measurements are at the node level, so it doesn't make sense for shared nodes. And on exclusive nodes you only get data for the node as a whole, so if you use only one core you'd likely still see 80 W or 100 W, basically because of all the power consumed by the I/O die and network interface, even when idle.
- even at that level electrical consumption information would be useful, to compare several simulations, etc.
- I don't know what your experiences are with it, but I have used it on one PRACE cluster and the results were totally worthless as a comparison, as there was too much background power consumption. So I don't think this has a high priority for LUMI. Profiling an application and getting an idea of how well it uses the cache hierarchy and how much memory bandwidth it requires would be a much better comparison. But unfortunately even that is limited on LUMI at the moment, I believe: hardware counter monitoring by users had to be turned off due to security problems in the Linux kernel.
- I was thinking about comparisons between a single run on LUMI using thousands of CPUs vs. a similar run on a smaller machine with fewer CPUs over a longer time.
- I hope you realise how much power is consumed by, e.g., the network. Every switch blade in LUMI has a power consumption of up to 250 W (and is therefore also water cooled), about as much as a processor socket, so any measurement would still have a very large error margin. And in fact, the answer is obvious: the run with fewer CPUs on a smaller cluster will always consume less, assuming the cluster has a similar design with respect to efficiency, as with more CPUs for the same problem you always lose parallel efficiency, and the bigger the network becomes the more power it consumes. The latter is also nicely shown in the Green500 list: you'll see bunches of similar machines together there, with the smaller one always on top since the network power is less. That is why the Frontier TDS (which is not Frontier but just its test system) is ahead of Adastra, Frontier itself and LUMI in that list, even though these are all systems with the same design. I guess the reason why Frontier is above LUMI in that list is probably that they seem to have had access to a different version of some software for their Top500 run, as they also get better scalability than LUMI despite using the same node and network design.
20. I see that there are plans for a Container Orchestration Platform - LUMI-K. What will be the purpose of this partition?
- It will likely never appear due to a lack of personpower to implement the service. The idea was to have a platform for microservices (the Galaxys etc. of this world).
21. What is the average waiting time until a submitted Slurm job starts on LUMI [I understand this may vary depending on the requested RAM/time/etc., but is it a matter of hours or days...]? How is the priority of jobs determined?
- Generally speaking, LUMI, like many HPC clusters, is optimized for throughput and not short waiting times. It is not really meant for "interactive" use like this. That being said, there are special queues for short interactive jobs and debugging, where the waiting time is short, but you cannot run large jobs there.
- We don't know ourselves what goes in the priority system. Currently the waiting time is often very low but that will change when LUMI becomes used a lot more.
- The maximum walltime in the standard queue is 2 days, meaning that if your job has top priority (for example, if you have run very little in your project), it will start within 2 days. It will often be a lot faster than that.
- Is it possible to get a walltime of more than 2 days for specific jobs that are expected to need more time?
- Unfortunately not. You have to use so-called "checkpointing", i.e. saving intermediate results to disk so that your job can be restarted. Even if you have a lot of data in RAM, this should be possible using e.g. the flash file system. Also, given the general instability seen on LUMI now, it is not advisable to try to run very long jobs, as hardware may break... This is not necessarily a "fault" in the LUMI design: as clusters grow larger, with many components, some nodes in your job will eventually fail if you run e.g. a 1000-node job. (A minimal checkpoint/restart job script sketch is given after this question list.)
- Is it possible to request an extension for an already running job if it is expected to need more time to finish?
- No.
22. Is it possible to provide some hints on containers and their possible role on LUMI?
- We will discuss containers tomorrow. We expect more and more workloads to use containers, but be aware that containers need to be optimized/adapted to run efficiently (or at all) on LUMI. Usually, MPI is the problem.
23. When will GCC 12 become available on LUMI?
- In a future version of the Cray programming environment. We do not know the exact date yet. Which special feature of GCC 12 do you need?
- There is a chance that CPE 23.02 will be installed during the next maintenance period, as it contains some patches that we really need in other compilers as well, and that release contains GCC 12.2.
24. I still don't have access to the training project?
- Did you open a ticket? https://lumi-supercomputer.eu/user-support/need-help/account
- yes
- You will get an answer per mail
- ok
25. Which visualization software will be available on the nvidia-visualization nodes? ParaView? VisIT? COVISE? VISTLE?
- Partly based on user demand and partly based on what other support organisations contribute, as we are too small a team to do everything for everybody. The whole LUMI project is set up with the idea that all individual countries also contribute to support. More info about that in a presentation tomorrow afternoon. Just remember that the visualisation team at HLRS is as big as the whole LUMI User Support Team, so we cannot do miracles.
26. Looking at the software list, is distributed/shared computing (e.g. https://en.wikipedia.org/wiki/Folding%40home) supported, or has it been tested?
27. ...
28. ...
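Minimal sketch for question 11 (interacting with a running job via `srun`, as described in the linked documentation; the job ID is a placeholder and the exact options may differ from the LUMI recommendation):

```bash
# Find the job ID of your running job
squeue --me

# Open a shell inside the existing allocation, e.g. for monitoring with top
# (123456 is a placeholder job ID; --overlap lets this step share the
# resources already in use by the job's own steps)
srun --jobid=123456 --overlap --pty bash
```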
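For question 13, a heavily hedged sketch of what a heterogeneous Slurm job combining a LUMI-C and a LUMI-G component could look like; the partition names, resource counts and executable names are assumptions, and, as noted in the answer, such cross-partition jobs are not well tested and may be hard to get scheduled:

```bash
#!/bin/bash
#SBATCH --account=project_465000388
#SBATCH --partition=standard        # CPU component (partition name is an assumption)
#SBATCH --nodes=2
#SBATCH hetjob
#SBATCH --partition=standard-g      # GPU component (partition name is an assumption)
#SBATCH --nodes=1
#SBATCH --gpus-per-node=8

# Launch both programs as components of a single heterogeneous MPI job;
# ./cpu_part and ./gpu_part are placeholders for your own executables
srun ./cpu_part : ./gpu_part
```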
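For question 21, a minimal sketch of a job script that stays within the two-day walltime limit and is structured around checkpoint/restart; the partition name, application, checkpoint file and resubmission logic are illustrative assumptions:

```bash
#!/bin/bash
#SBATCH --account=project_465000388
#SBATCH --partition=standard      # partition name is an assumption
#SBATCH --time=2-00:00:00         # 2 days, the maximum walltime mentioned above
#SBATCH --nodes=1

# Restart from the latest checkpoint if one exists, otherwise start fresh;
# ./my_simulation and checkpoint.dat are placeholders for your own setup
if [ -f checkpoint.dat ]; then
    srun ./my_simulation --restart checkpoint.dat
else
    srun ./my_simulation
fi

# If more time is needed, resubmit this script so it picks up the checkpoint
# (illustrative; adapt to your own workflow):
# sbatch "$0"
```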
:::danger
Break until 10:55 CET / 11:55 EET
:::
:::info
**Please write your questions above this note**
:::
###### EOF