Info and important links
- Do not put names or identifying information on this page
Days 1 and 2
HackMD archived here: https://hackmd.io/@AaltoSciComp/ARCHIVEintroWinter2022
Day 3, 4 Feb 2022: Introduction to HPC
Today: parallel computing and scaling up to LUMI
https://scicomp.aalto.fi/training/scip/winter-kickstart/
Questions from yesterday
- Thank you for clarification. How to choose the volume of memory and GPU/CPU usage for JupyterHub? After logging in I cannot see any choice for these parameters.
- Currently there is no option for GPUs. The GPU resources are so highly requested that we need to prioritize non-interactive usage, as that utilizes the GPUs to the greatest extent. There is no CPU selection for a similar reason. The memory limit should be chosen based on the size of the data you are processing.
- Not sure if other people had the same problem, but I get an error when using `lfs find directoryname`: "cannot get lov name: inappropriate ioctl for device". What am I doing wrong?
Other resources
Icebreaker: Do you think your work could be parallelized? Could it be run side by side or with multiple processors?
- All the data points in my research code can in principle be calculated independently in parallel. At the moment I'm only using core-level vectorization and OpenMP.
- Sometimes I have to calculate aggregate statistics/estimate models on various subsets of the data.
- Right now, due to my overall ignorance of how and where to parallelize, I fear I cannot even answer this question! I mostly run ML pipelines to analyse data, though, and do not really write software.
- I too am going to run two ML pipelines for feature detection and am not quite sure whether that would require parallelization.
Yesterday was:
- useful: oooooooooo
- not useful: o
- too fast: o
- too slow: o
- just right: o
- I would recommend this course to others: ooooooooo
How should this course be made better:
- It should be integrated in a full course ordered by the skill level: shell -> this course -> Git -> Software design
- good idea! We sort of do them separately because otherwise it becomes too long, but check out https://hands-on.coderefinery.org/ (and CR may try to advertise them together more).
Array jobs
https://scicomp.aalto.fi/triton/tut/array/
- Question: What happens if I add multiple srun commands to the same script?
- Each srun creates a separate job step; each appears as its own entry in the accounting history, with resources tracked separately. The steps still run one after another.
- Q: Will the above wait for the completion of the previous step?
- No. Array jobs are parallel and independent of each other; they do not run in series.
- Sorry, there is a misunderstanding. The question is about the multiple srun commands in the same script.
- All srun steps within one array task are executed one by one within that particular task, independently of what is going on in the other array tasks (see the sketch below).
- Overall, an array job is just a number of jobs that run from the same Slurm batch script template; they differ only in $SLURM_ARRAY_TASK_ID.
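A minimal sketch of what this looks like in practice, assuming two hypothetical Python scripts; each array task runs its two srun steps in order, while the tasks themselves run independently:

```bash
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem=2G
#SBATCH --array=0-9
#SBATCH --output=array_%A_%a.out   # %A = master job ID, %a = task index

# Both steps run one after the other *within* each array task;
# the ten array tasks themselves run independently in parallel.
srun python preprocess.py "$SLURM_ARRAY_TASK_ID"   # hypothetical script
srun python analyze.py "$SLURM_ARRAY_TASK_ID"      # hypothetical script
```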
- Q: How long should each parallel job in an array take to be useful instead of just running in series?
- Depends on the system. I would say ~1 hour each, at least.
- You can always adapt your workflow: make one array task with enough `srun` steps that it takes at least an hour. If the question is "an array job with N tasks vs N sbatch submissions": when you have a series of simulations, like the same binary with just different inputs (i.e. the same #SBATCH options, running environment, etc.), it always makes sense to put them into an array. Array jobs (i) make Slurm's work more efficient, since the controller checks and sets up all the parameters only once and then copies them to the other array tasks, and (ii) are easy to follow from the user's point of view with squeue/scancel etc.
- Are %A and %a part of the Slurm syntax?
- Correct: in output file names, %A is replaced by the master job allocation ID and %a by the array task index. `man sbatch` will tell you more.
- Q: Will the array jobs queue separately? Or will they wait until all of them can launch simultaneously?
- Array jobs queue independently. For arrays with a large number of tasks, it is common that a few tasks run at a time while the rest are pending.
- Q: When you request memory for X array jobs, will the amount of memory be divided among the array jobs, or does each array job get X memory?
- Each gets the same memory requirement, so there is no division. All other #SBATCH requirements are identical (basically copy-pasted) for all individual array jobs. So if you say `#SBATCH --mem=2G` and `#SBATCH --array=1-10`, you won't get a 20G memory requirement; each of the 10 jobs gets a 2G memory requirement.
- Consider the submit script a template: with `--array=...`, that number of jobs will run using the same template.
- Q: What is the maximum length of an array, and is there a maximum for the argument?
- Depends on the cluster; in the 10000s on Triton, I think.
- But please test with smaller numbers first :) If you are going to run 10K jobs and each job lasts 2 minutes, that is not good for you or for others. It is better to pack iterations or parameters so that one job lasts a few hours (see the sketch after this list). Otherwise you will spend most of the time queueing and only a few minutes running.
- If you mean the number of array tasks in one array: `scontrol show config | grep -i array` gives MaxArraySize = 200001, which means `--array=0-200000` is OK but `--array=0-200001` will throw an error (Triton example).
- Regarding a maximum for arguments: if the question is how many arguments you can give to the binary you run, that is not up to Slurm but to the Bash/Linux setup; see `getconf ARG_MAX` for the command-line length limit in bytes. It is huge anyway.
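As mentioned above, packing many short iterations into fewer, longer array tasks avoids wasting queue time. A minimal sketch, assuming 1000 hypothetical parameter runs packed 10 per array task (`simulate.py` and the chunk size are made up for illustration):

```bash
#!/bin/bash
#SBATCH --time=02:00:00
#SBATCH --array=0-99

# 1000 parameters total, 10 per array task -> 100 tasks instead of 1000 jobs.
CHUNK=10
START=$(( SLURM_ARRAY_TASK_ID * CHUNK ))
for (( i = START; i < START + CHUNK; i++ )); do
    srun python simulate.py --param "$i"   # hypothetical program and flag
done
```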
Type-along: your first array job, until xx:31
I managed to do the sample:
- yes: ooooooooo
- no: o
- I did not try: oo
Array exercises until xx:00, then break until xx:10
Helsinki:
There is again the University of Helsinki "type-along" session in Breakout Room 1, paying attention to the differences between Triton and Turso.
- Where do I download the pi.py file?
- If you did the exercise yesterday, it is already in the git repository you cloned; the file is in the subfolder ./hpc-examples/slurm/pi.py. Otherwise, clone the repository as shown below.
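If you don't have the repository yet, cloning it should get you the file (assuming the course's AaltoSciComp/hpc-examples repository on GitHub):

```bash
git clone https://github.com/AaltoSciComp/hpc-examples.git
ls hpc-examples/slurm/pi.py   # verify the file is there
```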
I managed to do exercise 1:
Exercise 2: Do you think your problem could utilize an array-structure? Are you interested in it but you're not certain how to split your data / parameters? Answer/ask below:
- This has certainly been the direction for me. Each data point takes 1h or more depending on the case, and running a full analysis takes a week or so in series. I have now been changing my C++ code to handle and organize data points independently and to save/load data independently as well. I'm not sure how to combine algorithm-level parallelism with the embarrassing parallelism (running jobs in parallel) at the moment.
- If needed, help is available for Aalto people (either garage or even an RSE project), just in case.
continued
Parallel
https://scicomp.aalto.fi/triton/tut/parallel/
- What is the difference between CPUs, cores, and nodes?
- In Slurm terms, "core" == "cpu", and one physical processor (with multiple cores) is called a "socket"; that's the shorthand we use.
- Though in hardware design, they would say one CPU == one chip, and one chip can have multiple computing cores in it.
- Node == "one discrete computer's hardware"
- Indeed, I used the term "CPU" in a bit of an ambiguous way. Usually our servers have two actual CPUs, physical chips, in two sockets. Each of these CPUs has multiple logical processors, a.k.a. cores, that can do calculations; they are often also referred to as CPUs. Slurm counts these cores as CPUs for `--cpus-per-task`. (Simo)
- Hi, embarrassingly I missed the MPI part of the video; is there a way to get back to it right now on Twitch?
- What is the difference between OpenMP and MPI?
- OpenMP is a standard (and an implementation) of the shared-memory programming model, one of several; MPI, the Message Passing Interface, is a standard developed for distributed-memory programming. MPI has several implementations; the most often used is OpenMPI (do not mix it up with OpenMP), but there are others, like Intel MPI, MVAPICH, etc.
- The names are similar, but that is just an unfortunate choice by the standard developers. The function is as described in the comment above.
- Do all Triton nodes share the same basic processor architecture, so that e.g. Intel core-level vectorization should work?
- Triton has several CPU architecture generations; they are all x86_64 but differ in details. Yes, all support vectorization; the oldest we have are the pe[] and c[] nodes with Intel Xeon E5-2680 v3 (Haswell, AVX2 instructions).
- You can choose a specific processor architecture with the `--constraint=X` option. The available features are listable with `slurm features`; for example, use `--constraint=avx512` if you need AVX-512 instructions (see the sketch below).
- In total: https://scicomp.aalto.fi/triton/overview/ see in particular 'Arch' column
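As a sketch of the above: plain Slurm tools can also list the feature tags per node, and a job can then be constrained to one of them (feature names vary per cluster; pi.py is the course example script):

```bash
# List the feature tags each node advertises (%f is the features field).
sinfo -N -o "%20N %f"

# Restrict a job to nodes with a given feature, e.g. AVX-512:
srun --constraint=avx512 python hpc-examples/slurm/pi.py 1000000
```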
- So do I understand this correctly? OpenMPI would take you across nodes, but OpenMP is for distributing over the cores on the same node?
- OpenMP is for distributing work over the cores on the same node
- MPI allows programs to communicate across nodes
- So yes :)
- OpenMPI is one implementation of MPI.
- OpenMP, a standard for doing shared-memory calculations, sounds unfortunately similar to OpenMPI, which is a popular MPI implementation. The sketch below shows how the resource requests differ.
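To make the distinction concrete, here is a hedged sketch of the two resource-request patterns, as two separate batch scripts (the binaries are hypothetical placeholders):

```bash
#!/bin/bash
# OpenMP (shared memory): one task with many cores on a single node.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./openmp_program   # hypothetical OpenMP binary
```

```bash
#!/bin/bash
# MPI (distributed memory): many tasks that may be spread across nodes.
#SBATCH --ntasks=16
srun ./mpi_program      # hypothetical MPI binary; srun starts 16 ranks
```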
- If I have a Mathematica code on my computer that takes a long time to run, how can I run it on a cluster?
- You can probably use the parallel functions from Wolfram's documentation together with `--cpus-per-task` to run your program in parallel; it really depends on the program. It might be a good idea to contact the RSEs if you need help with the implementation.
- For usage in Triton, see our Mathematica page. We'll need to add info on parallel runs there.
Break until xx:04
From laptop to LUMI - CSC services for researchers
Slides: https://github.com/AaltoSciComp/scicomp-docs/raw/master/training/scip/CSC-services_022022.pdf
Question: I have used some CSC services
- Yes: xxxxxxx
- No: xxxxxxxxxxxxx
- I am going to: x
Questions to Jussi & questions about CSC:
- Does CSC provide RSE-like support for free? For anyone, or just CSC users?
- Well, most of the services are funded by the Ministry of Education and Culture, and hence most support and services are free for academics (people from universities) in most use cases. "Deep support", i.e. something that takes days, can be provided in special cases only, e.g. if there is specific funding for it (like an EU development project related to X) or if it results in a new service or will be generally/widely useful for other users too. Anyway, if you have questions, please ask!
- Can you send us the link for the CSC course? I cannot find it on Google.
- Training materials by CSC in GitHub: https://github.com/csc-training/
- Training materials on Docs CSC: https://docs.csc.fi/support/training-material/ (+ tutorials https://docs.csc.fi/support/tutorials/)
Break until xx:00
Then GPU.
GPU
https://scicomp.aalto.fi/triton/tut/gpu/
- How do you draw a monster for gaming?
- Usually they are models constructed from voxels (3D pixels).
- How much faster should a job be on GPU compared to CPU (to be eligible for GPU)?
- That depends a lot on the algorithm, implementation, type of GPU card, amount of video memory, and other factors; some code may get a 50x speed increase, some 2x, some none.
- Current example: https://scicomp.aalto.fi/triton/tut/gpu/#simple-tensorflow-keras-model (a minimal batch-script sketch is below)
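For reference, a minimal GPU batch script on Triton might look like the sketch below (the module name and example script are assumptions; the tutorial page above has the authoritative version):

```bash
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --gres=gpu:1        # request one GPU (any type)
#SBATCH --mem=8G

nvidia-smi                  # show which GPU was allocated
module load anaconda        # assumed module that provides TensorFlow
python tensorflow_mnist.py  # hypothetical Keras example script
```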
- If the speed is the same as on CPU, but GPU is more expensive, does it make sense to run on GPU?
- Nope. The CPU way is cheaper; if there is no difference in performance, CPUs are also easier to get.
- Is there some price comparison for them to know how much faster GPU should be to make it worth it?
- One NVIDIA Tesla A100 costs us 7-8 k€; a box with two modern Intel CPUs and 128 G of memory costs 5-7 k€. But the A100 theoretically provides 9.6 TFLOPS (double precision), while a two-socket node with 40+ CPU cores in total is ~2.5 TFLOPS, depending on the type. These are very rough (= theoretical) numbers; live benchmarks would have to be run to see realistic FLOPS per euro.
- Can you also use newer versions of, for example, TensorFlow than there are in the anaconda module at the moment?
- Yes, you can install your own conda environments with any versions of things you want (a hedged sketch is below). We haven't covered it much so far, but the Python page has hints:
- https://scicomp.aalto.fi/triton/apps/python/
- You can also let us know and we can install a newer version for you
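A hedged sketch of creating your own environment with a newer TensorFlow (the module and package names are assumptions; the Python page linked above has the authoritative instructions):

```bash
module load miniconda                    # assumed conda module on Triton
conda create --name tf-new tensorflow   # new env with a newer TensorFlow
source activate tf-new
python -c 'import tensorflow as tf; print(tf.__version__)'
```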
- How do other clusters monitor their GPU performance?
Exercises (done as demos/type-along)
https://scicomp.aalto.fi/triton/tut/gpu/#exercises
- Why do I get this error?
  ```
  srun -M ukko -p gpu nvidia-smi
  srun: job 270697145 queued and waiting for resources
  srun: job 270697145 has been allocated resources
  srun: error: task 0 launch failed: Slurmd could not execve job
  ```
- That is strange; I can't reproduce it now (Pekko Metsä). If you run `module purge`, do you still have the same problem?
- yes
- Strange. If you could visit our HPC Garage on Monday, we'd have a closer look.
Announcements
Feedback
Note: registered participants will also receive a form for anonymous feedback. It helps us a lot to receive your feedback, because we can make future versions of this course even better! <3
Did you feel the course was engaging, even though it was online:
- better than in-person: oooooooo
- same as in-person: ooo
- worse than in-person: o
I would have:
- attended hybrid in a lecture room: o
- online-only was good enough: oooooooo
- preferred lecturers to be in lecture hall: o
One good thing about the course:
- you didn't make us feel stupid about our stupid questions
- the fact that it can be done remotely
- I really liked that this was divided to several shorter days, and not one or two long days. It’s easier to concentrate this way and get more out of each part.
- Getting to use the system during the lectures
- You guys are fantastic! Thank you!
One thing to improve for next time:
- One longer break would be nice
Favorite lesson of the course:
Lesson that needs most improvement:
Lesson that could be added: