# ARCHIVED 01/02/2021 - Intro to High Performance Computing

:::danger
## Infos and important links

* To watch: https://aaltoscicomp.github.io/scip/
* To ask questions and interact (this document): https://hackmd.io/@AaltoSciComp/IntroWinter2021
    * *click on the "eye" icon on the top right corner and write at the bottom, above the ending line. If you experience lags, switch back to "view mode"*
* Questions that require one-to-one chat with our helpers (e.g. an installation issue):
    - Finland: ZOOM LINK SENT TO REGISTERED PARTICIPANTS
* Program (Mon-Tue) https://scicomp.aalto.fi/training/scip/winter-kickstart/
* Previous session HackMD archived at: https://hackmd.io/OcTFy7sTQ4uOjuMWtkU3Ow
* If you have just one screen (e.g. a laptop), we recommend arranging your windows like this:

```
╔═══════════╗ ╔═════════════╗
║   WEB     ║ ║  TERMINAL   ║
║  BROWSER  ║ ║   WINDOW    ║
║  WINDOW   ║ ╚═════════════╝
║   WITH    ║ ╔═════════════╗
║    THE    ║ ║   BROWSER   ║
║  STREAM   ║ ║  W/HACKMD   ║
╚═══════════╝ ╚═════════════╝
```
:::

# 01/02 - Intro to High Performance Computing

## Icebreaker

- What is your background and what brings you here? *Please write your answer here below*
    - I am a staff scientist at Aalto and I work with brain imaging data. I am here because I would like to learn more about HPC to process large amounts of brain images.
    - I am a research assistant in the department of mechanical engineering working on CFD simulations of combustion engines. I have been using Triton for some time. I am here to see if I have any holes in my knowledge of HPC.
    - I am a research assistant and master's student at EEA. I have used the OpenPBS scheduler before but I am new to Triton.
    - Research assistant at NBE. I needed distraction from master's thesis >_< but for real this might be useful in the future
    - Doctoral candidate at Aalto in computer science (PML group). I'm here to learn more about HPC to process my bioinformatics/machine learning workloads.
    - Postdoc at Aalto department of applied physics.
      Working in the NEMS group. I'm looking to see how to use Triton for simulations and thinking about whether it will be useful for our simulations.
    - Doctoral student at Aalto in the department of applied physics. I would like to know more about GPU computing for running GPU based simulations.
    - PhD researcher at TUNI and interested in learning to use computing clusters with dedicated GPUs for deep learning research
    - Master Student at Aalto and a research assistant at Helsinki university. Here to understand how the scientific computing resources can be used for my thesis work.
    - Postdoc at UH. Here to increase my knowledge on HPC programming in order to improve my scripts and productivity
    - Master's student, computer science, interested in scientific and high-performance computing
    - Postdoc at UH. I want to learn how to use the clusters.
    - Service staff at Aalto IT Services. Want to learn the basics of Triton and HPC to better support researchers in Aalto.

# Questions

*Ask anything, write always at the bottom* (*please include your organization in the question*)

- This is a demo question or is it?
    - This is an answer/comment
- This is a question
    - Someone will answer
        - You can even have a follow-up question
- .
- Joining via Twitch. Are you going to record this? Unfortunately, I don't have time today, but this sounds super useful.
    - The Twitch stream is recorded and will be publicly published. The Zoom rooms are not and remain private.
        - Sure, but better than nothing :-) Thank you so much!
- Does my default shell have to be bash? Or is it enough if I just change it to bash
    - The module system (that will be introduced soon) works with Bash. I prefer to set the default shell to bash on the cluster, but if you know what you are doing you can use other shells.
        - I don't :D updating it is --> :100:
- Out-of-the-box ZSH works?
    - Switching to bash is better; there is documentation [here](https://scicomp.aalto.fi/triton/tut/connecting/#change-your-shell-to-bash-aalto) and most likely Richard and Simo will go through it.
    - There have been some quirks with zsh, like not sourcing the same startup files to make modules work. It may work but we can't support problems during this workshop (during garage tomorrow, we can!)
- **If you are at Aalto please refer to this page if you are asking for help: https://scicomp.aalto.fi/help/. We run a daily zoom garage from 13:00 to 14:00, you can come around and ask about anything not just HPC/Triton (e.g. code errors, data management, open science, ...) ([name=enrico])**
- **Aalto SciComp Chat: https://scicomp.zulip.cs.aalto.fi**
- Are we gonna do something together or should I just move on, on my own
    - We are connecting to the cluster together. If you are unable to connect, join the zoom for help. After that we will load modules together and submit some jobs with SLURM. Exercise coming soon
        - Oh damn the zoom wasn't the main session.
            - We are watching the stream together. See top of this document (To watch: https://aaltoscicomp.github.io/scip/)
- I had previously run into this issue when running tensorflow on kosh.aalto.fi: "The TensorFlow library was compiled to use AVX2 instructions, but these aren’t available on your machine". I was trying to use tensorflow-gpu I guess.
    - Was this a warning or error? i.e. did the code still run?
    - ~~This is related to the different types of CPUs available on Triton. Tensorflow was compiled so that it can run on any of them, but some newer models support faster instructions. The results will be the same.~~ Sorry, I misread. It is compiled for a newer CPU than the one you were using. The code may not run on a machine that does not support the instructions used in the compiled program.
    - If the code does not run, the solution is to use a newer computer or recompile the program for your hardware.
      Such as the newer computing nodes of Triton.
    - Kosh certainly does not have GPUs, so it is not a place to run GPU codes.
    - Tensorflow binaries installed via pip / anaconda have to be compiled with a certain level of CPU optimization. Newer versions are compiled with AVX2 (vectorization) optimizations. These only work on newer machines where AVX2 instructions are present. One shouldn't run tensorflow on kosh, but in theory you could run an older version of tensorflow without AVX2 optimizations. In Triton we're removing our old nodes that do not have AVX2 instructions so in the future all software will be built with AVX2 optimizations.
        - Yeah just to be clear, this issue was encountered in kosh and not triton.
    - Kosh should not be used for computations as it is just a login node

## Hands-on session

:::success
Hands-on session until xx:45: https://scicomp.aalto.fi/triton/tut/connecting/
- Go to Zoom session, ask about local problems there
- Connect to cluster
- Verify shell
- Take a short break if necessary
:::

## Applications

https://scicomp.aalto.fi/triton/tut/applications/

- What is the main difference between Singularity and Docker?
    - Docker is sort of designed for system services, and thus requires administrator access. Singularity is designed for computing systems. Docker has a complicated image format. Singularity has one file per image.
- Is there some instruction I can follow to implement something on the cluster
    - The pages we are going through include a tutorial to run python on the cluster. On our website scicomp.aalto.fi we have other examples for other types of software and for GPUs. For University of Helsinki specific things, please check their documentation
    - If you mean "write your own software that runs on cluster", we don't so much get into that, but basically a) be able to write on laptop b) use what you learn today to run it multiple times c) if you need code itself to be parallel, study OpenMP or MPI.
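
A minimal sketch of point (b) above, running the same script several times. It assumes a hypothetical `my_script.py` and placeholder input names (neither comes from the course material), and `--mem=1G` is an arbitrary choice; the time limit mirrors the `srun` commands used elsewhere in these notes. The function only prints the commands, so it is safe to try anywhere.

```shell
#!/bin/bash
# Hypothetical sketch: print one srun command per input file (dry run).
# my_script.py and the input names are placeholders, not course material.
print_jobs() {
    local input
    for input in "$@"; do
        # Drop "echo" to actually launch the job step on the cluster.
        echo srun --time=00:15:00 --mem=1G python my_script.py "$input"
    done
}

print_jobs run1.csv run2.csv run3.csv
```

Without the `echo`, each `srun` waits for its job to finish, so the inputs would run one after another; the "serial jobs" and "array jobs" parts of the course cover how to script this properly.
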
- For the exercise session I ran into this issue: `srun: error: task 0 launch failed: Slurmd could not execve job;`
    - The command I ran was `srun --time=00:15:00 --gres=gpu:1 python tensorflow_mnist.py`. I was referring to the documentation here: https://scicomp.aalto.fi/triton/apps/tensorflow/?highlight=tensorflow
    - For me it works if I run on the triton.aalto.fi login node the commands:
      ```
      wget https://raw.githubusercontent.com/AaltoSciComp/scicomp-docs/master/triton/examples/tensorflow/tensorflow_mnist.py
      module load anaconda
      srun --time=00:15:00 --gres=gpu:1 python tensorflow_mnist.py
      ```
    - Issue above was related to zsh vs bash.
        - Still running into the same error after changing to bash
            - Is bash your default shell now?
                - yes, `echo $SHELL` gives `/bin/bash`
            - Let's go to breakout room during a break.

## Modules

https://scicomp.aalto.fi/triton/tut/modules/

- how do I run multiple matlab scripts at the same time?
    - When we get to "serial jobs" and "array jobs" you will see how to script this.
- why does matlab give me result names different from the names I defined in my scripts
    - Can you paste your script?
```
delimiterIn = ' ';
headerlinesIn = 28;
xcells=4096;
ycells = 4096;
cell_size=14.0/1000;
iteration = 3;%loop times
start = 1;%loop start
temp = cell(1,iteration);
for i=start:(start+iteration)
    temp{i} = num2str(i,"%04u") ;
end
figure(1)
colorbar
for i=start:(start+iteration)
    xarr=([1:xcells]-1)*cell_size-cell_size*xcells/2;
    yarr=([1:ycells]-1)*cell_size-cell_size*ycells/2;
    str_e= ['m_00',temp{i},'.ovf'];
    A = importdata(str_e,delimiterIn,headerlinesIn);
    Mz=A.data(:,3);
    Mz=reshape(Mz,xcells*ycells,[]);
    % average mz
    Mz = mean(Mz,2)*1000;
    Mz = reshape(Mz,xcells,[]);
    Mzt = Mz';
    %
    temp_x = 0:xcells;
    temp_x = (temp_x-xcells/2)*cell_size;
    temp_y = 0:ycells;
    temp_y = (temp_y-ycells/2)*cell_size;
    [X,Y] = meshgrid(temp_x,temp_y);
    clims = [-0.1,0.1]
    d = imagesc(temp_y,temp_x,Mz,clims);
    %d = imagesc(temp_y,temp_x,Mz);
    axis equal
    xlabel('position y (um)');
    ylabel('position x (um)');
    ax = gca;
    ax.YTick = [-30 -20 -10 0 10 20 30];
    ax.XTick = [-30 -20 -10 0 10 20 30];
    axis([-30 30 -30 30]);
    time = i*10/1000+4;
    title(['data ',temp{i},', time:', num2str(time), ' ns']);
    colorbar
    m = colorbar;
    m.Label.String = 'Mz 10^-3';
    saveas(d,['pictures_saved\\',temp{i},'fig']);
    saveas(d,['pictures_saved\\',temp{i},'.jpg'])
    % clf;
    % e = imagesc(temp_y,temp_x,Mz,clims);
    % %d = imagesc(temp_y,temp_x,Mz);
    % axis equal
    % xlabel('position y (um)');
    % ylabel('position x (um)');
    % ax = gca;
    % ax.YTick = [-20 -10 0 10 20];
    % ax.XTick = [-10 0 10];
    % axis([-10 10 -20 20]);
    %
    % time = i*10/1000+4;
    % title(['data ',temp{i},', time:', num2str(time), ' ns']);
    % colorbar
    % m = colorbar;
    % m.Label.String = 'Mz 10^-3';
    %
    % saveas(e,['pictures_zoomed\\',temp{i},'fig']);
    % saveas(e,['pictures_zoomed\\',temp{i},'.jpg'])
end
```
- "PGIOQ7~9.FIG" "PPMIZU~K.FIG" are the files after running
    - Looks like you're combining directory names by just joining strings together with `\` as a directory separator. This will only work in Windows.
      Use the [fullfile](https://se.mathworks.com/help/matlab/ref/fullfile.html) function instead.
        - Thanks!!!!!!
        - I succeed
- How do I quickly quit the module listing :D
    - Type `q` if it is displaying text. Type `ctrl+c` if it is not displaying (the latter will cancel the currently running program)
- .

:::success
Exercises: https://scicomp.aalto.fi/triton/tut/modules/#exercises
- Ends at xx:42
- Try to do at least #1 and #2 on that exercise page. Don't worry if you can't do everything, just become familiar with the general idea of `module`.
:::

- do we use the triton login node for doing all these things like setting up an anaconda environment, and submitting our jobs? or do we need to change to some dedicated node?
    - triton login node is good for these things. dedicated node if you need to interactively run python/r/matlab etc
- Triton is always available for accessing?
    - It's up 24/7 unless there's a problem or maintenance.

## Data storage

https://scicomp.aalto.fi/triton/tut/storage/

- .cmder

## Interactive jobs

https://scicomp.aalto.fi/triton/tut/interactive/

- What is the maximum RAM one can request?
    - No limit up to the size of our nodes: "normal" nodes go up to 256GB, we have some huge-memory nodes that go up to 1-3TB
- .
- .in matlab when I want to run 2 scripts I need to exit and run again
    - `matlab -nodisplay -nosplash -nodesktop -r "run 'name of your file'"`
    - `matlab -nodisplay -nosplash -nodesktop -r "run('name of your file'); exit(0)"` <- remember to put an exit statement at the end. Otherwise the matlab interpreter will not quit.
- I'm getting error:
    - srun: error: task 0 launch failed: Slurmd could not execve job
    - This was for the first srun command. What to do
        - there might be an issue with your user account. We can check it in a breakout room, please join the zoom.
- Why am I getting message "srun: error: Unable to allocate resources: Invalid account or account/partition combination specified" if I try `srun`?
    - ask in zoom with your username
- What is the exercise now?

:::success
Exercises: https://scicomp.aalto.fi/triton/tut/interactive/#exercises
- Until xx:55, then 5 minutes regroup
- Try to experiment with these exercises. Make sure that you can run these commands and understand what the different parts mean. We start connecting these together tomorrow!
- It's OK if you don't finish everything, but you can continue exploring yourself later.
:::

# Feedback for the day

*Write something positive, something negative, anything that helps us improve tomorrow and next time we do this course*

- .
- Thanks for running the course today! Really well organized with the whole twitch/hackMD/zoom setup. I'm new to Triton and I learned a lot by following the exercises and being able to easily get help if needed
- Very interesting course and well structured for beginners. Also for people with previous knowledge to improve the basic understanding of how the system we are using already works. Can't wait for tomorrow to get hands on the parallel scripting
- Very well structured course! I like the setup with the streams and hackmd.
- Thanks for the introduction to Triton! The batch script was similar to the one I have used before in OpenPBS. I look forward to tomorrow's tutorials.
- Thanks, the slow pace was very good as there were occasional problems. Help was very quickly available in Zoom. It is a hassle to follow several channels at once (Twitch, Zoom, HackMD, web tutorial), but once you get the setup ready it's manageable. I like that you're being open on twitch. Zoom breakouts for support are awesome. At some point the twitch stream felt disconnected from the Zoom reality, as it started to go on while we were doing exercises. Very ambitious way to organize an online course. But at the end of the day, I learned a lot of new stuff, so the course worked!

----

:::info
^^^ Please write above this line ^^^
:::