# Introduction to Scientific Computing and HPC

- [x] registration form set-up and test ==EG==
- [x] update webpage with dates and registration ==EG==
- [x] create event page on aalto.fi. See previous ones for list of tags to include so that event is visible on each dept homepage ==EG==
- [x] **Deeper update to webpages (this can be its own list)**
- [x] Email Aalto communications to promote the event ==EG==
- [x] Promote the event to Aalto summer workers (via HR) ==EG==
- [x] Link from relevant webpages (e.g. doctoral) ==EG==
- [x] email collaborators at HY, Oulu, TUNI ==EG==
- [ ] email data-agents@aalto.fi and ask to fwd to their depts. ==EG==
- [ ] email STEM student guilds viestintavastaava@fyysikkokilta.fi; hallitus@tietokilta.fi; tiedotus@inkubio.fi; data.guild.dg@gmail.com ==EG==
- [ ] update motd on login node ==EG==
- [x] email triton-users (remember the subject)
    - [ ] email them again in 1st week of June
- [x] email scicomp-announcements@list.aalto.fi
- [ ] post on Teams ASC and other relevant teams group
- [ ] post on zulip announcements stream
- [ ] OPTIONAL: email depts (staff-cs@aalto.fi, nbe-personnel@aalto.fi, PHYS?)
- [x] international promotion (linkedin, coderefinery, nordic partners)
- [ ] email reminders on week before
- [ ] email to registered participants on same week monday
- [ ] creation of triton accounts for those registered without it

---

# OLD STUFF BELOW

* Preparation
    * Watch previous videos
        * Connecting
        * Moving data
        * Shell crash course / cluster-shell
    * Suggest watching HPC-kitchen
* Day1 morning
    * (t10) intro (EG, )
    * (t10) HPC kitchen (, RD)
        * what is Slurm
    * (t30) Intro to SciComp (+ data classification, cloud, etc.) (EG, )
    * (+30)
        * ...
          (introductions and chat with other HPC teams)
    * (t10) Connecting (TP, )
    * (p20) Connecting - Q&A and help (TP, )
* Day1 afternoon
    * (p20) Setting up for a new project (RD, ST)
        * Big example: create kickstart-2025 folder with git clones
        * One of us pretends to be a new person and asks things to get set up
            * e.g. ask for new project folder
            * Create $WRKDIR/kickstart-2025
            * Clone git (learn git!)
        * TODO: adjust/split the cluster-shell to "Set up your project"
    * (p20) Data transfer (RD, ST)
        * Big example: copy Gutenberg to your workdir
            * (some method) + also show with OOD
    * (t10) What is Slurm? (ST, RD)
    * (p20) Interactive (RD, ST)
        * Big example: run Gutenberg
    * (p30) First serial job script (e.g. gutenberg) (RD, ST)
        * Big example: run Gutenberg
    * (+0)
* Day2 morning
    * (t20) Humans of SciComp (RD, )
    * (40) Conda (JR, YT)
        * Big example: Make conda envs for LLMs
    * (40) array jobs (ST, RD)
        * Big example: gutenberg in parallel
    * (+0)
* Day2 afternoon
    * (30) monitoring (ST, RD)
        * Big example: Gutenberg monitoring for multiprocessing vs single-core.
        * Instructors submit jobs during break, users examine that job's performance (only works on Aalto cluster).
    * (t20) Applications (RD, ST)
        * modules, containers, conda
    * (p20) more filesystem stuff?
    * (40) parallel (ST, RD)
        * Big example: ????
        * Discuss why it's hard to know what is parallel
        * submitting
    * (+10)
* Day3 morning
    * (t20) getting help ((RB), )
    * (t30) GPUs (, )
        * PyTorch MNIST from GPU
    * (p40) LLM example (YT, )
        * Big example: Use scicomp-llm-env with copy-paste code to run some LLM.
    * (t20) Wrap-up

Problems compared to last year:

* VSCode intro
* Computational reproducibility (combine with scicomp intro)
* Laptops to Lumi (we can do without/mention CSC in other things)

## Brainstorm - visiontaabgg kkouwa

TP:
- Starting assumption:
    - User has general experience with coding in their field and no experience of HPC work/remote work
- Target (p - being practical, t - being theoretical knowledge):
    - Having a working way to connect to their cluster (p)
    - Being able to build/run the environment they need on the cluster (either by loading modules or creating an environment?) (t)
    - Knowing how to get their data to and from the HPC system (p)
    - Knowing how the cluster works, i.e.
        - Where to find available resources
        - how to write a job script (p) +1
        - how to submit a job (t)
        - concept of array jobs / running the same code with multiple different conditions (t)
    - Knowing where to get further help

RD (basically like last year with fine-tuning):
- Purpose: (unchanged from last year) Get familiar with us and principles of cluster usage, some hands-on but you still have to learn more later.
- day 1: Broad discussions, HPC kitchen, intro to SciComp, plus one end-to-end demo of everything (fast version of winter kickstart)
- day 2-3: Go through tutorials in detail, like before.
- Combining idea below:
    - HPC-kitchen videos (make more if needed) as pre-watching, summary of important points in course
    - day 1-2 morning = discussion
    - day 1-2 afternoon + day 3 = exercises
- New/improved emphasis: building environments, data management, exercise about running LLMs in batch on GPUs (code already written, just write batch scripts)

YT:
- Recommend watching some previous videos.
- This is the strategy of some Aalto courses nowadays: watching videos + exercise sessions.

EG:
- **Learning outcome**
    - Do reproducible data science using Aalto systems (and not own laptop) (but what about other unis?)
- **Hands-on or demo?** I like alternating between demos and exercises. Is it better to do morning (demo) and afternoon (exercises), or 1h demo, 1h exercises? Risk: very few people come to the exercises (this was the reality with TTT and winter kickstart)
- **How do you do computational research at Aalto?** We could also cover general good practices for doing computational research at Aalto even without triton (e.g. local conda env, classification of information, vdi vs ondemand, accessing various storage from own laptop/workstation (aalto home, one drive, gdrive, teamwork, triton home, scratch)) and show why "cloud" storage is not good
- **How to do interactive development?** If we oversell OOD, we need to make sure it scales (I know VDI can scale to hundreds of concurrent sessions)
- Parallel stuff has *maybe* a limited audience, but I feel we will get lots of **"my supervisor asked me to try some local LLMs"**
- **What will you do?** Would it make sense to ask "our people" (CS/NBE/PHYS) what they plan to give to summer workers, in case we get too many who want to do something large-ish (e.g. llms)? OR we can ask in advance at registration
- Idea for a metric: of last year's participants, how many actually ran jobs on triton or stored data on scratch?

JR:
- Learning outcomes:
    - Able to log in to their local HPC system
    - Able to move files to and from the cluster
    - Able to run a command through the batch queue
    - Able to navigate the file system
    - Understand how the cluster is set up, where the power is and what the limitations are
    - Connect those limitations to best practices. Why do we recommend conda environments?
- Prerequisite:
    - Can run their workflow on a laptop or desktop
- These outcomes require exercises, but the exercises can be relatively short.
- Advanced topics on day 2 and 3 are more specialized. Exercises are still probably helpful, but most of the topics are not for everybody.
- Some exercises should be only for the interested (choose from a list of possible exercises)
- Discussion (in notes, if face to face is not possible) helps with the understanding goals

ST:
- We would do a quick first day covering connecting, interactive, serial. Filled with exercises. Similar to the day we did in winter, but with lots of hands-on stuff to keep users engaged and doing things.
- On the second and third days we would show actual use cases / examples for more complex array jobs / multiprocessing / MPI / GPU / LLM etc., and use them to describe the more complex Slurm flags and workflows.
- Learning outcomes would be:
    - First day would teach how to interact with Triton (OOD/command line/Slurm)
    - Second and third day would teach about workflows and concepts you can run on Triton
- Idea would be to learn by doing and to teach what the users need instead of teaching possible problems they might encounter.
- Start of the days could have some more philosophical topics like the Kitchen metaphor and talking about computing in general, but afternoons should be mostly exercises.
- This would require rewrites to materials to reformulate them around exercises. I would put them in a different doc / doc root and keep current tutorials as the documentation.

## Old archive

- https://hackmd.io/@AaltoSciComp/scicomphpc2024_archive
- Some feedback condensed:
    - day 1:
        - Explaining the key terminologies in dummy language or concrete examples would be very much appreciated.+1
    - day 2:
        - There was a great bunch of information, but the discussive presentation style was very comfortable to follow
        - The schedule for exercises was very strict, however, the demos and the explanations worked really well.
        - It will be good to have a demo with jupyter notebook (.ipynb files).
        - I feel like the first day could be condensed into 2 hours and some of the topics of 2nd and 3rd days could be expanded with more practical examples.
    - day 3:
        - Condensing the first day a little bit. In my honest opinion, the practical information as well as conceptual understanding are the most important, background information is less relevant (it's also possible to add more information to that day). Though it's possible to not attend those parts, I prefer to attend the whole thing in order to not miss anything