TTT4HPC (Tuesdays Tools & Techniques for HPC)
tags: Training, TTT4HPC
Timeline for development (in weeks)
9/April meeting
Agenda:
25 I have access to Aalto University Triton cluster
11 I have access to CSC (Puhti and Mahti)
5 I have access to LUMI
4 I have access to NRIS/Sigma2 clusters
3 I have access to Uppsala University's UPPMAX
0 I have access to another cluster (please write which one in the comment box)
1 I do not have access to any HPC cluster, but I still want to watch and learn
7 I have access to Tetralith at NSC, Linköping
5 I have access to Dardel at PDC, Stockholm
3 I have access to Leonardo Booster (Italy)
9 I have access to another cluster (please write which one in the comment box)
2 I do not have access to any HPC cluster, but I still want to watch and learn
assigning roles for day 1
zoom studio room
List of supported clusters and persons who will test the exercises
List of people who will test the exercises
Days/content
Below is mostly a copy-paste from the past, plus a whole new day for Singularity. Please comment in a way that makes it clear which content is new, for example: EG comment: I think this is great!
1. Tue 16/04/2024 :: Computational resources (memory/cpus/gpus, monitoring computations, monitoring I/O, local disks/ramdisks) -> this will become day 1
Suggested coordinator: Jarno
Helpers/Instructors/LessonsDevelopers: Richard Darst, Simo Tuomisto, Diana Iusan, Dhanya Pushpadas, ??, ??
1.1 Benchmarking & choosing job parameters (50 min) (DI, RB: we have material in Norway for memory and num of cores calibration, DI: same in SE, I can contribute with something)
1.2 Monitoring I/O (~50 min) (JR, ST)
Three points
Motivating example/exercise:
strace -c -e trace=%file,read,write ./command
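A minimal sketch of how this could look in practice (the wrapped commands are placeholders):

```bash
# Count file-related, read, and write syscalls; -c prints a summary table at exit
strace -c -e trace=%file,read,write python my_analysis.py input.dat

# Add -f to also follow child processes, e.g. if the program forks workers
strace -f -c -e trace=%file,read,write ./command
```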
Container/archive formats for data
story 11
Three points
Example case: conda+container? better data formats?
mldb? Loads the data into memory, so only for small datasets.
webdatasets
Using local disks and ramdisk
Three points
https://github.com/hoytech/vmtouch (tool to mention perhaps)
Motivating example/exercise:
UPPMAX material:
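A minimal sketch of the staging pattern this episode could demo, assuming the cluster exposes node-local disk via $TMPDIR (conventions differ per cluster; paths are placeholders):

```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=8G

# Stage the input to node-local disk instead of reading it repeatedly
# from the shared filesystem.
cp /scratch/project/input.dat "$TMPDIR"/

# Run against the local copy.
cd "$TMPDIR"
./analyze input.dat > result.out

# Copy results back before the job ends: local disk is wiped afterwards.
cp result.out /scratch/project/results/
```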
2. Tue 23/04/2024 :: Working on clusters (interactive sessions, data (and code) access/moving/versioning, graphical tools)
-> this will become day 2
Suggested coordinator: Samantha
Helpers/Instructors/LessonsDevelopers: Enrico Glerean, Jarno Rantharju, Hossein Firooz, ??, ??
2.1 From laptop to cluster: Syncing code (and data) (45 min) (EG, RD, SW)
2.2 Side episode: sshfs, short demo (10 min) (??, ??)
2.3 Interactively working on a cluster (45 min) (EG, RD, SW)
2.4 Remote interactive example: vscode (25 min) (HF, RD)
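A minimal sketch of the kinds of commands episodes 2.1-2.3 might demo (hostnames and paths are placeholders):

```bash
# 2.1: push local code to the cluster, excluding bulky artifacts
rsync -av --exclude '.git' --exclude 'data/' ./myproject/ cluster:~/myproject/

# 2.2: mount a cluster directory locally over SSH (unmount with fusermount -u)
sshfs cluster:/home/me/myproject ~/mnt/myproject

# 2.3: request an interactive shell on a compute node via Slurm
srun --time=01:00:00 --mem=4G --cpus-per-task=2 --pty bash
```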
3. Tue 07/05/2024 :: Containers on clusters (everything from zero to infinity)
Suggested coordinator: Simo Tuomisto (+ Enrico Glerean)
Helpers/Instructors/LessonsDevelopers: MP, DP, ??, ??
Note, text below has part 3 and 4 from a brainstorm with ChatGPT, + past discussions with Simo, + Enrico's ideas
3.1 Introduction to Singularity for HPC (30 minutes)
3.2 Basic Singularity Commands (30 minutes)
3.3 Advanced Features and Parameters (30 minutes)
3.4 Hands-On Exercise Session (30 minutes)
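For 3.2, a minimal command sketch (image name and bind paths are just examples):

```bash
# Pull an image from a registry and convert it to a .sif file
singularity pull docker://python:3.11-slim

# Run a single command inside the container
singularity exec python_3.11-slim.sif python --version

# Open an interactive shell inside the container
singularity shell python_3.11-slim.sif

# Bind-mount a host directory (e.g. scratch) into the container
singularity exec --bind /scratch:/scratch python_3.11-slim.sif ls /scratch
```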
4. Tue 14/05/2024 :: Parallelization
Suggested coordinator: Thomas Pfau
Helpers/Instructors/LessonsDevelopers: Radovan Bast, Pavlin Mitev, Diana Iusan, Simo Tuomisto, Teemu Ruokolainen, ??
4.1 Parallelizing code without parallelizing (TP, RB, PM(15) DI interested in Slurm solutions, SW) (90 min)
4.2 Workflow automation tools (TR, ??)
4.3 Hyperscaling pitfalls (??, ST)
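For 4.1, a minimal sketch of "parallelizing without parallelizing" via a Slurm array job (file names are hypothetical):

```bash
#!/bin/bash
#SBATCH --array=0-99
#SBATCH --time=00:30:00

# Run the same serial program on 100 different inputs in parallel,
# without touching the code: each array task picks its own input line.
INPUT=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" inputs.txt)
./serial_program "$INPUT" > "output_${SLURM_ARRAY_TASK_ID}.txt"
```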
Day 4 practicalities
Main: TR and RD
Host: EG
Screenshare: always Richard
Lead: usually done by Richard and explicitly pass the mic to Teemu
done: EG move pitfalls under concepts and add bullet points in the pitfalls pages to avoid the wall of text on the stream
done: make sure they are somewhere (e.g. conclusion) TP
ZOOM EXERCISES:
done: EG can add the coderefinery snakemake exercise as an optional exercise if somebody wants to try that
Other issues:
Meeting 27.3.
Agenda:
Quickly go over existing docs
Conceptual idea
Comments
Next steps:
Organizational
Existing "starting" repos:
14/Feb Meeting agenda
OLD CONTENT HERE BELOW
December 2023 meeting summary and voting
Poll 1: Vote for a name
Note: "Workflows" might be confusing for those using snakemake or nextflow. For some options I tried to keep the name of the week + word starting with same letter.
Poll 2: Vote for the Days
Notes:
December 2023
December 2023 meeting agenda
Actions and comments
Old overall plan
Schedule
Day 1: doing actual work with a cluster
From laptop to cluster: Syncing code (and data) (45 min) (EG, RD)
Side episode: sshfs, short demo (10 min) (??, ??)
Interactively working on a cluster (45 min) (EG, RD)
Remote interactive example: vscode (25 min) (RD, ??)
Day 2: managing resources
Benchmarking & choosing job parameters (50 min) (ST, RD, RB: we have material in Norway for memory and num of cores calibration, DI: same in SE, I can contribute with something)
Monitoring I/O (~25 min) (??, ??)
Container/archive formats for data (25 min) (DP, ??)
Using local disks and ramdisk (~25min) (??, ??)
The filesystem-related topics
Day 3: making the most out of a cluster
Parallelizing code without parallelizing (TP, RB, PM(15) DI interested in Slurm solutions) (90 min)
Workflow automation tools (??, ??)
Hyperscaling pitfalls (??, ST)
February 2023 Meeting
Let's check where we are
When: 13:00 CET
Where: https://aalto.zoom.us/j/69608324491
Agenda
Old stuff for reference
Task 1 - Let’s describe real problems that the course + teaching materials can solve (a.k.a. user stories)
Write a list of potential questions a user might have that could be answered in this workshop (these are like “user stories”). The first examples below are based on the topics above. Feel free to be redundant and write similar questions; we can then merge things together (e.g. see the sweeping-parameters question below).
I want to parallelize my code, how do I do it in practice? (TP: could do this)
I develop on my laptop/workstation and also on the HPC cluster, how can I keep things in sync with minimum effort?
I want to automate my workflow, where do I start?
I need to sweep through multiple parameters, how do I code that? TO MERGE WITH 1
How can I work interactively (non-gui or gui) on a cluster?
I am getting different results on my laptop and on the HPC cluster: how can I move my environment around? (this could be not just conda, but also containers)
Do I need to parallelize within my code to compute things at the same time? (What if parallelizing at the code level is unfeasible, or the skills are lacking?)
Why would I want to put effort in learning workflow tools when I can do a lot with bash scripts?
What are good tools and best practices for developing and small-scale testing of code on a cluster?
I would like to run a graphical IDE on HPC, how can I do that? And should I do that? (this could be matlab, rstudio, comsol, spyder, etc) (remote ssh in vscode?)
I am out of file-number quota, I have 1M+ small files as results of my analysis, is there a way to optimise disk space, e.g. by tarring them all together? Remapping them to a database? Using other file formats? Will I be able to read them again if I merged them or do I have to “unzip” them each time?
again containers might save the day
Is it better to fix this issue before writing 1M tiny output files, or to post-process and gather all the results afterwards?
Easily becomes an info dump, as there are lots of different file formats. See for example file formats in Python for SciComp. Good examples could avoid this.
File cleaning is important (a good habit to learn beyond how to practically do it: if everything can be reproduced, there is no need to hoard files). "Live as if scratch were to die tomorrow."
Think of "data appraisal": what is truly important
Is this a good story?: ooo
The I/O of my code is slow and I have heard that I could take advantage of the local disk of the computational node: how can I do that in practice? Manually move files around? What if the job fails - will I lose the intermediate results? (DI interested in contributing, but it would be good to have someone else from a different center)
I have x TB of data to analyze, how do I get it on the cluster, where and how do I store it? How and where do I share the results?
I want to collaborate with others on a task using HPC, how do we share and organize code and data?
Pavlin Mitev: Here is the situation - you have written a serial Python code that runs and everything is fine. You have done some reasonable optimizations and cleaning of the code, but there is this problem… You have to run the code on (let's say 500,000) inputs from a single file (a Molecular Dynamics trajectory is the real data behind the inspiration for this tutorial). Just to make things worse - you CANNOT load the entire set of inputs into the computer memory to use the traditional methods of applying a function to each element of a list… https://github.com/pmitev/almost-embarrassingly-parallel-python - the content online is working, but it is far from complete… and perhaps too long to be a part/section…
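Pavlin's repo approaches this in Python; a bash-level sketch of the same idea, assuming the input file is line-oriented, would be to chunk the file and fan it out as a Slurm array:

```bash
# Split the big input into 100 roughly equal chunks without breaking lines
split -d -n l/100 trajectory.txt chunk_

# Then, in a job script with: #SBATCH --array=0-99
./process_chunk "chunk_$(printf '%02d' "$SLURM_ARRAY_TASK_ID")"
```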
I need to run a set of similar jobs and after they are done and only after they are done, I can start a follow-up step with the outputs from step 1 (DI interested)
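A minimal sketch of how Slurm job dependencies cover this story (script names are hypothetical):

```bash
# Step 1: submit the set of similar jobs as one array, capturing the job id
jid=$(sbatch --parsable --array=0-49 step1.sh)

# Step 2: starts only after ALL step-1 array tasks have finished successfully
sbatch --dependency=afterok:${jid} step2.sh
```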
I am unsure where to parallelize: Inside the code? inside the Slurm script? Outside the slurm script? What are typical approaches and pros and cons?
What are the options to connect to a cluster and do some work? (SSH, OOD, jupyter,…)
How should I arrange my project efficiently?
How should I arrange my project?
Data harvesting from an API
COMSOL
Effective use of conda -> see also story 6
Data collection
Implementing "workers" for doing very large parallel jobs rather than thousands of array jobs (Pavlin, in that case it was done in bash because user did not want to switch to snakemake or similar workflow tools)
Is it better to write a script and never check your jobs, or to check your jobs and "waste time"? Where is the good balance between the two?
- Enrico: I just had this conversation with a user
What tools do I have available? What can I do with them?
- Counterargument: first I need to have a problem, and then I should learn about the tool
Presenting good general tools useful for many (some IDEs, some parallelization tools)
Scaling calculations on a cluster: e.g. how to estimate a job's runtime from a smaller job's runtime, how to estimate memory consumption from how the data size scales, how to estimate how long a full analysis takes based on how long a single analysis takes, and how to decide whether optimization is needed.
Benchmarking/profiling (big-picture, overall efficiency)
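A minimal sketch of the measure-then-scale workflow, using standard Slurm accounting tools (seff is a Slurm contrib and may not be installed everywhere; <jobid> is a placeholder):

```bash
# After a small test run, check what the job actually used
seff <jobid>                      # CPU and memory efficiency summary
sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,MaxRSS,ReqMem

# Then extrapolate: if a 10%-size test needed 2 GB and 20 min,
# request roughly 20 GB and a few hours for the full run, plus margin.
```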
Other questions to answer
Notes
Zoom chat copy of what is relevant
From Simo Tuomisto to Everyone 11:01 AM
https://github.com/bast/singularity-conda
From Radovan Bast to Everyone 11:01 AM
i use it personally mostly to get python codes to run on my NixOS which is very strict about library dependencies
From Simo Tuomisto to Everyone 11:02 AM
https://github.com/CSCfi/hpc-container-wrapper.git
From Radovan Bast to Everyone 11:02 AM
i like that it forces people to document their dependencies in a file
From Pavlin Mitev to Everyone 11:35 AM
https://pmitev.github.io/UPPMAX-Singularity-workshop/
From Richard Darst to Everyone 11:52 AM
https://coderefinery.zulipchat.com/#narrow/stream/141114-help/topic/inspection.2Fperformance.20monitoring.20tools/near/308556198
From Radovan Bast to Everyone 12:05 PM
suggestion: it would be good to incorporate/synthesize the user stories and add instructions in writing on what we expect from all as the next step. this will also allow those who were not here (Sabry, Samantha, Matias) to join
From Pavlin Mitev to Everyone 12:27 PM
https://github.com/hoytech/vmtouch
What is it good for?
Discovering which files your OS is caching
Telling the OS to cache or evict certain files or regions of files
Locking files into memory so the OS won't evict them
Preserving virtual memory profile when failing over servers
Keeping a "hot-standby" file-server
Plotting filesystem cache usage over time
Maintaining "soft quotas" of cache usage
Speeding up batch/cron jobs
And much more…
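A minimal usage sketch for the points above (file names are placeholders):

```bash
# How much of this file is currently in the page cache?
vmtouch -v bigdata.dat

# Load ("touch") a directory tree into the cache before a read-heavy job
vmtouch -t dataset/

# Evict it from the cache again when done
vmtouch -e dataset/
```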
Extra
* Containers (+ conda?) (50 min) (ST)
* PM: Singularity, perhaps conda and/or python-venv
* Three points:
  * What is a container?
  * Basic usage of Singularity
  * The benefits of packaging code into a container.
* Is this going to day 2?