
This is the old page. For the new page, see https://hackmd.io/@coderefinery/workflowsdevday

Workflows course planning OLD

Target audience

People who have taken the basic HPC course and probably know how to do their basic work, but need to see more practical examples of how to use resources beyond their own workstation efficiently. We hope the lessons are general enough that the course is useful across many different organizations at the same time.

Format

A series of mostly independent presentations, each of which teaches some lesson while showing how people actually work. This is the time to show your advanced setups: learners aren't necessarily expected to follow you in real time, but they should get the idea and can go back and study later.

Joining us

We welcome people to help make the plan and teach. The course would be livestreamed, so you can teach part of it and also mentor your own local audience. Join the CodeRefinery chat.

Brainstorming

Note: the list below has also been copied to https://hackmd.io/@coderefinery/workflowsdevday

  • Connecting to the cluster - different options
    • SSH, SSH tips and tricks
    • Jupyter
    • Open OnDemand
    • Learning outcome:
  • How to arrange projects
    • e.g. directory structure, data, etc.
    • Software packages that might be released.
    • Learning outcome: more organized data
  • Version control for small projects
    • What do you actually do?
    • Learning outcome: the full cycle of version control in practice.
  • Configuring runs
    • e.g. configuration file, specify name, name looked up in config and provides
  • Parallelizing without parallelizing stuff
    • Slurm job arrays
    • How to package smaller jobs into a batch job
    • Job dependencies
  • Development workflows for clusters (Working on the cluster without working on the cluster)
    • Remote mounting / sshfs
    • Practical git for local dev and remote running
  • Workflow automation (Running things over and over again without running them over and over again)
    • Why automation?
    • Makefiles
    • Snakemake or something similar
  • Data harvesting from an API
  • COMSOL
    • How to use COMSOL from Windows? Triton can act as an extension of your Windows workstation to run:
      • larger models (Triton compute nodes have plenty of memory)
      • models faster (Triton compute nodes have many CPU cores)
      • parameter scans (Triton compute nodes have many CPU cores)
  • Effective use of conda
    • "Conda is nowadays widely used to create reproducible environments for scientific computing. However, one can easily run into problems with environment creation, environment updating and storing of the environments. In this talk we'll present best practices for the use of conda and teach how you can use conda for better productivity and reproducibility."
  • Data collection
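
As a starting point for the "Configuring runs" topic above (a run name is specified and looked up in a configuration file), here is a minimal Python sketch. It assumes a JSON configuration; the run names (`small`, `large`), the parameter names, and the helper `load_run` are all hypothetical and only illustrate the lookup idea, not an agreed course design.

```python
import json

# Hypothetical configuration; in practice this text would live in a file
# such as runs.json that is kept under version control with the project.
CONFIG_TEXT = """
{
  "small": {"n_samples": 100, "learning_rate": 0.1},
  "large": {"n_samples": 100000, "learning_rate": 0.01}
}
"""


def load_run(name, config_text=CONFIG_TEXT):
    """Return the parameter dictionary registered under the run name."""
    runs = json.loads(config_text)
    if name not in runs:
        # Fail early with the list of known names instead of a bare KeyError.
        raise KeyError(f"unknown run {name!r}; known runs: {sorted(runs)}")
    return runs[name]


if __name__ == "__main__":
    # The run name would typically come from the command line or a job script.
    params = load_run("small")
    print(params)
```

Keeping all parameters behind a single name means a job script only needs to pass that name, and the full settings stay recorded in one file.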

Old stuff

Date + Time

9/12/2022 - 9:00 CET / 10:00 EET

Connection details

Let's use the standard Aalto Scientific Computing Zoom room; this increases the chances of getting other people interested. We will go to a breakout room there.

https://aalto.zoom.us/j/61322268370

Chat at https://coderefinery.zulipchat.com/#narrow/stream/135843-workshops/topic/data.20analysis.20workflows

Agenda

  • 0h:00m - 0h:30m: Decide the format for the course and teaching materials. EG: I propose independent, self-contained 1h or 2h episodes based on the user stories. RD: +1, it should be possible to attend each mostly independently (or with a few clear dependencies that could be studied later).
  • 0h:30m - 1h:30m: Decide the topics based on the user stories above.
  • 1h:30m - 2h:30m (maybe we don't need to split): Divide up; each participant sets up one (or more) lesson page, writes learning outcomes, and describes the lesson structure, potential exercises, ideas for examples, etc. Each person could work on one (or more) topic. If we are many, we can form groups. (This can be less than 1h if needed; we can improvise.)
  • 2h:30m - 4h:00m (maybe not needed): Rejoin to present and discuss learning outcomes and descriptions. Improve and iterate on the descriptions.
  • Hackathon ends; each individual/subgroup develops the materials, and we meet again in January to present the work.

Current workshop page (not for teaching materials)

https://scicomp.aalto.fi/training/scip/workflows-2023/

Copy-paste of the topics listed on the page above:

  • Parallelizing things without parallelizing things (array jobs)
    • Would tools like GNU Parallel, HyperQueue, Prefect, or Dask also fit here? Or in workflow automation?
    • DI: I don't know exactly what all these tools do, but GNU Parallel for sure fits in here as well. PM might be interested in covering GNU Parallel, but he should confirm. He also does not need to do it alone if more people are interested.
  • Version control for small projects - how it’s actually used
  • Developing on the cluster - or locally?
  • Workflow automation
  • Configuring simulations to sweep through different parameters.
  • How we actually connect to the cluster and work there.
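
To make the "parallelizing things without parallelizing things" and parameter-sweep topics above concrete, here is a rough Python sketch of one common pattern: each Slurm array task reads its index from the `SLURM_ARRAY_TASK_ID` environment variable and picks one parameter combination. The parameter grid (`TEMPERATURES`, `SEEDS`) and the helper `params_for_task` are hypothetical placeholders, not part of any agreed lesson.

```python
import itertools
import os

# Hypothetical parameter grid; a real project would define its own values.
TEMPERATURES = [0.5, 1.0, 2.0]
SEEDS = [0, 1]


def params_for_task(task_id):
    """Map an array task index to one (temperature, seed) combination."""
    grid = list(itertools.product(TEMPERATURES, SEEDS))
    return grid[task_id]


if __name__ == "__main__":
    # Slurm sets SLURM_ARRAY_TASK_ID for each task of an array job;
    # default to 0 so the script can also be tried on a workstation.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    temperature, seed = params_for_task(task_id)
    print(f"task {task_id}: temperature={temperature}, seed={seed}")
```

With this grid there are six combinations, so the script would be submitted from a batch script using something like `sbatch --array=0-5`, and each task runs the same serial code on a different combination.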

To do before the meeting

  • Task 1: user stories
  • Task 2: improve agenda for the day