# TTT4HPC - Interactive episode
###### tags: `Training` `TTT4HPC`
Day 2 of TTT4HPC - Tue 23/April/2024: 10-12, 13-14.30 EEST
## Instructors
EG, RD, HF, SW, JR ...?
## Materials
https://github.com/coderefinery/TTT4HPC_Interactive
-> https://coderefinery.github.io/TTT4HPC_Interactive/
## Things to discuss and decide
- :ok_hand: **Setup of the day:**
- Morning: lectures on stream, with demos only. Afternoon: exercises, all in Zoom.
- :ok_hand: How to make this session not cluster/tool specific?
- First exercises could be "find out how to do X on your cluster" -> link to docs pages
- Tabs with different clusters/tools
- Focus on command-line tools to show the concept? Is it easy to switch to OOD/GUI tools afterwards?
- RD: some of the presentations *might* not be that easy to reuse, but it's the talking that has to provide the bigger picture. Have co-instructors from other clusters who can comment "oh, but is that really general? How would someone else do that?" and so on
- **Goals/topics of this day**
- About "geography of things"
- developing with the cluster (in mind)
- **moving data** and code
- RD: biggest unsolved problems I know of (at the early phase) are moving data around, debugging ("but can I run vscode on it?" - they don't know about command-line debugging), can only work with the "run" button in the IDE, Jupyter → batch jobs, and *data transfer (most important!)*
- **Prerequisites**
- Anyone can follow.
- If you want to apply it: have access to cluster, basics of CLI
- Check timetable below, add your name, check and adjust timing if needed
#### Timing for the day
1) Live stream on TwitchTV 2h
- 00:00 - 00:05 Introduction and practicalities (EG, RD, JR)
- Welcome
- Connection to previous and next days in TTT4HPC
- Setup of the day
- Collaborative Doc
- Icebreaker
- 00:05 - 00:15 Motivation (EG, RD, JR)
- Today is about basic ways to do development work on the cluster
- Part of that is how to do the initial development and debugging on cluster (stuff that is easy locally but hard in batch jobs)
- What are ways you do development? (only on laptop? on login node? submitting jobs and waiting? `srun --pty` ?)
- Reasons you may need interactive work
- You need something visual with a GPU
- Initial development (testing and debugging) more quickly than submit-wait-fix-repeat
- Debugging (fire up debugger on live job)
- Next: we show how to do it with various options (terminal, screen, sinteractive, `srun --pty`, sshfs, GUI tools such as OOD and VSCode)
- ~~start broad, what to do when getting access to a cluster -> "a day in the life with HPC"~~
- ~~story: maybe run a simple script that makes a figure ?~~
- ~~geography of things: different homes in different places, different speed of access to data~~
- 00:15 - 00:20 Good project arrangement (RD, EG)
- Why this is important to start with
- Examples
- No live example but show a project arrangement in the lesson.
- 00:20 - 00:40 Show and tell - syncing data (RD, JR)
- Make a data plan
- Example plan for distributed project
- Demo: show example in lesson
- Transferring data (rsync)
- Demo: show rsync
- Syncing data two-ways (unison)
- Demo: simple unison demo
- git-annex (mention only)
- Mounting data (sshfs)
- Demo: show sshfs, open up a file locally
- 00:40 - 00:50 Show and tell - syncing code (JR, RD)
- Basics of using git on two ends (commit, push, pull, run, edit remotely, commit, etc)
- Demo: doing this
- Discussion about syncing workflows: is this too slow? what if you want to work locally? etc.
- 00:50 - 01:00 Working interactively - GUI (JR, RD)
- There *are* GUIs available
- May be good for editing and light analysis.
- Many start with GUI
- Demos
- Show RStudio via OOD
- Difference in speed between X forwarding and OOD/vnc/etc
- (Ways this is possible on HPC too)
- (RStudio/Jupyter/VSCode)
- (GUI software via OOD or -X)
- 01:00 - 01:10 Break
- 01:10 - 01:30 Working interactively - CLI (RD,JR)
- Interactive working from command line - possibilities
- When to work interactively, and when is it time to move to batch jobs?
- Demos
- `srun --pty bash`
- screen/tmux
- CLI debugging (pdb)
- shell tips and tricks (common aliases and bashrc config)
- 01:30 - 01:50 Show and tell - VSCode, remote SSH (HF, RD)
- "one foot on cluster, but still comfortable on laptop"
- other tools that provide this option: Matlab, ...?
- How to set up?
- Common problems/issues
- 01:50 - 02:00 Summary, future direction, what we will do in the exercises (EG, RD, JR)
2) Lunch 1h (go eat)
3) Zoom 1.5h
- () Introduction to exercises
- Exercises, add your name for preparation
- We could provide a bunch of exercises, and people can move into breakout rooms depending on which exercise they want to discuss. Some cluster-dependent breakout rooms (if available), e.g.:
- (SW) Docs: Find out what options your cluster has for
- data storage and transfer
- running GUI tools
- VSCode SSH remote support
- (RD) Try out syncing some data
- GUI
- web interface
- CLI
- (JR) Try out syncing code
- git
- (JR) Try out running some GUI interactively
- (EG) Try out running some CLI tool interactively
- (HF) Try out SSH remote for VSCode
- () Summary, wrap up, where to go from here
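
For the pdb debugging demo listed in the timetable above, a minimal sketch could look like the following. The function name `mean_of_columns` and the sample data are hypothetical, made up for illustration and not taken from the lesson material. The idea is only to show a script that opens a `pdb` post-mortem session when it fails in an interactive terminal (for example inside an interactive job) and fails normally otherwise.

```python
import pdb
import sys


def mean_of_columns(rows):
    """Average each column of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def main():
    # Hypothetical sample data standing in for a real dataset.
    data = [[1.0, 2.0], [3.0, 4.0]]
    try:
        print(mean_of_columns(data))
    except Exception:
        # Open the debugger at the point of failure, but only when a
        # terminal is attached; otherwise just let the error propagate.
        if sys.stdin.isatty():
            pdb.post_mortem()
        raise


if __name__ == "__main__":
    main()
```

In a batch job standard input is typically not a terminal, so the same script would simply fail with its traceback there. An alternative that needs no code changes is to run an unmodified script with `python -m pdb script.py` from an interactive session.
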
#### Exercises
---
## From earlier, partly outdated
### User stories
* 2: I develop on my laptop/workstation and also on the HPC cluster, how can I keep things in sync with minimum effort?
* remote vs local editors. Git push/pull.
* rsync of data? (git annex? simple rsync?)
* 5: How can I work interactively (non-gui or gui) on a cluster?
- non graphical
- sinteractive?
- srun + tty
- screen/tmux
- vscode?
- Matlab?
- graphical
- OpenOnDemand (not available everywhere)
- VDI tools?
- sinteractive +X ?
- Comsol
- Spyder
- Matlab
- Jupyter
- -> how to bypass the need for graphical tools by scripting things
- How can one find the interactive part of their workflow and optimize for that? Usually only some part of the workflow needs interactivity.
- A good example where interactive use is necessary?
- When there is too much data and it has to stay on the cluster, but you are still prototyping and/or visualizing and/or debugging
- Some tools require interaction to proceed e.g. doing QC of pictures
- RD: sometimes I had to open a dataset that needs a lot of memory and do a one-time exploration + processing. This was simplest in an interactive IPython session, rather than debugging by submitting many jobs.
### Plan from old planning doc
* **Motivation**
* geography of things
* **From laptop to cluster: Syncing code (and data)** (45 min)
* User stories: 2 + 5
* Three points
* Arranging projects well and using version control makes it possible to work between laptop and cluster, and this is important
* Understand example of a good project arrangement
* How this works with multiple people
* Motivating example/exercise:
* **Side episode: sshfs, short demo** (10 min)
* Three points:
* Mounting filesystem is distinct from copying
* Sshfs works on "anything" you can ssh to
* Performance limits
* Motivating example: creating some figures (mention headless rendering)
* **Interactively working on a cluster** (45 min)
* user stories 2 + 5 (or keep them separated, or sequential)
* Three points
* A cluster is not just batch: interactive is possible and good for debugging
* Basic interactive jobs from terminal
* Interactive with graphical interfaces
* Motivating example/exercise:
* **Remote interactive example: vscode** (25 min)
* Three points
* GUI tools like Spyder/VSCode can run on the cluster from your computer, but this isn't always a good idea
* How to set up?
* Common problems/issues
* Motivating example/exercise:
* Remote Debugging
---