# RSE kickstart checklist

This list is for new people who have been hired as RSEs at Aalto. It is not a list of what someone has to know to apply, **nor of what people should know before starting**. It only provides a map: an RSE will learn the things here incrementally (and probably things not on the list, since this list is only what we already know).

In the future, we expect this list to be copied and reused in other contexts.

Someone might take ~6 months to slowly learn things on this list as they need them. Not everything will be needed.

## Linux and shell

* Shell scripting and the OS interface
  * Advanced Bash Scripting guide: https://tldp.org/LDP/abs/html/
* Containers
  * Docker: https://docs.docker.com/get-started/overview/ (though we don't use Docker on the cluster, it's good to know anyway)
  * Dockerfile reference: https://docs.docker.com/engine/reference/builder/
  * Apptainer (formerly called "singularity"):
    * This is what is actually used on the cluster
    * https://apptainer.org/docs/user/latest/
    * Key things: make your own container, convert Docker to Apptainer, how to run on the cluster
    * singularity_wrapper and how it works with Lmod: https://scicomp.aalto.fi/triton/usage/singularity/ (load a singularity module and you can see where the script is)
* Lmod:
  * https://lmod.readthedocs.io/en/latest/
  * (hint: personal modulefiles: `mkdir ~/modulefiles ; module use ~/modulefiles`)
  * Mainly basic use + writing modulefiles
* Conda
  * especially resolving GPU code related issues
*

## Software development tools

CodeRefinery lessons (https://coderefinery.org)

* git-intro: https://coderefinery.github.io/git-intro/
* git-collaborative: https://coderefinery.github.io/git-collaborative/
* Reproducible Research: https://coderefinery.github.io/reproducible-research/
* Documentation: https://coderefinery.github.io/documentation/
* (Jupyter: https://coderefinery.github.io/jupyter/)
* Automated Testing: https://coderefinery.github.io/testing/
* [Modular type-along](https://coderefinery.github.io/modular-type-along/) or [Modular code development presentation](http://cicero.xyz/v3/remark/0.14.0/github.com/coderefinery/modular-code-development/master/talk.md)
* Social coding: https://coderefinery.github.io/social-coding/

There are a few other interesting CodeRefinery lessons: https://coderefinery.org/lessons/

## HPC

* The Triton tutorials are what we expect our users to know, and reading through them is enough (they will be familiar): https://scicomp.aalto.fi/triton/#tutorials
* In general, browse (but don't read in detail) the rest of the Triton documentation: https://scicomp.aalto.fi/triton/

## Programming

## Python

* Python virtual environments and Conda environments: https://scicomp.aalto.fi/scicomp/python/ , https://scicomp.aalto.fi/triton/apps/python/
  * Be able to create a virtual environment
* Python module/package structure
* Python packaging
  * https://packaging.python.org/en/latest/tutorials/packaging-projects/
  * setup.py vs pyproject.toml (newer)
* Python command line interfaces (argparse), installing interfaces via packages, ...
* Other steps for a good project
  * Good project structure (module-name/module_name/)
  * Command line interface
  * Modular and maintainable code
  * Installable: setup.py vs pyproject.toml
  * Linter (if worth it)
  * Test coverage (if worth it)
  * Good documentation (README, code-level docs, sphinx + RTD/gh-pages)
  * Automated tests to the degree useful for the project. At least minimal.
  * GitHub Actions
  * PyPI release
  * conda-forge release
  * GH-action for releasing to PyPI/conda-forge

Examples:

*

## Data processing

* webdataset

## Data management

* FAIR data
* Open Science

## Web stuff

* django?
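
As a small illustration of several items in the Python checklist above (a command line interface with argparse, modular code, and at least minimal automated tests), here is a minimal sketch. The file name `module_name/cli.py`, the function `greet`, and the program name `module-name` are hypothetical and only follow the `module-name/module_name/` layout mentioned in the list; this is one possible shape for such a module, not a prescribed template.

```python
"""Hypothetical module_name/cli.py: a tiny, testable command line interface."""
import argparse


def greet(name: str, shout: bool = False) -> str:
    """Return a greeting. Kept separate from the CLI so tests can call it directly."""
    message = f"Hello, {name}!"
    return message.upper() if shout else message


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser in its own function so it can be inspected in tests."""
    parser = argparse.ArgumentParser(prog="module-name", description="Example CLI.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--shout", action="store_true", help="print in upper case")
    return parser


def main(argv=None) -> int:
    """Entry point. With argv=None, argparse reads the arguments from sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    print(greet(args.name, shout=args.shout))
    return 0
```

Because `main` accepts an explicit argument list, a test can call `main(["Aalto"])` without touching `sys.argv`. In a package using `pyproject.toml`, a `[project.scripts]` entry such as `module-name = "module_name.cli:main"` would install this function as a command.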