# Moving your program to an HPC cluster

## Seeing things from the program's point of view

In this session we'll look into why moving your program into a new environment can be difficult.

Meet our hero, a simple program that wants to go to an HPC cluster:

<img src="https://hackmd.io/_uploads/BJgX79TRr-g.png" width="30%">

Let's investigate how our program would work in a new environment.

## Leaving home

![icon-moving](https://hackmd.io/_uploads/B1ZAIFkL-e.png)

*Moving a program to an HPC cluster is akin to moving from your home to a university dorm.*

<details>
<summary>Full explanation</summary>

Most programs are not created with HPC environments in mind. Programs are commonly written and run on laptops, desktops, etc. This means that moving to a shared HPC system is a major transition from the program's point of view. Changes include, but are not limited to:

- Moving from a private system to a shared system
- The operating system might change (e.g. from Mac to Linux)

</details>

## Planning the move

### Asking for help

![icon-question](https://hackmd.io/_uploads/H1i3xgULWl.png)

*Before moving, you'll want to ask your friends or a tutor for help with the move.*

<details>
<summary>Full explanation</summary>

With software, you'll usually want to do the following:

</details>

1. Ask your colleagues or local support people if they have used the program before
2. Check existing documentation and [the issue tracker](https://version.aalto.fi/gitlab/AaltoScienceIT/triton/-/issues) for mentions of the program
3. Check the installation instructions provided by the program's creators

### Check if your apartment is furnished

![icon-furniture](https://hackmd.io/_uploads/ByCgBxILZe.png)

*If your apartment is already furnished, you won't need to move as much.*

<details>
<summary>Full explanation</summary>

On most HPC systems, the system administrators install commonly used or difficult-to-install programs for all users.
Users can then load these programs with a [software module system](https://scicomp.aalto.fi/triton/tut/modules/), which allows loading different versions of the software one at a time. Main commands you should know:

- `module avail` or `module spider`: Show available modules
- `module spider MODULE_NAME`: Show information on a module and how you should load it
- `module load MODULE_NAME`: Load a software module
- `module purge`: Unload all loaded modules

</details>

4. Check if the software is installed as a module

### Mark fragile things

<img src="https://imgs.xkcd.com/comics/software_updates.png">

*All fragile things should be packed in a way that they survive the move.*

<details>
<summary>Full explanation</summary>

With programs, you'll want to record the dependencies, because programs are sometimes very sensitive to the versions of their dependencies.

</details>

5. Use a file to keep track of dependencies and their version numbers

### Organize your moving boxes

![icon-boxes](https://hackmd.io/_uploads/rJkdDcL8Ze.png)

*By marking what is in your moving boxes, you can unbox them in the right order.*

<details>
<summary>Full explanation</summary>

When moving, you'll want to pack your stuff so that things go to the correct places. This makes unboxing a lot faster. With software, you'll want to record what you're doing so that you or others can replicate the steps.

</details>

6. Record all of the steps you took while installing

## During the move

### Figuring out where to put your stuff

![icon-room](https://hackmd.io/_uploads/ryypW4DUbx.png)

*When you move into a new apartment, you'll want to know where you should put your belongings.*

<details>
<summary>Full explanation</summary>

Similarly, when you're installing software, you'll want to know where to install it. HPC clusters usually have a [fast storage system](https://scicomp.aalto.fi/triton/tut/storage/) meant for storing your installations and your data.

</details>

7. Check the documentation or ask your local support people about where you should install the program

### Meet your roommates

<img src="https://www.smbc-comics.com/comics/1577032064-20191222.png" width="80%">

*Having roommates can *

<details>
<summary>Full explanation</summary>

If you live with roommates, you'll usually share some things with them (like pots and pans in the kitchen) while keeping most of your belongings private. Similarly, you'll want to make sure that other programs do not mess up the installation of your program. Usually the best approach is to give every program its own separate world to inhabit, instead of trying to install every program alongside every other program.

</details>

8. Try to install your program separately from other installations (e.g. when using Python, create separate environments with [conda](https://scicomp.aalto.fi/triton/apps/conda/) or similar tools)
9. Try to avoid installing things globally (e.g. avoid `pip install --user`)
10. Make sure that you activate software environments one at a time

## Quippy demo

Our first hero today is the [quippy package](http://libatoms.github.io/QUIP/). The inspiration behind this demo came from a Triton user who recently reached out for help with a workflow involving its installation (among many other things). Our mission is to successfully install a conda environment with quippy on the Triton cluster.

<details>
<summary>What is quippy?</summary>

The quippy package is the Python interface to the [QUIP software](https://github.com/libAtoms/QUIP) (QUantum mechanics and Interatomic Potentials). This Fortran library is a collection of tools for generating the models (known as interatomic potentials) needed to run atomistic simulations, in particular molecular dynamics simulations. The package was developed by scientists for scientists, and they still maintain it. It is worth checking out if you are planning to do some kind of atomistic modelling.
For example, suppose your research involves the computational study of an exotic material (mithril, naquadah, beskar, chemical X, thiotimoline, you name it). These simulations usually require a lot of resources, so you will need to run them in an HPC environment.

</details>

<details>
<summary>Full demo with commands (part 1/2)</summary>

(Note that the instructions below are specific to the Triton cluster. Remember to adapt them for other HPC resources by following their documentation.)

The first step is to check:

- [Installation instructions](https://github.com/libAtoms/QUIP/tree/public?tab=readme-ov-file#binary-installation-of-quip-and-quippy): package name `quippy-ase`
- HPC cluster documentation for [conda environments](https://scicomp.aalto.fi/triton/apps/conda/#first-time-setup)

Now we start setting up the software with the `mamba` module, which is the recommended approach on Triton.

```bash
module load mamba
```

We don't want `$HOME` to be our installation directory, so we point conda's package cache and environments to our work directory `$WRKDIR` instead:

```bash
mkdir -p $WRKDIR/.conda_pkgs
mkdir -p $WRKDIR/.conda_envs

conda config --append pkgs_dirs ~/.conda/pkgs
conda config --append envs_dirs ~/.conda/envs
conda config --prepend pkgs_dirs $WRKDIR/.conda_pkgs
conda config --prepend envs_dirs $WRKDIR/.conda_envs
```

Next, we need to describe the environment that we want for our study. Let's move to our project's directory (`cd PROJECT_DIR`) and start with a simple `environment.yml` file:

```yaml
name: quippy-env
channels:
  - conda-forge
dependencies:
  - python>=3.9
  - pip
  - pip:
      - quippy-ase==0.10.1
```

We are only including quippy for now, but we can always add dependencies and update our environment later on.
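A version specifier such as `python>=3.9` describes a range, not a single version, and conda-style solvers usually pick the newest version that satisfies every constraint. The Python sketch below illustrates how a comma-separated range like `>=3.9,<3.14` is evaluated. The helpers `parse_version`, `pad` and `satisfies` are hypothetical and written only for illustration: they handle plain numeric versions and the `>=`, `<=`, `>`, `<`, `==` operators, and they are not conda's actual matching logic.

```python
import operator

# Comparison operators supported by this simplified sketch.
# Two-character operators come first so ">=" is not mistaken for ">".
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text):
    """Turn a version string such as '3.14.2' into a tuple (3, 14, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def pad(a, b):
    """Pad two version tuples with zeros so (3, 14) compares equal to (3, 14, 0)."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, spec):
    """Return True if `version` matches every clause of a spec like '>=3.9,<3.14'.

    Hypothetical helper for illustration only; real conda specs support much more.
    """
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS:
            if clause.startswith(symbol):
                left, right = pad(current, parse_version(clause[len(symbol):]))
                if not compare(left, right):
                    return False
                break
        else:
            # No known operator matched this clause.
            raise ValueError(f"Unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    # An open-ended range accepts a very new interpreter...
    print(satisfies("3.14.2", ">=3.9"))        # True
    # ...while an upper bound excludes it and still allows an older one.
    print(satisfies("3.14.2", ">=3.9,<3.14"))  # False
    print(satisfies("3.13.1", ">=3.9,<3.14"))  # True
```

The takeaway is that an open-ended lower bound accepts whatever the newest interpreter is, so adding an upper bound can matter when a package has not yet published builds for the latest Python.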
To create the environment, we run:

```bash
mamba env create --file environment.yml
```

</details>

**The installation failed. What happened?**

There are several things that could go wrong:

- Others' mistake: [GitHub issue #703](https://github.com/libAtoms/QUIP/issues/703)
- Our mistake: missing dependencies, wrong versions

It is no one's fault: always remember to **ask for help** and check the documentation.

<details>
<summary>Full demo with commands (part 2/2)</summary>

In this case, the problem is that `quippy-ase==0.10.1` has no wheels available for `python>=3.14`, while our environment is using `python==3.14.2`.

Let's try again, this time restricting the Python version in the `environment.yml` file:

```yaml
name: quippy-env
channels:
  - conda-forge
dependencies:
  - python>=3.9,<3.14
  - pip
  - pip:
      - quippy-ase==0.10.1
```

We remove the previous attempt and re-create the environment:

```bash
mamba env remove --name quippy-env
mamba env create --file environment.yml
```

Let's activate the new environment to test if we were successful:

```bash
source activate quippy-env
python3 -c "import quippy; print('Success')"
```

Remember to use `conda deactivate` to stop using this specific environment.

In practice, the situation is usually more complicated. In this example:

- adding more dependencies might cause conflicts with our current environment,
- you might need parallel support,
- the underlying code is Fortran, so for development purposes (e.g., tweaking an existing feature) you will need to follow the manual compilation instructions.

</details>

<details>
<summary>What we just did</summary>

1. Ask your colleagues or local support people if they have used the program before
   - Our user asked us to do this.
2. Check existing documentation and [the issue tracker](https://version.aalto.fi/gitlab/AaltoScienceIT/triton/-/issues) for mentions of the program
   - We checked our conda docs on how to create environments.
3. Check the installation instructions provided by the program's creators
   - We checked the installation instructions.
4. Check if the software is installed as a module
   - We checked that quippy was not already available as a module.
5. Use a file to keep track of dependencies and their version numbers
   - We wrote an `environment.yml` file.
6. Record all of the steps you took while installing
   - We wrote the commands down in this document.
7. Check the documentation or ask your local support people about where you should install the program
   - We checked the storage docs and the conda docs.
8. Try to install your program separately from other installations (e.g. when using Python, create separate environments with [conda](https://scicomp.aalto.fi/triton/apps/conda/) or similar tools)
   - We used mamba to install it into a separate environment.
9. Try to avoid installing things globally (e.g. avoid `pip install --user`)
   - We successfully avoided this.
10. Make sure that you activate software environments one at a time
    - We used `source activate` to activate a single environment.

</details>

## Living in your new environment

### What is available in/near your new apartment?

![icon-places](https://hackmd.io/_uploads/SktzcUD8Zg.png)

*Sometimes you need directions to reach your destination.*

<details>
<summary>Full explanation</summary>

When you move into a new apartment, you might not know the commonly accepted rules. Where is the laundry room? Where are the utensils? You might assume that they are in the same place as in your previous apartment, but that assumption might be wrong.

It is similar with programs: many programs assume that things are in certain places. Programs are typically configured using:

- configuration files
- environment variables
- command-line arguments

</details>

<details>
<summary>Common environment variables</summary>

Common environment variables include:

- `$PATH`: A colon-separated list of directories where Linux looks for executables.
- `$HOME`: Specifies the user's home folder.
  Many programs put files here by default, often in hidden folders.
- `$XDG_CACHE_HOME`: By default this is `$HOME/.cache`. Many programs use it for temporary data storage.
- `$OMP_NUM_THREADS`: Tells programs that use OpenMP how many CPU threads they should use.

</details>

11. Check what environment variables and command-line arguments your program uses

### Forming good habits

Once you've familiarized yourself with your new surroundings, you can start forming good habits, like cleaning schedules. Writing these down in a calendar can make them even more concrete. Similarly, with programs you'll want to write down how you run them.

12. Record everything needed to run your program in your submission scripts

## Hugging Face demo

For this demo we'll use a large language model (LLM) to answer a question. We'll be using a library called [transformers](https://huggingface.co/docs/transformers/index) from the [Hugging Face ecosystem](https://huggingface.co/).

<details>
<summary>Full demo</summary>

Let's create a Python script that runs our LLM chat pipeline, `huggingface_example.py`:

```python
from transformers import pipeline
import torch  # PyTorch backend used by the pipeline

# Initialize the pipeline
pipe = pipeline(
    "text-generation",  # Task type
    model="mistralai/Mistral-7B-Instruct-v0.3",  # Model name
    device_map="auto",  # Let the pipeline automatically select the best available device
    max_new_tokens=1000,  # Upper limit on the length of the generated answer
)

# Prepare prompts
messages = [
    {"role": "system", "content": "You're a helpful assistant. Answer the questions to the best of your abilities."},
    {"role": "user", "content": "Continue the following sequence: 1, 2, 3, 5, 8"},
]

# Generate text and print only the newly generated response
response = pipe(messages, return_full_text=False)[0]["generated_text"]
print(response)
```

Let's then write an example submission script, `huggingface_example.sh`:

```bash
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB # This is system memory, not GPU memory.
#SBATCH --gpus=1
#SBATCH --output huggingface.%J.out
#SBATCH --error huggingface.%J.err

# By loading the model-huggingface module, models will be loaded from
# /scratch/shareddata/dldata/huggingface-hub-cache, which is a shared scratch space.
module load model-huggingface

# Load a ready-to-use conda environment for Hugging Face Transformers
module load scicomp-llm-env

python huggingface_example.py
```

We can then submit the script to the queue with:

```bash
sbatch huggingface_example.sh
```

</details>

## Bringing it together

:::info
*"Ask not what your ~~country~~ program can do for you — ask what you can do for your ~~country~~ program."*
- John F. Kennedy
:::

1. Ask your colleagues or local support people if they have used the program before
2. Check existing documentation and [the issue tracker](https://version.aalto.fi/gitlab/AaltoScienceIT/triton/-/issues) for mentions of the program
3. Check the installation instructions provided by the program's creators
4. Check if the software is installed as a module
5. Use a file to keep track of dependencies and their version numbers
6. Record all of the steps you took while installing
7. Check the documentation or ask your local support people about where you should install the program
8. Try to install your program separately from other installations (e.g. when using Python, create separate environments with [conda](https://scicomp.aalto.fi/triton/apps/conda/) or similar tools)
9. Try to avoid installing things globally (e.g. avoid `pip install --user`)
10. Make sure that you activate software environments one at a time
11. Check what environment variables and command-line arguments your program uses
12. Record everything needed to run your program in your submission scripts
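Step 11 becomes more concrete once you see how a program typically consumes environment variables. The Python sketch below mimics the fallback logic many programs use for the common variables: `$XDG_CACHE_HOME` with a `$HOME/.cache` fallback, a thread count, and the colon-separated `$PATH`. The function names and the thread-count fallback order (`$OMP_NUM_THREADS` first, then `$SLURM_CPUS_PER_TASK`, which Slurm sets when `--cpus-per-task` is requested, then a single thread) are assumptions for illustration; check your own program's documentation for what it actually reads.

```python
import os


def cache_dir(env):
    """Return the cache directory following the XDG convention.

    Uses $XDG_CACHE_HOME if it is set and non-empty, otherwise $HOME/.cache.
    """
    xdg = env.get("XDG_CACHE_HOME")
    if xdg:
        return xdg
    return os.path.join(env["HOME"], ".cache")


def thread_count(env):
    """Pick how many CPU threads to use (hypothetical fallback order).

    1. $OMP_NUM_THREADS, if set
    2. $SLURM_CPUS_PER_TASK, set by Slurm when --cpus-per-task is requested
    3. a single thread otherwise
    """
    for name in ("OMP_NUM_THREADS", "SLURM_CPUS_PER_TASK"):
        value = env.get(name)
        if value:
            return int(value)
    return 1


def executable_dirs(env):
    """Split $PATH into the directories that are searched for executables."""
    return [d for d in env.get("PATH", "").split(os.pathsep) if d]


if __name__ == "__main__":
    env = os.environ
    print("Cache directory:", cache_dir(env))
    print("Threads:", thread_count(env))
    for directory in executable_dirs(env):
        print("PATH entry:", directory)
```

Inside a job submitted with `#SBATCH --cpus-per-task=4` and no `OMP_NUM_THREADS` set, this sketch would report 4 threads. Saving it as a hypothetical `check_env.py` and running it from your submission script is one way to record what your program actually sees on the cluster.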