# What might be useful to know when working with the MPIPZ HPC cluster

I tried to summarize what you will likely be doing often on the MPIPZ HPC cluster. I provide links for further help when possible. Sometimes there are more specialized "pro-tips" in citation blocks.

## What is the MPIPZ HPC cluster?

A few powerful computers with a common file storage. The best of them (`hpc001..006`) are only usable through the `Slurm` job submission system for fair distribution of compute time. The rest (`dell-node-1..12`) have no such restrictions.

A helpful summary by Saurabh is [here](https://rucola.mpipz.mpg.de/p/9834/) (intranet, so with VPN or from the institute). The part about LSF is outdated though.

## Where can I get help?

- ask around! All we do is sit at these computers all day anyway
- rucola.mpipz.mpg.de -- A forum with questions answered by Saurabh, our bioinformatics expert. He is often helpful in figuring out why some errors occur.
- it@mpipz.mpg.de -- The IT team may help set up the PC, install new software etc.
- for coding questions, you can also ask Google or ChatGPT.

## Daily routines

### Connecting to the cluster

We connect to the cluster using the so-called "SSH protocol". It allows us to use the command line of the remote server from the command line of a local PC. At the institute or with VPN, open the command line prompt of your computer and type:

```
ssh -Y username@dell-node-11.mpipz.mpg.de
```

Replace `username`, and you may replace `dell-node-11` with any other server (e.g. `hpc001` if you want to submit a `Slurm` job).

If you can't get VPN access and want to work from outside the institute, ask the IT team for access to the `cucumber` server.
Then you can do the following to start an interactive SSH session on the cluster:

```
ssh -Y -oHostKeyAlgorithms=+ssh-dss -t username@cucumber.mpipz.mpg.de -t ssh dell-node-11
```

> **PRO-TIP**: the argument `-Y` enables X11 forwarding, which shows graphical windows from the server on your PC in addition to the command line text. You can test if it works by running `xclock`, and if it does, you can e.g. do `display image.png`. It might be hard to set up on Windows; there you can use the `MobaXTerm` terminal application, which does this job for you.

### Navigating the filesystem, organizing files

Once you have connected to the server, you are using the **Linux command line** of the cluster. In particular, we use what is called the "`bash` shell". It has many commands to work with files and directories. If you are new to the Linux command line, [this](https://ryanstutorials.net/linuxtutorial/) looks like a good tutorial to start with (all chapters are relevant for us!), and a cheatsheet with most of the useful commands is [here](https://bioinformaticsworkbook.org/Appendix/Unix/UnixCheatSheet.html#gsc.tab=0). If you want to know a bit more context, check out the (overly) comprehensive ["Linux for bioinformatics" course material](https://drive.google.com/drive/folders/1J8C2olv2yQsBk-vi6VuyzXWE4870yu8E).

As you will see, there are four main filesystem partitions accessible to you:

- `/netscratch/dep_mercier/grp_novikova/` or "netscratch" -- the partition where we mostly work. We have more space for experiments there, but the files are not backed up.
- `/biodata/dep_mercier/grp_novikova/` or "biodata" -- here we mostly store raw data (sequencing, microscope images etc.) and some crucial results (VCF files, genome assemblies, BAM files). This is backed up regularly, but because of that we should not move things around there (otherwise multiple instances of a file will be present in backups).
  If we no longer need something on "biodata" but it should stay in the backups for good, we can *archive* it to free some space -- Anna (aglushkevich@mpipz.mpg.de) does that.
- `/groups/dep_mercier/grp_novikova/` or "groups" -- not much happens here, but this is normally used as space for photographs from some events etc. -- and occasionally for some experimental data.
- `/home/$USER` where `$USER` is your username -- this is your *home directory* where you find yourself when you connect to the server. You cannot store more than 60G in there, and storing files there is generally not advisable because *only you* will have access to them. The exceptions are some configuration files (e.g. `.bashrc` and `.bash_profile` -- the list of `bash` commands to be executed on every login, setting up custom functions etc.)

And here are a few useful places on netscratch:

- `/netscratch/dep_mercier/grp_novikova/software` -- some programs that were not available for everyone so we installed them for ourselves
- `/netscratch/dep_mercier/grp_novikova/Scripts` -- the programs we write, with subfolders by username

### Looking into files

Besides the "normal" text files best handled by native Linux tools, we often work with some special file formats. It's good to know the most common of them:

- FASTA (.fasta, .fa) -- simplest sequence/alignment storage format, we use it e.g. for assemblies
- FASTQ (.fastq, .fq) -- raw sequencing data format (sequences + quality values)
- BAM (.bam) -- read mapping format. Pretty complex, see [specifications](http://samtools.github.io/hts-specs/SAMv1.pdf).
- VCF (.vcf, more often compressed .vcf.gz) -- variant calling format. Very complex, you will most likely need to consult [specifications](http://samtools.github.io/hts-specs/VCFv4.4.pdf) if you work with it.
- GFF (.gff3, .gff) -- genomic feature annotation format (basically type of feature + coordinates).
- BED (.bed) -- genomic region format (in the minimal form -- just the coordinates).
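Since several of these formats are plain text, a few quick sanity checks can be done with native tools alone. The following is a minimal sketch, not a replacement for the format-specific tools; the function names and the toy files are made up for illustration, and the FASTQ count assumes the common layout of exactly four lines per read.

```shell
#!/bin/bash
# Minimal sketch: quick sanity checks on text-based formats with native tools.
# All function and file names here are hypothetical, created only for this demo.

# Count FASTA records: every record starts with a ">" header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count FASTQ reads, assuming exactly 4 lines per read
# (header, sequence, "+", qualities). Counting "@" lines is unreliable
# because quality strings may also start with "@".
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Count data lines in a BED/GFF/VCF-like text file, skipping "#" header lines.
count_data_lines() {
    grep -vc '^#' "$1"
}

# Toy input files in a temporary directory
demo_dir=$(mktemp -d)
printf '>chr1\nACGT\n>chr2\nGGCC\n' > "$demo_dir/toy.fa"
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$demo_dir/toy.fq"
printf '#comment\nchr1\t0\t4\nchr2\t1\t3\n' > "$demo_dir/toy.bed"

count_fasta_records "$demo_dir/toy.fa"
count_fastq_reads "$demo_dir/toy.fq"
count_data_lines "$demo_dir/toy.bed"
```

For compressed files (e.g. `.vcf.gz`), the same checks would need the file to be decompressed first, e.g. piped through `zcat`.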
Although these are all text files of some kind (BAM being a compressed binary form of the text-based SAM) and we sometimes process them like normal text, there are special software tools for working with these formats that make life easier. Consider the following tools:

- `seqkit` for FASTA and FASTQ
- `samtools` for BAM
- `bcftools` for VCF
- `agat` for GFF
- `bedtools` for BED
- there is also the very nice `csvtk` tool for tables (CSV and TSV)

### Running and writing scripts

You can look for examples of our programs in the `Scripts/` folder on netscratch. A script suitable for execution on an HPC node would look like this:

```
#!/bin/bash
# The SBATCH comment lines specify run parameters for the submission system.
#SBATCH --job-name=example
#SBATCH --nodes=1

var1="value1"
var2=5

command1 ${var1}
command2 ${var2}
# Other people might use your script, so keep it clean and
# comment difficult places
command3
```

On dell-nodes, you would simply run this script (suppose `script.sh`) like

```
bash script.sh
# or
/path/to/script.sh
# or even ./script.sh if you are in the script's directory
```

The last two forms require the script to be executable (`chmod +x script.sh`). The `#SBATCH` lines will then simply be ignored as they are no different from other comment lines. On HPC nodes you would submit it through `Slurm`:

```
sbatch script.sh
```

Some scripts contain lines like `module load foo/bar`. Check the [Making yourself comfortable](#making-yourself-comfortable) section to see how one sets up the module system.

> **PRO-TIP**: most likely at some point you will need to do something that is difficult to accomplish with existing tools. We use simple scripting languages to deal with these cases. You will likely need some knowledge of the most famous ones -- `R` and `python`; look for tutorials and feel free to ask us for help. If you wonder which one to learn coding in: `R` is a bit harder to install and is mainly a good fit for statistics and graphics on smaller datasets (e.g. <100M rows), but the code is shorter for these purposes; `python` is a general-purpose language, but the code for our tasks will be a bit more convoluted and more third-party packages will be needed.

> **PRO-TIP** for dell-nodes: If you want to run a longer program without leaving your PC with the SSH session on, run it from within a `tmux` session and detach it. You will be able to reattach it in a new SSH session. Read more about `tmux` [here](https://hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/). I play it safe and only use `tmux` sessions with dell-nodes.

### Installing software

Often you don't find the right tool on the cluster and have to install it. There are a few ways to do it.

**The easiest** is to use `conda` if it is possible. Set it up once with `conda init bash` or load a module `module load mambaforge/self-managed/v23*`, and you will be able to create new environments where the right version of the program can be isolated to avoid conflicts:

```
conda create -n myenv softwarename
```

where `myenv` is the environment name (you will load it like `conda activate myenv`) and `softwarename` is the name of the program (make sure it is available in the `conda` channels).

**Secondly**, you can compile the software ("build" it from the source code with a compiler program following the author's instructions) yourself, but success in that case is not guaranteed. It is easier to do on the build server (i.e. do `ssh build-stretch` first and then compile). It is best to put the self-compiled software in `/netscratch/dep_mercier/grp_novikova/software`.

### Making yourself comfortable

A lot of daily routines can be simplified with the right commands and setup, and all of us accumulate our favorite tricks in our own configuration files (e.g. `.bashrc`). Do it too!

Make yourself a folder in `/netscratch/dep_mercier/grp_novikova` and keep your less useful stuff there. Also, make yourself a scripts folder so that other people have access to your scripts.
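Setting up such a personal folder could look like the sketch below. It is only an illustration under a few assumptions: the helper name `make_workspace` is made up, the `$USER/scripts` layout is one possible convention, and `GROUP_DIR` defaults to a temporary directory so the sketch is safe to try anywhere. On the cluster you would set `GROUP_DIR=/netscratch/dep_mercier/grp_novikova`.

```shell
#!/bin/bash
# Minimal sketch: create a personal work folder with a scripts subfolder
# that other group members can read.
# GROUP_DIR defaults to a temporary directory (an assumption for safe testing).
GROUP_DIR="${GROUP_DIR:-$(mktemp -d)}"
ME="${USER:-$(id -un)}"

# make_workspace is a hypothetical helper, not an existing cluster command.
make_workspace() {
    local base="$1" user="$2"
    mkdir -p "$base/$user/scripts"
    # g+rX: the group may read files and enter directories
    chmod -R g+rX "$base/$user"
    echo "$base/$user"
}

workspace=$(make_workspace "$GROUP_DIR" "$ME")
ls -ld "$workspace" "$workspace/scripts"
```

Whether colleagues can actually read the folder also depends on the group ownership of the parent directory, which `ls -ld` shows.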
Ask for someone else's `.bashrc` and copy it to your home directory. E.g. take mine, `/netscratch/dep_mercier/grp_novikova/nikita/.bashrc`, but be aware that it's quite busy. Or start from scratch:

```
# note: ">" overwrites any existing file, ">>" appends to it
echo "source ~/.bashrc" > ~/.bash_profile
echo "source /opt/share/software/scs/appStore/modules/init/profile.sh" > ~/.bashrc
# single quotes keep $PATH unexpanded, so it is resolved at every login
echo 'export PATH=/opt/share/software/packages/miniconda3-4.9.2/bin:/opt/share/software/packages/miniconda3-4.9.2/condabin:/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc:/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/bin:$PATH' >> ~/.bashrc
```

Now, if you reload the shell or do `source ~/.bashrc`, you will get access to the *module system* with some programs that are preinstalled on the cluster in loadable environments. See `module avail` for a full list and do `module load name/version` to load one.

After some weeks or months, think about which long commands you type too often. They can be simplified by aliases or functions in your `.bashrc` like:

```
alias ll="ls -lah --color" # aliases are used for single long commands
rebash() { source ~/.bashrc; } # functions are for chains of commands or commands with arguments
# Functions can have arguments, specified by $1, $2 etc. or altogether with $@:
tssh() { ssh -t -o User=$USER -Y "$@" 'tmux attach || tmux new'; } # connect to specified node and attach to a tmux session there; create a new one if none exist.
```

### Backing things up

There are a couple of options to track changes in your files:

- Copy the most important things to biodata once in a while. We have a `scripts_backup` folder there, use it. E.g. I have the following alias in my `.bashrc` that I run once in a while:
  ```
  alias sync.scripts="rsync -avhr --delete /netscratch/dep_mercier/grp_novikova/$USER/scripts/* /biodata/dep_mercier/grp_novikova/scripts_backup/$USER/"
  ```
- for small files like scripts, you can maintain a `git` repository. `git` is a version control system that saves all intermediate stages of your files.
  The most important `git` repos of our lab are kept at https://github.com/novikovalab (you will need a GitHub account; ask Polina to connect it to our organization).
- Make sure to keep your most precious data (e.g. raw data) on biodata where it is backed up. Make sure it is archived before deleting it from there.

### Monitoring the cluster

You can see how busy the cluster is in a few ways:

- `Slurm` has tools for tracking HPC load, but I do not know them yet. In theory, with them you can even see exactly who is running what and how heavy it is.
- There are monitor websites for [dell-nodes](http://dell-head.mpipz.mpg.de/ganglia/) and [hpc-nodes](http://hpc-head.mpipz.mpg.de/ganglia/) (in intranet).
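As a starting point for the `Slurm` tools, here is a hedged sketch using the standard `sinfo` and `squeue` commands that ship with `Slurm`. The wrapper name `cluster_status` is made up for illustration, and how the output looks depends on how our cluster is configured, which I have not checked.

```shell
#!/bin/bash
# Minimal sketch: a few standard Slurm commands for checking cluster load.
# cluster_status is a hypothetical helper name, not something installed on the cluster.
cluster_status() {
    # Slurm client tools are normally only available on nodes set up for job submission
    if ! command -v squeue >/dev/null 2>&1; then
        echo "Slurm commands not found; log in to an hpc node first"
        return 0
    fi
    sinfo               # partitions and node states (e.g. idle, mixed, allocated, down)
    squeue              # all pending and running jobs, with their owners
    squeue -u "$USER"   # only your own jobs
}

cluster_status
```

For details on one job, `scontrol show job JOBID` prints its full description, and `sacct` reports on jobs that have already finished if accounting is enabled.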
