# High Performance Computing [![hackmd-github-sync-badge](https://hackmd.io/SzLd0RM_T06bCwP4XsdW1g/badge)](https://hackmd.io/SzLd0RM_T06bCwP4XsdW1g)

## An introduction to parallelisation and workflow automation

by Alan O'Cais and Peter Steinbach

With this single-day introduction, we want to take your HPC cluster skills to the next level. We plan to introduce automated pipelines and parallelisation suited to our learners. We assume that learners are able to submit single jobs to a SLURM-based scheduler and have a basic understanding of the UNIX shell and Python.

For parallelisation, we aim to provide a thorough introduction on how to approach implementing data parallelism in Python. For this, we will use shared-memory parallelisation with multiprocessing and distributed-memory parallelisation with the Message Passing Interface (MPI) for a compute-intensive problem. The day will be concluded with an introduction on how to automate pipelines on a cluster - something typically needed for data-intensive tasks.

All teaching will be performed hands-on on a custom cluster provided to the learners.

## Some Housekeeping

- :handshake: please be advised that we conduct this lesson under [The Carpentries Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html)
- :pencil: this is the shared document for collaborative notes, please use it for notes, links, and code snippets that support the content

## Agenda

| time      | topic                          |
|:--------- |:------------------------------ |
| 9am       | Meet and Greet, icebreakers    |
| 9.15am    | Jargon Busting Parallelisation |
| 9.45/10am | Meet the cluster               |
| 10:15am   | parallelisation for pi         |
| ~12-1pm   | lunch break                    |
| ~1pm-3pm  | parallelisation for pi         |
| ~3.30-5pm | automated workflows            |

The timings above are estimates. If we finish earlier, we will do so.

## Ice breakers

### Share with us a mythical creature from the region where you grew up! Share what people attribute to this creature.

- Rübezahl: a magical giant that roams the mountains, helps the poor and punishes the greedy
- Lyho, a one-eyed flesh-eating entity that inhabits abandoned windmills and sleeps on a bed of bones
- Baba Jaga, basically a witch who has something against kids
- Кощей бессмертный (Koschei the Deathless), an undead character used to scare kids in fairytales
- Garuda, a brave eagle that stands up to evil
- Forest Witch: we lived near a forest, which brought up the idea that a witch could live there, like in the tale of Hänsel and Gretel
- Nixenkind (mermaid child), a stone figure of a child next to the church door. The legend claims it's the child of a mermaid and a human that was thrown against the church in anger and turned to stone.
- Cu Chulainn, a big dog!
- Rat catcher of Hameln - supposedly kidnapped the children of the town of Hameln (as a kind of revenge on the people who didn't treat him nicely, I guess?)
- PG tips monkey
- dragon (different from the western dragon) - controls the rain of the area, represents high authority
- Struwwelpeter: a boy who doesn't want to wash
- Wolpertinger, a mixture of multiple animals living in the forest

### Share with us the moment you consciously used a (personal) computer for the first time in your life! What did you use the computer for?

- playing CS 1.6 at a LAN party
- playing a game called "Wolkanoid" on my parents' PC
- played FIFA 2000, maybe back in 2004, on a cousin's computer
- playing Minesweeper on my father's PC, but not understanding how it actually works
- Before getting my IBM PC in 1985 (Intel 8088 without any x86 coprocessor, 64 kB RAM, no HDD and 360k floppies), I won a BASIC handbook in a radio competition. As all my friends only had /other/ computers, I could not share games, but started to try out all the BASIC language elements I had read about in the book to prove they worked as described.
- playing brick buster on my father's Windows 3.11 486 machine
- we had a game called smelly mysteries on my paero
- I got a book from the library that had a program in BASIC and tried to input it into our BBC Micro (it was a computer game and did not work!)
- playing computer games, I remember things like Turrican
- school project, internet/word processor
- playing a Winnie the Pooh game and Ms. Pac-Man on my family's Windows 98 PC when I was 3 or 4 years old
- playing some games when I was in primary school
- helping my brother play LHX, a 386 helicopter-flying game
- playing Moorhuhn, a game where you shoot chickens

# Introduction to Parallelisation

## Jargon Busting

This exercise will guide us through the first half of the day.

1. Split up into 3 teams!
2. We will send you into break-out rooms. For each team: **Talk for three minutes on any terms, phrases, or ideas around parallelisation or HPC that you've come across and perhaps feel you should know better.** Collect these terms below.

A list of terms/phrases/ideas around parallelisation or HPC that you've come across and perhaps feel you should know better (retain duplicates):

- __Room 1:__
    - OpenMPI: an open-source library that implements message passing for HPC (MPI = Message Passing Interface)
    - "Scaling" in the HPC context - performance of a parallel code/program on the computing cluster (HPC) when using resources at different scales
    - SLURM - a job scheduler that organizes the execution of multiple computing jobs submitted to an HPC cluster
    - Parallel file system - read and write access by multiple nodes at the same time
    - CPU/Core/Node
        - A node refers to the physical box, i.e. CPU sockets with north/south switches connecting memory systems and extension cards, e.g. disks, NICs, and accelerators.
        - A CPU socket is the connector to these systems and the CPU cores; you plug in chips with multiple CPU cores. This creates a split in the cache memory space, hence the need for NUMA-aware code.
        - A CPU core is an independent computing unit with its own computing pipeline, logical units, and memory controller. Each CPU core will be able to service a number of CPU threads, each having an independent instruction stream but sharing the core's memory controller and other logical units.
- __Room 2:__
    - Message passing: a system for communication between threads and processes on different nodes
    - *Distributed storage*: disks for storing data that are located in different nodes/computers and only connected by a network
    - *CUDA*: a programming paradigm to use GPUs for scientific computing (published/supported mostly by Nvidia)
    - SLURM
        - A job scheduler for shared computing on one or more machines in a "cluster".
        - A job can be a script or program that a user wants done.
        - Manages resources (i.e. RAM, available CPUs, disk space, GPUs, etc.)
    - *Ansible*: a system for automated server/node configuration
    - *I prefer Julia*
- __Room 3:__
    - thread safety
    - *POSIX threads*
    - *mutexes*
    - *semaphores*
    - shared memory
        - memory that can be simultaneously accessed by multiple programs to communicate without redundant copying of data
        - to avoid race conditions and simultaneous writes to shared memory, techniques like mutexes exist to flag that another process is writing to that memory chunk (depending on the program logic, these techniques are more or less a necessity)
    - CPU utilization from many cluster nodes appears as one node
        - InfiniBand basically allows this to work performantly
        - neighbouring CPUs can access the memory of other servers/CPUs
    - How to go beyond symmetric multi-processing
        - this is the same as stated above, in other words
        - maybe edge/fog computing could be an answer to this question?
    - *Frameworks for parallel computing on GPUs (coming from CPU development), ideally Python, C/C++ work, agree on Julia*
        - the gist is that it depends on your use case and the objects you like to work with (matrices, tensors, graphs, text, ...)
        - GPU or multi-core backends are prolific in many domains
        - C/C++: concentrating on the more low-level ones, the most prominent are OpenMP/OpenACC for CPU+GPU workloads (expect OS dependencies), CUB/thrust or plain CUDA for Nvidia GPUs, SYCL and friends (which is btw adopted by more and more parties in the industry), OpenCL (experiences an orphaned status)
        - Python: depends on what you want to do; there are high-level libs like tensorly etc. that focus on a domain (like tensor algebra), the more low-level ones are numba, numpy/scipy and the ML-inspired libs like pytorch/jax/...
        - Julia has multi-core and GPU backends AFAIK

Things in *italics* we won't have time to touch today (unfortunately!).

3. We will send you into break-out rooms again. For each team: **Identify common words from the list above as a starting point - spend 10 minutes working together to try to explain what the terms, phrases, or ideas on your list mean.** Note: use both each other and the internet as a resource. Identify the terms your groups were able to explain as well as those you are still struggling with.

## Notes

In this part of the course, we will go through HPC Carpentry's [hpc-parallel-novice](http://www.hpc-carpentry.org/hpc-parallel-novice/) lesson. In case anyone has to leave, this is the material we are teaching and to which you can return on your own.

Note that this material is in alpha stage. In case you discover bugs/errors, feel free to let us know or submit an issue to the [repo](https://github.com/hpc-carpentry/hpc-parallel-novice/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc).

### Access to `heibrids.learnhpc.eu`

```
ssh <username>@heibrids.learnhpc.eu
```

Reconvening at 10:10am!

* There was a question about frameworks to schedule Python kernels. Alan mentioned that Dask might be a good way to help manage this; there was a recent workshop about Dask that might be interesting: https://www.youtube.com/playlist?list=PLmhmpa4C4MzZ2_AUSg7Wod62uVwZdw4Rl

#### Preparing our environment

* To make sure we are all using the same Python, let's load a Python *module* (we can use `module avail` to see what modules are available; typing `/Python` lets you search for `Python`), e.g.:

```
module load Python/3.8.2-GCCcore-9.3.0
```

* You will also need `numpy`, which comes from a different module. You can search for this Python extension with `module spider numpy` (`module spider ...` is a search functionality that comes from our environment module tool, Lmod). It will tell you that you can find it in `SciPy-bundle/2020.03-foss-2020a-Python-3.8.2`:

```
module load SciPy-bundle/2020.03-foss-2020a-Python-3.8.2
```

#### Calculating pi

* We write a simple Python code based on numpy that calculates pi (a sketch of such a script is shown below)
* You can find the code that Peter wrote in `/tmp/serial_pi.py` on the login node; take a copy of this if you need it:

```
cp /tmp/serial_pi.py ~/
```

* Let's time our code execution using the unix utility `time`:

```
time python ./serial_pi.py 50000000
```
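In case you cannot copy the file, the following is a minimal sketch of what such a Monte Carlo estimator could look like. It is not Peter's actual `serial_pi.py`, but it mirrors the `main()`/`estimate_pi()` structure referred to in the profiling step below: we draw `total_count` random points in the unit square and count the fraction that falls inside the quarter circle, which approximates pi/4.

```
# A minimal Monte Carlo estimate of pi (sketch, not the actual serial_pi.py):
# sample points uniformly in the unit square; the fraction that lands inside
# the quarter circle of radius 1 approximates pi/4.
import sys

import numpy as np


def estimate_pi(total_count):
    x = np.random.uniform(size=total_count)
    y = np.random.uniform(size=total_count)
    inside = np.count_nonzero(x * x + y * y <= 1.0)
    return 4.0 * inside / total_count


def main():
    total_count = int(sys.argv[1]) if len(sys.argv) > 1 else 10_000_000
    print("pi is approximately", estimate_pi(total_count))


if __name__ == "__main__":
    main()
```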
* Install the `line_profiler` Python module using pip (`--user` installs it to the home directory and for the user only):

```
python -m pip install --user line_profiler
```

* Some users experience trouble installing kernprof (please list the problems here and include what you invoked and what you got)
    * input: as above
    * output:

    ```
    WARNING: The directory '/home/.../.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
    WARNING: The directory '/home/.../.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
    ```

    * one workaround is adding `--prefix=` to the command: `python -m pip install --user line_profiler --prefix=`
    * this works: `python -m pip install line_profiler --prefix=/home/<user_dir>/.local`
    * another solution was to run `export HOME=/home/<username>` and then install with `python -m pip install --user line_profiler`
* This will install an application `~/.local/bin/kernprof`. We can use `kernprof` to get a profile of our code:

```
~/.local/bin/kernprof -l prof_serial.py 50000000
```

* We need to use the decorator `@profile` to mark the parts of the code we want profiled, starting from `main()` and then narrowing our focus to find the hot spots (moving `@profile` from `main()` to `estimate_pi()` to ...)
* We can then inspect the profile and see the results:

```
python -m line_profiler prof_serial_pi.py.lprof
```

###### Using `kernprof`

* Download a Python script `count_lines.py` and profile it:

```
wget http://www.hpc-carpentry.org/hpc-parallel-novice/downloads/count_lines.py
```
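The workflow is the same as for the pi example: decorate the function you want timed with `@profile` (the decorator is injected by `kernprof` at runtime, so no import is needed), run the script through `kernprof -l`, and inspect the resulting `.lprof` file. A small sketch - the function body is only illustrative, not necessarily what `count_lines.py` actually contains:

```
# Illustrative only - decorate the function of interest with @profile and run
# the script via kernprof; the decorator is provided by kernprof at runtime.
@profile
def count_lines(path):
    with open(path) as handle:
        return sum(1 for _ in handle)


if __name__ == "__main__":
    import sys
    print(count_lines(sys.argv[1]))

# Then, from the shell:
#   ~/.local/bin/kernprof -l count_lines.py some_file.txt
#   python -m line_profiler count_lines.py.lprof
```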
# Lunch break

## Please share with us something that you liked or understood in the morning session :+1:

- Liked the little bit of theory and the examples from real life. Very interactive, and the speed of teaching is perfect!
- I very much liked that it was interactive. At one time I did not correctly copy the Python code, and I could not listen to what you were saying because I was trying to fix the code. After copying your script from /tmp I was back on track. +
- very practical
- Nice hands-on sessions with simple, yet very enlightening examples +
- I like the task we have been given; I think it is a good example.
- great entry-level learning curve!
- emphasis on thinking about WHAT to parallelize

## Please share with us something that you didn't like, were confused about or want us to improve :-1:

- sometimes a little bit fast for me, as I am not so familiar with unix commands +
- I think the general pace could have been a bit faster, but as you announced, this will happen in the afternoon anyhow
- I am not so sure about the difference between the kernprof and line_profiler modules; maybe I missed it, but a more explicit differentiation would be lovely!
- I'm not so sure what the purpose of the 'Jargon Busting' was. ++

**We reconvene at 1pm!**

## Comparing numbers

### Estimate the expected speed-up `S` for your version and compare it to the actual speed-up that you encounter

To calculate the actual speed-up, measure the serial runtime and the parallel runtime for a fixed number of samples and cores; the speed-up is then `t_serial / t_parallel`.

- estimated speed-up S = 1.92 (p = .957, s = 2), observed S = 1.7289
- estimated S = 2 and S = 4 (p = 1.0), observed S = 1.49 and S = 2.46
- estimated S = 2, observed S = 1.5
- estimated S = 4 (p = 1.0, s = 4); observed S = 2.076
- estimated S = 2; observed S = 1.4
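The estimated values above appear to follow Amdahl's law, with `p` the fraction of the runtime that can be parallelised and `s` the number of workers: `S = 1 / ((1 - p) + p / s)`. A quick check of the numbers:

```
# Amdahl's law: expected speed-up for a parallel fraction p on s workers
def amdahl_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)


print(amdahl_speedup(0.957, 2))  # ~1.92, the first estimate above
print(amdahl_speedup(1.0, 4))    # 4.0 - perfect scaling only if everything parallelises
```

That the observed values fall short of the estimates is expected: process start-up, communication and any remaining serial work add overhead that this simple model ignores.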
## Message Passing Interface

### Mental model of a cluster

![Mental model](http://www.hpc-carpentry.org/hpc-parallel-novice/tikz/cluster_schematic.svg)

### Check your environment

```
# check the module setup
$ module list
```

If OpenMPI is not there:

```
module load OpenMPI
```

```
# check if mpirun is available in my current shell setup
$ which mpirun
```

```
# run hostname on 4 different processes at the same time
$ sbatch --wrap="mpirun -np 4 hostname" -n 4 -o hostname.out
```

This is what happens in our mental model when we extend across multiple nodes:

![Running on multiple nodes](http://www.hpc-carpentry.org/hpc-parallel-novice/tikz/mpirunhostname_on_clusterschematic.svg)

### Using `mpi4py`

Make sure `mpi4py` is in your environment:

```
module load SciPy-bundle/2020.03-foss-2020a-Python-3.8.2
```

(you could have used `module spider mpi4py` to search for `mpi4py`)

The Python script that Peter wrote can be copied from `/tmp/print_hostname.py`:

```
cp /tmp/print_hostname.py ~/
```

### Parallelising our pi calculation

The script that Peter wrote can be copied from `/tmp/mpi_pi.py`:

```
cp /tmp/mpi_pi.py ~/
```

We should be careful about how we seed our random numbers: the seed should be unique per rank, otherwise every rank draws the same samples.
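If you cannot copy the script, here is a minimal sketch of how an MPI-parallel version of the estimator could look with `mpi4py`. It is not Peter's actual `mpi_pi.py`, and the seed offset of 2021 is an arbitrary choice; the point is that each rank derives its seed from its rank number, and the partial counts are summed on rank 0.

```
# Sketch of an MPI-parallel pi estimate with mpi4py (not the actual mpi_pi.py).
# Each rank draws its own share of the samples with a rank-dependent seed;
# the partial counts are then summed on rank 0.
import sys

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

total_count = int(sys.argv[1]) if len(sys.argv) > 1 else 10_000_000
local_count = total_count // size

np.random.seed(2021 + rank)  # unique seed per rank, otherwise all ranks draw identical points
x = np.random.uniform(size=local_count)
y = np.random.uniform(size=local_count)
local_inside = np.count_nonzero(x * x + y * y <= 1.0)

total_inside = comm.reduce(local_inside, op=MPI.SUM, root=0)
if rank == 0:
    print("pi is approximately", 4.0 * total_inside / (local_count * size))
```

Following the same pattern as the `hostname` example above, it could be submitted with, e.g., `sbatch --wrap="mpirun -np 4 python mpi_pi.py 50000000" -n 4 -o pi.out`.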
**break time: we reconvene at 2:50pm!**

# Introduction to Automating Workflows

Top500 list of HPC systems: https://top500.org/lists/top500/2020/11/

In this part of the course, we will go through the second half of HPC Carpentry's [hpc-python](http://www.hpc-carpentry.org/hpc-python/07-snakemake-intro/index.html) lesson. In case anyone has to leave, this is the material we are teaching and to which you can return on your own.

Other workflow engines exist, e.g.: https://nextflow.io/

To get started, please download the lesson tarball, which contains the input data and much more: http://www.hpc-carpentry.org/hpc-python/files/snakemake-lesson.tar.gz

```
$ wget http://www.hpc-carpentry.org/hpc-python/files/snakemake-lesson.tar.gz
# ...
$ tar xf snakemake-lesson.tar.gz
# optional
$ mv snakemake-lesson workflows
```

To try out the word count (if matplotlib is not loaded yet, use the `module load` command provided here):

```
$ python wordcount.py books/isles.txt isles.dat
$ head -5 isles.dat
$ module load matplotlib/3.2.1-foss-2020a-Python-3.8.2
$ python plotcount.py isles.dat ascii
$ python plotcount.py isles.dat isles.png
```

Executing the zipf test (to verify [Zipf's law](https://en.wikipedia.org/wiki/Zipf%27s_law)):

```
$ python zipf_test.py abyss.dat isles.dat
```

For snakemaking:

```
$ module load snakemake/5.32.2-foss-2020a-Python-3.8.2
$ snakemake --version
```

After creating a `Snakefile`:

```
$ snakemake -j1
```

Visualising the `dats` rule:

![dats DAG](http://www.hpc-carpentry.org/hpc-python/fig/02-dats-dag.svg)

For the `Snakefile` that Peter was working on:

```
cp /tmp/Snakefile .
```
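If you cannot copy it, a single rule in the spirit of the lesson's word-count pipeline could look like the sketch below (this is not Peter's actual Snakefile). A rule names its inputs, its outputs and the command that produces one from the other, and snakemake works out the execution order and only re-runs a rule when its inputs change.

```
# Minimal Snakefile sketch (not the actual /tmp/Snakefile): one rule that
# builds isles.dat from books/isles.txt via the lesson's wordcount.py script.
rule count_words_isles:
    input:
        'books/isles.txt'
    output:
        'isles.dat'
    shell:
        'python wordcount.py {input} {output}'
```

With this in place, `snakemake -j1 isles.dat` would rebuild `isles.dat` only when `books/isles.txt` changes.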
- Content on resource management with snakemake was skipped; those interested can consult the following page: http://www.hpc-carpentry.org/hpc-python/12-resources/index.html

To see the SLURM configuration that Peter created:

```
cp -r /tmp/slurm .
```

## Quality assurance questionnaire

Please do us the favour and fill in this questionnaire: https://forms.gle/G1hNsmE1J8MWEebr6

## Reference Material

* Prerequisite material: https://carpentries-incubator.github.io/hpc-intro/
* http://www.hpc-carpentry.org/hpc-parallel-novice/
* http://www.hpc-carpentry.org/hpc-python/07-snakemake-intro/index.html (and after)
