
N8 GPU Python Workshop 2023-04-18

Hackpad for N8 CIR GPU Python workshop based on Carpentries Incubator lesson.

Usage of this hackpad

You can use this hackpad by clicking the edit button.
This will present the hackpad in a split-screen view, with editable Markdown on the left-hand side and the rendered Markdown on the right-hand side.

Agenda

| Time  | Agenda                               |
|-------|--------------------------------------|
| 10:00 | Welcome and recap                    |
| 10:20 | Your First GPU Kernel                |
| 11:00 | Break                                |
| 11:10 | Registers, Global, and Local Memory  |
| 12:00 | Lunch                                |
| 13:00 | Shared Memory and Synchronization    |
| 14:30 | Break                                |
| 14:40 | Constant Memory                      |
| 15:00 | Concurrent access to the GPU         |
| 16:00 | Close                                |

Google Colab

We'll be working through today's examples using Google Colaboratory.
You'll need to register a Google account to use the service, but once registered you can create a new Colab notebook via colab.research.google.com/.
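To run GPU code you'll need a GPU runtime: in the notebook menu choose Runtime > Change runtime type and select a GPU accelerator. You can then check that a GPU is attached from a notebook cell:

```shell
# In a Colab notebook cell, prefix this with "!" to run it as a shell
# command. It lists the attached NVIDIA GPU, driver, and CUDA version.
nvidia-smi
```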

Misc

Code to fetch radio astronomy data:

```shell
wget https://github.com/ARCLeeds/lesson-gpu-programming/raw/gh-pages/data/GMRT_image_of_Galactic_Center.fits
```

Who are you and where do you come from?

  1. Alex Coleman, research software engineer from University of Leeds. Likes Python, R, Rust🦀
  2. Samantha Finnigan, Research Software Engineer at Durham University. Maker of sensors, platforms, and 3D printed things 🏺
  3. Chenzi Xu, Postdoc at University of York, working on person-specific automatic speaker recognition
  4. Poppy Welch, Research Assistant at York University, working on person-specific automatic speaker recognition
  5. Andrew Broad, Postgraduate Researcher, CDT for medical diagnosis and care, Leeds. Final year of PhD. Using ARC and Bede in developing attention-based neural networks.
  6. Josh Reukauf, PhD-Student in Behaviour-Informatics, Newcastle University, interested in automation of welfare measures in primates.
  7. Michael McCorkindale - Bioinformatics Support Unit at Newcastle University. I specialise in HPC pipelining.
  8. Clelia Middleton - PhD student with Penfold Group Newcastle University. My PhD is in machine learning for spectroscopy and the code makes use of tensorflow.
  9. Cong Zhang, Lecturer in Phonetics and Phonology, Newcastle University.
  10. Jordan J. Hood, PhD Student, STOR-i @ Lancaster University, Spatio-Temporal Bayesian modelling (TensorFlow)
  11. Adrienne Unsworth - Bioinformatics support unit at Newcastle University. Mostly analyse single cell & spatial transcriptomics data.
  12. Daniel Kluvanec - PhD student at Durham University. I'm researching Deep Learning for 3D image processing applied to geological seismic data
  13. Xiaoyue Wu - PhD student in Computational Chemistry from University of Leeds
  14. Yuzheng Zhang - PhD student in Engineering from Durham University.
  15. Jess Bridgen, Postdoc, Lancaster University

Notes

GPU Memory allocation:

#hopefully no mistakes here? CM

Registers: small, fast per-thread storage on each streaming multiprocessor; the highest-performance memory on the GPU. They can hold small arrays whose size is known at compile time.

With register spillover, data is automatically moved to local memory (a per-thread slice of global memory), which carries a performance penalty.
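A minimal sketch of the point above, using CuPy's `RawKernel` (assumes CuPy and a CUDA-capable GPU): the small fixed-size array `temp` is eligible to live in registers; if it were much larger, the compiler would spill it to local memory.

```python
import cupy as cp

vector_add = cp.RawKernel(r'''
extern "C" __global__
void vector_add(const float *A, const float *B, float *C, int size)
{
    int item = blockIdx.x * blockDim.x + threadIdx.x;
    float temp[3];   // small, compile-time-sized array: can stay in registers
    if (item < size) {
        temp[0] = A[item];
        temp[1] = B[item];
        temp[2] = temp[0] + temp[1];
        C[item] = temp[2];
    }
}
''', 'vector_add')

size = 1024
a = cp.random.rand(size, dtype=cp.float32)
b = cp.random.rand(size, dtype=cp.float32)
c = cp.zeros(size, dtype=cp.float32)
vector_add((size // 256,), (256,), (a, b, c, size))  # 4 blocks of 256 threads
assert cp.allclose(c, a + b)
```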

NVIDIA GPUs are not "cache-coherent": updates to global memory made by one thread are not guaranteed to be visible to other threads until processing has completed on all threads (or an explicit synchronisation is used).

Static allocation of memory of a specific size can be done by fixing the size in the kernel source; alternatively, the size can be supplied from the host by passing it as an argument when launching the kernel.

Shared memory is readable and writable by every thread in a block. Use the __shared__ qualifier on a variable in the kernel to place it in shared memory.

Applying the extern keyword to a __shared__ variable permits a parameter submitted from the host (at kernel launch) to determine the size of the shared-memory allocation.
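A sketch of dynamically sized shared memory with CuPy (assumes CuPy and a CUDA-capable GPU): the kernel declares `extern __shared__`, and the host sets the size in bytes via the `shared_mem` launch argument; `__syncthreads()` provides the block-level synchronisation from this session's title.

```python
import cupy as cp

partial_sum = cp.RawKernel(r'''
extern "C" __global__
void partial_sum(const float *input, float *output, int size)
{
    extern __shared__ float temp[];    // size supplied at launch time
    int tid = threadIdx.x;
    int item = blockIdx.x * blockDim.x + tid;

    temp[tid] = (item < size) ? input[item] : 0.0f;
    __syncthreads();                   // wait until all threads have loaded

    if (tid == 0) {
        float total = 0.0f;
        for (int i = 0; i < blockDim.x; i++) total += temp[i];
        output[blockIdx.x] = total;    // one partial sum per block
    }
}
''', 'partial_sum')

threads = 256
size = 1024
x = cp.ones(size, dtype=cp.float32)
out = cp.zeros(size // threads, dtype=cp.float32)
partial_sum((size // threads,), (threads,), (x, out, size),
            shared_mem=threads * cp.dtype(cp.float32).itemsize)
assert cp.allclose(out, threads)  # each block sums 256 ones
```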

Each block has its own shared memory, so when using shared-memory techniques, take care that different blocks don't overwrite the same locations when writing their results back to global memory. This can be handled with, e.g., an appropriate block-based offset when indexing the output.

atomicAdd can be used in CUDA code to prevent threads from overwriting each other's updates to the same output location (a race condition).
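A sketch of atomicAdd avoiding lost updates, again via CuPy's `RawKernel` (assumes CuPy and a CUDA-capable GPU): many threads may increment the same histogram bin concurrently, so each increment must be atomic.

```python
import cupy as cp

histogram = cp.RawKernel(r'''
extern "C" __global__
void histogram(const int *input, int *output, int size)
{
    int item = blockIdx.x * blockDim.x + threadIdx.x;
    if (item < size) {
        atomicAdd(&output[input[item]], 1);  // safe concurrent increment
    }
}
''', 'histogram')

size = 2048
values = cp.random.randint(0, 16, size, dtype=cp.int32)  # bin indices 0..15
bins = cp.zeros(16, dtype=cp.int32)
histogram((size // 256,), (256,), (values, bins, size))
assert int(bins.sum()) == size  # no updates lost
```

Without atomicAdd, a plain `output[input[item]] += 1` would let two threads read the same bin value before either writes it back, silently losing counts.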

Further Reading