
N8 GPU Python Workshop 2023-03-09

Hackpad for N8 CIR GPU Python workshop based on Carpentries Incubator lesson.

Usage of this hackpad

You can use this hackpad by clicking the edit button.
This will open the hackpad in a split-screen view with editable Markdown on the left-hand side and rendered Markdown on the right-hand side.

Agenda

| Time  | Agenda |
|-------|--------|
| 10:00 | Introduction |
| 10:20 | Using your GPU with CuPy |
| 11:00 | Break |
| 11:10 | Using your GPU with CuPy (cont.) |
| 12:00 | Lunch |
| 13:00 | Accelerate your Python code with Numba |
| 14:30 | Break |
| 14:40 | Your First GPU Kernel |
| 16:00 | Close |

Google Colab

We'll be working through today's examples using Google Colaboratory.
You'll need to register a Google account to use the service, but once registered you can create a new Colab notebook via colab.research.google.com/.

Misc

Code to fetch radio astronomy data:

wget https://github.com/ARCLeeds/lesson-gpu-programming/raw/gh-pages/data/GMRT_image_of_Galactic_Center.fits

Who are you and where do you come from?

  1. Alex Coleman, research software engineer from University of Leeds. Likes Python, R, Rust🦀
  2. Shamil Al-Ameen, Computer Science PhD student from Newcastle University. working with IoT and edge devices.
  3. Miranda Horne, PhD candidate at the University of Leeds, researching machine learning for fluid dynamics.
  4. Ghada AlOsaimi, Durham University, PhD student in brain-controlled vehicles, computer vision, AI
  5. Michael McCorkindale, Newcastle University Bioinformatics Support Unit
  6. Dmitry Nikolaenko, research software engineer at Advanced Research Computing @ University of Durham
  7. Chenzi Xu, Postdoc from University of York, working on person-specific automatic speaker recognition
  8. Adrienne Unsworth, Bioinformatician at the Bioinformatics Support Unit @ Newcastle University
  9. Jess Bridgen, postdoc from Lancaster University.
  10. Poppy Welch, research assistant at York University working on person specific automatic speaker recognition
  11. Adam Fletcher, Research Associate, Electrical Engineering, University of Manchester, working on inverse problems in NDT
  12. Joshua Reukauf, Newcastle University, Behaviour Informatics PhD, working with computer vision and object tracking, I like Raspberry-Pis
  13. Monika Gonka, PhD student, University of York
  14. Mikiyas Etichia, PhD student, University of Manchester
  15. Alin Morariu, PhD student, Lancaster University
  16. Adil Ashraf, PhD student, University of Manchester
  17. Ajay B Harish, Lecturer, University of Manchester
  18. Patricia Ternes, research software engineer @ University of Leeds.
  19. Xiaoyuan Luo, PhD student, University of Manchester
  20. Cong Zhang, Lecturer in Phonetics and Phonology, Newcastle University, working on speech prosody (looking for TTS engineer to collaborate)
  21. Yuzheng Zhang , PhD student, Durham University
  22. Clelia Middleton, PhD student Newcastle University Penfold group (Machine learning for X-Ray spectroscopy.)
  23. Daniel Kluvanec, Durham University, PhD Student, working on Deep Learning and image processing with 3D voxel data (seismic images)
  24. Xiaoyue Wu , PhD student, University of Leeds
  25. Amir Mohammad Norouzi, PhD student, University of Manchester
  26. Jordan J. Hood, PhD Student, STOR-i @ Lancaster University, Spatio-Temporal Bayesian modelling
  27. Joseph Umpleby-Thorp, PhD Student, University of York
  28. Leah Stella, University of Manchester
  29. Samantha Finnigan, RSE, ARC, Durham University.

Notes

Choosing grid and block size with cupy.RawKernel

REMINDER:
Please remember to send round a link explaining the first tuple (1, 1, 1) used when launching the CUDA kernel.

When calling our RawKernel function:

vector_add_gpu((2, 1, 1), (size // 2, 1, 1), (a_gpu, b_gpu, c_gpu, size))

We pass a series of tuples that specify:
((grid size), (block size), (arguments for your CUDA function))

This is us specifying the resources we want to run our CUDA code with on the GPU. CUDA GPUs are organised in a hierarchy, with threads as the smallest unit of operation. A thread is where a kernel is executed, and threads are grouped into blocks. A block can be organised as a 1D, 2D or 3D array of threads, where the maximum number of threads for a single block (regardless of configuration) is 1024. Blocks in turn are organised into grids, which again can have a 1D, 2D or 3D structure.

When we call our RawKernel we have to specify the arrangement of the grid of blocks and the size of each individual block. In the above example, we ask for 2 blocks (in a 1D configuration), each with `size // 2` threads, so that there is one thread per array element. Later steps in the lesson abstract this logic into Python code that handles arrays of arbitrary size.
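As a rough sketch (plain Python, no GPU required; `size` is an example value I've chosen for illustration), the global index each thread would compute from `blockIdx.x * blockDim.x + threadIdx.x` in the launch above looks like this:

```python
# Simulate the global thread index for a launch of
# grid = (2, 1, 1) and block = (size // 2, 1, 1).
size = 8               # example array length (assumption for illustration)
grid_dim = 2           # number of blocks in the x dimension
block_dim = size // 2  # threads per block in the x dimension

# Each thread computes blockIdx.x * blockDim.x + threadIdx.x
indices = [
    block_idx * block_dim + thread_idx
    for block_idx in range(grid_dim)
    for thread_idx in range(block_dim)
]
print(indices)  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

Every element of the array gets exactly one thread, which is why the kernel can add one pair of elements per thread with no loop.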

How to choose and tune these values is a topic for further exploration, and something we don't have time to cover in this workshop. For now, the approach taken in the lesson (programmatically determining the grid size and defaulting to blocks of 1024 threads) is good enough.
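A common sketch of that programmatic approach (pure Python; the function name is my own, not from the lesson) is ceiling division of the array size by the block size:

```python
def launch_config(size, threads_per_block=1024):
    """Return (grid, block) tuples for a 1D RawKernel launch.

    Ceiling division ensures there are enough blocks to cover all
    `size` elements, even when size is not an exact multiple of
    threads_per_block.
    """
    grid_size = (size + threads_per_block - 1) // threads_per_block
    return (grid_size, 1, 1), (threads_per_block, 1, 1)

print(launch_config(2048))  # → ((2, 1, 1), (1024, 1, 1)), an exact fit
print(launch_config(2049))  # → ((3, 1, 1), (1024, 1, 1)), one extra block
```

Note that when the fit isn't exact, the launch creates more threads than elements, so the kernel itself needs a bounds check (e.g. `if (item < size)`) before writing to the output array.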

Additional reading:

Max threads in a thread block (1024):

https://forums.developer.nvidia.com/t/maximum-number-of-threads-on-thread-block/46392

Raw strings in Python

You use a raw string because your CUDA code block might contain escape sequences such as `\n` inside its string literals.

If you don't make the Python string raw, that `\n` gets interpreted by Python as a literal newline, so the CUDA compiler sees a broken string literal and you get syntax errors in your code.

Raw string:

r"""print("mystring\n");"""

Not raw:

"""print("mystring
");"""

Oops! That would be a syntax error when the CUDA code is compiled: Python has already interpreted the `\n` as the character 0x0A (a line break), and a C string literal isn't allowed to span multiple lines.
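A quick way to see the difference in plain Python (no CUDA needed): compare how `\n` is stored in the raw and non-raw versions of the same source string.

```python
raw = r"""print("mystring\n");"""    # backslash + 'n': two characters
cooked = """print("mystring\n");"""  # a single newline character (0x0A)

print("\\n" in raw)     # → True: the escape survives for the CUDA compiler
print("\n" in raw)      # → False: no real line break in the raw string
print("\n" in cooked)   # → True: Python already replaced the escape
print(len(raw), len(cooked))  # → 20 19
```

The raw string is one character longer because it still holds the two-character escape sequence rather than the single line-break character.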