
N8 GPU Python Workshop 2023-03-09

Hackpad for N8 CIR GPU Python workshop based on Carpentries Incubator lesson.

Usage of this hackpad

You can use this hackpad by clicking the edit button.
This will open the hackpad in a split-screen view with editable Markdown on the left-hand side and rendered Markdown on the right-hand side.

Agenda

| Time  | Agenda |
|-------|--------|
| 10:00 | Introduction |
| 10:20 | Using your GPU with CuPy |
| 11:00 | Break |
| 11:10 | Using your GPU with CuPy (cont.) |
| 12:00 | Lunch |
| 13:00 | Accelerate your Python code with Numba |
| 14:30 | Break |
| 14:40 | Your First GPU Kernel |
| 16:00 | Close |

Google Colab

We'll be working through today's examples using Google Colaboratory.
You'll need to register a Google account to use the service, but once registered you can create a new Colab notebook via colab.research.google.com/.

Misc

Code to fetch radio astronomy data:

wget https://github.com/ARCLeeds/lesson-gpu-programming/raw/gh-pages/data/GMRT_image_of_Galactic_Center.fits

Who are you and where do you come from?

  1. Alex Coleman, research software engineer from University of Leeds. Likes Python, R, Rust🦀
  2. Shamil Al-Ameen, Computer Science PhD student from Newcastle University. working with IoT and edge devices.
  3. Miranda Horne, PhD candidate at the University of Leeds, researching machine learning for fluid dynamics.
  4. Ghada AlOsaimi, Durham University, PhD student in brain-controlled vehicles, computer vision, AI
  5. Michael McCorkindale, Newcastle University Bioinformatics Support Unit
  6. Dmitry Nikolaenko, research software engineer at Advanced Research Computing @ University of Durham
  7. Chenzi Xu, Postdoc from University of York, working on person-specific automatic speaker recognition
  8. Adrienne Unsworth, Bioinformatician at the Bioinformatics Support Unit @ Newcastle University
  9. Jess Bridgen, postdoc from Lancaster University.
  10. Poppy Welch, research assistant at York University working on person specific automatic speaker recognition
  11. Adam Fletcher, Research Associate, Electrical Engineering, University of Manchester, working on inverse problems in NDT
  12. Joshua Reukauf, Newcastle University, Behaviour Informatics PhD, working with computer vision and object tracking, I like Raspberry-Pis
  13. Monika Gonka, PhD student, University of York
  14. Mikiyas Etichia, PhD student, University of Manchester
  15. Alin Morariu, PhD student, Lancaster University
  16. Adil Ashraf, PhD student, University of Manchester
  17. Ajay B Harish, Lecturer, University of Manchester
  18. Patricia Ternes, research software engineer @ University of Leeds.
  19. Xiaoyuan Luo, PhD student, University of Manchester
  20. Cong Zhang, Lecturer in Phonetics and Phonology, Newcastle University, working on speech prosody (looking for TTS engineer to collaborate)
  21. Yuzheng Zhang , PhD student, Durham University
  22. Clelia Middleton, PhD student Newcastle University Penfold group (Machine learning for X-Ray spectroscopy.)
  23. Daniel Kluvanec, Durham University, PhD Student, working on Deep Learning and image processing with 3D voxel data (seismic images)
  24. Xiaoyue Wu , PhD student, University of Leeds
  25. Amir Mohammad Norouzi, PhD student, University of Manchester
  26. Jordan J. Hood, PhD Student, STOR-i @ Lancaster University, Spatio-Temporal Bayesian modelling
  27. Joseph Umpleby-Thorp, PhD Student, University of York
  28. Leah Stella, University of Manchester
  29. Samantha Finnigan, RSE, ARC, Durham University.

Notes

Choosing grid and block size with cupy.RawKernel

REMINDER:
Please remember to send round a link explaining the first tuple (1, 1, 1) used when launching the CUDA kernel.

When calling our RawKernel function:

vector_add_gpu((2, 1, 1), (size // 2, 1, 1), (a_gpu, b_gpu, c_gpu, size))

We pass a series of tuples that specify:
((grid size), (block size), (arguments for your CUDA function))

This is us specifying the resources we want to run our CUDA code with on the GPU. CUDA GPUs are organised in a hierarchy, with threads as the smallest unit of operation. A thread is where a kernel is executed, and threads are grouped into blocks. A block can be organised as a 1D, 2D or 3D array of threads, where the maximum number of threads for a single block (regardless of configuration) is 1024. Blocks in turn are organised into grids, which again can have a 1D, 2D or 3D structure.

When we call our RawKernel we have to specify the arrangement of the grid of blocks and the size of each individual block. In the above example, we ask for 2 blocks (in a 1D configuration), each with `size // 2` threads, so that there is one thread per array element. Later steps in the lesson abstract this logic into Python code that handles arrays of arbitrary size.
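As a rough sketch (plain Python, no GPU required; `size` is an example value I've chosen for illustration), the global index each thread would compute from `blockIdx.x * blockDim.x + threadIdx.x` in the launch above looks like this:

```python
# Simulate the global thread index for a launch of
# grid = (2, 1, 1) and block = (size // 2, 1, 1).
size = 8               # example array length (assumption for illustration)
grid_dim = 2           # number of blocks in the x dimension
block_dim = size // 2  # threads per block in the x dimension

# Each thread computes blockIdx.x * blockDim.x + threadIdx.x
indices = [
    block_idx * block_dim + thread_idx
    for block_idx in range(grid_dim)
    for thread_idx in range(block_dim)
]
print(indices)  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

Every element of the array gets exactly one thread, which is why the kernel can add one pair of elements per thread with no loop.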

How to choose and tune these values is a topic for further exploration, and something we don't have time to cover in this workshop. For now, the approach taken in the lesson (programmatically determining the grid size and defaulting to blocks of 1024 threads) is good enough.
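A common sketch of that programmatic approach (pure Python; the function name is my own, not from the lesson) is ceiling division of the array size by the block size:

```python
def launch_config(size, threads_per_block=1024):
    """Return (grid, block) tuples for a 1D RawKernel launch.

    Ceiling division ensures there are enough blocks to cover all
    `size` elements, even when size is not an exact multiple of
    threads_per_block.
    """
    grid_size = (size + threads_per_block - 1) // threads_per_block
    return (grid_size, 1, 1), (threads_per_block, 1, 1)

print(launch_config(2048))  # → ((2, 1, 1), (1024, 1, 1)), an exact fit
print(launch_config(2049))  # → ((3, 1, 1), (1024, 1, 1)), one extra block
```

Note that when the fit isn't exact, the launch creates more threads than elements, so the kernel itself needs a bounds check (e.g. `if (item < size)`) before writing to the output array.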

Additional reading:

Max threads in a thread block (1024):

https://forums.developer.nvidia.com/t/maximum-number-of-threads-on-thread-block/46392

Raw strings in Python

You use a raw string because your CUDA code block might contain escape sequences such as `\n` inside its string literals.

If you don't make the Python string raw, that `\n` gets interpreted by Python as a literal newline, so the CUDA compiler sees a broken string literal and you get syntax errors in your code.

Raw string:

r"""print("mystring\n");"""

Not raw:

"""print("mystring
");"""

Oops! That would be a syntax error when the CUDA code is compiled: Python has already interpreted the `\n` as the character 0x0A (a line break), and a C string literal isn't allowed to span multiple lines.
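A quick way to see the difference in plain Python (no CUDA needed): compare how `\n` is stored in the raw and non-raw versions of the same source string.

```python
raw = r"""print("mystring\n");"""    # backslash + 'n': two characters
cooked = """print("mystring\n");"""  # a single newline character (0x0A)

print("\\n" in raw)     # → True: the escape survives for the CUDA compiler
print("\n" in raw)      # → False: no real line break in the raw string
print("\n" in cooked)   # → True: Python already replaced the escape
print(len(raw), len(cooked))  # → 20 19
```

The raw string is one character longer because it still holds the two-character escape sequence rather than the single line-break character.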