Hackpad for the N8 CIR GPU Python workshop, based on the Carpentries Incubator lesson.
You can use this hackpad by clicking the edit button.
This will present the hackpad in a split-screen view, with editable markdown on the left-hand side and rendered markdown on the right-hand side.
| Time  | Agenda                                 |
|-------|----------------------------------------|
| 10:00 | Introduction                           |
| 10:20 | Using your GPU with CuPy               |
| 11:00 | Break                                  |
| 11:10 | Using your GPU with CuPy (cont.)       |
| 12:00 | Lunch                                  |
| 13:00 | Accelerate your Python code with Numba |
| 14:30 | Break                                  |
| 14:40 | Your First GPU Kernel                  |
| 16:00 | Close                                  |
We'll be working through today's examples using Google Colaboratory.
You'll need to register a Google account to use the service, but once registered you can create a new Colab notebook via colab.research.google.com/.
Code to fetch radio astronomy data:
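The snippet itself wasn't captured in the pad; below is a minimal sketch of the kind of code used, assuming the data is a FITS image and Astropy is installed. The URL is a placeholder, not the workshop's actual data link.

```python
import urllib.request

from astropy.io import fits

# Placeholder URL: substitute the data link shared in the workshop.
url = "https://example.com/radio_astronomy_image.fits"
urllib.request.urlretrieve(url, "image.fits")

# Load the image into a NumPy array.
with fits.open("image.fits") as hdul:
    data = hdul[0].data

print(data.shape, data.dtype)
```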
REMINDER:
Please remember to send a link regarding the first tuple (1, 1, 1) when working with the CUDA kernel.
When calling our RawKernel function, we pass a series of tuples that specify:

((grid size), (block size), (arguments for your CUDA function))
This is us specifying the resources we want to run our CUDA code with on the GPU. CUDA GPUs are organised in a hierarchy, with threads as the smallest unit of operation. A thread is where a kernel is executed, and threads are grouped into blocks. A block can be organised as a 1D, 2D, or 3D array of threads, where the maximum number of threads for a single block (regardless of configuration) is 1024. Blocks in turn are organised into grids, which again can be 1D, 2D, or 3D structures.
When we call our RawKernel, we have to specify the arrangement of the grid of blocks and the individual block size. In the above example, we specify that we want 2 blocks (a 1D configuration), each with a number of threads equal to the `size` variable divided by 2 (a minimal sketch of such a call is below). Further steps in the lesson looked at abstracting this logic so Python code can handle an array of arbitrary size.
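First, the fixed-size version. The kernel body and the name `vector_double` are illustrative, not the workshop's exact code:

```python
import cupy as cp

# Illustrative element-wise kernel: doubles every input element.
vector_double = cp.RawKernel(r'''
extern "C" __global__
void vector_double(const float* in, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = 2.0f * in[i];
}
''', 'vector_double')

size = 2048
a = cp.arange(size, dtype=cp.float32)
out = cp.zeros_like(a)

# ((grid size), (block size), (arguments for your CUDA function)):
# a 1D grid of 2 blocks, each with size / 2 = 1024 threads.
vector_double((2,), (size // 2,), (a, out))
```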
How you choose and configure these values is a topic for further exploration, and something we don't have time to cover in this workshop. For now, I would suggest the approach taken in the workshop (programmatically determining the grid size and defaulting to blocks of 1024 threads) is good enough.
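As a rough sketch of that programmatic approach, again with an illustrative kernel rather than the workshop's exact code: the important addition is the bounds check in the kernel, since with an arbitrary array size the last block will contain threads past the end of the array.

```python
import math

import cupy as cp

# Same illustrative kernel, now taking the array length and guarding
# against out-of-bounds threads in the final, partially filled block.
vector_double = cp.RawKernel(r'''
extern "C" __global__
void vector_double(const float* in, float* out, const int size) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size) {
        out[i] = 2.0f * in[i];
    }
}
''', 'vector_double')

size = 5000  # arbitrary length, not a multiple of the block size
a = cp.arange(size, dtype=cp.float32)
out = cp.zeros_like(a)

threads_per_block = 1024
grid_size = (math.ceil(size / threads_per_block),)  # (5,) in this case

vector_double(grid_size, (threads_per_block,), (a, out, cp.int32(size)))
```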
Additional reading:
https://forums.developer.nvidia.com/t/maximum-number-of-threads-on-thread-block/46392
You use a raw string because your CUDA code may contain escape sequences like `\n` inside its own string literals. If you don't make the Python string raw, that `\n` gets interpreted as a literal newline, so you get syntax errors in your CUDA code.
Raw string:
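(The original snippet wasn't captured in the pad; this is a minimal illustration.)

```python
cuda_code = r'''
extern "C" __global__ void hello() {
    printf("Hello from the GPU!\n");  // the r prefix keeps \n as two characters
}
'''
```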
Not raw:
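(The same illustration without the `r` prefix; here Python itself replaces the `\n` before the CUDA compiler ever sees it.)

```python
cuda_code = '''
extern "C" __global__ void hello() {
    printf("Hello from the GPU!\n");
}
'''
```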
Oops! That would be a syntax error when the CUDA code is compiled, because literal newlines aren't allowed in C string literals. The `\n` got interpreted as the character `0x0A`, which is a line break.