---
tags: teaching,n8cir
---

# N8 GPU Python Workshop 2023-03-09

Hackpad for the N8 CIR GPU Python workshop, based on the [Carpentries Incubator lesson](https://carpentries-incubator.github.io/lesson-gpu-programming/).

## Usage of this hackpad

You can use this hackpad by clicking the [edit button](https://hackmd.io/@research-computing-leeds/n8-gpu-py/edit?both). This will present the hackpad in a split-screen view with editable [markdown](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) on the left-hand side and rendered markdown on the right-hand side.

## Useful links

- [Course notes](https://arc.leeds.ac.uk/lesson-gpu-programming/index.html)
- [Google Colaboratory](https://colab.research.google.com/)
- Join the Slack channel for this workshop in the N8 CIR Slack workspace
- [Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html)

## Agenda

| Time  | Agenda |
| ----- | ------ |
| 10:00 | Introduction |
| 10:20 | [Using your GPU with CuPy](https://carpentries-incubator.github.io/lesson-gpu-programming/02-cupy/index.html) |
| 11:00 | Break |
| 11:10 | [Using your GPU with CuPy](https://carpentries-incubator.github.io/lesson-gpu-programming/02-cupy/index.html) (cont.) |
| 12:00 | Lunch |
| 13:00 | [Accelerate your Python code with Numba](https://carpentries-incubator.github.io/lesson-gpu-programming/03-numba/index.html) |
| 14:30 | Break |
| 14:40 | [Your First GPU Kernel](https://carpentries-incubator.github.io/lesson-gpu-programming/05-first_program/index.html) |
| 16:00 | Close |

## Google Colab

We'll be working through today's examples using Google Colaboratory. You'll need to register a Google account to use the service, but once registered you can create a new Colab notebook via [colab.research.google.com](https://colab.research.google.com/). A quick way to check that your runtime has a GPU attached is sketched at the end of these notes.

## Misc

Code to fetch radio astronomy data:

```bash
wget https://github.com/ARCLeeds/lesson-gpu-programming/raw/gh-pages/data/GMRT_image_of_Galactic_Center.fits
```

## Who are you and where do you come from?

1. Alex Coleman, research software engineer from University of Leeds. Likes Python, R, Rust 🦀
1. Shamil Al-Ameen, Computer Science PhD student from Newcastle University, working with IoT and edge devices.
1. Miranda Horne, PhD candidate at the University of Leeds, researching machine learning for fluid dynamics.
1. Ghada AlOsaimi, Durham University, PhD student in brain-controlled vehicles, computer vision, AI
1. Michael McCorkindale, Newcastle University Bioinformatics Support Unit
1. Dmitry Nikolaenko, research software engineer at Advanced Research Computing @ University of Durham
1. Chenzi Xu, Postdoc from University of York, working on person-specific automatic speaker recognition
1. Adrienne Unsworth, Bioinformatician at the Bioinformatics Support Unit @ Newcastle University
1. Jess Bridgen, postdoc from Lancaster University.
1. Poppy Welch, research assistant at York University working on person-specific automatic speaker recognition
1. Adam Fletcher, Research Associate, Electrical Engineering, University of Manchester, working on inverse problems in NDT
1. Joshua Reukauf, Newcastle University, Behaviour Informatics PhD, working with computer vision and object tracking, I like Raspberry Pis
1. Monika Gonka, PhD student, University of York
1. Mikiyas Etichia, PhD student, University of Manchester
1. Alin Morariu, PhD student, Lancaster University
1. Adil Ashraf, PhD student, University of Manchester
1. Ajay B Harish, Lecturer, University of Manchester
1. Patricia Ternes, research software engineer @ University of Leeds.
1. Xiaoyuan Luo, PhD student, University of Manchester
1. Cong Zhang, Lecturer in Phonetics and Phonology, Newcastle University, working on speech prosody (looking for TTS engineer to collaborate)
1. Yuzheng Zhang, PhD student, Durham University
1. Clelia Middleton, PhD student, Newcastle University, Penfold group (machine learning for X-ray spectroscopy)
1. Daniel Kluvanec, Durham University, PhD student, working on deep learning and image processing with 3D voxel data (seismic images)
1. Xiaoyue Wu, PhD student, University of Leeds
1. Amir Mohammad Norouzi, PhD student, University of Manchester
1. Jordan J. Hood, PhD student, STOR-i @ Lancaster University, spatio-temporal Bayesian modelling
1. Joseph Umpleby-Thorp, PhD student, University of York
1. Leah Stella, University of Manchester
1. Samantha Finnigan, RSE, ARC, Durham University.

# Notes

### Choosing grid and block size with cupy.RawKernel

> REMINDER: Please remember to send a link regarding the first tuple (1, 1, 1) when working with the CUDA kernel.

When calling our RawKernel function:

```python
vector_add_gpu((2, 1, 1), (size // 2, 1, 1), (a_gpu, b_gpu, c_gpu, size))
```

we pass a series of tuples that specify `((grid size), (block size), (arguments for your CUDA function))`. This is how we specify the resources we want to run our CUDA code with on the GPU.

CUDA GPUs are organised in a hierarchy with threads as the smallest unit of operation. A thread is where a kernel is executed, and threads are grouped into blocks. A block can be organised as a 1D, 2D or 3D array of threads, where the maximum number of threads for a single block (regardless of configuration) is 1024. Blocks in turn are organised into grids, which again can have a 1D, 2D or 3D structure. When we call our RawKernel we have to specify both the arrangement of the grid of blocks and the size of each individual block. In the example above, we ask for 2 blocks (in a 1D configuration), each containing `size // 2` threads.

Further steps in the lesson looked at abstracting this logic into Python code that handles an array of arbitrary size (see the worked example at the end of these notes). How you choose and tune these values is a topic for further exploration that we don't have time for in this workshop. At present, I would suggest the lesson's approach, programmatically determining the grid size and defaulting to blocks of 1024 threads, is good enough.

Additional reading:

- [CUDA specification](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-hierarchy-grid-of-thread-blocks)
- [Wikipedia entry on thread blocks](https://en.wikipedia.org/wiki/Thread_block_(CUDA_programming))

### Max threads in a thread block (1024)

https://forums.developer.nvidia.com/t/maximum-number-of-threads-on-thread-block/46392

### Raw strings in Python

We use a raw string because the CUDA source we embed in Python often contains escape sequences such as `\n`. If the string isn't raw, Python interprets `\n` as a literal newline character, so the source we hand to the CUDA compiler contains a line break in the middle of a C string literal. That is a syntax error, because C string literals can't span lines.

Raw string:

```
r"""printf("mystring\n");"""
```

Not raw:

```
"""printf("mystring
");"""
```

Oops! The `\n` got interpreted into the character `0x0A`, a line break, so the CUDA compiler sees a string literal broken across two lines and reports a syntax error.
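To see the difference directly, here's a minimal sketch you can run in plain Python (the `printf` line just stands in for a fragment of CUDA source):

```python
raw = r"""printf("mystring\n");"""    # raw: backslash + n survive as two characters
cooked = """printf("mystring\n");"""  # not raw: \n collapses into one newline (0x0A)

print(repr(raw))              # the CUDA compiler would see: printf("mystring\n");
print(repr(cooked))           # the compiler would see a string literal split over two lines
print(len(raw), len(cooked))  # cooked is one character shorter than raw
```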
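### Worked example: choosing grid and block size programmatically

This is the pattern referenced in the grid and block size note above: a minimal sketch based on the lesson's `vector_add` kernel, defaulting to blocks of 1024 threads and rounding the grid size up so an array of arbitrary size is covered. The variable names here are ours, not prescribed by CuPy.

```python
import math

import cupy

# CUDA kernel from the lesson: each thread adds one pair of elements,
# guarded so threads past the end of the array do nothing.
vector_add_code = r"""
extern "C" __global__ void vector_add(const float * A, const float * B,
                                      float * C, const int size)
{
    int item = threadIdx.x + blockIdx.x * blockDim.x;
    if (item < size)
    {
        C[item] = A[item] + B[item];
    }
}
"""
vector_add_gpu = cupy.RawKernel(vector_add_code, "vector_add")

size = 100_000  # deliberately not a multiple of 1024
a_gpu = cupy.random.rand(size, dtype=cupy.float32)
b_gpu = cupy.random.rand(size, dtype=cupy.float32)
c_gpu = cupy.zeros(size, dtype=cupy.float32)

# Default to the maximum block size and derive the grid size from the
# array, rounding up so every element is covered by a thread.
threads_per_block = 1024
grid_size = (math.ceil(size / threads_per_block), 1, 1)
block_size = (threads_per_block, 1, 1)

vector_add_gpu(grid_size, block_size, (a_gpu, b_gpu, c_gpu, size))
assert cupy.allclose(c_gpu, a_gpu + b_gpu)
```

The out-of-bounds guard in the kernel is what makes the round-up safe: the last block launches 1024 threads even when fewer elements remain.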
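### Checking your Colab runtime has a GPU

As mentioned in the Google Colab section above: new Colab notebooks default to a CPU-only runtime, so select a GPU under Runtime > Change runtime type before running today's examples. The snippet below is one quick sanity check, assuming CuPy is available in the runtime:

```python
import cupy

# Query the first CUDA device; this raises a CUDARuntimeError if the
# runtime has no GPU attached.
props = cupy.cuda.runtime.getDeviceProperties(0)
print(props["name"])  # e.g. b'Tesla T4' on a typical Colab GPU runtime
```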