# JuliaCon'21 - GPU BoF

:::info
- **Date:** July 29th, 2021 5:15 PM (UTC)
- **BoF leaders:**
    - Tim Besard (JuliaComputing): CUDA.jl, oneAPI.jl, ...
    - Valentin Churavy (MIT): KernelAbstractions.jl, ...
    - Julian Samaroo (MIT): AMDGPU.jl, ...
- **Agenda**
    1. User poll
    2. Roundtable discussion
- **Participants:**
    - ...
:::

Feel free to edit this document, add your name above, etc...

## User poll

Vote here: http://etc.ch/NiQs

### 1. Who

- Research: 15
- Work: 7
- Personal projects: 4
- Other: 1

### 2. What

- Modelling & simulation: 13
- ML: 7
- Data science: 4
- Image processing: 1

### 3. Where

- HPC: 11
- PC: 10
- Remote PC: 8

### 4. Which back-ends

- CUDA.jl: 18
- AMDGPU.jl: 4
- oneAPI.jl: 1

### 5. How back-ends are used

- Arrays: 11
- Kernels: 7
- Apps: 6
- KA.jl: 4
- Tullio.jl etc: 3

### 6. Most-wanted features

All of the features (performance, distributed, multigpu).

### 7. Most-common issues

- Memory pressure: 9
- Docs: 9
- Device support: 5
- Installation: 5

### 8. Barriers to contributing

- GPUs/compilers are scary: 8
- Hard to contribute to: 5

We want to lower the bar to contributing, because implementing Julia's interfaces often requires domain expertise:

- should be easy: read vendor library docs, use functionality to implement interfaces
- might be hard: not all vendors (AMD + Intel) make it easy to build/use their numerical libraries
- most bang-for-buck: try to add basic wrappers (with Clang.jl) for everything you can, add high-level wrappers for what you know
- pure Julia?: maybe try out pure-Julia implementations on the GPU, benchmark/feature compare against vendor's libraries

## Roundtable discussion

Please suggest anything you want to discuss, either by editing this document, or by suggesting it in the text or voice channels on Discord.

- Pure Julia replacements for BLAS, FFTs, RNG, etc.?
    - Requires lots of domain expertise
    - AMDGPU's support for linear solvers may need a pure Julia replacement (rocALUTION is a C++ API)
    - improve KA.jl with ideas from e.g. Triton to make it easier to do so?
- Integration with Base Atomics (Julia 1.7)

### Barriers to contributing

- integrating new libraries (e.g. -mg versions) is hard: document procedure for that

### Common user problems?

- Better documentation for vendor-specific intrinsics, and when to use them (with examples?)
- GC performance issues
    - Might be alleviated by work on Escape Analysis and compile-time finalization (currently WIP)

### Other

- Local CUDA: don't just remove, it's needed. Can use preferences (and an appropriate API) to point CUDA.jl to the local installation
- AMDGPU: Artifacts are on the way! Need help with getting JLLs to build and load properly.
- Example on unified memory for arrays that do not fit in GPU memory (or, how to use this with GPUs that have very little memory)
