# JuliaCon'21 - GPU BoF
:::info
- **Date:** July 29th, 2021 5:15 PM (UTC)
- **BoF leaders:**
  - Tim Besard (JuliaComputing): CUDA.jl, oneAPI.jl, ...
  - Valentin Churavy (MIT): KernelAbstractions.jl, ...
  - Julian Samaroo (MIT): AMDGPU.jl, ...
- **Agenda**
  1. User poll
  2. Roundtable discussion
- **Participants:**
  - ...
:::
Feel free to edit this document, add your name above, etc.
## User poll
Vote here: http://etc.ch/NiQs
### 1. Who
- Research: 15
- Work: 7
- Personal projects: 4
- Other: 1
### 2. What
- Modelling & simulation: 13
- ML: 7
- Data science: 4
- Image processing: 1
### 3. Where
- HPC: 11
- PC: 10
- Remote PC: 8
### 4. Which back-ends
- CUDA.jl: 18
- AMDGPU.jl: 4
- oneAPI.jl: 1
### 5. How back-ends are used
- Arrays: 11
- Kernels: 7
- Apps: 6
- KA.jl: 4
- Tullio.jl etc: 3
### 6. Most-wanted features
All of the listed features: performance, distributed computing, and multi-GPU support.
### 7. Most-common issues
- Memory pressure: 9
- Docs: 9
- Device support: 5
- Installation: 5
### 8. Barriers to contributing
- GPUs/compilers are scary: 8
- Hard to contribute to: 5
We want to lower the bar to contributing, because implementing Julia's interfaces often requires domain expertise:
- should be easy: read the vendor library docs, then use that functionality to implement Julia's interfaces
- might be hard: not all vendors (AMD + Intel) make it easy to build/use their numerical libraries
- most bang-for-buck: add basic wrappers (with Clang.jl) for everything you can, and high-level wrappers for what you know
- pure Julia?: try out pure-Julia implementations on the GPU, and compare them against the vendor's libraries in benchmarks and features
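
As a rough illustration of the "pure Julia?" idea, here is a minimal sketch of an `axpy` kernel written with KernelAbstractions.jl and checked against the BLAS routine from `LinearAlgebra`. It assumes the KernelAbstractions.jl 0.9-style API (`CPU()` backend, `KernelAbstractions.synchronize`); the names `axpy_kernel!` and `ka_axpy!` are hypothetical and not part of any package. It runs on the CPU backend so no GPU is needed; passing a GPU backend and GPU arrays instead should exercise the same kernel on a device.

```julia
using KernelAbstractions
using LinearAlgebra

# Pure-Julia kernel computing y[i] = a * x[i] + y[i] for every index i.
@kernel function axpy_kernel!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] = a * x[i] + y[i]
end

# Hypothetical helper: instantiate the kernel for a backend, launch it, and wait.
function ka_axpy!(a, x, y; backend = CPU())
    kernel! = axpy_kernel!(backend, 64)        # workgroup size of 64
    kernel!(y, a, x; ndrange = length(y))      # one work-item per element
    KernelAbstractions.synchronize(backend)    # block until the kernel is done
    return y
end

x = rand(Float32, 1024)
y = rand(Float32, 1024)

reference = axpy!(2f0, x, copy(y))   # library (BLAS) result
result    = ka_axpy!(2f0, x, copy(y))

@assert result ≈ reference
```

The same pattern (library call as reference, pure-Julia kernel as candidate) can be wrapped in a benchmarking macro to get the performance half of the comparison.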
## Roundtable discussion
Please suggest anything you want to discuss, either by editing this document, or by suggesting it in the text or voice channels on Discord.
- Pure Julia replacements for BLAS, FFTs, RNG, etc.?
  - Requires lots of domain expertise
  - AMDGPU's support for linear solvers may need a pure Julia replacement (rocALUTION is a C++ API)
  - Improve KA.jl with ideas from e.g. Triton to make it easier to do so?
- Integration with Base Atomics (Julia 1.7)
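
For context on the last item, this is a minimal CPU-only sketch of the atomics syntax that Base gained in Julia 1.7 (`@atomic` field declarations and `@atomic` operations). The `Counter` type and `count_parallel` function are hypothetical names used only for illustration; how GPU back-ends should map onto this syntax is the open question, and nothing here reflects an agreed design.

```julia
# Requires Julia 1.7 or later.
mutable struct Counter
    @atomic hits::Int   # field that may only be accessed atomically
end

function count_parallel(n)
    c = Counter(0)
    Threads.@threads for _ in 1:n
        @atomic c.hits += 1   # atomic read-modify-write
    end
    return @atomic c.hits     # atomic load
end

@assert count_parallel(10_000) == 10_000
```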
### Barriers to contributing
- Integrating new libraries (e.g. -mg versions) is hard: the procedure for doing so should be documented
### Common user problems?
- Better documentation for vendor-specific intrinsics, and when to use them (with examples?)
- GC performance issues
- Might be alleviated by work on Escape Analysis and compile-time finalization (currently WIP)
### Other
- Local CUDA: don't just remove support for it, it's needed. Preferences (and an appropriate API) could be used to point CUDA.jl to the local installation
- AMDGPU: Artifacts are on the way! Help is needed with getting the JLLs to build and load properly.
- Wanted: an example on using unified memory for arrays that do not fit in GPU memory (or, how to use this with GPUs that have very little memory)
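
A possible starting point for the unified-memory example requested above, assuming a recent CUDA.jl release (5.1 or later) where `cu` accepts a `unified=true` keyword. This is a sketch under that assumption and not an official recipe. The array is kept small so the snippet runs anywhere; actually exceeding device memory additionally depends on the GPU and driver supporting oversubscription, which is not checked here.

```julia
using CUDA

if CUDA.functional()
    host = rand(Float32, 1024)

    # Allocate in unified (managed) memory, which is accessible from both
    # the CPU and the GPU and is migrated on demand by the driver.
    managed = cu(host; unified=true)

    # Ordinary array operations still execute on the GPU.
    managed .= 2f0 .* managed

    @assert Array(managed) ≈ 2f0 .* host
else
    @info "No functional CUDA GPU found; skipping the unified-memory sketch"
end
```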