[ENCCS Webinar]
Practical Introduction to GPU Programming
Yonglei Wang (ENCCS / NSC@LiU)
GPU & HPC
What is a GPU?
Top 10 HPCs
Ref: TOP500 list released on Nov. 18, 2024
HPCs in the EU
GPU Architecture
CPU vs. GPU
Ref: GPU Programming: When, Why and How?
GPU architecture
Nvidia GPU architecture
AMD GPU architecture
Intel GPU architecture
Comparison of NVIDIA, AMD, and Intel GPUs
GPU Programming Models
GPU compute APIs
Standard C/C++ & Fortran programming
Directive-based models
The serial code is annotated with directives that tell the compiler to run specific loops and regions on the GPU.
Two representative directive-based programming models are OpenACC and OpenMP offloading.
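As a minimal sketch (not taken from the webinar materials), a serial loop can be offloaded by adding a single directive. An OpenMP offloading version in C might look like this; an OpenACC version of the same loop appears in the OpenACC section below:

```c
#include <stdio.h>

#define N (1 << 20)

static float x[N], y[N];

int main(void)
{
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* The directive asks the compiler to run this loop on the GPU and
       to move x and y between host and device memory. */
    #pragma omp target teams distribute parallel for map(to: x[0:N]) map(tofrom: y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = x[i] + y[i];

    printf("y[0] = %f\n", y[0]);   /* expect 3.0 */
    return 0;
}
```

Such a file would be built with an offloading-capable compiler, e.g. nvc -mp=gpu (flag names are compiler-specific).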
Overview of OpenACC
Key directives for OpenACC

- `parallel` and `kernels` - used to create a parallel region.
- `loop`, `collapse`, `gang`, `worker`, `vector`, etc. - designed to efficiently allocate threads for work-sharing tasks.
- `copy`, `create`, `copyin`, `copyout`, `delete`, and `present` - for managing data transfer between host and device.
- `reduction`, `atomic`, `cache`, etc. - for special operations that prevent slowing down parallel computation.
nvc -fast -Minfo=all -acc=gpu -gpu=cc80 Hello_World.c
nvfortran -fast -Minfo=all -acc=gpu Hello_World_OpenACC.f90
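For illustration, a sketch of the kind of C code those commands would build (not the actual Hello_World.c from the webinar), using the directives listed above on the same loop as the OpenMP example earlier:

```c
#include <stdio.h>

#define N (1 << 20)

static float x[N], y[N];

int main(void)
{
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* parallel + loop create a parallel region and distribute the loop
       iterations over gangs/workers/vectors; copyin/copy handle the
       host-device data transfer. */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = x[i] + y[i];

    printf("y[0] = %f\n", y[0]);   /* expect 3.0 */
    return 0;
}
```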
Non-portable kernel-based models
- `__global__` - defines a device kernel.
- `__syncthreads()` - synchronizes all threads within a thread block.
- `cudaDeviceSynchronize()` - synchronizes a kernel call on the host.

CUDA example: Vector addition
Source: Vector_Addition.cu
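Vector_Addition.cu itself is not reproduced here; the following is a generic CUDA vector-addition sketch (illustrative only, the webinar's file may differ) that shows the qualifiers listed above in context:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* __global__ marks a function as a device kernel launched from the host. */
__global__ void vector_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *a = (float *)malloc(bytes), *b = (float *)malloc(bytes), *c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaDeviceSynchronize();              /* wait for the kernel to finish */

    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);          /* expect 3.0 */

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return 0;
}
```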
Performance and profiling
Using CUDA APIs, we can measure the time taken to execute CUDA kernel functions.
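One common approach (a sketch; the webinar may use a different method) is to bracket the kernel launch with CUDA events. This fragment is meant to replace the plain launch in the vector-addition sketch above:

```cuda
    /* Timing a kernel with CUDA events (illustrative fragment). */
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);   /* kernel from the sketch above */
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                          /* wait until 'stop' has been reached */

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);              /* elapsed time in milliseconds */
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
```

Profilers such as Nsight Systems (nsys) and Nsight Compute (ncu) give more detailed, per-kernel measurements without modifying the code.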
Non-portable kernel-based models
Hipification is the process of converting CUDA code to HIP, enabling code to run on both AMD and NVIDIA GPUs.
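For example (assuming ROCm's hipify-perl and hipcc are installed; the exact workflow in the webinar may differ), the vector-addition sketch above could be converted and built for an AMD GPU with:

hipify-perl Vector_Addition.cu > Vector_Addition.hip.cpp
hipcc Vector_Addition.hip.cpp -o vector_addition

The translation is largely a mechanical renaming, e.g. cudaMalloc becomes hipMalloc and cudaMemcpy becomes hipMemcpy.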
Portable kernel-based models
Comparison of GPU compute APIs
Python libraries for GPU programming
Python libraries for AI research
Julia libraries for AI research
ENCCS
Lesson materials & Training events
Lesson materials & Seasonal training events
Nvidia bootcamps
Take-home message