---
tags: notes, build, rpm, gpudirect
---

# Building and Installing GPUDirect-Related RPMs

- Test environment: CentOS 7 x86_64
- Link to this article: https://hackmd.io/@kmo/notes_gpudirect_build

## Mellanox OFED GPUDirect RDMA

- Official site: https://www.mellanox.com/products/GPUDirect-RDMA
- Documentation: https://docs.mellanox.com/display/GPUDirectRDMAv17
- GitHub: https://github.com/Mellanox/nv_peer_memory
- A kernel module and service that are essential for accelerating NVIDIA GPU computing
- Lets Mellanox NICs access NVIDIA GPU memory directly, so data moving between GPUs no longer has to be staged through host memory
- Depends on the OS kernel, the NVIDIA driver, and Mellanox OFED; if any of the three is upgraded, rebuilding the rpm is recommended
- Used by NCCL: https://github.com/NVIDIA/nccl/issues/475

### Build the rpm

```bash=
# Build
cd /tmp
wget https://www.mellanox.com/sites/default/files/downloads/ofed/nvidia-peer-memory_1.1.tar.gz
tar -zxvf nvidia-peer-memory_1.1.tar.gz
cd nvidia-peer-memory-1.1
# Generates the source rpm /tmp/nvidia_peer_memory-1.1-0.src.rpm
./build_module.sh
rpmbuild --rebuild /tmp/nvidia_peer_memory-1.1-0.src.rpm

# Built rpm:
#   /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.1-0.x86_64.rpm
# Source rpm:
#   /tmp/nvidia_peer_memory-1.1-0.src.rpm
```

### Install and Enable the Service

- Copy the rpm into a local yum repository and install it with yum:

```bash=
yum install nvidia_peer_memory
```

- Enable the service so the kernel module is inserted automatically at boot:

```bash=
systemctl enable nv_peer_mem
```

### Check the Dependencies of the nv_peer_mem Kernel Module

```bash=
# Using depmod
depmod -n | grep -i nv_peer_mem
# output
# extra/nv_peer_mem.ko: extra/nvidia.ko.xz extra/mlnx-ofa_kernel/drivers/infiniband/core/ib_core.ko extra/mlnx-ofa_kernel/compat/mlx_compat.ko

# Using modinfo
modinfo nv_peer_mem | grep -i depends
# output
# depends: ib_core,nvidia
```

## GDRcopy

- GitHub: https://github.com/NVIDIA/gdrcopy
- Mainly used by MPI communication stacks such as [UCX](https://github.com/openucx/ucx) and [MVAPICH2-GDR](http://mvapich.cse.ohio-state.edu/downloads/#mv2gdr-23)
- Depends on the NVIDIA driver; rebuilding is recommended whenever the driver is upgraded
- Lets the host CPU read and write GPU memory directly
- NCCL does not use GDRcopy (per the linked issue): https://github.com/NVIDIA/nccl/issues/475

### Build the rpm

```bash=
# Build
cd /tmp
wget https://github.com/NVIDIA/gdrcopy/archive/refs/tags/v2.2.tar.gz
tar -zxvf v2.2.tar.gz
cd gdrcopy-2.2
# Install the build dependencies
yum install rpm-build \
    make check check-devel subunit subunit-devel
cd packages
CUDA=/usr/local/cuda ./build-rpm-packages.sh

# Built rpms:
#   /tmp/gdrcopy-2.2/packages/gdrcopy-2.2-1.x86_64.rpm
#   /tmp/gdrcopy-2.2/packages/gdrcopy-devel-2.2-1.noarch.rpm
#   /tmp/gdrcopy-2.2/packages/gdrcopy-kmod-2.2-1dkms.noarch.rpm
# Source rpm:
#   /tmp/gdrcopy-2.2/packages/gdrcopy-2.2-1.src.rpm
```

### Install and Enable the Service

- Copy the rpms into a local yum repository and install them with yum:

```bash=
yum install gdrcopy gdrcopy-kmod gdrcopy-devel
```

- Enable the service so the kernel module is inserted automatically at boot:

```bash=
systemctl enable gdrcopy
```

### Check the Dependencies of the gdrdrv Kernel Module

```bash=
# Using depmod
depmod -n | grep -i gdrdrv
# output
# kernel/drivers/misc/gdrdrv.ko: extra/nvidia.ko.xz

# Using modinfo
modinfo gdrdrv | grep -i depends
# output
# depends: nv-p2p-dummy
```

---

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

This work is licensed under a [CC BY-NC-SA 4.0][cc-by-nc-sa] license.

[cc-by-nc-sa]: https://creativecommons.org/licenses/by-nc-sa/4.0
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
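Both nv_peer_memory and GDRcopy need to be rebuilt after an upgrade of the kernel, the NVIDIA driver, or (for nv_peer_memory) Mellanox OFED. Below is a minimal sketch, not part of either project, of a script that compares the current versions against a stamp file written after the last build. The stamp path and the function names are hypothetical. It assumes `modinfo` can find the installed `nvidia` module and that `ofed_info -s` (shipped with MLNX_OFED) is on the PATH; if either is missing, the script records `unknown` for that component.

```shell
#!/usr/bin/env bash
# Sketch: detect whether the kernel / NVIDIA driver / MLNX_OFED changed since
# the GPUDirect rpms were last built. Stamp path and names are hypothetical.
set -u

# Print the current component versions, one "name=version" per line.
current_versions() {
    printf 'kernel=%s\n' "$(uname -r)"
    # Falls back to "unknown" when the nvidia module or modinfo is unavailable.
    printf 'nvidia=%s\n' "$(modinfo -F version nvidia 2>/dev/null || echo unknown)"
    # ofed_info is assumed to exist only when MLNX_OFED is installed.
    printf 'ofed=%s\n' "$(ofed_info -s 2>/dev/null || echo unknown)"
}

# Return 0 (true) when the stamp file is missing or its content differs from $2.
versions_changed() {
    local stamp_file=$1 current=$2
    [[ ! -f $stamp_file ]] || [[ "$(cat "$stamp_file")" != "$current" ]]
}

# Print a recommendation; return 1 when a rebuild is recommended.
check_rebuild() {
    local stamp_file=$1 now
    now=$(current_versions)
    if versions_changed "$stamp_file" "$now"; then
        echo "kernel / NVIDIA driver / MLNX_OFED changed: rebuild nv_peer_memory and gdrcopy rpms"
        echo "after rebuilding, record the versions with: current_versions > $stamp_file"
        return 1
    fi
    echo "versions unchanged since the last build"
}

# Hypothetical default location; override with STAMP=/path when running.
STAMP=${STAMP:-/var/lib/gpudirect-build.stamp}
check_rebuild "$STAMP"
```

A stamp file is used instead of comparing against the installed rpm's metadata because the rpm only records its own version, not the kernel, driver, and OFED versions it was compiled against.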