---
tags: notes, build, rpm, gpudirect
---

# Building and Installing GPUDirect-Related RPMs

- Test environment : CentOS 7 x86_64
- Link to this note : https://hackmd.io/@kmo/notes_gpudirect_build

## Mellanox OFED GPUDirect RDMA

- Official site : https://www.mellanox.com/products/GPUDirect-RDMA
- Docs : https://docs.mellanox.com/display/GPUDirectRDMAv17
- github : https://github.com/Mellanox/nv_peer_memory
- A kernel module and service that matter a great deal for accelerating NVIDIA GPU computing
- Lets the memory of NVIDIA GPUs communicate directly through a Mellanox NIC, without spending time on transfers through host memory
- Depends on the OS kernel, the NVIDIA driver, and Mellanox OFED; if any of these three is upgraded, rebuilding the rpm is recommended
- Used by NCCL : https://github.com/NVIDIA/nccl/issues/475

### Build the rpm

```bash=
# Build
cd /tmp
wget https://www.mellanox.com/sites/default/files/downloads/ofed/nvidia-peer-memory_1.1.tar.gz
tar -zxvf nvidia-peer-memory_1.1.tar.gz
cd nvidia-peer-memory-1.1
./build_module.sh
rpmbuild --rebuild /tmp/nvidia_peer_memory-1.1-0.src.rpm

# Location of the built rpm
#   /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.1-0.x86_64.rpm

# Location of the source rpm
#   /tmp/nvidia_peer_memory-1.1-0.src.rpm
```

### Install and enable the service

- You can copy the rpm to a local yum repository and install it with yum
```bash=
yum install nvidia_peer_memory
```

- Enable the service so the kernel module is inserted automatically at boot
```bash=
systemctl enable nv_peer_mem
```

### Check the dependencies of the nv_peer_mem kernel module

```bash=
# Check with depmod
depmod -n | grep -i nv_peer_mem
# output
#   extra/nv_peer_mem.ko: extra/nvidia.ko.xz extra/mlnx-ofa_kernel/drivers/infiniband/core/ib_core.ko extra/mlnx-ofa_kernel/compat/mlx_compat.ko

# Check with modinfo
modinfo nv_peer_mem | grep -i depends
# output
#   depends: ib_core,nvidia
```

## GDRcopy

- github : https://github.com/NVIDIA/gdrcopy
- Mainly used with MPI compilers, and by [UCX](https://github.com/openucx/ucx) and [MVAPICH2-GDR](http://mvapich.cse.ohio-state.edu/downloads/#mv2gdr-23)
- Depends on the NVIDIA driver; rebuilding whenever the driver is upgraded is recommended
- Lets the host CPU read and write GPU memory directly
- NCCL does not use GDRcopy : https://github.com/NVIDIA/nccl/issues/475

### Build the rpm

```bash=
# Build
cd /tmp
wget https://github.com/NVIDIA/gdrcopy/archive/refs/tags/v2.2.tar.gz
tar -zxvf v2.2.tar.gz
cd gdrcopy-2.2
# Install the build prerequisites
yum install rpm-build make check check-devel subunit subunit-devel
cd packages
CUDA=/usr/local/cuda ./build-rpm-packages.sh

# Location of the built rpms
#   /tmp/gdrcopy-2.2/packages/gdrcopy-2.2-1.x86_64.rpm
#   /tmp/gdrcopy-2.2/packages/gdrcopy-devel-2.2-1.noarch.rpm
#   /tmp/gdrcopy-2.2/packages/gdrcopy-kmod-2.2-1dkms.noarch.rpm

# Location of the source rpm
#   /tmp/gdrcopy-2.2/packages/gdrcopy-2.2-1.src.rpm
```

### Install and enable the service

- You can copy the rpms to a local yum repository and install them with yum
```bash=
yum install gdrcopy gdrcopy-kmod gdrcopy-devel
```

- Enable the service so the kernel module is inserted automatically at boot
```bash=
systemctl enable gdrcopy
```

### Check the dependencies of the gdrdrv kernel module

```bash=
# Check with depmod
depmod -n | grep -i gdrdrv
# output
#   kernel/drivers/misc/gdrdrv.ko: extra/nvidia.ko.xz

# Check with modinfo
modinfo gdrdrv | grep -i depends
# output
#   depends: nv-p2p-dummy
```

---

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

This work is licensed under a [CC BY-NC-SA 4.0][cc-by-nc-sa]

[cc-by-nc-sa]: https://creativecommons.org/licenses/by-nc-sa/4.0
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png