
Building and Installing GPUDirect-Related RPMs

Mellanox OFED GPUDirect RDMA

Build the rpm

# Build
cd /tmp
wget https://www.mellanox.com/sites/default/files/downloads/ofed/nvidia-peer-memory_1.1.tar.gz
tar -zxvf nvidia-peer-memory_1.1.tar.gz
cd nvidia-peer-memory-1.1
./build_module.sh
rpmbuild --rebuild /tmp/nvidia_peer_memory-1.1-0.src.rpm

# Location of the built rpm
/root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.1-0.x86_64.rpm

# Location of the source rpm
/tmp/nvidia_peer_memory-1.1-0.src.rpm

Install and enable the service

  • The rpm can be copied to a local yum repository and installed with yum:
yum install nvidia_peer_memory
  • Enable the service so that the kernel module is inserted automatically at boot:
systemctl enable nv_peer_mem
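
The first bullet assumes that a local yum repository already exists. One possible way to set one up is sketched here. The directory `/opt/localrepo`, the repo id `local-gpudirect`, and the helper name `make_repo_stanza` are assumptions made for illustration and are not part of the original note. `createrepo` has to be installed separately.

```shell
# Hypothetical helper: print a yum .repo stanza for a local directory.
# $1 = repo id (assumed name), $2 = absolute directory that holds the rpm files
make_repo_stanza() {
    printf '[%s]\n' "$1"
    printf 'name=%s\n' "$1"
    printf 'baseurl=file://%s\n' "$2"
    printf 'enabled=1\n'
    # Locally built rpms are normally unsigned, so signature checks are off here.
    printf 'gpgcheck=0\n'
}

# Typical use (run as root; all paths below are assumptions):
#   mkdir -p /opt/localrepo
#   cp /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.1-0.x86_64.rpm /opt/localrepo/
#   createrepo /opt/localrepo
#   make_repo_stanza local-gpudirect /opt/localrepo > /etc/yum.repos.d/local-gpudirect.repo
#   yum install nvidia_peer_memory
```

The helper only prints text, so the stanza can be reviewed before it is written under `/etc/yum.repos.d/`.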

Check the dependencies of the nv_peer_mem kernel module

# Check with depmod
depmod -n | grep -i nv_peer_mem
# output
extra/nv_peer_mem.ko: extra/nvidia.ko.xz extra/mlnx-ofa_kernel/drivers/infiniband/core/ib_core.ko extra/mlnx-ofa_kernel/compat/mlx_compat.ko

# Check with modinfo
modinfo nv_peer_mem | grep -i depends
# output
depends: ib_core,nvidia
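
The `modinfo` check can also be scripted, which may help when validating several nodes. The following is a minimal sketch; the helper names `depends_field` and `has_dependency` are hypothetical, and the code relies only on the `depends:` line format shown in the output above.

```shell
# Hypothetical helper: read `modinfo` output on stdin and print the
# comma-separated value of the "depends:" field.
depends_field() {
    awk '$1 == "depends:" { print $2 }'
}

# Hypothetical helper: succeed only if module $1 appears in the
# comma-separated dependency list $2.
has_dependency() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on a host where the module is installed:
#   deps="$(modinfo nv_peer_mem | depends_field)"
#   has_dependency nvidia "$deps" && has_dependency ib_core "$deps" \
#       && echo "nv_peer_mem dependencies look as expected"
```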

GDRcopy

Build the rpm

# Build
cd /tmp
wget https://github.com/NVIDIA/gdrcopy/archive/refs/tags/v2.2.tar.gz
tar -zxvf v2.2.tar.gz
cd gdrcopy-2.2
yum install rpm-build make check check-devel subunit subunit-devel
cd packages
CUDA=/usr/local/cuda ./build-rpm-packages.sh

# Location of the built rpms
/tmp/gdrcopy-2.2/packages/gdrcopy-2.2-1.x86_64.rpm
/tmp/gdrcopy-2.2/packages/gdrcopy-devel-2.2-1.noarch.rpm
/tmp/gdrcopy-2.2/packages/gdrcopy-kmod-2.2-1dkms.noarch.rpm

# Location of the source rpm
/tmp/gdrcopy-2.2/packages/gdrcopy-2.2-1.src.rpm
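
Because the build produces several packages, it can be useful to confirm that every expected file exists before copying anything to a repository. A minimal sketch follows; the helper name `check_files` is hypothetical, and the file names in the usage comment are the ones listed above.

```shell
# Hypothetical helper: report any expected file that is missing.
# Arguments: paths to check. Prints one line per missing path and
# returns non-zero if at least one path is missing.
check_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            printf 'missing: %s\n' "$f"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use after the build above:
#   cd /tmp/gdrcopy-2.2/packages
#   check_files gdrcopy-2.2-1.x86_64.rpm \
#               gdrcopy-devel-2.2-1.noarch.rpm \
#               gdrcopy-kmod-2.2-1dkms.noarch.rpm
```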

Install and enable the service

  • The rpms can be copied to a local yum repository and installed with yum:
yum install gdrcopy gdrcopy-kmod gdrcopy-devel
  • Enable the service so that the kernel module is inserted automatically at boot:
systemctl enable gdrcopy
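
`systemctl enable` only arranges for the service to start at boot, so the module is not necessarily loaded right after installation. A minimal sketch for checking whether a module is currently loaded follows. The helper name `module_loaded` is hypothetical; it reads `/proc/modules` by default, which assumes a Linux host.

```shell
# Hypothetical helper: succeed if module $1 is listed in a
# /proc/modules-style file. $2 is optional and defaults to /proc/modules.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Typical use (run as root) to load the module without rebooting:
#   systemctl start gdrcopy
#   module_loaded gdrdrv && echo "gdrdrv is loaded"
```

The same check applies to `nv_peer_mem` after starting its service.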

Check the dependencies of the gdrdrv kernel module

# Check with depmod
depmod -n | grep -i gdrdrv
# output
kernel/drivers/misc/gdrdrv.ko: extra/nvidia.ko.xz

# Check with modinfo
modinfo gdrdrv | grep -i depends
# output
depends: nv-p2p-dummy
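
The `depmod -n` lines list dependencies as full paths, which is harder to scan than the `modinfo` form. The sketch below reduces such a line to bare module names. The helper name `dep_names` is hypothetical, and it assumes the `path/module.ko: dep1 dep2 ...` layout shown in the output above.

```shell
# Hypothetical helper: read modules.dep-style lines on stdin (the layout
# that `depmod -n` prints) and list the dependencies of module $1, one per
# line, with the directory and the .ko/.ko.xz suffix removed.
dep_names() {
    awk -v m="$1" '
        {
            target = $1
            sub(/:$/, "", target)            # drop the trailing colon
            sub(/^.*\//, "", target)         # drop the directory part
            sub(/\.ko(\..*)?$/, "", target)  # drop .ko, .ko.xz, ...
            if (target != m) next
            for (i = 2; i <= NF; i++) {
                dep = $i
                sub(/^.*\//, "", dep)
                sub(/\.ko(\..*)?$/, "", dep)
                print dep
            }
        }'
}

# Typical use:
#   depmod -n | dep_names gdrdrv
#   depmod -n | dep_names nv_peer_mem
```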

This work is licensed under CC BY-NC-SA 4.0.