# PyTorch with MPI
[PyTorch GitHub](https://github.com/pytorch/pytorch.git)
[Build PyTorch with CUDA-aware MPI support](https://github.com/Stonesjtu/pytorch-learning/blob/master/build-with-mpi.md)
:::info
Check the toolchain and environment first:
```
mpirun --version
gcc --version
g++ --version
nvcc --version
which nvcc
cmake --version
echo $PATH
echo $LD_LIBRARY_PATH
```
:::
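The checks above can be scripted. Here is a minimal sketch (the helper name `check_tools` is hypothetical) that reports where each tool resolves on `PATH`. This makes it easy to spot a missing `nvcc`, or a system `mpirun` that shadows the custom Open MPI build:

```shell
#!/usr/bin/env bash
# Print where each tool resolves on PATH, or MISSING if it is not found.
# Returns non-zero if any tool is missing.
check_tools() {
  local tool path missing=0
  for tool in "$@"; do
    if path=$(command -v "$tool"); then
      echo "$tool: $path"
    else
      echo "$tool: MISSING"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `check_tools mpirun mpicc mpicxx gcc g++ nvcc cmake` after the `export PATH=...` lines. Then run `tr ':' '\n' <<< "$LD_LIBRARY_PATH"` to list the library search paths one per line.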
```
# NSCC DGX
# Get an interactive node (5 CPUs, 1 GPU, 1 hour)
qsub -X -I -l select=1:ncpus=5:ngpus=1 -l walltime=01:00:00 -q dgx -A install -P 50000033
# or, on the dgx-dev queue:
qsub -I -l walltime=01:00:00 -q dgx-dev -P 50000033
# Enter the container
singularity shell --writable --bind /usr/local/cuda,/home/users/industry/ai-hpc/apacsc19/scratch /home/users/industry/ai-hpc/apacsc19/scratch/dgx/ubuntu_torchucc
# Build flags (exported below): USE_CUDA=1 USE_NCCL=1 USE_CUDNN=1
# Download PyTorch and build it
# Requires a CUDA-aware MPI, CUDA, and a conda environment
export PATH=/home/users/industry/ai-hpc/apacsc19/openmpi-3.1.6/bin:$PATH
export LD_LIBRARY_PATH=/home/users/industry/ai-hpc/apacsc19/openmpi-3.1.6/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:$LD_LIBRARY_PATH
# CMAKE_PREFIX_PATH is a colon-separated list; a second export would overwrite the first, so list both
export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../:/usr"
export CUDA_HOME=/usr/local/cuda
export CUDA_LIB=/usr/local/cuda/lib64
export CUDA_NVCC_EXECUTABLE=/usr/local/cuda/bin/nvcc
export CUDA_BIN_PATH=/usr/local/cuda
export USE_CUDA=1
export USE_CUDNN=1
export USE_MKLDNN=1
export USE_NCCL=1
export USE_MPI=1
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git submodule sync --recursive
git submodule update --init --recursive
# use the same interpreter that will run setup.py
python -m pip install -r requirements.txt
python setup.py clean
CMAKE_C_COMPILER=$(which mpicc) CMAKE_CXX_COMPILER=$(which mpicxx) CMAKE_CUDA_COMPILER=$(which nvcc) python setup.py build develop
```
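After the build finishes, confirm that the resulting `torch` actually has the MPI backend compiled in. The sketch below uses a hypothetical helper, `verify_torch_mpi`. It calls only the public checks `torch.distributed.is_available`, `is_mpi_available`, and `is_nccl_available`, plus `torch.cuda.is_available`:

```shell
#!/usr/bin/env bash
# Report whether the torch importable by the given interpreter has MPI/NCCL/CUDA support.
verify_torch_mpi() {
  local py="${1:-python}"
  "$py" - <<'EOF'
import sys

try:
    import torch
    import torch.distributed as dist
except ImportError as exc:
    print(f"torch not importable: {exc}")
    sys.exit(1)

print(f"torch {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
# is_mpi_available / is_nccl_available only exist when torch.distributed is built
if not dist.is_available():
    print("torch.distributed not available")
    sys.exit(1)
print(f"MPI backend: {dist.is_mpi_available()}")
print(f"NCCL backend: {dist.is_nccl_available()}")
EOF
}
```

Run `verify_torch_mpi python` inside the conda environment used for the build. The line to look for is `MPI backend: True`.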
PyTorch build-from-source requirements (`requirements.txt`):
```
astunparse
expecttest
future
numpy
psutil
pyyaml
requests
setuptools
six
types-dataclasses
typing_extensions
dataclasses; python_version<"3.7"
```
`ompi_info` output for the Open MPI 3.1.6 build used here:
```
Package: Open MPI apacsc19@dgx4105 Distribution
Open MPI: 3.1.6
Open MPI repo revision: v3.1.6
Open MPI release date: Mar 18, 2020
Open RTE: 3.1.6
Open RTE repo revision: v3.1.6
Open RTE release date: Mar 18, 2020
OPAL: 3.1.6
OPAL repo revision: v3.1.6
OPAL release date: Mar 18, 2020
MPI API: 3.1.0
Ident string: 3.1.6
Prefix: /home/users/industry/ai-hpc/apacsc19/openmpi-3.1.6
Configured architecture: x86_64-unknown-linux-gnu
Configure host: dgx4105
Configured by: apacsc19
Configured on: Mon Oct 11 18:17:19 +08 2021
Configure host: dgx4105
Configure command line: '--prefix=/home/users/industry/ai-hpc/apacsc19/openmpi-3.1.6/'
'--enable-orterun-prefix-by-default'
'--disable-getpwuid' '--with-verbs'
'--with-cuda=/usr/local/cuda'
Built by: apacsc19
Built on: Mon Oct 11 18:48:52 +08 2021
Built host: dgx4105
C bindings: yes
C++ bindings: no
Fort mpif.h: yes (all)
Fort use mpi: yes (full: ignore TKR)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
limitations in the gfortran compiler and/or Open
MPI, does not support the following: array
subsections, direct passthru (where possible) to
underlying Open MPI's C functionality
Fort mpi_f08 subarrays: no
Java bindings: no
Wrapper compiler rpath: runpath
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C compiler family name: GNU
C compiler version: 7.4.0
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fort compiler: gfortran
Fort compiler abs: /usr/bin/gfortran
Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
Fort 08 assumed shape: yes
Fort optional args: yes
Fort INTERFACE: yes
Fort ISO_FORTRAN_ENV: yes
Fort STORAGE_SIZE: yes
Fort BIND(C) (all): yes
Fort ISO_C_BINDING: yes
Fort SUBROUTINE BIND(C): yes
Fort TYPE,BIND(C): yes
Fort T,BIND(C,name="a"): yes
Fort PRIVATE: yes
Fort PROTECTED: yes
Fort ABSTRACT: yes
Fort ASYNCHRONOUS: yes
Fort PROCEDURE: yes
Fort USE...ONLY: yes
Fort C_FUNLOC: yes
Fort f08 using wrappers: yes
Fort MPI_SIZEOF: yes
C profiling: yes
C++ profiling: no
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: yes
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
OMPI progress: no, ORTE progress: yes, Event lib:
yes)
Sparse Groups: no
Internal debug support: no
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
dl support: yes
Heterogeneous support: no
mpirun default --prefix: yes
MPI_WTIME support: native
Symbol vis. support: yes
Host topology support: yes
MPI extensions: affinity, cuda
FT Checkpoint support: no (checkpoint thread: no)
C/R Enabled Debugging: no
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA btl: smcuda (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA crs: none (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA hwloc: hwloc1117 (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
v3.1.6)
MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA pmix: pmix2x (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rcache: gpusm (MCA v2.1.0, API v3.3.0, Component v3.1.6)
MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v3.1.6)
MCA rcache: rgpusm (MCA v2.1.0, API v3.3.0, Component v3.1.6)
MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA errmgr: dvm (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
v3.1.6)
MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
v3.1.6)
MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
v3.1.6)
MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
v3.1.6)
MCA ess: env (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
v3.1.6)
MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA odls: default (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA oob: ud (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA regx: naive (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA schizo: singularity (MCA v2.1.0, API v1.0.0, Component
v3.1.6)
MCA state: dvm (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA state: app (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA state: tool (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA state: novm (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA state: orted (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: cuda (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: self (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: spacc (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA fcoll: static (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA io: romio314 (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
v3.1.6)
MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA pml: v (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
v3.1.6)
MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v3.1.6)
MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
v3.1.6)
```
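The plain `ompi_info` output above already hints at CUDA support. The configure line includes `--with-cuda=/usr/local/cuda`, and the output lists the `cuda` MPI extension, the `smcuda` BTL, and the `cuda` coll component.

A more direct check, per the Open MPI FAQ, is the `mpi_built_with_cuda_support` parameter in `ompi_info --parsable --all`. The sketch below (hypothetical helper `is_cuda_aware`) reads either form of output on stdin. Its fallback on the `MPI extensions` line is a heuristic, not an official test:

```shell
#!/usr/bin/env bash
# Read ompi_info output on stdin and report whether Open MPI looks CUDA-aware.
is_cuda_aware() {
  local info
  info=$(cat)
  # Preferred: the parsable parameter from `ompi_info --parsable --all`
  if printf '%s\n' "$info" | grep -q 'mpi_built_with_cuda_support:value:true'; then
    echo "CUDA-aware: yes (mpi_built_with_cuda_support=true)"
    return 0
  fi
  # Heuristic fallback for plain `ompi_info` output
  if printf '%s\n' "$info" | grep -Eq 'MPI extensions:.*cuda'; then
    echo "CUDA-aware: likely (cuda MPI extension listed)"
    return 0
  fi
  echo "CUDA-aware: not detected"
  return 1
}
```

Usage: `ompi_info --parsable --all | is_cuda_aware`.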
## Test build: references
[PyTorch Distributed with MPI](https://medium.com/@esaliya/pytorch-distributed-with-mpi-acb84b3ae5fd)
[pytorch github](https://github.com/pytorch/pytorch#from-source)
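Following the approach in the references above, a minimal end-to-end check is an all-reduce launched with `mpirun`. The sketch below uses a hypothetical helper and file name. It writes a small script that:

- initializes `torch.distributed` with `backend="mpi"`, so rank and world size come from the MPI launcher;
- sums each rank's index across all ranks;
- prints the result next to the expected value n(n-1)/2.

```shell
#!/usr/bin/env bash
# Write a minimal torch.distributed all-reduce test using the MPI backend; print its path.
write_mpi_smoke_test() {
  local out="${1:-mpi_smoke_test.py}"
  cat >"$out" <<'EOF'
import torch
import torch.distributed as dist


def main():
    # With backend="mpi", rank and world size come from the MPI launcher (mpirun).
    dist.init_process_group(backend="mpi")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    # Each rank contributes its own rank; the sum is world_size * (world_size - 1) / 2.
    t = torch.tensor([float(rank)])
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    expected = world_size * (world_size - 1) / 2
    print(f"rank {rank}/{world_size}: sum={t.item()} expected={expected}")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
EOF
  echo "$out"
}
```

Run it with, e.g., `mpirun -np 2 python mpi_smoke_test.py`; each rank should print `sum=1.0 expected=1.0`. The script uses CPU tensors, so it also works in a CPU-only build. Moving the tensor to a GPU would exercise the CUDA-aware path, which needs a CUDA-enabled PyTorch build.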

CMake configuration summary from a test build:
```
-- ******** Summary ********
-- General:
-- CMake version : 3.19.6
-- CMake command : /anaconda3/envs/ubuntu_SIN_DGX/bin/cmake
-- System : Linux
-- C++ compiler : /home/users/industry/ai-hpc/apacsc19/openmpi-3.1.6/bin/mpicxx
-- C++ compiler id : GNU
-- C++ compiler version : 7.5.0
-- Using ccache if found : ON
-- Found ccache : CCACHE_PROGRAM-NOTFOUND
-- CXX flags : -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow
-- Build type : Release
-- Compile definitions : TH_BLAS_MKL;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
-- CMAKE_PREFIX_PATH : /anaconda3/envs/ubuntu_SIN_DGX/lib/python3.9/site-packages
-- CMAKE_INSTALL_PREFIX : /pytorch/torch
-- USE_GOLD_LINKER : OFF
--
-- TORCH_VERSION : 1.11.0
-- CAFFE2_VERSION : 1.11.0
-- BUILD_CAFFE2 : ON
-- BUILD_CAFFE2_OPS : ON
-- BUILD_CAFFE2_MOBILE : OFF
-- BUILD_STATIC_RUNTIME_BENCHMARK: OFF
-- BUILD_TENSOREXPR_BENCHMARK: OFF
-- BUILD_NVFUSER_BENCHMARK: OFF
-- BUILD_BINARY : OFF
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : True
-- Python version : 3.9.7
-- Python executable : /anaconda3/envs/ubuntu_SIN_DGX/bin/python
-- Pythonlibs version : 3.9.7
-- Python library : /anaconda3/envs/ubuntu_SIN_DGX/lib/libpython3.9.a
-- Python includes : /anaconda3/envs/ubuntu_SIN_DGX/include/python3.9
-- Python site-packages: lib/python3.9/site-packages
-- BUILD_SHARED_LIBS : ON
-- CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
-- BUILD_TEST : False
-- BUILD_JNI : OFF
-- BUILD_MOBILE_AUTOGRAD : OFF
-- BUILD_LITE_INTERPRETER: OFF
-- INTERN_BUILD_MOBILE :
-- USE_BLAS : 1
-- BLAS : mkl
-- USE_LAPACK : 1
-- LAPACK : mkl
-- USE_ASAN : OFF
-- USE_CPP_CODE_COVERAGE : OFF
-- USE_CUDA : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS :
-- USE_FBGEMM : OFF
-- USE_FAKELOWP : OFF
-- USE_KINETO : 1
-- USE_FFMPEG : OFF
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_PYTORCH_METAL : OFF
-- USE_PYTORCH_METAL_EXPORT : OFF
-- USE_FFTW : OFF
-- USE_MKL : ON
-- USE_MKLDNN : 0
-- USE_NCCL : 0
-- USE_NNPACK : 0
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : 0
-- USE_OPENMP : ON
-- USE_TBB : OFF
-- USE_VULKAN : OFF
-- USE_PROF : OFF
-- USE_QNNPACK : 0
-- USE_PYTORCH_QNNPACK : ON
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- USE_DISTRIBUTED : 1
-- USE_MPI : ON
-- USE_GLOO : 1
-- USE_GLOO_WITH_OPENSSL : OFF
-- USE_TENSORPIPE : ON
-- USE_DEPLOY : OFF
-- USE_BREAKPAD : ON
-- Public Dependencies : caffe2::Threads;caffe2::mkl
-- Private Dependencies : pthreadpool;cpuinfo;pytorch_qnnpack;fp16;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
-- USE_COREML_DELEGATE : OFF
```
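In the summary above, `USE_CUDA` is `OFF`, while `USE_NCCL` and `USE_MKLDNN` are `0`, even though the build script in this note exports all three as `1`. The MPI options, by contrast, are on (`USE_MPI : ON`, `USE_DISTRIBUTED : 1`).

To spot such mismatches quickly, the sketch below (hypothetical helper `summary_flags`) pulls selected flags out of a saved build log. It assumes summary lines of the `-- NAME : value` form shown above:

```shell
#!/usr/bin/env bash
# Read a CMake summary/build log on stdin; print NAME=value for each requested flag.
summary_flags() {
  local text flag value
  text=$(cat)
  for flag in "$@"; do
    # Match "-- FLAG : value" exactly (so USE_MKL does not match USE_MKLDNN)
    value=$(printf '%s\n' "$text" | sed -n "s/^-- *${flag} *: *//p" | head -n 1)
    echo "${flag}=${value:-<not found>}"
  done
}
```

For example, capture the log with `python setup.py build develop 2>&1 | tee build.log` (the log file name is an assumption). Then run `summary_flags USE_CUDA USE_NCCL USE_MKLDNN USE_MPI < build.log`.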
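## Some issues
It looks like the Open MPI built on the DGX has some libraries that are not in its own `lib` directory (linked from elsewhere?).
Not sure yet whether it will work at run time (its own `mpirun` should be able to pick them up).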
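One way to check this concern is to see where the MPI libraries actually resolve. The sketch below (hypothetical helper `mpi_libs_outside_prefix`) reads `ldd` output on stdin. It prints any `libmpi`, `libopen-rte`, or `libopen-pal` entry that does not resolve under the given Open MPI prefix, including entries reported as `not found`.

The `ompi_info` output above also shows `mpirun default --prefix: yes`, which comes from `--enable-orterun-prefix-by-default`. This makes `mpirun` behave as if `--prefix` had been passed, so launching with this Open MPI's own `mpirun` should set up its paths on the nodes it starts.

```shell
#!/usr/bin/env bash
# Read `ldd` output on stdin; print MPI-related libraries that resolve outside PREFIX.
mpi_libs_outside_prefix() {
  local prefix="$1"
  awk -v prefix="$prefix" '
    /libmpi|libopen-rte|libopen-pal/ {
      if ($2 == "=>") {
        # "libX => /path/libX (0x...)" or "libX => not found"
        path = ($3 == "not") ? "not found" : $3
      } else {
        path = $1
      }
      if (index(path, prefix) != 1) print $1 " -> " path
    }'
}
```

Usage: `ldd "$(which mpirun)" | mpi_libs_outside_prefix /home/users/industry/ai-hpc/apacsc19/openmpi-3.1.6`. The same check works on any built PyTorch shared library you want to inspect. Empty output means every MPI library resolves under that prefix.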

###### tags: `DLRM`