# PyTorch with MPI

[pytorch github](https://github.com/pytorch/pytorch.git)
[Build pytorch with cuda-aware MPI support](https://github.com/Stonesjtu/pytorch-learning/blob/master/build-with-mpi.md)

:::info
check:
mpirun --version
gcc --version
g++ --version
nvcc --version
which nvcc
cmake --version
echo $PATH
echo $LD_LIBRARY_PATH
:::

```
# NSCC DGX
# Get an interactive node
qsub -X -I -l select=1:ncpus=5:ngpus=1 -l walltime=01:00:00 -q dgx -A install -P 50000033
qsub -I -l walltime=01:00:00 -q dgx-dev -P 50000033

# Enter the container
singularity shell --writable --bind /usr/local/cuda,/home/users/industry/ai-hpc/apacsc19/scratch /home/users/industry/ai-hpc/apacsc19/scratch/dgx/ubuntu_torchucc

# Flags wanted for this build: USE_CUDA=1 USE_NCCL=1 USE_CUDNN=1 (exported below)

# Download PyTorch and build it
# Requires a CUDA-aware MPI, CUDA, and a conda environment
export PATH=/home/users/industry/ai-hpc/apacsc19/openmpi-3.1.6/bin:$PATH
export LD_LIBRARY_PATH=/home/users/industry/ai-hpc/apacsc19/openmpi-3.1.6/lib:$LD_LIBRARY_PATH

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:$LD_LIBRARY_PATH

export CMAKE_PREFIX_PATH=/usr
# Note: this second assignment overrides the /usr value set on the previous line
export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"

export CUDA_HOME=/usr/local/cuda
export CUDA_LIB=/usr/local/cuda/lib64
export CUDA_NVCC_EXECUTABLE=/usr/local/cuda/bin/nvcc
export CUDA_BIN_PATH=/usr/local/cuda

export USE_CUDA=1
export USE_CUDNN=1
export USE_MKLDNN=1
export USE_NCCL=1
export USE_MPI=1

git clone https://github.com/pytorch/pytorch.git
cd pytorch
git submodule sync --recursive
git submodule update --init --recursive
pip3 install -r requirements.txt
python setup.py clean
# Build with the MPI compiler wrappers so the MPI backend is picked up
CMAKE_C_COMPILER=$(which mpicc) CMAKE_CXX_COMPILER=$(which mpicxx) CMAKE_CUDA_COMPILER=$(which nvcc) python setup.py build develop
```

Requirements for building PyTorch from source (`requirements.txt`):

```
astunparse
expecttest
future
numpy
psutil
pyyaml
requests
setuptools
six
types-dataclasses
typing_extensions
dataclasses; python_version<"3.7"
```

```
Package: Open MPI apacsc19@dgx4105 Distribution
Open MPI: 3.1.6
Open MPI repo revision: v3.1.6
Open MPI release date: Mar 18, 2020
Open RTE: 3.1.6
Open RTE repo revision: v3.1.6
Open RTE release date: Mar 18, 2020
OPAL: 3.1.6
OPAL repo revision: v3.1.6
OPAL release date: Mar 18, 2020
MPI API: 3.1.0
Ident string: 3.1.6
Prefix: /home/users/industry/ai-hpc/apacsc19/openmpi-3.1.6
Configured architecture: x86_64-unknown-linux-gnu
Configure host: dgx4105
Configured by: apacsc19
Configured on: Mon Oct 11 18:17:19 +08 2021
Configure host: dgx4105
Configure command line: '--prefix=/home/users/industry/ai-hpc/apacsc19/openmpi-3.1.6/' '--enable-orterun-prefix-by-default' '--disable-getpwuid' '--with-verbs' '--with-cuda=/usr/local/cuda'
Built by: apacsc19
Built on: Mon Oct 11 18:48:52 +08 2021
Built host: dgx4105
C bindings: yes
C++ bindings: no
Fort mpif.h: yes (all)
Fort use mpi: yes (full: ignore TKR)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to limitations in the gfortran compiler and/or Open MPI, does not support the following: array subsections, direct passthru (where possible) to underlying Open MPI's C functionality
Fort mpi_f08 subarrays: no
Java bindings: no
Wrapper compiler rpath: runpath
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C compiler family name: GNU
C compiler version: 7.4.0
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fort compiler: gfortran
Fort compiler abs: /usr/bin/gfortran
Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
Fort 08 assumed shape: yes
Fort optional args: yes
Fort INTERFACE: yes
Fort ISO_FORTRAN_ENV: yes
Fort STORAGE_SIZE: yes
Fort BIND(C) (all): yes
Fort ISO_C_BINDING: yes
Fort SUBROUTINE BIND(C): yes
Fort TYPE,BIND(C): yes
Fort T,BIND(C,name="a"): yes
Fort PRIVATE: yes
Fort PROTECTED: yes
Fort ABSTRACT: yes
Fort ASYNCHRONOUS: yes
Fort PROCEDURE: yes
Fort USE...ONLY: yes
Fort C_FUNLOC: yes
Fort f08 using wrappers: yes
Fort MPI_SIZEOF: yes
C profiling: yes
C++ profiling: no
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: yes
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
Sparse Groups: no
Internal debug support: no
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
dl support: yes
Heterogeneous support: no
mpirun default --prefix: yes
MPI_WTIME support: native
Symbol vis. support: yes
Host topology support: yes
MPI extensions: affinity, cuda
FT Checkpoint support: no (checkpoint thread: no)
C/R Enabled Debugging: no
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA btl: smcuda (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA crs: none (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA hwloc: hwloc1117 (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA pmix: pmix2x (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rcache: gpusm (MCA v2.1.0, API v3.3.0, Component v3.1.6)
MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v3.1.6)
MCA rcache: rgpusm (MCA v2.1.0, API v3.3.0, Component v3.1.6)
MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA errmgr: dvm (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA ess: env (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA odls: default (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA oob: ud (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA regx: naive (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA schizo: singularity (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA state: dvm (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA state: app (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA state: tool (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA state: novm (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA state: orted (MCA v2.1.0, API v1.0.0, Component v3.1.6)
MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: cuda (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: self (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: spacc (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA fcoll: static (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA io: romio314 (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v3.1.6)
MCA pml: v (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v3.1.6)
MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v3.1.6)
MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v3.1.6)
MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v3.1.6)
```

## Test build: references

[PyTorch Distributed with MPI](https://medium.com/@esaliya/pytorch-distributed-with-mpi-acb84b3ae5fd)
[pytorch github](https://github.com/pytorch/pytorch#from-source)

![](https://i.imgur.com/Orak2dx.png)

```
-
-- ******** Summary ********
-- General:
-- CMake version : 3.19.6
-- CMake command : /anaconda3/envs/ubuntu_SIN_DGX/bin/cmake
-- System : Linux
-- C++ compiler : /home/users/industry/ai-hpc/apacsc19/openmpi-3.1.6/bin/mpicxx
-- C++ compiler id : GNU
-- C++ compiler version : 7.5.0
-- Using ccache if found : ON
-- Found ccache : CCACHE_PROGRAM-NOTFOUND
-- CXX flags : -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow
-- Build type : Release
-- Compile definitions : TH_BLAS_MKL;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
-- CMAKE_PREFIX_PATH : /anaconda3/envs/ubuntu_SIN_DGX/lib/python3.9/site-packages
-- CMAKE_INSTALL_PREFIX : /pytorch/torch
-- USE_GOLD_LINKER : OFF
--
-- TORCH_VERSION : 1.11.0
-- CAFFE2_VERSION : 1.11.0
-- BUILD_CAFFE2 : ON
-- BUILD_CAFFE2_OPS : ON
-- BUILD_CAFFE2_MOBILE : OFF
-- BUILD_STATIC_RUNTIME_BENCHMARK: OFF
-- BUILD_TENSOREXPR_BENCHMARK: OFF
-- BUILD_NVFUSER_BENCHMARK: OFF
-- BUILD_BINARY : OFF
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : True
-- Python version : 3.9.7
-- Python executable : /anaconda3/envs/ubuntu_SIN_DGX/bin/python
-- Pythonlibs version : 3.9.7
-- Python library : /anaconda3/envs/ubuntu_SIN_DGX/lib/libpython3.9.a
-- Python includes : /anaconda3/envs/ubuntu_SIN_DGX/include/python3.9
-- Python site-packages: lib/python3.9/site-packages
-- BUILD_SHARED_LIBS : ON
-- CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
-- BUILD_TEST : False
-- BUILD_JNI : OFF
-- BUILD_MOBILE_AUTOGRAD : OFF
-- BUILD_LITE_INTERPRETER: OFF
-- INTERN_BUILD_MOBILE :
-- USE_BLAS : 1
-- BLAS : mkl
-- USE_LAPACK : 1
-- LAPACK : mkl
-- USE_ASAN : OFF
-- USE_CPP_CODE_COVERAGE : OFF
-- USE_CUDA : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS :
-- USE_FBGEMM : OFF
-- USE_FAKELOWP : OFF
-- USE_KINETO : 1
-- USE_FFMPEG : OFF
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_PYTORCH_METAL : OFF
-- USE_PYTORCH_METAL_EXPORT : OFF
-- USE_FFTW : OFF
-- USE_MKL : ON
-- USE_MKLDNN : 0
-- USE_NCCL : 0
-- USE_NNPACK : 0
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : 0
-- USE_OPENMP : ON
-- USE_TBB : OFF
-- USE_VULKAN : OFF
-- USE_PROF : OFF
-- USE_QNNPACK : 0
-- USE_PYTORCH_QNNPACK : ON
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- USE_DISTRIBUTED : 1
-- USE_MPI : ON
-- USE_GLOO : 1
-- USE_GLOO_WITH_OPENSSL : OFF
-- USE_TENSORPIPE : ON
-- USE_DEPLOY : OFF
-- USE_BREAKPAD : ON
-- Public Dependencies : caffe2::Threads;caffe2::mkl
-- Private Dependencies : pthreadpool;cpuinfo;pytorch_qnnpack;fp16;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
-- USE_COREML_DELEGATE : OFF
```

## Some issues

It looks like some libraries of the Open MPI built on the DGX are not located in its own lib directory (linked from elsewhere?).
Not sure yet whether this will work at run time (our own mpirun should be able to pick them up).

![](https://i.imgur.com/O6C84mG.png)

###### tags: `DLRM`
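
A follow-up check on the test build: the CMake summary recorded in this note reports `USE_CUDA : OFF`, `USE_MKLDNN : 0`, and `USE_NCCL : 0`, while the build script exports all three as `1`. The note does not say whether that summary came from the same environment as the script, so treat the following as a minimal sketch for comparing the two. The helper names `parse_summary` and `mismatches` are hypothetical, and the `-- KEY : value` line shape is assumed from the pasted summary only.

```python
import re

# Assumed line shape, taken from the summary pasted in this note:
#   "-- KEY : value"   (value may be empty)
SUMMARY_LINE = re.compile(r"^--\s+(\S[^:]*?)\s*:\s*(.*)$")

# Spellings the summary uses for "enabled" (1 / ON / True).
TRUTHY = {"1", "ON", "TRUE", "YES"}


def parse_summary(text):
    """Return {key: value} for every '-- KEY : value' line in the text."""
    flags = {}
    for line in text.splitlines():
        m = SUMMARY_LINE.match(line.strip())
        if m:
            flags[m.group(1)] = m.group(2).strip()
    return flags


def mismatches(requested, flags):
    """List (name, reported_value) for flags whose summary state differs
    from what was requested. A flag missing from the summary is reported
    with value None and counts as disabled."""
    bad = []
    for name, wanted in requested.items():
        got = flags.get(name)
        enabled = got is not None and got.upper() in TRUTHY
        if wanted != enabled:
            bad.append((name, got))
    return bad


if __name__ == "__main__":
    # A few lines copied from the summary in this note.
    sample = """\
-- USE_CUDA : OFF
-- USE_MKLDNN : 0
-- USE_NCCL : 0
-- USE_DISTRIBUTED : 1
-- USE_MPI : ON
"""
    # The flags exported in the build script.
    wanted = {
        "USE_CUDA": True,
        "USE_CUDNN": True,
        "USE_MKLDNN": True,
        "USE_NCCL": True,
        "USE_MPI": True,
    }
    for name, got in mismatches(wanted, parse_summary(sample)):
        print(f"{name}: requested ON, summary reports {got}")
```

To use it on a real build, save the configure output to a file and pass its contents to `parse_summary`. Only `USE_MPI` matches the request in the sample above, which suggests checking that `nvcc` and the CUDA paths are visible inside the container before rebuilding.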