# NCCL test inside docker error log

### Check IB interface:

```
wangyzh@dev-jax-pmap-1-9z3ne-4fsr4:~/nccl-tests/build$ lspci | grep Mellanox
0c:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
12:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
4b:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
54:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
8d:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
94:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
ba:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
cc:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
e1:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
e1:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
```

```
wangyzh@dev-jax-pmap-1-9z3ne-4fsr4:~/nccl-tests/build$ ibstat
CA 'mlx5_0'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.28.4000
	Hardware version: 0
	Node GUID: 0x043f720300d33f4e
	System image GUID: 0x043f720300d33f4e
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 200
		Base lid: 14
		LMC: 0
		SM lid: 1
		Capability mask: 0x2651e848
		Port GUID: 0x043f720300d33f4e
		Link layer: InfiniBand
CA 'mlx5_7'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.28.4000
	Hardware version: 0
	Node GUID: 0x043f720300caad36
	System image GUID: 0x043f720300caad36
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 200
		Base lid: 24
		LMC: 0
		SM lid: 1
		Capability mask: 0x2651e848
		Port GUID: 0x043f720300caad36
		Link layer: InfiniBand
CA 'mlx5_5'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.28.4000
	Hardware version: 0
	Node GUID: 0x043f720300d75126
	System image GUID: 0x043f720300d75126
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 200
		Base lid: 10
		LMC: 0
		SM lid: 1
		Capability mask: 0x2651e848
		Port GUID: 0x043f720300d75126
		Link layer: InfiniBand
CA 'mlx5_3'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.28.4000
	Hardware version: 0
	Node GUID: 0x043f720300caac82
	System image GUID: 0x043f720300caac82
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 200
		Base lid: 13
		LMC: 0
		SM lid: 1
		Capability mask: 0x2651e848
		Port GUID: 0x043f720300caac82
		Link layer: InfiniBand
CA 'mlx5_bond_0'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.28.4000
	Hardware version: 0
	Node GUID: 0x043f720300ce74ca
	System image GUID: 0x043f720300ce74ca
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x00010000
		Port GUID: 0x0000000000000000
		Link layer: Ethernet
CA 'mlx5_1'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.28.4000
	Hardware version: 0
	Node GUID: 0x043f720300d33ed2
	System image GUID: 0x043f720300d33ed2
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 200
		Base lid: 17
		LMC: 0
		SM lid: 1
		Capability mask: 0x2651e848
		Port GUID: 0x043f720300d33ed2
		Link layer: InfiniBand
CA 'mlx5_6'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.28.4000
	Hardware version: 0
	Node GUID: 0x043f720300caacaa
	System image GUID: 0x043f720300caacaa
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 200
		Base lid: 9
		LMC: 0
		SM lid: 1
		Capability mask: 0x2651e848
		Port GUID: 0x043f720300caacaa
		Link layer: InfiniBand
CA 'mlx5_4'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.28.4000
	Hardware version: 0
	Node GUID: 0x043f720300df98c0
	System image GUID: 0x043f720300df98c0
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 200
		Base lid: 11
		LMC: 0
		SM lid: 1
		Capability mask: 0x2651e848
		Port GUID: 0x043f720300df98c0
		Link layer: InfiniBand
CA 'mlx5_2'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.28.4000
	Hardware version: 0
	Node GUID: 0x043f720300caad3a
	System image GUID: 0x043f720300caad3a
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 200
		Base lid: 33
		LMC: 0
		SM lid: 1
		Capability mask: 0x2651e848
		Port GUID: 0x043f720300caad3a
		Link layer: InfiniBand
```

### nvidia-smi output:

```
wangyzh@dev-jax-pmap-1-9z3ne-4fsr4:~/nccl-tests/build$ nvidia-smi
Thu Sep 2 13:23:45 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:47:00.0 Off |                    0 |
| N/A   29C    P0    51W / 400W |      0MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  A100-SXM4-40GB      On   | 00000000:4E:00.0 Off |                    0 |
| N/A   29C    P0    50W / 400W |      0MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

### nccl-tests error:

```
wangyzh@dev-jax-pmap-1-9z3ne-4fsr4:~/nccl-tests/build$ NCCL_DEBUG=INFO mpirun -np 2 ./all_reduce_perf
# nThread 1 nGpus 1 minBytes 33554432 maxBytes 33554432 step: 1048576(bytes) warmup iters: 5 iters: 20 validation: 1
#
# Using devices
# nThread 1 nGpus 1 minBytes 33554432 maxBytes 33554432 step: 1048576(bytes) warmup iters: 5 iters: 20 validation: 1
#
# Using devices
# Rank 0 Pid 1638 on dev-jax-pmap-1-9z3ne-4fsr4 device 0 [0x47] A100-SXM4-40GB
# Rank 0 Pid 1639 on dev-jax-pmap-1-9z3ne-4fsr4 device 0 [0x47] A100-SXM4-40GB
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO Bootstrap : Using eth0:172.24.23.133<0>
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] NCCL INFO Bootstrap : Using eth0:172.24.23.133<0>
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_1
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_2
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_3
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_4
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_5
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_6
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_7
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_bond_0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO NET/IB : No device found.
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO NET/Socket : Using [0]eth0:172.24.23.133<0>
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO Using network Socket
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_1
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_2
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_3
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_4
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_5
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_6
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_7
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_bond_0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] NCCL INFO NET/IB : No device found.
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] NCCL INFO NET/Socket : Using [0]eth0:172.24.23.133<0>
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] NCCL INFO Using network Socket
NCCL version 2.8.4+cuda11.2
NCCL version 2.8.4+cuda11.2
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 00/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 01/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 02/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 03/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 04/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 05/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 06/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 07/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 08/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 09/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 10/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 11/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 12/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 13/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 14/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 15/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 16/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 17/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 18/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 19/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 20/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 21/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 22/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 23/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 24/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 25/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 26/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 27/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 28/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 29/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 30/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 31/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 00/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 01/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 02/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 03/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 04/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 05/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 06/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 07/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 08/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 09/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 10/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 11/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 12/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 13/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 14/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 15/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 16/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 17/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 18/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 19/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 20/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 21/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 22/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 23/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 24/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 25/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 26/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 27/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 28/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 29/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 30/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 31/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Connected all rings
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Connected all trees
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Connected all rings
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Connected all trees
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO comm 0x7f590c000e00 rank 0 nranks 1 cudaDev 0 busId 47000 - Init COMPLETE
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO comm 0x7f8524000e00 rank 0 nranks 1 cudaDev 0 busId 47000 - Init COMPLETE
#
#
#                                                out-of-place                       in-place
#       size         count    type   redop     time   algbw   busbw  error     time     algbw   busbw  error
#        (B)    (elements)                     (us)  (GB/s)  (GB/s)            (us)    (GB/s)  (GB/s)
#                                                out-of-place                       in-place
#       size         count    type   redop     time   algbw   busbw  error     time     algbw   busbw  error
#        (B)    (elements)                     (us)  (GB/s)  (GB/s)            (us)    (GB/s)  (GB/s)
    33554432       8388608   float     sum    76.53  438.46    0.00  0e+00     0.39  86872.32    0.00  0e+00
    33554432       8388608   float     sum    128.9  260.35    0.00  0e+00     0.39  85871.87    0.00  0e+00
# Out of bounds values : 0 OK
# Avg bus bandwidth : 0
#
# Out of bounds values : 0 OK
# Avg bus bandwidth : 0
#
```
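
The `NCCL WARN` lines above are interleaved between the two processes, which makes it hard to see at a glance which devices each process failed to open and which transport it ended up with. A minimal Python sketch for summarizing such a log follows. The function name `summarize_nccl_log` and the regular expressions are assumptions written against the line format shown in this note (NCCL 2.8.4); they are not part of NCCL or nccl-tests, and other NCCL versions may format these messages differently.

```python
import re
from collections import defaultdict

# Hypothetical patterns, derived only from the log lines in this note.
# Example: "host:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_0"
OPEN_FAIL = re.compile(
    r"^\S+:(?P<pid>\d+):\d+ \[\d+\] .*NCCL WARN NET/IB : Unable to open device (?P<dev>\S+)"
)
# Example: "host:1638:1638 [0] NCCL INFO Using network Socket"
NETWORK = re.compile(
    r"^\S+:(?P<pid>\d+):\d+ \[\d+\] NCCL INFO Using network (?P<net>\S+)"
)


def summarize_nccl_log(log_text):
    """Return (failed_devices_by_pid, network_by_pid) for an NCCL_DEBUG=INFO log."""
    failed = defaultdict(list)
    network = {}
    for raw in log_text.splitlines():
        line = raw.strip()
        m = OPEN_FAIL.match(line)
        if m:
            failed[int(m["pid"])].append(m["dev"])
            continue
        m = NETWORK.match(line)
        if m:
            network[int(m["pid"])] = m["net"]
    return dict(failed), network


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_1
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO NET/IB : No device found.
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO Using network Socket
"""

if __name__ == "__main__":
    failed, network = summarize_nccl_log(SAMPLE)
    for pid, devices in failed.items():
        print(pid, "could not open:", ", ".join(devices), "| network:", network.get(pid))
```

Run against the full log, this should list all nine `mlx5_*` devices for each of the two PIDs, with `Socket` as the selected network.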