# NCCL test inside docker error log

### Check IB interface:

```
wangyzh@dev-jax-pmap-1-9z3ne-4fsr4:~/nccl-tests/build$ lspci | grep Mellanox
0c:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
12:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
4b:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
54:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
8d:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
94:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
ba:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
cc:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
e1:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
e1:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
```

```
wangyzh@dev-jax-pmap-1-9z3ne-4fsr4:~/nccl-tests/build$ ibstat
CA 'mlx5_0'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.28.4000
    Hardware version: 0
    Node GUID: 0x043f720300d33f4e
    System image GUID: 0x043f720300d33f4e
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 14
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x043f720300d33f4e
        Link layer: InfiniBand
CA 'mlx5_7'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.28.4000
    Hardware version: 0
    Node GUID: 0x043f720300caad36
    System image GUID: 0x043f720300caad36
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 24
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x043f720300caad36
        Link layer: InfiniBand
CA 'mlx5_5'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.28.4000
    Hardware version: 0
    Node GUID: 0x043f720300d75126
    System image GUID: 0x043f720300d75126
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 10
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x043f720300d75126
        Link layer: InfiniBand
CA 'mlx5_3'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.28.4000
    Hardware version: 0
    Node GUID: 0x043f720300caac82
    System image GUID: 0x043f720300caac82
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 13
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x043f720300caac82
        Link layer: InfiniBand
CA 'mlx5_bond_0'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.28.4000
    Hardware version: 0
    Node GUID: 0x043f720300ce74ca
    System image GUID: 0x043f720300ce74ca
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0x0000000000000000
        Link layer: Ethernet
CA 'mlx5_1'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.28.4000
    Hardware version: 0
    Node GUID: 0x043f720300d33ed2
    System image GUID: 0x043f720300d33ed2
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 17
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x043f720300d33ed2
        Link layer: InfiniBand
CA 'mlx5_6'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.28.4000
    Hardware version: 0
    Node GUID: 0x043f720300caacaa
    System image GUID: 0x043f720300caacaa
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 9
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x043f720300caacaa
        Link layer: InfiniBand
CA 'mlx5_4'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.28.4000
    Hardware version: 0
    Node GUID: 0x043f720300df98c0
    System image GUID: 0x043f720300df98c0
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 11
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x043f720300df98c0
        Link layer: InfiniBand
CA 'mlx5_2'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.28.4000
    Hardware version: 0
    Node GUID: 0x043f720300caad3a
    System image GUID: 0x043f720300caad3a
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 33
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x043f720300caad3a
        Link layer: InfiniBand
```

### nvidia-smi output:

```
wangyzh@dev-jax-pmap-1-9z3ne-4fsr4:~/nccl-tests/build$ nvidia-smi
Thu Sep  2 13:23:45 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:47:00.0 Off |                    0 |
| N/A   29C    P0    51W / 400W |      0MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  A100-SXM4-40GB      On   | 00000000:4E:00.0 Off |                    0 |
| N/A   29C    P0    50W / 400W |      0MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

### nccl-tests error:

```
wangyzh@dev-jax-pmap-1-9z3ne-4fsr4:~/nccl-tests/build$ NCCL_DEBUG=INFO mpirun -np 2 ./all_reduce_perf
# nThread 1 nGpus 1 minBytes 33554432 maxBytes 33554432 step: 1048576(bytes) warmup iters: 5 iters: 20 validation: 1
#
# Using devices
# nThread 1 nGpus 1 minBytes 33554432 maxBytes 33554432 step: 1048576(bytes) warmup iters: 5 iters: 20 validation: 1
#
# Using devices
# Rank 0 Pid 1638 on dev-jax-pmap-1-9z3ne-4fsr4 device 0 [0x47] A100-SXM4-40GB
# Rank 0 Pid 1639 on dev-jax-pmap-1-9z3ne-4fsr4 device 0 [0x47] A100-SXM4-40GB
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO Bootstrap : Using eth0:172.24.23.133<0>
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] NCCL INFO Bootstrap : Using eth0:172.24.23.133<0>
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_1
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_2
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_3
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_4
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_5
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_6
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_7
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_bond_0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO NET/IB : No device found.
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO NET/Socket : Using [0]eth0:172.24.23.133<0>
dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO Using network Socket
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_1
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_2
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_3
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_4
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_5
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_6
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_7
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] misc/ibvwrap.cc:212 NCCL WARN Call to ibv_open_device failed
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_bond_0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] NCCL INFO NET/IB : No device found.
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] NCCL INFO NET/Socket : Using [0]eth0:172.24.23.133<0>
dev-jax-pmap-1-9z3ne-4fsr4:1639:1639 [0] NCCL INFO Using network Socket
NCCL version 2.8.4+cuda11.2
NCCL version 2.8.4+cuda11.2
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 00/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 01/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 02/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 03/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 04/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 05/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 06/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 07/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 08/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 09/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 10/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 11/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 12/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 13/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 14/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 15/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 16/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 17/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 18/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 19/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 20/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 21/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 22/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 23/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 24/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 25/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 26/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 27/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 28/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 29/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 30/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Channel 31/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 00/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 01/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 02/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 03/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 04/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 05/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 06/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 07/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 08/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 09/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 10/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 11/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 12/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 13/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 14/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 15/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 16/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 17/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 18/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 19/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 20/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 21/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 22/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 23/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 24/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 25/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 26/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 27/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 28/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 29/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 30/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Channel 31/32 : 0
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Connected all rings
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO Connected all trees
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Connected all rings
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO Connected all trees
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
dev-jax-pmap-1-9z3ne-4fsr4:1639:1648 [0] NCCL INFO comm 0x7f590c000e00 rank 0 nranks 1 cudaDev 0 busId 47000 - Init COMPLETE
dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO comm 0x7f8524000e00 rank 0 nranks 1 cudaDev 0 busId 47000 - Init COMPLETE
#
#
#                                             out-of-place                          in-place
#       size         count    type   redop     time   algbw   busbw  error     time     algbw   busbw  error
#        (B)    (elements)                     (us)  (GB/s)  (GB/s)            (us)    (GB/s)  (GB/s)
#                                             out-of-place                          in-place
#       size         count    type   redop     time   algbw   busbw  error     time     algbw   busbw  error
#        (B)    (elements)                     (us)  (GB/s)  (GB/s)            (us)    (GB/s)  (GB/s)
    33554432       8388608   float     sum    76.53  438.46    0.00  0e+00     0.39  86872.32    0.00  0e+00
    33554432       8388608   float     sum    128.9  260.35    0.00  0e+00     0.39  85871.87    0.00  0e+00
# Out of bounds values : 0 OK
# Avg bus bandwidth : 0
#
# Out of bounds values : 0 OK
# Avg bus bandwidth : 0
#
```
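
The log above is long, but three signals per process carry most of the story: which `mlx5_*` devices `ibv_open_device` could not open, which network NCCL ended up using, and how many ranks the communicator reports. A minimal Python sketch for pulling those out of a saved log might look like the following. The function name `summarize_nccl_log` and the regular expressions are assumptions written for this note, matched against the message wording in this particular log (NCCL 2.8.4); other NCCL versions may phrase these lines differently.

```python
import re
from collections import defaultdict

# Patterns modelled on the log lines in this note (assumption: NCCL 2.8.4 wording).
PID_RE = re.compile(r"^\S+:(\d+):\d+ \[\d+\]")        # host:pid:tid [cudaDev]
IB_FAIL_RE = re.compile(r"NET/IB : Unable to open device (\S+)")
NETWORK_RE = re.compile(r"NCCL INFO Using network (\S+)")
NRANKS_RE = re.compile(r"\brank (\d+) nranks (\d+)")


def summarize_nccl_log(text):
    """Group IB open failures, chosen network and nranks by process ID."""
    summary = defaultdict(
        lambda: {"ib_failed": [], "network": None, "nranks": None}
    )
    for raw_line in text.splitlines():
        line = raw_line.strip()
        pid_match = PID_RE.match(line)
        if pid_match is None:
            continue  # not an NCCL debug line (e.g. nccl-tests "#" output)
        entry = summary[int(pid_match.group(1))]

        ib_fail = IB_FAIL_RE.search(line)
        if ib_fail:
            entry["ib_failed"].append(ib_fail.group(1))

        network = NETWORK_RE.search(line)
        if network:
            entry["network"] = network.group(1)

        nranks = NRANKS_RE.search(line)
        if nranks:
            entry["nranks"] = int(nranks.group(2))
    return dict(summary)


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = "\n".join([
        "dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] transport/net_ib.cc:149 NCCL WARN NET/IB : Unable to open device mlx5_0",
        "dev-jax-pmap-1-9z3ne-4fsr4:1638:1638 [0] NCCL INFO Using network Socket",
        "dev-jax-pmap-1-9z3ne-4fsr4:1638:1649 [0] NCCL INFO comm 0x7f8524000e00 rank 0 nranks 1 cudaDev 0 busId 47000 - Init COMPLETE",
    ])
    for pid, info in sorted(summarize_nccl_log(sample).items()):
        print(pid, info)
```

Run against the full log, this should show both PIDs failing on all nine devices, falling back to `Socket`, and reporting `nranks 1`. The last point suggests that `mpirun -np 2` started two independent single-rank runs rather than one two-rank job, which would be consistent with the reported average bus bandwidth of 0; that reading is an inference from the log, not something the log states.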