64-node run, 8 ranks per node, 6 OpenMP threads per rank. This attempt ran without `MPICH_ASYNC_PROGRESS` and `MPI_THREAD_MULTIPLE`.

Run directory: `/scratch/project_465000454/vsingh/SCALING_27NOV/ENV_VARIABLE_STUDY/Experiment1_commenting_test_commenting/bin/SLURM/lumi/TCO1279/commented/failed_due_to_commented_async_1/`

All ranks but one are OK. A single rank gets a segfault from `librocfft.so` -> `libamdhip64.so`, apparently related to the run-time compiler (`RTCKernel::runtime_compile` -> `hipModuleLoadData`, reached while creating an FFT plan). The backtrace from rank 7:

```
[Thread 0x1551002ff700 (LWP 90437) exited]

Thread 1 "ifsMASTER.SP" received signal SIGSEGV, Segmentation fault.
0x00001554f1a7ecf4 in ?? () from /opt/rocm-5.2.3/lib/libamdhip64.so.5
#0 0x00001554f1a7ecf4 in ?? () from /opt/rocm-5.2.3/lib/libamdhip64.so.5
#1 0x00001554f1aa8651 in ?? () from /opt/rocm-5.2.3/lib/libamdhip64.so.5
#2 0x00001554f18de345 in ?? () from /opt/rocm-5.2.3/lib/libamdhip64.so.5
#3 0x00001554f18de390 in ?? () from /opt/rocm-5.2.3/lib/libamdhip64.so.5
#4 0x00001554f189ed88 in ?? () from /opt/rocm-5.2.3/lib/libamdhip64.so.5
#5 0x00001554f189ee99 in ?? () from /opt/rocm-5.2.3/lib/libamdhip64.so.5
#6 0x00001554f19f0fe9 in ?? () from /opt/rocm-5.2.3/lib/libamdhip64.so.5
#7 0x00001554f19c40b0 in hipModuleLoadData () from /opt/rocm-5.2.3/lib/libamdhip64.so.5
#8 0x00001554f1671818 in RTCKernel::RTCKernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<char, std::allocator<char> > const&) () from /opt/rocm-5.2.3/lib/librocfft.so.0
#9 0x00001554f16738ae in RTCKernel::runtime_compile(TreeNode&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /opt/rocm-5.2.3/lib/librocfft.so.0
#10 0x00001554f160fa8e in RuntimeCompilePlan(ExecPlan&) () from /opt/rocm-5.2.3/lib/librocfft.so.0
#11 0x00001554f160b22f in ProcessNode(ExecPlan&) () from /opt/rocm-5.2.3/lib/librocfft.so.0
#12 0x00001554f160ac1a in rocfft_plan_create_internal(rocfft_plan_t*, rocfft_result_placement_e, rocfft_transform_type_e, rocfft_precision_e, unsigned long, unsigned long const*, unsigned long, rocfft_plan_description_t*) () from /opt/rocm-5.2.3/lib/librocfft.so.0
#13 0x00001554f160ba6b in rocfft_plan_create () from /opt/rocm-5.2.3/lib/librocfft.so.0
#14 0x000015550014667a in hipfftMakePlan_internal(hipfftHandle_t*, unsigned long, unsigned long*, hipfftType_t, unsigned long, hipfft_plan_description_t*, unsigned long*, bool) () from /opt/rocm-5.2.3/lib/libhipfft.so
#15 0x0000155500145c80 in hipfftMakePlanMany () from /opt/rocm-5.2.3/lib/libhipfft.so
#16 0x00001555001454d5 in hipfftPlanMany () from /opt/rocm-5.2.3/lib/libhipfft.so
#17 0x000015551a039e94 in create_plan_ffth_ () from /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/build.lumi-g/bin/../lib/libarpifs.SP.so
#18 0x000015551a02520e in create_plan_fft (kplan=0xe5764e50, ktype=<error reading variable: Cannot access memory at address 0x1>, kn=<error reading variable: Cannot access memory at address 0x0>, klot=<error reading variable: Cannot access memory at address 0x1>) at /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/source/ectrans/src/trans/gpu/internal/tpm_ffth.F90:112
#19 0x000015551a048910 in ftinv (preel=<error reading variable: value requires 199636448 bytes, which is more than max-value-size>, kfields=484) at /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/source/ectrans/src/trans/gpu/internal/ftinv_mod.F90:132
#20 0x000015551a042f95 in ftinv_ctl (kf_uv_g=<optimized out>, kf_scalars_g=1, kf_uv=0, kf_scalars=1, kf_scders=0, kf_gp=1, kf_fs=1, kf_out_lt=1, kvsetuv=<error reading variable: Location address is not set.>, kvsetsc=..., kptrgp=<error reading variable: Location address is not set.>, kvsetsc3a=<error reading variable: Location address is not set.>, kvsetsc3b=<error reading variable: Location address is not set.>, kvsetsc2=<error reading variable: Location address is not set.>, pgp=<error reading variable: value requires 103120 bytes, which is more than max-value-size>, pgpuv=<error reading variable: Location address is not set.>, pgp3a=<error reading variable: Location address is not set.>, pgp3b=<error reading variable: Location address is not set.>, pgp2=<error reading variable: Location address is not set.>) at /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/source/ectrans/src/trans/gpu/internal/ftinv_ctl_mod.F90:171
#21 0x0000155519febb40 in inv_trans_ctl (kf_uv_g=0, kf_scalars_g=1, kf_gp=1, kf_fs=1, kf_out_lt=1, kf_uv=0, kf_scalars=1, kf_scders=0, pspvor=<error reading variable: Location address is not set.>, pspdiv=<error reading variable: Location address is not set.>, pspscalar=<error reading variable: value requires 102480 bytes, which is more than max-value-size>, kvsetuv=<error reading variable: Location address is not set.>, kvsetsc=..., pgp=<error reading variable: value requires 103120 bytes, which is more than max-value-size>, fspgl_proc=0x0, pspsc3a=<error reading variable: Location address is not set.>, pspsc3b=<error reading variable: Location address is not set.>, pspsc2=<error reading variable: Location address is not set.>, kvsetsc3a=<error reading variable: Location address is not set.>, kvsetsc3b=<error reading variable: Location address is not set.>, kvsetsc2=<error reading variable: Location address is not set.>, pgpuv=<error reading variable: Location address is not set.>, pgp3a=<error reading variable: Location address is not set.>, pgp3b=<error reading variable: Location address is not set.>, pgp2=<error reading variable: Location address is not set.>) at /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/source/ectrans/src/trans/gpu/internal/inv_trans_ctl_mod.F90:300
#22 0x0000155519f946ce in inv_trans (pspvor=..., pspdiv=..., pspscalar=..., pspsc3a=..., pspsc3b=..., pspsc2=..., fspgl_proc=0x0, ldscders=<error reading variable: Cannot access memory at address 0x0>, ldvorgp=<error reading variable: Cannot access memory at address 0x0>, lddivgp=<error reading variable: Cannot access memory at address 0x0>, lduvder=<error reading variable: Cannot access memory at address 0x0>, ldlatlon=<error reading variable: Cannot access memory at address 0x0>, kproma=25780, kvsetuv=<error reading variable: Location address is not set.>, kvsetsc=..., kresol=1, kvsetsc3a=<error reading variable: Location address is not set.>, kvsetsc3b=<error reading variable: Location address is not set.>, kvsetsc2=<error reading variable: Location address is not set.>, pgp=<error reading variable: value requires 103120 bytes, which is more than max-value-size>, pgpuv=<error reading variable: Location address is not set.>, pgp3a=<error reading variable: Location address is not set.>, pgp3b=<error reading variable: Location address is not set.>, pgp2=<error reading variable: Location address is not set.>) at /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/source/ectrans/src/trans/gpu/external/inv_trans.F90:648
#23 0x000015551967549f in speree_ () from /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/build.lumi-g/bin/../lib/libarpifs.SP.so
#24 0x0000155519552a09 in suorog_ () from /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/build.lumi-g/bin/../lib/libarpifs.SP.so
#25 0x00001555195b4e22 in suspec_ () from /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/build.lumi-g/bin/../lib/libarpifs.SP.so
#26 0x00001555194d1d88 in suinif_ () from /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/build.lumi-g/bin/../lib/libarpifs.SP.so
#27 0x000015551679bb31 in csta_ () from /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/build.lumi-g/bin/../lib/libarpifs.SP.so
#28 0x00001555166466c5 in cnt3_glo_ () from /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/build.lumi-g/bin/../lib/libarpifs.SP.so
#29 0x0000155516645cad in cnt3_ () from /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/build.lumi-g/bin/../lib/libarpifs.SP.so
#30 0x0000155516645790 in cnt2_ () from /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/build.lumi-g/bin/../lib/libarpifs.SP.so
#31 0x0000155516645099 in cnt1_ () from /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/build.lumi-g/bin/../lib/libarpifs.SP.so
#32 0x0000155516641b77 in cnt0_ () from /pfs/lustrep3/scratch/project_465000454/carlospena/ifs-bundle-december/build.lumi-g/bin/../lib/libarpifs.SP.so
#33 0x00000000004064cc in master_ ()
Kill the program being debugged? (y or n) [answered Y; input not from terminal]
[Inferior 1 (process 90088) killed]
```
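
For reference, the job geometry and the two disabled settings described above could look roughly like the sketch below. This is not the actual job script from the run directory. In particular, it is an assumption that `MPI_THREAD_MULTIPLE` was previously requested through the Cray MPICH variable `MPICH_MAX_THREAD_SAFETY=multiple`, and the launch line is left out because the real one is not shown here.

```shell
#!/bin/bash
# Hypothetical sketch of the failing configuration (not the real job script).
#SBATCH --nodes=64
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=6

# 6 OpenMP threads per MPI rank, matching --cpus-per-task above.
export OMP_NUM_THREADS=6

# The two settings this experiment disabled (commented out).
# Assumption: thread-multiple support was requested via MPICH_MAX_THREAD_SAFETY.
# export MPICH_ASYNC_PROGRESS=1
# export MPICH_MAX_THREAD_SAFETY=multiple

# Launch line intentionally omitted: the actual srun command is not given above.
```

Re-enabling the two commented exports gives the presumed baseline configuration to compare this failure against.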