```bash
[11:22]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ module load PrgEnv-cray craype-accel-amd-gfx90a craype-x86-trento rocm

Lmod is automatically replacing "craype-x86-rome" with "craype-x86-trento".

[11:22]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ ml

Currently Loaded Modules:
  1) libfabric/1.15.2.0        4) xpmem/2.8.2-1.0_5.1__g84a27a5.shasta      7) lumi-tools/24.05    (S)  10) craype/2.7.31.11     13) cray-libsci/24.03.0    16) craype-x86-trento
  2) craype-network-ofi        5) cce/17.0.1                                8) init-lumi/0.2       (S)  11) cray-dsmml/0.3.0     14) PrgEnv-cray/8.5.0      17) rocm/6.0.3
  3) perftools-base/24.03.0    6) ModuleLabel/label                    (S)  9) cray-python/3.11.7      12) cray-mpich/8.1.29    15) craype-accel-amd-gfx90a

  Where:
   S:  Module is Sticky, requires --force to unload or purge

[11:22]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}
[11:22]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ make clean; make
rm -f *~ hip.x
CC -O0 -g -ggdb -x hip -I. -Ihip -std=c++11 -DHIP main.cpp HIPStream.cpp -o hip.x
[11:22]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ sbatch job.slurm
sbatch: error: Batch job submission failed: No partition specified or system default partition
[11:23]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ vi job.slurm
[11:23]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ sbatch job.slurm
Submitted batch job 9879176
[11:23]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ ls
HIPStream.cpp  HIPStream.h  hip.x  job.slurm  main.cpp  Makefile  Readme.md  slurm-9779006.out  slurm-9785106.out  slurm-9785115.out  slurm-9879176.out  Stream.h
[11:24]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ less *176*
[11:24]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ salloc -N 1 -n 1 -p dev-g --gpus=8 -t 60:00 -A project_465001726 --exclusive
salloc: Pending job allocation 9879198
salloc: job 9879198 queued and waiting for resources
salloc: job 9879198 has been allocated resources
salloc: Granted job allocation 9879198
[11:24]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ module load gdb4hpc
[11:24]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ export HIP_ENABLE_DEFERRED_LOADING=0
[11:24]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ export CTI_SLURM_DAEMON_GRES="gpu:1"
[11:24]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ gdb4hpc
gdb4hpc 4.16.1 - Cray Interactive Parallel Debugger
With Cray Comparative Debugging Technology.
Copyright 2007-2023 Hewlett Packard Enterprise Development LP.
Copyright 1996-2016 University of Queensland. All Rights Reserved.
Type "help" for a list of commands.
Type "help <cmd>" for detailed help about a command.
dbg all> dbg all > launch $p{1} --gpu --launcher-args="--ntasks-per-node=1" ./hip.x
unknown command: dbg
dbg all> launch $p{1} --gpu --launcher-args="--ntasks-per-node=1" ./hip.x
Starting application, please wait...
Launched application...
1/1 ranks connected
Creating network... (timeout in 300 seconds)
Created network...
Connected to application...
Launch complete.
p{0}: Initial breakpoint, main at /pfs/lustrep1/users/wukai111/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu/main.cpp:85
dbg all> break HIPStream.cpp:118
p{0}: Debugger error: No compiled code for line 118 in file "HIPStream.cpp".
Make breakpoint pending on future code object load? (y or [n]) y
p{0}: Breakpoint 1: Pending at HIPStream.cpp:118.
dbg all> c
dbg all> <$p>: BabelStream
<$p>: Version: 4.0
<$p>: Implementation: HIP
<$p>: Running kernels 100 times
<$p>: Precision: double
<$p>: Array size: 268.4 MB (=0.3 GB)
<$p>: Total size: 805.3 MB (=0.8 GB)
<$p>: Error: no ROCm-capable device is detected
p{0}: The application has reached an exit breakpoint.
dbg all> ^Ddbg all> Shutting down debugger and killing application for 'p'.
[11:25]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ srun --interactive --pty bash
[11:25]wukai111@nid005002:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ rocm-smi


======================================= ROCm System Management Interface =======================================
================================================= Concise Info =================================================
Device  [Model : Revision]    Temp    Power  Partitions      SCLK    MCLK     Fan  Perf    PwrCap  VRAM%  GPU%
        Name (20 chars)       (Edge)  (Avg)  (Mem, Compute)
================================================================================================================
0       [0x0b0c : 0x00]       51.0°C  90.0W  N/A, N/A        800Mhz  1600Mhz  0%   manual  500.0W    0%   0%
        AMD INSTINCT MI200 (
1       [0x0b0c : 0x00]       49.0°C  N/A    N/A, N/A        800Mhz  1600Mhz  0%   manual  0.0W      0%   0%
        AMD INSTINCT MI200 (
2       [0x0b0c : 0x00]       43.0°C  89.0W  N/A, N/A        800Mhz  1600Mhz  0%   manual  500.0W    0%   0%
        AMD INSTINCT MI200 (
3       [0x0b0c : 0x00]       44.0°C  N/A    N/A, N/A        800Mhz  1600Mhz  0%   manual  0.0W      0%   0%
        AMD INSTINCT MI200 (
4       [0x0b0c : 0x00]       40.0°C  85.0W  N/A, N/A        800Mhz  1600Mhz  0%   manual  500.0W    0%   0%
        AMD INSTINCT MI200 (
5       [0x0b0c : 0x00]       48.0°C  N/A    N/A, N/A        800Mhz  1600Mhz  0%   manual  0.0W      0%   0%
        AMD INSTINCT MI200 (
6       [0x0b0c : 0x00]       40.0°C  88.0W  N/A, N/A        800Mhz  1600Mhz  0%   manual  500.0W    0%   0%
        AMD INSTINCT MI200 (
7       [0x0b0c : 0x00]       51.0°C  N/A    N/A, N/A        800Mhz  1600Mhz  0%   manual  0.0W      0%   0%
        AMD INSTINCT MI200 (
================================================================================================================
============================================= End of ROCm SMI Log ==============================================
[11:25]wukai111@nid005002:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ gdb4hpc
gdb4hpc 4.16.1 - Cray Interactive Parallel Debugger
With Cray Comparative Debugging Technology.
Copyright 2007-2023 Hewlett Packard Enterprise Development LP.
Copyright 1996-2016 University of Queensland. All Rights Reserved.
Type "help" for a list of commands.
Type "help <cmd>" for detailed help about a command.
dbg all> break HIPStream.cpp:118
No process, use launch/attach first.
dbg all> launch $p{1} --gpu --launcher-args="--ntasks-per-node=1" ./hip.x
Starting application, please wait...
Launched application...
1/1 ranks connected.
Created network...
Connected to application...
Launch complete.
p{0}: Initial breakpoint, main at /pfs/lustrep1/users/wukai111/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu/main.cpp:85
dbg all> break HIPStream.cpp:118
p{0}: Debugger error: No compiled code for line 118 in file "HIPStream.cpp".
Make breakpoint pending on future code object load? (y or [n]) y
p{0}: Breakpoint 1: Pending at HIPStream.cpp:118.
dbg all> c
dbg all> <$p>: BabelStream
<$p>: Version: 4.0
<$p>: Implementation: HIP
<$p>: Running kernels 100 times
<$p>: Precision: double
<$p>: Array size: 268.4 MB (=0.3 GB)
<$p>: Total size: 805.3 MB (=0.8 GB)
<$p>: Error: no ROCm-capable device is detected
p{0}: The application has reached an exit breakpoint.
dbg all> ^Ddbg all> Shutting down debugger and killing application for 'p'.
[11:26]wukai111@nid005002:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ exit
[11:31]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ rocgdb hip.x
GNU gdb (rocm-rel-6.0-131) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://github.com/ROCm-Developer-Tools/ROCgdb/issues>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from hip.x...
(gdb) r
Starting program: /pfs/lustrep1/users/wukai111/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu/hip.x
warning: amd-dbgapi: unable to enable GPU debugging due to a restriction error
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
BabelStream
Version: 4.0
Implementation: HIP
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Error: no ROCm-capable device is detected
[Inferior 1 (process 123338) exited with code 0144]
(gdb) quit
[11:31]wukai111@uan03:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ srun --interactive --pty bash
[11:31]wukai111@nid005002:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ rocgdb hip.x
GNU gdb (rocm-rel-6.0-131) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://github.com/ROCm-Developer-Tools/ROCgdb/issues>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from hip.x...
(gdb) r
Starting program: /pfs/lustrep1/users/wukai111/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu/hip.x
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x155543fff700 (LWP 92863)]
[New Thread 0x155443bff700 (LWP 92864)]
[Thread 0x155443bff700 (LWP 92864) exited]
BabelStream
Version: 4.0
Implementation: HIP
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using HIP device AMD Instinct MI250X
Driver: 60032831
Memory: DEFAULT
[New Thread 0x1554407ff700 (LWP 92879)]
[New Thread 0x155545bbf700 (LWP 92880)]
[Thread 0x1554407ff700 (LWP 92879) exited]

Thread 6 "hip.x" received signal SIGSEGV, Segmentation fault.
[Switching to thread 6, lane 0 (AMDGPU Lane 1:9:1:1/0 (0,0,0)[0,0,0])]
init_kernel<double> (a=0x0, b=0x154d02e00000, c=0x154cf2c00000, initA=0.10000000000000001, initB=0.20000000000000001, initC=0) at HIPStream.cpp:120
120         b[i] = initB;
(gdb) quit
A debugging session is active.

        Inferior 1 [process 92856] will be killed.

Quit anyway? (y or n) y
[11:31]wukai111@nid005002:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$ gdb4hpc
gdb4hpc 4.16.1 - Cray Interactive Parallel Debugger
With Cray Comparative Debugging Technology.
Copyright 2007-2023 Hewlett Packard Enterprise Development LP.
Copyright 1996-2016 University of Queensland. All Rights Reserved.
Type "help" for a list of commands.
Type "help <cmd>" for detailed help about a command.
dbg all> launch $p{1} --gpu --launcher-args="--ntasks-per-node=1" ./hip.x
Starting application, please wait...
Launched application...
1/1 ranks connected.
Created network...
Connected to application...
Launch complete.
p{0}: Initial breakpoint, main at /pfs/lustrep1/users/wukai111/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu/main.cpp:85
dbg all> r
Re-launch $all? Original will be released. (y or [n]) n
Cancelled.
dbg all> c
dbg all> <$p>: BabelStream
<$p>: Version: 4.0
<$p>: Implementation: HIP
<$p>: Running kernels 100 times
<$p>: Precision: double
<$p>: Array size: 268.4 MB (=0.3 GB)
<$p>: Total size: 805.3 MB (=0.8 GB)
<$p>: Error: no ROCm-capable device is detected
p{0}: The application has reached an exit breakpoint.
dbg all> ^Ddbg all> Shutting down debugger and killing application for 'p'.
[11:31]wukai111@nid005002:~/files-intensive/Exercises/HPE/day1/debugging/gdb4hpc_gpu$
```
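In this session the same `hip.x` fails under rocgdb on the login node `uan03` ("unable to enable GPU debugging due to a restriction error", then "no ROCm-capable device is detected"), but reaches the `SIGSEGV` in `init_kernel<double>` once rocgdb runs inside the `srun --interactive --pty bash` step on `nid005002`. Below is a minimal Bash sketch of a pre-flight check that encodes that observation. The function name `check_gpu_debug_env` is hypothetical, and it assumes that a Slurm job step sets `SLURM_STEP_ID` while the `salloc` shell left on the login node sets `SLURM_JOB_ID` but not `SLURM_STEP_ID`. It only covers the rocgdb path; it does not explain why the gdb4hpc launches above still reported no ROCm-capable device even from inside the step.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of the exercise: report whether the
# current shell looks like a place where rocgdb could debug GPU code, based on
# the session above. Assumption: a Slurm job step (e.g. the shell started by
# 'srun --interactive --pty bash') sets SLURM_STEP_ID, while the salloc shell
# on the login node sets SLURM_JOB_ID only.

check_gpu_debug_env() {
  local status=0

  if [ -z "${SLURM_JOB_ID:-}" ]; then
    echo "no Slurm allocation found: request one with salloc first" >&2
    status=1
  elif [ -z "${SLURM_STEP_ID:-}" ]; then
    # In the log, rocgdb on uan03 (inside salloc, outside a step) printed
    # "unable to enable GPU debugging due to a restriction error".
    echo "not inside a job step: start one with 'srun --interactive --pty bash'" >&2
    status=1
  fi

  if ! command -v rocgdb >/dev/null 2>&1; then
    echo "rocgdb is not on PATH: load the rocm module" >&2
    status=1
  fi

  # Informational only: the session above exported this variable as 0.
  if [ "${HIP_ENABLE_DEFERRED_LOADING:-}" != "0" ]; then
    echo "note: HIP_ENABLE_DEFERRED_LOADING is not set to 0" >&2
  fi

  return "$status"
}
```

Used as `check_gpu_debug_env && rocgdb hip.x`, it would refuse to start a debugger session in a shell that, by the assumption above, cannot see the GPUs. Keeping the HIP variable as a note rather than a failure reflects that the log does not show whether it matters for rocgdb.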