# HIPAC LAMMPS

##### [Molecular Simulation Application Task](https://drive.google.com/drive/folders/1exW0IS6OOjkjObR3anHCB_rDL516_R2q)

---

![image](https://hackmd.io/_uploads/rysmKbkdxl.png =70%x)

### build dependency

```bash
cmake ../cmake \
  -G Ninja \
  -D CMAKE_INSTALL_PREFIX=/opt/lammps/nvhpc \
  -D CMAKE_C_COMPILER=$(which nvc) \
  -D CMAKE_CXX_COMPILER=$(which nvc++) \
  -D CMAKE_Fortran_COMPILER=$(which nvfortran) \
  -D CMAKE_CUDA_HOST_COMPILER=$(which nvc++) \
  -D CUDA_HOST_COMPILER=$(which nvc++) \
  -D BUILD_MPI=ON \
  -D BUILD_OMP=ON \
  -D PKG_OPENMP=ON \
  -D PKG_GPU=ON \
  -D GPU_API=cuda \
  -D GPU_ARCH=sm_70 \
  -D PKG_KOKKOS=ON \
  -D Kokkos_ARCH_NATIVE=ON \
  -D Kokkos_ARCH_VOLTA70=ON \
  -D Kokkos_ENABLE_CUDA=ON \
  -D Kokkos_ENABLE_OPENMP=ON \
  -D FFT=FFTW3 \
  -D FFT_KOKKOS=CUFFT \
  -D FFT_SINGLE=YES \
  -D FFT_PACK=array \
  -D FFT_USE_HEFFTE=NO \
  -D PKG_OPT=ON \
  -D BUILD_SHARED_LIBS=ON \
  -D PKG_FEP=ON \
  -D PKG_TALLY=ON \
  -D PKG_REPLICA=ON \
  -D PKG_INTEL=ON \
  -D PKG_MOLECULE=ON \
  -D PKG_KSPACE=ON \
  -D PKG_GRANULAR=ON \
  -D PKG_RIGID=ON \
  -D PKG_CLASS2=ON \
  -D PKG_MANYBODY=ON
```

### lmp -h

```bash
Large-scale Atomic/Molecular Massively Parallel Simulator - 29 Aug 2024 - Update 4

Usage example: /opt/lammps/nvhpc/bin/lmp -var t 300 -echo screen -in in.alloy

List of command line options supported by this LAMMPS executable:

-echo none/screen/log/both  : echoing of input script (-e)
-help                       : print this help message (-h)
-in none/filename           : read input from file or stdin (default) (-i)
-kokkos on/off ...          : turn KOKKOS mode on or off (-k)
-log none/filename          : where to send log output (-l)
-mdi '<mdi flags>'          : pass flags to the MolSSI Driver Interface
-mpicolor color             : which exe in a multi-exe mpirun cmd (-m)
-cite                       : select citation reminder style (-c)
-nocite                     : disable citation reminder (-nc)
-nonbuf                     : disable screen/logfile buffering (-nb)
-package style ...          : invoke package command (-pk)
-partition size1 size2 ...
                            : assign partition sizes (-p)
-plog basename              : basename for partition logs (-pl)
-pscreen basename           : basename for partition screens (-ps)
-restart2data rfile dfile ... : convert restart to data file (-r2data)
-restart2dump rfile dgroup dstyle dfile ... : convert restart to dump file (-r2dump)
-restart2info rfile         : print info about restart rfile (-r2info)
-reorder topology-specs     : processor reordering (-r)
-screen none/filename       : where to send screen output (-sc)
-skiprun                    : skip loops in run and minimize (-sr)
-suffix gpu/intel/kk/opt/omp: style suffix to apply (-sf)
-var varname value          : set index style variable (-v)

OS: Linux "Ubuntu 22.04.5 LTS" 5.15.0-151-generic x86_64

Compiler: PGI C++ 25.7 with OpenMP 5.1
C++ standard: C++17
MPI v3.1: Open MPI v4.1.7rc1, package: Open MPI qa@sky4 Distribution, ident: 4.1.7rc1, repo rev: v4.1.5-176-g6d9519e4c3, Unreleased developer copy

Accelerator configuration:

GPU package API: CUDA
GPU package precision: mixed
KOKKOS package API: CUDA OpenMP
KOKKOS package precision: double
Kokkos library version: 4.3.1
OPENMP package API: OpenMP
OPENMP package precision: double
OpenMP standard: OpenMP 5.1
INTEL package API: OpenMP
INTEL package precision: single mixed double
INTEL package SIMD: not enabled

Compatible GPU present: yes

FFT information:

FFT precision = single
FFT engine = mpiFFT
FFT library = FFTW3 with threads
KOKKOS FFT engine = mpiFFT
KOKKOS FFT library = cuFFT

Active compile time flags:

-DLAMMPS_GZIP
-DLAMMPS_PNG
-DLAMMPS_JPEG
-DFFT_SINGLE
-DLAMMPS_SMALLBIG

sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint):   32-bit
sizeof(bigint):   64-bit

Available compression formats:

Extension: .gz     Command: gzip
Extension: .bz2    Command: bzip2
Extension: .zst    Command: zstd
Extension: .xz     Command: xz
Extension: .lzma   Command: xz
Extension: .lz4    Command: lz4

Installed packages:

CLASS2 FEP GPU GRANULAR INTEL KOKKOS KSPACE MANYBODY MOLECULE OPENMP OPT REPLICA RIGID TALLY

List of individual style options included in this LAMMPS
executable

* Atom styles:

angle angle/kk atomic atomic/kk body bond bond/kk charge charge/kk ellipsoid full full/kk hybrid hybrid/kk line molecular molecular/kk sphere sphere/kk template tri

* Integrate styles:

respa respa/omp verlet verlet/kk verlet/lrt/intel verlet/split

* Minimize styles:

cg cg/kk fire/old fire hftn quickmin sd

* Pair styles:

adp adp/kk adp/omp airebo airebo/intel airebo/morse airebo/morse/intel airebo/morse/omp airebo/omp atm bop born born/coul/long born/coul/long/gpu born/coul/long/omp born/coul/msm born/coul/msm/omp born/gpu born/omp buck buck/coul/cut buck/coul/cut/gpu buck/coul/cut/intel buck/coul/cut/kk buck/coul/cut/omp buck/coul/long buck/coul/long/gpu buck/coul/long/intel buck/coul/long/kk buck/coul/long/omp buck/coul/msm buck/coul/msm/omp buck/gpu buck/intel buck/kk buck/long/coul/long buck/long/coul/long/omp buck/omp comb comb3 comb/omp coul/cut coul/cut/gpu coul/cut/kk coul/cut/omp coul/cut/soft coul/cut/soft/omp coul/debye coul/debye/gpu coul/debye/kk coul/debye/omp coul/dsf coul/dsf/gpu coul/dsf/kk coul/dsf/omp coul/long coul/long/gpu coul/long/kk coul/long/omp coul/long/soft coul/long/soft/omp coul/msm coul/msm/omp coul/streitz coul/wolf coul/wolf/kk coul/wolf/omp meam/c reax reax/c mesont/tpm eam eam/alloy eam/alloy/gpu eam/alloy/intel eam/alloy/kk eam/alloy/omp eam/alloy/opt eam/cd eam/cd/old eam/fs eam/fs/gpu eam/fs/intel eam/fs/kk eam/fs/omp eam/fs/opt eam/gpu eam/he eam/intel eam/kk eam/omp eam/opt edip edip/multi edip/omp eim eim/omp extep gran/hertz/history gran/hertz/history/omp gran/hooke gran/hooke/history gran/hooke/history/kk gran/hooke/history/omp gran/hooke/omp granular gw gw/zbl hbond/dreiding/lj hbond/dreiding/lj/omp hbond/dreiding/morse hbond/dreiding/morse/omp hybrid hybrid/omp hybrid/kk hybrid/molecular hybrid/molecular/omp hybrid/overlay hybrid/overlay/omp hybrid/overlay/kk hybrid/scaled hybrid/scaled/omp lcbop lj/charmm/coul/charmm lj/charmm/coul/charmm/gpu lj/charmm/coul/charmm/implicit
lj/charmm/coul/charmm/implicit/kk lj/charmm/coul/charmm/implicit/omp lj/charmm/coul/charmm/intel lj/charmm/coul/charmm/kk lj/charmm/coul/charmm/omp lj/charmm/coul/long lj/charmm/coul/long/gpu lj/charmm/coul/long/intel lj/charmm/coul/long/kk lj/charmm/coul/long/omp lj/charmm/coul/long/opt lj/charmm/coul/long/soft lj/charmm/coul/long/soft/omp lj/charmm/coul/msm lj/charmm/coul/msm/omp lj/charmmfsw/coul/charmmfsh lj/charmmfsw/coul/long lj/charmmfsw/coul/long/kk lj/class2 lj/class2/coul/cut lj/class2/coul/cut/kk lj/class2/coul/cut/omp lj/class2/coul/cut/soft lj/class2/coul/long lj/class2/coul/long/gpu lj/class2/coul/long/kk lj/class2/coul/long/omp lj/class2/coul/long/soft lj/class2/gpu lj/class2/kk lj/class2/omp lj/class2/soft lj/cut lj/cut/coul/cut lj/cut/coul/cut/gpu lj/cut/coul/cut/kk lj/cut/coul/cut/omp lj/cut/coul/cut/soft lj/cut/coul/cut/soft/gpu lj/cut/coul/cut/soft/omp lj/cut/coul/long lj/cut/coul/long/gpu lj/cut/coul/long/intel lj/cut/coul/long/kk lj/cut/coul/long/omp lj/cut/coul/long/opt lj/cut/coul/long/soft lj/cut/coul/long/soft/gpu lj/cut/coul/long/soft/omp lj/cut/coul/msm lj/cut/coul/msm/gpu lj/cut/coul/msm/omp lj/cut/gpu lj/cut/intel lj/cut/kk lj/cut/omp lj/cut/opt lj/cut/soft lj/cut/soft/omp lj/cut/tip4p/cut lj/cut/tip4p/cut/omp lj/cut/tip4p/long lj/cut/tip4p/long/gpu lj/cut/tip4p/long/omp lj/cut/tip4p/long/opt lj/cut/tip4p/long/soft lj/cut/tip4p/long/soft/omp lj/expand lj/expand/gpu lj/expand/kk lj/expand/omp lj/long/coul/long lj/long/coul/long/intel lj/long/coul/long/omp lj/long/coul/long/opt lj/long/tip4p/long lj/long/tip4p/long/omp local/density meam/spline meam/spline/omp meam/sw/spline morse morse/gpu morse/kk morse/omp morse/opt morse/soft nb3b/harmonic nb3b/screened polymorphic rebo rebo/intel rebo/omp rebomos rebomos/omp soft soft/gpu soft/kk soft/omp sw sw/angle/table sw/gpu sw/intel sw/kk sw/mod sw/mod/omp sw/omp table table/gpu table/kk table/omp tersoff tersoff/gpu tersoff/kk tersoff/mod tersoff/mod/c tersoff/mod/c/omp tersoff/mod/gpu 
tersoff/mod/kk tersoff/mod/omp tersoff/omp tersoff/table tersoff/table/omp tersoff/zbl tersoff/zbl/gpu tersoff/zbl/kk tersoff/zbl/omp threebody/table tip4p/cut tip4p/cut/omp tip4p/long tip4p/long/omp tip4p/long/soft tip4p/long/soft/omp vashishta vashishta/gpu vashishta/kk vashishta/omp vashishta/table vashishta/table/omp yukawa yukawa/gpu yukawa/kk yukawa/omp zbl zbl/gpu zbl/kk zbl/omp zero

* Bond styles:

class2 class2/kk class2/omp fene fene/expand fene/expand/omp fene/intel fene/kk fene/omp gromos gromos/omp harmonic harmonic/intel harmonic/kk harmonic/omp hybrid hybrid/kk morse morse/omp quartic quartic/omp table table/omp zero

* Angle styles:

charmm charmm/intel charmm/kk charmm/omp class2 class2/kk class2/omp cosine cosine/kk cosine/omp cosine/squared cosine/squared/omp harmonic harmonic/intel harmonic/kk harmonic/omp hybrid hybrid/kk table table/omp zero

* Dihedral styles:

charmm charmm/intel charmm/kk charmm/omp charmmfsw charmmfsw/kk class2 class2/kk class2/omp harmonic harmonic/intel harmonic/kk harmonic/omp hybrid hybrid/kk multi/harmonic multi/harmonic/omp opls opls/intel opls/kk opls/omp table table/omp zero

* Improper styles:

class2 class2/kk class2/omp cvff cvff/intel cvff/omp harmonic harmonic/intel harmonic/kk harmonic/omp hybrid hybrid/kk umbrella umbrella/omp zero

* KSpace styles:

ewald ewald/dipole ewald/dipole/spin ewald/disp ewald/disp/dipole ewald/omp msm msm/cg msm/cg/omp msm/omp pppm pppm/cg pppm/cg/omp pppm/dipole pppm/dipole/spin pppm/disp pppm/disp/intel pppm/disp/omp pppm/disp/tip4p pppm/disp/tip4p/omp pppm/gpu pppm/intel pppm/kk pppm/omp pppm/stagger pppm/tip4p pppm/tip4p/omp

* Fix styles

adapt adapt/fep add/heat addforce alchemy ave/atom ave/chunk ave/correlate ave/grid ave/histo ave/histo/weight ave/time aveforce balance box/relax cmap damping/cundall deform deform/kk deposit ave/spatial ave/spatial/sphere lb/pc lb/rigid/pc/sphere reax/c/bonds reax/c/species dt/reset dt/reset/kk efield efield/kk ehex enforce2d enforce2d/kk evaporate
external freeze freeze/kk gravity gravity/kk gravity/omp grem halt heat heat/flow hyper/global hyper/local indent langevin langevin/kk lineforce momentum momentum/kk move neb nph nph/kk nph/omp nph/sphere nph/sphere/omp npt npt/gpu npt/intel npt/kk npt/omp npt/sphere npt/sphere/omp nve nve/gpu nve/intel nve/kk nve/limit nve/noforce nve/omp nve/sphere nve/sphere/kk nve/sphere/omp nvt nvt/gpu nvt/intel nvt/kk nvt/omp nvt/sllod nvt/sllod/intel nvt/sllod/kk nvt/sllod/omp nvt/sphere nvt/sphere/omp pair pimd/langevin pimd pimd/nvt planeforce pour press/berendsen press/langevin print property/atom property/atom/kk qeq/comb qeq/comb/omp rattle recenter restrain rigid rigid/nph rigid/nph/omp rigid/nph/small rigid/npt rigid/npt/omp rigid/npt/small rigid/nve rigid/nve/omp rigid/nve/small rigid/nvt rigid/nvt/omp rigid/nvt/small rigid/omp rigid/small rigid/small/omp setforce setforce/kk shake shake/kk spring spring/chunk spring/self spring/self/kk store/force store/state temp/berendsen temp/berendsen/kk temp/rescale temp/rescale/kk thermal/conductivity tune/kspace vector viscous viscous/kk wall/gran wall/gran/kk wall/gran/region wall/harmonic wall/lj1043 wall/lj126 wall/lj93 wall/lj93/kk wall/morse wall/reflect wall/reflect/kk wall/region wall/table

* Compute styles:

aggregate/atom angle angle/local angmom/chunk bond bond/local centro/atom centroid/stress/atom chunk/atom chunk/spread/atom cluster/atom cna/atom com com/chunk contact/atom coord/atom coord/atom/kk count/type mesont dihedral dihedral/local dipole dipole/chunk displace/atom erotate/rigid erotate/sphere erotate/sphere/atom erotate/sphere/kk event/displace fabric fep fep/ta force/tally fragment/atom global/atom group/group gyration gyration/chunk heat/flux heat/flux/tally heat/flux/virial/tally improper improper/local inertia/chunk ke ke/atom ke/rigid msd msd/chunk omega/chunk orientorder/atom orientorder/atom/kk pair pair/local pe pe/atom pe/mol/tally pe/tally pressure pressure/alchemy property/atom property/chunk
property/grid property/local rdf reduce reduce/chunk reduce/region rigid/local slice stress/atom stress/tally temp temp/chunk temp/com temp/deform temp/deform/kk temp/kk temp/partial temp/profile temp/ramp temp/region temp/sphere torque/chunk vacf vcm/chunk

* Region styles:

block block/kk cone cylinder ellipsoid intersect plane prism sphere union

* Dump styles:

atom cfg custom atom/mpiio cfg/mpiio custom/mpiio xyz/mpiio grid grid/vtk image local movie xyz

* Command styles

angle_write balance change_box create_atoms create_bonds create_box delete_atoms delete_bonds box kim_init kim_interactions kim_param kim_property kim_query reset_ids reset_atom_ids reset_mol_ids message server dihedral_write displace_atoms hyper info minimize neb prd read_data read_dump read_restart replicate rerun run set tad temper temper/grem temper/npt velocity write_coeff write_data write_dump write_restart
```

---

![image](https://hackmd.io/_uploads/ryzTF-k_xx.png =70%x)

### cpu

```bash
JobID: 22
Running on nodes: pca2
CUDA_VISIBLE_DEVICES =
MPI ranks per node: 32
CPUs per task: 1
LAMMPS (29 Aug 2024 - Update 4)
  using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 3.52 3.52 3.52
Created orthogonal box = (0 0 0) to (352 352 352)
  2 by 4 by 4 MPI processor grid
Created 4000000 atoms
  using lattice units in orthogonal box = (0 0 0) to (352 352 352)
  create_atoms CPU = 0.015 seconds
Displacing atoms ...
Reading eam potential file Ni_u3.eam with DATE: 2007-06-11
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6.8
  ghost atom cutoff = 6.8
  binsize = 3.4, bins = 104 104 104
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair eam/opt, perpetual
      attributes: half, newton on
      pair build: half/bin/atomonly/newton
      stencil: half/bin/3d
      bin: standard
Setting up cg style minimization ...
  Unit style    : metal
  Current step  : 0
Per MPI rank memory allocation (min/avg/max) = 76.53 | 76.54 | 76.55 Mbytes
      Step   PotEng      KinEng      TotEng      Press       Temp        Fmax
         0   -14046871   0           -14046871   414938.55   0           53.018371
        20   -17799955   0           -17799955   4.7971378   0           0.019115624
Loop time of 5.11289 on 32 procs for 20 steps with 4000000 atoms

98.8% CPU use with 32 MPI tasks x 1 OpenMP threads

Minimization stats:
  Stopping criterion = energy tolerance
  Energy initial, next-to-last, final =
     -14046870.8718438  -17799939.1763127  -17799955.2793016
  Force two-norm initial, final = 19245.351 12.649498
  Force max component initial, final = 53.018371 0.019115624
  Final line search alpha, max atom move = 1 0.019115624
  Iterations, force evaluations = 20 30

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 4.7057     | 4.7111     | 4.7175     |   0.2 | 92.14
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 0.062993   | 0.067096   | 0.072593   |   1.0 |  1.31
Output  | 0          | 0          | 0          |   0.0 |  0.00
Modify  | 0          | 0          | 0          |   0.0 |  0.00
Other   |            | 0.3347     |            |       |  6.55

Nlocal:    125000 ave 125126 max 124860 min
Histogram: 2 2 0 3 7 6 6 4 0 2
Nghost:    51286.6 ave 51462 max 51166 min
Histogram: 5 2 5 5 5 3 1 2 3 1
Neighs:    7.71396e+06 ave 7.73975e+06 max 7.68722e+06 min
Histogram: 2 4 3 2 6 4 1 3 4 3

Total # of neighbors = 2.4684661e+08
Ave neighs/atom = 61.711653
Neighbor list builds = 0
Dangerous builds = 0
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 20
  Time step     : 0.001
Per MPI rank memory allocation (min/avg/max) = 66.83 | 66.85 | 67.23 Mbytes
      Step   PotEng      KinEng      TotEng      Press       Temp        Fmax
        20   -17799955   155112.14   -17644843   3803.5136   300         0.019115624
       100   -17718767   76000.979   -17642766   12377.967   146.99233   2.2196469
       200   -17716422   78412.908   -17638009   12540.207   151.6572    2.307812
       300   -17716838   84327.052   -17632511   12524.134   163.09566   2.1070217
       400   -17716605   90228.378   -17626376   12707.739   174.50932   2.2037702
       500   -17710582   90944.405   -17619638   13531.36    175.89418   2.4309026
       600   -17705205   92817.646   -17612388   14269.344   179.51719   2.4872729
       700   -17703050   98386.026   -17604664   14675.862   190.2869    2.6395935
       800   -17698479   101974.06   -17596505   15336.635   197.22647   2.4025683
       900   -17693409   105437.17   -17587971   16049.076   203.92442   2.6248567
      1000   -17690372   111261.66   -17579111   16562.334   215.18947   2.4486198
      1020   -17688285   110978.84   -17577306   16843.746   214.64247   2.5920241
Loop time of 126.241 on 32 procs for 1000 steps with 4000000 atoms

Performance: 0.684 ns/day, 35.067 hours/ns, 7.921 timesteps/s, 31.685 Matom-step/s
99.9% CPU use with 32 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 109.12     | 111.23     | 113.06     |  12.3 | 88.11
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 1.9338     | 3.7554     | 6.0903     |  70.9 |  2.97
Output  | 0.017494   | 0.018573   | 0.021063   |   0.7 |  0.01
Modify  | 7.6297     | 8.4877     | 9.3527     |  17.5 |  6.72
Other   |            | 2.752      |            |       |  2.18

Nlocal:    125000 ave 125951 max 124522 min
Histogram: 3 8 4 7 5 1 3 0 0 1
Nghost:    48821 ave 49299 max 47870 min
Histogram: 1 0 0 3 1 5 7 4 8 3
Neighs:    8.375e+06 ave 8.46123e+06 max 8.3338e+06 min
Histogram: 5 4 8 5 4 4 1 0 0 1

Total # of neighbors = 2.68e+08
Ave neighs/atom = 67
Neighbor list builds = 0
Dangerous builds = 0
Setting up cg style minimization ...
  Unit style    : metal
  Current step  : 1020
Per MPI rank memory allocation (min/avg/max) = 79.2 | 79.22 | 79.6 Mbytes
      Step   PotEng      KinEng      TotEng      Press       Temp        Fmax
      1020   -17688285   110978.84   -17577306   16843.746   214.64247   2.5920241
      1100   -17800000   110978.84   -17689021   2717.8175   214.64247   0.00082848495
      1200   -17800000   110978.84   -17689021   2717.7672   214.64247   0.00010387412
      1300   -17800000   110978.84   -17689021   2717.766    214.64247   2.6133661e-05
      1400   -17800000   110978.84   -17689021   2717.7659   214.64247   4.8547794e-06
      1414   -17800000   110978.84   -17689021   2717.7659   214.64247   4.0074584e-06
Loop time of 124.177 on 32 procs for 394 steps with 4000000 atoms

99.9% CPU use with 32 MPI tasks x 1 OpenMP threads

Minimization stats:
  Stopping criterion = linesearch alpha is zero
  Energy initial, next-to-last, final =
     -17688285.1593561  -17800000.0142948  -17800000.0142948
  Force two-norm initial, final = 1603.6755 0.0025478634
  Force max component initial, final = 2.5920241 4.0074584e-06
  Final line search alpha, max atom move = 0.0009765625 3.9135336e-09
  Iterations, force evaluations = 394 805

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 115.08     | 115.21     | 115.37     |   1.0 | 92.78
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 1.4932     | 1.5893     | 1.6884     |   4.4 |  1.28
Output  | 0.010687   | 0.010743   | 0.011826   |   0.2 |  0.01
Modify  | 0          | 0          | 0          |   0.0 |  0.00
Other   |            | 7.368      |            |       |  5.93

Nlocal:    125000 ave 125638 max 124665 min
Histogram: 1 7 6 6 7 3 0 1 0 1
Nghost:    48821.3 ave 49156 max 48184 min
Histogram: 1 0 1 0 3 7 6 6 7 1
Neighs:    8.37005e+06 ave 8.429e+06 max 8.33744e+06 min
Histogram: 4 4 6 6 3 4 4 0 0 1

Total # of neighbors = 2.6784159e+08
Ave neighs/atom = 66.960397
Neighbor list builds = 0
Dangerous builds = 0
Total wall time: 0:04:16
```

### gpu

```bash
JobID: 9
Running on nodes: pca1
CUDA_VISIBLE_DEVICES = 0,1,2,3
MPI ranks per node: 4
CPUs per task: 1
LAMMPS (29 Aug 2024 - Update 4)
KOKKOS mode with Kokkos version
4.3.1 is enabled (src/KOKKOS/kokkos.cpp:72)
  will use up to 4 GPU(s) per node
WARNING: When using a single thread, the Kokkos Serial backend (i.e. Makefile.kokkos_mpi_only) gives better performance than the OpenMP backend (src/KOKKOS/kokkos.cpp:210)
  using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 3.52 3.52 3.52
Created orthogonal box = (0 0 0) to (352 352 352)
  1 by 2 by 2 MPI processor grid
Created 4000000 atoms
  using lattice units in orthogonal box = (0 0 0) to (352 352 352)
  create_atoms CPU = 0.141 seconds
Displacing atoms ...
Reading eam potential file Ni_u3.eam with DATE: 2007-06-11
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6.8
  ghost atom cutoff = 6.8
  binsize = 6.8, bins = 52 52 52
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair eam/kk, perpetual
      attributes: full, newton off, kokkos_device
      pair build: full/bin/kk/device
      stencil: full/bin/3d
      bin: kk/device
Setting up cg/kk style minimization ...
  Unit style    : metal
  Current step  : 0
WARNING: Fix with atom-based arrays not compatible with sending data in Kokkos communication, switching to classic exchange/border communication (src/KOKKOS/comm_kokkos.cpp:754)
WARNING: Fix with atom-based arrays not compatible with Kokkos sorting on device, switching to classic host sorting (src/KOKKOS/atom_kokkos.cpp:216)
Per MPI rank memory allocation (min/avg/max) = 296.5 | 296.5 | 296.5 Mbytes
      Step   PotEng      KinEng      TotEng      Press       Temp        Fmax
         0   -14046871   0           -14046871   414938.55   0           53.018371
        20   -17799955   0           -17799955   4.7971378   0           0.019115624
Loop time of 0.812471 on 4 procs for 20 steps with 4000000 atoms

97.7% CPU use with 4 MPI tasks x 1 OpenMP threads

Minimization stats:
  Stopping criterion = energy tolerance
  Energy initial, next-to-last, final =
     -14046870.8718438  -17799939.1763125  -17799955.2793014
  Force two-norm initial, final = 19245.351 12.649498
  Force max component initial, final = 53.018371 0.019115624
  Final line search alpha, max atom move = 1 0.019115624
  Iterations, force evaluations = 20 30

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.67892    | 0.68045    | 0.68271    |   0.2 | 83.75
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 0.047649   | 0.048272   | 0.048838   |   0.3 |  5.94
Output  | 0          | 0          | 0          |   0.0 |  0.00
Modify  | 0          | 0          | 0          |   0.0 |  0.00
Other   |            | 0.08375    |            |       | 10.31

Nlocal:    1e+06 ave 1.00012e+06 max 999893 min
Histogram: 1 1 0 0 0 0 0 1 0 1
Nghost:    193838 ave 193921 max 193727 min
Histogram: 1 0 0 0 0 0 2 0 0 1
Neighs:    0 ave 0 max 0 min
Histogram: 4 0 0 0 0 0 0 0 0 0
FullNghs:  1.23423e+08 ave 1.2344e+08 max 1.23407e+08 min
Histogram: 1 1 0 0 0 0 0 1 0 1

Total # of neighbors = 4.9369322e+08
Ave neighs/atom = 123.42331
Neighbor list builds = 0
Dangerous builds = 0
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 20
  Time step     : 0.001
Per MPI rank memory allocation (min/avg/max) = 218.1 | 218.1 | 218.1 Mbytes
      Step   PotEng      KinEng      TotEng      Press       Temp        Fmax
        20   -17799955   155112.14   -17644843   3803.5136   300         0.019115624
       100   -17718758   75992.128   -17642766   12378.067   146.97521   2.1693243
       200   -17716452   78443.126   -17638009   12537.735   151.71565   2.2885089
       300   -17716884   84373.072   -17632511   12518.596   163.18466   2.0765823
       400   -17716604   90227.696   -17626376   12707.592   174.508     2.1663835
       500   -17710535   90897.38    -17619638   13536.112   175.80323   2.3503506
       600   -17705234   92845.985   -17612388   14265.849   179.57199   2.5065403
       700   -17703057   98391.919   -17604665   14674.33    190.2983    2.4700827
       800   -17698498   101992.49   -17596506   15334.073   197.2621    2.4969924
       900   -17693463   105491.4    -17587971   16043.463   204.0293    2.5061929
      1000   -17690358   111247.26   -17579110   16563.735   215.16162   2.5405778
      1020   -17688222   110914.97   -17577307   16849.696   214.51894   2.4545932
Loop time of 10.8834 on 4 procs for 1000 steps with 4000000 atoms

Performance: 7.939 ns/day, 3.023 hours/ns, 91.883 timesteps/s, 367.532 Matom-step/s
99.0% CPU use with 4 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 3.8781     | 3.8828     | 3.8927     |   0.3 | 35.68
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 1.2987     | 1.3015     | 1.3041     |   0.2 | 11.96
Output  | 0.1336     | 0.15697    | 0.17478    |   3.8 |  1.44
Modify  | 4.9992     | 5.0068     | 5.012      |   0.2 | 46.00
Other   |            | 0.5354     |            |       |  4.92

Nlocal:    1e+06 ave 1.00073e+06 max 999640 min
Histogram: 2 0 1 0 0 0 0 0 0 1
Nghost:    184971 ave 185331 max 184237 min
Histogram: 1 0 0 0 0 0 0 1 0 2
Neighs:    0 ave 0 max 0 min
Histogram: 4 0 0 0 0 0 0 0 0 0
FullNghs:  1.34e+08 ave 1.34098e+08 max 1.33952e+08 min
Histogram: 2 0 1 0 0 0 0 0 0 1

Total # of neighbors = 5.36e+08
Ave neighs/atom = 134
Neighbor list builds = 0
Dangerous builds = 0
Setting up cg/kk style minimization ...
  Unit style    : metal
  Current step  : 1020
WARNING: Fix with atom-based arrays not compatible with Kokkos sorting on device, switching to classic host sorting (src/KOKKOS/atom_kokkos.cpp:216)
Per MPI rank memory allocation (min/avg/max) = 300.1 | 300.1 | 300.1 Mbytes
      Step   PotEng      KinEng      TotEng      Press       Temp        Fmax
      1020   -17688222   110914.97   -17577307   16849.696   214.51894   2.4545932
      1100   -17800000   110914.97   -17689085   2716.256    214.51894   0.00093680375
      1200   -17800000   110914.97   -17689085   2716.2032   214.51894   0.00011017142
      1300   -17800000   110914.97   -17689085   2716.2018   214.51894   2.5959422e-05
      1400   -17800000   110914.97   -17689085   2716.2017   214.51894   5.1501934e-06
      1450   -17800000   110914.97   -17689085   2716.2017   214.51894   2.3465206e-06
Loop time of 19.859 on 4 procs for 430 steps with 4000000 atoms

99.7% CPU use with 4 MPI tasks x 1 OpenMP threads

Minimization stats:
  Stopping criterion = energy tolerance
  Energy initial, next-to-last, final =
     -17688221.5193737  -17800000.0143065  -17800000.0143066
  Force two-norm initial, final = 1603.7099 0.0014109707
  Force max component initial, final = 2.4545932 2.3465206e-06
  Final line search alpha, max atom move = 1 2.3465206e-06
  Iterations, force evaluations = 430 858

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 17.99      | 18.006     | 18.024     |   0.3 | 90.67
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 1.109      | 1.1115     | 1.1143     |   0.2 |  5.60
Output  | 0.036226   | 0.043084   | 0.05159    |   2.7 |  0.22
Modify  | 0          | 0          | 0          |   0.0 |  0.00
Other   |            | 0.6984     |            |       |  3.52

Nlocal:    1e+06 ave 1.00052e+06 max 999583 min
Histogram: 1 1 0 0 0 0 1 0 0 1
Nghost:    184972 ave 185388 max 184456 min
Histogram: 1 0 0 1 0 0 0 0 1 1
Neighs:    0 ave 0 max 0 min
Histogram: 4 0 0 0 0 0 0 0 0 0
FullNghs:  1.33921e+08 ave 1.33991e+08 max 1.33865e+08 min
Histogram: 1 1 0 0 0 0 1 0 0 1

Total # of neighbors = 5.3568267e+08
Ave neighs/atom = 133.92067
Neighbor list builds = 0
Dangerous builds = 0
Total wall time:
0:00:39
```

---

![Screenshot 2025-08-05 123625](https://hackmd.io/_uploads/HkJ4o-kugx.png =70%x)
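
The two Verlet runs in the logs above can be compared directly from their `Loop time` lines. The following is a minimal sketch, not part of the original workflow: the helper name `parse_loop_times` and the two sample strings are illustrative assumptions, with the numbers copied from the cpu and gpu logs.

```python
import re

# Matches LAMMPS summary lines such as
# "Loop time of 126.241 on 32 procs for 1000 steps with 4000000 atoms"
LOOP_RE = re.compile(
    r"Loop time of (?P<secs>[\d.eE+-]+) on (?P<procs>\d+) procs "
    r"for (?P<steps>\d+) steps with (?P<atoms>\d+) atoms"
)


def parse_loop_times(log_text):
    """Return one dict per 'Loop time' line found in log_text (hypothetical helper)."""
    runs = []
    for m in LOOP_RE.finditer(log_text):
        secs = float(m["secs"])
        steps = int(m["steps"])
        atoms = int(m["atoms"])
        runs.append({
            "seconds": secs,
            "procs": int(m["procs"]),
            "steps": steps,
            "atoms": atoms,
            "steps_per_s": steps / secs,
            "matom_steps_per_s": atoms * steps / secs / 1e6,
        })
    return runs


# Sample lines copied from the Verlet (1000-step) section of each log.
CPU_LOG = "Loop time of 126.241 on 32 procs for 1000 steps with 4000000 atoms"
GPU_LOG = "Loop time of 10.8834 on 4 procs for 1000 steps with 4000000 atoms"

cpu = parse_loop_times(CPU_LOG)[0]
gpu = parse_loop_times(GPU_LOG)[0]
speedup = cpu["seconds"] / gpu["seconds"]

print(f"CPU: {cpu['steps_per_s']:.3f} timesteps/s on {cpu['procs']} ranks")
print(f"GPU: {gpu['steps_per_s']:.3f} timesteps/s on {gpu['procs']} ranks")
print(f"Verlet speedup: {speedup:.1f}x")
```

With these inputs the script reproduces the `timesteps/s` figures LAMMPS reports (7.921 and 91.883) and gives a Verlet-phase speedup of about 11.6x for 4 GPU ranks over 32 CPU ranks. The total wall times (0:04:16 versus 0:00:39) give a smaller ratio of roughly 6.6x; note that the second minimization ran a different number of iterations in each job (394 on CPU, 430 on GPU), so the whole-job times are not a like-for-like comparison.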