# HIPAC LAMMPS
##### [Molecular simulation application problem](https://drive.google.com/drive/folders/1exW0IS6OOjkjObR3anHCB_rDL516_R2q)
---

### build configuration
```bash
cmake ../cmake \
-G Ninja \
-D CMAKE_INSTALL_PREFIX=/opt/lammps/nvhpc \
-D CMAKE_C_COMPILER=$(which nvc) \
-D CMAKE_CXX_COMPILER=$(which nvc++) \
-D CMAKE_Fortran_COMPILER=$(which nvfortran) \
-D CMAKE_CUDA_HOST_COMPILER=$(which nvc++) \
-D CUDA_HOST_COMPILER=$(which nvc++) \
-D BUILD_MPI=ON \
-D BUILD_OMP=ON \
-D PKG_OPENMP=ON \
-D PKG_GPU=ON \
-D GPU_API=cuda \
-D GPU_ARCH=sm_70 \
-D PKG_KOKKOS=ON \
-D Kokkos_ARCH_NATIVE=ON \
-D Kokkos_ARCH_VOLTA70=ON \
-D Kokkos_ENABLE_CUDA=ON \
-D Kokkos_ENABLE_OPENMP=ON \
-D FFT=FFTW3 \
-D FFT_KOKKOS=CUFFT \
-D FFT_SINGLE=YES \
-D FFT_PACK=array \
-D FFT_USE_HEFFTE=NO \
-D PKG_OPT=ON \
-D BUILD_SHARED_LIBS=ON \
-D PKG_FEP=ON \
-D PKG_TALLY=ON \
-D PKG_REPLICA=ON \
-D PKG_INTEL=ON \
-D PKG_MOLECULE=ON \
-D PKG_KSPACE=ON \
-D PKG_GRANULAR=ON \
-D PKG_RIGID=ON \
-D PKG_CLASS2=ON \
-D PKG_MANYBODY=ON
```
### lmp -h
```text
Large-scale Atomic/Molecular Massively Parallel Simulator - 29 Aug 2024 - Update 4
Usage example: /opt/lammps/nvhpc/bin/lmp -var t 300 -echo screen -in in.alloy
List of command line options supported by this LAMMPS executable:
-echo none/screen/log/both : echoing of input script (-e)
-help : print this help message (-h)
-in none/filename : read input from file or stdin (default) (-i)
-kokkos on/off ... : turn KOKKOS mode on or off (-k)
-log none/filename : where to send log output (-l)
-mdi '<mdi flags>' : pass flags to the MolSSI Driver Interface
-mpicolor color : which exe in a multi-exe mpirun cmd (-m)
-cite : select citation reminder style (-c)
-nocite : disable citation reminder (-nc)
-nonbuf : disable screen/logfile buffering (-nb)
-package style ... : invoke package command (-pk)
-partition size1 size2 ... : assign partition sizes (-p)
-plog basename : basename for partition logs (-pl)
-pscreen basename : basename for partition screens (-ps)
-restart2data rfile dfile ... : convert restart to data file (-r2data)
-restart2dump rfile dgroup dstyle dfile ...
: convert restart to dump file (-r2dump)
-restart2info rfile : print info about restart rfile (-r2info)
-reorder topology-specs : processor reordering (-r)
-screen none/filename : where to send screen output (-sc)
-skiprun : skip loops in run and minimize (-sr)
-suffix gpu/intel/kk/opt/omp: style suffix to apply (-sf)
-var varname value : set index style variable (-v)
OS: Linux "Ubuntu 22.04.5 LTS" 5.15.0-151-generic x86_64
Compiler: PGI C++ 25.7 with OpenMP 5.1
C++ standard: C++17
MPI v3.1: Open MPI v4.1.7rc1, package: Open MPI qa@sky4 Distribution, ident: 4.1.7rc1, repo rev: v4.1.5-176-g6d9519e4c3, Unreleased developer copy
Accelerator configuration:
GPU package API: CUDA
GPU package precision: mixed
KOKKOS package API: CUDA OpenMP
KOKKOS package precision: double
Kokkos library version: 4.3.1
OPENMP package API: OpenMP
OPENMP package precision: double
OpenMP standard: OpenMP 5.1
INTEL package API: OpenMP
INTEL package precision: single mixed double
INTEL package SIMD: not enabled
Compatible GPU present: yes
FFT information:
FFT precision = single
FFT engine = mpiFFT
FFT library = FFTW3 with threads
KOKKOS FFT engine = mpiFFT
KOKKOS FFT library = cuFFT
Active compile time flags:
-DLAMMPS_GZIP
-DLAMMPS_PNG
-DLAMMPS_JPEG
-DFFT_SINGLE
-DLAMMPS_SMALLBIG
sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint): 32-bit
sizeof(bigint): 64-bit
Available compression formats:
Extension: .gz Command: gzip
Extension: .bz2 Command: bzip2
Extension: .zst Command: zstd
Extension: .xz Command: xz
Extension: .lzma Command: xz
Extension: .lz4 Command: lz4
Installed packages:
CLASS2 FEP GPU GRANULAR INTEL KOKKOS KSPACE MANYBODY MOLECULE OPENMP OPT
REPLICA RIGID TALLY
List of individual style options included in this LAMMPS executable
* Atom styles:
angle angle/kk atomic atomic/kk body
bond bond/kk charge charge/kk ellipsoid
full full/kk hybrid hybrid/kk line
molecular molecular/kk sphere sphere/kk template
tri
* Integrate styles:
respa respa/omp verlet verlet/kk verlet/lrt/intel
verlet/split
* Minimize styles:
cg cg/kk fire/old fire hftn
quickmin sd
* Pair styles:
adp adp/kk adp/omp airebo airebo/intel
airebo/morse airebo/morse/intel airebo/morse/omp
airebo/omp atm bop born born/coul/long
born/coul/long/gpu born/coul/long/omp born/coul/msm
born/coul/msm/omp born/gpu born/omp buck
buck/coul/cut buck/coul/cut/gpu buck/coul/cut/intel
buck/coul/cut/kk buck/coul/cut/omp buck/coul/long
buck/coul/long/gpu buck/coul/long/intel
buck/coul/long/kk buck/coul/long/omp buck/coul/msm
buck/coul/msm/omp buck/gpu buck/intel buck/kk
buck/long/coul/long buck/long/coul/long/omp buck/omp
comb comb3 comb/omp coul/cut coul/cut/gpu
coul/cut/kk coul/cut/omp coul/cut/soft coul/cut/soft/omp
coul/debye coul/debye/gpu coul/debye/kk coul/debye/omp coul/dsf
coul/dsf/gpu coul/dsf/kk coul/dsf/omp coul/long coul/long/gpu
coul/long/kk coul/long/omp coul/long/soft coul/long/soft/omp
coul/msm coul/msm/omp coul/streitz coul/wolf coul/wolf/kk
coul/wolf/omp meam/c reax reax/c mesont/tpm
eam eam/alloy eam/alloy/gpu eam/alloy/intel eam/alloy/kk
eam/alloy/omp eam/alloy/opt eam/cd eam/cd/old eam/fs
eam/fs/gpu eam/fs/intel eam/fs/kk eam/fs/omp eam/fs/opt
eam/gpu eam/he eam/intel eam/kk eam/omp
eam/opt edip edip/multi edip/omp eim
eim/omp extep gran/hertz/history
gran/hertz/history/omp gran/hooke gran/hooke/history
gran/hooke/history/kk gran/hooke/history/omp gran/hooke/omp
granular gw gw/zbl hbond/dreiding/lj
hbond/dreiding/lj/omp hbond/dreiding/morse
hbond/dreiding/morse/omp hybrid hybrid/omp hybrid/kk
hybrid/molecular hybrid/molecular/omp hybrid/overlay
hybrid/overlay/omp hybrid/overlay/kk hybrid/scaled
hybrid/scaled/omp lcbop lj/charmm/coul/charmm
lj/charmm/coul/charmm/gpu lj/charmm/coul/charmm/implicit
lj/charmm/coul/charmm/implicit/kk
lj/charmm/coul/charmm/implicit/omp lj/charmm/coul/charmm/intel
lj/charmm/coul/charmm/kk lj/charmm/coul/charmm/omp
lj/charmm/coul/long lj/charmm/coul/long/gpu
lj/charmm/coul/long/intel lj/charmm/coul/long/kk
lj/charmm/coul/long/omp lj/charmm/coul/long/opt
lj/charmm/coul/long/soft lj/charmm/coul/long/soft/omp
lj/charmm/coul/msm lj/charmm/coul/msm/omp
lj/charmmfsw/coul/charmmfsh lj/charmmfsw/coul/long
lj/charmmfsw/coul/long/kk lj/class2 lj/class2/coul/cut
lj/class2/coul/cut/kk lj/class2/coul/cut/omp
lj/class2/coul/cut/soft lj/class2/coul/long
lj/class2/coul/long/gpu lj/class2/coul/long/kk
lj/class2/coul/long/omp lj/class2/coul/long/soft lj/class2/gpu
lj/class2/kk lj/class2/omp lj/class2/soft lj/cut lj/cut/coul/cut
lj/cut/coul/cut/gpu lj/cut/coul/cut/kk
lj/cut/coul/cut/omp lj/cut/coul/cut/soft
lj/cut/coul/cut/soft/gpu lj/cut/coul/cut/soft/omp lj/cut/coul/long
lj/cut/coul/long/gpu lj/cut/coul/long/intel
lj/cut/coul/long/kk lj/cut/coul/long/omp
lj/cut/coul/long/opt lj/cut/coul/long/soft
lj/cut/coul/long/soft/gpu lj/cut/coul/long/soft/omp lj/cut/coul/msm
lj/cut/coul/msm/gpu lj/cut/coul/msm/omp lj/cut/gpu
lj/cut/intel lj/cut/kk lj/cut/omp lj/cut/opt lj/cut/soft
lj/cut/soft/omp lj/cut/tip4p/cut lj/cut/tip4p/cut/omp
lj/cut/tip4p/long lj/cut/tip4p/long/gpu
lj/cut/tip4p/long/omp lj/cut/tip4p/long/opt
lj/cut/tip4p/long/soft lj/cut/tip4p/long/soft/omp lj/expand
lj/expand/gpu lj/expand/kk lj/expand/omp lj/long/coul/long
lj/long/coul/long/intel lj/long/coul/long/omp
lj/long/coul/long/opt lj/long/tip4p/long
lj/long/tip4p/long/omp local/density meam/spline meam/spline/omp
meam/sw/spline morse morse/gpu morse/kk morse/omp
morse/opt morse/soft nb3b/harmonic nb3b/screened polymorphic
rebo rebo/intel rebo/omp rebomos rebomos/omp
soft soft/gpu soft/kk soft/omp sw
sw/angle/table sw/gpu sw/intel sw/kk sw/mod
sw/mod/omp sw/omp table table/gpu table/kk
table/omp tersoff tersoff/gpu tersoff/kk tersoff/mod
tersoff/mod/c tersoff/mod/c/omp tersoff/mod/gpu tersoff/mod/kk
tersoff/mod/omp tersoff/omp tersoff/table tersoff/table/omp
tersoff/zbl tersoff/zbl/gpu tersoff/zbl/kk tersoff/zbl/omp threebody/table
tip4p/cut tip4p/cut/omp tip4p/long tip4p/long/omp tip4p/long/soft
tip4p/long/soft/omp vashishta vashishta/gpu vashishta/kk
vashishta/omp vashishta/table vashishta/table/omp yukawa
yukawa/gpu yukawa/kk yukawa/omp zbl zbl/gpu
zbl/kk zbl/omp zero
* Bond styles:
class2 class2/kk class2/omp fene fene/expand
fene/expand/omp fene/intel fene/kk fene/omp gromos
gromos/omp harmonic harmonic/intel harmonic/kk harmonic/omp
hybrid hybrid/kk morse morse/omp quartic
quartic/omp table table/omp zero
* Angle styles:
charmm charmm/intel charmm/kk charmm/omp class2
class2/kk class2/omp cosine cosine/kk cosine/omp
cosine/squared cosine/squared/omp harmonic harmonic/intel
harmonic/kk harmonic/omp hybrid hybrid/kk table
table/omp zero
* Dihedral styles:
charmm charmm/intel charmm/kk charmm/omp charmmfsw
charmmfsw/kk class2 class2/kk class2/omp harmonic
harmonic/intel harmonic/kk harmonic/omp hybrid hybrid/kk
multi/harmonic multi/harmonic/omp opls opls/intel
opls/kk opls/omp table table/omp zero
* Improper styles:
class2 class2/kk class2/omp cvff cvff/intel
cvff/omp harmonic harmonic/intel harmonic/kk harmonic/omp
hybrid hybrid/kk umbrella umbrella/omp zero
* KSpace styles:
ewald ewald/dipole ewald/dipole/spin ewald/disp
ewald/disp/dipole ewald/omp msm msm/cg
msm/cg/omp msm/omp pppm pppm/cg pppm/cg/omp
pppm/dipole pppm/dipole/spin pppm/disp pppm/disp/intel
pppm/disp/omp pppm/disp/tip4p pppm/disp/tip4p/omp pppm/gpu
pppm/intel pppm/kk pppm/omp pppm/stagger pppm/tip4p
pppm/tip4p/omp
* Fix styles
adapt adapt/fep add/heat addforce alchemy
ave/atom ave/chunk ave/correlate ave/grid ave/histo
ave/histo/weight ave/time aveforce balance
box/relax cmap damping/cundall deform deform/kk
deposit ave/spatial ave/spatial/sphere lb/pc
lb/rigid/pc/sphere reax/c/bonds reax/c/species dt/reset
dt/reset/kk efield efield/kk ehex enforce2d
enforce2d/kk evaporate external freeze freeze/kk
gravity gravity/kk gravity/omp grem halt
heat heat/flow hyper/global hyper/local indent
langevin langevin/kk lineforce momentum momentum/kk
move neb nph nph/kk nph/omp
nph/sphere nph/sphere/omp npt npt/gpu npt/intel
npt/kk npt/omp npt/sphere npt/sphere/omp nve
nve/gpu nve/intel nve/kk nve/limit nve/noforce
nve/omp nve/sphere nve/sphere/kk nve/sphere/omp nvt
nvt/gpu nvt/intel nvt/kk nvt/omp nvt/sllod
nvt/sllod/intel nvt/sllod/kk nvt/sllod/omp nvt/sphere nvt/sphere/omp
pair pimd/langevin pimd pimd/nvt planeforce
pour press/berendsen press/langevin print property/atom
property/atom/kk qeq/comb qeq/comb/omp rattle
recenter restrain rigid rigid/nph rigid/nph/omp
rigid/nph/small rigid/npt rigid/npt/omp rigid/npt/small rigid/nve
rigid/nve/omp rigid/nve/small rigid/nvt rigid/nvt/omp rigid/nvt/small
rigid/omp rigid/small rigid/small/omp setforce setforce/kk
shake shake/kk spring spring/chunk spring/self
spring/self/kk store/force store/state temp/berendsen
temp/berendsen/kk temp/rescale temp/rescale/kk
thermal/conductivity tune/kspace vector viscous
viscous/kk wall/gran wall/gran/kk wall/gran/region
wall/harmonic wall/lj1043 wall/lj126 wall/lj93 wall/lj93/kk
wall/morse wall/reflect wall/reflect/kk wall/region wall/table
* Compute styles:
aggregate/atom angle angle/local angmom/chunk bond
bond/local centro/atom centroid/stress/atom chunk/atom
chunk/spread/atom cluster/atom cna/atom com
com/chunk contact/atom coord/atom coord/atom/kk count/type
mesont dihedral dihedral/local dipole dipole/chunk
displace/atom erotate/rigid erotate/sphere erotate/sphere/atom
erotate/sphere/kk event/displace fabric fep
fep/ta force/tally fragment/atom global/atom group/group
gyration gyration/chunk heat/flux heat/flux/tally
heat/flux/virial/tally improper improper/local inertia/chunk
ke ke/atom ke/rigid msd msd/chunk
omega/chunk orientorder/atom orientorder/atom/kk
pair pair/local pe pe/atom pe/mol/tally
pe/tally pressure pressure/alchemy property/atom
property/chunk property/grid property/local rdf reduce
reduce/chunk reduce/region rigid/local slice stress/atom
stress/tally temp temp/chunk temp/com temp/deform
temp/deform/kk temp/kk temp/partial temp/profile temp/ramp
temp/region temp/sphere torque/chunk vacf vcm/chunk
* Region styles:
block block/kk cone cylinder ellipsoid
intersect plane prism sphere union
* Dump styles:
atom cfg custom atom/mpiio cfg/mpiio
custom/mpiio xyz/mpiio grid grid/vtk image
local movie xyz
* Command styles
angle_write balance change_box create_atoms create_bonds
create_box delete_atoms delete_bonds box kim_init
kim_interactions kim_param kim_property kim_query
reset_ids reset_atom_ids reset_mol_ids message server
dihedral_write displace_atoms hyper info minimize
neb prd read_data read_dump read_restart
replicate rerun run set tad
temper temper/grem temper/npt velocity write_coeff
write_data write_dump write_restart
```
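The `-kokkos` and `-suffix` switches listed in the help output are what select the accelerated styles (`eam/opt`, `eam/kk`) that appear in the CPU and GPU logs below. These notes do not record the exact launch commands, so the following is only a sketch of how they could be assembled. The input file name `in.eam`, the `mpirun -np` launcher, and the helper `build_launch_command` are assumptions for illustration. The binary path comes from the usage line in the help output.

```python
import shlex

# Path taken from the "Usage example" line of `lmp -h`
LMP = "/opt/lammps/nvhpc/bin/lmp"


def build_launch_command(ranks, input_file, suffix, gpus_per_node=0):
    """Return an mpirun command line for a CPU or KOKKOS GPU run (hypothetical helper)."""
    cmd = ["mpirun", "-np", str(ranks), LMP]
    if gpus_per_node > 0:
        # "-k on g N" turns KOKKOS mode on and uses N GPUs per node
        cmd += ["-k", "on", "g", str(gpus_per_node)]
    # "-sf <suffix>" applies the style suffix (opt, kk, ...) to every style that has one
    cmd += ["-sf", suffix, "-in", input_file]
    return shlex.join(cmd)


# in.eam is a placeholder name; the real input script is not part of these notes
cpu_cmd = build_launch_command(32, "in.eam", "opt")
gpu_cmd = build_launch_command(4, "in.eam", "kk", gpus_per_node=4)
print(cpu_cmd)
print(gpu_cmd)
```

The rank counts (32 and 4) and the GPU count (4) mirror the `MPI ranks per node` and `will use up to 4 GPU(s) per node` lines in the logs.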
---

### cpu
```text
JobID: 22
Running on nodes: pca2
CUDA_VISIBLE_DEVICES =
MPI ranks per node: 32
CPUs per task: 1
LAMMPS (29 Aug 2024 - Update 4)
using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 3.52 3.52 3.52
Created orthogonal box = (0 0 0) to (352 352 352)
2 by 4 by 4 MPI processor grid
Created 4000000 atoms
using lattice units in orthogonal box = (0 0 0) to (352 352 352)
create_atoms CPU = 0.015 seconds
Displacing atoms ...
Reading eam potential file Ni_u3.eam with DATE: 2007-06-11
Neighbor list info ...
update: every = 1 steps, delay = 0 steps, check = yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 6.8
ghost atom cutoff = 6.8
binsize = 3.4, bins = 104 104 104
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair eam/opt, perpetual
attributes: half, newton on
pair build: half/bin/atomonly/newton
stencil: half/bin/3d
bin: standard
Setting up cg style minimization ...
Unit style : metal
Current step : 0
Per MPI rank memory allocation (min/avg/max) = 76.53 | 76.54 | 76.55 Mbytes
Step PotEng KinEng TotEng Press Temp Fmax
0 -14046871 0 -14046871 414938.55 0 53.018371
20 -17799955 0 -17799955 4.7971378 0 0.019115624
Loop time of 5.11289 on 32 procs for 20 steps with 4000000 atoms
98.8% CPU use with 32 MPI tasks x 1 OpenMP threads
Minimization stats:
Stopping criterion = energy tolerance
Energy initial, next-to-last, final =
-14046870.8718438 -17799939.1763127 -17799955.2793016
Force two-norm initial, final = 19245.351 12.649498
Force max component initial, final = 53.018371 0.019115624
Final line search alpha, max atom move = 1 0.019115624
Iterations, force evaluations = 20 30
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 4.7057 | 4.7111 | 4.7175 | 0.2 | 92.14
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 0.062993 | 0.067096 | 0.072593 | 1.0 | 1.31
Output | 0 | 0 | 0 | 0.0 | 0.00
Modify | 0 | 0 | 0 | 0.0 | 0.00
Other | | 0.3347 | | | 6.55
Nlocal: 125000 ave 125126 max 124860 min
Histogram: 2 2 0 3 7 6 6 4 0 2
Nghost: 51286.6 ave 51462 max 51166 min
Histogram: 5 2 5 5 5 3 1 2 3 1
Neighs: 7.71396e+06 ave 7.73975e+06 max 7.68722e+06 min
Histogram: 2 4 3 2 6 4 1 3 4 3
Total # of neighbors = 2.4684661e+08
Ave neighs/atom = 61.711653
Neighbor list builds = 0
Dangerous builds = 0
Setting up Verlet run ...
Unit style : metal
Current step : 20
Time step : 0.001
Per MPI rank memory allocation (min/avg/max) = 66.83 | 66.85 | 67.23 Mbytes
Step PotEng KinEng TotEng Press Temp Fmax
20 -17799955 155112.14 -17644843 3803.5136 300 0.019115624
100 -17718767 76000.979 -17642766 12377.967 146.99233 2.2196469
200 -17716422 78412.908 -17638009 12540.207 151.6572 2.307812
300 -17716838 84327.052 -17632511 12524.134 163.09566 2.1070217
400 -17716605 90228.378 -17626376 12707.739 174.50932 2.2037702
500 -17710582 90944.405 -17619638 13531.36 175.89418 2.4309026
600 -17705205 92817.646 -17612388 14269.344 179.51719 2.4872729
700 -17703050 98386.026 -17604664 14675.862 190.2869 2.6395935
800 -17698479 101974.06 -17596505 15336.635 197.22647 2.4025683
900 -17693409 105437.17 -17587971 16049.076 203.92442 2.6248567
1000 -17690372 111261.66 -17579111 16562.334 215.18947 2.4486198
1020 -17688285 110978.84 -17577306 16843.746 214.64247 2.5920241
Loop time of 126.241 on 32 procs for 1000 steps with 4000000 atoms
Performance: 0.684 ns/day, 35.067 hours/ns, 7.921 timesteps/s, 31.685 Matom-step/s
99.9% CPU use with 32 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 109.12 | 111.23 | 113.06 | 12.3 | 88.11
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 1.9338 | 3.7554 | 6.0903 | 70.9 | 2.97
Output | 0.017494 | 0.018573 | 0.021063 | 0.7 | 0.01
Modify | 7.6297 | 8.4877 | 9.3527 | 17.5 | 6.72
Other | | 2.752 | | | 2.18
Nlocal: 125000 ave 125951 max 124522 min
Histogram: 3 8 4 7 5 1 3 0 0 1
Nghost: 48821 ave 49299 max 47870 min
Histogram: 1 0 0 3 1 5 7 4 8 3
Neighs: 8.375e+06 ave 8.46123e+06 max 8.3338e+06 min
Histogram: 5 4 8 5 4 4 1 0 0 1
Total # of neighbors = 2.68e+08
Ave neighs/atom = 67
Neighbor list builds = 0
Dangerous builds = 0
Setting up cg style minimization ...
Unit style : metal
Current step : 1020
Per MPI rank memory allocation (min/avg/max) = 79.2 | 79.22 | 79.6 Mbytes
Step PotEng KinEng TotEng Press Temp Fmax
1020 -17688285 110978.84 -17577306 16843.746 214.64247 2.5920241
1100 -17800000 110978.84 -17689021 2717.8175 214.64247 0.00082848495
1200 -17800000 110978.84 -17689021 2717.7672 214.64247 0.00010387412
1300 -17800000 110978.84 -17689021 2717.766 214.64247 2.6133661e-05
1400 -17800000 110978.84 -17689021 2717.7659 214.64247 4.8547794e-06
1414 -17800000 110978.84 -17689021 2717.7659 214.64247 4.0074584e-06
Loop time of 124.177 on 32 procs for 394 steps with 4000000 atoms
99.9% CPU use with 32 MPI tasks x 1 OpenMP threads
Minimization stats:
Stopping criterion = linesearch alpha is zero
Energy initial, next-to-last, final =
-17688285.1593561 -17800000.0142948 -17800000.0142948
Force two-norm initial, final = 1603.6755 0.0025478634
Force max component initial, final = 2.5920241 4.0074584e-06
Final line search alpha, max atom move = 0.0009765625 3.9135336e-09
Iterations, force evaluations = 394 805
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 115.08 | 115.21 | 115.37 | 1.0 | 92.78
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 1.4932 | 1.5893 | 1.6884 | 4.4 | 1.28
Output | 0.010687 | 0.010743 | 0.011826 | 0.2 | 0.01
Modify | 0 | 0 | 0 | 0.0 | 0.00
Other | | 7.368 | | | 5.93
Nlocal: 125000 ave 125638 max 124665 min
Histogram: 1 7 6 6 7 3 0 1 0 1
Nghost: 48821.3 ave 49156 max 48184 min
Histogram: 1 0 1 0 3 7 6 6 7 1
Neighs: 8.37005e+06 ave 8.429e+06 max 8.33744e+06 min
Histogram: 4 4 6 6 3 4 4 0 0 1
Total # of neighbors = 2.6784159e+08
Ave neighs/atom = 66.960397
Neighbor list builds = 0
Dangerous builds = 0
Total wall time: 0:04:16
```
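The `Performance:` line in the CPU log above can be reproduced from the loop time alone, which is a quick way to sanity-check a run. The sketch below assumes only what the log states: 1000 steps in 126.241 s, 4,000,000 atoms, and a time step of 0.001 in `metal` units (picoseconds). The function name `performance` is illustrative.

```python
def performance(loop_time_s, steps, atoms, dt_ps):
    """Derive the figures on a LAMMPS 'Performance:' line from the loop time."""
    steps_per_s = steps / loop_time_s
    # simulated ns per wall-clock day: steps/s * ps/step * (1 ns / 1000 ps) * 86400 s/day
    ns_per_day = steps_per_s * dt_ps * 1e-3 * 86400
    hours_per_ns = 24.0 / ns_per_day
    matom_steps_per_s = atoms * steps_per_s / 1e6
    return ns_per_day, hours_per_ns, steps_per_s, matom_steps_per_s


# Values copied from the CPU log: "Loop time of 126.241 on 32 procs for 1000 steps"
ns_day, h_ns, ts_s, matom = performance(126.241, 1000, 4_000_000, 0.001)
print(f"{ns_day:.3f} ns/day, {h_ns:.3f} hours/ns, "
      f"{ts_s:.3f} timesteps/s, {matom:.3f} Matom-step/s")
```

The printed values should agree with the log line (0.684 ns/day, 35.067 hours/ns, 7.921 timesteps/s, 31.685 Matom-step/s) to the displayed precision.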
### gpu
```text
JobID: 9
Running on nodes: pca1
CUDA_VISIBLE_DEVICES = 0,1,2,3
MPI ranks per node: 4
CPUs per task: 1
LAMMPS (29 Aug 2024 - Update 4)
KOKKOS mode with Kokkos version 4.3.1 is enabled (src/KOKKOS/kokkos.cpp:72)
will use up to 4 GPU(s) per node
WARNING: When using a single thread, the Kokkos Serial backend (i.e. Makefile.kokkos_mpi_only) gives better performance than the OpenMP backend (src/KOKKOS/kokkos.cpp:210)
using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 3.52 3.52 3.52
Created orthogonal box = (0 0 0) to (352 352 352)
1 by 2 by 2 MPI processor grid
Created 4000000 atoms
using lattice units in orthogonal box = (0 0 0) to (352 352 352)
create_atoms CPU = 0.141 seconds
Displacing atoms ...
Reading eam potential file Ni_u3.eam with DATE: 2007-06-11
Neighbor list info ...
update: every = 1 steps, delay = 0 steps, check = yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 6.8
ghost atom cutoff = 6.8
binsize = 6.8, bins = 52 52 52
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair eam/kk, perpetual
attributes: full, newton off, kokkos_device
pair build: full/bin/kk/device
stencil: full/bin/3d
bin: kk/device
Setting up cg/kk style minimization ...
Unit style : metal
Current step : 0
WARNING: Fix with atom-based arrays not compatible with sending data in Kokkos communication, switching to classic exchange/border communication (src/KOKKOS/comm_kokkos.cpp:754)
WARNING: Fix with atom-based arrays not compatible with Kokkos sorting on device, switching to classic host sorting (src/KOKKOS/atom_kokkos.cpp:216)
Per MPI rank memory allocation (min/avg/max) = 296.5 | 296.5 | 296.5 Mbytes
Step PotEng KinEng TotEng Press Temp Fmax
0 -14046871 0 -14046871 414938.55 0 53.018371
20 -17799955 0 -17799955 4.7971378 0 0.019115624
Loop time of 0.812471 on 4 procs for 20 steps with 4000000 atoms
97.7% CPU use with 4 MPI tasks x 1 OpenMP threads
Minimization stats:
Stopping criterion = energy tolerance
Energy initial, next-to-last, final =
-14046870.8718438 -17799939.1763125 -17799955.2793014
Force two-norm initial, final = 19245.351 12.649498
Force max component initial, final = 53.018371 0.019115624
Final line search alpha, max atom move = 1 0.019115624
Iterations, force evaluations = 20 30
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.67892 | 0.68045 | 0.68271 | 0.2 | 83.75
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 0.047649 | 0.048272 | 0.048838 | 0.3 | 5.94
Output | 0 | 0 | 0 | 0.0 | 0.00
Modify | 0 | 0 | 0 | 0.0 | 0.00
Other | | 0.08375 | | | 10.31
Nlocal: 1e+06 ave 1.00012e+06 max 999893 min
Histogram: 1 1 0 0 0 0 0 1 0 1
Nghost: 193838 ave 193921 max 193727 min
Histogram: 1 0 0 0 0 0 2 0 0 1
Neighs: 0 ave 0 max 0 min
Histogram: 4 0 0 0 0 0 0 0 0 0
FullNghs: 1.23423e+08 ave 1.2344e+08 max 1.23407e+08 min
Histogram: 1 1 0 0 0 0 0 1 0 1
Total # of neighbors = 4.9369322e+08
Ave neighs/atom = 123.42331
Neighbor list builds = 0
Dangerous builds = 0
Setting up Verlet run ...
Unit style : metal
Current step : 20
Time step : 0.001
Per MPI rank memory allocation (min/avg/max) = 218.1 | 218.1 | 218.1 Mbytes
Step PotEng KinEng TotEng Press Temp Fmax
20 -17799955 155112.14 -17644843 3803.5136 300 0.019115624
100 -17718758 75992.128 -17642766 12378.067 146.97521 2.1693243
200 -17716452 78443.126 -17638009 12537.735 151.71565 2.2885089
300 -17716884 84373.072 -17632511 12518.596 163.18466 2.0765823
400 -17716604 90227.696 -17626376 12707.592 174.508 2.1663835
500 -17710535 90897.38 -17619638 13536.112 175.80323 2.3503506
600 -17705234 92845.985 -17612388 14265.849 179.57199 2.5065403
700 -17703057 98391.919 -17604665 14674.33 190.2983 2.4700827
800 -17698498 101992.49 -17596506 15334.073 197.2621 2.4969924
900 -17693463 105491.4 -17587971 16043.463 204.0293 2.5061929
1000 -17690358 111247.26 -17579110 16563.735 215.16162 2.5405778
1020 -17688222 110914.97 -17577307 16849.696 214.51894 2.4545932
Loop time of 10.8834 on 4 procs for 1000 steps with 4000000 atoms
Performance: 7.939 ns/day, 3.023 hours/ns, 91.883 timesteps/s, 367.532 Matom-step/s
99.0% CPU use with 4 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 3.8781 | 3.8828 | 3.8927 | 0.3 | 35.68
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 1.2987 | 1.3015 | 1.3041 | 0.2 | 11.96
Output | 0.1336 | 0.15697 | 0.17478 | 3.8 | 1.44
Modify | 4.9992 | 5.0068 | 5.012 | 0.2 | 46.00
Other | | 0.5354 | | | 4.92
Nlocal: 1e+06 ave 1.00073e+06 max 999640 min
Histogram: 2 0 1 0 0 0 0 0 0 1
Nghost: 184971 ave 185331 max 184237 min
Histogram: 1 0 0 0 0 0 0 1 0 2
Neighs: 0 ave 0 max 0 min
Histogram: 4 0 0 0 0 0 0 0 0 0
FullNghs: 1.34e+08 ave 1.34098e+08 max 1.33952e+08 min
Histogram: 2 0 1 0 0 0 0 0 0 1
Total # of neighbors = 5.36e+08
Ave neighs/atom = 134
Neighbor list builds = 0
Dangerous builds = 0
Setting up cg/kk style minimization ...
Unit style : metal
Current step : 1020
WARNING: Fix with atom-based arrays not compatible with Kokkos sorting on device, switching to classic host sorting (src/KOKKOS/atom_kokkos.cpp:216)
Per MPI rank memory allocation (min/avg/max) = 300.1 | 300.1 | 300.1 Mbytes
Step PotEng KinEng TotEng Press Temp Fmax
1020 -17688222 110914.97 -17577307 16849.696 214.51894 2.4545932
1100 -17800000 110914.97 -17689085 2716.256 214.51894 0.00093680375
1200 -17800000 110914.97 -17689085 2716.2032 214.51894 0.00011017142
1300 -17800000 110914.97 -17689085 2716.2018 214.51894 2.5959422e-05
1400 -17800000 110914.97 -17689085 2716.2017 214.51894 5.1501934e-06
1450 -17800000 110914.97 -17689085 2716.2017 214.51894 2.3465206e-06
Loop time of 19.859 on 4 procs for 430 steps with 4000000 atoms
99.7% CPU use with 4 MPI tasks x 1 OpenMP threads
Minimization stats:
Stopping criterion = energy tolerance
Energy initial, next-to-last, final =
-17688221.5193737 -17800000.0143065 -17800000.0143066
Force two-norm initial, final = 1603.7099 0.0014109707
Force max component initial, final = 2.4545932 2.3465206e-06
Final line search alpha, max atom move = 1 2.3465206e-06
Iterations, force evaluations = 430 858
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 17.99 | 18.006 | 18.024 | 0.3 | 90.67
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 1.109 | 1.1115 | 1.1143 | 0.2 | 5.60
Output | 0.036226 | 0.043084 | 0.05159 | 2.7 | 0.22
Modify | 0 | 0 | 0 | 0.0 | 0.00
Other | | 0.6984 | | | 3.52
Nlocal: 1e+06 ave 1.00052e+06 max 999583 min
Histogram: 1 1 0 0 0 0 1 0 0 1
Nghost: 184972 ave 185388 max 184456 min
Histogram: 1 0 0 1 0 0 0 0 1 1
Neighs: 0 ave 0 max 0 min
Histogram: 4 0 0 0 0 0 0 0 0 0
FullNghs: 1.33921e+08 ave 1.33991e+08 max 1.33865e+08 min
Histogram: 1 1 0 0 0 0 1 0 0 1
Total # of neighbors = 5.3568267e+08
Ave neighs/atom = 133.92067
Neighbor list builds = 0
Dangerous builds = 0
Total wall time: 0:00:39
```
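To compare the two runs, the loop times of each phase can be divided directly. The sketch below uses only numbers copied from the `Loop time of ...`, `Iterations, force evaluations` and `Total wall time` lines of the logs above. The two jobs ran on different nodes (`pca2` with 32 MPI ranks, `pca1` with 4 ranks and 4 GPUs), and the second minimization took a different number of force evaluations in each run, so the ratios are indicative rather than a controlled benchmark.

```python
# Loop times in seconds, copied from the CPU (eam/opt) and GPU (eam/kk) logs
cpu = {"minimize_1": 5.11289, "run": 126.241, "minimize_2": 124.177}
gpu = {"minimize_1": 0.812471, "run": 10.8834, "minimize_2": 19.859}

# The second minimization did not do the same amount of work in both runs
force_evals = {"cpu": 805, "gpu": 858}

# Raw speedup per phase: CPU loop time divided by GPU loop time
speedup = {phase: cpu[phase] / gpu[phase] for phase in cpu}

# Second minimization normalized by the number of force evaluations
per_eval_speedup = (cpu["minimize_2"] / force_evals["cpu"]) / (
    gpu["minimize_2"] / force_evals["gpu"]
)

# Total wall time: 0:04:16 on CPU versus 0:00:39 on GPU
wall_speedup = (4 * 60 + 16) / 39

for phase, s in speedup.items():
    print(f"{phase}: {s:.2f}x")
print(f"minimize_2 per force evaluation: {per_eval_speedup:.2f}x")
print(f"total wall time: {wall_speedup:.2f}x")
```

With these inputs the MD run comes out at roughly 11.6x and both minimizations at roughly 6.3x. In the GPU `run` breakdown `Modify` (46.00%) takes a larger share than `Pair` (35.68%), whereas `Pair` dominates the CPU run at 88.11%.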
---
