# LAMMPS
[toc]
## Tips
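1. **Most important: everyone must use the same compiler!!**
2. Each time you install a new dependency, remember to export its path.
3. If you find that a path is wrong, open a new terminal and reinstall.
4. When using cmake, create a dedicated folder to build in; if the build fails, delete it and start over.
5. Keep the configure and install folders separate:
   - build: for configuration
   - opt: for installation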
## Setup for build
```bash
mkdir lammps
cd lammps
mkdir build opt
```
### Paths
```bash
ROOT=$HOME/lammps
BUILD=$HOME/lammps/build
OPT=$HOME/lammps/opt
HDF5=hdf5-1.8.18
PNETCDF=pnetcdf-1.12.3
NETCDF=netcdf-c-4.8.1
NETCDF_FORTRAN=netcdf-fortran-4.6.1
PYTHON=Python-3.12.0
ADIOS=ADIOS-2.8.0
VORO=voro++-0.4.6
LAMMPS=lammps-2Aug2023
```
### <.bashrc>
> Compiler: GCC 11.4.0
```bash
export PATH=/home/scteam06/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/home/scteam06/openmpi/lib:$LD_LIBRARY_PATH
export HDF5_DIR=/home/scteam06/lammps/opt/hdf5-1.8.18
export PNETCDF_DIR=/home/scteam06/lammps/opt/pnetcdf-1.12.3
export NETCDF_DIR=/home/scteam06/lammps/opt/netcdf-c-4.8.1
export NETCDF_FORTRAN_DIR=/home/scteam06/lammps/opt/netcdf-fortran-4.6.1
export ADIOS_DIR=/home/scteam06/lammps/build/ADIOS-2.8.0
export VORO_DIR=/home/scteam06/lammps/opt/voro++-0.4.6
export ZFP_DIR=/home/scteam06/lammps/opt/zfp
export PATH=$HDF5_DIR/bin:$PNETCDF_DIR/bin:$NETCDF_DIR/bin:$ADIOS_DIR/bin:$VORO_DIR/bin:$ZFP_DIR/bin:$PATH
export LD_LIBRARY_PATH=$HDF5_DIR/lib:$PNETCDF_DIR/lib:$NETCDF_DIR/lib:$ADIOS_DIR/lib:$VORO_DIR/lib:$ZFP_DIR/lib:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=$HDF5_DIR/include:$PNETCDF_DIR/include:$NETCDF_DIR/include:$ADIOS_DIR/include:$VORO_DIR/include:$ZFP_DIR/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=$HDF5_DIR/include:$PNETCDF_DIR/include:$NETCDF_DIR/include:$ADIOS_DIR/include:$VORO_DIR/include:$ZFP_DIR/include:$CPLUS_INCLUDE_PATH
export LIBRARY_PATH=$HDF5_DIR/lib:$PNETCDF_DIR/lib:$NETCDF_DIR/lib:$ADIOS_DIR/lib:$VORO_DIR/lib:$ZFP_DIR/lib:$LIBRARY_PATH
export PATH=/home/scteam06/lammps/build/lammps-2Aug2023/bin:$PATH
export LD_LIBRARY_PATH=/home/scteam06/lammps/build/lammps-2Aug2023/lib:$LD_LIBRARY_PATH
```
```bash
source ~/.bashrc
```
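A quick sanity check after sourcing can catch the path problems mentioned in the Tips. The sketch below is only an illustration: `check_dirs` is a hypothetical helper (not part of LAMMPS or any dependency) that reports every named variable that is unset or does not point to an existing directory.

```bash
#!/usr/bin/env bash
# check_dirs VAR... : print the name of each variable that is unset/empty
# or whose value is not an existing directory. Returns 1 if any check failed.
check_dirs() {
  local name status=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name
    if [ -z "${!name:-}" ] || [ ! -d "${!name}" ]; then
      echo "missing: $name"
      status=1
    fi
  done
  return $status
}

# The variable names below are the ones exported in .bashrc above.
check_dirs HDF5_DIR PNETCDF_DIR NETCDF_DIR NETCDF_FORTRAN_DIR ADIOS_DIR VORO_DIR ZFP_DIR || true
```

Some of these folders only exist after the matching dependency has been installed, so a `missing:` line early in the process is expected.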
## Dependencies
### Zlib 1.2.11
```bash
cd $BUILD
wget https://nchc.dl.sourceforge.net/project/libpng/zlib/1.2.11/zlib-1.2.11.tar.gz
tar -xvf zlib-1.2.11.tar.gz
cd zlib-1.2.11
CC=mpicc CXX=mpicxx \
./configure \
--prefix=$HOME/opt/zlib-1.2.11 # the zlib install path does not matter
make -j
sudo make install
```
### HDF5 1.8.18
```bash
cd $BUILD
wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8/hdf5-1.8.18/src/hdf5-1.8.18.tar.gz
tar xf $HDF5.tar.gz
cd $HDF5
CC=mpicc CXX=mpicxx FC=mpifort \
./configure \
--prefix=$OPT/$HDF5 \
--enable-shared \
--enable-parallel
make -j
make install
```

### PNETCDF 1.12.3
```bash
cd $BUILD
wget https://parallel-netcdf.github.io/Release/pnetcdf-1.12.3.tar.gz
tar xf $PNETCDF.tar.gz
cd $PNETCDF
CC=mpicc CXX=mpicxx FC=mpifort \
./configure \
--enable-shared \
--enable-subfiling \
--prefix=$OPT/$PNETCDF
make -j
make install
```

### NETCDF-C 4.8.1
```bash
cd $BUILD
wget https://github.com/Unidata/netcdf-c/archive/refs/tags/v4.8.1.tar.gz
tar xf v4.8.1.tar.gz
cd $NETCDF
CC=mpicc CXX=mpicxx FC=mpifort \
./configure \
--enable-pnetcdf \
--enable-hdf5 \
--enable-shared \
--prefix=$OPT/$NETCDF
make -j
make install
```

### NETCDF-fortran 4.6.1
```bash
cd $BUILD
wget https://downloads.unidata.ucar.edu/netcdf-fortran/4.6.1/$NETCDF_FORTRAN.tar.gz
tar xf $NETCDF_FORTRAN.tar.gz
cd $NETCDF_FORTRAN
CC=mpicc CXX=mpicxx FC=mpifort \
./configure \
--prefix=$OPT/$NETCDF_FORTRAN \
--enable-shared
make -j
make install
```

### ADIOS 2.8.0
```bash
cd $BUILD
git clone -b v2.8.0 https://github.com/ornladios/ADIOS2.git $ADIOS
cd $ADIOS
mkdir build
cd build
CC=mpicc CXX=mpicxx FC=mpifort \
cmake .. \
-DMPI_Fortran_COMPILER=mpifort \
-DCMAKE_Fortran_COMPILER=mpifort \
-DCMAKE_C_COMPILER=mpicc \
-DCMAKE_CXX_COMPILER=mpicxx \
-DCMAKE_CXX_FLAGS="-O3 -fpic" \
-DADIOS2_USE_MPI=ON \
-DADIOS2_USE_Endian_Reverse=ON \
-DADIOS2_USE_Fortran=ON \
-DADIOS2_RUN_INSTALL_TEST=OFF \
-DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
-DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=ONLY \
-DCMAKE_INSTALL_PREFIX=$BUILD/$ADIOS \
-DADIOS2_INSTALL_GENERATE_CONFIG=OFF \
-DHDF5_DIR=/home/scteam06/lammps/opt/hdf5-1.8.18 \
-DADIOS2_BUILD_HDF5=ON
# do not use make -j here
make install
```
### VORONOI
```bash
cd $ROOT
wget https://math.lbl.gov/voro++/download/dir/$VORO.tar.gz
tar xf $VORO.tar.gz
cd $VORO
# Edit config.mk before building.
# Set CXX, CFLAGS, and the install PREFIX:
#   CXX=mpicxx
#   CFLAGS=-ansi -O3 -march=cascadelake
#   PREFIX=${BUILD}/${VORO}
# where BUILD is your build location and VORO=voro++-0.4.6.
# The shared library has to be built by yourself.
make -j
make install
```
### Visualization tools
- JPEG
```bash
sudo apt-get install libjpeg-dev
```
- FFMPEG
```bash
sudo apt-get install ffmpeg
```
### Others
- ZFP
```bash
cd $BUILD
wget https://github.com/LLNL/zfp/archive/refs/tags/0.5.5.tar.gz
tar -xzf 0.5.5.tar.gz
cd zfp-0.5.5
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$OPT/zfp
make -j
make install
```
- LAPACK
```bash
sudo apt-get install liblapack-dev
```
## Build LAMMPS
> Version: 2Aug2023
```bash
# Run this from a build folder inside the LAMMPS source tree, so that ../cmake resolves.
OPTFLAGS="-march=cascadelake -O2"
# Export the compiler settings; plain shell variables are not passed on to cmake.
export CC=mpicc
export CXX=mpicxx
export FC=mpifort
export CFLAGS="-fPIC -march=cascadelake -fopenmp -Wrestrict -DLMP_INTEL_USELRT -DLMP_USE_MKL_RNG -Df2cFortran $OPTFLAGS"
export CXXFLAGS="-fPIC -march=cascadelake -fopenmp -Wrestrict -DLMP_INTEL_USELRT -DLMP_USE_MKL_RNG -Df2cFortran -std=c++17 $OPTFLAGS"
export FCFLAGS="-fPIC -march=cascadelake -Wrestrict -DLMP_INTEL_USELRT -DLMP_USE_MKL_RNG $OPTFLAGS"
export LDFLAGS="$OPTFLAGS -L$MKLROOT/lib/intel64 -ltbbmalloc -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -L$ZFP_DIR/lib -lzfp"
cmake ../cmake \
-DCMAKE_INSTALL_PREFIX=$BUILD/$LAMMPS \
-DBUILD_SHARED_LIBS=yes \
-DINTEL_ARCH=cpu \
-DFFT=MKL \
-DFFT_MKL_THREADS=on \
-DWITH_GZIP=yes \
-DPKG_ADIOS=yes \
-DADIOS2_DIR=$ADIOS_DIR \
-DPKG_NETCDF=yes \
-DBUILD_MPI=yes \
-DBUILD_OMP=yes \
-DLAMMPS_MACHINE=mpi \
-DPKG_OPENMP=yes \
-DPKG_OPT=yes \
-DPYTHON_EXECUTABLE=$(which python3) \
-DPKG_ML-QUIP=yes \
-DPKG_ML-HDNNP=yes \
-DPKG_VORONOI=yes \
-DWITH_JPEG=yes \
-DLAMMPS_FFMPEG=yes
make -j
make install
```
- Get `result.txt` (`lmp_mpi` is located in /lammps-2Aug2023/build)
```bash
lmp_mpi -help > result.txt
```
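The help output includes the list of installed packages, so `result.txt` can be used to confirm that the packages requested in the cmake call were really built. The sketch below is a rough text search, not an official LAMMPS check: `missing_pkgs` is a hypothetical helper, and it assumes the package names appear in the help output spelled as in the `-DPKG_<name>` options.

```bash
#!/usr/bin/env bash
# missing_pkgs FILE PKG... : print each PKG that never appears as a whole word in FILE.
missing_pkgs() {
  local file=$1 pkg
  shift
  for pkg in "$@"; do
    grep -qwF -- "$pkg" "$file" || echo "$pkg"
  done
}

# Packages enabled with -DPKG_<name>=yes in the cmake call above.
if [ -f result.txt ]; then
  missing_pkgs result.txt ADIOS NETCDF OPENMP OPT ML-QUIP ML-HDNNP VORONOI
fi
```

No output means every listed name was found somewhere in the file. A short name such as `OPT` could also match unrelated text, so treat the result as a hint.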

## Run LAMMPS
### Lennard Jones Simulation
- `lammps/bench/in.lj`: set *run = 10000*
- nproc = 8
```bash
mpirun -np 8 lmp_mpi -in in.lj
```


### LAMMPS.out
Output with `time` command
```bash
/usr/bin/time -v mpirun -np 4 lmp_mpi -in in.lj > LAMMPS.out 2>&1
```
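To compare runs, it helps to pull the headline numbers out of `LAMMPS.out`. The sketch below assumes the summary line has the form `Performance: <x> tau/day, <y> timesteps/s, <z> Matom-step/s` and that the CPU usage line contains the text `CPU use`; `perf_summary` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# perf_summary FILE : print the tau/day, timesteps/s and Matom-step/s values
# from every "Performance:" line in FILE, one run per output line.
perf_summary() {
  awk '/^Performance:/ { print $2, $4, $6 }' "$1"
}

if [ -f LAMMPS.out ]; then
  perf_summary LAMMPS.out
  grep 'CPU use' LAMMPS.out || true   # CPU usage line, if present
fi
```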

## Visualization
### JPG
In <in.lj>, add:
```
dump 1 all image 100 image.*.jpg type type &
zoom 1.6 adiam 1.0
dump_modify 1 pad 5
```
Since the simulation runs for 10000 timesteps and an image is written every 100 timesteps, we should get 101 images in total (one for each of the timesteps 0, 100, 200, ..., 10000).
- **image.00000**

- **image.05000**

- **image.10000**
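The number of frames and their file names follow directly from the dump settings. A minimal sketch, assuming the run starts at timestep 0 and a frame is written at both the first and the last step; `frame_count` and `frame_name` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# frame_count STEPS EVERY : number of dump frames for a run of STEPS timesteps
# starting at step 0, with one frame every EVERY steps (both ends included).
frame_count() {
  echo $(( $1 / $2 + 1 ))
}

# frame_name STEP : file name produced by "image.*.jpg" together with "pad 5".
frame_name() {
  printf 'image.%05d.jpg\n' "$1"
}

frame_count 10000 100   # 101
frame_name 5000         # image.05000.jpg
```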

### FFMPEG
In <in.lj>, add:
```
dump 1 all movie 100 movie.mp4 type type &
zoom 1.0 adiam 1.0
dump_modify 1 pad 5
```
This script creates [movie.mp4](https://drive.google.com/uc?export=download&id=1HHHyYj2UqOjjoehxxrv7bymTyAomf4jC), capturing frames every 100 timesteps of the simulation.
## Performance
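LAMMPS itself reports fairly detailed and intuitive performance data after each run, so although I also profiled with VTune along the way, I ended up relying mainly on LAMMPS's own output.

The Lennard-Jones (LJ) potential is a mathematical model of the interaction potential energy between two electrically neutral molecules or atoms, so the results we care about most are the physical quantities shown in the figure:

This is the simulation result from the original <in.lj> (10000 steps). We can see that:
1. The temperature (Temp) drops noticeably, which indicates that the system is settling toward equilibrium.
2. The pair potential energy (E_pair) increases, which tells us that the atoms have moved farther apart.
3. The molecular energy stays at 0, because there are no molecular interactions in this run.
4. The total energy decreases slightly (it is almost unchanged), possibly because some energy loss is unavoidable during equilibration.
5. The pressure goes from negative to positive, which indicates that the compressed state of the system has relaxed.

Therefore, during the following optimization we must make sure that:
- The LJ-related parameters `pair_style` and `pair_coeff` are not changed.
```
pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5
```
$$
V(r) = 4\epsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^6 \right]
$$
- $\epsilon = 1.0$ (depth of the potential well)
- $\sigma = 1.0$ (finite distance where potential is zero)
- `2.5`: cutoff distance for the potential
- The resulting values stay consistent, so that the simulation remains accurate while performance improves.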
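To get a feel for the LJ formula with the `pair_coeff` values used here, the potential can be evaluated numerically. This is only an illustrative sketch using `awk`; `lj_potential` is a hypothetical helper, not something LAMMPS provides.

```bash
#!/usr/bin/env bash
# lj_potential EPS SIGMA R : evaluate V(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6).
lj_potential() {
  awk -v eps="$1" -v sigma="$2" -v r="$3" 'BEGIN {
    s6 = (sigma / r) ^ 6
    printf "%.6f\n", 4 * eps * (s6 * s6 - s6)
  }'
}

lj_potential 1.0 1.0 1.0   # 0.000000  (V is zero at r = sigma)
lj_potential 1.0 1.0 2.5   # -0.016317 (value at the cutoff distance)
```

At the cutoff the potential is already below 2% of the well depth, which is why truncating it at 2.5 changes the physics only slightly.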
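Performance is judged mainly by "tau/day", "timesteps/s", "Matom-step/s", and CPU usage. The figure shows that the biggest bottleneck of this simulation is the "pairing" part, and "communication" also has room for improvement. With this understanding, we can start optimizing.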

### Number of MPI ranks (np)

> Overall efficiency is best at np = 8
### Number of OpenMP threads per MPI rank

> Overall efficiency is best at n = 1, np = 8
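A sweep over ranks and threads can be scripted. The sketch below only prints the commands (a dry run). It assumes threads are selected through the OPENMP package with the `-sf omp -pk omp N` command-line switches, which the notes above do not state, and the rank and thread values are illustrative, not the ones actually measured. `sweep_cmds` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# sweep_cmds : print (not run) one LAMMPS command per (ranks, threads) pair.
# Each run gets its own log file so the results can be compared afterwards.
sweep_cmds() {
  local np t
  for np in 2 4 8; do
    for t in 1 2; do
      echo "mpirun -np $np lmp_mpi -sf omp -pk omp $t -in in.lj -log log.np$np.t$t"
    done
  done
}

sweep_cmds
```

Piping the output to `bash` would run the commands one after another.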
### Modify <in.lj> script
- Atom sorting on/off

| Sort interval and bin size | Performance (tau/day) | Timesteps/s | Matom-step/s | CPU Usage (%) |
|------------------|-----------------------|-------------|--------------|---------------|
| 0 1.0 | 157222.225 | 363.940 | 11.646 | 96.7 |
| 0 2.0 | 161417.419 | 373.651 | 11.957 | 95.9 |
| 50 2.0 | 170291.700 | 394.194 | 12.614 | 96.4 |
| 100 2.0 | 168046.347 | 388.996 | 12.448 | 96.2 |

- **Performance (tau/day)** rises from the `0` rows to a peak at `50 2.0`, then drops slightly at `100 2.0`.
- **Timesteps/s** and **Matom-step/s** also peak at `50 2.0`, which indicates that this configuration performs best under these conditions.
- **CPU Usage** varies little across configurations, staying between 95% and 97%.
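- Newton flag on/off (default is on): performance got worse after turning it off.
- `processors * * *`: LAMMPS can automatically determine the best processor layout, but in our experiments this made little difference.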
- **Neighbor list**:
```
neighbor 0.05 bin
neigh_modify delay 0 every 100 check yes
```

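- `neigh_modify delay 0 every 100 check yes`: a larger `every` value means the neighbor list is rebuilt less often, which lowers the computational overhead and improves performance. However, if rebuilds are too infrequent, the neighbor list can go out of date, so interactions may be missed. In our results, `every 100` outperformed `every 20`, which shows that fewer neighbor list rebuilds benefit performance in this simulation.
- `neighbor 0.05 bin`: the first argument is the skin distance added to the cutoff when the neighbor list is built, and `bin` is the build style. A smaller skin puts fewer atoms into each neighbor list, which speeds up the work done on it. In our data, reducing the value from 0.2 to 0.05 improved performance significantly, so the smaller value is more effective in this simulation and noticeably reduces the computation time.
- Taking everything together, the best configuration is `delay 0 every 100 check yes` with `neighbor 0.05 bin`; all performance metrics (tau/day, timesteps/s, Matom-step/s) reach their highest values, and ==**overall performance improved by 42.16%**==.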
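Relative gains such as the ones quoted in this section can be recomputed from any two measurements. A minimal sketch, assuming the gain is measured on tau/day relative to a baseline run; `pct_gain` is a hypothetical helper, and the example uses two rows of the atom sorting table because the baseline behind the overall figure is not listed here.

```bash
#!/usr/bin/env bash
# pct_gain BASE NEW : relative improvement of NEW over BASE, in percent.
pct_gain() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new - base) / base * 100 }'
}

# tau/day from the atom sorting table: "0 1.0" versus "50 2.0"
pct_gain 157222.225 170291.700   # 8.31
```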
