---
title: NAMD Bio-Science Application
tags: APAC HPC-AI competition
---
[Benchmark](https://www.hpcadvisorycouncil.com/events/2020/APAC-AI-HPC/pdf/HPC-AI_Competition_NAMD_Benchmark_Guideline.pdf)
[NAMD and Charm++](https://www.youtube.com/watch?v=-hiCMAtX0Hc)
[3D Deep learning](https://www.youtube.com/watch?v=T-tFg8b6R-0&list=PLa6_BhFeNAXx4ASDtGBauhZvjVnr20xxs&index=5)
# NAMD Bio-Science Application
[TOC]
## Software that needs to be installed
- Charm++
- NAMD
- UCX
- OpenMPI / Intel MPI
---
## NAMD serves the NIH (health) mission and users
Practical supercomputing for biomedical research
- 1115000 users cannot all be computer experts
- 18% are NIH-funded; many in other countries
- 34000 have downloaded more than one version
- 13000 citations of NAMD reference papers
- 1000 users per month download latest release
- One program available on all platforms
- desktops and laptops - setup and testing
- Linux clusters - affordable local workhorse
- Supercomputers - "most used code" at XSEDE/TACC
- Petascale - "widest-used application" on Blue Waters
- Exascale - early science on Frontier, Aurora
- GPUs - from desktop to supercomputer
- User knowledge is preserved across platforms
- No change in input or output files.
- Run any simulation on **any number of cores**.
- Available free of charge to all.
---
## NAMD embeds the Tcl scripting language instead of Python
- to enable *portable* innovation *by users*
- no need to recompile, can move scripts between platforms
- package management, portable
- interfaces haven't changed
- encapsulates mini-languages
- used in VMD
- looks like a simple scripting language, so it does not scare non-programmers
## NAMD runs well on NVIDIA GPUs
### What to expect :
- 1 GPU =~ 100 CPU cores
- Depending on CPU and GPU
- Scaling to 10K atoms/GPU
- Assuming fast network
- Must use smp/multicore
- Many cores share each GPU
- Use multicore for single node
- At most one process per GPU
- Must use `+pemap 0-n`
- Consider `+devices i,j`
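For the single-node multicore case above, a launch could look like the sketch below (assuming a CUDA multicore build on a node with 16 cores and 2 GPUs; the input file `stmv.namd` and the core/device numbers are placeholders):
```bash=
# 16 worker threads pinned to cores 0-15, sharing the two GPUs chosen with +devices
./namd2 +p16 +pemap 0-15 +devices 0,1 stmv.namd > stmv.log
```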
### Why it may fall short :
- Weak GPU (e.g. laptop)
- Too few CPU cores used
- Coarse-grained simulation
- Too few atoms per GPU
- Limited by network
- Limited by MPI (use verbs)
- Limited by special features
---
## But GPU Builds Disable Some Features
### Disabled
- Alchemical (FEP and TI)
- Locally enhanced sampling
- Tabulated energies
- Drude (nonbonded Thole)
- Go forces
- Pairwise interaction
- Pressure profile
### Not Disabled
- Memory optimized builds
- Conformational free energy
- Collective variables
- Grid forces
- Steering forces
- Almost everything else
---
## NAMD and Charm++ Grew Up Together
### Parallel Programming Lab Achievements :
- Charm++ parallel runtime system
- Gordon Bell Prize 2002
- IEEE Fernbach Award 2012
- 16 publications SC 2012-16
- 6+ codes on Blue Waters
### Charm++ features used by NAMD :
- Parallel C++ with data driven objects
- Asynchronous method invocation
- Prioritized scheduling of messages/execution
- Measurement-based load balancing
- Portable messaging layer
---
## A NAMD Build Script is Pretty Short
```bash=
tar xzf NAMD_2.14_Source.tar.gz
cd NAMD_2.14_Source
tar xf charm-6.10.1.tar
cd charm-6.10.1
./build charm++ verbs-linux-x86_64 smp icc --no-build-shared --with-production -j 8
cd ..   # back to NAMD_2.14_Source before configuring NAMD
./config Linux-x86_64-icc Frontera --with-mkl --charm-arch verbs-linux-x86_64-smp-icc
cd Linux-x86_64-icc
make release -j 8
```
---
## Must Choose Charm++ Build Options
- Choose network layer :
- multicore (smp but only a single process, no network)
- netlrts (UDP over ethernet or loopback)
- gni-crayx[ce] (Cray Gemini or Aries network)
- verbs or ucx (InfiniBand)
- mpi (fall back to MPI library, use for Omni-Path)
- Choose smp or (default) non-smp :
- smp uses one core per process for communication
- Optional compiler options :
- iccstatic uses the Intel compiler and links Intel-provided libraries statically
- Also : `--no-build-shared --with-production`
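Two concrete invocations as sketches (compiler and flags should be adapted to the local toolchain):
```bash=
# single-node build: multicore implies smp, no network layer
./build charm++ multicore-linux-x86_64 --with-production -j 8

# InfiniBand cluster build with the Intel compiler, Intel libraries linked statically
./build charm++ verbs-linux-x86_64 smp iccstatic --no-build-shared --with-production -j 8
```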
---
## Plus a Few NAMD Build Config Options
- Point to the Charm++ build (this also selects the network layer) :
- `--charm-base <build-dir> --charm-arch <arch>`
- Choose FFTW 3 or Intel MKL rather than default FFTW 2 :
- `--with-fftw3` (you want 3, but our binaries ship with 2)
- Options :
- `--with-cuda` (NVIDIA GPU acceleration, requires smp)
- `--with-memopt` (use compressed structure files)
- Using smp build is the simplest way to reduce memory usage
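For example (a sketch assuming a verbs smp Charm++ build; the paths are placeholders):
```bash=
# CPU-only build against FFTW 3
./config Linux-x86_64-g++ --with-fftw3 \
    --charm-base /path/to/charm-6.10.1 --charm-arch verbs-linux-x86_64-smp

# GPU-accelerated, memory-optimized build (requires an smp Charm++ arch)
./config Linux-x86_64-g++ --with-fftw3 --with-cuda --with-memopt \
    --charm-base /path/to/charm-6.10.1 --charm-arch verbs-linux-x86_64-smp
```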
---
## Charm++ Launch Varies Across Platforms
- Multicore - run namd2 binary directly, specify +pP
- MPI, ucx, and gni (Cray) builds follow system docs
- Typically mpirun, mpiexec, aprun, srun
- **Specify `+ppn M`** (also `+pemap`...) to namd2 for smp builds
- Other builds (verbs, netlrts) use charmrun
- `charmrun ++n N [++ppn M] /path/to/namd2 args`
- N is number of processes, older Charm++ required `++pP` (total PEs)
- With queueing system use `++mpiexec[-no-n]`, `++remote-shell`
- Otherwise must set up nodelist file (see notes.txt, Charm++ docs)
- For single host can use `++local`
- Arguments to charmrun begin with "++", to namd2 with "+"
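Putting this together, some representative launches (sketches; process counts, core maps, and the input `apoa1.namd` are placeholders):
```bash=
# multicore build on a single host: run the binary directly
./namd2 +p8 apoa1.namd

# MPI or ucx build: launch through the system's MPI, one process per socket
mpirun -np 16 ./namd2 +ppn 7 apoa1.namd

# verbs or netlrts build: charmrun starts the processes via mpiexec
charmrun ++mpiexec ++n 4 ++ppn 15 /path/to/namd2 apoa1.namd
```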
---
## NAMD command line is consistent
- `+pemap m-n[:stride.run+add+add][,...]`
- For example, 0-31:8.7 = 0-6,8-14,16-22,24-30
- For smp also `+commap`, e.g. 7-31:8 = 7,15,23,31
- Path(s) to NAMD simulation config file(s)
- Accepts any option on command line with `--name value`
- Files/arguments are treated as if concatenated
- Matters if run/minimize commands are used
- Startup at first run/minimize command or end of file(s)
- All paths are relative to config file
- To avoid this can do `--source /path/to/file`
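For example (a sketch; `equil.namd` and the output name are placeholders):
```bash=
# command-line --options are merged with the config file;
# file paths inside equil.namd resolve relative to equil.namd itself
./namd2 +p8 +pemap 0-7 --outputname run01 equil.namd
```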
---
## NAMD/Charm++ Performance Tips
- <font color=red>DO NOT</font> use the MPI network (except on OmniPath)
- Low-level verbs, gni, pami, ucx layers exist because they are faster
- Leverage MPI startup via `charmrun ++mpiexec`
- See also `++scalable-start`, `++remote-shell`, `++runscript`
- <font color=red>DO</font> use SMP builds for larger simulations
- Reduced memory usage and often faster
- Trade-off : communication thread not available for work
- Major direction of future optimization and tuning
- <font color=red>DO</font> set processor affinity explicitly
- For example : `++ppn 7 +commap 0,8 +pemap 1-7,9-15`
- Cray by default tends to lock all threads onto same core
- <font color=red>DO</font> save one core for OS to improve scaling
- Cray `aprun -r 1` reserves and forces OS to run on last core
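A launch following these tips on an InfiniBand cluster might look like this sketch (2 processes per 16-core host; `$NODES` and the input file are placeholders):
```bash=
# 7 worker threads + 1 comm thread per process, affinity set explicitly,
# processes started through the queueing system's mpiexec
charmrun ++mpiexec ++n $((2*$NODES)) ++ppn 7 \
    /path/to/namd2 +commap 0,8 +pemap 1-7,9-15 stmv.namd
```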
---
## Measure Relevant Performance
- Benchmark your user's real science on your machine
- <font color=red>DO NOT</font> simply "time namd2..."
- Includes startup and load balancing
- Really want marginal cost of additional ns
- Startup time is highly variable across runs
- Need 500-1000 steps for load balancing
- Several "LDB:" outputs near beginning of run
- Look for several "Benchmark time:" lines on output
- For "TIMING:" output only care about wall(clock) time
- Be sure to benchmark dynamics, not minimization
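To pull the numbers that matter out of a log file (NAMD prints several "Benchmark time:" lines once load balancing has settled):
```bash=
grep "Benchmark time:" run.log   # marginal cost per step after load balancing
grep "TIMING:" run.log           # wall-clock timing lines for longer runs
```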
---
### Example 1 : ALCF Theta Build and Run Options
- 64-core processors, Cray Aries network
- build charm++ gni-crayxc persistent smp -xMIC-AVX512
- `aprun -n $((7*$nodes)) -N 7 -d 17 -j 2 -r 1`
- `+ppn 16 +pemap 0-55 +commap 56-62`
#### ALCF Theta Run Option Math
- 64 cores, reserve one for OS (`-r 1`), leaves 63
- 63 = 9x7 = 9x(6+1) = 54 PE + 9 comm
    - `+ppn 12 +pemap 0-53+64 +commap 54-62`
- 63 = 7x9 = 7x(8+1) = 56 PE + 7 comm
    - `+ppn 16 +pemap 0-55+64 +commap 56-62`
- 60 = 4x15 = 4x(14+1) = 56 PE + 4 comm
    - `+ppn 28 +pemap 0-63:16.14+64 +commap 14-62:16`
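To sanity-check a `start-end:stride.run` pattern such as `0-63:16.14` (take `run` consecutive IDs out of every `stride`), a throwaway helper (not part of NAMD) can expand it; the trailing `+64` then adds the hyperthread partner of each listed core:
```bash=
# expands 0-63:16.14 -> 0-13,16-29,32-45,48-61 (14 PEs out of every 16 cores)
for base in $(seq 0 16 63); do
    seq -s, "$base" $((base + 13))
done | paste -sd, -
```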
### Example 2: TACC Stampede KNL Build and Run Options
- 68-core processors, Intel Omni-Path network
- build mpi-linux-x86_64 smp icc -xMIC-AVX512
- `sbatch --ntasks=$((13*$nodes))`
- `+ppn 8 +pemap 0-51+68 +commap 53-65`
    (or `+ppn 4 +pemap 0-51 +commap 53-65`)
#### TACC Stampede KNL Run Option Math
- 68 cores, reserve one for OS, leaves 67
- 65 = 13x5 = 13x(4+1) = 52 PE + 13 comm
    - `+ppn 8 +pemap 0-51+68 +commap 53-65`
- 66 = 6x11 = 6x(10+1) = 60 PE + 6 comm
    - `+ppn 20 +pemap 0-59+68 +commap 60-65`
- 68 = 4x17 = 4x(16+1) = 64 PE + 4 comm
    - `+ppn 32 +pemap 0-63+68 +commap 64-67`
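The core accounting above can be double-checked with a small helper (hypothetical, not a NAMD tool):
```bash=
# procs x (PE cores per process + 1 comm core) must fit within the cores in use
check() { echo "$1 procs: $(($1 * $2)) PE cores + $1 comm = $(($1 * ($2 + 1))) cores"; }
check 13 4    # 52 PE + 13 comm = 65 cores
check 6 10    # 60 PE +  6 comm = 66 cores
check 4 16    # 64 PE +  4 comm = 68 cores
```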
### KNL Run Option Reasoning
- Leave core free to isolate OS noise
- Pairs of cores on a "tile" share 1MB L2 cache
- <font color=red>Do not</font> split tile between PEs of different nodes
- OK to split tile between comm threads
- Use 1 or 2 hyperthreads per PE core
- Dedicate core to each comm thread
- Need several comm threads per host
- Fewer for Cray Aries than for Intel Omni-Path
- Multiple copies of static data reduce memory contention
- Different configurations fit the 64-core vs 68-core models
---
## UCX NAMD Hangs on Frontera
- Appears to be an issue in UCX library :
- Not fixed in UCX 1.8.1 release
- Likely fixed in UCX master and 1.9.x branches
- Download from https://github.com/openucx/ucx
- Monitor Charm++ issue for updates :
- https://github.com/UIUC-PPL/charm/issues/2716
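A workaround is to build a newer UCX from source and point the Charm++ ucx build at it; roughly (a sketch, the install prefix is a placeholder):
```bash=
git clone https://github.com/openucx/ucx.git
cd ucx
./autogen.sh
./contrib/configure-release --prefix=$HOME/ucx-install
make -j 8 && make install
```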
---
## Upcoming Changes
- NAMD 2.14 release imminent
- will have same performance as 2.14b2 release
- Waiting to merge after 2.14 release :
- Support for AMD's "HIP" GPU API
- improved tile-based AVX512 kernel
- https://charm.cs.illinois.edu/gerrit/q/project:namd
- NAMD 3.0 single-node alpha release
- Greatly improved CUDA performance
- Single process per replica, but supports multi-copy
---
## Terminology
- module
    - `module help` – brief usage information for `module`
    - `module avail` – list the software available on the system
    - `module load` – load the specified software
    - `module purge` – unload all loaded software
    - `module list` – list the currently loaded software
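Typical usage on the cluster (the `gcc` module name follows the build scripts below):
```bash=
module purge        # start from a clean environment
module avail        # see which software is installed
module load gcc     # load the compiler used by the build scripts
module list         # confirm what is loaded
```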
---
## Problems with the Benchmark
1. Failed to get the `FFTW3 tar file` & `HPC-X 2.6 tar file`
```bash=
wget http://www.fftw.org/fftw-3.3.8.tar.gz \
    -O ./code/fftw-3.3.8.tar.gz
```
```
wget ./code/fftw-3.3.8.tar.gz : no such file or directory
```
:heavy_check_mark: **solution** : do not paste the command directly; type it in yourself so the backslash line continuation is not mangled
2. Failed when checking out `Charm++ v6.10.1` & `NAMD 2.13` and untarring `HPC-X 2.6`
```bash=
...
GIT_WORK_TREE=./cluster/thor/code \
...
```
```bash=
APP_MPI_PATH=./cluster/thor/application/mpi \
...
```
```
fatal: Invalid path '/HPCAI/NAMD/cluster': No such file or directory
```
:heavy_check_mark: **solution** : create the `code` directory under `/HPCAI/NAMD/` and use absolute paths, e.g.
```bash=
...
GIT_WORK_TREE=/HPCAI/NAMD/code \
...
```
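Creating the directory beforehand:
```bash=
mkdir -p /HPCAI/NAMD/code
```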
3. build fftw
```bash=
CODE_NAME=fftw \
CODE_TAG=3.3.8 \
CODE_BASE_DIR=/HPCAI/NAMD/code \
CODE_DIR=$CODE_BASE_DIR/$CODE_NAME-$CODE_TAG \
INSTALL_DIR=/HPCAI/NAMD/application/libs/fftw \
CMAKE_PATH=/usr/bin/cmake \
GCC_PATH=/usr/bin/gcc \
NATIVE_GCC_FLAGS='"-march=native -mtune=native -mavx2 -msse4.2 -O3 -DNDEBUG"' \
GCC_FLAGS='"-march=broadwell -mtune=broadwell -mavx2 -msse4.2 -O3 -DNDEBUG"' \
bash -c '
CMD_REBUILD_CODE_DIR="rm -fr $CODE_DIR \
&& tar xf /HPCAI/NAMD/code/$CODE_NAME-$CODE_TAG.tar.gz -C $CODE_BASE_DIR"
### To build shared library (single precision) with GNU Compiler
BUILD_LABEL=$CODE_TAG-shared-gcc930-avx2-broadwell \
CMD_BUILD_SHARED_GCC=" \
mkdir $CODE_DIR/build-$BUILD_LABEL; \
cd $CODE_DIR/build-$BUILD_LABEL \
&& $CMAKE_PATH .. \
-DBUILD_SHARED_LIBS=ON -DENABLE_FLOAT=ON \
-DENABLE_OPENMP=OFF -DENABLE_THREADS=OFF \
-DCMAKE_C_COMPILER=$GCC_PATH -DCMAKE_CXX_COMPILER=$GCC_PATH \
-DENABLE_AVX2=ON -DENABLE_AVX=ON \
-DENABLE_SSE2=ON -DENABLE_SSE=ON \
-DCMAKE_INSTALL_PREFIX=$INSTALL_DIR/$BUILD_LABEL \
-DCMAKE_C_FLAGS_RELEASE=$GCC_FLAGS \
-DCMAKE_CXX_FLAGS_RELEASE=$GCC_FLAGS \
&& time -p make VERBOSE=1 V=1 install -j \
&& cd $INSTALL_DIR/$BUILD_LABEL && ln -s lib64 lib | tee $BUILD_LABEL.log "
eval $CMD_REBUILD_CODE_DIR;
eval $CMD_BUILD_SHARED_GCC &
wait
echo $CMD_REBUILD_CODE_DIR;
echo $CMD_BUILD_SHARED_GCC
' | tee fftw3buildlog 2>&1
```
- Run a small FFTW benchmark with/without SIMD
```bash=
./build-3.3.8-shared-icc20-avx2-broadwell/bench -o patient -o nosimd 10240
```
error message : (screenshot not preserved)

:heavy_check_mark: **solution** : icc is not installed here, so drop the icc build and use the GCC build directory for `bench`
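A corrected invocation would point at the GCC build directory instead (a sketch, assuming the `3.3.8-shared-gcc930-avx2-broadwell` label used in the build step above):
```bash=
# run the benchmark with and without SIMD using the GCC-built bench binary
./build-3.3.8-shared-gcc930-avx2-broadwell/bench -o patient 10240
./build-3.3.8-shared-gcc930-avx2-broadwell/bench -o patient -o nosimd 10240
```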
- another problem :
- `bash -c '...'`

4. build charm
```bash=
CODE_NAME=charm \
CODE_GIT_TAG=FETCH_HEAD \
CODE_GIT_TAG=v6.10.1 \
GIT_DIR=/HPCAI/NAMD/github/$CODE_NAME.git \
GIT_WORK_TREE=/HPCAI/NAMD/code \
CHARM_CODE_DIR=$GIT_WORK_TREE/$CODE_NAME-$CODE_GIT_TAG-20-08-11 \
CHARM_DIR=$CHARM_CODE_DIR \
APP_MPI_PATH=/HPCAI/NAMD/application/mpi \
HPCX_FILES_DIR=$APP_MPI_PATH/hpcx-v2.6.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat7.7-x86_64 \
HPCX_MPI_DIR=$HPCX_FILES_DIR/ompi \
HPCX_UCX_DIR=$HPCX_FILES_DIR/ucx \
UCX_DIR=$SELF_BUILT_DIR \
UCX_DIR=$HPCX_UCX_DIR \
GCC_DIR=/usr/bin \
NATIVE_GCC_FLAGS="-march=native -mtune=native -mavx2 -msse4.2 -O3 -DNDEBUG" \
GCC_FLAGS="-static-libstdc++ -static-libgcc -march=broadwell -mtune=broadwell -mavx2 -msse4.2 -O3 -DNDEBUG" \
bash -c '
CMD_REBUILD_BUILD_DIR="rm -fr $CHARM_DIR/built && mkdir $CHARM_DIR/built;"
### To build UCX executables with HPC-X OpenMPI + GCC8.4.0
CMD_BUILD_UCX_CHARM_GCC="
module purge && module load gcc \
&& cd $CHARM_DIR/built \
&& time -p ../build charm++ ucx-linux-x86_64 ompipmix \
-j --with-production \
--basedir=$HPCX_MPI_DIR \
--basedir=$UCX_DIR \
gcc gfortran $GCC_FLAGS \
&& module purge;"
### To build MPI executables with HPC-X OpenMPI + GCC8.4.0
CMD_BUILD_MPI_CHARM_GCC="
module purge && module load gcc \
&& . $HPCX_FILES_DIR/hpcx-mt-init-ompi.sh \
&& hpcx_load \
&& cd $CHARM_DIR/built \
&& time -p ../build charm++ mpi-linux-x86_64 \
-j --with-production \
--basedir=$HPCX_MPI_DIR \
gcc gfortran $GCC_FLAGS \
&& hpcx_unload && module purge;"
eval $CMD_REBUILD_BUILD_DIR;
eval $CMD_BUILD_UCX_CHARM_GCC &
eval $CMD_BUILD_MPI_CHARM_GCC &
wait
echo $CMD_REBUILD_BUILD_DIR;
echo $CMD_BUILD_UCX_CHARM_GCC;
echo $CMD_BUILD_MPI_CHARM_GCC;
' | tee charmbuildlog 2>&1
```
- when I use my account : (error screenshot not preserved)
- when I use root : (error screenshot not preserved)

:heavy_check_mark: **solution** : use `bash -i` instead of `sudo bash`
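After the builds finish, a quick check (assuming the build trees land under `built/` as the step-5 variables expect) is to confirm the expected Charm++ arch directories exist:
```bash=
ls /HPCAI/NAMD/code/charm-v6.10.1-20-08-11/built/
# expect directories such as:
#   ucx-linux-x86_64-gfortran-ompipmix-gcc
#   mpi-linux-x86_64-gfortran-gcc
```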
5. build namd
```bash=
CHARM_ARCH_UCX_GCC=ucx-linux-x86_64-gfortran-ompipmix-gcc \
CHARM_ARCH_MPI_GCC=mpi-linux-x86_64-gfortran-gcc \
CODE_NAME=charm \
CODE_GIT_TAG=FETCH_HEAD \
CODE_GIT_TAG=v6.10.1 \
GIT_WORK_TREE=/HPCAI/NAMD/code \
CHARM_CODE_DIR=$GIT_WORK_TREE/$CODE_NAME-$CODE_GIT_TAG-20-08-11 \
CHARM_BASE=$CHARM_CODE_DIR/built \
FFTW3_LIB_DIR=/HPCAI/NAMD/application/libs/fftw \
GCC_FFTW3_LIB_DIR=$FFTW3_LIB_DIR/3.3.8-shared-gcc840-avx2-broadwell \
APP_MPI_DIR=/HPCAI/NAMD/application/mpi \
HPCX_FILES_DIR=$APP_MPI_DIR/hpcx-v2.6.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat7.7-x86_64 \
GCC_DIR=/usr \
GCC_PATH='"$GCC_DIR/bin/gcc "' \
GXX_PATH='"$GCC_DIR/bin/g++ -std=c++0x"' \
NATIVE_GCC_FLAGS='"-static-libstdc++ -static-libgcc -march=native -mtune=native -mavx2 -msse4.2 -O3 -DNDEBUG"' \
GCC_FLAGS='"-static-libstdc++ -static-libgcc -march=broadwell -mtune=broadwell -mavx2 -msse4.2 -O3 -DNDEBUG"' \
CODE_NAME=namd \
CODE_GIT_TAG=FETCH_HEAD \
GIT_DIR=/HPCAI/NAMD/github/$CODE_NAME.git \
GIT_WORK_TREE=/HPCAI/NAMD/code \
NAMD_CODE_DIR=$GIT_WORK_TREE/$CODE_NAME-$CODE_GIT_TAG-20-08-11 \
NAMD_DIR=$NAMD_CODE_DIR \
bash -c '
cd $NAMD_DIR;
CMD_BUILD_UCX_NAMD_GCC_FFTW3="
PATH=$GCC_DIR/bin:$PATH \
module purge && module load gcc && \
./config Linux-x86_64-g++ --with-memopt \
--charm-base $CHARM_BASE --charm-arch $CHARM_ARCH_UCX_GCC \
--with-fftw3 --fftw-prefix $GCC_FFTW3_LIB_DIR \
--cc $GCC_PATH --cc-opts $GCC_FLAGS \
--cxx $GXX_PATH --cxx-opts $GCC_FLAGS \
&& cd Linux-x86_64-g++ && time -p make -j \
&& cd $NAMD_DIR && mv Linux-x86_64-g++ Linux-x86_64-g++-ucx-fftw3 \
&& module purge"
CMD_BUILD_UCX_NAMD_GCC_MKL="
PATH=$GCC_DIR/bin:$PATH \
module purge && module load gcc && \
./config Linux-x86_64-g++ --with-memopt \
--charm-base $CHARM_BASE --charm-arch $CHARM_ARCH_UCX_GCC \
--with-mkl --mkl-prefix $MKL_DIR \
--cc $GCC_PATH --cc-opts $GCC_FLAGS \
--cxx $GXX_PATH --cxx-opts $GCC_FLAGS \
&& cd Linux-x86_64-g++ && time -p make -j \
&& cd $NAMD_DIR && mv Linux-x86_64-g++ Linux-x86_64-g++-ucx-mkl \
&& module purge"
CMD_BUILD_MPI_NAMD_GCC_FFTW3="
module purge && module load gcc && \
. $HPCX_FILES_DIR/hpcx-mt-init-ompi.sh && hpcx_load \
&& PATH=$GCC_DIR/bin:$PATH \
./config Linux-x86_64-g++ --with-memopt \
--charm-base $CHARM_BASE --charm-arch $CHARM_ARCH_MPI_GCC \
--with-fftw3 --fftw-prefix $GCC_FFTW3_LIB_DIR \
--cc $GCC_PATH --cc-opts $GCC_FLAGS \
--cxx $GXX_PATH --cxx-opts $GCC_FLAGS \
&& cd Linux-x86_64-g++ && time -p make -j \
&& cd $NAMD_DIR && mv Linux-x86_64-g++ Linux-x86_64-g++-mpi-fftw3 \
&& hpcx_unload && module purge"
CMD_BUILD_MPI_NAMD_GCC_MKL="
module purge && module load gcc && \
. $HPCX_FILES_DIR/hpcx-mt-init-ompi.sh && hpcx_load \
&& PATH=$GCC_DIR/bin:$PATH \
./config Linux-x86_64-g++ --with-memopt \
--charm-base $CHARM_BASE --charm-arch $CHARM_ARCH_MPI_GCC \
--with-mkl --mkl-prefix $MKL_DIR \
--cc $GCC_PATH --cc-opts $GCC_FLAGS \
--cxx $GXX_PATH --cxx-opts $GCC_FLAGS \
&& cd Linux-x86_64-g++ && time -p make -j \
&& cd $NAMD_DIR && mv Linux-x86_64-g++ Linux-x86_64-g++-mpi-mkl \
&& hpcx_unload && module purge"
eval $CMD_BUILD_UCX_NAMD_GCC_FFTW3
eval $CMD_BUILD_MPI_NAMD_GCC_FFTW3
eval $CMD_BUILD_UCX_NAMD_GCC_MKL;
eval $CMD_BUILD_MPI_NAMD_GCC_MKL;
wait
echo $CMD_BUILD_UCX_NAMD_GCC_FFTW3
echo $CMD_BUILD_MPI_NAMD_GCC_FFTW3
echo $CMD_BUILD_UCX_NAMD_GCC_MKL;
echo $CMD_BUILD_MPI_NAMD_GCC_MKL;
' | tee namdbuildlog 2>&1
```
- error message : (screenshot not preserved)
- Problem point (not yet understood) :
```bash=
./config Linux-x86_64-g++ --with-memopt \
--charm-base $CHARM_BASE --charm-arch $CHARM_ARCH_UCX_GCC \
--with-fftw3 --fftw-prefix $GCC_FFTW3_LIB_DIR \
--cc $GCC_PATH --cc-opts $GCC_FLAGS \
--cxx $GXX_PATH --cxx-opts $GCC_FLAGS \
```