# PIConGPU Basics and Hands-On Parallel Computing
## Introduction
:::info
The PIConGPU version shipped in the NVIDIA container is quite old, so some of its settings differ from newer releases.
The container environment also cannot run a newer PIConGPU directly; for example, the container ships CUDA 9.0.0, while recent PIConGPU releases require CUDA 11.1.0+.
https://github.com/ComputationalRadiationPhysics/picongpu/wiki/Restarting-a-simulation
To use the latest version you have to install it yourself; see:
https://github.com/ComputationalRadiationPhysics/picongpu/blob/dev/INSTALL.rst
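A quick way to check which CUDA toolkit the container actually ships (a plain sanity check; assumes `nvcc` is on the container's PATH):
```bash
# Inside the running container: print the CUDA toolkit version
nvcc --version
```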
:::
## Prerequisites
### Generate an NGC API key
NVIDIA API key:
```
Zm1mNGp0bDB2MmQ0OTMxaGgzajlnN21pbWQ6ZjA3ODE5MGUtMGZiZS00MTUzLWJlYWItNWQxZThkNzZlNGQz
```
https://org.ngc.nvidia.com/setup/api-key
### Install the required environment
Install Docker (on Ubuntu the Docker engine package is usually `docker.io`):
```
sudo apt install docker
```
Install Python:
```
sudo apt install python3
```
Install pip:
```
sudo apt install python3-pip
```
Install the NVIDIA Container Toolkit:
```
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
```
sudo apt-get update
```
```
sudo apt-get install -y nvidia-container-toolkit
```
Restart Docker:
```
sudo systemctl restart docker
```
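To confirm that Docker can now hand GPUs to containers, a minimal test (any small image works; `ubuntu` is used here only as an example):
```bash
# The toolkit injects the driver and nvidia-smi into the container at run time
sudo docker run --rm --gpus all ubuntu nvidia-smi
```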
### Pull the image
https://catalog.ngc.nvidia.com/orgs/hpc/containers/picongpu#command-line-execution-with-nvidia-docker
```
docker login nvcr.io
```
```
sudo docker pull nvcr.io/hpc/picongpu:july2018patch
```
## Setting up the experiment environment (NVIDIA container)
Create a folder named `runs` on the host:
```
mkdir ~/runs
```
Start the container:
```
sudo docker run --shm-size=16g --ulimit memlock=-1 -it --rm --gpus all nvcr.io/hpc/picongpu:july2018patch
```
* `--shm-size=16g` sets the shared memory size
* `--ulimit memlock=-1` removes the limit on locked (pinned) memory

Start the container and mount the output directory to the host (a quick sanity check follows the option list below):
:::info
Since the `$HOME/runs` directory was already created above, it does not need to be created again during the environment setup later.
:::
```
sudo docker run -v ~/runs:/root/runs --shm-size=16g --ulimit memlock=-1 -it --rm --gpus all nvcr.io/hpc/picongpu:july2018patch mkdir root/runs
```
* `-v ~/runs:/root/runs` mounts the host `~/runs/` directory at `/root/runs` inside the container
* `--shm-size=16g` sets the shared memory size
* `--ulimit memlock=-1` removes the limit on locked (pinned) memory
* `mkdir root/runs` is the command executed as soon as the container is created
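Once the container is running, it is worth checking that the GPUs and the bind mount are actually visible from inside it (a minimal sanity check, nothing PIConGPU-specific):
```bash
# Inside the container: list the GPUs and confirm the mounted output directory exists
nvidia-smi
ls /root/runs
```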
## Installing PIConGPU
Requirements:
* C++17 supporting compiler, e.g. GCC 9+ or Clang 11+
* CMake 3.22.0 or higher
* OpenMPI 1.7+ / MVAPICH2 1.8+ or similar
* Boost 1.74.0+ (program_options, atomic and header-only libs)
* CUDA 11.2+
* git 1.7.9.5+ (not required for the code, but for our workflows)
* rsync (not required for the code, but for our workflows)
```
sudo apt-get install -y gcc-9 g++-9 build-essential cmake file cmake-curses-gui \
  libopenmpi-dev libboost-program-options-dev libboost-atomic-dev git rsync
```
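Optionally verify that the installed toolchain meets the version requirements listed above:
```bash
gcc --version      # expect GCC 9 or newer
cmake --version    # expect CMake 3.22 or newer
mpirun --version   # OpenMPI / MVAPICH2
git --version
```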
Download the PIConGPU source code:
```
git clone https://github.com/ComputationalRadiationPhysics/picongpu.git $HOME/src/picongpu
```
Set the environment paths:
```
export PICSRC=$HOME/src/picongpu
export PIC_EXAMPLES=$PICSRC/share/picongpu/examples
export PATH=$PATH:$PICSRC
export PATH=$PATH:$PICSRC/bin
export PATH=$PATH:$PICSRC/src/tools/bin
export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH
```
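To avoid re-exporting these variables in every new shell, they can be collected in a small profile script and sourced on demand (a sketch; the file name `picongpu.profile` is only a suggestion):
```bash
# ~/picongpu.profile  (suggested name, pick any path you like)
export PICSRC=$HOME/src/picongpu
export PIC_EXAMPLES=$PICSRC/share/picongpu/examples
export PATH=$PATH:$PICSRC:$PICSRC/bin:$PICSRC/src/tools/bin
export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH

# usage: source ~/picongpu.profile
```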
Install the libraries:
* libpng1.2.9+ (requires zlib)
* pngwriter 0.7.0+ (requires libpng, zlib, and optional freetype)
* openPMD API 0.15.0+
* c-blosc for openPMD API with ADIOS2
* FFTW3
* ISAAC (optional): https://github.com/ComputationalRadiationPhysics/isaac/blob/dev/INSTALL.md

libpng 1.2.9+ (requires zlib)
```
sudo apt-get install libpng-dev
```
pngwriter from source
```
mkdir -p ~/src ~/lib
git clone -b 0.7.0 https://github.com/pngwriter/pngwriter.git ~/src/pngwriter/
cd ~/src/pngwriter
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/lib/pngwriter ..
make install
```
environment: (assumes install from source in $HOME/lib/pngwriter)
```
export CMAKE_PREFIX_PATH=$HOME/lib/pngwriter:$CMAKE_PREFIX_PATH
```
Install the openPMD API:
```
mkdir -p ~/src ~/lib
git clone -b 0.15.0 https://github.com/openPMD/openPMD-api.git ~/src/openPMD-api
cd ~/src/openPMD-api
mkdir build && cd build
cmake .. -DopenPMD_USE_MPI=ON -DCMAKE_INSTALL_PREFIX=~/lib/openPMD-api
make -j $(nproc) install
```
environment: (assumes install from source in $HOME/lib/openPMD-api)
```
export CMAKE_PREFIX_PATH="$HOME/lib/openPMD-api:$CMAKE_PREFIX_PATH"
```
FFTW3
```
mkdir -p ~/src ~/lib
cd ~/src
wget -O fftw-3.3.10.tar.gz http://fftw.org/fftw-3.3.10.tar.gz
tar -xf fftw-3.3.10.tar.gz
cd fftw-3.3.10
./configure --prefix="$HOME/lib/fftw-3.3.10"
make
make install
```
environment: (assumes install from source in $HOME/lib/fftw-3.3.10)
```
export FFTW3_ROOT=$HOME/lib/fftw-3.3.10
export LD_LIBRARY_PATH=$FFTW3_ROOT/lib:$LD_LIBRARY_PATH
```
### Environment setup
https://github.com/ComputationalRadiationPhysics/picongpu/blob/dev/USAGE.rst
For a first test you can also use your home directory:
```
export SCRATCH=$HOME
```
We need a few directories to structure our workflow:
PIConGPU input files
```
mkdir $HOME/picInputs
```
PIConGPU simulation output
```
mkdir $SCRATCH/runs
```
clone the LWFA example to $HOME/picInputs/myLWFA
```
pic-create $PIC_EXAMPLES/LaserWakefield $HOME/picInputs/myLWFA
```
:::info
Several different examples are provided in the `$PIC_EXAMPLES/` directory:
```shell
root@98b2f2362c70:~# ls $PIC_EXAMPLES
Bremsstrahlung Bunch Empty FoilLCT KelvinHelmholtz LaserWakefield SingleParticleTest ThermalTest WarmCopper WeibelTransverse limits.yml
```
:::
switch to your input directory
```
cd $HOME/picInputs/myLWFA
```
Look up the compute capability of your GPU (see the table and link below) and pass it to `pic-configure`:
```bash
# example: build for a GeForce RTX 2080 Ti with compute capability 7.5
pic-configure -b "cuda:75" $HOME/picInputs/myLWFA
```
Then compile the setup:
```
pic-build
```
| GPU model | Compute capability |
| -------- | -------- |
| GeForce RTX 2080 Ti | 7.5 |
https://developer.nvidia.com/cuda-gpus
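On newer drivers, `nvidia-smi` can also report the compute capability directly; the `compute_cap` query field is an assumption about your driver version:
```bash
# Print device name and compute capability, e.g. "GeForce RTX 2080 Ti, 7.5" -> use "cuda:75"
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```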
### Run
https://github.com/ComputationalRadiationPhysics/picongpu/wiki/Starting-a-simulation
Run the simulation:
```
tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl $SCRATCH/runs/lwfa_001
```
* `etc/picongpu/1.cfg` is the chosen configuration file; for 2 GPUs use `etc/picongpu/2.cfg`, and so on
* `-t etc/picongpu/bash/mpiexec.tpl` selects the submit template, here launching locally via `mpiexec`
* `$SCRATCH/runs/lwfa_001` is where the output is stored
* `/runs/lwfa_001/simOutput/pngElectronsYX` is the folder where the PNG images of the simulation results are written
:::info
The files under `simOutput/` are owned by root because the container runs as root, so everything it writes to the bind mount ends up root-owned on the host; the permissions have to be adjusted manually.
Change the permissions on the host:
```
sudo chmod -R 777 ~/runs/runs
```
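A less drastic alternative is to take ownership of the output instead of opening it to everyone (adjust the path to your actual output directory):
```bash
# Hand the simulation output back to your own user instead of chmod 777
sudo chown -R "$USER":"$USER" ~/runs
```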
:::
## Parameter settings (run-time)
https://github.com/ComputationalRadiationPhysics/picongpu/wiki/Configure-an-example
TBG_devices_x/y/z (written as TBG_gpu_x/y/z in older configuration files)
* number of GPUs used along each dimension
* total number of GPUs = TBG_devices_x * TBG_devices_y * TBG_devices_z
```
TBG_devices_x=2
TBG_devices_y=1
TBG_devices_z=1
```
TBG_gridSize
* size of the global simulation grid, in cells
```bash
# note: the number of cells needs to be an exact multiple of a supercell
# and has to be at least 3 supercells per device,
# the size of a supercell (in cells) is defined in `memory.param`
TBG_gridSize="192 1024 12"
```
:::info
The PIConGPU version in the NVIDIA container does not have a `memory.param` file.
:::
TBG_steps
* total number of simulation time steps
```
TBG_steps="2048"
```
### Setting parameters via the command line
```
tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl $SCRATCH/runs/lwfa_001 -d 2 1 1 -g 256 512 256 -s 1024
```
```
-d 2 2 2 -g 256 512 256 -s 1024
```
* uses 2x2x2 = 8 GPUs
* grid configuration: 256 512 256
* runs 1024 steps
## Parameter settings (compile-time)
> [Overview of the .param files](https://picongpu.readthedocs.io/en/latest/usage/param.html)
> [Example .param files on GitHub](https://github.com/ComputationalRadiationPhysics/picongpu/tree/dev/include/picongpu/param)
:::info
Most of the compile-time settings concern the physics side (models, algorithms, and so on); relatively few of them target system or compute optimization.
:::
Below are a few settings that are more likely to be modified:
* fieldSolver.param
* Configure the field solver.
* species.param
* The current solver can be set in species.param
* https://picongpu.readthedocs.io/en/latest/usage/param/particles/current.html#usage-params-core-currentdeposition
* density.param
* The number and sampling of macroparticles per cell are defined independently of a density profile.
* precision.param
* Define the precision of typically used floating point types in the simulation.
* 64bit or 32bit
* memory.param (not yet present in the NV container version)
* Define low-level memory settings for compute devices.
* mallocMC.param (not yet present in the NV container version)
* Fine-tuning of the particle heap for GPUs
:::warning
After changing compile-time parameters, the setup has to be recompiled.
> *If you add a new custom .param file in a PIConGPU setup which you previously compiled, you must delete the `.build/` directory in your setup before compiling again. Otherwise PIConGPU will use the previously cached default .param file instead of your custom file.*
:::
## Comparing an actual case
Original setup (32-bit floating point numbers):

Using 64-bit floating point numbers:
## Checkpoints
Modify the configuration file as follows; the example below is `etc/picongpu/3.cfg`:
* `--checkpoint.period 1000` writes a checkpoint every 1000 steps
* `--checkpoint.restart` restarts from the last checkpoint
```bash=
TBG_checkpoint="--checkpoint.period 1000"
TBG_plugins="!TBG_pngYX \
!TBG_e_histogram \
!TBG_e_PSypy \
!TBG_e_macroCount \
!TBG_hdf5 \
!TBG_checkpoint"
```
:::info
It is recommended to set the `TBG_e_PSypy` period to 1000, otherwise it takes up a lot of disk space:
```
TBG_e_PSypy="--e_phaseSpace.period 1000 \
```

:::

https://github.com/ComputationalRadiationPhysics/picongpu/blob/dev/docs/TBG_macros.cfg
Run and record checkpoints:
```
tbg -s bash -c etc/picongpu/3.cfg -t etc/picongpu/bash/mpiexec.tpl $SCRATCH/runs/lwfa_004
```

To continue the simulation from the checkpoints stored in `lwfa_004`, the configuration must be changed as follows (it is easiest to create a new .cfg file for this):
**con.cfg**
* `--checkpoint.restart.directory` is the location of the checkpoint directory
```bash=
TBG_checkpoint="--checkpoint.period 1000"
TBG_restart="--checkpoint.restart \
--checkpoint.restart.directory $SCRATCH/runs/lwfa_004/simOutput/checkpoints"
TBG_plugins="!TBG_pngYX \
!TBG_e_histogram \
!TBG_e_PSypy \
!TBG_e_macroCount \
!TBG_hdf5 \
!TBG_checkpoint \
!TBG_restart"
```
:::info
Since the step counter also continues from the previous run, remember to adjust the number of steps (see the sketch below).
:::
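For example (a sketch, assuming the first run used `TBG_steps="2048"` with `--checkpoint.period 1000` as above): the restart resumes from the last checkpoint, and `TBG_steps` still counts from step 0, so it must be raised in the new .cfg for anything new to be simulated:
```bash
# con.cfg: total step count is measured from step 0, not from the checkpoint;
# with the last checkpoint at step 2000 this continues the run up to step 4096
TBG_steps="4096"
```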
Continue the simulation from the checkpoint:
```
tbg -s bash -c etc/picongpu/con.cfg -t etc/picongpu/bash/mpiexec.tpl $SCRATCH/runs/lwfa_005
```

## Data visualization (local machine)
During the simulation, `.h5` files are generated; these can be read to visualize the results.
### Reading the h5 data with yt
https://gist.github.com/C0nsultant/5808d5f61b271b8f969d5c09f5ca91dc
> [yt cookbooks](https://yt-project.org/doc/cookbook/index.html)
Download the yt example:
```
wget https://gist.github.com/C0nsultant/5808d5f61b271b8f969d5c09f5ca91dc/archive/53e0015a0ba7aedf80e236b97fcec1b4f3f9b383.zip -O yt.zip ;
unzip yt.zip
```
Install the yt library:
```
pip install --user yt
```
Install ipykernel:
```
pip install ipykernel
```
Install h5py:
```
pip install h5py
```
### Using VS Code Remote SSH
Use VS Code Remote SSH to run and edit `YT.ipynb`.
Remember to install the `Jupyter` and `Python` extensions on the remote side.

The example notebook has an error: `add_field()` is missing a parameter (`sampling_type`), which needs to be added:
```python=
# These will both yield the same result (for 3D datasets)
f.add_field(name=('openPMD', 'E_magnitude'),
            function=_limited_magni(f.dimensionality),
            sampling_type="local",
            units="kg*m/(A*s**3)",
            force_override=True)
f.add_field(name=('openPMD', 'E_magnitude'),
            function=_magni,
            sampling_type="local",
            units="kg*m/(A*s**3)",
            force_override=True)
```
> [Official documentation for add_field()](https://yt-project.org/doc/reference/api/yt.data_objects.static_output.html#yt.data_objects.static_output.Dataset.add_field)

The changes needed are listed below; remember to also update the corresponding `add_field()` call further down:
* remove the `normed` argument
* replace the bins with `bins=[int(element) for element in f.domain_dimensions]`
```python=
def _binned_parts(field, data):
    hist, edges = np.histogramdd([
        data[('all', 'particle_position_x')],
        data[('all', 'particle_position_y')],
        data[('all', 'particle_position_z')]
    ], bins=[int(element) for element in f.domain_dimensions], weights=data[('all', 'particle_weighting')])
    return hist.flatten()
```

## Visualization examples and summary (local machine)
The .cfg file used to generate the data:
```bash
TBG_devices_x=1
TBG_devices_y=2
TBG_devices_z=1
TBG_gridSize="128 512 128"
TBG_steps="5000"
.......
TBG_hdf5="--hdf5.period 250 \
...............
```
Run the simulation:
```
cd ~/picInputs/myLWFA
tbg -s bash -c etc/picongpu/3.cfg -t etc/picongpu/bash/mpiexec.tpl $SCRATCH/runs/lwfa_001
```
Copy the simulation results from the container to the host (not needed if the directory was mounted):
```
sudo docker cp 98b2f2362c70:/root/runs ~/runs
```
Update the paths inside `YT.ipynb`.

### Analyzing mesh data
```python=
efield_projection = yt.ProjectionPlot(f, 'z', ('openPMD', 'E_x'))
efield_projection.set_unit(('openPMD', 'E_x'), 'T*m**2/s')
efield_projection.annotate_particles((2e-6, 'm'), ptype='io')
efield_projection.show()
```

```python=
efield_slice = yt.SlicePlot(f, 'z', ('openPMD', 'E_magnitude'))
efield_slice.show()
```

### NOTE
The following will not generally work on all datasets! If you reliably want to do something along these lines, you will have to insert the binned field into your provided hdf5 file. That is easily done in a few lines of Python, though.
```python=
def _binned_parts(field, data):
    hist, edges = np.histogramdd([
        data[('all', 'particle_position_x')],
        data[('all', 'particle_position_y')],
        data[('all', 'particle_position_z')]
    ], bins=[int(element) for element in f.domain_dimensions], weights=data[('all', 'particle_weighting')])
    return hist.flatten()
```
```python=
f.add_field(name=('openPMD', 'all_particle_density'),
            function=_binned_parts,
            sampling_type="local",
            units="",
            force_override=True)
```
```python=
dens = yt.ProjectionPlot(f, 'z', 'all_particle_density')
dens.show()
```

### Note
If yt has been compiled with openmpi, scene rendering will automatically be parallelized, even if you did not enable parallelism by hand.
```python=
sc = yt.create_scene(f, 'E_magnitude')
sc.show(sigma_clip=2.0)
```

```python=
cam = sc.camera
sc.camera.resolution = (1080, 1080)
frame = 0
#sc.save('camera_movement_%04i.png' % frame)
# Rotate by 180 degrees over 5 frames
for _ in cam.iter_rotate(np.pi, 5):
    sc.render()
    sc.show()
    #sc.save('camera_movement_%04i.png' % frame)
    frame += 1
```




