# PIConGPU Basics and Hands-On Parallel Computing
## Introduction
:::info
The PIConGPU version shipped in the NVIDIA container is quite old, so some of its settings differ from newer releases.
The container environment also cannot run a newer PIConGPU directly; for example, the container ships CUDA 9.0.0, while recent PIConGPU releases require CUDA 11.1.0+.
https://github.com/ComputationalRadiationPhysics/picongpu/wiki/Restarting-a-simulation
To use the latest version you have to install it yourself; see:
https://github.com/ComputationalRadiationPhysics/picongpu/blob/dev/INSTALL.rst
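A quick way to check which CUDA toolkit the container actually ships (a plain sanity check; assumes `nvcc` is on the container's PATH):
```bash
# Inside the running container: print the CUDA toolkit version
nvcc --version
```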
:::
## Prerequisites
### Generate an NGC API key
NVIDIA API key:
```
Zm1mNGp0bDB2MmQ0OTMxaGgzajlnN21pbWQ6ZjA3ODE5MGUtMGZiZS00MTUzLWJlYWItNWQxZThkNzZlNGQz
```
https://org.ngc.nvidia.com/setup/api-key
### Install the required environment
Install Docker (on Ubuntu the Docker engine package is usually `docker.io`):
```
sudo apt install docker
```
Install Python:
```
sudo apt install python3
```
Install pip:
```
sudo apt install python3-pip
```
Install the NVIDIA Container Toolkit:
```
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
```
sudo apt-get update
```
```
sudo apt-get install -y nvidia-container-toolkit
```
Restart Docker:
```
sudo systemctl restart docker
```
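To confirm that Docker can now hand GPUs to containers, a minimal test (any small image works; `ubuntu` is used here only as an example):
```bash
# The toolkit injects the driver and nvidia-smi into the container at run time
sudo docker run --rm --gpus all ubuntu nvidia-smi
```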
### Pull the image
https://catalog.ngc.nvidia.com/orgs/hpc/containers/picongpu#command-line-execution-with-nvidia-docker
```
docker login nvcr.io
```
```
sudo docker pull nvcr.io/hpc/picongpu:july2018patch
```
## Setting up the experiment environment (NVIDIA container)
Create a folder named `runs` on the host:
```
mkdir ~/runs
```
Start the container:
```
sudo docker run --shm-size=16g --ulimit memlock=-1 -it --rm --gpus all nvcr.io/hpc/picongpu:july2018patch
```
* `--shm-size=16g` sets the shared memory size
* `--ulimit memlock=-1` removes the limit on locked (pinned) memory

Start the container and mount the output directory to the host (a quick sanity check follows the option list below):
:::info
Since the `$HOME/runs` directory was already created above, it does not need to be created again during the environment setup later.
:::
```
sudo docker run -v ~/runs:/root/runs --shm-size=16g --ulimit memlock=-1 -it --rm --gpus all nvcr.io/hpc/picongpu:july2018patch mkdir root/runs
```
* `-v ~/runs:/root/runs` mounts the host `~/runs/` directory at `/root/runs` inside the container
* `--shm-size=16g` sets the shared memory size
* `--ulimit memlock=-1` removes the limit on locked (pinned) memory
* `mkdir root/runs` is the command executed as soon as the container is created
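Once the container is running, it is worth checking that the GPUs and the bind mount are actually visible from inside it (a minimal sanity check, nothing PIConGPU-specific):
```bash
# Inside the container: list the GPUs and confirm the mounted output directory exists
nvidia-smi
ls /root/runs
```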
## Installing PIConGPU
Requirements:
* C++17 supporting compiler, e.g. GCC 9+ or Clang 11+
* CMake 3.22.0 or higher
* OpenMPI 1.7+ / MVAPICH2 1.8+ or similar
* Boost 1.74.0+ (program_options, atomic and header-only libs)
* CUDA 11.2+
* git 1.7.9.5+ (not required for the code, but for our workflows)
* rsync (not required for the code, but for our workflows)
```
sudo apt-get install -y gcc-9 g++-9 build-essential cmake file cmake-curses-gui \
  libopenmpi-dev libboost-program-options-dev libboost-atomic-dev git rsync
```
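Optionally verify that the installed toolchain meets the version requirements listed above:
```bash
gcc --version      # expect GCC 9 or newer
cmake --version    # expect CMake 3.22 or newer
mpirun --version   # OpenMPI / MVAPICH2
git --version
```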
Download the PIConGPU source code:
```
git clone https://github.com/ComputationalRadiationPhysics/picongpu.git $HOME/src/picongpu
```
Set the environment paths:
```
export PICSRC=$HOME/src/picongpu
export PIC_EXAMPLES=$PICSRC/share/picongpu/examples
export PATH=$PATH:$PICSRC
export PATH=$PATH:$PICSRC/bin
export PATH=$PATH:$PICSRC/src/tools/bin
export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH
```
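To avoid re-exporting these variables in every new shell, they can be collected in a small profile script and sourced on demand (a sketch; the file name `picongpu.profile` is only a suggestion):
```bash
# ~/picongpu.profile  (suggested name, pick any path you like)
export PICSRC=$HOME/src/picongpu
export PIC_EXAMPLES=$PICSRC/share/picongpu/examples
export PATH=$PATH:$PICSRC:$PICSRC/bin:$PICSRC/src/tools/bin
export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH

# usage: source ~/picongpu.profile
```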
Install the libraries:
* libpng1.2.9+ (requires zlib)
* pngwriter 0.7.0+ (requires libpng, zlib, and optional freetype)
* openPMD API 0.15.0+
* c-blosc for openPMD API with ADIOS2
* FFTW3
* ISAAC (optional): https://github.com/ComputationalRadiationPhysics/isaac/blob/dev/INSTALL.md

libpng 1.2.9+ (requires zlib)
```
sudo apt-get install libpng-dev
```
pngwriter from source
```
mkdir -p ~/src ~/lib
git clone -b 0.7.0 https://github.com/pngwriter/pngwriter.git ~/src/pngwriter/
cd ~/src/pngwriter
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/lib/pngwriter ..
make install
```
environment: (assumes install from source in $HOME/lib/pngwriter)
```
export CMAKE_PREFIX_PATH=$HOME/lib/pngwriter:$CMAKE_PREFIX_PATH
```
Install the openPMD API:
```
mkdir -p ~/src ~/lib
git clone -b 0.15.0 https://github.com/openPMD/openPMD-api.git ~/src/openPMD-api
cd ~/src/openPMD-api
mkdir build && cd build
cmake .. -DopenPMD_USE_MPI=ON -DCMAKE_INSTALL_PREFIX=~/lib/openPMD-api
make -j $(nproc) install
```
environment: (assumes install from source in $HOME/lib/openPMD-api)
```
export CMAKE_PREFIX_PATH="$HOME/lib/openPMD-api:$CMAKE_PREFIX_PATH"
```
FFTW3
```
mkdir -p ~/src ~/lib
cd ~/src
wget -O fftw-3.3.10.tar.gz http://fftw.org/fftw-3.3.10.tar.gz
tar -xf fftw-3.3.10.tar.gz
cd fftw-3.3.10
./configure --prefix="$HOME/lib/fftw-3.3.10"
make
make install
```
environment: (assumes install from source in $HOME/lib/fftw-3.3.10)
```
export FFTW3_ROOT=$HOME/lib/fftw-3.3.10
export LD_LIBRARY_PATH=$FFTW3_ROOT/lib:$LD_LIBRARY_PATH
```
### Environment setup
https://github.com/ComputationalRadiationPhysics/picongpu/blob/dev/USAGE.rst
For a first test you can also use your home directory:
```
export SCRATCH=$HOME
```
We need a few directories to structure our workflow:
PIConGPU input files
```
mkdir $HOME/picInputs
```
PIConGPU simulation output
```
mkdir $SCRATCH/runs
```
clone the LWFA example to $HOME/picInputs/myLWFA
```
pic-create $PIC_EXAMPLES/LaserWakefield $HOME/picInputs/myLWFA
```
:::info
Several different examples are provided in the `$PIC_EXAMPLES/` directory:
```shell
root@98b2f2362c70:~# ls $PIC_EXAMPLES
Bremsstrahlung Bunch Empty FoilLCT KelvinHelmholtz LaserWakefield SingleParticleTest ThermalTest WarmCopper WeibelTransverse limits.yml
```
:::
switch to your input directory
```
cd $HOME/picInputs/myLWFA
```
Look up the compute capability of your GPU (see the table and link below) and pass it to `pic-configure`:
```bash
# example: build for a GeForce RTX 2080 Ti with compute capability 7.5
pic-configure -b "cuda:75" $HOME/picInputs/myLWFA
```
Then compile the setup:
```
pic-build
```
| GPU model | Compute capability |
| -------- | -------- |
| GeForce RTX 2080 Ti | 7.5 |
https://developer.nvidia.com/cuda-gpus
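On newer drivers, `nvidia-smi` can also report the compute capability directly; the `compute_cap` query field is an assumption about your driver version:
```bash
# Print device name and compute capability, e.g. "GeForce RTX 2080 Ti, 7.5" -> use "cuda:75"
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```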
### Run
https://github.com/ComputationalRadiationPhysics/picongpu/wiki/Starting-a-simulation
Run the simulation:
```
tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl $SCRATCH/runs/lwfa_001
```
* `etc/picongpu/1.cfg` is the chosen configuration file; for 2 GPUs use `etc/picongpu/2.cfg`, and so on
* `-t etc/picongpu/bash/mpiexec.tpl` selects the submit template, here launching locally via `mpiexec`
* `$SCRATCH/runs/lwfa_001` is where the output is stored
* `/runs/lwfa_001/simOutput/pngElectronsYX` is the folder where the PNG images of the simulation results are written
:::info
The files under `simOutput/` are owned by root because the container runs as root, so everything it writes to the bind mount ends up root-owned on the host; the permissions have to be adjusted manually.
Change the permissions on the host:
```
sudo chmod -R 777 ~/runs/runs
```
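A less drastic alternative is to take ownership of the output instead of opening it to everyone (adjust the path to your actual output directory):
```bash
# Hand the simulation output back to your own user instead of chmod 777
sudo chown -R "$USER":"$USER" ~/runs
```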
:::
## Parameter settings (run-time)
https://github.com/ComputationalRadiationPhysics/picongpu/wiki/Configure-an-example
TBG_devices_x/y/z (written as TBG_gpu_x/y/z in older configuration files)
* number of GPUs used along each dimension
* total number of GPUs = TBG_devices_x * TBG_devices_y * TBG_devices_z
```
TBG_devices_x=2
TBG_devices_y=1
TBG_devices_z=1
```
TBG_gridSize
* size of the global simulation grid, in cells
```bash
# note: the number of cells needs to be an exact multiple of a supercell
# and has to be at least 3 supercells per device,
# the size of a supercell (in cells) is defined in `memory.param`
TBG_gridSize="192 1024 12"
```
:::info
The PIConGPU version in the NVIDIA container does not have a `memory.param` file.
:::
TBG_steps
* total number of simulation time steps
```
TBG_steps="2048"
```
### Setting parameters via the command line
```
tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl $SCRATCH/runs/lwfa_001 -d 2 1 1 -g 256 512 256 -s 1024
```
```
-d 2 2 2 -g 256 512 256 -s 1024
```
* uses 2x2x2 = 8 GPUs
* grid configuration: 256 512 256
* runs 1024 steps
## Parameter settings (compile-time)
> [Overview of the .param files](https://picongpu.readthedocs.io/en/latest/usage/param.html)
> [Example .param files on GitHub](https://github.com/ComputationalRadiationPhysics/picongpu/tree/dev/include/picongpu/param)
:::info
Most of the compile-time settings concern the physics side (models, algorithms, and so on); relatively few of them target system or compute optimization.
:::
Below are a few settings that are more likely to be modified:
* fieldSolver.param
* Configure the field solver.
* species.param
* The current solver can be set in species.param
* https://picongpu.readthedocs.io/en/latest/usage/param/particles/current.html#usage-params-core-currentdeposition
* density.param
* The number and sampling of macroparticles per cell are defined independently of a density profile.
* precision.param
* Define the precision of typically used floating point types in the simulation.
* 64bit or 32bit
* memory.param (not yet present in the NV container version)
* Define low-level memory settings for compute devices.
* mallocMC.param (not yet present in the NV container version)
* Fine-tuning of the particle heap for GPUs
:::warning
After changing compile-time parameters, the setup has to be recompiled.
> *If you add a new custom .param file in a PIConGPU setup which you previously compiled, you must delete the `.build/` directory in your setup before compiling again. Otherwise PIConGPU will use the previously cached default .param file instead of your custom file.*
:::
## Comparing an actual case
Original setup (32-bit floating point numbers):

Using 64-bit floating point numbers:
## Checkpoints
Modify the configuration file as follows; the example below is `etc/picongpu/3.cfg`:
* `--checkpoint.period 1000` writes a checkpoint every 1000 steps
* `--checkpoint.restart` restarts from the last checkpoint
```bash=
TBG_checkpoint="--checkpoint.period 1000"
TBG_plugins="!TBG_pngYX \
!TBG_e_histogram \
!TBG_e_PSypy \
!TBG_e_macroCount \
!TBG_hdf5 \
!TBG_checkpoint"
```
:::info
It is recommended to set the `TBG_e_PSypy` period to 1000, otherwise it takes up a lot of disk space:
```
TBG_e_PSypy="--e_phaseSpace.period 1000 \
```

:::

https://github.com/ComputationalRadiationPhysics/picongpu/blob/dev/docs/TBG_macros.cfg
Run and record checkpoints:
```
tbg -s bash -c etc/picongpu/3.cfg -t etc/picongpu/bash/mpiexec.tpl $SCRATCH/runs/lwfa_004
```

To continue the simulation from the checkpoints stored in `lwfa_004`, the configuration must be changed as follows (it is easiest to create a new .cfg file for this):
**con.cfg**
* `--checkpoint.restart.directory` is the location of the checkpoint directory
```bash=
TBG_checkpoint="--checkpoint.period 1000"
TBG_restart="--checkpoint.restart \
--checkpoint.restart.directory $SCRATCH/runs/lwfa_004/simOutput/checkpoints"
TBG_plugins="!TBG_pngYX \
!TBG_e_histogram \
!TBG_e_PSypy \
!TBG_e_macroCount \
!TBG_hdf5 \
!TBG_checkpoint \
!TBG_restart"
```
:::info
Since the step counter also continues from the previous run, remember to adjust the number of steps (see the sketch below).
:::
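For example (a sketch, assuming the first run used `TBG_steps="2048"` with `--checkpoint.period 1000` as above): the restart resumes from the last checkpoint, and `TBG_steps` still counts from step 0, so it must be raised in the new .cfg for anything new to be simulated:
```bash
# con.cfg: total step count is measured from step 0, not from the checkpoint;
# with the last checkpoint at step 2000 this continues the run up to step 4096
TBG_steps="4096"
```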
Continue the simulation from the checkpoint:
```
tbg -s bash -c etc/picongpu/con.cfg -t etc/picongpu/bash/mpiexec.tpl $SCRATCH/runs/lwfa_005
```

## Data visualization (local machine)
During the simulation, `.h5` files are generated; these can be read to visualize the results.
### Reading the h5 data with yt
https://gist.github.com/C0nsultant/5808d5f61b271b8f969d5c09f5ca91dc
> [yt cookbooks](https://yt-project.org/doc/cookbook/index.html)
Download the yt example:
```
wget https://gist.github.com/C0nsultant/5808d5f61b271b8f969d5c09f5ca91dc/archive/53e0015a0ba7aedf80e236b97fcec1b4f3f9b383.zip -O yt.zip ;
unzip yt.zip
```
Install the yt library:
```
pip install --user yt
```
Install ipykernel:
```
pip install ipykernel
```
Install h5py:
```
pip install h5py
```
### Using VS Code Remote SSH
Use VS Code Remote SSH to run and edit `YT.ipynb`.
Remember to install the `Jupyter` and `Python` extensions on the remote side.

The example notebook has an error: `add_field()` is missing a parameter (`sampling_type`), which needs to be added:
```python=
# These will both yield the same result (for 3D datasets)
f.add_field(name=('openPMD', 'E_magnitude'),
            function=_limited_magni(f.dimensionality),
            sampling_type="local",
            units="kg*m/(A*s**3)",
            force_override=True)
f.add_field(name=('openPMD', 'E_magnitude'),
            function=_magni,
            sampling_type="local",
            units="kg*m/(A*s**3)",
            force_override=True)
```
> [Official documentation for add_field()](https://yt-project.org/doc/reference/api/yt.data_objects.static_output.html#yt.data_objects.static_output.Dataset.add_field)

The changes needed are listed below; remember to also update the corresponding `add_field()` call further down:
* remove the `normed` argument
* replace the bins with `bins=[int(element) for element in f.domain_dimensions]`
```python=
def _binned_parts(field, data):
    hist, edges = np.histogramdd([
        data[('all', 'particle_position_x')],
        data[('all', 'particle_position_y')],
        data[('all', 'particle_position_z')]
    ], bins=[int(element) for element in f.domain_dimensions], weights=data[('all', 'particle_weighting')])
    return hist.flatten()
```

## Visualization examples and summary (local machine)
The .cfg file used to generate the data:
```bash
TBG_devices_x=1
TBG_devices_y=2
TBG_devices_z=1
TBG_gridSize="128 512 128"
TBG_steps="5000"
.......
TBG_hdf5="--hdf5.period 250 \
...............
```
Run the simulation:
```
cd ~/picInputs/myLWFA
tbg -s bash -c etc/picongpu/3.cfg -t etc/picongpu/bash/mpiexec.tpl $SCRATCH/runs/lwfa_001
```
Copy the simulation results from the container to the host (not needed if the directory was mounted):
```
sudo docker cp 98b2f2362c70:/root/runs ~/runs
```
Update the paths inside `YT.ipynb`.

### Analyzing mesh data
```python=
efield_projection = yt.ProjectionPlot(f, 'z', ('openPMD', 'E_x'))
efield_projection.set_unit(('openPMD', 'E_x'), 'T*m**2/s')
efield_projection.annotate_particles((2e-6, 'm'), ptype='io')
efield_projection.show()
```

```python=
efield_slice = yt.SlicePlot(f, 'z', ('openPMD', 'E_magnitude'))
efield_slice.show()
```

### NOTE
The following will not generally work on all datasets! If you reliably want to do something along these lines, you will have to insert the binned field into your provided hdf5 file. That is easily done in a few lines of Python, though.
```python=
def _binned_parts(field, data):
    hist, edges = np.histogramdd([
        data[('all', 'particle_position_x')],
        data[('all', 'particle_position_y')],
        data[('all', 'particle_position_z')]
    ], bins=[int(element) for element in f.domain_dimensions], weights=data[('all', 'particle_weighting')])
    return hist.flatten()
```
```python=
f.add_field(name=('openPMD', 'all_particle_density'),
            function=_binned_parts,
            sampling_type="local",
            units="",
            force_override=True)
```
```python=
dens = yt.ProjectionPlot(f, 'z', 'all_particle_density')
dens.show()
```

### Note
If yt has been compiled with openmpi, scene rendering will automatically be parallelized, even if you did not enable parallelism by hand.
```python=
sc = yt.create_scene(f, 'E_magnitude')
sc.show(sigma_clip=2.0)
```

```python=
cam = sc.camera
sc.camera.resolution = (1080, 1080)
frame = 0
#sc.save('camera_movement_%04i.png' % frame)
# Rotate by 180 degrees over 5 frames
for _ in cam.iter_rotate(np.pi, 5):
    sc.render()
    sc.show()
    #sc.save('camera_movement_%04i.png' % frame)
    frame += 1
```




