# HPC-I 2025 Mock Competition

[toc]

## Schedule

- 5/7 (Wed): Problems released
- 5/21 (Wed): FAMIL build deadline; FAMIL build script released
- 5/30 (Fri): Results due
- 6/4 (Wed): Slides & report due

## Competition Environment

- Since all of the existing VMs run on the same physical host, using them at the same time would severely degrade everyone's performance.
- Physical machines split off from the apollo cluster are therefore provided for each team to configure on their own.
- Network information
  - Subnet: 10.25.xx.0/29 (xx is your team number, 1–17)
  - IP: 10.25.xx.1 or 10.25.xx.2 (xx is your team number, 1–17)
  - Gateway: 10.25.xx.6 (xx is your team number, 1–17)
  - DNS: 8.8.8.8
- BMC access (the nodes are physical machines, so they must be managed through the BMC remote-access interface)
  - IP: 10.0.114.yy (yy is the node number; see the allocation table)
  - User: root
  - Password: root
  - Because the nodes are old, the BMC console only provides a Java viewer, which cannot be opened properly on modern computers.
  - Workaround: https://github.com/NTHU-LSALAB/megarac-aster-ikvm
  - Install Docker on your own computer or a VM, then follow the instructions on GitHub to access the BMC console.
- OS installation
  - Mounting an ISO remotely through the BMC is very slow, so DHCP and PXE servers have been preconfigured on the subnet.
  - On first boot, the machine will automatically enter the Ubuntu installer.
  - If it does not boot into the installer automatically, or the installation fails, reboot and press F12 at the boot screen to force a one-time network boot.
- InfiniBand

```
# Install dependencies
sudo apt install rdma-core ibverbs-utils infiniband-diags perftest

# Load drivers
sudo modprobe mlx4_core
sudo modprobe mlx4_en

# Check InfiniBand status
ibv_devinfo

# Use nano or vim to create /etc/netplan/30-ib-network.yaml
network:
  version: 2
  ethernets:
    ibp3s0:
      addresses:
        - "192.168.xx.1/30"  # Remember to modify this line on each node (xx is your team number, 1-17)

# Apply network configuration
sudo netplan apply
```

team18 example (do not copy it verbatim; it will conflict with the network of the machines reserved by the TAs)
![image](https://hackmd.io/_uploads/S1IQ0jzegx.png)
![image](https://hackmd.io/_uploads/rygF-3Xgge.png)

## Score Breakdown

| Application  | Score |
| ------------ | ----- |
| HPL          | 10%   |
| code_saturne | 20%   |
| FAMIL        | 20%   |
| GROMACS      | 20%   |
| Presentation | 30%   |

# Cluster Setup

## Instructions

- Install Linux
- Network (`ping 1.1.1.1`)
- NFS (`/home` and `/opt`)
- MPI (any MPI, e.g.
OpenMPI, MPICH, Intel MPI)
- mpi_test.c

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    int rank, size, name_len;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello from rank %d of %d on %s\n", rank, size, processor_name);

    MPI_Finalize();
    return 0;
}
```

- Run with 24 ranks across both nodes
  - e.g. `mpirun -np 24 -host head:12,work:12 mpi_test`

# HPL (10%)

## Instructions

- [HPL](https://www.netlib.org/benchmark/hpl/)

## Scoring

- 10%: FLOPS (higher is better)

## Sample Output

```
$ cat HPL-dgx1_v100x8-results-dgx07.1016.131511
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be         1.110223e-16
- Computational tests pass if scaled residuals are less than          16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00C2L8      149000    96     2     4              71.54              3.083e+04
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0028120 ......
PASSED
================================================================================
```

## Submission

- `HPL.dat`
- `HPL.out`
  - The above HPL result
- `HPL.pdf` (optional)
  - Additional notes for grading

# code_saturne (20%)

## Instructions

- [Task description](https://hackmd.io/@rainchi15/SyPJajmgeg)

## Scoring

- Before 2025-05-30
  - [ ] (35%) Successfully run on 2 nodes and get results
  - [ ] (25%) Visualize the results on your laptop and submit a video
  - [ ] (20%) Change the setup.xml file and find the best timings
  - [ ] (20%) Experiment with MPI and OpenMP to find the best timings

---

- (35%) Successfully run on 2 nodes and get results
  - Submit run_solver.log, timer_stats.csv, and performance.log
  - Show the results in the presentation
- (25%) Visualize the results on your laptop and submit a video
  - Submit the video file
- (20%) Change the setup.xml file and find the best timings
  - Submit run_solver.log, timer_stats.csv, and performance.log
  - Show results/comparisons in the presentation
- (20%) Experiment with MPI and OpenMP to find the best timings
  - Submit run_solver.log, timer_stats.csv, and performance.log
  - Show results/comparisons in the presentation

Organize and name the files, pack them into a zip, and upload it; any layout is fine as long as it is understandable.

# FAMIL (20%)

## Instructions

- [Task description](/soA1hFpaQpCFKotjc9wyzA)
- [Source](https://365nthu-my.sharepoint.com/:u:/g/personal/112062334_office365_nthu_edu_tw/EZYpZGqbFFRHo0JCgDkcXO4BoRPljhTkfsjqYNrPVS7GPg) (test cases are included in the source code)

## Scoring

- Before 2025-05-21
  - [ ] Build dependencies (30%)
  - [ ] Successfully run `f_create_earth` (10%)
  - [ ] Successfully build (10%)
  - [ ] Successfully run (10%)
- Before 2025-05-30
  - [ ] Performance (30%)
  - [ ] Profile (10%)

---

- Build dependencies
  - Show a screenshot verifying the NCO install; run: `$YOUR_NCO_PATH/bin/ncks --version`

> [!Warning]
> Note that successfully building the dependencies does not necessarily mean that you can compile and run FAMIL. You may need to check the dependencies.
- Successfully run `./f_create_earth`
  - Show a screenshot that you successfully created the test case configuration
  ![image](https://hackmd.io/_uploads/rJiALBjbxg.png)
- Successfully build
  - Successfully run `./f_compile`
  - Show a screenshot verifying that famil.x exists; run: `ls -lh $YOUR_FAMIL_PATH/run/$CASENAME/work`
- Successfully run
  - Show the run screenshot with the total runtime
- Performance
  - Judged by the `Total runtime` in the output log
- Profile
  - You may use any profiler
  - Show your analysis in the report

## Sample Output

```
MPP_DOMAINS_STACK high water mark=           60480

Tabulating mpp_clock statistics across     24 PEs...

                             tmin          tmax          tavg          tstd  tfrac grain pemin pemax
Total runtime         1183.404807   1183.410838   1183.407616      0.001882  1.000     0     0    23
MAIN: initialization     6.064559      6.602922      6.248726      0.203653  0.005     1     0    23
MAIN: time loop       1168.474968   1168.480924   1168.477737      0.001859  0.987     1     0    23
MAIN: termination        0.704776      8.288136      4.279498      3.566381  0.004     1     0    23
Lin_cld_microphys       48.776889     83.526235     63.551248      9.637013  0.054    41     0    23

MPP_STACK high water mark=               0

=================================================================
Successfully performing the time integration!
=================================================================
```

## Submission

- Before 2025-05-21
  - Screenshots, packed in `FAMIL_{YOUR_GROUP_NUMBER}.zip`
    - Build dependencies
    - `./f_create_earth`
    - Successfully build
    - Successfully run
- Before 2025-05-30
  - `FAMIL_run_{YOUR_GROUP_NUMBER}.out`
    - Contains the above output with "The total runtime is xxx sec"
    - You have to submit the **full** output log!
  - `FAMIL_{YOUR_GROUP_NUMBER}.pdf`
    - **Build process**
    - Description of the profiling results
    - Additional notes for grading

> [!Note]
> If you modify the code for performance, or do anything else that may change the result accuracy, describe it in the report. We will ask you for additional files to judge whether it is qualified.
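Since performance is judged by the `Total runtime` row of the mpp_clock table, it helps to pull that row out of the log programmatically when comparing runs. A minimal sketch with `awk`; the log file name `famil.out` is only a placeholder for wherever your run writes its output:

```shell
# Extract the tmin/tmax columns of the "Total runtime" row from a FAMIL
# log laid out like the sample above. "Total" and "runtime" are fields
# $1 and $2, so tmin/tmax land in $3 and $4.
awk '/^ *Total runtime/ { print "tmin=" $3, "tmax=" $4 }' famil.out
```

The same pattern works for any other clock row (e.g. `MAIN: time loop`) if you adjust the regex and field numbers.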
# GROMACS (20%)

## Scoring

- Part 1 - GROMACS Benchmarking & Optimization: 80%
- Part 2 - AlphaFold vs PDB Structural Analysis: 20%

## Part 1: GROMACS Benchmarking & Optimization

### Task

1. Run each benchmark system **without changing simulation parameters** (no modification to `.mdp`, `.gro`, `.top` files).
2. Optimize GROMACS performance by:
   * Compiling with optimal flags (SIMD, MPI, OpenMP settings, FFT choices, etc.).
   * Adjusting runtime parameters (e.g., `-ntomp`, `-pin` options, thread placement, environment variables).
3. **Profile the simulation** to identify performance bottlenecks.

### Benchmark Cases

You will work with benchmark systems selected from the [**Max Planck Institute's Free GROMACS Benchmark Set**](https://www.mpinat.mpg.de/grubmueller/bench) (fetch them yourself):

1. benchMEM: 82k atoms, protein in membrane surrounded by water, 2 fs time step
   - 10,000 steps (default)
   - `mpirun -np 24 --host comp1:12,comp2:12 gmx_mpi mdrun -s benchMEM.tpr`
   ![image](https://hackmd.io/_uploads/B1SljEFxee.png)
2. benchRIB: 2M atoms, ribosome in water, 4 fs time step
   - 1,000 steps: remember to add `-nsteps 1000`
   - `mpirun -np 24 --host comp1:12,comp2:12 gmx_mpi mdrun -s benchRIB.tpr -nsteps 1000`
   ![image](https://hackmd.io/_uploads/ryWRIwKxll.png)

> Performance is measured in `ns/day`; you can refer to my results above.

### Evaluation Criteria

| Item                                   | Weight |
| -------------------------------------- | ------ |
| Visualization                          | 5%     |
| Performance gain (vs. "your" baseline) | 50%    |
| Scaling & Profiling                    | 25%    |

Please provide a short HackMD report:

* **Visualization:** How did you visualize the results?
* **Performance:** How did you optimize GROMACS? **Compare the performance you obtained using the baseline version (the one installed in class) with your most optimized version.**
* **Scaling:** How well does performance improve as the core count increases? Provide a simple scaling plot.
* **Profiling:** Explain where time is spent (MPI, computation, I/O), using `gmx mdrun -v -resetstep` or profiling tools (e.g., IPM, VTune).

## Part 2: AlphaFold vs PDB Structural Analysis

### Task

1. Use [**AlphaFold**](https://alphafoldserver.com/) to predict the structure of a **known protein** (e.g., GFP, UniProt ID: P42212).
   ![image](https://hackmd.io/_uploads/rk8lvmYlgx.png)
2. Download the corresponding [**PDB experimental structure**](https://www.rcsb.org/).
3. Perform a **structural comparison**:
   * Visualize both structures (suggested: PyMOL or VMD).
   * Align the two structures and calculate the backbone RMSD.
   * Discuss the differences, focusing on flexible regions, prediction accuracy, and AlphaFold confidence (pLDDT).

### Evaluation Criteria

| Item                                              | Weight |
| ------------------------------------------------- | ------ |
| Correct workflow (steps executed, files prepared) | 10%    |
| Results & analysis (visuals + report)             | 10%    |

* **Workflow:** Show a clear step-by-step procedure: AlphaFold prediction, PDB retrieval, structure alignment, RMSD calculation.
* **Analysis:** Visual comparison (screenshots), RMSD values, and a **brief discussion** (1 page max) of the observed differences and their potential reasons (e.g., experimental vs. predicted uncertainties).

# Presentation (30%)

- 6/4, during class time
- 7 min per team + 3 min Q&A
- Upload your slides to eeclass before **17:00 on 6/4**.

## Suggestions

- Describe how your team collaborated
  - How the work was divided
  - Policies or collaboration tools that made the team more efficient
  - Lessons learned about teamwork
- For each application, you can cover
  - Compilers and libraries used
  - Parameter-tuning techniques (HPL)
  - Profiling results
  - How the program was optimized
  - Scalability
  - etc.
- A suggested workflow for optimizing the programs & structuring the slides:
  1. Get the program to compile successfully
  2. Establish a baseline first
  3. Try different compilers, libraries, and program parameters
     - Compilers such as GCC, ICC, ICX
     - BLAS libraries such as OpenBLAS, MKL
     - MPI libraries such as Intel MPI, OpenMPI
  4. Do basic profiling
     - Tools such as perf, VTune, Nsight Systems, AMD uProf
  5. Optimize & experiment
     - Identify likely optimization directions from the profile
     - Try to implement the optimizations
     - Observe the results & compare the profiles before and after
- Watch the time! Present the parts that matter most, where you put the most effort!
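The scaling plots suggested for the GROMACS report need one `ns/day` number per core count, which GROMACS prints on the `Performance:` line at the end of `md.log`. A minimal collection sketch; the `run_*/md.log` directory layout is only an assumed placeholder for however you organize your runs:

```shell
# Collect the ns/day figure from each GROMACS run log.
# GROMACS ends md.log with a line like:
#   Performance:       33.594        0.714
# where the columns are ns/day and hour/ns; $2 is ns/day.
for log in run_*/md.log; do
    nsday=$(awk '/^Performance:/ { print $2 }' "$log")
    echo "$log $nsday"
done
```

Piping the output into a CSV makes it straightforward to plot cores vs. ns/day with any plotting tool.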