From 2022 OSU CSE5449 Lab 1
The lab has the following tasks.
Log in to the Owens cluster at OSC with your account and learn how to allocate nodes. You can allocate CPU-only nodes as well as nodes with GPUs.
Login:
To allocate a node with GPU, use:
To allocate a CPU-only node, use:
Please note that you need to add '-A PAS2312' at the end of the allocation command to be able to use the allocated resources.
Please use Owens for this lab.
After allocating a node, SSH into the compute node to install Miniconda and OMB.
Node name: To get the names of the allocated nodes, use the following command:
Then SSH into the node:
Create directories
Load CUDA and GCC (Run this command after every new allocation)
Load MVAPICH2 (Run this command after every new allocation)
OMB will be installed in ~/owens/lab1/osu-micro-benchmarks-6.1/installed
Only required to install MPI4py
Once the conda environment is created and the libraries are installed, you only need to load them in future experiments using the following commands:
You need to run the following experiments for the Lab.
To allocate two nodes with GPUs, use:
Run these commands after every new allocation:
Allocate 2 nodes and run OSU Micro-Benchmarks (OMB) on Owens.
Point-to-point:
GPU:
CPU:
flags:
D: transfer data to/from GPU
H: transfer data to/from CPU
-m: max message size
--help: for more information
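To make the effect of the -m flag concrete, here is a minimal sketch of the message sizes a run would report. It assumes the benchmark doubles the message size at each step up to the -m limit, which matches the usual layout of OMB output. The function name and the default minimum of 1 byte are illustrative choices, not part of OMB.

```python
def message_sizes(max_size, min_size=1):
    """List the message sizes (in bytes) from min_size up to max_size, doubling each step.

    Assumption: the benchmark sweeps sizes in powers of two; min_size=1 is
    only an illustrative default, not a documented OMB value.
    """
    if min_size < 1 or max_size < min_size:
        raise ValueError("need 1 <= min_size <= max_size")
    sizes = []
    size = min_size
    while size <= max_size:
        sizes.append(size)
        size *= 2
    return sizes


# Example: a run limited to 4 MiB messages would cover these sizes.
sweep = message_sizes(4 * 1024 * 1024)
print(len(sweep), "sizes from", sweep[0], "to", sweep[-1], "bytes")
```

Knowing the sweep in advance makes it easier to check that the CPU and GPU runs you compare later cover the same sizes.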
To run osu_bibw on GPU, use the following command:
Collectives:
-d cuda: Place buffer on GPU
Allocate 2 nodes and run OSU Micro-Benchmarks for Python (OMB-Py) on Owens.
Numba and CuPy buffers are allocated on the GPU, and NumPy buffers are created on the CPU.
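The buffer placement described above can be sketched in plain mpi4py. This is a minimal illustration, not the OMB-Py implementation: the helper names (`allocate_buffer`, `pingpong_latency_us`, `main`) are hypothetical, and sending CuPy or Numba buffers assumes a CUDA-aware MPI library and an mpi4py version that accepts GPU arrays.

```python
import time

# Where each buffer type lives, as stated in the note above.
BUFFER_DEVICE = {"numpy": "cpu", "cupy": "gpu", "numba": "gpu"}


def allocate_buffer(kind, nbytes):
    """Allocate an nbytes-long byte buffer with the named library.

    Imports are lazy so only the library actually requested must be installed.
    """
    if kind == "numpy":
        import numpy as np
        return np.zeros(nbytes, dtype=np.uint8)           # host (CPU) memory
    if kind == "cupy":
        import cupy as cp
        return cp.zeros(nbytes, dtype=cp.uint8)           # device (GPU) memory
    if kind == "numba":
        import numpy as np
        from numba import cuda
        return cuda.device_array(nbytes, dtype=np.uint8)  # device (GPU) memory
    raise ValueError(f"unknown buffer kind: {kind!r}")


def pingpong_latency_us(comm, buf, iterations=100):
    """Return the average one-way latency in microseconds between ranks 0 and 1."""
    rank = comm.Get_rank()
    peer = 1 - rank
    comm.Barrier()
    start = time.perf_counter()
    for _ in range(iterations):
        if rank == 0:
            comm.Send(buf, dest=peer)
            comm.Recv(buf, source=peer)
        else:
            comm.Recv(buf, source=peer)
            comm.Send(buf, dest=peer)
    elapsed = time.perf_counter() - start
    # Each iteration is one round trip, i.e. two one-way messages.
    return elapsed / (2 * iterations) * 1e6


def main():
    # Requires mpi4py and exactly two MPI ranks.
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    buf = allocate_buffer("numpy", 1024)  # switch to "cupy" or "numba" for GPU buffers
    latency = pingpong_latency_us(comm, buf)
    if comm.Get_rank() == 0:
        print(f"1024 bytes: {latency:.2f} us")
```

To try it, call `main()` at the end of the script and launch it on two ranks with the same launcher you use for the benchmarks. The timing loop here has no warm-up iterations, so expect the numbers to differ from what the benchmark suite reports.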
To run the bibw benchmark on GPU (Numba and CuPy), use the following command:
Based on your experiments, answer the following:
Submit a report in .docx or .pdf format through Carmen.
Questions 1 and 2: paste the output of each experiment/benchmark together with the run command (srun …..).
Questions 3 and 4: compare the performance numbers for CPU and GPU in experiment 1/2 for different message sizes, and give reasons why CPU communication may be faster.
Question 5: performance comparison and a list of reasons.
Question 6: a 2-3 line description of MPI4py and a list of reasons why we need it.
Question 7: a table.
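For the CPU versus GPU comparison, a small script can line up two saved benchmark outputs by message size. The sketch below assumes the usual OMB text layout (header lines starting with `#`, then one `size value` row per message size). The function names are hypothetical, and the sample text contains made-up placeholder values, not measurements.

```python
def parse_omb_output(text):
    """Map message size (bytes) -> reported value, skipping '#' header lines."""
    results = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        try:
            results[int(fields[0])] = float(fields[1])
        except ValueError:
            continue  # not a data row
    return results


def comparison_rows(cpu, gpu):
    """Return (size, cpu_value, gpu_value, gpu/cpu ratio) for sizes present in both runs."""
    rows = []
    for size in sorted(cpu.keys() & gpu.keys()):
        ratio = gpu[size] / cpu[size] if cpu[size] else float("inf")
        rows.append((size, cpu[size], gpu[size], ratio))
    return rows


def format_table(rows, unit="us"):
    """Render the comparison rows as a fixed-width text table."""
    cpu_label = f"CPU ({unit})"
    gpu_label = f"GPU ({unit})"
    lines = [f"{'Size (B)':>10} {cpu_label:>12} {gpu_label:>12} {'GPU/CPU':>8}"]
    for size, cpu_value, gpu_value, ratio in rows:
        lines.append(f"{size:>10} {cpu_value:>12.2f} {gpu_value:>12.2f} {ratio:>8.2f}")
    return "\n".join(lines)


# Placeholder input in the assumed layout; the values are made up, not measurements.
SAMPLE_CPU = """# Size          Latency (us)
1                       1.00
2                       1.00
"""
SAMPLE_GPU = """# Size          Latency (us)
1                       2.00
2                       3.00
"""

rows = comparison_rows(parse_omb_output(SAMPLE_CPU), parse_omb_output(SAMPLE_GPU))
print(format_table(rows))
```

Replace the two sample strings with the contents of your own output files. For bandwidth benchmarks, pass a different `unit` (for example `"MB/s"`) and remember that a higher value is then better, so the ratio reads the other way around.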
Question 1: 3 points for each experiment (latency, bw, bibw, bcast, and allreduce).
Question 2: 2 points for each experiment (latency, bw, bibw, bcast, and allreduce).
Academic Integrity
Collaborating or completing the assignment with help from others or as a group is NOT permitted.
Copying or reusing previous work done by others is not permitted. You may reuse the work you did as part of this class.
OMB: https://mvapich.cse.ohio-state.edu/benchmarks/
Miniconda: https://docs.conda.io/en/latest/miniconda.html
MPI4py: https://mpi4py.readthedocs.io/en/stable/
https://ieeexplore.ieee.org/document/9439927
MVAPICH2: https://mvapich.cse.ohio-state.edu/
Owens: https://www.osc.edu/resources/technical_support/supercomputers/owens
Pitzer: https://www.osc.edu/resources/technical_support/supercomputers/pitzer