# 8 Introduction to MPI
###### tags: `SS2021-IN2147-PP`
## Basics
### The Message Passing Interface (MPI)
* http://www.mpi-forum.org/
* An API (Application Programming Interface) specification
* No ABI (Application Binary Interface) guarantees
* Changing MPI implementation means recompiling
### MPI Principles and/vs. Practices
#### Common practice
* One source code compiled to one binary
* Within the code, behavior is distinguished by MPI process (rank) or by the data elements each process works on
* Common usage model: **`Single Program Multiple Data (SPMD)`**
* Compiled against same MPI implementation
* One datatype representation in the entire system
* MPI implementations provide a startup mechanism
* Often integrated into resource manager and job scheduler
* Platform specific
#### Note
* MPI is **`NOT`** an SPMD model
* Often used as such
* MPI is more flexible
* Originally intended as runtime portability layer
## A First Example
### A First Simple Example: Value Forwarding

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main (int argc, char** argv)
{
    double value;
    int size, rank;
    MPI_Status s;

    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &size);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    /* Every process takes a time stamp; rank 0's value is the one forwarded */
    value = MPI_Wtime();
    printf("MPI Process %d of %d (value=%f)\n", rank, size, value);

    /* Forward the value along the chain 0 -> 1 -> ... -> size-1 */
    if (rank > 0)
        MPI_Recv(&value, 1, MPI_DOUBLE, rank-1, 0, MPI_COMM_WORLD, &s);
    if (rank < size-1)
        MPI_Send(&value, 1, MPI_DOUBLE, rank+1, 0, MPI_COMM_WORLD);
    if (rank == size-1)
        printf("Value from MPI Process 0: %f\n", value);

    MPI_Finalize();
}
```
### MPI_Init and MPI_Finalize
```c
// Initializes the MPI library
// argc/argv pair, typically passed from main
int MPI_Init( int *argc, char ***argv )
// Finishes the use of MPI and releases all resources
int MPI_Finalize( void )
```
### Error Handling in MPI
> By default, an error detected during the execution of the MPI library causes the parallel computation to abort, except for file operations. However, MPI provides mechanisms for users to change this default and to handle recoverable errors.
> from MPI Standard 3.1, Section 2.8, page 21
* (Almost) every MPI routine returns an error code
* In most cases, MPI simply aborts the computation
* Newer versions of the standard will improve this mechanism
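The default can be replaced per communicator. A minimal sketch, assuming the predefined `MPI_ERRORS_RETURN` handler: MPI calls then return error codes instead of aborting, and `MPI_Error_string` translates them into readable text (the out-of-range destination rank below exists only to provoke an error):
```c
#include <stdio.h>
#include <mpi.h>

int main (int argc, char** argv)
{
    int size, err, len;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Return error codes instead of aborting (default: MPI_ERRORS_ARE_FATAL) */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Provoke an error: rank 'size' is out of range (illustrative only) */
    err = MPI_Send(NULL, 0, MPI_INT, size, 0, MPI_COMM_WORLD);
    if (err != MPI_SUCCESS) {
        MPI_Error_string(err, msg, &len);
        printf("MPI error: %s\n", msg);
    }

    MPI_Finalize();
}
```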
### Communicators and MPI_COMM_WORLD
#### Central concept in MPI: Communicators
* Group of MPI processes
* Communication context
* Communication across communicators is not possible
* Different contexts
* But: one MPI process can be in multiple communicators (see the `MPI_Comm_split` sketch below)
#### Default communicators
```c
// All initial MPI processes
MPI_COMM_WORLD
// Contains only the own MPI process
MPI_COMM_SELF
```
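A minimal sketch of membership in multiple communicators: `MPI_Comm_split` partitions `MPI_COMM_WORLD` by a color value, so every process ends up both in the world communicator and in a sub-communicator (the even/odd split is only an illustrative choice):
```c
#include <stdio.h>
#include <mpi.h>

int main (int argc, char** argv)
{
    int wrank, srank;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);

    /* Color by parity: even ranks form one communicator, odd ranks another */
    MPI_Comm_split(MPI_COMM_WORLD, wrank % 2, wrank, &subcomm);
    MPI_Comm_rank(subcomm, &srank);

    printf("World rank %d has rank %d in its sub-communicator\n", wrank, srank);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
}
```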
### `MPI Process` vs. `OS Process` vs. `Rank`
* Basic units of concurrency in MPI
* Not the same as `OS processes`
* In most cases `MPI processes` are implemented as `OS processes`
* This is NOT guaranteed!
* CEA’s MPC: an MPI process is one thread
* Running on one or more OS processes (one per node)
* Global data is then shared among co-located MPI processes
* `Rank` is an identifier of an MPI process
### MPI_Comm_size and MPI_Comm_rank
```c
int MPI_Comm_size (MPI_Comm comm, int *size)
// IN comm Communicator
// OUT size Cardinality of the process group for comm
int MPI_Comm_rank (MPI_Comm comm, int *rank)
// IN comm Communicator
// OUT rank Rank of the calling process in comm
```
### MPI_Send and MPI_Recv
```c
// Sends one message to another MPI process in a given communicator
// Blocking: possibility of deadlocks!
int MPI_Send (const void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm)
// Receives one message from another MPI process in a given communicator
// Blocking: possibility of deadlocks!
// Partial receives are permitted
int MPI_Recv (void *buf, int count, MPI_Datatype dtype, int source, int tag, MPI_Comm comm, MPI_Status *status)
```
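The deadlock risk can be made concrete: if two processes both call `MPI_Send` to each other first, both may block forever waiting for a matching `MPI_Recv`. A minimal sketch of one standard remedy, the combined `MPI_Sendrecv` call (assumes at least two processes; ranks beyond 0 and 1 simply idle):
```c
#include <stdio.h>
#include <mpi.h>

int main (int argc, char** argv)
{
    int rank, sendval, recvval;
    MPI_Status s;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    sendval = rank;

    if (rank < 2) {
        int partner = 1 - rank;  /* 0 <-> 1 */
        /* Deadlock-prone: both partners calling MPI_Send first may block.
           MPI_Sendrecv performs the exchange safely in a single call. */
        MPI_Sendrecv(&sendval, 1, MPI_INT, partner, 0,
                     &recvval, 1, MPI_INT, partner, 0,
                     MPI_COMM_WORLD, &s);
        printf("Rank %d received %d\n", rank, recvval);
    }

    MPI_Finalize();
}
```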
### Most Common MPI Data Types (C Versions)
```c
MPI_CHAR
MPI_SHORT
MPI_INT
MPI_LONG
MPI_LONG_LONG
MPI_UNSIGNED_SHORT
MPI_UNSIGNED
MPI_UNSIGNED_LONG
MPI_UNSIGNED_LONG_LONG
MPI_FLOAT
MPI_DOUBLE
MPI_C_COMPLEX
MPI_C_DOUBLE_COMPLEX
```
### MPI_Status
```c
// MPI_Status: structure containing information on an incoming message
// (public fields: MPI_SOURCE, MPI_TAG, MPI_ERROR)
// Returns the number of elements of type dtype actually received
int MPI_Get_count(const MPI_Status *status, MPI_Datatype dtype, int *count)
```
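A minimal sketch of how the status object is used together with a partial receive: the receiver offers more buffer space than the message needs, then queries the actual element count and the sender/tag via the status (run with at least two processes):
```c
#include <stdio.h>
#include <mpi.h>

int main (int argc, char** argv)
{
    int rank, count;
    double buf[10] = {0};
    MPI_Status s;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Send only 3 elements although the receiver allows up to 10 */
        MPI_Send(buf, 3, MPI_DOUBLE, 1, 42, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, 10, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &s);
        MPI_Get_count(&s, MPI_DOUBLE, &count);
        printf("Got %d elements from rank %d with tag %d\n",
               count, s.MPI_SOURCE, s.MPI_TAG);
    }

    MPI_Finalize();
}
```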
### MPI_Wtime and MPI_Wtick
```c
// Returns wall-clock time since a reference time stamp in the past
double MPI_Wtime(void)
// Returns resolution of MPI_Wtime
double MPI_Wtick(void)
```
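A minimal timing sketch: only the difference of two `MPI_Wtime` stamps is meaningful, since the reference point is arbitrary (the `sleep(1)` below is just a stand-in for the work being measured):
```c
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main (int argc, char** argv)
{
    double t0, t1;

    MPI_Init(&argc, &argv);

    t0 = MPI_Wtime();
    sleep(1);              /* stand-in for the work being measured */
    t1 = MPI_Wtime();

    printf("Elapsed: %f s (timer resolution: %g s)\n", t1 - t0, MPI_Wtick());

    MPI_Finalize();
}
```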
## The MPI Standard
### Going Beyond the 6 Basic MPI Functions: Init, Finalize, Get-Rank, Get-Size, Send, Recv
#### Point-to-Point operations
* Different variants of MPI_Send
* `Non-blocking` and persistent versions
#### Collective operations
* Group operations across all processes of a communicator
* Examples: **`barriers`**, **`reductions`**, **`scatter`**/**`gather`**
* `Neighborhood` and `non-blocking` collectives
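As a sketch of one such group operation, a reduction: every process contributes one value (its rank here, purely illustrative) and `MPI_Reduce` combines them with `MPI_SUM` at root 0:
```c
#include <stdio.h>
#include <mpi.h>

int main (int argc, char** argv)
{
    int rank, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Combine one int per process with MPI_SUM; the result lands at root 0 */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of all ranks: %d\n", sum);

    MPI_Finalize();
}
```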
#### Group and communicator management
* Creation of groups and communicators
* Mapping of communicators to topologies
#### One-sided communication (Remote Memory Access or RMA)
#### File I/O
### History of the Message Passing Interface

### Notable Additions to MPI 3.0
* `Nonblocking` collectives
* `Neighborhood` collectives
* MPI Tool Information Interface
* `One-sided` communication enhancements
* Large data counts (messages with more than a 32-bit element count)
* Topology-aware communicator creation
* Noncollective communicator creation
* New language bindings
### MPI Continues to Evolve

* http://www.mpi-forum.org/
* MPI 3.0 ratified in September 2012
* MPI 3.1 ratified in June 2015
* MPI 4.0 planned for early next year / draft in November
### Likely New Functionality in MPI 4.0
* Persistent Collectives
* Enables optimizations for repeated operations over groups
* Partitioned Communication
* Enables send/recv of partial messages as soon as data is ready
* Better Fault Handling
* Support for clean and scoped error reporting
* New Tools Interface
* Callbacks for internal MPI events
* Topology Support
* Query of hardware topologies and options for improved mappings
* Better Scalability with MPI Sessions
* Elimination of MPI_COMM_WORLD to avoid the need for a global startup
* And more ...