# Notes on mpEDM
## Code Structure
Source files are located under [`src/`](https://github.com/keichi/mpEDM/tree/master/src).
### User-facing classes
- `simplex_*.cc`: Predicts a time series using the simplex projection algorithm.
- `embedding_dim_*.cc`: Finds the optimal embedding dimension of a time series using simplex projection.
- `cross_mapping_*.cc`: Performs cross mapping from one time series to multiple other time series.
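
As a rough illustration of what simplex projection does, the sketch below predicts the next value of a time series from its E + 1 nearest neighbors in delay-embedded space. It is a minimal, unoptimized version written for these notes; the function name `simplex_predict`, its parameters, and the weight floor of `1e-6` are assumptions and do not reflect mpEDM's actual interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of mpEDM): predicts the value Tp steps after
// the last point of `ts` using simplex projection with embedding dimension E
// and lag tau.
double simplex_predict(const std::vector<double> &ts, std::size_t E,
                       std::size_t tau = 1, std::size_t Tp = 1)
{
    const std::size_t offset = (E - 1) * tau; // first index with a full history
    assert(E >= 1 && Tp >= 1 && ts.size() > offset + Tp + E);

    const std::size_t target = ts.size() - 1; // time index of the query point
    // Library points are the times t in [offset, target - Tp], so that the
    // value ts[t + Tp] is already known for every library point.
    const std::size_t n_lib = target - Tp - offset + 1;

    // Euclidean distance from the query point to every library point.
    std::vector<double> dist(n_lib);
    for (std::size_t i = 0; i < n_lib; i++) {
        double sum = 0.0;
        for (std::size_t k = 0; k < E; k++) {
            const double d = ts[target - k * tau] - ts[offset + i - k * tau];
            sum += d * d;
        }
        dist[i] = std::sqrt(sum);
    }

    // Indices of the E + 1 nearest library points.
    std::vector<std::size_t> idx(n_lib);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::partial_sort(idx.begin(), idx.begin() + E + 1, idx.end(),
                      [&](std::size_t a, std::size_t b) { return dist[a] < dist[b]; });

    // Exponential weights scaled by the smallest distance. The floor of 1e-6
    // is an assumption of this sketch that keeps the total weight non-zero.
    const double min_dist = std::max(dist[idx[0]], 1e-12);
    double weight_sum = 0.0;
    double prediction = 0.0;
    for (std::size_t j = 0; j <= E; j++) {
        const double w = std::max(std::exp(-dist[idx[j]] / min_dist), 1e-6);
        weight_sum += w;
        prediction += w * ts[offset + idx[j] + Tp];
    }
    return prediction / weight_sum;
}
```

Finding the optimal embedding dimension then amounts to repeating such predictions for several values of `E` and keeping the one with the best forecast skill.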
### Internal classes
- `nearest_neighbors_*.cc`: k-nearest neighbor search. The CPU version uses OpenMP for multi-core and SIMD parallelization. The GPU version uses ArrayFire.
- `data_frame.cc`: Class to hold the input dataset (a column-major 2D array). Supports reading data from a CSV file or an HDF5 file.
- `lut.cc`: Lookup table for holding the pre-computed nearest neighbors.
- `mpi_master.cc`, `mpi_worker.cc`: Utility classes for distributing computation across multiple nodes using a master-worker model.
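
To make the relationship between the nearest neighbor search and the lookup table concrete, here is a brute-force sketch that fills a table of neighbor indices and distances, parallelized over query points with OpenMP. The `LookupTable` struct and `build_lut` function are hypothetical names chosen for this example, and the row-major point layout is an assumption; the real classes may be organized differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical lookup table: for each of n_rows points, the indices of and
// distances to its k nearest neighbors, stored row-major (n_rows x k).
struct LookupTable {
    std::size_t n_rows = 0;
    std::size_t k = 0;
    std::vector<std::size_t> indices;
    std::vector<double> distances;
};

// Brute-force k-NN over `n` points of dimension `dim`, stored row-major in
// `points`. Each point is excluded from its own neighbor list.
LookupTable build_lut(const std::vector<double> &points, std::size_t n,
                      std::size_t dim, std::size_t k)
{
    assert(points.size() == n * dim && k >= 1 && k < n);

    LookupTable lut;
    lut.n_rows = n;
    lut.k = k;
    lut.indices.resize(n * k);
    lut.distances.resize(n * k);

    // Each iteration writes only to its own row of the table, so iterations
    // are independent. The pragma is ignored when OpenMP is not enabled.
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(n); i++) {
        const std::size_t row = static_cast<std::size_t>(i);
        std::vector<double> dist(n);
        std::vector<std::size_t> idx(n);
        std::iota(idx.begin(), idx.end(), std::size_t{0});

        for (std::size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (std::size_t d = 0; d < dim; d++) {
                const double diff = points[row * dim + d] - points[j * dim + d];
                sum += diff * diff;
            }
            dist[j] = std::sqrt(sum);
        }
        dist[row] = std::numeric_limits<double>::infinity(); // exclude self

        std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                          [&](std::size_t a, std::size_t b) { return dist[a] < dist[b]; });

        for (std::size_t r = 0; r < k; r++) {
            lut.indices[row * k + r] = idx[r];
            lut.distances[row * k + r] = dist[idx[r]];
        }
    }
    return lut;
}
```

Pre-computing the table once lets later prediction steps reuse the same neighbors instead of repeating the search.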
Automated unit tests are located under [`test/`](https://github.com/keichi/mpEDM/tree/master/test). Output is validated against ground-truth data generated using pyEDM (cppEDM).
## Stand-alone applications
When building mpEDM, several stand-alone applications are generated in addition to the shared library (`libmpedm.so`).
- `cross_mapping_bench`: This is the cross mapping application we used for the fish brain project. It performs an all-to-all cross mapping of the input time series and writes the CCM rho values to a file.
- `cross_mapping_mpi_bench`: This is the cross mapping application we used for multi-node runs. It requires a Message Passing Interface (MPI) implementation.
- `simplex_bench`, `knn_bench`: These are used for performance measurement/optimization purposes only and do not produce any output.
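
For readers unfamiliar with the CCM rho values mentioned above: in convergent cross mapping, rho conventionally denotes the Pearson correlation between the observed values of a time series and the values predicted by cross mapping. The sketch below computes it under that assumption; `pearson_rho` is a hypothetical name and not necessarily how mpEDM implements the calculation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mpEDM): Pearson correlation between the
// observed and the cross-mapped (predicted) values of one time series.
// Returns NaN if either input is constant.
double pearson_rho(const std::vector<double> &observed,
                   const std::vector<double> &predicted)
{
    assert(observed.size() == predicted.size() && observed.size() >= 2);
    const double n = static_cast<double>(observed.size());

    double mean_x = 0.0;
    double mean_y = 0.0;
    for (std::size_t i = 0; i < observed.size(); i++) {
        mean_x += observed[i];
        mean_y += predicted[i];
    }
    mean_x /= n;
    mean_y /= n;

    // Accumulate the covariance and the two variances (unnormalized).
    double sxy = 0.0;
    double sxx = 0.0;
    double syy = 0.0;
    for (std::size_t i = 0; i < observed.size(); i++) {
        const double dx = observed[i] - mean_x;
        const double dy = predicted[i] - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

An all-to-all run evaluates this quantity once per ordered pair of time series, which is why the output grows with the square of the number of time series.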
## Performance Numbers
Performance measurements were done on AIST's [ABCI](https://abci.ai/) supercomputer using the GPU backend of mpEDM. Each compute node in ABCI is equipped with the following hardware:
- 4x NVIDIA Tesla V100 SXM2 GPUs
- 2x Intel Xeon Gold 6148 CPUs
The table below shows the total runtime to complete an all-to-all cross mapping (cross maps from every time series to every other time series) of the input dataset.
| Dataset | # of Time Steps | # of Time Series | Runtime (Single Node) | Runtime (512 Nodes) |
|:----------- | ---------------:| ----------------:| -----------:| ---------:|
| Fish1_Normo | 1,450 | 53,053 | 1,973s | 20s |
| Subject6 | 3,780 | 92,538 | 13,953s | 101s |
| Subject11 | 8,528 | 101,729 | 39,572s | 199s |
| Fly80XY | 10,608 | 83 | 14s | N/A |
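
The multi-node runtimes are easier to interpret as speedup and parallel efficiency. The helpers below are a small sketch for deriving both from the table; the function names are made up for this example, and efficiency is defined here as speedup divided by node count.

```cpp
#include <cassert>

// Speedup of a multi-node run relative to the single-node run.
double speedup(double t_single, double t_multi)
{
    assert(t_single > 0.0 && t_multi > 0.0);
    return t_single / t_multi;
}

// Parallel efficiency: fraction of ideal linear scaling that was achieved.
double parallel_efficiency(double t_single, double t_multi, int nodes)
{
    assert(nodes >= 1);
    return speedup(t_single, t_multi) / static_cast<double>(nodes);
}
```

Plugging in the table values, `speedup(1973, 20)` gives roughly 99x for Fish1_Normo (about 19% efficiency on 512 nodes), `speedup(13953, 101)` roughly 138x for Subject6 (about 27%), and `speedup(39572, 199)` roughly 199x for Subject11 (about 39%). In these measurements, efficiency rises with dataset size.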