Dask Tasking (Discussion postponed)

###### tags: `proposal` `idpi`

## Experimental Results

This section documents the results obtained using dask through xarray.

Reference: M:\MCH-OS\2\4\1\1\241.1-11 Reference Version and Implementation 2016-2025\PROJ RZ+\Meetings\APN RZ+ Review Meetings\20221125\20221125_dask.pptx

The parallel model employed chunks the data and applies task parallelism to each chunk.

![](https://hackmd.io/_uploads/rkYO5lH4h.png)

The document summarizing the results describes the difficulty of obtaining good scalability, caused by the overhead of the sequential work in dask. There is a trade-off between exposing more parallelism and constructing tasks that are too small, where the computation is small compared to the overheads.

![](https://hackmd.io/_uploads/H10kixrEn.png)

These limitations can only be mitigated by creating bigger (more computationally expensive) tasks. This is shown by experiments that collect multiple instances (xN) of the same BRN computation.

![](https://hackmd.io/_uploads/HkQNjeH42.png)

While the scalability results improved, they are still far from ideal scaling and de facto limit scalability to one node. This is in contrast with the results obtained using data parallelism for the domain (instead of task parallelism).

![](https://hackmd.io/_uploads/S13qierNh.png)

(reference: M:\MCH-OS\2\4\1\1\241.1-11 Reference Version and Implementation 2016-2025\PROJ RZ+\Meetings\APN RZ+ Review Meetings\20230427\20230427_osc_EarthKitEvaluation.pptx)

## Proposal

These results indicate that, for our domain sizes, the task parallelism model using chunked data yields poor scalability. Additionally, in many cases the time-to-solution requirements will not require parallelization of individual products. The proposal is therefore to explore:

* dask delayed for tasks generated around products. That is, we do not parallelize based on chunks of data; instead, each product (on its entire domain) is an individual task that is scheduled through dask.
* alternative task parallelism runtime systems (like [legion](https://legion.stanford.edu/overview/index.html)) in order to compare scalability.
* For the cases where more vertical scalability is required, i.e. expensive products that do require intra-node parallelism, [numba](https://numba.pydata.org/) will be explored.
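
The first item of the proposal, one dask task per product instead of one per chunk, could look roughly like the sketch below. It is only an illustration under assumptions: `compute_product`, the product names and the list standing in for the domain are hypothetical placeholders, not part of any existing code base.

```python
import dask
from dask import delayed


def compute_product(name, field):
    """Hypothetical stand-in for one product computed on its entire domain."""
    # Placeholder arithmetic; a real product would run its full algorithm here.
    return name, sum(value * value for value in field)


# Hypothetical input: a single small "domain" shared by all products.
domain = [float(i) for i in range(1000)]
product_names = ["product_a", "product_b", "product_c"]

# Build one delayed task per product; the domain is NOT split into chunks.
tasks = [delayed(compute_product)(name, domain) for name in product_names]

# dask.compute runs the independent tasks and returns a tuple of results,
# here (name, value) pairs that are collected into a dictionary.
results = dict(dask.compute(*tasks))
```

With this layout the number of tasks equals the number of products, so the scheduling overhead no longer grows with the number of chunks. Whether the resulting task sizes are large enough to scale well would still have to be measured.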