# Testing with offline data
See [SwarmPAL, Including test data](https://github.com/orgs/Swarm-DISC/projects/1?pane=issue&itemId=46530806).
## Outline
Wishlist:

* Package user
  * Should not have to download test data, even when using `git clone`
* Developing a new feature
  * Unit tests should run fast
  * Initial data download should be automatic and pain-free
* Adding test data
  * Should take a small number of steps
  * Any contributor should be able to add new data sets

| Use case | Static site with rsync/wget | [pooch][POOCH] | [git-lfs][GITLFS] |
| --- | --- | --- | --- |
| Package user | | | it appears that `git clone` will fetch the data |
| Developing a feature | need to run `download_data.sh` | | |
| Adding unit tests | contributors need ssh access to the static site | add data to a second repo; keep branch/tag names consistent across the two repos | familiar git workflow, but unclear how well it integrates with tooling |
| CI | new dependencies (rsync, wget, md5sum); run a script in the workflow | | one user reported difficulty in [CI][GITLFS_CI] |
[POOCH]: https://www.fatiando.org/pooch/latest/sample-data.html#basic-setup
[GITLFS]: https://git-lfs.com/
[GITLFS_CI]: https://old.reddit.com/r/git/comments/uxaca7/when_to_use_git_lfs_large_files_storage/i9y1309/
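For context on the pooch column: pooch keeps a registry of file names and checksums, and fetches a file only when it is missing from the local cache or its hash does not match. A minimal stdlib sketch of that pattern follows; the base URL is the GeoSciences static site above, but the file name, checksum, and `fetch` helper are hypothetical placeholders, not SwarmPAL or pooch API:

```python
import hashlib
import urllib.request
from pathlib import Path

# Hypothetical registry: file name -> expected MD5 (placeholder values).
BASE_URL = "https://www.geos.ed.ac.uk/~ddekler/swarmpal_data/"
REGISTRY = {"example_dataset.nc": "d41d8cd98f00b204e9800998ecf8427e"}


def md5sum(path: Path) -> str:
    """MD5 of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(name: str, cache_dir: Path = Path(".cache/swarmpal_data")) -> Path:
    """Download `name` into the cache unless a copy with the right hash exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not (target.exists() and md5sum(target) == REGISTRY[name]):
        urllib.request.urlretrieve(BASE_URL + name, str(target))
        if md5sum(target) != REGISTRY[name]:
            raise RuntimeError(f"checksum mismatch for {name}")
    return target
```

Pooch adds conveniences on top of this pattern (OS-appropriate cache locations, multiple hash algorithms, download progress), which is why the table considers it despite the extra dependency.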
## Work in progress
Separate files:

* `config.yaml` describes the list of datasets and the expected results that unit tests should produce
* `generate_data.py` downloads data with SwarmPAL and saves the result as a NetCDF file
* `sync_data.sh` uses `rsync` to upload the data to the GeoSciences servers, where it is hosted as a [static site](https://www.geos.ed.ac.uk/~ddekler/swarmpal_data/)
* `download_data.sh` uses `wget` to download the files from the GeoSciences servers
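The shape of `config.yaml` is still open; one possible layout, with hypothetical dataset names, field names, and checksum values (not the actual schema), pairs each dataset's download parameters with the expected values that unit tests would check:

```yaml
# Hypothetical layout; all field names and values are placeholders.
datasets:
  - name: example_magnetic_lr
    collection: SW_OPER_MAGA_LR_1B
    start: "2020-01-01T00:00:00"
    end: "2020-01-01T01:00:00"
    output: example_magnetic_lr.nc
    expected:
      n_records: 3600
      md5: "d41d8cd98f00b204e9800998ecf8427e"  # placeholder checksum
```

Keeping the expected results next to the download parameters would let `generate_data.py` and the unit tests share a single source of truth.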