---
tags: OpenDreamKit, data
---
# Persistent memoization
## References
- [ODK D6.9](https://github.com/OpenDreamKit/OpenDreamKit/issues/143): shared persistent memoization library for Python/Sage
- [T6.9](https://github.com/OpenDreamKit/OpenDreamKit/issues/131): Memoisation and production of new data M24-M42@.612 PM; Sites: USTAN (lead), USlaski, UPSud, UWarwick
## Context
Many CAS users run large and intensive computations, for which they want to collect the results while simultaneously working on software improvements. GAP retains computed attribute values of objects within a session; Sage currently has a limited cached method. Neither offers storage that is persistent across sessions, and neither supports publishing results or sharing them within a collaboration. We will use, extend, and contribute back to an appropriate established persistent memoisation infrastructure, such as python-joblib, redis-simple-cache or dogpile.cache, adding features needed for storage and use of results in mathematical research. We will design something that is simple to deploy and configure and that makes it easy to share results in a controlled manner, while providing enough assurance for users to rely on the data, give proper credit to the original computation, and rerun the computation if they want to. Results are reported in D6.9.
## Aim
Bridge the gap between the two extremes of pure functions and key-value stores. One may think of it as providing users / systems with:
- pure functions with persistent memoization;
- lazy key-value stores where data are computed on the fly if needed.
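
The two views above can be shown to be the same object. The following is a minimal sketch, not a proposed design: the class name `LazyStore` is hypothetical, the standard-library `shelve` module is used purely as a stand-in backend, and `repr` of the input is assumed to be a good enough key.

```python
import os
import shelve
import tempfile


class LazyStore:
    """A function seen as a lazy, persistent key-value store (sketch)."""

    def __init__(self, func, path):
        self.func = func
        self.path = path  # file name prefix used by shelve

    def __getitem__(self, key):
        k = repr(key)  # assumption: repr identifies the input uniquely
        with shelve.open(self.path) as db:
            if k not in db:
                # Not stored yet: compute on the fly and persist.
                db[k] = self.func(key)
            return db[k]

    # Function view: calling is the same as looking up a key.
    __call__ = __getitem__


calls = []  # records which inputs were actually computed


def slow_square(n):
    calls.append(n)
    return n * n


path = os.path.join(tempfile.mkdtemp(), "squares")
squares = LazyStore(slow_square, path)
squares(12)   # function view: computed and written to disk
squares[12]   # store view: read back, not recomputed

# A fresh object on the same file stands in for a new session.
squares_again = LazyStore(slow_square, path)
squares_again[12]
```

After these calls `slow_square` has run only once, since the later lookups are served from disk.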
## Use cases
- Make it trivial for a researcher to hand-pick time-critical functions, and add persistent memoization
- Make it easy for a group of researchers to use a shared store
- Make it easy to publish the produced database
## Strategy
Many hard problems can arise, such as:
- serialization for one system
- serialization in a standard format for several systems
- computing unique keys
- guaranteeing that the data is correct
- atomicity
- access rights
- provenance tracking
We can't hope to solve them all for all situations.
Instead, the strategy is to build a lightweight modular architecture that delegates the responsibility for solving the hard problems, e.g. to:
- the user for providing appropriate key computation;
- the store for atomicity, access rights, provenance tracking;
- Math-in-the-Middle for sharing across systems.
Two implementations will be provided for Sage/Python and GAP respectively, but otherwise the focus will be on architecture, standards, and best practices.
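
One way to picture this delegation is a decorator that knows nothing about keys or storage: the user supplies the key computation, and the store is any object with a mapping interface. This is a sketch under those assumptions; the names `memoized`, `store` and `key` are illustrative only and are not taken from an existing package.

```python
import functools


def memoized(store, key):
    """Decorator factory (sketch).

    store: any mutable mapping; it is responsible for persistence,
           atomicity, access rights, and so on.
    key:   user-supplied function mapping the call arguments to a
           key that the store accepts.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            k = key(*args, **kwargs)
            try:
                return store[k]
            except KeyError:
                value = func(*args, **kwargs)
                store[k] = value
                return value
        return wrapper
    return decorator


# Stand-in store: a plain dict. A persistent or shared store would
# expose the same mapping interface.
store = {}


@memoized(store, key=lambda n: "factorial:%d" % n)
def factorial(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


factorial(10)
```

Swapping the dict for a disk-backed or networked mapping changes where results live without touching the decorated function.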
### Metadata
- Metadata for a given key--value store (store or function level)
    - OpenMath (?) description of what is stored
    - data format description / ontology
- Metadata of individual key--value pairs
    - Who computed it, with which system, software version, ...
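
As an illustration of the second kind of metadata, each stored value could be wrapped in a record that carries its provenance. The field names and the helper `with_metadata` below are assumptions made for this sketch, not an agreed format.

```python
import datetime
import json
import platform


def with_metadata(value, computed_by, system, version):
    """Wrap a value in a record carrying provenance information."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "value": value,
        "metadata": {
            "computed_by": computed_by,
            "system": system,
            "version": version,
            "computed_at": now.isoformat(),
        },
    }


record = with_metadata(
    value=[2, 3, 5, 7],
    computed_by="alice",  # hypothetical user name
    system=platform.python_implementation(),
    version=platform.python_version(),
)

# JSON is used here only as an example of a system-neutral format.
serialized = json.dumps(record, sort_keys=True)
```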
## Examples of key value stores
- a flat database on disk:
    - `keystore=flatstore(dir="...") # a local directory`
    - `keystore=flatstore(dir="...") # a directory on a shared drive (e.g. OwnCloud, DropBox, ...)`
    - `keystore=flatstore(dir="git@") # a git repository`
- a database:
    - `keystore=database(type: "mysql?", host, port)`
    - `keystore=database(type: "scscp", host, port)`
Potential nice candidate: [CouchDB](https://couchdb.apache.org/),
in particular thanks to its [simple web API](https://en.wikipedia.org/wiki/Apache_CouchDB)
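
To make the flat database on disk more concrete, here is a sketch of what a local-directory store might look like as a Python mapping. The class `FlatStore` is hypothetical, keys are assumed to be strings, and values are pickled, which only covers the single-system case. Writes go through a temporary file followed by `os.replace`, so that a reader should not see a half-written entry when both files are on the same filesystem.

```python
import collections.abc
import hashlib
import os
import pickle
import tempfile


class FlatStore(collections.abc.MutableMapping):
    """One pickle file per key in a single directory (sketch)."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so that any string gives a valid file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".pickle")

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return pickle.load(f)[1]
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        # Write to a temporary file, then rename it into place.
        fd, tmp = tempfile.mkstemp(dir=self.directory, suffix=".tmp")
        with os.fdopen(fd, "wb") as f:
            pickle.dump((key, value), f)
        os.replace(tmp, self._path(key))

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        # The original key is stored next to the value in each file.
        for name in os.listdir(self.directory):
            if name.endswith(".pickle"):
                with open(os.path.join(self.directory, name), "rb") as f:
                    yield pickle.load(f)[0]

    def __len__(self):
        return sum(1 for _ in self)


keystore = FlatStore(tempfile.mkdtemp())
keystore["partitions(5)"] = 7
```

Pointing the same class at a directory on a shared drive would give the second variant in the list; the git and database variants need more than this sketch covers.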
# Report
Deliverable report due Thu 28 Feb 2019.
## Things to put in report
- Ticked checklist of good software practice from the ODK website.
- Table of other implementations (alive/dead), with columns for features:
    - including func_persist from Sage, cached_func, and cached_method
    - also including PyPI packages with similar names
- Use cases: why is this good for mathematicians?

Show the report to Nicolas, Michael Kohlhase and group, and Luca De Feo, among others, before the deadline.
## Packages
* Python Package: https://github.com/mtorpey/pypersist
* GAP Package (early dev): https://github.com/gap-packages/Memoisation
# Luxembourg meeting
## Questions ##
Which established infrastructure do we want to use?
- [`python-joblib`](https://joblib.readthedocs.io/en/latest/)
- [`redis-simple-cache`](https://github.com/vivekn/redis-simple-cache)
- [`dogpile.cache`](https://dogpilecache.readthedocs.io/en/latest/)
- [Python shelves](https://docs.python.org/3/library/shelve.html)
- Something else
- Something new
Should we use OpenMath/SCSCP?
- Only when/if it becomes necessary
What features are required?
- interoperability between systems (Math-in-the-Middle?)
- username--password logins
When should two inputs be regarded as equal?
- exact same data
- isomorphism etc.
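
The choice between these two notions of equality comes down to the key function. As a small worked example, two permutations in a symmetric group are conjugate exactly when they have the same cycle type, so for a function that is constant on conjugacy classes the cycle type can serve as the key. The function names below are illustrative only.

```python
def exact_key(perm):
    """Key for 'exact same data': the permutation itself."""
    return tuple(perm)


def cycle_type(perm):
    """Key up to conjugacy: cycle lengths in decreasing order.

    perm is a permutation of 0..n-1 given as a list of images.
    """
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk the cycle containing `start` and count its length.
        length = 0
        i = start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


a = [1, 0, 2, 3]  # swaps 0 and 1
b = [0, 1, 3, 2]  # swaps 2 and 3
```

Here `exact_key(a)` and `exact_key(b)` differ, while `cycle_type(a)` and `cycle_type(b)` agree, so a store keyed by cycle type would compute a class function once for both inputs.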
## [Use Cases 📝](/KSBMXDpZRbCKCKVZH7CJpg)