OCFL integration
===

- Start Date: 2021-11-11
- RFC PR: [#number](https://github.com/inveniosoftware/rfcs/pull/#number)
- Authors: Dan Granville, Peter Cornwell, Lars Holm Nielsen
- State: DRAFT

# OCFL integration

## Tasks

- Code architecture (Lars)
    - Other information needed to be persisted
    - Export use case first
    - On publish ...
- OCFL structure (Peter/Dan)
    - schemata
    - ocfl community meeting
    - add MD5 fixity to ocfl-test-data
- Community extension proposal:
    - Schemata-registry:
        - Depositing the JSONSchema related to a JSON record inside the OCFL structure.
        - will write schema repository draft
        - expect to have to put copies of schemas (hashed names) into OCFL_ROOT/extensions/NNNN-schema-registry/
            - all versions of all referenced schemas

## Summary

## RDM specific implementation issues

- do the object-per conceptdoi/parentid structure - at least for now
    - Parent id ## think this is specific to our implementation
- Deduplication of files
    - solves within version deduplication
    - fine for archive (vs running off)
    - expect to do some piecing back together
- understand what is done (async'ly) when records modified vs what happens when a dump is triggered
    - and how
- need to (slightly) modify JSON
    - schema reference(s)
- Current example based on dump of record + file metadata
    - Probably want all metadata in minimal number of JSON files
- Re-insert to new instance
    - via API?
      If so, want API-shaped JSON
    - Only available import method now
- Need to consider all this without breaking possible future run-from-OCFL model

## Tape Library specific implementation issues

- a tape stores multiple files: tar is the utility generally used for writing them
- you can seek to file_number using the mt utility and then 'type' a summary or read the tape archive file (generally to disk)
- appending a tape file after what's currently the last file is fine, but what happens if you try to delete a file on tape is not consistent across implementations
- we need everything related to a repository snapshot (which might be a single OCFL archive file or not) to reside in a sub-directory, which is then written as a tape archive file
- significantly, the name of the sub-directory is the first thing that tape library management software can detect, since files on tape themselves are not named
- tape library management software should therefore determine this sub-directory name
- therefore it doesn't matter, to tape library management software, what the OCFL archive file is called nor who owned it, though its creation date will be preserved in the tape archive file encapsulating it

## Motivation

Active storage vs long term preservation

Versioning mechanisms

Validating a tree

Goal:

- Can you host tape library
- It's safe and replicated
- Is disk space an issue?

Use cases:

- Shelvable tape library image of a repository.
- Permissions
    - Ant specimens
- Use cases:
    - Access to the OCFL structure is for an administrator to look at
    - Record should contain enough information for an administrator to understand, in the far future, whether someone can be given access.

## Detailed design

- Understand OCFL from DSpace
- Make an example of what an Invenio OCFL should look like
- RO-Crate

---

1. OCFL structure and object metadata.
    - DSpace
    - ...
    - Record structure:
        - Ownership??? Community membership??
        - Related objects?? (excluded for now)
        - What about users? communities?
          controlled vocabularies???
2. Writing the OCFL
   Change MD5 to SHA512?
3. Validating the OCFL
4. Backend queue management

Use cases:

- Export use case: One off, snapshot of entire repository
- Async/on demand
- REST APIs for getting OCFL objects

### Message bus

**Overview**

- Async
- Buffering
- Replay
- Acknowledgement

**Broker**

**Event schemas**

**Event handlers**

**Publishing events**

**Consuming events**

## Example

## How we teach this

## Drawbacks

## Alternatives

## Unresolved questions

## Resources/Timeline

# Title

Code example:

```python
from x import a
```
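
A further sketch, relating to the "Writing the OCFL" step and the MD5 fixity task in the notes above. It assumes file content is available as bytes in memory; the function name `build_inventory_fragment` and the example paths are hypothetical and not part of any existing Invenio or OCFL library. The keys `digestAlgorithm`, `manifest`, and `fixity` follow the OCFL inventory layout. SHA-512 is used as the primary digest and MD5 is recorded in the fixity block. Identical content ends up under a single digest key, which is what within-version deduplication could build on.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def hex_digest(data: bytes, algorithm: str) -> str:
    """Return the lowercase hex digest of data for a hashlib algorithm name."""
    return hashlib.new(algorithm, data).hexdigest()


def build_inventory_fragment(files: Dict[str, bytes]) -> dict:
    """Build the digest-related part of an OCFL-style inventory.

    `files` maps a content path to its bytes (hypothetical input shape).
    """
    manifest: Dict[str, List[str]] = defaultdict(list)
    md5_fixity: Dict[str, List[str]] = defaultdict(list)

    # Sort paths so the output is stable between runs.
    for path in sorted(files):
        data = files[path]
        # Primary digest: every content path is listed under its SHA-512.
        manifest[hex_digest(data, "sha512")].append(path)
        # Secondary digest: MD5 kept only as additional fixity information.
        md5_fixity[hex_digest(data, "md5")].append(path)

    return {
        "digestAlgorithm": "sha512",
        "manifest": dict(manifest),
        "fixity": {"md5": dict(md5_fixity)},
    }


if __name__ == "__main__":
    import json

    example = {
        "v1/content/data.csv": b"a,b\n1,2\n",
        "v1/content/copy-of-data.csv": b"a,b\n1,2\n",
        "v1/content/metadata.json": b"{}",
    }
    print(json.dumps(build_inventory_fragment(example), indent=2))
```

In the example, the two CSV paths share one manifest entry because their bytes are identical. Whether duplicates are then stored once and referenced from the version state is a design decision the notes above leave open.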