Meeting agenda, notes and actions for 2024-06-28 at 12 noon ET Organizer : Daniel Wheeler Attendees : - Daniel Wheeler (he/him) - Hafiz Noman - Olga Wodo - Marvin Tegeler - Steve DeWitt (he/him) - Katsuyo Thornton - David Montiel ## Links - [Google Meet][meet] - [WG GitHub repository][repo]m - [Google Docs][docs] - [Previous meeting notes][previous] - [Proposal document][proposal] - [Resource links][resources] - [Overleaf publication](https://www.overleaf.com/project/663e34cc1c8095115e0de913) ## Agenda 1. Any questions or items to raise for discussion (please add) 1. Reminders - Next office hours: - 2024-07-05, Friday, 11AM ET - 2024-06-21, Friday, 11AM ET - Next WG meet - 2024-06-26, Wednesday, 12-noon ET 3. Summer student - Austyn Nguyen will be working with me on the schema design with ro-crate and implementing an example for PFHub. He'll be joining our meetings - We're currently working on adding 4. Think about rotating schema section editing 5. Developing a FAIR-compliant Metadata Standard for Phase Field Data using Semantic Web Resources, [link on overleaf][overleaf] - Does everyone have access? 6. Computational execution mindmap, [see below](#Computational-Execution-Mindmap) - Divide between persistent and varying metadata 7. RO-Crate - During office hour on 2024-04-12 Michael Selzer strongly advocated for RO-Crate - ELN Consortia are using it - Based on schema.org - In its simplest from an RO-Crate describes a directory structure - [SWC style lesson](https://www.researchobject.org/packaging_data_with_ro-crate/index.html) 8. RO-Crate example, [see below](https://hackmd.io/RbmqRz-6S82Z_8zPVa2HxA#RO-Crate-Example) 9. [Workflow Run Ro-Crate](https://www.researchobject.org/workflow-run-crate/) - Should we reach out or join meetings? - Publication: [Recording provenance of workflow runs with RO-Crate](https://doi.org/10.48550/arXiv.2312.07852) - [see review below](#Review-of-Recording-provenance-of-workflow-runs-with-RO-Crate) 10. Failed to setup literature review on google docs. Sorry! - I have lit review in personal notes, will try and copy over for next meeting ## Notes / Action Items - Include lit review notes on google docs - Hafiz: how inputs and outputs are related ## Computational Execution Mindmap ![computational-execution-mindmap](https://hackmd.io/_uploads/HJZ7TtQV0.svg) ## RO-Crate Example Using Python to create an RO-Crate I used the [pyrocrate tests](https://github.com/ResearchObject/ro-crate-py/blob/a551acb4d4084c59e32e2fd79bd82686e6b3aaa2/test/test_model.py#L436) to figure out the code below. Documentation is poor. [Example of capturing software tools](https://www.researchobject.org/ro-crate/1.1/provenance.html#software-used-to-create-files) ```python= from rocrate.rocrate import ROCrate from rocrate.model.person import Person from rocrate.model.entity import Entity from rocrate.model.computationalworkflow import ComputationalWorkflow crate = ROCrate() yaml = crate.add_file("working/pfhub.yaml", properties={ "name": "PFHub meta data file", "encodingFormat": "text/yaml" }) csv = crate.add_file("working/free_energy_1a.csv", properties={ "name": "Free Energy", "encodingFormat": "text/csv" }) license_id = "https://spdx.org/licenses/CC0-1.0" wheeler_id = "https://orcid.org/0000-0002-2653-7418" keller_id = "https://orcid.org/0000-0002-2920-8302" wheeler = crate.add( Person( crate, wheeler_id, properties=dict(name="Daniel Wheeler", affiliation="NIST") ) ) license = crate.add(Entity( crate, identifier=license_id, properties={ "@type": "CreativeWork", "name": "CC0-1.0", "description": "Creative Commons Zero v1.0 Universal", "url": "https://creativecommons.org/publicdomain/zero/1.0/" } ) ) crate.license = license crate.root_dataset["author"] = wheeler crate.description = "An example of generating an ro-crate from a PFHub result, for now this is only focused on the computational platform, environment and implementation" #from metadata list on workflow hub https://about.workflowhub.eu/docs/metadata-list/ crate.root_dataset["title"] = "PFHub title: fipy_1a_travis" ``` ```python= keller = crate.add( Person( crate, keller_id, properties=dict(name="Trevor Keller", affiliation="NIST") ) ) workflow = crate.add_workflow('https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard.py') workflow.programmingLanguage = "Python 3.10" workflow["creator"] = keller workflow["dateCreated"] = "2017-01-09" crate.add(workflow) crate.write("exp_crate") ``` RO-Crate JSON file ```json= { "@context": "https://w3id.org/ro/crate/1.1/context", "@graph": [ { "@id": "./", "@type": "Dataset", "author": { "@id": "https://orcid.org/0000-0002-2653-7418" }, "datePublished": "2024-04-19T20:44:03+00:00", "description": "An example of generating an ro-crate from a PFHub result, for now this is only focused on the computational platform, environment and implementation", "hasPart": [ { "@id": "pfhub.yaml" }, { "@id": "free_energy_1a.csv" }, { "@id": "https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard.py" } ], "license": { "@id": "https://spdx.org/licenses/CC0-1.0" }, "title": "PFHub title: fipy_1a_travis" }, { "@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": { "@id": "./" }, "conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" } }, { "@id": "pfhub.yaml", "@type": "File", "encodingFormat": "text/yaml", "name": "PFHub meta data file" }, { "@id": "free_energy_1a.csv", "@type": "File", "encodingFormat": "text/csv", "name": "Free Energy" }, { "@id": "https://orcid.org/0000-0002-2653-7418", "@type": "Person", "affiliation": "NIST", "name": "Daniel Wheeler" }, { "@id": "https://spdx.org/licenses/CC0-1.0", "@type": "CreativeWork", "description": "Creative Commons Zero v1.0 Universal", "name": "CC0-1.0", "url": "https://creativecommons.org/publicdomain/zero/1.0/" }, { "@id": "https://orcid.org/0000-0002-2920-8302", "@type": "Person", "affiliation": "NIST", "name": "Trevor Keller" }, { "@id": "https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard.py", "@type": [ "File", "SoftwareSourceCode", "ComputationalWorkflow" ], "creator": { "@id": "https://orcid.org/0000-0002-2920-8302" }, "dateCreated": "2017-01-09", "name": "https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard", "programmingLanguage": "Python 3.10" }, { "@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl", "@type": "ComputerLanguage", "alternateName": "CWL", "identifier": { "@id": "https://w3id.org/cwl/" }, "name": "Common Workflow Language", "url": { "@id": "https://www.commonwl.org/" } } ] } ``` ## Review of "Recording provenance of workflow runs with RO-Crate" - Presents Workflow Run RO-Crate (WRROC) - 3 new profiles - Has concept of retrospective and prospective provenance built into profiles - RO-Crate describes a directory structure in its simplest form - The Workflow Run RO-Crate is a set of 3 profiles that extends RO-Crate - WRROC strikes a balance between actionable and readable. The profiles are types. - [List of requirements](https://www.researchobject.org/workflow-run-crate/requirements) - Containers - memory usage - config files - env files - timings - success / failure status - inputs / outputs - versioning - scripts - parameters - Each requirement is linked ot a github issue! - 3 types of workflow run crates - Process run crate (describe the exectusion as one or more tools) - includes human executions - poorly defined - exectution of multiple software apps - allows "composite" data sets - workflow run crate (predefined workflow) - well defined - provenance run crate (workflow computation including internal details) - internal details such as inputs / outputs between steps - Designed to have workflows rerun -- reproducible - inheritance mechanism allows reuse of common parts of descriptions - 7 different workflow systems now using WRROCs (includig Galaxy) - All crates are included in Provenance Run WRROC. - Supposedly [runcrate](https://github.com/ResearchObject/runcrate) should allow WRROC to actually be executed? - Documentation is poor though - Allows data to be described at very different levels of granularity - runcrate toolkit will be expanded it says <!-- links --> [meet]: https://meet.google.com/bas-vkxi-rmq [repo]: https://github.com/marda-alliance/phase-field-schema [docs]: https://drive.google.com/drive/u/1/folders/1zhUi3A-CXxrkh4gTkLVUOncdqAMIAXND [previous]: https://github.com/marda-alliance/phase-field-schema/blob/main/meeting-minutes/meet-008_2024-03-27.md [proposal]: https://github.com/marda-alliance/phase-field-schema/blob/main/proposal.md [resources]: https://github.com/marda-alliance/phase-field-schema/discussions/5 [overleaf]: https://www.overleaf.com/project/663e34cc1c8095115e0de913