Meeting agenda, notes and actions for 2024-06-28 at 12 noon ET
Organizer
: Daniel Wheeler
Attendees
: - Daniel Wheeler (he/him)
- Hafiz Noman
- Olga Wodo
- Marvin Tegeler
- Steve DeWitt (he/him)
- Katsuyo Thornton
- David Montiel
## Links
- [Google Meet][meet]
- [WG GitHub repository][repo]
- [Google Docs][docs]
- [Previous meeting notes][previous]
- [Proposal document][proposal]
- [Resource links][resources]
- [Overleaf publication](https://www.overleaf.com/project/663e34cc1c8095115e0de913)
## Agenda
1. Any questions or items to raise for discussion (please add)
2. Reminders
    - Next office hours:
        - 2024-07-05, Friday, 11AM ET
        - 2024-06-21, Friday, 11AM ET
    - Next WG meet
        - 2024-06-26, Wednesday, 12-noon ET
3. Summer student
    - Austyn Nguyen will be working with me on the schema design with RO-Crate and on implementing an example for PFHub. He'll be joining our meetings.
    - We're currently working on adding
4. Think about rotating schema section editing
5. Developing a FAIR-compliant Metadata Standard for Phase Field Data using Semantic Web Resources, [link on Overleaf][overleaf]
    - Does everyone have access?
6. Computational execution mindmap, [see below](#Computational-Execution-Mindmap)
    - Divide between persistent and varying metadata
7. RO-Crate
    - During the office hour on 2024-04-12, Michael Selzer strongly advocated for RO-Crate
    - ELN Consortia are using it
    - Based on schema.org
    - In its simplest form, an RO-Crate describes a directory structure
    - [SWC style lesson](https://www.researchobject.org/packaging_data_with_ro-crate/index.html)
8. RO-Crate example, [see below](#RO-Crate-Example)
9. [Workflow Run RO-Crate](https://www.researchobject.org/workflow-run-crate/)
    - Should we reach out or join meetings?
    - Publication: [Recording provenance of workflow runs with RO-Crate](https://doi.org/10.48550/arXiv.2312.07852)
    - [see review below](#Review-of-Recording-provenance-of-workflow-runs-with-RO-Crate)
10. Failed to set up the literature review on Google Docs. Sorry!
    - I have the lit review in personal notes and will try to copy it over for the next meeting
## Notes / Action Items
- Include lit review notes on Google Docs
- Hafiz: how inputs and outputs are related
## Computational Execution Mindmap

## RO-Crate Example
Using Python to create an RO-Crate
I used the [ro-crate-py tests](https://github.com/ResearchObject/ro-crate-py/blob/a551acb4d4084c59e32e2fd79bd82686e6b3aaa2/test/test_model.py#L436) to figure out the code below. The documentation is poor.
[Example of capturing software tools](https://www.researchobject.org/ro-crate/1.1/provenance.html#software-used-to-create-files)
```python=
from rocrate.rocrate import ROCrate
from rocrate.model.person import Person
from rocrate.model.entity import Entity

crate = ROCrate()

# Data entities: the files described by the crate
yaml = crate.add_file("working/pfhub.yaml", properties={
    "name": "PFHub meta data file",
    "encodingFormat": "text/yaml"
})
csv = crate.add_file("working/free_energy_1a.csv", properties={
    "name": "Free Energy",
    "encodingFormat": "text/csv"
})

# Identifiers for the contextual entities (license and people)
license_id = "https://spdx.org/licenses/CC0-1.0"
wheeler_id = "https://orcid.org/0000-0002-2653-7418"
keller_id = "https://orcid.org/0000-0002-2920-8302"

wheeler = crate.add(
    Person(
        crate,
        wheeler_id,
        properties=dict(name="Daniel Wheeler", affiliation="NIST")
    )
)

license = crate.add(
    Entity(
        crate,
        identifier=license_id,
        properties={
            "@type": "CreativeWork",
            "name": "CC0-1.0",
            "description": "Creative Commons Zero v1.0 Universal",
            "url": "https://creativecommons.org/publicdomain/zero/1.0/"
        }
    )
)

# Metadata attached to the root dataset
crate.license = license
crate.root_dataset["author"] = wheeler
crate.description = "An example of generating an ro-crate from a PFHub result, for now this is only focused on the computational platform, environment and implementation"

# From the metadata list on WorkflowHub: https://about.workflowhub.eu/docs/metadata-list/
crate.root_dataset["title"] = "PFHub title: fipy_1a_travis"
```
```python=
keller = crate.add(
    Person(
        crate,
        keller_id,
        properties=dict(name="Trevor Keller", affiliation="NIST")
    )
)

# Describe the benchmark script as the workflow that produced the results
workflow = crate.add_workflow(
    'https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard.py'
)
workflow.programmingLanguage = "Python 3.10"
workflow["creator"] = keller
workflow["dateCreated"] = "2017-01-09"
crate.add(workflow)

# Write the crate, including ro-crate-metadata.json, to the exp_crate directory
crate.write("exp_crate")
```
RO-Crate JSON file
```json=
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "author": {
        "@id": "https://orcid.org/0000-0002-2653-7418"
      },
      "datePublished": "2024-04-19T20:44:03+00:00",
      "description": "An example of generating an ro-crate from a PFHub result, for now this is only focused on the computational platform, environment and implementation",
      "hasPart": [
        {
          "@id": "pfhub.yaml"
        },
        {
          "@id": "free_energy_1a.csv"
        },
        {
          "@id": "https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard.py"
        }
      ],
      "license": {
        "@id": "https://spdx.org/licenses/CC0-1.0"
      },
      "title": "PFHub title: fipy_1a_travis"
    },
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "about": {
        "@id": "./"
      },
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.1"
      }
    },
    {
      "@id": "pfhub.yaml",
      "@type": "File",
      "encodingFormat": "text/yaml",
      "name": "PFHub meta data file"
    },
    {
      "@id": "free_energy_1a.csv",
      "@type": "File",
      "encodingFormat": "text/csv",
      "name": "Free Energy"
    },
    {
      "@id": "https://orcid.org/0000-0002-2653-7418",
      "@type": "Person",
      "affiliation": "NIST",
      "name": "Daniel Wheeler"
    },
    {
      "@id": "https://spdx.org/licenses/CC0-1.0",
      "@type": "CreativeWork",
      "description": "Creative Commons Zero v1.0 Universal",
      "name": "CC0-1.0",
      "url": "https://creativecommons.org/publicdomain/zero/1.0/"
    },
    {
      "@id": "https://orcid.org/0000-0002-2920-8302",
      "@type": "Person",
      "affiliation": "NIST",
      "name": "Trevor Keller"
    },
    {
      "@id": "https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard.py",
      "@type": [
        "File",
        "SoftwareSourceCode",
        "ComputationalWorkflow"
      ],
      "creator": {
        "@id": "https://orcid.org/0000-0002-2920-8302"
      },
      "dateCreated": "2017-01-09",
      "name": "https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard",
      "programmingLanguage": "Python 3.10"
    },
    {
      "@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl",
      "@type": "ComputerLanguage",
      "alternateName": "CWL",
      "identifier": {
        "@id": "https://w3id.org/cwl/"
      },
      "name": "Common Workflow Language",
      "url": {
        "@id": "https://www.commonwl.org/"
      }
    }
  ]
}
```
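Because the metadata file is plain JSON-LD, the links between entities can be followed without any RO-Crate library. The sketch below is only an illustration using the Python standard library and a trimmed copy of the graph above. The helper names (`index_by_id`, `root_dataset`, `list_parts`) are made up for this example and are not part of ro-crate-py.

```python
import json

# Trimmed copy of the metadata above; only the entities needed for the demo.
METADATA = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "./", "@type": "Dataset",
     "hasPart": [{"@id": "pfhub.yaml"}, {"@id": "free_energy_1a.csv"}]},
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "about": {"@id": "./"}},
    {"@id": "pfhub.yaml", "@type": "File",
     "encodingFormat": "text/yaml", "name": "PFHub meta data file"},
    {"@id": "free_energy_1a.csv", "@type": "File",
     "encodingFormat": "text/csv", "name": "Free Energy"}
  ]
}
""")


def index_by_id(metadata):
    """Map each entity's @id to the entity dictionary (hypothetical helper)."""
    return {entity["@id"]: entity for entity in metadata["@graph"]}


def root_dataset(metadata):
    """Find the root dataset by following the metadata descriptor's `about`."""
    entities = index_by_id(metadata)
    descriptor = entities["ro-crate-metadata.json"]
    return entities[descriptor["about"]["@id"]]


def list_parts(metadata):
    """Return (id, encodingFormat) for every entry in the root's hasPart."""
    entities = index_by_id(metadata)
    parts = root_dataset(metadata).get("hasPart", [])
    return [
        (ref["@id"], entities.get(ref["@id"], {}).get("encodingFormat"))
        for ref in parts
    ]


parts = list_parts(METADATA)
for part_id, encoding in parts:
    print(part_id, encoding)
```

The same lookup pattern applies to `author`, `license` and `creator`, which are all stored as `{"@id": ...}` references into the flat `@graph` list.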
## Review of "Recording provenance of workflow runs with RO-Crate"
- Presents Workflow Run RO-Crate (WRROC)
- 3 new profiles
- Has concept of retrospective and prospective provenance built into profiles
- RO-Crate describes a directory structure in its simplest form
- The Workflow Run RO-Crate is a set of 3 profiles that extends RO-Crate
- WRROC strikes a balance between actionable and readable. The profiles are types.
- [List of requirements](https://www.researchobject.org/workflow-run-crate/requirements)
    - Containers
    - memory usage
    - config files
    - env files
    - timings
    - success / failure status
    - inputs / outputs
    - versioning
    - scripts
    - parameters
- Each requirement is linked to a GitHub issue!
- 3 types of workflow run crates
    - Process run crate (describes the execution of one or more tools)
        - includes human executions
        - poorly defined
        - execution of multiple software apps
        - allows "composite" data sets
    - Workflow run crate (predefined workflow)
        - well defined
    - Provenance run crate (workflow computation including internal details)
        - internal details such as inputs / outputs between steps
- Designed to have workflows rerun -- reproducible
- inheritance mechanism allows reuse of common parts of descriptions
- 7 different workflow systems are now using WRROCs (including Galaxy)
- All crates are included in Provenance Run WRROC.
- Supposedly [runcrate](https://github.com/ResearchObject/runcrate) should allow a WRROC to actually be executed?
    - Documentation is poor though
- Allows data to be described at very different levels of granularity
- The paper says the runcrate toolkit will be expanded
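
On the question of how inputs and outputs are related: as far as I can tell from the profile, a Process Run Crate records an execution as a schema.org `CreateAction` whose `instrument` is the software, `object` lists the inputs and `result` lists the outputs. The sketch below builds such an entity as a plain dictionary. It is based on a reading of the profile and has not been validated against it. The input file `params.yaml`, the identifier `#run-1` and the helper `make_run_action` are hypothetical, and treating `free_energy_1a.csv` as the output of the script is an assumption.

```python
import json

# Script from the RO-Crate example above, used here as the "instrument".
SCRIPT_ID = (
    "https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark"
    "/blob/main/periodic/cahn-hilliard.py"
)


def make_run_action(action_id, instrument_id, input_ids, output_ids,
                    succeeded=True):
    """Build a CreateAction entity linking inputs to outputs (hypothetical helper)."""
    status = "CompletedActionStatus" if succeeded else "FailedActionStatus"
    return {
        "@id": action_id,
        "@type": "CreateAction",
        # The software that was executed
        "instrument": {"@id": instrument_id},
        # Inputs consumed by the execution
        "object": [{"@id": input_id} for input_id in input_ids],
        # Outputs produced by the execution
        "result": [{"@id": output_id} for output_id in output_ids],
        # Success / failure status, one of the listed requirements
        "actionStatus": {"@id": f"http://schema.org/{status}"},
    }


action = make_run_action(
    "#run-1",                 # hypothetical identifier for this run
    SCRIPT_ID,
    ["params.yaml"],          # hypothetical input file
    ["free_energy_1a.csv"],   # assumed to be an output of the script
)
print(json.dumps(action, indent=2))
```

The resulting dictionary would be appended to the `@graph` list of `ro-crate-metadata.json`, alongside entities describing each referenced file.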
<!-- links -->
[meet]: https://meet.google.com/bas-vkxi-rmq
[repo]: https://github.com/marda-alliance/phase-field-schema
[docs]: https://drive.google.com/drive/u/1/folders/1zhUi3A-CXxrkh4gTkLVUOncdqAMIAXND
[previous]: https://github.com/marda-alliance/phase-field-schema/blob/main/meeting-minutes/meet-008_2024-03-27.md
[proposal]: https://github.com/marda-alliance/phase-field-schema/blob/main/proposal.md
[resources]: https://github.com/marda-alliance/phase-field-schema/discussions/5
[overleaf]: https://www.overleaf.com/project/663e34cc1c8095115e0de913