Supporting Projects in DataONE
-------
### Use Cases
- Group your own and others' datasets into specialized collections, called projects.
- Create and edit projects with a web-based form in MetacatUI
- Display your project in MetacatUI with branding, custom text and image content, and a human-readable name in the URL (e.g. `search.dataone.org/portals/SASAP`)
- Search for projects in MetacatUI
- Add your dataset to a project via a widget in MetacatUI
### Suggested changes to DataONE infrastructure
✅ Check marks denote tasks that are already completed
#### Support Project objects
- ✅ Develop a `Collection` and `Project` XML schema for describing projects in an XML document, to be uploaded to DataONE member nodes as a first-class object
- Collection schema: https://github.com/NCEAS/project-papers/blob/master/schemas/metacatui-collection.xsd
- Project schema (extends the Collection schema): https://github.com/NCEAS/project-papers/blob/master/schemas/metacatui-project.xsd
- Host the schema at a dataone.org web location (purl.dataone.org or ns.dataone.org?) and give it an official namespace
- **Decision** (pending review by DataONE):
- https://purl.dataone.org/collections-1.0.0
- https://purl.dataone.org/portals-1.0.0
- Change "project" to "portal" in the XML schema
- Add the project namespace as a new format in the DataONE format list
- Is currently registered in STAGE2: https://cn-stage-2.test.dataone.org/cn/v2/formats/http://ecoinformatics.org/project-beta1
- ([name=Rob]) the format id is of type DATA, which means it will not synchronize to the CN. Maybe it should be of type METADATA?
- **Decision**:
- Use formatType METADATA
#### Connect datasets and projects in resource maps
- Add a widget to MetacatUI that presents users with a list of Project names. The user selects a project name to add their dataset to that project.
- MetacatUI will add `dcterms:hasPart` and `dcterms:isPartOf` relationships in resource maps to connect metadata objects to a project object. The project seriesId should *always* be used in these relationships.
[hasPart:](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/2012-06-14/?v=terms#hasPart)
> Definition: A related resource that is included either physically or logically in the described resource.
[isPartOf:](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/2012-06-14/?v=terms#isPartOf)
> Definition: A related resource in which the described resource is physically or logically included.
>
**Decision**:
- Lauren will look into how quickly support for pids, in addition to sids, could be added.
- If it would take too long, support sids only for now.
##### Example RDF XML snippet:
```xml
<!-- Metadata object -->
<rdf:Description rdf:about="https://cn-stage-2.test.dataone.org/cn/v2/resolve/urn%3Auuid%3A6b629d7c-923f-4d94-9a9a-fdea556266be">
<terms:identifier>urn:uuid:6b629d7c-923f-4d94-9a9a-fdea556266be</terms:identifier>
<cito:documents rdf:resource="https://cn-stage-2.test.dataone.org/cn/v2/resolve/urn%3Auuid%3A74ce2b20-61d0-43f6-b9cc-d29fd0ea5570"/>
<cito:isDocumentedBy rdf:resource="https://cn-stage-2.test.dataone.org/cn/v2/resolve/urn%3Auuid%3A6b629d7c-923f-4d94-9a9a-fdea556266be"/>
<ter:isAggregatedBy rdf:resource="https://cn-stage-2.test.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Ab7a54277-0684-484f-a3f3-672f8862d539#aggregation"/>
<terms:isPartOf rdf:resource="https://cn-stage-2.test.dataone.org/cn/v2/resolve/urn%3Auuid%3A7c959e15-d309-4850-91e8-399c0e862893"/>
</rdf:Description>
<!-- Project object -->
<rdf:Description rdf:about="https://cn-stage-2.test.dataone.org/cn/v2/resolve/urn%3Auuid%3A7c959e15-d309-4850-91e8-399c0e862893">
<terms:identifier>urn:uuid:7c959e15-d309-4850-91e8-399c0e862893</terms:identifier>
<terms:hasPart rdf:resource="https://cn-stage-2.test.dataone.org/cn/v2/resolve/urn%3Auuid%3A6b629d7c-923f-4d94-9a9a-fdea556266be"/>
</rdf:Description>
```
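
The reciprocal `isPartOf` / `hasPart` statements in the snippet above could be generated along these lines. This is a minimal sketch in Python using only the standard library, not how MetacatUI implements it; the function names are hypothetical, and the resolve base URL is simply the one used in the example.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"
# Assumption: the CN resolve endpoint used in the example snippet.
RESOLVE_BASE = "https://cn-stage-2.test.dataone.org/cn/v2/resolve/"

# Serialize with the same prefixes as the example snippet.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("terms", DCTERMS_NS)


def resolve_uri(identifier):
    """Percent-encode an identifier and append it to the resolve URL."""
    return RESOLVE_BASE + quote(identifier, safe="")


def membership_descriptions(metadata_pid, project_sid):
    """Return (metadata, project) rdf:Description elements that link a
    metadata object to a project in both directions."""
    metadata_uri = resolve_uri(metadata_pid)
    project_uri = resolve_uri(project_sid)  # always the project seriesId

    metadata = ET.Element(f"{{{RDF_NS}}}Description",
                          {f"{{{RDF_NS}}}about": metadata_uri})
    ET.SubElement(metadata, f"{{{DCTERMS_NS}}}isPartOf",
                  {f"{{{RDF_NS}}}resource": project_uri})

    project = ET.Element(f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": project_uri})
    ET.SubElement(project, f"{{{DCTERMS_NS}}}hasPart",
                  {f"{{{RDF_NS}}}resource": metadata_uri})
    return metadata, project
```

Calling `ET.tostring(element, encoding="unicode")` on either returned element gives the RDF/XML fragment to merge into the resource map.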
#### Connect datasets and projects using Solr index queries
- ✅ MetacatUI uses the `filters` in the Project document to construct a Solr query. The search results of that query are presented as the datasets included in that Project.
- ✅ The `isPartOf` filter is automatically added as a search term by default when new projects are created, so that datasets added to the project manually by the dataset owner are included in the search.
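
A rough sketch of how a Project's filters might be turned into a Solr query string. The filter representation (a list of field/value pairs), the function names, and the `keywords` field in the usage note are illustrative assumptions, not the Project schema or MetacatUI's code. It also assumes the rule-based filters are ANDed together and that the `isPartOf` clause is ORed with them, so manually added datasets are included.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_project_query(filters, project_sid):
    """Build a Solr query for the datasets in a project.

    filters: list of (solr_field, value) pairs (hypothetical structure).
    project_sid: the project's seriesId, used in the isPartOf clause.
    """
    rule_clauses = [f"{field}:{quote_term(value)}" for field, value in filters]
    membership = f"isPartOf:{quote_term(project_sid)}"
    if rule_clauses:
        # Assumption: rules are ANDed; manual membership is ORed in.
        selection = f"(({' AND '.join(rule_clauses)}) OR {membership})"
    else:
        selection = membership
    # Exclude obsoleted versions, as in the example queries in this note.
    return f"{selection} AND -obsoletedBy:*"
```

For example, `build_project_query([("keywords", "salmon")], "urn:uuid:7c959e15-d309-4850-91e8-399c0e862893")` combines one keyword rule with the membership clause.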
#### Indexing Projects
- ✅ New Solr index fields for the Project documents:
- `projectName` - The short label used to uniquely identify a project
- **Decision**:
- Projects need to be accessible by seriesId as well as project name, in case projects with duplicate names are uploaded
- Change "projectName" Solr field name to "label" and update the field description
- `logo` - The URL or identifier for an image
- See: https://github.com/NCEAS/metacat/blob/dd04d30c329fc1b8eb8627b28ac42d1b5cef7c25/metacat-index/src/main/resources/application-context-project-beta1.xml
- Currently deployed on dev.nceas.ucsb.edu
- ✅ New Solr index fields for the `hasPart` and `isPartOf` fields
- `hasPart`
- `isPartOf`
- This required changes to the ResourceMapSubprocessor to add the ability to look up Solr documents by seriesId
- The values for these fields are extracted by SPARQL queries. See: https://github.com/NCEAS/metacat/blob/e2639f36ecb1b46788c116cd8ff61994feecc29f/metacat-index/src/main/resources/application-context-projects.xml
- Example of a Solr doc for a Project:
- [Project `lauren-test` on dev.nceas](https://dev.nceas.ucsb.edu/knb/d1/mn/v2/query/solr/?q=projectName:*%20-obsoletedBy:*)
- Query to search for metadata that are part of Project `lauren-test`, using the `isPartOf` field:
- https://dev.nceas.ucsb.edu/knb/d1/mn/v2/query/solr/?q=-obsoletedBy:*%20isPartOf:%22urn:uuid:7c959e15-d309-4850-91e8-399c0e862893%22
**Decision:**
- Add a Solr field for the query string of each collection. Various DataONE tools will use the query string to find exactly the same datasets in that collection.
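
The `isPartOf` query above can be assembled and URL-encoded programmatically. A minimal sketch, assuming the dev.nceas member node endpoint from the examples; the function name and the `fl`, `wt`, and `rows` values are illustrative choices (they are standard Solr parameters, but nothing in this note prescribes them).

```python
from urllib.parse import quote, urlencode

# Assumption: the member node Solr query endpoint used in the examples above.
SOLR_ENDPOINT = "https://dev.nceas.ucsb.edu/knb/d1/mn/v2/query/solr/"


def project_members_url(project_sid, rows=100):
    """Return a query URL for current (non-obsoleted) metadata documents
    whose isPartOf field holds the given project seriesId."""
    params = {
        "q": f'-obsoletedBy:* isPartOf:"{project_sid}"',
        "fl": "id,seriesId,title",  # illustrative field list
        "wt": "json",
        "rows": rows,
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return SOLR_ENDPOINT + "?" + urlencode(params, quote_via=quote)
```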
#### Allow navigation to Project views using a human-readable short name/label
- ✅ MetacatUI uses the `projectName` Solr field to identify a Project by its name. When several Projects share a `projectName`, the first one uploaded is used. Projects with duplicate names can therefore be inserted into the member node and indexed successfully, but MetacatUI only uses the one that was uploaded first.
- The Project editor will prevent users from reusing an existing `projectName` by searching Solr
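
The "first uploaded wins" rule and the editor's duplicate-name check can be sketched as follows. This works on Solr documents already fetched as dictionaries; the function names are hypothetical, and it assumes each document carries `projectName` and a `dateUploaded` value in one uniform ISO 8601 UTC format, so that string comparison matches chronological order.

```python
def first_uploaded(docs, project_name):
    """Return the earliest-uploaded Solr doc with this projectName,
    or None if no project uses the name."""
    matches = [doc for doc in docs if doc.get("projectName") == project_name]
    if not matches:
        return None
    # Uniformly formatted ISO 8601 UTC timestamps sort chronologically.
    return min(matches, key=lambda doc: doc["dateUploaded"])


def name_is_taken(docs, project_name):
    """True if the editor should reject project_name as already in use."""
    return first_uploaded(docs, project_name) is not None
```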
#### Adding Metadata Quality charts to ProjectViews
> [name=Peter]
- Aggregated quality statistics have previously been stored in a Solr index. This is being replaced with a backend facility that will run on the quality k8s cluster. Aggregated reports will be generated that match the multiple 'facets' used to define a project (the query that returns the set of pids belonging to the project). These facets will be specified in a service that will retrieve the pre-generated graph, which is then displayed. Note that a project identifier might be used to tag and retrieve the graphs (e.g. "SASAP", "DOB"). See https://github.com/NCEAS/metadig-engine/issues/218
- The aggregated graph-generating facility needs to:
- identify all projects on an MN or all of DataONE (use "projectName" Solr field)
- retrieve a list of pids that are part of each of these projects (use "isPartOf" Solr field w/project seriesId)
- Items to be completed:
- finish the aggregated graph facility
- implement the query interface to retrieve graphs
- update MetacatUI to retrieve the appropriate graph
#### Adding Metrics charts to ProjectViews
> [name=Rushiraj]
- Metrics Service currently supports aggregated citation / usage stats for repository and user profiles.
- To include the metrics on the project pages - we'll need a list of identifiers (i.e. `datasetIdentifierFamily`) for every dataset that belongs to that project.
- Background: [notes on the ES identifiers index](https://hpad.dataone.org/GYBgLA7AhiDGBMBaARgUwMxkWAJhCiAHOsKovAIyigiYVTBA#)
- To retrieve the datasets that belong to a particular project:
- (approach 1): Use Solr to retrieve identifiers based on the `is-part-of` and `has-part` relations.
- Step1: Metrics Service API queries Solr to retrieve the dataset `pids` that have the project `seriesId` in the `is-part-of` relation of the metadata document.
- Query: `?q=-obsoletedBy:* isPartOf:"seriesId"`
- Step2: Generate an aggregated list of `datasetIdentifierFamily` for each dataset `pid` from the `identifiers` index.
- Step3: Retrieve the usage / citation stats based on the aggregated `datasetIdentifierFamily`
- (approach 2): Index `is-part-of` and `has-part` relations in the identifiers index.
- Step1: Generate an aggregated list of `datasetIdentifierFamily` based on the indexed `is-part-of` relation.
- Step2: -same as Step3 above-
- Points to consider:
- To include the `is-part-of` / `has-part` / `projectIdentifiers` information in the `identifiers` index, we'll need an additional process to keep this information in sync with Solr.
- TODO:
- add support for project page metrics to the API.
- generate ES queries for aggregation.
- update and maintain the `identifiers` index with the project information. (approach 2)
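
Steps 2 and 3 of approach 1 can be sketched in memory as below. Everything about the data shapes is an assumption made for illustration: the `identifiers` index is modelled as a dict from pid to its `datasetIdentifierFamily` list, and usage stats as a dict of per-identifier view counts. The real Metrics Service would run ES aggregation queries instead.

```python
def aggregate_identifier_family(pids, identifiers_index):
    """Step 2: merge the datasetIdentifierFamily of every project pid
    into one de-duplicated list, preserving first-seen order."""
    seen = set()
    family = []
    for pid in pids:
        # A pid missing from the index is treated as its own family.
        for identifier in identifiers_index.get(pid, [pid]):
            if identifier not in seen:
                seen.add(identifier)
                family.append(identifier)
    return family


def project_views(pids, identifiers_index, views_by_identifier):
    """Step 3: total the view counts over the aggregated family."""
    family = aggregate_identifier_family(pids, identifiers_index)
    return sum(views_by_identifier.get(identifier, 0) for identifier in family)
```

Here `pids` stands for the result of the Step 1 Solr query.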
#### Managing and Authorizing Memberships
> [name=Chris]
Projects are planned as a paid feature in DataONE Memberships, along with other offerings like Hosted Repositories and Archive Storage. These services require tracking of products, orders, payments, quotas and quota usage. Member Nodes that host the paid services will need to enable quota checking against a quota service.
- ✅ Design a membership management architecture
- In progress, see https://github.com/csjx/d1-membership-plan-mgmt/blob/master/membership-plan-management.rst
- TL;DR: Products are defined with Quotas, and are ordered by Customers. Quotas are set per Customer by ORCID or by DataONE-defined group. Quota usage is harvested hourly/daily from the CN.
- Implement the membership management architecture
- Provide a REST CRUD API for managing products, orders, quotas, etc. to be run as a containerized microservice on the UCSB (and potentially ORC) Kubernetes cluster
- Update Metacat (and other MN stacks implementing quotas) to consult the quota service on `create()` and `update()` calls based on `rightsHolder` values.
- Update MetacatUI to provide quota views per `Subject`
- Update MetacatUI to show quotas in the `ProjectListView`
- Design the front-end membership sign-up and management pages
- See wireframes as a starting point: https://invis.io/9NS0LU7SP73
- Implement the front-end membership management pages in either MetacatUI or on dataone.org
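
The quota check that a member node would make on `create()` and `update()` might look roughly like this. It is an in-memory sketch only: the quota service's REST API is not defined in this note, so the `Quota` fields, the `reserve` function, and the `"project"` quota type are all hypothetical.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when a subject has no quota or not enough quota left."""


@dataclass
class Quota:
    subject: str      # ORCID or DataONE-defined group (the rightsHolder)
    quota_type: str   # hypothetical label, e.g. "project"
    limit: int
    usage: int = 0


def reserve(quotas, rights_holder, quota_type, amount=1):
    """Check the rightsHolder's quota and record the usage if it fits."""
    for quota in quotas:
        if quota.subject == rights_holder and quota.quota_type == quota_type:
            if quota.usage + amount > quota.limit:
                raise QuotaExceeded(f"{quota_type} quota exhausted for {rights_holder}")
            quota.usage += amount
            return quota
    raise QuotaExceeded(f"no {quota_type} quota for {rights_holder}")
```

A member node would call something like `reserve(...)` before accepting a new Project object, and reject the request if `QuotaExceeded` is raised.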
#### MetacatUI Project Editor MVP Implementation Timeline
The MVP has been broken into eight phases. An MVP could optionally be released as early as Phase 6, which would exclude only color and access policy editing.
1. [The Project Editor MVP - Part 1:](https://github.com/NCEAS/metacatui/milestone/69) Create a basic skeleton of the editor - parsing project documents, checking authorization, and serializing and saving the project back to the server.
- *3 issues remain open. 2 are in review/testing and the third is blocked by one in testing.*
2. [The Project Editor MVP - Part 2:](https://github.com/NCEAS/metacatui/milestone/70) Creating and manipulating the filters that define the dataset collection
- *9 issues open (3 in progress), 8 closed*
3. [The Project Editor MVP - Part 3:](https://github.com/NCEAS/metacatui/milestone/71) Creating, deleting, and renaming sections
4. [The Project Editor MVP - Part 4:](https://github.com/NCEAS/metacatui/milestone/72) Basic Markdown and Settings sections.
5. [The Project Editor MVP - Part 5:](https://github.com/NCEAS/metacatui/milestone/74) Adding images to Markdown sections and logos
6. [The Project Editor MVP - Part 6:](https://github.com/NCEAS/metacatui/milestone/76) Finishing touches - lots of CSS, adding Edit and New project links, marketing pages and graphics.
7. [The Project Editor MVP - Part 7:](https://github.com/NCEAS/metacatui/milestone/75) Editing Colors.
8. [The Project Editor MVP - Part 8:](https://github.com/NCEAS/metacatui/milestone/77) Permissions Panel for sharing edit access with others.
(See https://github.com/NCEAS/metacatui/milestones for more details)