Analysis on Versioning

  1. To publish a new version of a dataset, the publisher clones the dataset from a previous version (e.g., from v1). We call the clone an editing version until it is published.

    • We need to prevent publishers from editing already published datasets while still allowing them to edit the editing version of the dataset.
    • Once the editing version is sent for admin review and the admin approves it, the system should add automatically generated metadata such as 'has_version', 'is_version', and version.
  2. All versions of a dataset will be returned by the package_search API, which means users will see every version of the dataset on the dataset list page as well.

  3. Once the publisher decides to publish a new version of the dataset, the cloning workflow is as follows:

    1. They click the manage button, or a similar button renamed to "Publish new version."

    2. They land on the dataset edit metadata page and make the required changes there.

    3. They click either "Send to review" or "Save as Draft." Both actions save the dataset as a draft. The only difference is the redirect: "Send to review" goes to the dataset review page, so the publisher can directly submit the dataset for admin review without touching anything else (such as files or collaborators), while "Save as Draft" stays on the same page.

    4. To update files, they open the "Upload Files" tab, upload the new files, and then click either "Send to review" to submit the dataset for sysadmin review or "Save as Draft."

    5. To manage collaborators, the publisher goes to the collaborator tab, invites or adds collaborators, and then clicks "Send to review."
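
The workflow above can be sketched as a small state machine over plain dictionaries. This is only an illustration, not CKAN code. The `workflow_state` key, the state names, the integer `version`, and all function names are hypothetical. The meaning given to 'is_version' (the version this one was derived from) and 'has_version' (the versions derived from this one) is an assumption that still needs to be confirmed with the client.

```python
import copy

# Hypothetical workflow states and key names, used for illustration only.
EDITING = "editing"
IN_REVIEW = "in_review"
PUBLISHED = "published"


def clone_as_editing_version(published: dict, new_name: str) -> dict:
    """Copy a published dataset dict into a new editing version."""
    if published["workflow_state"] != PUBLISHED:
        raise ValueError("only a published version can be cloned")
    clone = copy.deepcopy(published)
    clone.pop("id", None)           # the clone gets its own id when saved
    clone.pop("has_version", None)  # these links belong to the source version
    clone["name"] = new_name
    clone["workflow_state"] = EDITING
    return clone


def can_edit(dataset: dict) -> bool:
    """Publishers may edit anything that is not yet published."""
    return dataset["workflow_state"] != PUBLISHED


def send_to_review(dataset: dict) -> None:
    """Move an editing version into the admin review queue."""
    if dataset["workflow_state"] != EDITING:
        raise ValueError("only an editing version can be sent to review")
    dataset["workflow_state"] = IN_REVIEW


def approve(candidate: dict, previous: dict) -> None:
    """Admin approval: publish and fill in the auto-generated metadata."""
    if candidate["workflow_state"] != IN_REVIEW:
        raise ValueError("only a version in review can be approved")
    candidate["version"] = previous["version"] + 1
    # Assumed semantics: is_version points back, has_version points forward.
    candidate["is_version"] = previous["name"]
    previous.setdefault("has_version", []).append(candidate["name"])
    candidate["workflow_state"] = PUBLISHED


v1 = {"id": "a1", "name": "rainfall-v1", "version": 1, "workflow_state": PUBLISHED}
v2 = clone_as_editing_version(v1, "rainfall-v2")
send_to_review(v2)
approve(v2, v1)
```

In a real extension, the `can_edit` check would most likely live in an authorization function, and the metadata would be written when the review is approved. Both placements are design assumptions rather than decisions already made in this note.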

Questions

  1. Can we ask the client to provide a good example of what should be in 'has_version' and 'is_version'?
  2. I believe on the dataset listing or on the package_search API result, it should only include the latest version of the dataset. How are we going to track which version is the latest one?
  3. I have added the workflow for publishing a new version of the dataset. Should we confirm with the client that it works for them?
  4. When cloning a dataset, we also need to clone the collaborators and activities for each version, which is tricky because they are not directly part of the dataset metadata. I am not even sure whether cloning activities is possible.
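
For question 2, one possible approach is to group all versions of a dataset under a shared key and keep only the highest version from each group. The sketch below filters a list of result dictionaries in plain Python. The `version_group` key and the integer `version` field are assumptions made for illustration and do not exist in the current schema.

```python
from __future__ import annotations


def latest_versions(results: list[dict]) -> list[dict]:
    """Keep only the highest-numbered version from each version group."""
    latest: dict[str, dict] = {}
    for dataset in results:
        group = dataset["version_group"]  # hypothetical shared key
        current = latest.get(group)
        if current is None or dataset["version"] > current["version"]:
            latest[group] = dataset
    return list(latest.values())


results = [
    {"name": "rainfall-v1", "version_group": "rainfall", "version": 1},
    {"name": "rainfall-v2", "version_group": "rainfall", "version": 2},
    {"name": "soil-v1", "version_group": "soil", "version": 1},
]
names = [d["name"] for d in latest_versions(results)]
```

Filtering after the search breaks pagination and result counts, so an alternative worth considering is a boolean flag (for example a hypothetical `is_latest` field) that is flipped at approval time and matched by the search query itself.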

From a database perspective, this approach is not ideal: it keeps a full set of records for each version, potentially overloading tables. For example, package_extras stores one key-value record per custom metadata field per package, so each additional version creates another full set of the same records.

Screenshot 2024-05-31 at 9.54.23 AM
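
To give a rough feel for this growth, the row count can be estimated as custom fields × versions × datasets. The numbers below are made up for illustration and are not measurements from our database.

```python
def extras_rows(custom_fields: int, versions: int, datasets: int = 1) -> int:
    """Estimate key-value rows when every version is stored as a full package."""
    return custom_fields * versions * datasets


# Illustrative figures only: 20 custom fields and 5 versions per dataset.
per_dataset = extras_rows(custom_fields=20, versions=5)
across_catalog = extras_rows(custom_fields=20, versions=5, datasets=1000)
```

The count grows linearly with the number of versions, so the real cost depends on how often publishers release new versions.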
