# Data handbook task

## JSON interchange format proposal

Process:

1. Swarm Team creates set of JSON files, one for each product, each mapping to one data handbook catalogue page. Each file contains the complete(?) information to be presented on the page.
2. These JSON files are passed to the Web Team to be processed to build the catalogue pages.
3. The files are stored on GitHub at: https://github.com/smithara/swarm-handbook-experiment/tree/main/json/catalog
4. Updates to pages are communicated through updates of these JSON files

Editor tool for those JSON files:

1. http://140.238.64.100/ (built by `json_creator.ipynb` in repo above)
2. Swarm Team use this tool to create JSON files; the tool enforces that the file fits the decided schema
3. Files in repo updated and Web Team notified

Schema for JSON files in progress:

![](https://i.imgur.com/gSXT0zs.png)

Notes [here](https://docs.google.com/spreadsheets/d/e/2PACX-1vSGGal3OprEYehA2Nz8eoNPDQZxGRQNAzDKiDEEOA1xKThKMsxVx5xDCCEcAcjXeke2goKyIT39gM5D/pubhtml#)

**Questions:**

1. Will such a process and format work for the Web Team?
    - *in principle yes; waiting for feedback from the developers*
2. Some fields, like `description` and `details` will contain HTML - will that work?
    - *yes; might require some edits to add classes to tables etc*
3. The `variables_table` contains a table in CSV - is that okay?
    - *probably*
4. Don't know how to handle extra tables and images.
    - Tables: Contained within `details` as simple HTML tables? (these are then not easy to query programmatically like the `variables_table`)
        - *probably handle as HTML tables in `details`*
    - Images: Perhaps limit to just one image per product, stored as a separate file?
        - *probably best to upload to EarthOnline then reference by URL*

---

### Fields to add to LD+JSON

Follow schema.org/Dataset

ref. ESA heliophysics work:

![](https://i.imgur.com/Cmdr9qG.png)

- which ones should we add for Swarm?
- Minimum is `name` and `description`?
- Add these as an object within the JSON files we produce?

### SPASE records

- what to do to aid the later creation of SPASE records?

---

## Old notes

Overall goal is to get data more discoverable and understandable, and to improve the process for describing metadata. To go from pages like:

- https://earth.esa.int/eogateway/missions/swarm/product-data-handbook/level-1b-product-definitions
- https://earth.esa.int/eogateway/missions/swarm/product-data-handbook/level-2-product-definitions

to something like https://smithara.github.io/swarm-handbook-experiment/cards-interactive-test.html backed by a database of sorts, being mindful that ESA Earth Online is trying to organise datasets under https://earth.esa.int/eogateway/search?text=&category=Data&filter=swarm&subFilter=data%20description&sortby=RELEVANCE and that there are wider projects for common metadata standards like [SPASE](https://spase-group.org/data/model/spase-2.4.1/index.html) that should be considered. Swarm data must be made more [FAIR](https://www.jisc.ac.uk/guides/rdm-toolkit/fair-principles-in-research-data-management).

### Systematic way to store the *content*

- Store in machine-readable (+ human-editable) structured form
  --> json, csv, markdown files (+ probably requires image files, referenced from the json)
- Shared schema across all products to keep things systematic
  --> Decide what fields are required to adequately house the content
  --> Plan for this schema to allow conversion to [SPASE](http://spase-group.org/) records later on
- Find a reasonable workflow for DISC members to update this content (and automatically preview the output, to allow an approval process)

We need to decide the first iteration of this, and we will be responsible for writing this content (decisions need to be made along the way where understanding of the content is important).
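
As a starting point for that first iteration, the idea of a shared schema can be sketched in a few lines of Python. This is only an illustration: the schema itself is still in progress, so the required-field list below is an assumption built from the four field names mentioned in these notes (`name`, `description`, `details`, `variables_table`). The example record, its column names and the helpers `check_product` and `parse_variables_table` are hypothetical.

```python
import csv
import io

# Assumed field set: only these four names appear in the notes;
# the real schema is still in progress and will likely differ.
REQUIRED_FIELDS = ("name", "description", "details", "variables_table")


def check_product(record):
    """Return a list of problems found in one product record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    return problems


def parse_variables_table(csv_text):
    """Turn the CSV text held in `variables_table` into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Hypothetical example record, not taken from the real catalogue.
example = {
    "name": "EXAMPLE_PRODUCT",
    "description": "<p>Short HTML description.</p>",
    "details": "<p>Longer HTML details.</p>",
    "variables_table": (
        "Variable,Unit,Description\n"
        "Timestamp,s,Time of observation\n"
        "F,nT,Field intensity"
    ),
}

if __name__ == "__main__":
    print(check_product(example))
    for row in parse_variables_table(example["variables_table"]):
        print(row["Variable"], row["Unit"])
```

Keeping `variables_table` as CSV text means it can be read back into rows like this, whereas tables embedded as HTML in `details` would need an HTML parser to query.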
Prototype at: https://github.com/smithara/swarm-handbook-experiment

Move toward something more complex, e.g.:

![](https://i.imgur.com/ZfIrCff.png)

- How to handle images and linking from Markdown?
- How to include tables?

### Generation of *presentation* from the content

- A system to consume the content files from above, and generate HTML, XML, PDF ...
- Each product must be reachable at a static link that can be used to reference the product
- Must allow some interactivity to search the handbook in appropriate ways (e.g. textual search, filter by categories)
- Prototype at: https://smithara.github.io/swarm-handbook-experiment/cards-interactive-test.html
- Also interesting: https://flatgithub.com/smithara/swarm-handbook-experiment?filename=input%2Foverview.csv

### *Processes* where the handbook is involved

- Adding new products
- Updating products

There is some process of updating, verification and approval, and publication. How can this be replicated in the new handbook?

Could be done through a formally managed GitHub repository: changes proposed through a Pull Request (triggering an automatic preview) and then approved by an admin.
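
The automatic preview could be as simple as a script that turns each product JSON file into one static HTML page. The sketch below is a rough illustration, not the prototype's actual code. It assumes each record has at least `name` and `description`, uses the file stem as the static page name, and embeds a minimal schema.org `Dataset` object with just those two fields. The function names `render_page` and `build_preview` and the directory layout are hypothetical.

```python
import html
import json
import tempfile
from pathlib import Path


def render_page(record):
    """Build a minimal HTML page for one product record."""
    # Minimal schema.org/Dataset object: only `name` and `description`,
    # the minimum floated in the notes. Other fields are left out here.
    json_ld = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": record["name"],
        "description": record["description"],
    }
    # Escape "</" so HTML inside the description cannot close the script tag.
    json_ld_text = json.dumps(json_ld, indent=2).replace("</", "<\\/")
    title = html.escape(record["name"])
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title>\n"
        f'<script type="application/ld+json">\n{json_ld_text}\n</script>\n'
        "</head><body>\n"
        f"<h1>{title}</h1>\n"
        # `description` already holds HTML, so it is inserted as-is.
        f"{record['description']}\n"
        "</body></html>\n"
    )


def build_preview(content_dir, output_dir):
    """Render every *.json file in content_dir to <stem>.html in output_dir."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for json_path in sorted(Path(content_dir).glob("*.json")):
        record = json.loads(json_path.read_text(encoding="utf-8"))
        # The file stem gives each product a stable, linkable page name.
        page_path = output_dir / f"{json_path.stem}.html"
        page_path.write_text(render_page(record), encoding="utf-8")
        written.append(page_path.name)
    return written


if __name__ == "__main__":
    # Hypothetical demo record written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        content = Path(tmp) / "catalog"
        content.mkdir()
        demo = {"name": "EXAMPLE_PRODUCT", "description": "<p>Demo.</p>"}
        (content / "EXAMPLE_PRODUCT.json").write_text(
            json.dumps(demo), encoding="utf-8"
        )
        print(build_preview(content, Path(tmp) / "site"))
```

A job triggered by a Pull Request could run a script like this over the changed files and publish the output somewhere reviewers can see it before an admin approves the change.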