Prep for call with Laurie/Phil
===

## Principles

- Any open data set requires a **chain of custody** so it can be used for research in the future.
- Use Data Rescue Events to **build local networks** of librarians and technologists working together to set up a sustainable effort.
- **Metadata must be considered upfront**, before any archiving takes place. By creating **high quality metadata** we will enable high quality archiving.
- Collaboration should take place on **open platforms that allow anyone to contribute**.
- Embody **open source and open science values** and focus on lowering barriers to contribute.
- Librarians will ultimately be in control of the process, and set the standard for metadata quality, as **they are expert data and metadata curators**.
- Encourage **lots of copies** of data housed in lots of institutions.

## Goals

- **Minimize maintenance bottleneck** for organizers, e.g. libraries should be able to move forward without being blocked by infrastructure.
- Enable many groups to produce **high quality metadata** in parallel.
- Meaningfully **engage large numbers of volunteers** at events.
- Encourage **local long term community building**, overlapping libraries and Data Refuge groups.
- **Empower librarians to assume curatorial roles supported by DataRefuge community**.
- Libraries can host what data they can, and coordinate with others to **ensure good coverage**.

## Proposed Pilot Workflow

Note: this workflow is meant to test a relatively simple set of instructions, for the purpose of informing future, more robust workflows.

## Goal of pilot: Get each library to adopt ~100 datasets and produce metadata + backups

Metadata workflow for libraries pilot:

- Dat can slice up from Data.gov/IA/Climate Mirror/Archivers.space etc. to produce '100 dataset slices' for people to adopt
    - Metadata will come from Max's "list of lists" https://github.com/datproject/svalbard
- Create a spreadsheet of agencies and departments.
    - Start with 881 known federal departments
    - Segment metadata by these departments (using TLD in domain or data.gov organization metadata)
    - Filter departments by ones that have metadata
    - Spreadsheet now has a list of departments ready for adoption
    - Note: one department may still have thousands of datasets, so there may be multiple slices
- Libraries that want to participate will be assigned a slice
    - Dat can "slice on demand" (like a deli) and send them a metadata file
    - Libraries update spreadsheet to reflect their adoption of a slice
- Libraries improve metadata quality (could use DataRefuge events to help)
- Libraries back up datasets associated with metadata
    - Can host it themselves or on DataRefuge.org
- Libraries create a GitHub Pull Request in DataRefuge GitHub
    - Fill out Pull Request to add a data.json file
    - We can write a bot that validates these data.json files (e.g. https://github.com/jlord/patchwork/pull/17067)
    - Comments on the PR will give places for contributors to add qualitative information
    - data.json should include:
        - sha256 hash of backed up data file
        - url of mirror(s)
        - original url
        - time of capture (when it was mirrored/downloaded)
        - response headers and status codes of capture
        - organizational metadata ('organization' field)
        - dataset descriptive metadata ('description', 'maintainer', 'agency')
        - Archiver Space ID
        - UUID
        - Any other existing ID, e.g. EPA ID
        - update periodicity
- Finally, once metadata is merged through GitHub and validated, it can be published to DataRefuge.org for discovery and use

## Future Features Post-Pilot

- GitHub only for first pilot (for simplicity); other metadata submission workflows can be considered later
- Data.gov style data.json metadata harvesting endpoints
- CKAN direct publishing (CKAN doesn't have an equivalent of Pull Requests, so it is hard to scale out collaborations)
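
## Sketch: validating a data.json record

As a rough illustration of the validation bot mentioned in the pilot workflow, the check on a submitted data.json record could look something like the Python sketch below. The notes list what data.json should contain but do not fix exact key names, so every key used here (`mirror_urls`, `captured_at`, `response`, and so on) is an assumption, as are the example URLs and values. Optional items such as the Archiver Space ID, other existing IDs, and update periodicity are left out of the required set.

```python
import hashlib
import re
import uuid
from datetime import datetime
from urllib.parse import urlparse

# Hypothetical key names and types; the notes only describe the contents.
REQUIRED_FIELDS = {
    "sha256": str,        # hash of the backed up data file
    "mirror_urls": list,  # url of mirror(s)
    "original_url": str,
    "captured_at": str,   # time of capture, ISO 8601
    "response": dict,     # status code and headers of the capture
    "organization": str,
    "description": str,
    "maintainer": str,
    "agency": str,
    "uuid": str,
}

SHA256_RE = re.compile(r"^[0-9a-f]{64}$")


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a backed up file in chunks so large datasets fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_http_url(value):
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_record(record):
    """Return a list of problems; an empty list means the record passed."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"{name}: expected {expected.__name__}")
    if errors:
        return errors  # skip deeper checks until the basic shape is right

    if not SHA256_RE.match(record["sha256"]):
        errors.append("sha256: expected 64 lowercase hex characters")
    if not is_http_url(record["original_url"]):
        errors.append("original_url: not an http(s) URL")
    if not record["mirror_urls"]:
        errors.append("mirror_urls: at least one mirror is required")
    for url in record["mirror_urls"]:
        if not (isinstance(url, str) and is_http_url(url)):
            errors.append(f"mirror_urls: not an http(s) URL: {url!r}")
    try:
        datetime.fromisoformat(record["captured_at"])
    except ValueError:
        errors.append("captured_at: not an ISO 8601 timestamp")
    try:
        uuid.UUID(record["uuid"])
    except ValueError:
        errors.append("uuid: not a valid UUID")
    if not isinstance(record["response"].get("status"), int):
        errors.append("response: missing integer 'status'")
    return errors


# Placeholder record; all values are made up for illustration.
example = {
    "sha256": hashlib.sha256(b"example bytes").hexdigest(),
    "mirror_urls": ["https://mirror.example.org/dataset.csv"],
    "original_url": "https://agency.example.gov/dataset.csv",
    "captured_at": "2017-02-01T12:00:00+00:00",
    "response": {"status": 200, "headers": {"content-type": "text/csv"}},
    "organization": "example-agency",
    "description": "Placeholder description",
    "maintainer": "Placeholder maintainer",
    "agency": "Example Agency",
    "uuid": "123e4567-e89b-12d3-a456-426614174000",
}
print(validate_record(example))
```

A bot could run `validate_record` on each data.json added in a Pull Request and post the returned problems as a PR comment. Re-downloading a mirror and comparing `sha256_of_file` against the recorded hash would be a natural extra check for the chain of custody.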
