autoscale: true
slidenumbers: true
## **Dryad2Dataverse** - Introduction
---

## CRKN Conference 2022
- Paul Lesack and Eugene Barsky
- University of British Columbia
- [research.data@ubc.ca](mailto:research.data@ubc.ca)
---
## What is it?
- A tool to **harmonize** data collections between **Dryad** and **Dataverse**
- A stand-alone application
- A Python library
^
- We have developed a suite of tools to move data from Dryad to Dataverse, from start to finish
- Both a downloadable piece of software and (hopefully) a development tool
- Hopefully easy for the community to use
---
## Why did we develop it?
- The [**UBC Dataverse Collection**](https://borealisdata.ca/dataverse/ubc) on Borealis is our primary repository
- There is a lot of UBC-authored data in Dryad
- We wanted to harmonize and digitally preserve data studies
- Data sets can be consolidated in one place: Borealis Dataverse
^
- UBC's "primary" research data repository is a Dataverse repository: Borealis.
- Other UBC researchers have deposited their data sets in Dryad; there is a large UBC contingent there (over 500 studies)
- The collection is split between two (or more, really) repositories: 50% Dryad, 50% Dataverse. That's not ideal.
- Scholars Portal Dataverse is already aligned with Geodisy for national-level geospatial searching
- SP Dataverse is connected to UBC's Summon instance
---
## Imagining the software
- Simple
- Modular
- System-neutral
- Could be scheduled
- Runs from command line
^
- Simple enough for people with little or, ideally, no programming experience
- Modular: not all components should be required
- System-neutral
- No server overhead required
- Ideally, software that runs from the command line with basic information supplied by the end user
- Can be scheduled
---
## Technical overview
- **API** to **API**
- A database for persistence and control
^
* Dryad and Dataverse both have relatively well-documented Application Programming Interfaces (APIs), so a programmatic approach to transferring the data made sense.
* The software sits between the two APIs and moves data from one to the other.
* A tiny database keeps track of changes
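---
## Sketch: API to API
A minimal sketch of the API-to-API idea, using only Python's standard library. It assumes that Dryad's v2 search endpoint accepts an `affiliation` (ROR) filter plus `page`/`per_page` parameters, and it builds the Dataverse native API call that creates a dataset in a collection. The function names are hypothetical; this is not the dryad2dataverse code.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: Dryad's public v2 search API, filtered by ROR affiliation.
DRYAD_SEARCH = "https://datadryad.org/api/v2/search"


def dryad_search_url(ror: str, page: int = 1, per_page: int = 100) -> str:
    """Build a Dryad search URL for studies affiliated with one institution."""
    query = urlencode({"affiliation": ror, "page": page, "per_page": per_page})
    return f"{DRYAD_SEARCH}?{query}"


def dataverse_create_request(base_url: str, collection: str,
                             api_key: str, dataset_json: dict) -> Request:
    """Build (but do not send) a native API call that creates a dataset."""
    url = f"{base_url.rstrip('/')}/api/dataverses/{collection}/datasets"
    return Request(
        url,
        data=json.dumps(dataset_json).encode("utf-8"),
        # Dataverse authenticates API calls with the X-Dataverse-key header.
        headers={"X-Dataverse-key": api_key,
                 "Content-Type": "application/json"},
        method="POST",
    )
```

For example, `dataverse_create_request("https://borealisdata.ca", "UBC_DRYAD", api_key, record)` would target the `UBC_DRYAD` collection; passing the result to `urllib.request.urlopen` is what would actually send it.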
---
## Steps
- Create a **metadata crosswalk**. Most important step.
- Analyze Dryad's UBC datasets
- Use native **Dataverse JSON** for import
^
* Arguably the most important part of the whole project is mapping Dryad's output to Dataverse's input
* Fortunately, the UBC Library Research Commons has a lot of experience with this from numerous migrations and its work with Scholars Portal.
* Will it work? We had to analyze the datasets:
* Are there too many large files that won't fit into Dataverse because of size limits?
* Can the metadata be translated into Dataverse's complex native JSON?
* Finally, start programming
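---
## Sketch: a metadata crosswalk
The crosswalk is mostly nested-dictionary work. This sketch maps a few assumed Dryad fields (`title`, `abstract`, `keywords`, and `authors` with `firstName`, `lastName`, `affiliation`) onto the citation block of Dataverse's native JSON. It is illustrative only: a real import also needs required fields such as subject and dataset contact, and a complete crosswalk handles many more fields.

```python
def primitive(name, value, multiple=False):
    """One Dataverse native-JSON field holding a plain value."""
    return {"typeName": name, "multiple": multiple,
            "typeClass": "primitive", "value": value}


def author_entry(author):
    # Assumed Dryad author keys: firstName, lastName, optional affiliation.
    entry = {"authorName": primitive(
        "authorName", f"{author['lastName']}, {author['firstName']}")}
    if author.get("affiliation"):
        entry["authorAffiliation"] = primitive(
            "authorAffiliation", author["affiliation"])
    return entry


def dryad_to_dataverse(dryad: dict) -> dict:
    """Map a few (assumed) Dryad fields onto the Dataverse citation block."""
    fields = [
        primitive("title", dryad["title"]),
        {"typeName": "author", "multiple": True, "typeClass": "compound",
         "value": [author_entry(a) for a in dryad.get("authors", [])]},
        {"typeName": "dsDescription", "multiple": True, "typeClass": "compound",
         "value": [{"dsDescriptionValue": primitive(
             "dsDescriptionValue", dryad.get("abstract", ""))}]},
        {"typeName": "keyword", "multiple": True, "typeClass": "compound",
         "value": [{"keywordValue": primitive("keywordValue", k)}
                   for k in dryad.get("keywords", [])]},
    ]
    return {"datasetVersion": {"metadataBlocks": {"citation": {"fields": fields}}}}
```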
---
## Building Parts
- serializer
- transfer
- monitor
^
* The resulting Python library (which you can download) is the basis of the application. It has three primary components that work in sequence:
* a translator module (serializer),
* an upload module (transfer),
* and a monitor (cleverly called monitor).
All of that is great if you want to do your own programming, which you don't.
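---
## Sketch: what a monitor does
The monitor's core job, deciding whether a Dryad study is new, updated or unchanged since the last crawl, can be sketched with Python's built-in `sqlite3`. The table layout and the use of a last-modified timestamp as the change signal are assumptions for illustration; the real dryad2dataverse monitor may track more than this.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a hypothetical one-table tracking database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS studies ("
        " doi TEXT PRIMARY KEY, last_modified TEXT NOT NULL)")
    return conn


def classify(conn, doi, last_modified):
    """Return 'new', 'updated' or 'unchanged', then record the latest stamp."""
    row = conn.execute(
        "SELECT last_modified FROM studies WHERE doi = ?", (doi,)).fetchone()
    if row is None:
        status = "new"
    elif row[0] != last_modified:
        status = "updated"
    else:
        status = "unchanged"
    # Upsert so the next crawl compares against this run's value.
    conn.execute(
        "INSERT OR REPLACE INTO studies (doi, last_modified) VALUES (?, ?)",
        (doi, last_modified))
    conn.commit()
    return status
```

Only `new` and `updated` studies would then go through the serializer and transfer steps; `unchanged` ones are skipped.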
---
## Features
- Command line program
- Uses **RORs** to identify institutional output
- Self-contained
- Auto saves database on every run and creates time-stamped backups
^
* We built a command line program to convert, upload and monitor Dryad studies
* Takes an institutional ROR as input, plus a few other well-known inputs, such as the location of the Dataverse installation and an API key
* Requires zero knowledge of Python (with the possible exception of installation)
* With the binary versions, even that is not required
* Completely self-contained
* Each run of the software creates a timestamped database backup so that disasters can be averted.
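---
## Sketch: time-stamped backups
A minimal way to take a time-stamped copy of a SQLite database before each run, assuming the tracking database is a SQLite file. The file-naming scheme and function name are hypothetical, not the ones dryad2dataverse uses.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_database(db_path, backup_dir=None):
    """Copy the database to a time-stamped file before a run changes it."""
    db_path = Path(db_path)
    backup_dir = Path(backup_dir) if backup_dir else db_path.parent
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # e.g. monitor.sqlite3 -> monitor.20220101-120000.sqlite3
    target = backup_dir / f"{db_path.stem}.{stamp}{db_path.suffix}"
    # sqlite3's backup API copies a consistent snapshot of the database.
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target
```

Calling this at the start of every crawl means a bad run can be undone by restoring the most recent backup file.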
---
## Features
- Can be run at any interval as each run is a self-contained crawl
- Can send email status messages to multiple recipients
^
* Options to skip problematic studies, such as those too large to be uploaded to a Dataverse installation
* Doesn't require any particular platform or special privileges beyond the ability to write to local storage. Runs on Linux, PCs, Macs, your cell phone, etc.
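---
## Sketch: skipping oversized studies and reporting
One crawl's bookkeeping might look like this: set aside studies containing a file larger than the upload limit, then compose a status email for several recipients. The study structure (`doi`, `files`, `size`), the limit value and the function names are all hypothetical; the size limit in particular depends on the Dataverse installation.

```python
from email.message import EmailMessage

# Placeholder limit; each Dataverse installation sets its own per-file cap.
MAX_FILE_BYTES = 3 * 1024**3


def split_by_size(studies, max_file_bytes=MAX_FILE_BYTES):
    """Split studies into those to transfer and those with an oversized file."""
    to_transfer, skipped = [], []
    for study in studies:
        # Upload limits are typically per file, so check the largest one.
        largest = max((f["size"] for f in study["files"]), default=0)
        if largest > max_file_bytes:
            skipped.append(study["doi"])
        else:
            to_transfer.append(study["doi"])
    return to_transfer, skipped


def status_email(to_transfer, skipped, sender, recipients):
    """Compose (but do not send) a run summary for several recipients."""
    msg = EmailMessage()
    msg["Subject"] = (f"Dryad crawl: {len(to_transfer)} to transfer, "
                      f"{len(skipped)} skipped")
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    lines = ["To transfer:", *to_transfer, "", "Skipped (too large):", *skipped]
    msg.set_content("\n".join(lines))
    return msg
```

Sending the message would be a matter of `smtplib.SMTP(host).send_message(msg)`; because each crawl is self-contained, the whole run can be scheduled at any interval.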
---
## See it in action
- **End result** -
[https://borealisdata.ca/dataverse/UBC_DRYAD](https://borealisdata.ca/dataverse/UBC_DRYAD)
- **Code** -
[github.com/ubc-library-rc/dryad2dataverse](https://github.com/ubc-library-rc/dryad2dataverse)
- **Documentation** -
[ubc-library-rc.github.io/dryad2dataverse/](https://ubc-library-rc.github.io/dryad2dataverse/)
---
## Contact us
- **Paul Lesack** and **Eugene Barsky**
- UBC Library, Research Commons
- [research.data@ubc.ca](mailto:research.data@ubc.ca)