Data deposition

---
title: Data deposition
teaching: 20
exercises: 20
questions:
- What is a data repository?
- What types of data repositories are there?
- Why should you upload your data to a data repository?
- How do you choose the right database for your dataset?
objectives:
- Define what a data repository is.
- Illustrate the importance of indexed data repositories.
- Summarize the steps of data indexing in a searchable repository.
keypoints:
- FAIR guiding principle addressed (F4) - (Meta)data are registered or indexed in a searchable resource
- FAIR guiding principle addressed (R1.1) - (Meta)data are released with a clear and accessible data usage licence
---

#### What is a data repository?

A data repository is a general term for any storage space in which you deposit data, metadata and sometimes associated research. Note that a database is a more specific term: it is mainly used for storing the data itself.

###### Types of data repository

Data repositories are classified based on **the purpose of the data repository** into:

A) Controlled access repository for sensitive data: explained in detail in the [data sharing lesson of RDMkit](https://rdmkit.elixir-europe.org/sharing). We will cover this type of repository in the next episode.

B) Discipline-specific repository: there are well-known repositories for different data types, e.g. [ArrayExpress](https://www.ebi.ac.uk/biostudies/arrayexpress) for high-throughput functional genomics experiments.

C) Institutional repository: if you cannot find a suitable repository for your dataset, some universities have their own general-purpose repositories. For instance, the [University of Reading Research Data Archive](https://researchdata.reading.ac.uk) is a general-purpose repository that offers features similar to other databases, such as controlled access. It can be used by both students and researchers.

D) General data repository: a multidisciplinary and/or general-purpose open data repository, open to all scholars, e.g.
[Zenodo](https://zenodo.org)

**Figure 1 summarizes these types with different examples**

![Figure 1 Types of data repository with different examples, CC BY from re3data.org](../fig/img56.jpg)

### Why should you upload your data to a data repository?

To improve data findability, your data should be uploaded to a public indexed repository, preferably with accompanying metadata, where it can be searched and found. This makes it compliant with the fourth findability principle (F4), which states that **(Meta)data are registered or indexed in a searchable resource**. One example of such a database is [ArrayExpress](https://www.ebi.ac.uk/biostudies/arrayexpress), which holds data from high-throughput functional genomics experiments. Such databases have a set of rules in place to make sure that your data will be FAIR.

After you upload your data to the database, it is assigned an ID and indexed. Indexing helps researchers find your data by using persistent identifiers, keywords or even the name of the researcher who produced the data. Take a look at the [ArrayExpress](https://www.ebi.ac.uk/biostudies/arrayexpress) database, where all datasets are indexed and you can find any dataset using the search tools.

Because the data are indexed, you can retrieve a dataset using keywords as well as its ID. For example, if you want to locate **human NSCL cell** lines, you can just type this into the search box. Use different keywords like **cartilage, stem cells and osteoarthritis** and you will find the same dataset. Indexing and registering datasets also means they are curated in such a way that you can discover them using different keywords. You can find the same dataset by using its identifiers or by using keywords chosen by the dataset's authors to describe it.
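
The idea that one dataset can be reached through several different search terms can be sketched in a few lines of Python. This is only a toy illustration of an inverted index, not how ArrayExpress or any other repository is actually implemented. The dataset IDs, titles, authors and keywords below are invented for the example.

```python
from collections import defaultdict

# Hypothetical records: the IDs, authors and keywords are invented for
# illustration and do not correspond to real repository accessions.
DATASETS = {
    "DS-0001": {
        "title": "Cartilage stem cells in osteoarthritis",
        "author": "A. Researcher",
        "keywords": ["cartilage", "stem cells", "osteoarthritis"],
    },
    "DS-0002": {
        "title": "Expression profiling of human cell lines",
        "author": "B. Scientist",
        "keywords": ["human", "cell lines"],
    },
}


def build_index(datasets):
    """Map every lower-cased search term (ID, author, keyword) to dataset IDs."""
    index = defaultdict(set)
    for dataset_id, record in datasets.items():
        terms = [dataset_id, record["author"], *record["keywords"]]
        for term in terms:
            index[term.lower()].add(dataset_id)
    return index


def search(index, term):
    """Return the sorted IDs of all datasets registered under the given term."""
    return sorted(index.get(term.lower(), set()))


index = build_index(DATASETS)
print(search(index, "cartilage"))       # lookup by keyword
print(search(index, "Osteoarthritis"))  # a different keyword, same dataset
print(search(index, "DS-0001"))         # lookup by identifier
```

All three lookups return the same hypothetical dataset, which mirrors the point above: once a dataset is registered under its identifier and its descriptive keywords, any of those terms leads back to it.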
![When you upload your dataset to a database, it can be curated and easily found using different keywords](../fig/img54.png)

![By indexing your dataset, you can retrieve it using its PID](../fig/img55.png)

> ## Exercise 1: Indexing dataset in the data repository
> [FAIRcookbook](https://faircookbook.elixir-europe.org/) is an online open resource containing general guidance and specific 'how to' guides (recipes) that help you to make and keep your data FAIR. It also includes information about how general data repositories can be indexed.
>
> The basic unit of the FAIRcookbook is called a recipe. A recipe is a set of instructions for how to FAIRify your data, written by practitioners and targeting very specific and common tasks. As you can see in the image (**Figure 2**), the structure of each recipe includes these main items:
>
> 1- Graphical overview, which is the mind map for the recipe
>
> 2- Ingredients, which give you an idea of the skills needed and the tools you can use to apply the recipe
>
> 3- The steps and the process
>
> 4- Recommendations of what to read next and references for further reading
>
> ![Figure 2. FAIRcookbook recipes structure](../fig/img4.png)
>
> Use the **FAIRcookbook** to find out and discuss the steps required to index your dataset.
>
> When navigating the homepage of the FAIRcookbook, you will find different tabs that cover each of the FAIR principles. For instance, if you want recipes on **Accessibility** for FAIR, you will find recipes that can help you make your data more accessible.
>
> For a quick overview, you can also watch our RDMBites on FAIRcookbook: [FAIRcookbook RDMBites](https://drive.google.com/drive/folders/16XZtCWBR-F3cvDHkB7A8jkjj6wvQ7sOr)
>
>> ## Solution
>> - **Follow these steps to find the recipe:**
>>
>> 1- In this exercise, we are looking for a recipe on **indexing or registering a dataset in a searchable resource**, which you can find in the findability tab. **Can you find it in this picture?**
>> ![Figure 3. Recipes of FAIRcookbook where you will find different recipes for FAIR, infrastructure, assessment and maturity models](../fig/img51.png)
>>
>> 2- Click on the findability tab.
>>
>> 3- On the left side, you will find a navigation bar that helps you find the different recipes for making your data **findable**.
>> ![You can find on the left side the list of recipes to make your data findable](../fig/img52.png)
>>
>> 4- As you can see here, there is a recipe on registering datasets with Wikidata and another one on depositing to generic repositories (Zenodo use case).
>> **Once you click on one of these recipes, you will find the following:**
>>
>> A) Requirements to apply the recipe to your dataset
>> B) The instructions
>> C) References and further reading
>> D) Authors and licence
>> ![Figure 4. Zenodo use case where you will get step by step guideline on how to deposit your data to Zenodo](../fig/img53.png)
>>
>> In our specialized courses, we will give you examples of how to upload your data to a discipline-specific repository.
> {: .solution}
{: .challenge}

### Uploading your data to a database will make your data visible through the following:

1- Databases assign a unique persistent identifier to your data.

2- Your data will be indexed and curated, making it easier to find.

3- Some databases make it simple to connect your dataset to other datasets and to link its metadata to theirs (**linked metadata**).

4- Dataset licencing: some databases offer controlled or limited access to protect your data.
> ## Exercise 2: Choosing the right database for your dataset
> [FAIRsharing](https://fairsharing.org/) helps researchers identify suitable data repositories, standards and policies relating to their data. It also contains the latest policies from governments, funders and publishers of data.
>
> Use **FAIRsharing** to identify data repositories for plant genomes. Then think of another example of a domain-specific dataset, and identify and discuss a suitable data repository for it.
>
>> ## Solution
>> The following short video shows the process of identifying a suitable data repository for plant genomes using FAIRsharing.
>> ![Screen recording showing the search process in FAIRsharing](../fig/m1.gif)
> {: .solution}
{: .challenge}

> ## Resources
>
> - FAIRcookbook recipe on [Depositing to generic repositories- Zenodo use](https://faircookbook.elixir-europe.org/content/recipes/findability/zenodo-deposition.html)
> - FAIRcookbook recipe on [Registering Datasets in Wikidata](https://faircookbook.elixir-europe.org/content/recipes/findability/registeringDatasets.html)
> - RDMkit guidelines on [Data publication and deposition](https://rdmkit.elixir-europe.org/data_publication)
> - RDMkit guidelines on [Finding and reusing existing data](https://rdmkit.elixir-europe.org/existing_data)
> - FAIRcookbook recipe on [Search engine optimization](https://faircookbook.elixir-europe.org/content/recipes/findability/seo.html)
> - FAIRsharing offers a nice portal to different [examples of databases](https://fairsharing.org/search?fairsharingRegistry=Database&subjects=life%2520science&page=1)
{: .callout}
