# RDM live questions / Knowledge base
###### tags: `training`, `data`
:::info
How to **use** this document:
- While working on the exercise, please write your questions (Q) here.
- We will answer in this document, too (A).

How to **edit**. You can either:
- Click on the pencil icon in the top-right corner of the page
- Press "Ctrl+Alt+E"

Need help with the syntax of this document?
- Look for the help menu at the top of this page
:::
## Data documentation
* **Q1**: ...
* **A1**: ...
* **Q2**: ...
* **A2**: ...
## Data creation / acquisition
* **Q1**: Could you specify what is meant by "open format"?
* **A1**: Thorough explanation: http://opendatahandbook.org/guide/en/appendices/file-formats/#open-file-formats. An open format is one where the specifications for the software are available to anyone, free of charge, so that anyone can use these specifications in their own software without any limitations on re-use imposed by intellectual property rights.
* **Q2**: ...
* ...
## Data storage / sharing
* **Q1**: Any suggestions for a platform to share large (>50 GB) datasets with peer reviewers or collaborators without making them public yet?
* **A1**: On Zenodo it is possible to upload datasets larger than 50 GB if you ask the team directly, still free of charge. Zenodo records can also be set to "restricted access". You could also check whether https://idr.openmicroscopy.org/about/submission.html allows a closed upload, as they state: "*Dataset size is typically not an issue, but for sizes significantly larger than 1000 GB special planning may be needed*".
* **Q2**: How can I set up an automatic, periodic backup of my data? Is it good practice to have one backup on a hard drive and a second on a cloud storage system?
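For the scripting side of this question, one possible approach is a small script that creates a timestamped archive, run on a schedule by the operating system (cron on Linux/macOS, Task Scheduler on Windows). A minimal sketch, with hypothetical paths:

```python
import datetime
import shutil
from pathlib import Path

def backup(source_dir, backup_root):
    """Create a timestamped zip archive of source_dir under backup_root.

    Scheduling is left to the OS, e.g. a crontab entry such as
    0 2 * * * python backup.py   (runs every night at 02:00).
    """
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    # shutil.make_archive appends the ".zip" extension itself
    archive_base = backup_root / f"backup-{stamp}"
    return shutil.make_archive(str(archive_base), "zip", root_dir=source_dir)
```

A usage call would look like `backup("/home/me/project-data", "/mnt/external-drive/backups")`; the same script can copy the archive to a cloud-synced folder to get the second, off-site copy.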
## Data processing
* **Q1**: ...
## Data analysis
* **Q1**: ...
## Ethical issues
* **Q1**: ...
## Intellectual Property issues
* **Q1**: Who is in charge of this matter in terms of an international collaboration for a publication?
* **A1**: Regardless of the international character of the collaboration, this should be agreed upon BEFORE the project starts. Normally the Principal Investigator has the main authority on this, but there can be specific arrangements.
* **Q2**: What considerations are involved in sharing third-party data that we have converted into an open or more easily usable format, if that is possible at all?
* **A2**: From a legal standpoint, if there is no contract or other binding agreement, and no limiting license attached to the third-party data, then there should be no problem at all. Following the motto "*As open as possible, as closed as necessary*": once you see no legal barriers, the only remaining considerations are about convenience: if it is useful (e.g. in a collaboration) to share the data converted to a more open format, then just do it. In most cases, even if a license is in the way, it is simply a matter of asking the original authors for permission, so in principle this should not be a problem.
## Data publication
* **Q1**: Is there any copyright issue if I upload the same paragraph as the published paper on Zenodo?
* **A1**: If the journal is "cool" with it, no problem. By the way, as of now all your papers should in principle be made open access at most 6 months after publication. This doesn't mean you can plagiarize others, but of course (depending on the specific terms of the publisher) you can surely rewrite your own words (or upload the same data :smile:).
* **A1b**: The abstract usually comes with few restrictions.
## Data preservation / archiving
* **Q1**: Is it useful to save the same data in two different open formats (e.g. CSV and TSV, or JPEG and PNG)?
* **A1**: It depends. If the two formats are very similar (e.g. CSV and TSV), there is no good reason. If you want to make the data available for the broadest possible use, there could be. Think of the DM3 microscopy images from this morning: regular images (TIFF, PNG, or even JPEG) are useful for those who just want to see what the image looks like, whereas a more data-oriented format (HDF5, for example) is more useful for specialized use.
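By analogy with the TIFF-vs-HDF5 example above, a minimal Python sketch that writes the same table twice: once as CSV for a quick look in any spreadsheet, and once as JSON, which preserves structure and can carry units/metadata for programmatic reuse. Filenames, column names, and the spectrum values are all made up for illustration:

```python
import csv
import json
from pathlib import Path

def export_both(rows, out_dir="."):
    """Write the same table in two formats aimed at two audiences."""
    out_dir = Path(out_dir)
    # CSV: flat and universally readable -- the "quick look" format
    with open(out_dir / "spectrum.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    # JSON: keeps nesting and carries units/metadata -- the "reuse" format
    with open(out_dir / "spectrum.json", "w") as f:
        json.dump({"units": {"wavelength": "nm", "intensity": "a.u."},
                   "data": rows}, f, indent=2)

rows = [{"wavelength_nm": 500, "intensity": 0.82},
        {"wavelength_nm": 510, "intensity": 0.79}]
```

Calling `export_both(rows, "exports/")` produces both files from the same data; note that the CSV loses the units, which is exactly the kind of information the more structured format keeps.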
* **Q2**: Would one also need to share one's own scripts/code used for the analysis of the data? Even silly read/plot routines?
* **A2**: If you are operating under an open data policy (from your institution, your funder, or personal conviction), this code is necessary to reproduce the data you are sharing, so yes. Plot routines are not always the most important part, but they can be if you produce very specific visualizations, and who can tell where "silly" ends and "interesting" starts? So just include them in all cases.
* **Q3**: Is Zenodo also a preservation/backup platform?
* **A3**: Zenodo is not for backup, but it is for preservation. Depending on the openness you set for the dataset you deposit, it can be either private or open (with different degrees and licenses). But we do not recommend using Zenodo as a dump for simple backup purposes. For more insight into the difference, please refer to slide 14 of [06_EDCHRDM-2022_Theory3.pdf](https://moodle.epfl.ch/pluginfile.php/2851741/mod_folder/content/0/06_EDCHRDM-2022_Theory3.pdf?forcedownload=1)
* **Q4**: How do we decide which data are interesting for long-term storage/preservation?
* **A4**: There's no single answer. A pragmatic approach is to preserve all (and only) the data, code, and documentation needed to reproduce a published scientific result from A to Z. Since it is difficult to estimate in advance whether a negative result or an outlier will be of use to the scientific community later on, another approach might also weigh the effort (time, financial resources, special equipment, rare materials, etc.) spent in creating the data in the first place. Even if not used in published articles, some data can become part of a larger dataset that others build thanks to you.