List of Open-Source Software for Research Data Management
===
###### tags: `software`, `data`, `open`, `research`, `RDM`, `ORD`
## Introduction
Welcome to our list of open-source software and platforms useful for Research Data Management (RDM) and beyond. This document serves as a supplementary resource, offering software options that leverage open source to support the research process and productivity in general.
##### Purpose of the Document
This document is designed to provide researchers and people working in academic settings with a wide range of tools to assist in managing, analyzing, and sharing research data. Many tools are also suited to more general everyday tasks, while others are very specific. This list is in no way complete or definitive; rather, it aims to be a loosely curated catalog of software and platforms with a focus on open source. The tools listed here range from ones developed at EPFL to the many external ones that may prove useful to researchers and academics in general.
##### How to Contribute
We welcome contributions from everyone. If you wish to contribute or showcase your open-source eTool, you can simply [edit](https://hackmd.io/@V5wZVbSgQ9CS5HkfUkTRZA/HkhBW0uDB/edit?both) this document directly (in Markdown; signing in is required) or contact francesco.varrato@epfl.ch.
Each item on the list should give the name, link, and a short description of the software/platform. While listing specific tools or services is preferred, in some cases it may be more convenient to list external curated resources where such tools or services are already well listed and categorized.
Your contributions will help make this list more comprehensive and keep it up to date.
##### Disclaimer
This document is not intended to replace any curated list of software on the main webpages of the EPFL Library's RDM Team, go.epfl.ch/rdm. Because this document is open to editing, EPFL Library takes no responsibility for improper use or for sudden loss of text. [Francesco Varrato](https://people.epfl.ch/francesco.varrato?lang=en) or EPFL Library reserve the right to take this document offline whenever deemed necessary.
## **EPFL-made** eTools
* [AiiDA](http://www.aiida.net/): Automated Interactive Infrastructure and Database, especially useful for computational science. It is developed publicly on the [aiida-core](https://github.com/aiidateam/aiida-core) GitHub repository.
* [Akantu](https://akantu.ch/): open-source software for finite-element simulations, connecting accurate models with high-performance computing, developed by the LSMS laboratory at EPFL.
* [DHCANVAS](https://garzoni.hypotheses.org/files/2017/03/DHCanvas-ICDay2016-min.pdf), a Web Application for Exploration and Annotation of Historical Documents
* [EPFL ELN](https://eln.epfl.ch/): especially conceived around the needs of chemistry, it is an Electronic Laboratory Notebook as well as a repository for spectroscopic data, with some helpful tools.
* [MaterialsCloud](https://www.materialscloud.org/): web platform and data repository, conceived to assist materials scientists in the life cycle of their computational projects. Publicly developed on the [materialscloud-org](https://github.com/materialscloud-org) GitHub repository.
* [MyDEP](https://mydepsoftware.github.io/): Computational tool dedicated to the study of dielectric particle response to AC electric fields. Freely downloadable from Zenodo ([v1.0.1, January 11, 2019](https://doi.org/10.5281/zenodo.1321928)).
* [NOTO](https://noto.epfl.ch/): a JupyterLab platform for learning and practicing coding in Python, Bash, Octave, C, R, and more, with other useful features, all in the cloud.
* [OpenIOT](http://www.openiot.eu/): enables “Sensing-as-a-Service” for cloud-based and utility-based sensing services via an adaptive middleware framework ([EPFL-LSIR](https://www.epfl.ch/labs/lsir/research/tools/)).
* [Pryv](https://www.pryv.com/open-pryv/), a free, production-ready, easy-to-install open-source solution for the collection and management of sensitive personal and health data.
* [Renku](https://renkulab.io/): A software platform that enables reproducible and collaborative data science, with reproducible analyses and automatic generation of Knowledge Graphs.
* [Design Explorer](http://design-explorer.epfl.ch/): open-source tool developed by *CORE studio | Thornton Tomasetti* for exploring building design spaces. Code on [GitHub](https://github.com/tt-acm/DesignExplorer).
* [THOT](https://www.thot.so/): tool to manage and analyze data in a reproducible and automated fashion, with graphical interface.
## **External** eTools
### Computational workflow
* [Snakemake](https://snakemake.readthedocs.io/en/stable/), workflow management system that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern Python-style specification language. Snakemake workflows are essentially Python scripts extended by declarative code to define rules (for more information, refer to the Snakemake [documentation page](https://snakemake.readthedocs.io/en/stable/)).
* [Taverna](https://taverna.incubator.apache.org/), “open source multi-platform tool for designing and executing workflows. Taverna is discipline independent and used in many domains, such as bioinformatics, cheminformatics, medicine, astronomy, social science, music, and digital preservation” ([Wikipedia](https://en.wikipedia.org/wiki/Apache_Taverna)).
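Because Snakemake workflows are built from declarative rules that link input files to output files, the engine has to work out which rules to run, and in which order, to produce a requested target. The Python sketch below illustrates that idea with a hypothetical `Rule` class and `plan` function; neither is part of Snakemake's API, and the rule and file names are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical, simplified stand-in for a workflow rule (not Snakemake's API)."""
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def plan(rules: list[Rule], target: str, existing: set[str]) -> list[str]:
    """Return rule names in an order that produces `target`.

    Files in `existing` (e.g. raw data) are treated as already available,
    so no rule is scheduled for them. Each rule is scheduled at most once.
    """
    producer = {out: rule for rule in rules for out in rule.outputs}
    order: list[str] = []
    visiting: set[str] = set()

    def build(path: str) -> None:
        if path in existing:
            return
        if path not in producer:
            raise ValueError(f"no rule produces {path!r}")
        rule = producer[path]
        if rule.name in order:
            return  # already scheduled
        if rule.name in visiting:
            raise ValueError(f"cycle detected at rule {rule.name!r}")
        visiting.add(rule.name)
        for dep in rule.inputs:
            build(dep)  # schedule dependencies first (depth-first)
        visiting.discard(rule.name)
        order.append(rule.name)

    build(target)
    return order


rules = [
    Rule("clean", inputs=["raw.csv"], outputs=["clean.csv"]),
    Rule("stats", inputs=["clean.csv"], outputs=["stats.json"]),
    Rule("plot", inputs=["clean.csv", "stats.json"], outputs=["figure.png"]),
]
print(plan(rules, "figure.png", existing={"raw.csv"}))  # ['clean', 'stats', 'plot']
```

Real Snakemake additionally handles wildcards, re-runs jobs when inputs are newer than outputs, and can execute independent jobs in parallel; this toy resolver only orders the rules depth-first.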
### Graphs
* [Graphviz](http://www.graphviz.org/), open source graph visualization software, a way of representing structural information as diagrams of abstract graphs and networks
* [Blue Brain Nexus](https://bluebrainnexus.io/docs/index.html): open-source data and knowledge management platform
### Authoring articles, annotations, notifications
* [dokieli](https://dokie.li/) (source: [GitHub](https://github.com/linkeddata/dokieli))
### Collaborative writing
* [HackMD](https://hackmd.io), the one used for this collaborative document :wink:
* [Authorea](https://www.authorea.com/) (acquired by a private company)
* [ShareLatex](https://www.sharelatex.com/) (acquired by a private company)
* [Etherpad](https://etherpad.org/), customizable open-source online editor providing collaborative editing in real time
* Edupad.ch, collaborative text editor based on Etherpad
* [Curvenote](https://curvenote.com), open-source writing tool designed for technical writing, connected to Jupyter, and collaborative in nature
* [GitBook](https://www.gitbook.com/about), simple open source tool designed to let developers quickly publish content from a git repo
* [Joplin](https://joplinapp.org/), open source note-taking app with support for images, videos, PDFs, audio files, math expressions and diagrams, with use of local storage or online synchronization on a variety of commercial or self-hosted services.
### Other Writing tools
* [Zettlr](https://www.zettlr.com/): Free and Open Source Software (FOSS) writing app, with Zotero integration for easy referencing and LanguageTool for grammar checking. It offers a WYSIWYG interface, supports LaTeX formulas, and runs locally on your own machine.
### Format converters
* [Pandoc](http://pandoc.org): a universal document converter
### Slack alternatives
* [Mattermost](https://mattermost.com), self-hosted
* [Gitter](https://gitter.im/)
* [Delta Chat](https://delta.chat/en/), interactive, end-to-end encrypted chat app that sends messages via e-mail addresses, so it works even with people not using Delta Chat.
### GitHub alternatives
* [Git](https://git-scm.com), the version control system on which GitHub is built
* [ownCloud](https://owncloud.org), possibly combined with Docker and Git for a local deployment
* [Nextcloud](https://nextcloud.com)
* [GitLab](https://gitlab.com), paid and slightly less user-friendly, but with a free Community Edition
* [rOpenSci](https://ropensci.org/), for R packages
* [rsync](https://rsync.samba.org), incremental file transfer tool
* [Sourcetree](https://www.sourcetreeapp.com)
* [TortoiseGit](https://tortoisegit.org)
### MS Office alternatives
* [OpenOffice](https://www.openoffice.org)
* [LibreOffice](https://www.libreoffice.org)
* [ONLYOFFICE](https://www.onlyoffice.com)
* [Slides](https://slides.com), presentations on Slides are powered by the reveal.js HTML presentation framework. Exported presentations can be self-hosted and customized down to the last line of JS.
### Google Docs alternatives
* [Sandstorm](https://sandstorm.io), open-source platform for self-hosting web apps
* [HackMD](https://hackmd.io), online note-taking app in Markdown, good for collaboration and organization of documents. It also has a desktop app and an open-source, self-hostable version called [CodiMD](https://github.com/hackmdio/codimd).
### Google Forms alternatives
* [LimeSurvey](https://www.limesurvey.org/)
* [ngSurvey](https://www.ngsurvey.com)
* [Maian Survey](https://www.maiansurvey.com/)
* [JD eSurvey](https://www.jdsoft.com/jd-esurvey.html)
* [SurveyJS Analytics](https://surveyjs.io/Overview/Analytics)
* [Formr survey framework](https://formr.org/)
* [Form.io](https://www.form.io/)
* [ThunderSurvey](https://github.com/yzhang/ThunderSurvey)
* [Open Foris Collect](https://openforis.org/tools/collect/)
> Check the GitHub repositories in [this list](https://medevel.com/open-source-survey-tools-2/)
### Web search alternatives
* [DuckDuckGo](https://duckduckgo.com)
* [Qwant](https://www.qwant.com/?l=en), a multiplatform search engine that does not track its users and does not filter the content of your web searches
* [Swisscows](https://swisscows.ch/?culture=en), a web search engine based on semantic data recognition that does not store users' data
* [CC Search](https://search.creativecommons.org/), a web search for free content in the public domain and under Creative Commons licenses.
### Search engines
* [Elasticsearch](https://www.elastic.co/)
* [Solr](http://lucene.apache.org/solr/)
* [Amundsen](https://github.com/amundsen-io/amundsen), Lyft’s data discovery & metadata engine ([article](https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9))
* [Open Knowledge Maps](https://openknowledgemaps.org/index)
### Articles search engines
* [CORE](https://core.ac.uk/), the world’s largest collection of open access research papers
* [unpaywall](https://unpaywall.org/), Open Access content from over 50,000 publishers and repositories, easy to find, track, and use
* [Open Access Button](https://openaccessbutton.org/), free, legal research articles delivered instantly or automatically requested from authors
### Research publishing platforms
* [Octopus](https://octopus-hypothesis.netlify.com) (with its [GitHub repository](https://github.com/octopus-hypothesis)), the same as [sciencepublishing.online](https://sciencepublishing.online)
* [F1000](https://f1000research.com)
### Data reproducibility
* [REANA](http://reana.io/), Reproducible research data analysis platform
* [Qresp](http://qresp.org/), facilitates the organization, annotation and exploration of data presented in scientific papers
* [ReproZip](https://www.reprozip.org), can automatically pack your research along with all necessary data files, libraries, environment variables and options into a self-contained bundle.
* [Cascad](https://www.cascad.tech/), i.e. Certification Agency for Scientific Code and Data, is a non-profit certification agency created by academics with the support of the French National Centre for Scientific Research (CNRS) and a consortium of French research institutions. The goal of this agency is to provide researchers with an innovative tool allowing them to signal the reproducibility of their research.
* [WholeTale](https://wholetale.org/), an NSF-funded initiative to build a scalable, open source, web-based, multi-user platform for reproducible research enabling the creation, publication, and execution of tales - executable research objects that capture data, code, and the complete software environment used to produce research findings (a beta version of the system is also [available](https://dashboard.wholetale.org)).
* [Astropy](https://www.astropy.org/), a community effort to develop a common core package for Astronomy in Python and foster an ecosystem of interoperable astronomy packages.
### Data modelling / cleaning
* [Dbdiagram.io](https://dbdiagram.io), online database diagram designer for developers and data analysts, with an intuitive code-based interface to draw Entity-Relationship (ER) diagrams.
* [HeidiSQL](https://www.heidisql.com/), data modeling tool for MariaDB and MySQL, that also supports MS SQL, PostgreSQL, and SQLite database systems.
* [Archi](https://www.archimatetool.com/), conceptual and physical data modeling tool that uses the ArchiMate modeling language and supports the analysis and visualization of various complex database systems.
* [PgModeler](https://www.pgmodeler.io/), multiplatform (GNU/Linux, Windows, macOS) database modeler that supports multiple PostgreSQL databases, with an intuitive interface, which also provides a functional database server administration module.
* [MySQL Workbench](https://www.mysql.com/products/workbench/), a multiplatform (GNU/Linux, Windows, macOS) visual tool that provides data modeling, SQL development, and comprehensive administration tools for server configuration, user administration, backup, and much more.
* [Umbrello](https://umbrello.kde.org/), a multiplatform (GNU/Linux, Windows, macOS) tool for creating and editing UML diagrams.
* [ModelSphere](http://www.modelsphere.com/), a UML modeling tool that supports all forms of data models (conceptual, logical, and physical) and allows for the conversion of models from one type to another.
* [Database Deployment Manager](http://ddmproject.weebly.com/), a database design tool that allows users - typically programmers - to create models and diagrams, and is also a database management software that enables users to create and maintain databases and create ER diagrams between tables
* [DBDesigner](https://www.dbdesigner.net/), an online, simple and intuitive database modeling tool that allows users to design database schema without writing any SQL code, supports reverse engineering but only for MySQL, PostgreSQL, and Oracle databases, and is available in over 26 languages.
### Data extractor from plots
* [WebPlotDigitizer](https://automeris.io/WebPlotDigitizer/)
### Atlassian alternatives
* [Tuleap](https://www.tuleap.org/)
### Permanent webpages
* [Perma.cc](https://perma.cc/)
### Schools
* [School of Data](https://schoolofdata-ch.github.io/)
### Password manager
* [Bitwarden](https://bitwarden.com/)
* [KeePass](https://keepass.info/)
* [Firefox Lockwise](https://lockwise.firefox.com/) [DISCONTINUED], an experimental product from Mozilla, the makers of Firefox: an app for iOS and Android that gave access to passwords saved in Firefox. Open sourced on https://github.com/mozilla-lockwise.
### Self-hosting social network
* [Mastodon](https://joinmastodon.org/)
### Data curation / Benchmarking
* [CEDAR OnDemand](https://chrome.google.com/webstore/detail/cedar-ondemand/bbllhpbnjiddckppfdheoignbnmngmfm?hl=en), a metadata generator ([article](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2247-6))
* [GoodTables](http://goodtables.io/), with an online [validator](http://try.goodtables.io/)
* [OpenBenchmarking.org](https://openbenchmarking.org/), for test benchmarking
### Open DNS
* [OpenNIC](https://www.opennic.org/)
### Open QR code
* [Project Nayuki](https://www.nayuki.io/page/qr-code-generator-library), a QR Code generator library
### Robotics
* [Webots Open Source robot simulator](https://cyberbotics.com/)
### Dictionary
* [CASRAI](https://dictionary.casrai.org): standard dictionary of research administration information
* [FreeDict](https://freedict.org)
* [GCIDE](http://gcide.gnu.org.ua/): GNU Collaborative International Dictionary of English
* [Wordset Dictionary](https://github.com/wordset/wordset-dictionary) GitHub repository (not maintained)
### Collaborative book
* [The Turing Way](https://the-turing-way.netlify.com/introduction/introduction), a lightly opinionated guide to reproducible data science.
### ELN/LIMS
* [eLabFTW](https://www.elabftw.net/), free and open-source electronic lab notebook, powered by PHP/MySQL in Docker containers.
* [ELOG](https://midas.psi.ch/elog), web application developed at the Paul Scherrer Institute, used to create personal and common logbooks
* [LabKey](https://www.labkey.com/), open-source LIMS-like data management system
* [openBIS](https://labnotebook.ch/), a LIMS for storing information about materials and methods used, an ELN for describing experimental and computational procedures, and a data management module for storing experimental and computed data.
* German [publication by ZB MED](https://dx.doi.org/10.4126/FRL01-006415715) about choosing a proper ELN system, with toolbox for a requirements analysis and best practice cases (with the following software: openBIS, RSpace, LabFolder, eLabFTW, homegrown)
* Harvard's ELN [selection resources](https://datamanagement.hms.harvard.edu/analyze/electronic-lab-notebooks), with articles, tools, and news
* University of Cambridge's [page to guide](https://www.data.cam.ac.uk/data-management-guide/electronic-research-notebooks) researchers to choose an ELN
### File transfer
* [Firefox Send](https://send.firefox.com/) [DISCONTINUED], a free encrypted file transfer service that allows users to safely and simply share files, open sourced on https://github.com/mozilla/send.
### Text Analysis
* [CATMA](https://catma.de)
### Metadata Standards
* https://rd-alliance.github.io/metadata-directory/standards/
* https://fairsharing.org/standards/
### Data Publication
* [datasette](https://datasette.io), a tool for exploring and publishing data, as it helps people take data of any shape or size, analyze and explore it, and publish it as an interactive website and accompanying API.
### Security
* [Cryptomator](https://cryptomator.org/), a simple tool for digital self-defense. It lets you protect your cloud data yourself, independently of your cloud provider, and works with most providers, encrypting data not only in transit but also at rest.
* [VeraCrypt](https://www.veracrypt.fr/code/VeraCrypt/), free, open-source disk encryption software for Windows, Mac OS X and Linux. If an attacker forces you to reveal your password, VeraCrypt provides plausible deniability. Unlike file encryption, the data encryption performed by VeraCrypt is real-time (on-the-fly), automatic, transparent, needs very little memory, and does not involve temporary unencrypted files.
### Anonymization
* [Amnesia](https://amnesia.openaire.eu), a flexible data anonymization tool that transforms relational and transactional databases into datasets for which formal privacy guarantees hold. Amnesia transforms the original data to provide k-anonymity and km-anonymity.
* [Arx](https://arx.deidentifier.org), a comprehensive open source software for anonymizing sensitive personal data. It supports a wide variety of (1) privacy and risk models, (2) methods for transforming data and (3) methods for analyzing the usefulness of output data.
* [Graasp Insight](https://insights.graasp.org/), a free, open-source, cross-platform desktop client to anonymize your datasets. Easy to use, you can anonymize datasets via configurable, pre-made algorithms, or create and add your own custom scripts. Your datasets are stored and processed locally.
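Amnesia and ARX both aim to make data satisfy formal privacy models such as k-anonymity: every combination of quasi-identifier values (attributes such as age or postcode that could be linked to other sources) must be shared by at least k records. The Python sketch below is not the API of either tool; it uses hypothetical helpers `k_anonymity` and `generalize_age` to measure k for a small made-up table and shows how generalizing exact ages into ranges raises it.

```python
from collections import Counter


def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Return the largest k for which `records` is k-anonymous.

    Records are grouped by their quasi-identifier values; the table is
    k-anonymous for k equal to the size of the smallest group.
    """
    if not records:
        raise ValueError("k-anonymity is undefined for an empty table")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age by a range of the given width, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Made-up sample records; "diagnosis" is the sensitive attribute.
records = [
    {"age": 34, "zip": "1015", "diagnosis": "flu"},
    {"age": 36, "zip": "1015", "diagnosis": "asthma"},
    {"age": 52, "zip": "1020", "diagnosis": "flu"},
    {"age": 57, "zip": "1020", "diagnosis": "diabetes"},
]

print(k_anonymity(records, ["age", "zip"]))  # 1: each exact (age, zip) pair is unique

generalized = [{**r, "age": generalize_age(r["age"])} for r in records]
print(k_anonymity(generalized, ["age", "zip"]))  # 2: each (range, zip) pair occurs twice
```

Real anonymization tools search over many possible generalizations (and may suppress outlier records) to balance privacy against data utility; this sketch applies only one fixed generalization step.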
### Miscellaneous
* [Quasar - Orange suite extension](https://quasar.codes/), open source project, a collection of data analysis toolboxes extending the Orange suite
* [Characterisation Virtual Laboratory (CVL) Desktop](https://www.cvl.org.au/cvl-desktop), a free cloud-based virtual environment used to perform analysis of complex image and microscopy data, an Australian Research Data Commons (ARDC) funded [initiative](https://ardc.edu.au/news/connecting-the-image-analysis-community/)
* [datatables](https://datatables.net/), advanced interaction controls for HTML tables
* [Open Data Inception](https://opendatainception.io/), 2600+ Open Data Portals Around the World
* [ZenHub](https://www.zenhub.com/blog/open-source/), independent project management tool natively integrated with GitHub
* [Storyboarder](https://wonderunit.com/storyboarder/), a storyboarding tool released free and open source by the Wonder Unit movie studio
* [Lean](https://leanprover.github.io/), open source theorem prover and programming language that aims to bridge the gap between interactive and automated theorem proving
* [DORIS](http://www.boris.unito.it/pages/doris.html), open-source, interactive object detection and tracking software with a graphical user interface.
* [Nominatim](https://nominatim.org/) uses OpenStreetMap data to find locations on Earth by name and address (geocoding) and the reverse, find an address for any location on the planet.
* [CKAN](https://ckan.org/) is an open-source DMS (data management system) for powering data hubs and data portals, making it easy to publish, share and use data.
#### Videoconferencing (Zoom alternatives)
* [Apache OpenMeetings](https://openmeetings.apache.org/index.html)
* [BigBlueButton](https://bigbluebutton.org/)
* [Element](https://element.io/plans-and-pricing/pro)
* [Jami](https://jami.net/together-the-new-version-of-jami-and-a-new-step-forward/)
* [Jitsi Meet](https://jitsi.org/jitsi-meet/)
* [Linphone](https://www.linphone.org/licensing-services)
* [Nextcloud Talk](https://nextcloud.com/talk/#hpb)
* [OpenVidu Call](https://openvidu.io/)
* [SignalWire Work](https://signalwire.com/pricing/signalwire-work)
* [Wire](https://wire.com/en/)
## **Dataset** Search Engines & Repositories
### Dataset Search Engines
+ data.mendeley.com
+ datahub.io
+ dans.knaw.nl/en/search
+ data.europa.eu/euodp/en/home
+ toolbox.google.com/datasetsearch - datasetsearch.research.google.com/
+ www.google.com/publicdata/directory?hl=en
+ next.openspending.org
+ opendata.cern.ch
+ datadiscovery.nlm.nih.gov/browse
+ paperswithcode.com/datasets
+ www.kaggle.com/datasets
+ [Amazon Web Services (AWS) Open Data](https://aws.amazon.com/marketplace/search?FULFILLMENT_OPTION_TYPE=DATA_EXCHANGE&PRICING_MODEL=FREE&CONTRACT_TYPE=OPEN_DATA_LICENSES&filters=FULFILLMENT_OPTION_TYPE%2CPRICING_MODEL%2CCONTRACT_TYPE)
+ quandl.com
+ https://opendatanavigator.switch.ch/ (requires a SWITCH edu-ID login :smiley: ... not so open after all)
### Data/Code dissemination platforms and repositories
* [Landscape of data repositories and platforms](https://doi.org/10.5281/zenodo.5769279)
* [https://go.epfl.ch/datarepo](https://www.epfl.ch/campus/library/services-researchers/data-publication/data-repositories-and-related-platforms/)
#### Data repositories
Many data repositories can be found via https://www.re3data.org. Here we list only the most widely used ones.
##### Generic (domain agnostic)
+ zenodo.org
+ figshare.com
+ www.swissubase.ch
+ dataverse.org
+ eudat.eu
##### Specialized (domain specific)
+ www.materialscloud.org (Computational Materials)
+ www.datadryad.org (Bio / Medical)
+ www.wikidata.org (structured data for Wikimedia projects)
+ [UCI Machine Learning Repository](https://archive.ics.uci.edu/)
+ [World Bank Open Data](https://data.worldbank.org/)
#### Data/Code journals
* [https://go.epfl.ch/datajournals](https://www.epfl.ch/campus/library/services-researchers/data-publication/data-code-journals/)