List of tools for Open Science
===
(Adapted from the OpenCon Switzerland 2018 [tools workshop](https://bit.ly/2MU3Ff3))
###### tags: `open`, `software`, `data`
## Principles
With many tools, even open-source ones, you get stuck because the data cannot be taken out and imported elsewhere (see: [https://www.w3.org/wiki/LDP_Implementations](https://www.w3.org/wiki/LDP_Implementations)).
When choosing a tool, think about the format of the data it produces and how that data can be downloaded and ported to other tools.
The "ultimate tool" should stick (at least) to the following principles:
* Interoperable - using open Web standards
* Data produced is portable into a different format/software
* Decentralisation
* Accessibility
* Portability & allowing reproducibility
* No vendor lock-in
* Datasets are managed following the [FAIR principles](https://www.go-fair.org/fair-principles/)
The "ultimate tool" may not exist, but you need to know where you are heading. Even the "ultimate tool" is useless if you lose sight of its purpose or if nobody uses it. Also check [JROST](https://jrost.org) for a roadmap.
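
As a minimal illustration of the portability principle, the sketch below round-trips a small table between CSV and JSON using only the Python standard library. The sample data and the function names `csv_to_json` and `json_to_csv` are made up for this example; the point is only that data kept in open formats can be moved from one tool to another and back.

```python
import csv
import io
import json

# Made-up sample data, used only for illustration.
SAMPLE = "sample_id,temperature_c\ns1,21.5\ns2,19.0\n"


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects back into CSV text."""
    rows = json.loads(json_text)
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Calling `json_to_csv(csv_to_json(SAMPLE))` should give back the original CSV text. Note that all values come back as strings, since CSV carries no type information.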
## Proprietary software used at the moment
Atlassian tools (Confluence, Jira, HipChat)
Google Drive (Google Docs, Spreadsheets, Photos, Gmail)
## List of alternative, open software
Service to compare tool alternatives: alternativeto.net
### List of lists
* 101innovations.wordpress.com (not only open source; the data is in this Google Sheet):
    * [400+ Tools and innovations in scholarly communication](https://docs.google.com/spreadsheets/d/1KUMSeq_Pzp4KveZ7pb5rddcssk1XBTiLHniD0d3nDqo/edit)
    * Mostly non-interoperable / vendor lock-in.
* github alternatives: itsfoss.com/github-alternatives
* [Open Science MOOC](https://opensciencemooc.eu/modules) has interesting tools in module 5
* Skype alternatives: opensource.com/alternatives/skype (more are at opensource.com/alternatives)
* Frictionless Data: [Apps, Integrations, Libraries, and Platforms](https://frictionlessdata.io/software/)
* DLCM Tools list (archived): web.archive.org/web/20160911231442/http://www.dlcm.ch/ressources_category/tools-collection
* Web Archive and its projects: archive.org/projects
* [A Selection of Research Data Management Tools Throughout the Data Lifecycle](https://infoscience.epfl.ch/record/211157), by Jan Krause
* [CERN services](http://information-technology.web.cern.ch/services/cern-commercial)
* [Awesome Self-Hosted App List](https://github.com/Kickball/awesome-selfhosted)
* [7 Best Free and Open Source Project Management Software Solutions](https://www.goodfirms.co/blog/best-free-open-source-project-management-software-solutions)
* [Top 7 open source project management tools for agile teams](https://opensource.com/article/18/2/agile-project-management-tools)
### **EPFL-made** Computational academic tools
* [MyDEP](https://mydepsoftware.github.io/), to study the dielectric particle response to AC electric fields
* [DHCANVAS](https://garzoni.hypotheses.org/files/2017/03/DHCanvas-ICDay2016-min.pdf), a Web Application for Exploration and Annotation of Historical Documents
* [OpenIOT](http://www.openiot.eu/) provides support of cloud-based and utility-based sensing services enabling the concept of “Sensing-as-a-Service”, via an adaptive middle-ware framework for deploying and providing services in cloud environments ([EPFL-LSIR](http://lsir.epfl.ch/research/past/openiot/))
* [AIIDA](http://www.aiida.net/), Automated Interactive Infrastructure and Database for Computational Science
### **EXTERNAL** tools
Graphs
* [Graphviz](http://www.graphviz.org/), open source graph visualization software, a way of representing structural information as diagrams of abstract graphs and networks
* [Blue Brain Nexus](https://bluebrainnexus.io/docs/index.html): open source, data and knowledge management platform
Authoring articles, annotations, notifications:
* [dokieli](https://dokie.li) (source: github.com/linkeddata/dokieli)
Collaborative writing:
* [hackMD](https://hackmd.io)
* [Authorea](https://www.authorea.com/) (bought by private company)
* [ShareLatex](https://www.sharelatex.com/) (bought by private company)
Converters:
* [Pandoc](http://pandoc.org): a universal document converter
Slack replacements:
* [Mattermost](https://mattermost.com) -- self-hosted
* [Gitter](https://gitter.im/)
Github replacements:
* [git](https://git-scm.com), the version control system on which GitHub is built
* [ownCloud](https://owncloud.org) (?) + docker + git local deployment
* [nextCloud](https://nextcloud.com)
* [gitlab](https://gitlab.com), paid and less user-friendly
* [RopenSci](https://ropensci.org/), for R packages
* [rsync](https://rsync.samba.org), incremental file transfer tool
* [Sourcetree](https://www.sourcetreeapp.com)
* [TortoiseGit](https://tortoisegit.org)
MS Office alternatives:
* [OpenOffice](https://openoffice.org)
* [LibreOffice](https://www.libreoffice.org)
* [onlyoffice](https://www.onlyoffice.com)
* [Slides](https://slides.com), presentations on Slides are powered by the reveal.js HTML presentation framework. Exported presentations can be self-hosted and customized down to the last line of JS.
Google Docs alternatives:
* [sandstorm](https://sandstorm.io), open source platform for self-hosting web apps
* [hackmd](https://hackmd.io)
Web search alternatives:
* [DuckDuckGo](https://duckduckgo.com)
* Qwant (?)
* Swisscows (?)
Search engines:
* [Elasticsearch](https://www.elastic.co/)
* [Solr](http://lucene.apache.org/solr/)
* [Amundsen](https://github.com/lyft?utf8=%E2%9C%93&q=Amundsen), Lyft’s data discovery & metadata engine ([article](https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9))
* [Open Knowledge Maps](https://openknowledgemaps.org/index)
Articles search engines:
* [CORE](https://core.ac.uk/), The world’s largest collection of open access research papers
* [unpaywall](https://unpaywall.org/), Open Access content from over 50,000 publishers and repositories, easy to find, track, and use
* [Open Access Button](https://openaccessbutton.org/), free, legal research articles delivered instantly or automatically requested from authors
Research publishing platforms:
* [Octopus](https://octopus-hypothesis.netlify.com) (with its [GitHub repository](https://github.com/octopus-hypothesis)), the same as [sciencepublishing.online](https://sciencepublishing.online)
* [F1000](https://f1000research.com)
Data reproducibility:
* [REANA](http://reana.io/), Reproducible research data analysis platform
* [Qresp](http://qresp.org/), facilitates the organization, annotation and exploration of data presented in scientific papers
* [ReproZip](https://www.reprozip.org), can automatically pack your research along with all necessary data files, libraries, environment variables and options into a self-contained bundle.
Data extractor from plots
* [WebPlotDigitizer](https://automeris.io/WebPlotDigitizer/)
Atlassian replacements:
* [Tuleap](https://www.tuleap.org/)
Permanent webpages:
* [Perma.cc](https://perma.cc/)
Schools
* [School of Data](https://schoolofdata-ch.github.io/)
Password manager
* [Bitwarden](https://bitwarden.com/)
* [KeePass](https://keepass.info/)
Self-hosting social network
* [Mastodon](https://joinmastodon.org/)
Data curation / Benchmarking
* [CEDAR OnDemand](https://chrome.google.com/webstore/detail/cedar-ondemand/bbllhpbnjiddckppfdheoignbnmngmfm?hl=en) + [article](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2247-6) Metadata generator
* [GoodTables](http://goodtables.io/) + GoodTables try [validator](http://try.goodtables.io/)
* [OpenBenchmarking.org](https://openbenchmarking.org/) for test benchmarking
Open DNS
* [OpenNIC](https://www.opennic.org/)
Open QR code
* [Project Nayuki](https://www.nayuki.io/page/qr-code-generator-library), a QR Code generator library
Robotics
* [Webots Open Source robot simulator](https://cyberbotics.com/)
Dictionary
* [CASRAI](https://dictionary.casrai.org): standard dictionary of research administration information
* [FreeDict](https://freedict.org)
* [GCIDE](http://gcide.gnu.org.ua/): GNU Collaborative International Dictionary of English
* [Wordset Dictionary](https://github.com/wordset/wordset-dictionary) GitHub repository (not maintained)
Collaborative book
* [The Turing Way](https://the-turing-way.netlify.com/introduction/introduction), a lightly opinionated guide to reproducible data science.
ELN/LIMS
* [eLabFTW](https://www.elabftw.net/), free and open source electronic lab notebook, powered by PHP/MySQL in Docker containers.
* [ELOG](https://midas.psi.ch/elog), web application developed at the Paul Scherrer Institute, used to create personal and common logbooks
* [LabKey](https://www.labkey.com/), open-source LIMS-like data management system
* [OpenBIS](https://labnotebook.ch/), LIMS for storing information about materials and methods used, an ELN for describing experimental and computational procedures, and a data management module for storing experimental and computed data.
* German [publication by ZB MED](https://dx.doi.org/10.4126/FRL01-006415715) about choosing a proper ELN system, with toolbox for a requirements analysis and best practice cases (with the following software: openBIS, RSpace, LabFolder, eLabFTW, homegrown)
MIX:
* [Quasar - Orange suite extension](https://quasar.codes/), open source project, a collection of data analysis toolboxes extending the Orange suite
* [Characterisation Virtual Laboratory (CVL) Desktop](https://www.cvl.org.au/cvl-desktop), a free cloud-based virtual environment used to perform analysis of complex image and microscopy data, an Australian Research Data Commons (ARDC) funded [initiative](https://ardc.edu.au/news/connecting-the-image-analysis-community/)
* [datatables](https://datatables.net/), advanced interaction controls for HTML tables
* [Open Data Inception](https://opendatainception.io/), 2600+ Open Data Portals Around the World
* [ZenHub](https://www.zenhub.com/blog/open-source/), independent project management tool natively integrated with GitHub
* [Storyboarder](https://wonderunit.com/storyboarder/), storyboarding software released as free and open source by a movie studio
# Working open with Git, GitHub and GitLab
- GitLab can run locally (GitHub too in theory, but it is mostly too expensive)
- If you cannot use GitHub for political reasons, use GitLab. But you need someone to look after it. Otherwise, use GitHub.
- GitHub has a stronger community
- Don't neglect the archiving of your code/data. Don't rely on one service. Git is a distributed VCS, so you can push to multiple remotes.
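
Pushing the same repository to several hosting services could be scripted roughly as follows. This is a sketch, not a recommended setup: the remote names and the `example.org` URLs are placeholders, the helper names `mirror_commands` and `run_all` are invented for this example, and `git remote add` fails if a remote with that name already exists.

```python
import subprocess
from typing import Dict, List

# Placeholder remotes: replace the names and URLs with your own services.
REMOTES: Dict[str, str] = {
    "primary": "git@example.org:lab/project.git",
    "backup": "https://git.example.org/lab/project.git",
}


def mirror_commands(remotes: Dict[str, str]) -> List[List[str]]:
    """Build the git commands that register each remote and push to it."""
    commands: List[List[str]] = []
    for name, url in remotes.items():
        commands.append(["git", "remote", "add", name, url])
        commands.append(["git", "push", name, "--all"])   # all branches
        commands.append(["git", "push", name, "--tags"])  # all tags
    return commands


def run_all(commands: List[List[str]], repo_dir: str) -> None:
    """Run the commands inside repo_dir, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)
```

Building the commands separately from running them makes it easy to print and review them first, for example with `for c in mirror_commands(REMOTES): print(" ".join(c))`, before calling `run_all` on a real repository.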
---
## Workflows for open science
### Ideas
- Overview of tools in an open-access graph similar to https://graphcommons.com/ (the technology behind this is [neo4j](https://neo4j.com)) --> transferring the 400+ tools into an explorable graph.
- Having a real *open alternative* to a tool should also mean having some certainty that the organization behind the open tool will still be around 10 years from now, or that open tools run over shared protocols that will be supported in the long term.
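
To make the graph idea a bit more concrete, here is a small sketch in plain Python rather than a graph database. The relation name `ALTERNATIVE_TO` and the helper functions are invented for this example; the edges are taken from the replacement lists earlier in these notes. A real version would load the 400+ tools from the spreadsheet instead of a hard-coded list.

```python
from collections import defaultdict

# (open tool, relation, proprietary tool) triples from the lists in these notes.
EDGES = [
    ("Mattermost", "ALTERNATIVE_TO", "Slack"),
    ("Gitter", "ALTERNATIVE_TO", "Slack"),
    ("gitlab", "ALTERNATIVE_TO", "GitHub"),
    ("Tuleap", "ALTERNATIVE_TO", "Atlassian tools"),
    ("hackmd", "ALTERNATIVE_TO", "Google Docs"),
    ("sandstorm", "ALTERNATIVE_TO", "Google Docs"),
]


def build_index(edges):
    """Index source nodes by (relation, target) for quick lookups."""
    index = defaultdict(list)
    for source, relation, target in edges:
        index[(relation, target)].append(source)
    return index


def alternatives_to(index, tool):
    """Return the open tools listed as alternatives to the given tool."""
    return sorted(index.get(("ALTERNATIVE_TO", tool), []))
```

For example, `alternatives_to(build_index(EDGES), "Slack")` lists Gitter and Mattermost. The same triples could later be imported into a graph database if the exploration needs outgrow a dictionary.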
### Choosing a tool checklist
- Which open standards is it using?
- What are the privacy implications?
- How accessible is it for different uses?
- Can I export the information/data we are generating?
- Are we able to work collaboratively?
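
One possible way to apply this checklist consistently is to record the answers per tool and list the questions that are still open. The sketch below is only an assumption about how that could look: the class `ToolAssessment`, its field names and the example tool are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolAssessment:
    """Answers to the checklist for one candidate tool (hypothetical structure)."""
    name: str
    open_standards: List[str] = field(default_factory=list)
    privacy_reviewed: bool = False
    accessible: bool = False
    can_export_data: bool = False
    supports_collaboration: bool = False

    def open_questions(self) -> List[str]:
        """Return the checklist questions that are not yet answered positively."""
        checks = [
            (bool(self.open_standards), "Which open standards is it using?"),
            (self.privacy_reviewed, "What are the privacy implications?"),
            (self.accessible, "How accessible is it for different uses?"),
            (self.can_export_data, "Can I export the information/data we are generating?"),
            (self.supports_collaboration, "Are we able to work collaboratively?"),
        ]
        return [question for ok, question in checks if not ok]
```

For instance, `ToolAssessment("example-tool", open_standards=["Markdown"], can_export_data=True).open_questions()` returns the three questions about privacy, accessibility and collaboration.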
### Tasks for which we are looking for tools
- how can I share my data
- what should I use for versioning
    - git as a tool to version work, GitHub to share those versions with others
- are there any specific tools which help me in working in a reproducible way
- how do I present my work (reproducible presentations instead of copy-paste into PowerPoint)
    - https://github.com/gnab/remark/blob/develop/src/remark.js which lets me make text files that I can copy and paste into new presentations
    - https://github.com/RaoOfPhysics/201804_PCST
        - Presentation available at https://raoofphysics.github.io/201804_PCST/#/
        - Built from `.Rmd` containing presentation text and code for generating plots
        - Three levels of use, documented here: https://github.com/RaoOfPhysics/201804_PCST#instructions-for-working-with-the-raw-files
- collaborative writing of papers
    - hackmd
    - authorea
    - sharelatex
- reproducibility
    - could you expand this point?
- are there any open tools that already became a standard?
    - git
- Is there something like a basic toolkit which we could recommend to newcomers in coding etc.?
    - related to Tim's comment on how little programming students learn while studying: what recommendations can we give as a starting point? (relates to the point mentioned before on a basic toolkit)
    - https://carpentries.org/ as a source of training, and of recommended tools, by looking at what they teach
- tools for active research data management (similar to ELNs, LIMS etc.); how can I easily annotate, store and sort my data, metadata and protocols in a handy way
    - what is an ELN or a LIMS? ELN = Electronic Laboratory Notebook (instead of old-school paper notebooks), LIMS = Laboratory Information Management System. Both help on the way to reproducibility and are most often used in the life sciences. Some say that LIMS are overkill (too many options, too expensive) for life science.
- Tools should have labels to track and easily assess their openness; ideally, these should come from a standards body, not a private company
## On the importance of learning digital tools
We have all spent so much time learning maths; we could also take some time to learn how to use computers (= new tools).
Any idea how we can get this more into teaching? Currently we are trying to get research data management established in the students' curriculum, so how do we get more programming into the curriculum? Who should we approach, and how do we reach out, including to communities and disciplines that are not so much into programming yet?
- software carpentry, data carpentry
- freelance workshop creators (access2perspectives.com: 1000€ a day for 12 students in general)
- co-learning: see meetup.com: R-Ladies, ...
- there are also some train-the-trainer programs (FOSTER for example)
It will take some time for this to get into the curriculum. How do we incentivize people to acquire these skills during this transition?
- For data management, see (and contribute to) [this open project that collects and creates outreach material](https://rdmpromotion.rbind.io)
The Carpentries are an option, but it still has to come from someone hyper-motivated.
That is exactly the point; it would need an immediate initiative to implement programming even more in the curriculum. My feeling is that no one really feels responsible for it... Should the professors take this over, the assistants, or even student initiatives?
I don't think it's a matter of responsibility, but of which standards are around, what everyone is using, and what is actually useful and easy. That's why, going back to Sarven's point, I think the main point should be educating people in the *criteria* rather than the tools. If the tools are pedagogically more appealing, then these tools have to be very well chosen, taking these criteria into account.
Criteria would be a great way to also help professors transfer knowledge!
And to include professors who do not necessarily come from the tech world, because it's not a 100% tech discussion.
And these criteria could also include a checklist on how to choose a tool, depending on your needs (e.g. things you need to consider before choosing a tool, as discussed before).