34c3 Open Science/Research Workshop

Day 2 (Dec 28) 11:00–13:00 in Hall 3 (CCL, 1. Etage)

https://events.ccc.de/congress/2017/wiki/index.php/Session:Open_Science/Research_Workshop

In this workshop about Open Research/Science we want to first identify what we can do to improve any aspect of Open Research/Science (like Open Access, Open Educational Resources, Open Peer Review, Open Data, …). We then pick one or more of these aspects and start working on solutions (Hackathon-style).

The workshop is an excellent place to connect with other people who are interested in making research and science more open.

Bring your own ideas and make sure to add them to this document (even better if you do so before the event)!

Participants

André Gaul (Mathematics/Computer Science)
Manuel Baumann (Mathematics)
Paul Seyfert (Particle Physics)
Jochen Klar (Astronomy)
Michael Herbst (Quantum Chemistry)
Frank Löffler (Computer Science, formerly Numerical Relativity)
Michaela Voigt (Library / Information Science)
Emanuil Tolev (Freelance Software Developer in Open Science. https://doaj.org developer and sysadmin)
Katrin Leinweber
Malte Reißig (Open Source Consultant, Knowledge Media Science) remote participant
Maximilian Fuchs (Internet- and Web-based Systems, Natural Language Processing)
Add your name here
(Many more people attended the workshop, but most either didn't bother to add themselves here or didn't want to.)

What do you understand by "Open Science"?

Possibilities:

Open Education
Open Access to publications
Open Access to (almost) all data/software used for publications
Reproducible Research
Wikipedia: "Open research is research conducted in the spirit of free and open-source software. Much like open-source schemes that are built around a source code that is made public, the central theme of open research is to make clear accounts of the methodology freely available via the internet, along with any data or results extracted or derived from them."
The "Open" in science should IMHO try to reflect the four principles of openness in software but for publications:
- ability to use (read/download),
- to study (full access with instructions on the sources),
- to adapt (build on the research data, use the research data with different analytical methods, translate the publication)
- to distribute (print, sell, whatever)
Open infrastructure available for everyone wanting to do research (in computer science e.g. access to computing resources, access to ordinary working space, some special web service, etc.)

How to improve Open Science/Research?

We could do some research about

public statements from research institutions on what they think "Open Access" is and what it actually is
- so they are better informed and stop making bad or wrong contracts with private publishing houses
- can't claim that they do "open access" on their websites
gather further notions of "Open access", especially those accepted in all EU member states
gather further notions of "Open Science"

With that we could target "außeruniversitäre Forschungseinrichtungen" (non-university research institutions, i.e. those doing programmatic research) and force them (via the Bundestag and the BMBF) to gradually adopt a specific notion of "open research" (starting with some of their research projects).

What I found is that, once one really understands the "Berliner Erklärung for Open Access" from 200?, one finds statements on institutions' websites that they support that declaration. But sometimes even an institute's own website shows that the institution really has no idea what it is talking about. See also this comment by the MPI on 10 years after the declaration.

Tools:

Open Peer Review: Tool that anonymizes author names to eliminate biases (gender, ethnic, origin, …)
Open Access: create a dat with all PDFs of CC-BY-licensed articles on CrossRef (better accessibility, easy text and data mining, …)
Tool/Compilation/List of all available scientific positions in research related centers

Awareness (might depend on science area):

Create a short flyer on why to choose CC0 for research data, written in a language scientists can relate to.
Compilation of services / websites which offer open-source, open-data, open-access hosting
- Supported licences
- Implications for dealing with traditional publishers regarding first publication or exclusive publication of scientific results or similar
Raise awareness and promote a scientific culture of talking or publishing about failed investigations, too, rather than only publishing successes, which still seems to be the common approach in established research institutes. Maybe even create a "fail" repo for "fails in methodology", "fails in literature research", "fails in data collection" or "fails in data publishing", etc.
Add your idea here

Questions from the internet

How does the "Allianz der Wissenschaftsorganisationen" (e.g. MPI, Fraunhofer, Helmholtz and Leibniz) and thus all their institutes in Germany understand "Open Science"? The programmatic annual research budget for those institutes (not being universities) is roughly €13 billion.
- For example, the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen, GWDG
- or the Niedersächsische SUB Göttingen on Open Science and Access
- or see the Helmholtz Open Science website
- and the Leibniz Open Science description
- or Projekt Deepgreen which tries to "convert scientifical publications after the embargo-time on their license has run out and turn them into OpenAccess", see @oa_DeepGreen
How do specific universities in Germany understand "Open Science"? Compared to the research institutes mentioned above, universities are freer to choose what they research.
How do institutions in the EU, e.g. the European Parliament, understand "Open Science"?
- European Commission: Open Science Policy Platform
- or Research Infrastructures in Horizon 2020
- or Science Europe
How do institutions outside the EU understand "Open Science"?
- e.g. see Center for Open Science
Open Access in academia seems to be merely used to describe the problem that the publishing houses and journals don't want to change. It expresses roughly the lowest common denominator both sides could agree on, and therefore it should be used carefully when thinking about "Open Science".
Do you know about the DARIAH project? And its TextGrid Software?
Have you had a look at this RDF schema of a ScholarlyArticle?
The Free Software Foundation Europe has also published an extensive position paper on open science practice in the context of Horizon 2020. Has anyone read it?
List of Open Access Portals:
- http://www.leibnizopen.de/blaettern/
- …

Projects

Repository of OA fulltexts

Participants: André, Paul, Katrin, Michaela, Michael, Emanuil
Goal: Create a repository with fulltexts of Open Access publications.
Intermediate Hackathon goal: Create a tool that downloads OA fulltext documents from one publisher (in a way that we can extend it to more publishers).
Next meeting: Day 2, 18:00, Freifunk Assembly -> we are currently in CCL level 2 hall 12
https://github.com/andrenarchy/libfulltext
Python 3!

Draft API

Fetch DOIs/metadata from CrossRef:
- https://pypi.python.org/pypi/crossrefapi/
- https://github.com/fabiobatalha/crossrefapi
CrossRef:
- CC-BY set in OAI-PMH API? -> OAI-PMH for subscribers only :(
- REST API: example request Elsevier CC BY 2016 (used filters might need further testing) https://api.crossref.org/works?facet=published:*,license:creativecommons.org/licenses/by/,container-title:*&filter=member:78,has-license:true,from-pub-date:2016-01-01,until-pub-date:2016-12-31
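The filter part of the example request above could be assembled programmatically. The following is only a minimal sketch: the function name `build_works_url` and the `rows` default are assumptions made for illustration, and the filter names are simply copied from the example request (which, as noted, might need further testing).

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_works_url(member_id, from_date, until_date, rows=100):
    """Build a CrossRef /works URL for one member and one date range.

    Hypothetical helper; filter names are taken from the example request.
    """
    filters = {
        "member": str(member_id),
        "has-license": "true",
        "from-pub-date": from_date,
        "until-pub-date": until_date,
    }
    # Filters are passed as comma-separated name:value pairs.
    filter_value = ",".join(f"{name}:{value}" for name, value in filters.items())
    # Keep ':' and ',' unescaped so the URL stays readable.
    query = urlencode({"filter": filter_value, "rows": rows}, safe=":,")
    return f"{CROSSREF_WORKS}?{query}"


# Member 78 and the 2016 date range are the values from the example request.
print(build_works_url(78, "2016-01-01", "2016-12-31"))
```

The sketch only builds the URL and leaves out the facet parameter of the example request. Fetching and paging through the results would still have to be added.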
crossref-get-dois -m MEMBER_ID -a AFTER_DATE -b BEFORE_DATE -l CC-BY
- Target: Scrape all DOIs in one licence and from one publisher
- Result: One DOI per line on stdout
- Prefixed with a namespace (doi:, arxiv:, …)
```
doi:10.000/XXXX
doi:10.000/YYYY
```
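A rough sketch of how the `crossref-get-dois` command line and its output format might look in Python 3. The flags are the ones drafted above; the `dest` names, the help texts and the helper `format_ids` are assumptions. The sample items only mimic the `DOI` field of a CrossRef work record and reuse the placeholder DOIs from the draft. No network access happens here.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags from the draft: -m, -a, -b and -l."""
    parser = argparse.ArgumentParser(prog="crossref-get-dois")
    parser.add_argument("-m", dest="member", required=True, help="CrossRef member ID")
    parser.add_argument("-a", dest="after", help="only works published after this date")
    parser.add_argument("-b", dest="before", help="only works published before this date")
    parser.add_argument("-l", dest="license", default="CC-BY", help="licence to select")
    return parser.parse_args(argv)


def format_ids(items, namespace="doi"):
    """Yield one namespaced ID per work, e.g. 'doi:10.000/XXXX'."""
    for item in items:
        doi = item.get("DOI")
        if doi:  # skip records without a DOI
            yield f"{namespace}:{doi}"


# Explicit argument list so the sketch runs without real command-line input.
args = parse_args(["-m", "78", "-a", "2016-01-01", "-b", "2016-12-31"])

# Placeholder records; a real run would take them from the CrossRef response.
sample_items = [{"DOI": "10.000/XXXX"}, {"DOI": "10.000/YYYY"}]
for line in format_ids(sample_items):
    print(line)
```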
get-fulltext -c CONFIG_FILE -d FULLTEXT_DIR -m METADATA_DIR -f FAILED_FILE
- Reads IDs from stdin
- Target: Download metadata, download fulltext (pdf)
- Plugin-like system for publishers
- METADATA_DIR/doi:10.000/XXX/crossref.json
- METADATA_DIR/arxiv:1208.0264/arxiv.xml
- FULLTEXT_DIR/doi:10.000/YYY/fulltext.pdf
- FULLTEXT_DIR/arxiv:1208.0264/v1.pdf
- Later: Supplementary information could be dropped in the fulltext directory as well.
- Steps:
  1. First check ID prefix (ID Prefix handler)
  2. Download metadata
  3. From first two decide publisher
  4. Determine plugin/module to use for download from publisher
  5. Initiate download
- Check and implement later:
  - API calls for mass fulltext download? (i.e. not one by one, but in bulk)
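The steps above could be sketched roughly as follows. Everything here is an assumption for illustration and not the actual libfulltext code: the registry `PLUGINS`, the decorator `register`, and the functions `split_id`, `decide_publisher`, `target_path` and `process` are hypothetical names. The metadata download (step 2) is left out and the metadata is passed in as a dict; the plugin only computes the target path instead of downloading anything.

```python
from pathlib import PurePosixPath

# Hypothetical plugin registry: publisher name -> download function.
PLUGINS = {}


def register(publisher):
    """Decorator that registers a download function for one publisher."""
    def decorator(func):
        PLUGINS[publisher] = func
        return func
    return decorator


def split_id(identifier):
    """Step 1: split 'doi:10.000/XXXX' into ('doi', '10.000/XXXX')."""
    prefix, sep, value = identifier.partition(":")
    if not sep or not prefix or not value:
        raise ValueError(f"missing namespace prefix: {identifier!r}")
    return prefix, value


def decide_publisher(prefix, metadata):
    """Step 3: decide the publisher from the ID prefix and the metadata."""
    if prefix == "arxiv":
        return "arxiv"
    # Assumes the metadata carries a 'publisher' entry.
    return metadata.get("publisher", "unknown")


def target_path(base_dir, identifier, filename):
    """Directory layout from the draft: BASE_DIR/<prefixed id>/<filename>."""
    return PurePosixPath(base_dir) / identifier / filename


@register("arxiv")
def download_arxiv(value, fulltext_dir):
    # Placeholder: a real plugin would fetch the PDF and write it here.
    return target_path(fulltext_dir, f"arxiv:{value}", "v1.pdf")


def process(identifier, metadata, fulltext_dir, failed):
    """Steps 1 and 3 to 5; failed IDs are collected for FAILED_FILE."""
    prefix, value = split_id(identifier)
    publisher = decide_publisher(prefix, metadata)
    plugin = PLUGINS.get(publisher)  # step 4: pick the plugin
    if plugin is None:
        failed.append(identifier)
        return None
    return plugin(value, fulltext_dir)  # step 5: initiate the download


failed_ids = []
print(process("arxiv:1208.0264", {}, "FULLTEXT_DIR", failed_ids))
```

A registry keyed by publisher keeps the core loop unchanged when further publishers are added, which matches the "plugin-like system" goal.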

external APIs

Springer, anotherlink
- https://link.springer.com/content/pdf/DOI.pdf
crossref, crossref-python
crossref TDM
Elsevier
- Article (Full Text) Retrieval API (Format: XML)
- Text Mining API (requires API key, Format: XML, txt)
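As a small illustration of the Springer URL pattern listed above, a publisher plugin might start from a helper like this. The name `springer_pdf_url` is hypothetical, and whether the DOI needs further escaping (for example of the slash) is not stated in these notes, so the sketch leaves `/` untouched.

```python
from urllib.parse import quote

# URL pattern from the list above, with DOI as the placeholder.
SPRINGER_PDF_PATTERN = "https://link.springer.com/content/pdf/{doi}.pdf"


def springer_pdf_url(doi):
    """Fill a DOI into the Springer PDF URL pattern.

    Assumption: '/' inside the DOI is kept as is; other unsafe
    characters are percent-encoded.
    """
    return SPRINGER_PDF_PATTERN.format(doi=quote(doi, safe="/"))


# Placeholder DOI in the style used elsewhere in this draft.
print(springer_pdf_url("10.000/XXXX"))
```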

Existing useful projects

ContentMine/quickscrape: Downloads articles per journal.
CottageLabs/OpenArticleGauge: Detects and extracts licenses from many publisher websites (crawler). Could be extended to download papers for those publishers. Plugin-based system.
In case more handles are needed, there is https://addons.mozilla.org/en-US/firefox/addon/cnri-handle-extension-for-fire/ which probably contains a list of them.
Hybrid OA Dashboard: Detect so called hybrid OA articles, i.e. OA articles in otherwise closed access journals: https://najkoja.shinyapps.io/hybridoa/ (mainly R, check GH repo)

Paper2HTML

Goal: Convert paper .pdf files to .html files, since a markup language allows the integration of various semantic vocabularies to describe / annotate the contents of a paper.

Paper2Text

Participants: Maximilian Fuchs, …
Goal: Convert paper .pdf files to .txt files
First step: Use "k2pdfopt" to bring the PDF into a better format
Second step: Use OCR to get the raw text
Project started, unfinished code available at https://github.com/SammyRamone/Paper2Text

Data licensing flyer

Participants: Jochen, TODO
Goal: TODO

Networking

some places to get information about open sciency things

deRSE page and mailing list
forschungsdaten.org German wiki about research data with a lot of links to other resources.
open-science-de Discussion list for the German-speaking open science community
open-science Discussion list for the open science community

10.1016/j.physletb.2017.11.066
