# 34c3 Open Science/Research Workshop
**Day 2 (Dec 28) 11:00–13:00 in [Hall 3 (CCL, 1. Etage)](https://34c3.c3nav.de/l/c:1:301.56:384.57/@1,299.34,385.18,3.65)**
https://events.ccc.de/congress/2017/wiki/index.php/Session:Open_Science/Research_Workshop
In this workshop about Open Research/Science we want to first identify what we can do to improve any aspect of Open Research/Science (like Open Access, Open Educational Resources, Open Peer Review, Open Data, ...). We then pick one or more of these aspects and start working on solutions (Hackathon-style).
The workshop is an excellent place to connect with other people who are interested in making research and science more open.
Bring your own ideas and make sure to add them to this document (even better if you do so before the event)!
## Participants
* [André Gaul](https://twitter.com/andrenarchy) (Mathematics/Computer Science)
* [Manuel Baumann](https://twitter.com/ManuelMBaumann) (Mathematics)
* [Paul Seyfert](https://gitlab.cern.ch/pseyfert) (Particle Physics)
* [Jochen Klar](https://twitter.com/jochenklar) (Astronomy)
* [Michael Herbst](https://michael-herbst.com/) (Quantum Chemistry)
* [Frank Löffler](http://fusion.cs.uni-jena.de/fusion/members/frank-loffler/) (Computer Science, formerly [Numerical Relativity](https://www.cct.lsu.edu/~knarf/index.html))
* [Michaela Voigt](https://twitter.com/mv01gt) (Library / Information Science)
* [Emanuil Tolev](https://twitter.com/emanuil_tolev) (Freelance Software Developer in Open Science. https://doaj.org developer and sysadmin)
* [Katrin Leinweber](https://twitter.com/gittaca/)
* [Malte Reißig](https://twitter.com/topmalte) (Open Source Consultant, Knowledge Media Science) *remote participant*
* [Maximilian Fuchs](https://github.com/Maximilian-Fuchs) (Internet- and Web-based Systems, Natural Language Processing)
* *Add your name here*
(Many more people attended the workshop, but most either didn't bother to add themselves here or didn't want to.)
## What do you understand by "Open Science"?
Possibilities:
* Open Education
* Open Access to publications
* Open Access to (almost) all data/software used for publications
* Reproducible Research
* Wikipedia: "Open research is research conducted in the spirit of free and open-source software. Much like open-source schemes that are built around a source code that is made public, the central theme of open research is to make clear accounts of the methodology freely available via the internet, along with any data or results extracted or derived from them."
* The "Open" in science should IMHO try to reflect the four principles of openness in software, applied to publications:
* ability to *use* (read/download),
* to *study* (full access with instructions on the sources),
* to *adapt* (build on the research data, use the research data with different analytical methods, translate the publication)
* to *distribute* (print, sell, whatever)
* Open infrastructure available to everyone who wants to do research (in computer science, e.g. access to computing resources, access to ordinary working space, some special web service, etc.)
## How to improve Open Science/Research?
We could do some research about
* public statements from research institutions on what they think "Open Access" is, compared with what it actually is
* so that they are better informed and stop signing bad or misguided contracts with private publishing houses
* so that they can no longer claim on their websites to do "open access" when they do not
* gather further notions of "Open access", especially those accepted in all EU member states
* gather further notions of "Open Science"
With that we could target "außeruniversitäre Forschungseinrichtungen" (non-university research institutions, i.e. those doing programmatic research) and force them (via the Bundestag and the BMBF) to gradually adopt a specific notion of "open research" (starting with some of their research projects).
What I found is this: many institutions state on their websites that they support the "Berliner Erklärung" (Berlin Declaration) on Open Access from 200?. But if you really understand the declaration, you sometimes find on the very same website that the institution has no real idea what it is talking about. See also this comment on [10 years after the declaration](http://openaccess.mpg.de/mission-statement_de) by the MPI.
Tools:
* Open Peer Review: Tool that anonymizes author names to eliminate biases (gender, ethnic, origin, ...)
* Open Access: create a [dat](https://github.com/datproject/dat) with all PDFs of CC-BY-licensed articles on CrossRef (better accessibility, easy text and data mining, ...)
* Tool/Compilation/List of all available scientific positions in research related centers
Awareness (might depend on science area):
* Create a short flyer why to choose CC0 for research data in a language scientists can relate to.
* Compilation of services / websites which offer open-source, open-data, open-access hosting
* Supported licences
* Implications for dealings with traditional publishers regarding first publication or exclusive publication of scientific results, or similar
* Raise awareness and promote a scientific culture of also talking and publishing about failed investigations, rather than only publishing successes, which still seems to be the common approach in established research institutes. Maybe even create a "fail" repo for "fails in methodology", "fails in literature research", "fails in data collection" or "fails in data publishing", etc.
* *Add your idea here*
## Questions from the internet
* How do the members of the "Allianz der Wissenschaftsorganisationen" (e.g. MPI, Fraunhofer, Helmholtz and Leibniz), and thus all their institutes in Germany, understand "Open Science"? The annual programmatic research budget of those institutes (which are not universities) is roughly €13 billion.
* For example, the [Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen, GWDG](https://www.gwdg.de/)
* or the [Niedersächsische SUB Göttingen](https://www.sub.uni-goettingen.de/en/electronic-publishing/open-science/) on Open Science and Access
* or see the [Helmholtz Open Science](http://os.helmholtz.de/) website
* and the [Leibniz Open Science](https://www.leibniz-gemeinschaft.de/forschung/open-science/) description
* or [Projekt Deepgreen](https://deepgreen.kobv.de/de/deepgreen/), which tries to "convert scientific publications into Open Access once the embargo period on their license has run out", see [@oa_DeepGreen](https://twitter.com/oa_DeepGreen)
* How do specific universities in Germany understand "Open Science"? Compared to the research institutes mentioned above, universities are freer to choose what they research.
* How do institutions in the EU, e.g. the European Parliament, understand "Open Science"?
* European Commission: [Open Science Policy Platform](http://ec.europa.eu/research/openscience/index.cfm?pg=open-science-policy-platform)
* or [Research Infrastructures in Horizon 2020](http://www.rich2020.eu/proposals)
* or [Science Europe](http://www.scienceeurope.org/)
* How do institutions outside the EU understand "Open Science"?
* e.g. see [Center for Open Science](https://cos.io/)
* In academia, "Open Access" seems to be used merely to describe the problem that publishing houses and journals don't want to change. It expresses roughly the lowest common denominator both sides could agree on, and should therefore be used carefully when thinking about "Open Science".
* Do you know about the [DARIAH](https://de.dariah.eu/) project? And its [TextGrid](https://de.dariah.eu/textgrid) Software?
* Have you had a look at this RDF schema of a [ScholarlyArticle](http://ns.science.ai/)?
* The Free Software Foundation Europe has also published an extensive [position paper on open science practice in the context of Horizon 2020](https://fsfe.org/activities/policy/eu/Horizon2020-Position-Paper.en.html). Has anyone read it?
* List of Open Access Portals:
* http://www.leibnizopen.de/blaettern/
* ...
## Projects
### Repository of OA fulltexts
* Participants: André, Paul, Katrin, Michaela, Michael, Emanuil
* Goal: Create a repository with fulltexts of Open Access publications.
* Intermediate Hackathon goal: Create a tool that downloads OA fulltext documents from one publisher (in a way that we can extend it to more publishers).
* Next meeting: Day 2, 18:00, Freifunk Assembly -> we are currently in CCL level 2 hall 12
* https://github.com/andrenarchy/libfulltext
* Python 3!
#### Draft API
* Fetch DOIs/metadata from CrossRef:
* https://pypi.python.org/pypi/crossrefapi/
* https://github.com/fabiobatalha/crossrefapi
* **CrossRef**:
* CC-BY set in OAI-PMH API? -> OAI-PMH for subscribers only :(
* REST API: example request Elsevier CC BY 2016 (used filters might need further testing) `https://api.crossref.org/works?facet=published:*,license:creativecommons.org/licenses/by/,container-title:*&filter=member:78,has-license:true,from-pub-date:2016-01-01,until-pub-date:2016-12-31`
* `crossref-get-dois -m MEMBER_ID -a AFTER_DATE -b BEFORE_DATE -l CC-BY`
* Target: Scrape all DOIs in one licence and from one publisher
* Result: One DOI per line on stdout
* Prefixed with a namespace (doi:, arxiv:, ...)
```
doi:10.000/XXXX
doi:10.000/YYYY
```
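A minimal sketch of how `crossref-get-dois` could work, using only the Python standard library instead of the `crossrefapi` package linked above. The function names are hypothetical. The `license.url` filter and the need to pass it an exact license URL are assumptions that, like the filters in the example request, would need testing against the live API.

```python
import json
import urllib.request
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_query_url(member_id, after_date, before_date, license_url=None,
                    cursor="*", rows=1000):
    """Build a /works request for one publisher (member) and date range."""
    filters = [
        f"member:{member_id}",
        "has-license:true",
        f"from-pub-date:{after_date}",
        f"until-pub-date:{before_date}",
    ]
    if license_url:
        # Assumption: license.url must match the deposited license URL exactly.
        filters.append(f"license.url:{license_url}")
    params = {"filter": ",".join(filters), "select": "DOI",
              "rows": rows, "cursor": cursor}
    # Keep ':', ',' and '*' readable, as in the example request above.
    return CROSSREF_WORKS + "?" + urlencode(params, safe=":,*")


def format_ids(message):
    """Turn the 'message' part of a response into namespaced IDs."""
    return [f"doi:{item['DOI']}" for item in message.get("items", [])]


def iter_ids(member_id, after_date, before_date, license_url=None):
    """Yield all matching IDs, following CrossRef's cursor-based paging."""
    cursor = "*"
    while True:
        url = build_query_url(member_id, after_date, before_date,
                              license_url, cursor=cursor)
        with urllib.request.urlopen(url) as response:
            message = json.load(response)["message"]
        ids = format_ids(message)
        if not ids:  # an empty page means there is nothing left to fetch
            break
        yield from ids
        cursor = message["next-cursor"]
```

Printing each value from `iter_ids(78, "2016-01-01", "2016-12-31")` would give one ID per line on stdout, in the format shown above.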
* `get-fulltext -c CONFIG_FILE -d FULLTEXT_DIR -m METADATA_DIR -f FAILED_FILE`
* Reads IDs from stdin
* Target: Download metadata, download fulltext (pdf)
* Plugin-like system for publishers
* `METADATA_DIR/doi:10.000/XXX/crossref.json`
* `METADATA_DIR/arxiv:1208.0264/arxiv.xml`
* `FULLTEXT_DIR/doi:10.000/YYY/fulltext.pdf`
* `FULLTEXT_DIR/arxiv:1208.0264/v1.pdf`
* Later: Supplementary information could be dropped in the fulltext directory as well.
* Steps:
1. First check ID prefix (ID Prefix handler)
2. Download metadata
3. Decide the publisher from the results of the first two steps
4. Determine plugin/module to use for download from publisher
5. Initiate download
* Check and implement later:
* API calls for mass fulltext download? (i.e. not one by one, but in bulk)
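The steps of `get-fulltext` could be sketched roughly as follows. This is an illustration only: all function names, the plugin registry and the `crossref-member:` key scheme are hypothetical, and using the `member` field of the CrossRef metadata to identify the publisher is an assumption. The example plugin is a placeholder that only computes the target path.

```python
from pathlib import Path

# Hypothetical registry: publisher key -> download function
PLUGINS = {}


def plugin(publisher_key):
    """Decorator that registers a download function for one publisher."""
    def decorator(func):
        PLUGINS[publisher_key] = func
        return func
    return decorator


def split_identifier(identifier):
    """Step 1: split 'doi:10.000/XXX' into ('doi', '10.000/XXX')."""
    prefix, separator, value = identifier.partition(":")
    if not separator or not value:
        raise ValueError(f"missing namespace prefix: {identifier!r}")
    return prefix, value


def decide_publisher(prefix, metadata):
    """Step 3: derive a publisher key from the prefix and the metadata."""
    if prefix == "arxiv":
        return "arxiv"
    if prefix == "doi":
        # Assumption: the CrossRef metadata carries the member ID.
        return f"crossref-member:{metadata.get('member')}"
    raise ValueError(f"unknown prefix: {prefix!r}")


def select_plugin(identifier, metadata):
    """Step 4: look up the plugin responsible for this identifier."""
    prefix, _ = split_identifier(identifier)
    key = decide_publisher(prefix, metadata)
    if key not in PLUGINS:
        raise LookupError(f"no plugin registered for {key!r}")
    return PLUGINS[key]


def target_path(base_dir, identifier, filename):
    """Directory layout from the draft: BASE_DIR/<identifier>/<filename>."""
    return Path(base_dir) / identifier / filename


@plugin("arxiv")
def download_arxiv(value, fulltext_dir):
    """Placeholder plugin: a real one would fetch the PDF (step 5)."""
    return target_path(fulltext_dir, f"arxiv:{value}", "v1.pdf")
```

Because a DOI contains a slash, `doi:10.000/XXX` becomes two nested directories in this layout. IDs for which `select_plugin` raises `LookupError` would be candidates for the `FAILED_FILE`.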
#### external APIs
* [Springer](https://dev.springer.com/restfuloperations), [anotherlink](https://dev.springer.com/)
* https://link.springer.com/content/pdf/DOI.pdf
* [crossref](https://github.com/CrossRef/rest-api-doc), [crossref-python](https://github.com/fabiobatalha/crossrefapi)
* [crossref TDM](http://tdmsupport.crossref.org/)
* Elsevier
* [Article (Full Text) Retrieval API](https://api.elsevier.com/documentation/ArticleRetrievalAPI.wadl) (Format: XML)
* [Text Mining API](https://dev.elsevier.com/tecdoc_text_mining.html) (requires API key, Format: XML, txt)
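As a small illustration of how a publisher plugin might use the Springer URL pattern listed above, here is a sketch that fills a DOI into that template. The function name is hypothetical, and leaving the slash in the DOI unencoded is an assumption that would need checking against the live site.

```python
from urllib.parse import quote

# URL pattern taken from the list above; DOI is the placeholder.
SPRINGER_PDF_TEMPLATE = "https://link.springer.com/content/pdf/{doi}.pdf"


def springer_pdf_url(identifier):
    """Return the PDF URL for a DOI, with or without the 'doi:' prefix."""
    doi = identifier[4:] if identifier.startswith("doi:") else identifier
    # Assumption: the slash stays as-is, other special characters are escaped.
    return SPRINGER_PDF_TEMPLATE.format(doi=quote(doi, safe="/"))
```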
#### Existing useful projects
* Downloads articles *per journal* [ContentMine/quickscrape](https://github.com/ContentMine/quickscrape/blob/master/README.md)
* Detect and extract licenses from many publisher websites (crawler). Could be extended to download papers for those publishers. Plugin-based system, [CottageLabs/OpenArticleGauge](https://github.com/CottageLabs/OpenArticleGauge)
* in case more handles are needed, there is https://addons.mozilla.org/en-US/firefox/addon/cnri-handle-extension-for-fire/ which probably contains a list of them
* **Hybrid OA Dashboard**: Detect so called hybrid OA articles, i.e. OA articles in otherwise closed access journals: https://najkoja.shinyapps.io/hybridoa/ (mainly R, [check GH repo](https://github.com/subugoe/hybrid_oa_dashboard))
### Paper2HTML
* Goal: Convert paper .pdf files to .html files, since a markup language allows integration of various semantic vocabularies to describe / annotate the contents of a paper.
### Paper2Text
* Participants: [Maximilian Fuchs](https://github.com/Maximilian-Fuchs), ...
* Goal: Convert paper .pdf files to .txt files
* First step: Use "k2pdfopt" to bring the PDF into a layout that is easier to process
* Second step: Use OCR to get the raw text
* Project started, unfinished code available at https://github.com/SammyRamone/Paper2Text
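The two steps could be chained roughly as in the sketch below. This is not the code from the repository above. It assumes that `k2pdfopt`, `pdftoppm` (poppler) and `tesseract` are installed, and that the flags are used as described in their usage texts (`-ui-` to disable the GUI, `-x` to exit when done, `-o` for the output name). The function names, the 300 dpi resolution and the file naming are hypothetical choices.

```python
import subprocess
from pathlib import Path


def preprocessing_commands(pdf_path, work_dir):
    """Commands for step 1 (k2pdfopt) and for rendering pages to images."""
    pdf = Path(pdf_path)
    work = Path(work_dir)
    reflowed = work / f"{pdf.stem}_k2opt.pdf"
    return [
        # Assumed flags: no GUI, exit on completion, explicit output file.
        ["k2pdfopt", "-ui-", "-x", "-o", str(reflowed), str(pdf)],
        # Render each page as a PNG named <work_dir>/page-<n>.png.
        ["pdftoppm", "-r", "300", "-png", str(reflowed), str(work / "page")],
    ]


def ocr_command(image_path):
    """Command for step 2: tesseract writes <image name without suffix>.txt."""
    image = Path(image_path)
    return ["tesseract", str(image), str(image.with_suffix(""))]


def pdf_to_text(pdf_path, work_dir):
    """Run both steps and return the concatenated raw text."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    for command in preprocessing_commands(pdf_path, work):
        subprocess.run(command, check=True)
    texts = []
    # Sorting by name assumes that the page numbers are zero-padded.
    for image in sorted(work.glob("page-*.png")):
        subprocess.run(ocr_command(image), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```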
### Data licensing flyer
* Participants: Jochen, TODO
* Goal: TODO
## Networking
*some places to get information about open sciency things*
* [deRSE page and mailing list](http://www.de-rse.org/de/index.html)
* [forschungsdaten.org](http://www.forschungsdaten.org/) German wiki about research data with a lot of links to other resources.
* [open-science-de](https://lists.okfn.org/mailman/listinfo/open-science-de) Discussion list for the German-speaking open science community
* [open-science](https://lists.okfn.org/mailman/listinfo/open-science) Discussion list for the open science community
10.1016/j.physletb.2017.11.066