# The Turing Way
*22 & 23 November 2018*
*Alan Turing Institute*
### Who's here
* Kirstie Whitaker @KirstieJane
* Patricia Herterich @pherterich
* Louise Bowler
* Rosie Higman @rosiehigman
* Anna Krystalli @annakrystalli
* Becky Arnold @r-j-arnold
* Martin O'Reilly ([@martintoreilly](https://github.com/martintoreilly))
* Alexander Morley @alexmorley (tw: @alex__morley)
### Motivation for this HackMD file
We need to capture our goals and vision for the Turing Way project! This file will be archived in our GitHub repository as a jumping-off point for where we want to go.
Please add any and all notes as we go. And please re-organise and summarise notes as we develop them.
### How does Binder fit into best practice?
Useful links:
* https://mybinder.org/
* https://binderhub.readthedocs.io/en/latest/index.html
* https://repo2docker.readthedocs.io/en/latest/
* https://www.docker.com/resources/what-container
When you containerise your analyses you capture the computational environment, which makes them easier to transfer between users. It also makes it easy to freeze the packages you depend on, so that even as they are developed further you can still look back at the versions that were in place when you published the results.
What's nice about Binder is that it makes containers etc. very easy to use. By providing the **Binder** instance you also create the Docker image and the requirements for the analysis, so others with more experience can jump in at the more technical levels, but EVERYONE can use the Docker image.
Binder makes it very easy for researchers to *share* results in progress. Our goal is to use the tool to share work for review early and often with collaborators. Binder saves the PI from having to install all the different packages etc.
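As a loose illustration of what "freezing the packages" can look like in practice, here is a minimal sketch (not from the discussion - just one possible approach) that pins every package in the current Python environment into the `requirements.txt` that repo2docker and Binder pick up when building the image:

```python
# Minimal sketch: pin the current Python environment so that a Binder
# repository rebuilds with the same package versions later on.
from importlib.metadata import distributions

def write_pinned_requirements(path="requirements.txt"):
    """Write `name==version` lines that repo2docker/Binder will install."""
    lines = sorted(
        f"{dist.metadata['Name']}=={dist.version}" for dist in distributions()
    )
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_pinned_requirements()
```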
Question: Can you commit a change in a Binder instance?
Answer: Not at the moment, but it could be a thing to do.
It would be really nice for an outcome of this project to be a BinderHub offering on the Azure Marketplace, with the code, documentation and knowledge transferred to the core BinderHub team ([Azure Marketplace publisher's guide](https://docs.microsoft.com/en-gb/azure/marketplace/marketplace-publishers-guide))
### What topics do we need to include?
* GitHub
* git
* packaging your code
* Good project structure/file naming + the power of convention
* sharing your data
* Binder
* DOIs
* glossary
* principles of why reproducibility matters
It's important to make clear what the different tools are and what they can and cannot do.
It would be really valuable to have a jargon-busting section so that it's super easy to get people on board with all the things we're doing. It probably also needs some definitions for terms that are used differently.
We also need to explain the motivation - why people should do reproducible research - in a way that makes sense to the different groups we are looking to target.
### How do we get the message across?
* Self contained chapters
* Nesting access to the information to make it accessible to a wide audience
### Structure and architecture of the information
We want the information to be really highly curated.
Ideally the first page is only a couple of sentences, each with a link to more detail, and those pages then link on to the full details. (Maybe apply the rule of 3 here?)
### Workshopping the architecture
Maybe use card sorting?
* Ask for feedback early and often on the order of information and whether people intuitively would look in the same places that we've guessed as we develop
### Skill levels for each chapter
* Include at the start what skills are required to understand the article
* Make clear whether there are other chapters that should be covered first
### Summaries in each section
* Can we include summaries of each chapter at the start?
* Something inspired by the Simple English version of articles: just say how to do the thing, with no extraneous information.
### Glossary
Make sure we have good search functionality, and include some synonyms to help people who are searching for other words.
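As a hypothetical sketch of what synonym-aware search could look like (the terms and structure below are made up for illustration), every synonym can simply point back at the same canonical entry:

```python
# Hypothetical glossary structure: each entry lists its synonyms, and a
# flat index maps every spelling back to the canonical term.
GLOSSARY = {
    "container": {
        "definition": "A packaged, portable computational environment "
                      "(e.g. a Docker image).",
        "synonyms": ["docker image", "computational environment"],
    },
}

INDEX = {
    alias.lower(): term
    for term, entry in GLOSSARY.items()
    for alias in [term, *entry["synonyms"]]
}

def lookup(query):
    """Return the definition for a term or any of its synonyms."""
    term = INDEX.get(query.lower())
    return GLOSSARY[term]["definition"] if term else None

print(lookup("Docker image"))  # finds the "container" entry
```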
### Creative ways to make our point!
https://youtu.be/s3JldKoA0zw
https://www.youtube.com/watch?v=N2zK3sAtr-4
http://whyopenresearch.org/
Hans Rosling on the power of open data back in 2006. Makes a great point about how it will only work if we all do our bit: https://youtu.be/hVimVzgtD6w?t=908
Interview-like case studies? Real-world examples could guide a reader through certain chapters and the lessons learnt. Possibly use some continuous examples across chapters to help create a narrative.
### Workshops/Tutorials
We have funds for workshops; these could be about Binder, BinderHubs, or the Turing Way more generally.
Advertising probably needs to be wider, and Binder material might need to be tailored to RSEs specifically, e.g. at the next RSE Conference:
https://rse.ac.uk/conf2019/
UK Reproducibility Network: http://www.dcn.ed.ac.uk/camarades/ukrn/
What if we *don't* run workshops? What if we don't have time to build up the tutorials and administer them?
#### Target group
Postgraduate students and more senior; not yet undergraduates in the first round of the project.
Target senior researchers as well as postgraduates. Would it be possible to do this at a lab/group level, either in a separate workshop or as an exercise individuals could take back to their groups?
There are ~200 Turing fellows; we need to engage with them continuously, find out what they want, and encourage their contributions.
Accessibility - https://accessibility.blog.gov.uk/2016/09/02/dos-and-donts-on-designing-for-accessibility/
### Booksprint
Get feedback and edits on the content.
### Sustainability
Share guidelines with funders & coordinators of doctoral training programs, to convince them to…
### Roles
Rosie - workshops, tutorials, community building (internal and external)
Becky - would like to talk to lots of researchers about their practice
Sarah, Louise, Becky - BinderHub - testing, documentation, and testing of documentation.
Patricia - community building, book/hosting technologies, workshops, tutorials
Rosie and Patricia - data management chapter
Anna - templates and checklists
MISSING:
People able to explain:
- How do you collaborate?
- Tailoring to data science issues (big data, sensitive data, deep learning)
- Disciplines to be excluded?
### Other ideas
- Get an idea of how long it takes to make certain aspects reproducible, and therefore what it costs, so this can be included in grants. Raise awareness among funders that this needs resourcing!
### Continuous Research
Slides from Alex
* Modular research - should be able to share all the different components of research:
* hypothesis
* analysis
* test hypotheses
* update theory
* Challenges:
* Share data in a reusable way
* Standards - getting people to use them & develop them!
Solution - people want a convention to follow. They don't necessarily want to build the standards, but in general they'll be reasonably happy to follow them!
BIDS - http://bids.neuroimaging.io
* BIDS apps: http://bids-apps.neuroimaging.io
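For a flavour of what the convention buys you, here is a small sketch of BIDS-style file naming (entity names follow the BIDS specification; the helper function itself is hypothetical): filenames are built from `key-value` entities joined by underscores, so every tool knows where to find everything.

```python
# Illustrative helper for BIDS-style filenames, e.g.
# sub-01_ses-02_task-nback_run-01_bold.nii.gz
def bids_filename(sub, task, suffix, ses=None, run=None, ext=".nii.gz"):
    parts = [f"sub-{sub}"]
    if ses is not None:
        parts.append(f"ses-{ses}")
    parts.append(f"task-{task}")
    if run is not None:
        parts.append(f"run-{run:02d}")
    parts.append(suffix)
    return "_".join(parts) + ext

print(bids_filename("01", "rest", "bold"))
# -> sub-01_task-rest_bold.nii.gz
```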
Research compendium: the whole collection of code & data behind a piece of research
* How to read guide: https://arxiv.org/abs/1806.09525
* Research compendium: https://research-compendium.science/
* DataONE Reproducible Research Compendia Guide: https://github.com/benmarwick/onboarding-reproducible-compendia/blob/master/packaging_guide.md
![](https://i.imgur.com/jhUcRhO.jpg)
Source: Ben Marwick, Carl Boettiger & Lincoln Mullen (2018) Packaging Data Analytical Work Reproducibly Using R (and Friends), The American Statistician, 72:1, 80-88, DOI: [10.1080/00031305.2017.1375986](https://doi.org/10.1080/00031305.2017.1375986)
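To make the layout in the figure concrete, here is a hypothetical scaffolding helper - the directory names loosely follow Marwick, Boettiger & Mullen (2018), but the exact structure is illustrative rather than prescriptive:

```python
# Scaffold a minimal research compendium (illustrative layout).
from pathlib import Path

def init_compendium(root="my-compendium"):
    for subdir in [
        "data/raw_data",      # original data, never edited by hand
        "data/derived_data",  # script outputs, safe to delete and regenerate
        "analysis/paper",     # manuscript and figures
        "R",                  # (or src/) reusable functions used by the analysis
    ]:
        Path(root, subdir).mkdir(parents=True, exist_ok=True)
    Path(root, "README.md").touch()
    Path(root, "DESCRIPTION").touch()  # package metadata, in the R convention

init_compendium()
```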
### Long term archiving
How long do things have to be reproducible for?
Continuous integration - packaging of the compute environment allows for an analysis to be reproduced on publication, but will it still work a year from now? Will it work in an application? What if things change?
* Note that this is an incentive problem in academia - not rewarded for things working in the long term. Not usually asked to compare to previous published work - looking for improvements and change rather than verification.
The library community needs to know about these tools - lots of opportunity for education around **best practices** across the board.
### What are the incentives that we can provide?
* Get cited for different parts
* Get promoted
* Make your life easier
* Go faster by integrating new tools!
### Balance of aspirational vs not scaring people off
* Better is better. It doesn't matter which of these points you take on!
* It's OK if you don't take on all the different parts.
* Checklists - everyone likes them! Maybe score each point (0, 1, 2, 3) rather than making it binary: 0 = not done, 1 = first level, 3 = best, etc. (See the sketch after this list.)
* DLR Software Engineering Guidelines http://doi.org/10.5281/zenodo.1344612
* Structure the book with a bunch of levels - easy, middling and advanced. Don't worry about hitting them all though!
* Important to link to the project scope! No need to go way overboard :smile_cat:
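A toy sketch of what a graded checklist could look like (items and scores invented for illustration):

```python
# Score each checklist item 0-3 instead of pass/fail, so "better is
# better" shows up as progress rather than failure.
checklist = {
    "code under version control": 3,
    "dependencies recorded": 2,
    "automated tests": 1,
    "data archived with a DOI": 0,
}

total, best = sum(checklist.values()), 3 * len(checklist)
print(f"Reproducibility score: {total}/{best}")
for item, score in checklist.items():
    print(f"  [{score}/3] {item}")
```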
### Testing
What are the questions that motivated people might have about writing tests?
* What is a test? What is a unit test? What is an integration test?
* When should I start testing?
* Some examples of tests - things that are quite common across projects (see the pytest sketch after this list):
* Did the function get the arguments that it was expecting
* Data the wrong shape
* Data the wrong type
* Missing data
* Range of sensible values
* Checking a numerical method/algorithm/function against a known example to confirm it does what you think it's doing
* What's the difference between documentation and testing?
* Bots to tell you what you're covering with tests?
* Run-time testing & exceptions etc. are helpful to learn about
* Assertions that aren't in the *functions* but in the "analysis" scripts - these are often not captured as "tests" in scientific research, but many researchers WILL in fact do these types of checks.
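Here is a pytest sketch of the kinds of checks listed above, written against a hypothetical `normalise` function (every name in it is invented for illustration):

```python
import numpy as np
import pytest

def normalise(data):
    """Scale a 2-D array to the range [0, 1]."""
    data = np.asarray(data, dtype=float)
    if data.ndim != 2:
        raise ValueError("expected a 2-D array")  # data the wrong shape
    if np.isnan(data).any():
        raise ValueError("data contains missing values")
    return (data - data.min()) / (data.max() - data.min())

def test_rejects_wrong_shape():
    with pytest.raises(ValueError):
        normalise(np.zeros(5))

def test_rejects_missing_data():
    with pytest.raises(ValueError):
        normalise([[1.0, np.nan], [0.0, 2.0]])

def test_output_in_sensible_range():
    result = normalise([[0.0, 5.0], [10.0, 2.5]])
    assert result.min() >= 0.0 and result.max() <= 1.0

def test_known_example():
    # check the function against a worked example you trust
    np.testing.assert_allclose(normalise([[0.0, 10.0]]), [[0.0, 1.0]])
```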
Possibly start by focusing on end-to-end analysis tests! That's something a lot of people are likely doing anyway - just maybe not often enough.
Problem to overcome: it takes a long time to run an analysis - Travis CI times out after 10 minutes, for example! Some other integrations can run for longer, though - we need to figure out how to configure things so that really useful analyses can run.
* https://github.com/poldracklab/fmriprep
* https://github.com/ME-ICA/tedana
SSI blog: https://www.software.ac.uk/blog/2018-05-24-five-failed-tests-scientific-software
### Binder resources
- Tim Head's workshop resources: https://github.com/rcs18/Contents/blob/master/Notes.md#tools-for-reproducible-research-workshop--tim-head-wild-tree-tech