# [Under Review - SSI Blog] Lost and Frustrated but Persistent: personal narratives about usability challenges with Open Source Scientific Software
By Meag Doherty, Anja Eggert, Yomna Eid, Kjong-Van Lehmann, Christian Meesters, Lennart Schüler, Damar Wicaksono

###### This illustration was created by Scriberia with The Turing Way community. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807
Over the course of four days earlier this month, a group of us at the **Open Science Retreat** [link to event recap blog post] spent time discussing **challenges and opportunities with usability in Open Source Scientific Software** [link to breakout recap blog post].
As a way to wrap up our time together, a few group members wrote personal narratives (including self-assigned catchy titles!) that highlight some of our individual and collective experiences with usability.
The narrative structure included the following questions:
* What is your role and context?
* What is the core issue regarding usability in your daily work?
* What are the critical barriers you need to overcome? (or maybe you don't know yet!)
* Who can help you and the community over this critical barrier?
### The Lost Early-Career Researcher
As an early-career researcher, I have mostly used pre-existing packages, integrating them into the workflows I am developing for my analysis of spatio-temporal data derived from multiple sources.
Having only tinkered with building my own package, I am lost in a flurry of information. I would like to create workflows that are guaranteed to still run in the future, but it is not uncommon to hit errors from scripts broken by package dependencies. Generally, I try to steer away from unstable packages, but one isn't so lucky all the time. On more than one occasion, I have also had a hard time incorporating certain functions from individual packages into my workflow, as sufficient documentation is not always easy to come by; in such cases I am less inclined to use the package and seek alternatives. **The best packages, for me, are the ones that go a step further and provide a test-example of how certain functions work.** It is always easier to learn by example than by text alone.
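To make the learning-by-example point concrete, here is a minimal sketch (in Python, with entirely invented names) of the kind of "test-example" described above: a function whose docstring doubles as a runnable test via Python's built-in `doctest` module.

```python
# A hypothetical example function; the usage shown in the docstring is
# executable documentation that doctest can verify automatically.

def moving_average(values, window=2):
    """Return the simple moving average of `values` over `window` points.

    Example
    -------
    >>> moving_average([1.0, 2.0, 3.0, 4.0], window=2)
    [1.5, 2.5, 3.5]
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    import doctest
    doctest.testmod()  # runs the docstring example as a test
```

Because the example runs as part of the test suite, it cannot silently drift out of date the way prose documentation can.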
And if I magically get so far as to generalize my algorithms and workflows into complete, ready-to-go packages that represent my methodology, **I do not know where to start to ensure that someone beyond me would use and benefit from them.** I need concrete guidelines on what makes a package robust and usable, and on the type, amount and extent of documentation that counts as thorough. Most of the time it seems arbitrary to me, a function of personal taste rather than a standard. It doesn't help that these standards sometimes differ across domains and communities.
This compilation of issues could easily discourage early-career scientists (and likely scientists in general) from openly sharing their code: with no best-practice guidelines, and little acknowledgement of the effort that goes into it, sharing code is a high investment with low reward. I have discovered a gap between my ambitions and my training; the two do not align perfectly. That is partially due to **the ambiguity in what the scientific community, up to this point, defines as good software/packages, even though we know a good one when we see one**. Without the formal training of a software developer, I ask myself: what could be done to make this journey easier and less complicated than it seems?
### The Accidental Research Software Engineer
I'm an RSE coming from substantive research, using software as a medium to make the methods I develop more accessible to fellow scientists and students. For me, that frequently means creating graphical user interfaces that reduce the reliance on code and coding skills.
I don't have a formal background in usability, so **while I care deeply about making the software useful and approachable, I am limited to haphazard attempts and intuitive solutions.** The bulk of my user feedback comes from teaching students, running workshops, observing colleagues and answering their questions, so I rarely get to observe entirely naive first-time users systematically. My impression is that **(my) interfaces are designed around the underlying technical design of the software, and may not serve users' interests in the best possible way.**
I wish there were institutional support for usability research and improvement. My contributions are largely evaluated by their scientific merit and substantive feature set, and **I can't always justify spending time on interfaces alone without expanding functionality.** This does not mean that researchers don't care about usability -- on the contrary, I believe that usable tools are a great way to overcome barriers and pave the road towards better research practices, and that better-designed software has a larger impact. However, this is not reflected in the current incentive system.
**I would love there to be design- and usability-related guidance for the entirety of the research software lifecycle.** Starting early on, feedback could help shape the tools and interfaces we build, and better align them with scientists' needs without having to scrap and overhaul existing work. Later, it could point out areas for improvement, and help successful open source tools compete with proprietary alternatives. Throughout, I believe this would be most helpful if it could keep in mind the scientific goals and resource constraints in RSE work, and support developers and researchers in implementing the proposals, possibly by including the user community. Ideally, rather than merely adding to the list of requirements for a research software tool, this could help reduce the support load that many RSEs currently face.
### The Pragmatic Bioinformatics Researcher
As a Bioinformatician in an academic setting, I often analyze high-dimensional data using the latest state-of-the-art methods. The decision on which method to use is driven by the expected novelty of the results, the overall performance, matching assumptions and a plausible theoretical or algorithmic foundation, but not by usability. At the same time, we also strive to contribute novel approaches. Both tasks involve running similar methods, either to decide on the best method to apply or to benchmark the novel approach against existing ones.
**Getting other researchers' methods to run is a time-consuming process** that often already stalls at installation procedures and dependency resolution, before input formatting, usage examples, documentation and versioning even come into play. While the situation overall has slowly improved, in some cases the underlying method and published results convincingly suggest superior performance, but the accompanying software is unusable. This sometimes requires us to be pragmatic and re-implement the approach from scratch.
In a research setting, the final product is not the software but the dissemination of the novelty (e.g., a publication). **Therefore, there is little reward for good software engineering and long-term maintenance.** Maintaining relevant software packages past publication takes time away from future research, and often the developer has moved on. Also, Bioinformatics Researchers can write code, but they are usually not trained software engineers. Yet the methods and models contributed by this community have arguably played a part in getting biomedical research to where it is now. The dependency on these contributions, coupled with software development that lacks software engineering support, probably plays a large role in the current state of research software.
**Training may alleviate the situation.** But how much software engineering expertise should we expect from a researcher? Providing researchers with software engineering support could be an alternative approach. While it seems like an additional expense for an institution, overall productivity and quality may improve significantly.
### The Eager Support Guy
"Working on a High Performance Compute Cluster? - Well, if I have to …" The lament is long and ranges from overly bureaucratic access regulations to various technical details rendering one's research on an official cluster impossible. I overheard this HPC-no-thank-you remark frequently in hallway tracks. And working to support cluster users, I have to admit: **Usability and HPC-clusters do not go along well.**
Adapting life science workflows is particularly challenging. A single such "workflow" (designed to analyze some sort of data, e.g. genetic material) may require several dozen applications. All of these software tools need to be deployed, and I/O issues need to be solved. Some applications are clearly "runs on my system"-only software packages. Not all HPC clusters allow using "[Bioconda](https://doi.org/10.1038/s41592-018-0046-7)", a package management system tailored to the life sciences that lets any user install the software they need. As a consequence, a frustrated user base ("No, we cannot support _your_ software.") is inevitable.
This HPC attitude forces entire user groups to build and maintain their own compute infrastructure. As a result, money and resources are wasted on redundant systems.
Is there any silver lining?
In order to meet user expectations, **the HPC community mindset has to change**: away from [chasing FLOPS and providing ever faster computers,](https://www.top500.org/) towards supporting all scientific domains. This would require reaching out and listening. We would stop providing ["true HPC"](https://en.wikipedia.org/wiki/No_true_Scotsman) clusters and gain a pleased user base.
### The Persistent Biostatistician
Being the statistical consultant in a medium-sized research institute, I support the scientists in their data processing and statistical analysis. As I'm an advocate of open and reproducible science, the software I need for my work goes beyond statistical software. Using containers (like Docker or Singularity) for reproducible workflows has become a widespread approach. As a scientist, I see the many benefits of utilizing software containers. But I'm also aware that security risks come along with using them on an institute's server. As I see it, **it should be the task of the IT department to set up rules and to find a good balance between security and usability.** Unfortunately, the answer often is: "NO, we will not install Docker on the server because of security issues." Arguing with the IT engineers often feels unfair - I'm a biologist and statistician by training, not a computer scientist or engineer - and I leave such conversations as the loser… I wish I had argument papers ("How to convince the IT department to set up Docker") with well-thought-out arguments.
### The Altruistic Author
I work in an interdisciplinary academic setting where it is already a challenge to find a common vocabulary. On top of that, many people in my field don't have a very strong background in computer science. As the author of a scientific software package, I quickly realized that **merely documenting every single feature is not enough when facing such obstacles.** But step-by-step examples of how to use the features of a piece of software can provide a good starting point for other people's research endeavors. By seeing the results of small examples, users get a much better understanding of how to use the software. Besides the actual learning, examples can also be a neat way of explaining certain pitfalls or sensible value ranges, as the sketch below illustrates.
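As a sketch of what such an example might look like (a hypothetical Python function, invented purely for illustration), note how a few comments can convey both a pitfall and a sensible value range alongside the result:

```python
# Hypothetical illustration: a tiny smoothing function whose example usage
# documents a pitfall and a sensible parameter range.

def exponential_smooth(signal, strength=0.3):
    """Exponentially smooth `signal`; `strength` must lie in (0, 1]."""
    if not 0.0 < strength <= 1.0:
        # Pitfall: values outside (0, 1] would silently distort the result,
        # so the function fails loudly instead.
        raise ValueError("strength must be in (0, 1]")
    smoothed, level = [], signal[0]
    for x in signal:
        level = strength * x + (1.0 - strength) * level
        smoothed.append(level)
    return smoothed


# Step-by-step usage: smooth a short noisy series.
# Sensible values for `strength` are roughly 0.1-0.5; larger values
# follow the raw signal more closely, smaller values smooth more heavily.
noisy = [0.0, 1.2, 0.8, 2.1, 1.9, 3.2]
print(exponential_smooth(noisy, strength=0.3))
```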
There are **not many incentives for investing time and resources into such usability improvements.** But when users have questions or problems, it actually saves a lot of time to simply refer them to an existing example that shows how to solve the problem. Surprisingly often, providing such a reference is all the help users need. Apart from that, it is great to get feedback from new users who are happy because they could set up their research workflows quickly and without much frustration.
### The Not So Unhappy Research Software Developer/Product Owner
I work in Research Data Management at a medium-sized, non-academic environmental research facility, as a developer on several smaller software packages and as the product owner of a larger-scale data infrastructure project. I am in the lucky situation of being a full-time research software developer, without the need to intermingle two very distinct career paths, namely being a scientist **and** a software developer.
Usability is an issue for us, with two distinct sides to it. On the one hand, it is an important measure: user experience is one of the main driving factors for adoption of our products and services, and adoption is one of the most important assessment criteria for our work. On the other hand, **we lack the knowledge, experience and tools to assess and improve the usability of our artifacts.**
Currently, we address the issue mainly by starting out with our own ideas on the topic (which are heavily oriented towards developers and software design) and bringing a project to the state of a minimum viable product before we approach the broader public with user/stakeholder workshops and introductory courses. While this approach is working so far, it is still backward-facing and **addresses usability after the fact instead of putting it at the center of the software development process.**
I guess **what we really need is more formal knowledge on the topic, internal and/or external.** Training for developers, and probably also external consultants working with the development teams during the critical phases of a project, would likely help a lot. However, I also think **this would mean another cultural shift in the science-software relationship, before the necessary first one is completed.** We are still in a transition phase, where the idea that the development of scientific software is a dedicated profession in its own right, and not something scientists can or should do in their spare time, has not entirely settled yet.
So **we need to see a further professionalization of the software development process in science first**, before we can finally integrate more of the ideas of commercial software development into our own culture.
### An Anxious Research Software Engineer Outside the Comfort Zone
I'm currently working on the development and maintenance of a Python package for high-dimensional interpolation. There are recent results on this topic from applied mathematics that we think would benefit the wider scientific audience. Although I work closely with mathematicians in the project, I'm not a trained mathematician myself. While this can be frustrating at times (due to my lack of some theoretical foundations), the difference in background can be an asset as well, especially when it comes to designing public interfaces for a non-mathematician audience.
There are papers, there are software implementations, and there are software implementations that are usable (to the target audience). Creating bridges between them is non-trivial. Translating results from papers into (any) software implementation is probably the most straightforward part for us: we write the software for ourselves, and we use it ourselves to produce even more results. Even if the software is badly designed, one can get used to that and still be (somewhat) productive with it.
Creating a usable public software implementation, on the other hand, is tricky. It requires additional knowledge about the potential users, their needs, and their backgrounds. **All the public interfaces and the documentation must be well considered, not to mention taking into account users' feedback from different channels. We want people to use the software, after all.**
I think the effort we put into usability depends on the goal of the project itself. If it's for internal consumption, and it's just about new features and producing the next results, then I think any way of writing software is good enough as long as the product serves its (internal) purposes. But if one of the goals is also to disseminate the development to a wider audience, then it entails a certain responsibility to adhere to the **community standards and best practices of usable open research software.** That is certainly more than, say, simply putting the source code in an online public repository. Adhering to those standards--from properly naming things to writing public documentation--takes time, and would most probably slow down the development of brand-new features.
It would be great if the goal of a (presumably open) scientific software project were clarified from the beginning: is it actually just for internal consumption within a small group, or indeed for public consumption? If it's the latter, then **usability should be part of the requirements.** The project lead should understand this and, together with the team, must juggle different priorities: adding new features, maintaining the old ones, and developing more usable software.
_Do you see yourself in these narratives? Do you have ideas on how to make things better? Join [the discussion and share your experience](https://github.com/meagdoh/ssi-fellowship/discussions/12)._