---
tags: sprint, NASA, TOPS
---

# Brainstorming

## [Learning Outcomes](https://docs.google.com/document/d/1XJjT5NOvpAlm7YwycM6qKh_FFlkguL6MtF0jRbQwt1o/edit?pli=1#bookmark=id.xfiegql7gxik)

As the first part of this sprint, you will develop learning outcomes for your modules. These outcomes will serve four important purposes in the design process and later in the delivery of modules.

### Learning Profile Template

- NAME is a DESCRIPTION working at/on INSTITUTION/NATURE OF WORK.
- NAME is OVERVIEW OF SITUATION/PROBLEM THEY ARE TRYING TO SOLVE.
- If they have this skill, then they could CONSEQUENCE.

Try to answer the following questions:

- Why does this matter in someone's life?
- What would they be able to accomplish through this knowledge/training?

#### Example

Lorena is a postdoc working on an X-ray beamline doing biomedical research. Their particular area of research is loosely computational in nature, involving analytical work that has historically operated at a human/manual scale. Their grant requires “open, reproducible” science. As a consequence of the historical nature of their research, their lab has never previously focused on software reproducibility, on open source, or on scientific software development beyond ad hoc, idiosyncratic, isolated ventures into computational analysis (e.g., one postdoc independently rewriting a single Excel spreadsheet as a Jupyter notebook to solve a narrow operational problem, such as regenerating charts easily, without a larger deliberate strategy). Lorena wants to take a deliberate approach to raising the technical sophistication of their work, integrate into the broader community of researchers conducting open science, and satisfy the requirements of their grant.

Through this module, Lorena wants to understand how to share their work with other researchers in their field.
This involves understanding how to appropriately publish their software (e.g., the impact and consequences of license choice), how to ensure that shared software is actually runnable by other users (e.g., best practices for delivering reproducible software artefacts, including technical aspects like versioning and containers and collaborative aspects like testing and documentation), and how to appropriately cite and credit the work they are building on (and what expectations they have for how their own work will be cited and credited).

### Learning Profiles

#### Taher Chegini

Tyler is a graduate student who recently started working on his thesis in computational hydrology. Through his research, he is trying to better understand the impact of urbanization on hydrological processes in urban areas. He has been reading relevant research papers and notices that some authors share their code via online repository services and some share it on demand. Going through some of the publicly available source code, he makes some observations:

* He is able to run only a few of the codes and reproduce their results.
* He can understand and follow the logic and structure of only a few of the published codes.
* Some codes give explicit permission to reuse the code with some limitations; some don't.
* Some codes have instructions for community contributions, raising issues, and asking questions; others either lack such instructions or provide the code as-is.

Through this module, Tyler wants to better understand the pros and cons of open-sourcing code for more effective dissemination of his research. Since he is not a software engineer, he wants to understand how following software development best practices affects his ability to perform his research: is it true that open science leads to slow science?
Additionally, he is worried about getting into legal trouble by using, or drawing inspiration from, code that others have published. He wants to know how others will acknowledge and give attribution to his work.

#### Ana Vaz

B is a professor/PI of oceanography in Brazil. B has been developing numerical models of ocean circulation for decades, as well as writing code to post-process his results. B teaches courses on oceanography and manages a lab with undergraduate and graduate students. He shares his code with students in his lab, and they share with each other. They do not have resources for paid software (e.g., Matlab), so open-source tools are fundamental to their work. B would like to learn how to access resources that can improve the openness of science in his lab: how to be open but also get credit for his work. He is also interested in accessing others' resources effectively, saving time for research and class preparation.

C is a manager of a government agency branch dedicated to research. C is a researcher familiar with data analysis and a user of open-source applications, but not a developer herself. She wants to learn more about open software development to encourage best practices in the team she manages. She also wants to learn about it so that she can review projects that will use open-source software and/or development: what are the best practices, and are the projects accomplishing what they proposed regarding open science?

D is a postdoc working on a team to write their first grant as a PI. They have benefited from open-source software and data, and have shared some of their own (as allowed by their PIs), but now they and their co-PIs want to include open science from the inception of their project. They are grappling with all aspects of it:

- When to share?
  - As the project unfolds, or once all data is post-processed and verified?
  - As the code is developed, or after it is done and checked?
- What to share?
  - Raw or post-processed data?
  - Incomplete code, or a clean/commented/completed version?
- Which licenses to choose from? Which platforms to use?

#### Johanna Bayer

H. is a PhD candidate working on her cumulative PhD thesis in a graduate program at a university. Her topic is using computational and machine learning methods on large open neuroimaging datasets. H. has a background in a non-technical field, but has acquired basic computer science skills (basic command line, a script-based analysis language, and statistics) as part of her graduate studies and by teaching herself while working on her projects. As part of one of her projects, she is writing a small software tool that she wants to publish and make openly accessible. She also has to run extensive analysis pipelines on her data (which is quite big). She has heard of open science and open software and would like to make her analysis reproducible.

Through this module, H. hopes to learn how to publish and maintain her software effectively (version control) and how to choose a license that suits her needs. She would also like to publish the analysis pipelines for her papers. She has heard of unit testing, but has no idea where to start.

#### Babatunde Onabajo

John is a postgraduate student studying economics at university. John is specialising in financial economics and is interested in innovative fields he has heard about, such as "machine learning" and "natural language processing", but does not know where to start. John knows that some software used for this, like Stata, is expensive, and his university has not made it available to him or other students. He also knows that much information relating to financial economics, such as stock prices, is freely accessible and open to the public. John does not have any knowledge of coding, which is fairly common in the economics field.

Through this module, John hopes to learn how open-source software can largely eliminate the issue of cost when using machine learning and natural language processing in financial economics research. John will also learn how to effectively maintain and manage code (version control), how to choose the appropriate licence (e.g., the difference between the MIT License and the Apache License 2.0), and about repositories such as GitHub. John also hopes to learn the difference between using a graphical user interface (GUI) and the command line for coding. By the end of this course, John will understand the importance of the open-source community and how it saves effort by ensuring people do not reinvent the wheel. John will also learn how knowledge of open-source software can improve his employability.

### Learning Objective Template

- Today… CURRENT SITUATION
- As a result… CONSEQUENCE OF CURRENT SITUATION
- After completing this module… WHAT THE MODULE WILL DO
- In this module… WHAT MODULE MUST CONTAIN

- - -

#### Example of One Objective

- Today, I write software that I want to be able to share with others.
- As a result, I need to understand how licensing works so that this code can be shared and appropriately disseminated throughout my field.
- After completing this module, I should understand what the software licensing choices are and be able to decide which license is appropriate for my work.
- In this module, I should see a reference listing of all OSI-approved licenses, with direct guidance on how to pick the appropriate one and how they differ.

### Learning Objectives

Johanna Bayer: Upon completion of this lesson, participants will be able to…

1. understand the requirements of open software (basic standards, code structure)
2. understand the requirements for software documentation, and how to write (and, as a consequence, also read) software documentation
3. understand basic and different types of code testing (unit tests, black-box testing, etc.) and how they can improve your software
4. understand different version control systems, how they work, and what they can be used for
5. understand the differences between software licenses and which to choose for which project
6. understand how to facilitate open and public contribution to, and maintenance of, open software

Ana Vaz: Upon completion of this lesson, participants will be able to…

1. Understand and define concepts and key terms related to open software. Make sure their code is open: reproducible and shareable. Articulate the benefits of being a developer and user of open software (as a scientist, researcher, and developer in an academic or industry context, and as a citizen).
    - e.g., stronger science; reproducibility and reusability; contributions and feedback from a large user base; collaboration with a diverse user base; following grant guidelines (accountability); hands-on experience in development and maintenance.
    - Present examples of open-source software, from big to small.
2. Navigate the hurdles (real and perceived) of open source, gaining the knowledge to overcome common fears (sharing imperfect code, authorship questions, worries about not getting recognition for their work).
    - Licenses, their different attributes, and reasons for choosing them.
    - Authorship and DOIs.
    - Expectations under different licenses for users and developers.
3. Download and use open-source software. Properly cite open-source software (with or without a DOI). Contribute to software (from reporting bugs to making changes).
    - Understand how forking, and copyright ownership, work. Examples of contribution workflows.
4. Share their software at different stages of development, from planning software that will be open from the conception of a proposal to sharing code that has been sitting on their computer for decades.
    - Adapt code already developed for sharing (not perfect; better shared than not).
    - Implementation choices.
    - Human readability (format, comments, style, static code analysis).
    - Documenting your work:
        - Comments for ease of understanding and portability.
        - README nuts and bolts (what your code is and what it does; how to install, run, and use it; the necessary computational environment and dependencies; the software license; contributors; links and help).
        - Examples for testing, and pre-/post-processing code if applicable.
    - Checklist for code sharing (e.g., license, generate DOI, upload code).
5. Navigate the post-upload phase: continuous testing, negotiating feature requests, working with a diverse audience.

*Notes:*

- I think we talked a bit about academic/funding/better-science benefits, but open-source software development experience is also an important asset for post-academic jobs: it can help demonstrate technical abilities, build reputation, and provide experience working on complex software development and maintenance (e.g., companies that take candidates' GitHub contributions into account). Other relevant topics: negotiating feature requests, implementation choices, and working with a diverse community to achieve a goal. This might be a plus to motivate undergrad/grad students and early-career researchers.
- I would love it if the module could touch on the expectations for users' contributions. In my experience, we get a ton of requests for new features, but never get versions that users have modified with new features. It is hard to navigate (PI/institutional) expectations and receive support for development/maintenance when development only happens one way (so it is not really community-based, despite the fact that the community uses and adapts the tool).
- Another note on inclusivity: sharing a link to a non-open-access manuscript in your README is not accessible for many researchers with limited access to paid subscriptions (e.g., in the Global South, federal agencies, independent researchers). Including a pre-/post-print (e.g., arXiv.org) with a permanent link might be a good solution.

Sierra Kaufman: Upon completion of this lesson, participants will be able to…

1. understand the importance of version control for reproducibility, execute basic version control commands (link to external resources), know the difference between version control and version control repository sites, and be comfortable uploading existing code that was not previously under version control to common repository sites in order to begin the process.
2. write descriptive documentation for existing code, understand the major elements that should be included in that documentation (not just a description of what the software "is supposed to do"!), explain the benefit of providing example notebooks that show the software in action for future users, and know the options for serving/hosting those notebooks (especially if there are large, many, or remote example input files).
3. be confident that sharing code/software used for publications before it looks "production ready" is preferable to never sharing, and be able to identify resources for improving the cleanliness of their code.
4. explain what a software license is and why it matters, identify why and when someone may pick a more vs. less restrictive license, choose an appropriate license for a specific project, and abide by the licenses of software they are using.
5. generate a DOI for their own software, cite software with a DOI, and cite/acknowledge open software without a DOI. (Note that version control repositories are not archival grade, so these are two different things.)
6. articulate the benefits of open software both to themselves and to others, and analyze common concerns about sharing software and about *using* open-source software.

*Other topics to consider:*

- Test coverage/setting up tests for the software (the importance and credibility this lends software is real, but test suites are **extremely** rare in scientific software; perhaps it would be good to encourage them to be more commonplace, but perhaps we'll scare people off with too much effort to be open).
- Contributing to existing open-source software (even through something like a fork).
- Getting permission to share software that you based your own on, and how authorship should be assigned in that case.
- Using open-source software for your open-source software development, e.g., Python vs. MATLAB code (keeping in mind that many people only know one programming language or, in the case of scientific instruments, are limited by the physical hardware they are using).

Taher Chegini: Upon completion of this lesson, participants will be able to…

1. recognize the costs and benefits of open-sourcing software; facilitate adopting FAIR practices; and understand code quality concerns and user expectations, e.g., requests for new features and addressing issues.
2. gain a general understanding of professional software development and how non-computer-science researchers can benefit from it without adversely affecting the most important aspect of their research: answering research questions and helping their peers build upon their work.
3. write effective documentation to improve reproducibility and facilitate community engagement.
4. critique and analyze the quality of existing open-source software from a software development point of view.
5. make contributions following the guidelines provided by developers, if they exist, or a general software development code of conduct.
6. understand existing open-source licenses and their implications so they can 1) select the most suitable license for their software, 2) assess the compatibility of the licenses of software they plan to use, and 3) give proper credit to the software and other code snippets (e.g., from Stack Overflow) that they use in their code.

Yeo Keat Ee: Upon completion of this lesson, participants will be able to…

1. Understand the concept/definition of open-source software and its importance. (Give some examples of open-source software.)
2. Understand the challenges of open-source software and how to overcome them. (List the types of software licenses and their differences.)
3. How to share the code/software
4. How to generate proper documentation for open software
5. How to maintain
6.

Babatunde Onabajo: Upon completion of this lesson, participants will be able to…

1. Define and understand what open-source software is, and provide real-world examples of open-source software, including which kinds of open-source software are used in which sectors. The participant should also understand that software that is free of charge is not necessarily open-source software.
2. List and critically understand the advantages of open-source software compared to proprietary software.
3. Push and pull code between a local copy and a remote repository such as GitHub with the help of an integrated development environment (IDE). The participant should also understand version control and its practical importance.
4. Write clear documentation, and follow good etiquette in responding to requests from others.
5. Understand the various licences associated with open-source software.
6. Critically evaluate existing open-source software from the viewpoint of a developer, and outline ways it can be improved.
7. Know the various ways open-source software projects fund themselves, if at all.

*Notes*

* In relation to objective 3, the candidate will **not** be expected to do this using Git, as this is very advanced.
* In relation to objective 6, the candidate will be expected to evaluate open-source software in terms of its community, frequency of updates, security, accessibility, performance, transparency, and reproducibility.

### Room 1 & 2 Learning Objectives

- Why Open Software?
- How to share your own code
- How to use others' open software
- Citations, ensuring others have DOIs
- Selecting the right repo
- Code management/quality
- Documentation
- Maintaining: post-upload
- Licensing
- Contributing to existing repos

*29th June 2022: Babatunde Onabajo has suggested having a separate segment for monetisation/funding opportunities, as this will likely be of interest to those taking the module.*

Why Open Software? (Fundamentals): Be able to articulate the benefits of open software.

- Include encouragement to be a good community member and contribute to others' code
- “Just Share”
- Definitions: key concepts and terms
- Cost-benefit analysis
- Community engagement
- Increase trust
- FAIR practices

Code quality (standards, testing)

- Testing
- Human-readable code (literate programming)
- Code commenting

Code management (version control)

Licensing

- The license can dictate how you can use other code (and whether you can take authorship after the changes you make)

Code sharing (documentation)

Citations (DOI)

## Learning Objective First Draft

### Will Cover

- Why Open Software? (Fundamentals)
  - **Articulate the benefits of open software**
  - There is a significant economic benefit to open-source software. Research has found, for example, that developing one version of Linux by commercial means would have cost an estimated $9 billion according to the Constructive Cost Model (COCOMO).
- Code management/quality | Ensuring your code/the code you want to use is shareable
  - What does it mean to be of quality?
    - Testing? Documentation? “Human-readable”?
  - Understanding code quality provides baseline expectations when looking for open software
  - Balance: “Just share it!” with “code quality”
    - What should someone who wants to use my code expect?
  - Avoid embedded PII (personally identifiable information)
  - Post-upload: did I just become a permanent maintainer?
  - **Identify/analyze markers of shareable/transparent/FAIR open software**
- Licensing/Ownership & DOIs
  - Ensure you get credit for your work; legal implications; licensing compatibility
  - **Choose and abide by appropriate usage and referencing standards**
- How to share your code and find shared code?
  - Step-by-step process?
  - Dealing with closed-source software
- Contributing to existing projects
  - Forking

### Won't Cover

## Learning Objectives (under revision)

1. Navigate the benefits and hurdles that producing open software encompasses.
2. Identify key markers of transparent software in others' and their own code.
3. Differentiate open, reproducible, and executable code.
4. Decide amongst specific licenses to ensure ownership of shared code.
5. Appropriately cite others' software in publications.
6. Properly publish software so others may access, use, and grow it.

___________________________________________

### Outline Draft (Taher, Ee, Ana, Sierra, Babatunde)

#### Why Open Software? (Fundamentals): *Be able to articulate the benefits of open software*

- Introduction
  - Definitions: key concepts and terms
    - What is open software?
      - Software or code that is available for use by others without cost (add some research-developed software that you would cite here as examples)

      > Ana's notes:
      >
      > - I think we should share our definition (and/or present references). Open software can be commercial (!!).
      > - I understand we need to focus on "open software" instead of "OSS". It feels like most of our content is directed towards OSS. Is our objective to have more researchers sharing their OSS?
      >   If so, we can add that sharing an executable is a _fantastic_ first step, but think about making your code open.

    - A figure on the "lifecycle" of open software would be nice (planning/developing/testing/sharing/maintaining/community building; these stages are all continuous)
    - Reproducibility
      - Reproducibility is obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis.
    - Reusable
      - definition
    - Replicable
      - Replicability is obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data. Two studies may be considered to have replicated if they obtain consistent results given the level of uncertainty inherent in the system under study.
    - Software repositories
      - A tool for distributing software, from which end users install your software, e.g., PyPI, CRAN, Conda
    - Code repositories
      - A tool for distributing the source code itself, e.g., GitHub, GitLab
    - Preservation repositories
      - A tool for preserving your software, e.g., Zenodo, Figshare, HydroShare
    - Testing
      - definition
    - Code vs. Software vs. Algorithms vs. Scripts vs. Models
      - Basically, we're referring to all of them as code/software here. Any stage is okay!
    - Democratization
      - definition
    - Version control
      - definition
- Cost-benefits of open software
  - Having your code/software in a repository makes it findable and searchable!
  - Sharing your code increases trust in your science. It allows your results, and how they were generated, to be examined by all.
    - State this from a user perspective as well: a user can make sure they aren't using software as a black box. When there is an open software choice, you can benefit from using it over a closed version.
  - Work can build upon prior work more easily, rather than duplicating and redeveloping it (this allows for better stewardship of funds)
  - Community software/support
    - Help improving your code
    - Having a larger user base (more feedback!)
    - More collaboration requests (from a more diverse base)
    - You might also get a lot of feature requests that you don't want to handle or don't have time for (feature/scope creep)!
  - Some extra effort required
    - This will make your code more usable, even to you!
    - You'll have a backup that doesn't depend on your own hard drive (and is usually free)
    - This is a journey, not a destination; even just publicly sharing your code is a great start
    - The more effort you put into making your code cleaner and more accessible, the more others will want to use it and collaborate with you
  - Open software experience is an important asset for many jobs: it can help demonstrate technical abilities, build reputation, and provide experience working on complex software development and maintenance
    - e.g., companies that take candidates' GitHub contributions into account or want to see code they have previously written and shared
  - Shared code increases the democratization of science and enables more diverse and inclusive groups to use research products without a cost-prohibitive barrier to entry.
  - Some grants may require open science practices, and open software helps meet those requirements.
  - Open software allows more people to be involved directly in development; using programs that are not available to everyone decreases the pool of people who can contribute
  - Security concerns
    - Some people may worry that open software is not as safe; just make sure you download code/software from an authoritative source, like another researcher's project repository, rather than a third party.

#### Code management/quality: *Identify/Analyze shareable open software*

- Introduction
  - Define "code management" and why it's important
  - While we maintain that sharing software at all is a great first step, the more the code is kept clean, maintained, and documented, the more others will be able to cite, use, and contribute to it.
- What does it mean to be of quality?
  - Here we outline some baseline expectations for a user of open software. While there are definitely good software projects out there that do not include all of these items, a person evaluating whether they can and should use your code will expect certain items to be present. Conversely, if you're the one using someone else's code, here is a list of items you can look for to identify usable and maintained software.
  - Contains good documentation
    - When someone comes across my code, what explanatory or descriptive overviews do they expect to see?
    - README
      - Describes what the software does
      - How to install the software
        - When someone comes across my code, how do they expect to “install” it? What is packaging? How does this relate to “reproducibility”?
        - Options for installation: software repositories (what level of depth do we want here? I don't really see a way to touch on this topic without going into specifics)
        - Links for language specifics (C, C++, Java, IDL, shell, Bash, Python, [disciplines might have others])
      - How to run your software
        - Might seem obvious to the developer, but is not always simple for the user
        - Gives some usage examples
        - Describes the outputs
      - Details, notes, caveats, and the state of the software (Is it production ready, or is it an alpha/baseline code? Is it actively maintained?)
      - How issues/bugs can be reported (if at all)
      - Contact information for the developer/researcher who wrote the software/code
      - License type
      - List of dependencies and the computational environment
      - Useful links, DOIs of publications that used or describe the software
      - e.g., for sections in a README: https://link.springer.com/content/pdf/10.1007/s10664-018-9660-3.pdf
      - Other resources (GitHub, etc.)
      - Operating system and/or browser the code should work in (compatibility)
        - Does the code, for example, only work on Windows or macOS?
        - If the code relies on the internet and has a visual interface, does it depend on a particular browser? Does it use JavaScript?
    - Contribution instructions
    - Code of conduct
    - Dependencies
      - List what they are and which versions are used
    - Example notebooks (this becomes language specific; perhaps just example code snippets/test cases)
      - When someone comes across my code, what example use cases do they expect to see? Where and how are these useful?
      - Interactive notebooks can be useful tools for showing additional usage examples to your users.
    - Journal/research articles you've used this code in
      - It's important to make sure any publications you include in your documentation are not themselves behind a paywall, especially if they are fundamental to describing the software and how it works (this can be difficult and may not always be possible, so we can say "preferably")
  - Clean/readable code
    - There are many coding standards (link to some examples), and each has its own philosophy on what constitutes "good" or "clean" code.
    - An important thing to think about here is whether someone else could pick up your code, read it, and understand what it does.
    - Comments can be helpful for explaining code, but are not always the necessary or correct solution. For example, sometimes the desired effect can be achieved simply with a more descriptive variable name.
    - If you wouldn't know what was going on in your code after leaving it for a year or two and coming back, others probably can't understand it either.
    - As a user, what should someone without coding experience look for to know it's "good" code???
    - Static code analysis tools can help reduce errors, perform quality control, and increase consistency (share resources? e.g.,
      https://the-turing-way.netlify.app/reproducible-research/code-quality.html)
  - Version control
    - The importance of version control for reproducibility
      - It ensures each change to the code base is tracked, so users can determine which version they are running
      - Bugs that are accidentally introduced can be found and removed more easily, because you can go back and check earlier versions without losing your place.
    - The difference between version control and version control repository sites
      - git is not the same as GitHub! (I know there are other options besides git.) You can use version control locally, and you can choose which repository site you want to use.
      - Reiterate that version control repositories are not static versions of the software and are therefore not citable. You'll need to use a preservation repository (which can link back to your version control repository) to get a DOI and make it citable.
    - Basic version control commands (link to external resources)
      - At least explain what push and pull are and what a pull request is. These basic terms are necessary to lower the barrier to entry, and many texts online assume a base level of knowledge above this.
    - Be comfortable uploading existing code that was not previously under version control to common repository sites in order to begin the process.
    - Using a version control repository doesn't automatically mean you're doing open software. Version control is good practice regardless of sharing: you can use a private repository and then share it with the push of a button when you're ready, instead of needing to do a ton of upfront work.
  - Testing
    - When someone comes across my code, what kinds of testing do they expect to see? Where and how are these useful?
    - Tests check that known inputs to your code/software produce the expected outputs, every time the code is run. They are a critical part of ensuring your software produces reproducible science.
    - A large portion of scientific software doesn't have tests.
Having tests increases credibility of the coding and ensures the software is actually doing what it says it is doing and what it is supposed to be doing - There are many types of tests that operate on different scales like end-to-end testing and unit testing (tests each function in a software) - As a user, a software/code with good test coverage (meaning the percentage of lines in a code that the tests actually test), especially that you can run, should give you more confidence in the software and peace of mind that it is something you can trust to use. - To learn more about testing, types of tests, and how to write good tests check out these resources: - link external resources on testing here. - Containerization - definition - what and why is it needed - package version, operating system issues etc - Post-upload: (I don't love that we're using the term post-upload like if something is uploaded it's the time that it's shared. Maybe change or add to post-sharing?) - Did I just become a permanent maintainer? - It's not a requirement to maintain - How do I convey whether this code is actively maintained? (Why is this important?) - How do I deal with all of the requests for features/bug fixes etc I get? - What if I don't have continued funding for this project? - You don't have to continue - You can hand it off to someone else - You may be able to commercialize - Changes needed with funding (federal and private) to allow funding for continous funding of widely used open-software - What are the responsibilities for maintaining the codes I've share? - How can I be more deliberate about my availability to maintain the codes I've shared? - How can I publish an open source tool without undertaking an open-ended responsibility to maintain it? - Etiquette in responding to requests from others. #### Licensing/Ownership & DOIs: *Choose and abide by appropriate usage and referencing standards* - Introduction - What is a software license? - Why do we need a license? 
- What if a (my) software doesn't have a license? - You technically (ethically) cannot use this software! You can reach out and ask. But if you share your software without a license no one can technically use it! - Types of licenses - Common licenses that are used in open software - Table of license types (or outside link) - How to choose a license - Specific steps on how to apply the license - Format the machine readable license - Compatibility of licenses of software used in your code - What is an OSI approved license? - Dual licensing models and how one can work openly while simultaneously retaining the rights for commercialization potential - Attribution and citation: - How to make sure you get/give credit for code. - What is a persistent identifier (DOI, software heritage)? - Code snippets used as is (add a comment with the shareable link) - Adopt vs. Attribute other's code - When to cite a software - Publishing open software: JOSS, Astronomy and Computing, Zenodo, etc. - Benefits include guidance and peer review on your code and licensing - IP, Ownership, Licensing, and Security Requirements (ITAR, GDPR) - Hyperspecific to your institution. You need to look to them. - include a checklist for upload? e.g - prepare your code, create a repo, get a DOI, get a license, prepare your README, and other files (code of conduct, etc). 
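The machine-readable license and snippet-attribution items above can be made concrete with a minimal Python sketch. It assumes the widely used SPDX convention of putting a `SPDX-License-Identifier:` comment near the top of each source file. The helper `read_spdx_identifier`, the example file contents, and the bracketed placeholders are hypothetical and are only meant to show the idea, not a specific tool's behavior.

```python
# SPDX-License-Identifier: MIT
"""Minimal sketch: finding a machine-readable SPDX license identifier.

The `SPDX-License-Identifier:` comment is a widely used convention;
`read_spdx_identifier` is a hypothetical helper written for this
illustration, not part of any library.
"""
import re
from typing import Optional

# Matches one identifier or expression (e.g. "MIT OR Apache-2.0") per header
# line. [ \t]* is used instead of \s* so the match never spills onto the next line.
SPDX_PATTERN = re.compile(r"SPDX-License-Identifier:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def read_spdx_identifier(source_text: str) -> Optional[str]:
    """Return the first SPDX license expression found in source_text, or None."""
    match = SPDX_PATTERN.search(source_text)
    return match.group(1) if match else None


# Hypothetical file contents: a license header plus an attributed, reused snippet.
EXAMPLE_MODULE = """\
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (c) <year> <your lab or name>

# Adapted from: <shareable link to the original snippet>
# Original license: <license of the original snippet>
def moving_average(values, window):
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
"""

if __name__ == "__main__":
    print(read_spdx_identifier(EXAMPLE_MODULE))  # BSD-3-Clause
    print(read_spdx_identifier("x = 1"))  # None
```

Keeping the license expression on a single, predictable line means both people and scripts can find it, and the attribution comment travels with the reused snippet wherever the file goes.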
I use this site: https://choosealicense.com

Turing Way site chapter on licensing: https://the-turing-way.netlify.app/reproducible-research/licensing.html

Code publication: https://scicodes.net

Computational Infrastructure for Geodynamics, an example of a preservation repository that provides peer review: https://geodynamics.org

A resource for when to cite software: https://f1000research.com/articles/9-1257/v2

#### Contributing to existing projects:
*Verb-based learning objective here*

##### Types of contribution to an open-source project

- There are multiple levels of contribution:
  - Add new features
  - Fix bugs/issues
  - Suggest improvements to the code
  - Improve and contribute to documentation
  - Create tutorials, use cases, or visuals
  - Improve layout, automation, and structure of code
  - Code review

##### How to contribute?

Files you should check out:

- Readme.md file: gives first information/a summary about the project.
- Contributing.md: gives information about how to contribute to the project. Explains how the contribution process works and what types of contributions are needed. While not every project has a CONTRIBUTING file, its presence signals that this is a welcoming project to contribute to.
- CODE_OF_CONDUCT: the code of conduct sets ground rules for participants' behavior associated with the project and helps to facilitate a friendly, welcoming environment. While not every project has a CODE_OF_CONDUCT file, its presence signals that this is a welcoming project to contribute to.

Contributing via a version control system (this would be a very nice pic)

This chapter assumes that the user has some basic knowledge about Git. Important terms: committing, cloning, forking, branching, remote, origin, upstream, push & pull.

- When you have decided to contribute to a repository, you usually don't have rights to commit directly into that repository.
- Thus, you need to create a fork (a copy of the repository under your own account) that you can write into.
- You can clone this fork onto your local machine
- Make changes
- Push changes to origin (your fork)
- Open a pull request of your changes into upstream (the original repository)
- The maintainer of the upstream repository has to approve

##### Types of contributions (Padhye et al.)

- Core developers making changes
- External contributions via forks that go back into the upstream repository
- Mutants (forks that are not committed back into the upstream repository; rare)

##### Branching

- A branch starts from a branching point in your fork and diverges into its own line of development
- Technical description (a branch is a pointer to a commit)
- Allows you to experiment with code/new features
- Software development etiquette: a stable main branch plus development branches that only get merged into main once they are stable.
- If you don't have commit rights in the upstream repository, you will have to create a pull request from your branch into main

##### Merge conflicts

- Definition
- How to avoid
- How to solve

##### Best practices (https://deepsource.io/blog/git-best-practices/)

- Adhere to templates when opening an issue
- Make clean, single-purpose commits
- Write meaningful commit messages
- Commit early, commit often
- Don't alter published history
- Don't commit generated files
- Refer to the issue when creating a pull request
- Assign reviewers

##### Naming Etiquette

- Deprecated terms
- Ambiguous terms

------

- Define "Forking"
- When it's appropriate to "make your own" fork for a different direction (ethics of this?? CHECK THE LICENSE!)
  - As long as the license allows redistribution of the code, this is alright
  - Make sure you're citing the original work and pointing back to the original repo
- Getting permission to share software you've made your own based on someone else's, and how authorship should be assigned in this case
- To what degree are your modifications substantial enough to call this a different piece of software?
- Using open source software for your open source software development, e.g., Python vs. MATLAB code (keeping in mind that many people only know one programming language or are limited by the physical hardware they are using, in the case of scientific instruments)
- How to make contributions to software (from reporting bugs to making changes).
- Etiquette in making requests/code of conduct.

Reference on GitHub README files: https://link.springer.com/article/10.1007/s10664-018-9660-3

Turing Way collaboration: https://the-turing-way.netlify.app/collaboration/collaboration.html

External contributions: https://dl.acm.org/doi/10.1145/2597073.2597113

_____________________________________

### Question Based approach

1. Why share your codes / why use others' open codes
2. How to make sure anyone actually uses your codes
3. How do you get credit for your work; how do you make sure that someone can (legally) use & remix your work
4.

---

1. Why share software?
2. Common fears about sharing software
3. How you share matters:
    - Software repositories
    - Software licenses
    - Software DOIs
4. Code quality
5. How to attribute/cite open software

## [Lessons](https://docs.google.com/document/d/1XJjT5NOvpAlm7YwycM6qKh_FFlkguL6MtF0jRbQwt1o/edit?pli=1#bookmark=id.w8rsga2goz4p)

> [name=criddell]These are all replaceable. This is just a suggestion.

1. Navigate the benefits and hurdles of producing open software.
2. Identify key markers of transparent software in others' and your own code.
3. Differentiate open, reproducible, and executable code.
4. Decide amongst specific licenses to ensure ownership of shared code.
5. Appropriately cite others' software in publications.
6. Properly publish software so others may access, use, and grow it.

## Example/Suggestion of Question-based Outline

1. Does anyone really want to see my codes?
    - Sharing internally (within lab/colleagues)
    - Sharing externally
2. How do I share my codes… from an institutional perspective?
    - If I work at a university, who is the authority on how I can share and publish my research?
    - If I work at a government facility, who is the authority on how I can share and publish my research?
    - If I do classified work at a government facility, who is the authority on how I can share and publish my research?
    - Who owns the codes I write?
3. How do I share my codes… from a legal perspective?
    - What is a software license?
    - What are the common licenses?
    - How do I pick a software license? What is an OSI-approved license?
    - What should I be aware of when using other people's codes?
    - How do I ensure that I follow the terms in the licenses of other people's codes? (Why is this important?)
    - What is “forking”?
4. How do I share my codes… from a practical perspective?
    - When someone comes across my code, what explanatory or descriptive overviews do they expect to see?
    - When someone comes across my code, what documentation do they expect to see? Where and how are these useful?
    - When someone comes across my code, what example use-cases do they expect to see? Where and how are these useful?
    - When someone comes across my code, what kinds of testing do they expect to see? Where and how are these useful?
    - How do I convey whether this code is actively maintained? (Why is this important?)
    - When someone comes across my code, how do they expect to “install” it? What is packaging? How does this relate to “reproducibility”?
5. How do I make sure I get and give appropriate credit for my work?
    - What is a software DOI? How do I get issued a DOI?
    - How do I cite others' software? What are the best practices for citations? What do I cite and what do I not cite? (How do I know what to cite?)
6. Is software either strictly “open” or “closed”?
    - Can closed source tools be a part of open source science?
7. What long-term responsibilities do I accept when I share my codes?
    - What are the responsibilities for maintaining the codes I've shared?
    - How can I be more deliberate about my availability to maintain the codes I've shared?
    - How can I publish an open source tool without undertaking an open-ended responsibility to maintain it?

## [Lesson anatomy](https://docs.google.com/document/d/1XJjT5NOvpAlm7YwycM6qKh_FFlkguL6MtF0jRbQwt1o/edit?pli=1#bookmark=id.m4erqqxyrvo)

- Each lesson ~30 mins, between 3000-4000 words
  - Between 2500-3000 words if we have a 5 min video + 5 min assessment activity
- Introduction
  - 1-3 paragraphs discussing the material and subtopics that will be covered in the lesson. This may include historical or positioning details, concept definitions, and transitions from the previous lesson. (300-500 words)
- Main narrative
  - Written in paragraph form; should align with at least one of the defined learning objectives for the module. Please note concepts or content where examples, case studies, or stories may be included at the instructional design phase. (1,500-2,500 words, depending on size of lesson)
- Summary
  - Generally 1–3 paragraphs reiterating the major concepts and points discussed in the narrative that the learner should have learned during that lesson. (300-500 words)