# GitHub in EcoEvo manuscript outline

Journal options (**please indicate any preferences**):

- Nature Ecology and Evolution (They have perspective/comment articles. Would need a pre-submission inquiry.)
> [Matt: I think NEE is worth a try - they seem to like things that are a bit different - worth sending off a pre-submission inquiry; Rob: That sounds good to me, why not try with a pre-submission inquiry?]
> [Rob: I'm now remembering that the open access fee for publishing in NEE is something like US$10k. I'm reviewing an article for them now. Maybe I can see if there is some flexibility? I checked with NEE: since they offer an option to publish for free behind a paywall, they don't offer waivers for open access.]

1) PLoS Biology?? [Rob: This could be a good fit too; Cole: I think any PLoS could be a great option] Saeed: Yes indeed. I already read a paper that may be relevant to our manuscript content: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000763. Rob: That's a great paper, and let's talk more about the PLoS option.
5) Data Science Journal
2) Ecology and Evolution
3) Trends in Ecology and Evolution (would have to submit a pre-submission inquiry)
6) Rio
7) Methods in Ecol Evo?? (would need to be a non-traditional paper) [Matt: Note to myself - check with Bob about whether this would be suitable for MEE]
8) Ecosphere

## Motivation for the paper

_Distilled from our [Sept meeting notes](https://hackmd.io/mdkgtm9YRpyqZVxiOM1Qjw)_

GitHub is really like a swiss army knife, with so many different tools that can enhance **collaboration**.

> Luna: I am not a fan of the "swiss army knife" concept, since it alludes to something that can be used as a weapon, too. And it has the word "army" in it. Also, I really think GitHub is much more powerful than a swiss army knife. While some of its strength comes from the number of tools that it provides, its main strength comes from its openness and the community that keeps on building off those tools.
>

We want to present a friendly guide to GitHub and all the things you can **currently** do with it. Very few papers focus on GitHub as a tool for collaboration (and we should also mention where GitHub falls short).

## Title options

* A friendly guide to how GitHub can benefit your Ecology and Evolution research
* Using GitHub for more collaborative Ecology and Evolution research
* Not just for programmers: How GitHub can make your Ecology and Evolution research more collaborative
> [Matt: all are great - I like the "not just for programmers" phrase - it would help set up the paper flow too] Saeed: I also like that part because it signals to a broader range of researchers, including programmers but also maybe early-career independent researchers, etc.
* GitHub as a tool for transparent and reproducible research
* GitHub: the Swiss army knife of Ecology and Evolution research
* Not just for programmers: A friendly guide on the versatility of GitHub
> [Kaitlyn: I like this last one; the "not just for programmers" and "friendly" are nice additions. I just might add "ecology and evolution" in here somewhere too, so our audience is very clear to anybody skimming the title]. Rob: Good call Kaitlyn, thanks

> Luna: I also like the "not just for programmers" part, but it might be good to stress that it is for the Ecology and Evolution community, and to say why exactly it is worth using GitHub and what the benefits are: enhancing collaboration, but ultimately perhaps accelerating research. How about: Not just for programmers: A friendly guide on the versatility/benefits of GitHub for accelerating collaborative research in Ecology and Evolution

## Abstract

## Introduction

* High-level/general background about GitHub
    * For decades, the software development community has used the web platform GitHub, and its underlying version control system Git, to collaboratively work on code.
GitHub now has over 73 million registered users (GitHub, 2020) (Ali: Currently, 73+ million developers, 4+ million organizations, etc. https://github.com/about. Rob: Thanks for updating this, Ali!) (Brandon: I don't think GitHub has been around "for decades", so we'll definitely have to be careful with wording here, making sure to make a clear distinction between git and GitHub). Great point Brandon! We'll have to be clear on the difference in how we talk about git vs. GitHub, and I'll dial back that part about "decades".
    * GitHub is an indispensable tool for the software development community because it provides a full "audit trail" on files and folders stored in a GitHub repository in a way that's more structured and less ad hoc than passing files back and forth (Ram 2013).
    * As projects get more complex, with more files and collaborators, the audit trail of tracked changes continues (Ram 2013)
    * As ecologists and evolutionary biologists start to use and collaborate on computer code as part of their research, many are interacting with GitHub for the first time.
    * For some first-time users, the GitHub learning curve can seem overwhelming.
    * We want to show that there is a wide range of ways to leverage GitHub's decades of development as a place for collaboration, sometimes without even knowing how to code.
    * We are focusing on how researchers in EEB can leverage existing tools to make the most out of their research and collaborative projects.
>[Eric: I think it's important that we acknowledge that some of the GitHub features are made with software development in mind, and that one of the goals of this paper is to 'translate' the features into uses for EEB] +1 Rob: Great point Eric, agreed that this should be mentioned right up front in the manuscript.
* What's already been written about GitHub
    * There’s lots of detailed info in other papers about version control. Much less about using it as a tool for **collaboration**!
    * Maybe point to a resource map (in supplementary material? or as a table?)
    * We'll avoid going into the underlying version control system (called Git) since that discussion can get very technical very quickly, and there are already many papers (Blischak et al. 2016; Perez-Riverol, 2016), books (Bryan 2018), and tutorials that go into that. More importantly, the GitHub platform is so robust at this point that you can take advantage of many collaborative aspects without knowing a single Git command.
    * But an increasing number of researchers and organizations are focused on using programming and open science practices, including GitHub, to foster collaboration among researchers (Lowndes et al. 2017).
    * Based on our lit review, here’s where you can go for more info
    * Also, GitHub is not the be-all and end-all: take the elements of GitHub that work for you, but this can be done on many platforms.
* What's already been done in terms of GitHub in EcoEvo
    * Very friendly description of what GitHub is, and of the main uses and advantages of using it in the natural sciences, back in 2016 (Perkel 2016)
    * Open science and computational reproducibility go hand in hand - embracing GitHub is mentioned as part of their practical guide for open science (Powers and Hampton 2019) (see also Lowndes et al. 2017)
    * We must prepare young researchers for the computational expectations of the future by engaging them in the process now, and by creating mentoring relationships and incentive structures to promote open science (Powers and Hampton 2019)
* What's missing about GitHub in EcoEvo and our objective: Introducing the GitHub ecosystem that's composed of many different elements!
    * Simple habits (of which GitHub is one component) can do a lot to make research more reproducible and collaborative (Alston and Rick 2021)
    * In EcoEvo, GitHub use is predicated on an understanding of R. This close connection has some benefits, but other programming languages are frequently used by researchers (e.g. Python, Julia).
There are lots of ways to use GitHub that are independent of R.
    * We have in this hackathon a definite focus on R tools for interacting with GitHub, but sometimes the issues we present as 'GitHub' issues might be more about the ways that we interact with GitHub (i.e. through R vs. bash shell)
* The importance of using GitHub in research:
    * Transparency in science is necessary
    * To enhance visibility and to benefit from contributions from the community; for example, in the field of behavioural ecology, etc.
    * public/private repositories

## Box 1: Overview of Basic/general/most used GitHub features

A box for definitions of GitHub features (and maybe links to more info about how to use them). These will then be referenced throughout the use cases section.

- Issues
- Fork
- Discussions
- Release
- Push/Pull
- Projects
- Actions

*[Kaitlyn: there is a similar table in the [BES guide to reproducible code](https://www.britishecologicalsociety.org/wp-content/uploads/2017/12/guide-to-reproducible-code.pdf), see page 34. It might also be valuable to step back in this table/box and clarify what is "git" vs "GitHub" (vs "R" vs "RStudio"? or maybe we want to keep this language-agnostic) for people who are starting at a more basic level]* Rob: Oh yeah, for sure clarifying git vs. GitHub will be important, thanks for bringing that up. Not sure in terms of being language-agnostic though. Since we have an EcoEvo focus, the majority of people will be using R (I would think), so pointing out R vs. RStudio could be helpful. That might set us up for talking a bit about how the GitHub widget in RStudio can help people interact with Git without using the command line.

## Example use cases (going from least to most collaborative?)

Go through the examples, but make sure we are inspiring people to try out the examples.

[Matt: A useful feature of lens.org is that it shows you where in a full text you can find your search term.
I searched for GitHub in Ecology journals and it returned 280 articles - most of these are "use cases" (rather than journal articles about GitHub). It would be quite easy to classify these *c.* 280 papers by use case (storing code, functions, standards definitions, storing data [even though it's frowned upon by data scientists, ecologists still do it], etc.). Is this worth me doing?]

### Storing and archiving version-controlled data

* Another potential use case/user perspective: Some people are just using GitHub to back up their data and use their code on different machines. They just push and pull (Box 1) from their own repo.
* The entire repository is version-controlled so that earlier versions of data/code can be resurrected (and a history of changes/mistakes is saved).
* GitHub integrates with Zenodo, a popular, free data archiving service operated by CERN. After linking your GitHub account to Zenodo and turning on archiving, any time a release (Box 1) is made, a snapshot of the entire repository is archived in Zenodo with a versioned, citable DOI. This can be used as a simple way to meet data availability requirements for a journal, but can also be used to provide access to code and supplemental information to reviewers as well as to readers post-review. [*Eric: I think I know a couple of example repos with releases made pre- and post-review, if that's helpful*][*Cole: I think it could also be useful to pull in some other tools here. For example, with a little bit of work you can make your repository Binder-friendly, which means it's runnable with one click by anyone, even someone who has never worked in R before.
For those more computationally experienced, you can link repositories to tools like Docker/Singularity for increased reproducibility*]

### Virtual lab notebook

* commits as a way to record daily progress
* issues as a way to keep track of short-term objectives/goals, and progress towards them

### Responding to reviewer comments

* using GitHub issues (Box 1) to organize and respond to reviewer comments on a manuscript. See example [here](https://github.com/BrunaLab/HeliconiaDemography/issues?q=is%3Aissue+label%3A%22reviewer+comment%22+)

### Classroom teaching / Developing educational materials

* Matthew D. Beckman, Mine Çetinkaya-Rundel, Nicholas J. Horton, Colin W. Rundel, Adam J. Sullivan & Maria Tackett (2021) Implementing Version Control With Git and GitHub as a Learning Objective in Statistics and Data Science Courses, Journal of Statistics and Data Science Education, 29:sup1, S132-S144, DOI: 10.1080/10691898.2020.1848485
* Aud Halbritter and Richard J Telford (2021) Git and GitHub. https://biostats-r.github.io/biostats/github/ Motivation and step-by-step guide to using Git and GitHub with RStudio and the `usethis` package
* Rob is working on a meta-analysis book here: https://github.com/robcrystalornelas/meta-analysis_of_ecological_data
* GitHub Learning Lab, the official self-learning site: step-by-step, guided tutorials about common uses of GitHub. https://lab.github.com/

### Collaborative manuscript

### Project management (Madison thought you might want to work on this paragraph)

* issue & discussion features - can assign tasks, seek feedback, bounce ideas around, troubleshoot problems
* Can talk about ESS-DIVE's project management using ZenHub/Jira
* Tsitoara, Mariot (2019). Beginning Git and GitHub: A Comprehensive Guide to Version Control, Project Management, and Teamwork for the New Developer.

### Building website

* Seems like the technical aspect of this is discussed in Dawson, Chris (2016). Building Tools with GitHub: Customize Your Workflow.
O'Reilly Media.
* GitHub Pages allows any .html document to be rendered as a website with a URL. This could be, for example, a report written in Markdown or R Markdown and rendered into a .html file. A simple use would be to create a shareable report of statistical analyses or figures for collaborators (EXAMPLE), but this feature can also be used to create interactive web apps (EXAMPLE), online books (EXAMPLE),...
* This is a nice feature of GitHub, but it is also much more difficult to navigate than an 'out of the box' website builder like Squarespace or Wix. Just adding a caveat that GitHub's capacity to build websites is very flexible, but comes with more of a learning curve than other places...

### Making code citable

* Linking with Zenodo, etc. to obtain a DOI helps work become findable and gives proper attribution (Hampton et al. 2015)
* It is important to remember that GitHub is NOT a long-term data/code repository by itself (accounts can be deleted at will), so adding GitHub links in papers (which I've seen plenty of times) is not a good practice (imo). Instead, including a DOI (as in the bullet above) is better.

### Collaborative (code?) editing

* Is it worth walking through how collaborative code editing works through GitHub, or just pointing to all the available resources for this? (e.g. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/merging-a-pull-request)

### Writing manuscript

* Caveat that GitHub has been called out for not being so user-friendly for manuscript development (Ram 2013). But getting better?
> [Dylan: the discussion of the writing of this manuscript (and how we aren't using GitHub) might go well here?] Rob: :+1:
* Tools that link with GitHub have been developed with synchronous writing in mind. HackMD provides a collaborative writing platform based on Markdown that integrates with GitHub. We used this platform early on in the process of writing this manuscript to generate an outline.
>[Emma: from Pedro's presentation - use Manubot (inspired by the Greene lab), but discussions work better than issues for things that don't need to be 'closed', e.g. papers that might be useful, so reserve issues for real problems with the manuscript. Write each sentence on a new line; commit often, e.g. every time you write a new sentence; break each section assigned to a different person into a different file. Get ~2 people to confirm any merge commits. Citations are very easy with DOI links in text. Manubot includes Markdown tips in the included documentation (usage file)]

### Open science discussions (e.g., GitHub issues and discussions tab)

* https://github.community/

### Project continuity

* 'thus preserving the long-term integrity of the project even as collaborations form and shift.' (Hampton et al. 2015)
* better to have old versions on GitHub than on somebody's personal hard drive!
* allow for sustainable hand-over of research software, especially for graduate students or postdocs moving on to new positions (Fehr, J., Himpe, C., Rave, S. and Saak, J., 2021. Sustainable Research Software Hand-Over. Journal of Open Research Software, 9(1), p.5. DOI: http://doi.org/10.5334/jors.307)

### Asynchronous working

### GitHub organizations

* Lab organization as a place to house research compendia as well as codes of conduct, protocols, training documents, etc. (documents that evolve over time and are shaped collaboratively)
* Students can have full ownership over repositories in the organization, but the repositories stay with the lab after the students have left.

### Other uses (we could group misc. items here if we don't want to describe every example listed above)

* **Developing data standards** This is a pretty specific case from our team at Berkeley Lab, so I can write about it in a sentence or two here.
* **Code review** I (Eric) can talk about this a bit, as I've been through rOpenSci's code review process, and also caught mistakes in the code of published papers that could have been caught in peer code review. Also maybe say something about ReproHack.

## Discussion

* General paragraph on how, given all the potential uses of GitHub, it can enable more collaborative EcoEvo research
* Despite all the awesomeness of GitHub, there are still plenty of times when you might look to other platforms for collaboration *[Kaitlyn: Agreed, we don't want this to come across as an advertisement for GitHub but a thoughtful discussion of its strengths and weaknesses!]*
    * If you are interested in real-time editing, GitHub may not be for you
        * Whenever we type, we see things changing in real time. For GitHub, we have to push the information. GitHub doesn’t allow for real-time live collaboration without something on the side, such as HackMD. This is a deterrent to using it for manuscripts.
        * Dropbox can at least tell you when you have a file open. Maybe the X is not a full X since we can’t collaborate at the same time. Dropbox has a new feature that opens a Microsoft Online tool within it, which allows for collaborative live/real-time editing of documents.
        * There is no concept of a commit in Google Drive. With GitHub, you can commit your 3 files at the same time. It is nice to track changes across multiple files within the one commit.
        * https://www.nature.com/articles/d41586-020-00916-6
    * For this paper, we found that, at least in the initial stages, we used a variety of platforms depending on intended use
        * HackMD or Google Docs for collaborative meeting notes
        * Google Slides for working on figures
        * Google Sheets for working on tables
        * HackMD for outlining the manuscript
* Why aren't more people using GitHub (make sure this focuses on GitHub and doesn't overlap much w/ the paper Dylan is leading on sharing data and code)?
Could also just be replaced/combined with another idea we had for a paragraph about the "pain points" of using GitHub [*Eric: I'd maybe include some of this in the intro and frame it as the problems we are aiming to solve with this manuscript*] Rob: Good call Eric, thanks. We do already have a paragraph in the outline related to this, so it seems like a good fit for the intro/framing of the paper.
    * Learning to use GitHub requires time, but the payoff is *[may be?]* worth it.
    * Time vs. effort examples or analyses to demonstrate the payoff can help drive the point home and convince people to learn these tools
    * New job? Interdisciplinary collab? May have to be prepared for working with version control software.
    * What's the minimum GitHub knowledge that you need to start using GitHub?
    * And, if you make a mistake, you can always undo it!
    * Will save you time down the road, when you need to reproduce your analyses, revise a paper, share your code with others, etc.
    * Fear of being scooped, or of sharing data, also feeds into reluctance, due to worries about intellectual property.
    * One option is to keep the repository private until publication, then make it public
    * make use of GitHub licenses to indicate that material on GitHub is citable!
    * And journals (e.g., AGU-related journals) require data/code sharing, and they don't consider it 'shared' unless there's a working DOI for it.
    * But publishing data/code on GitHub is not a 'one shot' aka Eminem approach
    * >[Emma: re: Eminem approach (love the name), we can tie into the preprint process here as reducing the pressure for perfection throughout the research process, and the recognition that researchers make mistakes and that the real goal is to advance understanding]
    * Addressing ways to share links to private code with reviewers or collaborators without making everything public would be helpful for this
    * There's a risk when people start using new software or a new programming language: is the time investment going to be worthwhile?
But academics may be willing to take the risk to be "on par" with whatever is current in their field or adjacent fields.
* Can describe as part of this section, or in an entirely different section, a general bias toward the Global North using GitHub in EcoEvo. DEI initiatives could increase access and use?

![](https://i.imgur.com/KVWQcPp.png)

> [Dylan: It is amazing how biased the use of GitHub is! I was not expecting this. One small note is England should be consumed by "UNITED KINGDOM", right?]

>[Emma: Great point here, I think we should mention that we don't want this to become another barrier/advantage held by researchers in the Global North]

>[Cole: Agree, definitely. Interesting because technically git/GitHub are both free, but free != accessible, which I think we could highlight. An inelegant analogy, but similar to how simply releasing the IP on a covid vaccine isn't enough: support is needed to develop local capacity. You can't just make a product *available* if the support to integrate that product isn't also included]

>[Kaitlyn: also some issues with restricting access to users from certain countries due to US sanctions. See [Twitter thread](https://twitter.com/saba_a/status/1215006080922570754)]

* Our own limitations since we are mostly writing from the EcoEvo perspective / additional GitHub limitations
    * Reliance on R since we are generally in EcoEvo
    * Discussion of free vs. paid plans. When projects get highly collaborative, may have to add / pay for accounts. At this point, little difference between paid and free.
    * Some GitHub features get discontinued, e.g., GitHub's Speaker Deck, which was referenced as a nice feature of GitHub in Hampton et al. 2015
* Just using GitHub is a good start, but there are lots of other practices to make your GitHub repository more user-friendly (related to Figure 2)
    * For example, Culina et al. did an assessment of whether repos in Ecology had reusable code, and they defined reusable as having a README file.
I think we can argue for doing better than just having a README file.

>[Emma: wondering if we can end off with our 5/10 tips for how to gain knowledge/practice with GitHub here] Rob: Yes, let's do this as a nice conclusion kind of paragraph.

>[Kaitlyn: I think we'll also be remiss if we fail to acknowledge the political controversies surrounding GitHub/Microsoft... calls for boycotts, etc] +1 Rob: Agreed, that can be its own paragraph in terms of limitations for sure.

## Conclusion

## Tables and Figures

### Table 1

![](https://i.imgur.com/YcLX9r4.png)

### Figure 1

Pedro also suggested incorporating a 2nd scale (maybe difficulty) like [this](https://store-images.s-microsoft.com/image/apps.30407.a39cd404-cf0e-4a2f-8e52-b93cf5172f28.37bb6413-ace0-4e24-b7ec-9539a7945bf5.fd284510-309b-4ee8-a6ba-44b5d6d82a5c.png) or [this](https://jscharting.com/examples/chart-types/scatter-plot/quadrant/)

![](https://i.imgur.com/JhUBg7a.png)

### Figure 2

![](https://i.imgur.com/GDfcY5u.png)

### Figure 3 (maybe Box 2?)

Based on [this concept](https://rstudio-education.github.io/learner-personas/) by RStudio

![](https://i.imgur.com/Ju4JUvW.png)

## Contributions to manuscript using [CRediT taxonomy](https://casrai.org/credit/)

- Conceptualization
- Funding acquisition
- Investigation
- Resources
- Software
- Visualization
- Writing - original draft
- Writing - reviewing & editing