SORTEE Code Club Hackathon: Creating a Code Standard

tags: sortee open-code reproducibility

This is the collaborative notebook for SORTEE's Code Club Hackathon: Creating a Code Standard for the Society for Open, Reliable, and Transparent Ecology & Evolution (SORTEE).

Session #1: Wed Oct 16, 06:00-07:55 UTC +00:00 - slides
Session #2: Tue Nov 19, 09:00-10:30 UTC +00:00 - slides

Links to more information:

Link to Hackathon Github page: CodeStandard

Make sure you are familiar with SORTEE's Code of Conduct so this can be a safe and fun place to learn and discuss.

Hint: HackMD is great for sharing information in this kind of online setup, as code formatting is nice and easy with Markdown! Just wrap code blocks in three backticks (```).
Otherwise, it's like a Google doc: it allows simultaneous editing. There's a section for practice down there ⬇️

Hackathon outline

Audience = Anyone with a Github account and experience in coding for scientific analysis is welcome to participate!

🔗 You will receive the zoom link to participate in the Hackathon on SORTEE's Slack

Why a Code Standard? Publishing our code and data is an important Open, Reliable, and Transparent (ORT) practice to ensure the reproducibility of research. A field-specific Code Standard can facilitate the writing and reviewing of code by offering an accessible, example-based way to implement ORT practices in your own coding.

Which code will we work on? The selected code is for a simple ecology/evolution analysis, coming from the published paper: Van Dis et al. (2023). Phenological mismatch affects individual fitness and population growth in the winter moth, Proc Roy Soc B, 290(2005)

How will we work? We will collaboratively edit a piece of R code in this Github repository. We will split up into six groups, with each group having a specific goal for code editing:

  1. Reported
  2. Run
  3. Reliable
  4. Reproducible
  5. Organisation & Structure
  6. Other (opinionated) considerations for public code sharing

(following the 17-step code review checklist for Ecology and Evolution)

How to get started? Join the main breakout room for your group on zoom (find your group here). Together discuss what would constitute the "perfect" ORT piece of code considering your group's focus area (see above).

Below you will find suggestions of tasks for each group to get started. Discuss, edit and supplement these tasks. When you agree on a task, self-organise who will work on this task. The task owners can go to a side breakout room to start rewriting the code.

When code editing, make sure to use git and frequently commit and push/pull to the Github repo to keep track of your changes. Make sure you are on the hackathon2 branch.

ICE BREAKER (practice HackMD editing)

Let's learn how to use this HackMD document by answering an ice breaker question!
Somewhere on the screen (probably at the top), you should see three icons (a pencil, a two-column window, and an eye). You can add your answer by clicking the edit button (the pencil).

Q: What is the word for "to code" in your first/second/third language?

Answers:

  • coderen (Dutch)
  • programar (Portuguese)
  • codgozari (Persian/Farsi)
  • coder (French)

Collaboratively writing the "perfect" Open, Reliable, and Transparent (ORT) piece of code

We have breakout rooms standing ready in zoom: one main room for each group plus additional side breakout rooms for task completion. Join the main breakout room for your group and start discussing!

COPY-PASTE ME:

  • TaskX: [description]
    • Considerations:
    • Ideas:
    • Example code:

Group 1: Reported

  • Task1: Check if the code matches the methods as described in the corresponding article, metadata, and/or documentation. If not, edit

    • Considerations: Read the methods section of the paper and check against the code that the variables are correct and that the analysis is done as reported.
    • Example code:
  • Task2: Add comments to code, if necessary

    • Considerations: Add comments to the code that make it easier to link to the reported methods/results in the paper.
    • Example code:
  • Task3: Consistency in naming of variables

    • Considerations:
    • Ideas: Keep consistent names for variables from paper to code
    • Example code: glm1 <- glmer(Event ~ (MismTreat1 + MismTreat2)*PhotoTreat + (1|TubeID), family=binomial, data=d_surv, na.action="na.fail", control=glmerControl(calc.derivs=F)) uses TubeID instead of MotherID, despite the paper saying that "mother ID was included as a random effect"

Group 2: Run

  • Task1: Does the code run in its entirety and without issue? If not, edit

    • Considerations: Even if code matches the methods reported in the paper, this does not mean that the code is executable. Programmatic and syntactic errors can make the code fail to run.
    • Ideas:
    • Example code:
    • Comment: rmisc was used in line 77 but was not loaded beforehand. This caused an error if the user didn't have the package installed.
  • Task2: Add an R project file to the repo

    • Considerations: After download, the repo needs to be a standalone project without need to manually change working directory
    • Ideas:
    • Example code:
    • Comment: Added R Project at the root directory.
  • Task3: Discuss/research and implement an easy way for a user to recreate the analysis R environment

    • Considerations: Make it as easy as possible for someone to run the analysis code (preferably with as little manual/user specific steps as possible after downloading the repo)
    • Ideas: write a script with if statements for installing packages with the right versions; or find out if there is an automatic way to replicate the environment when e.g. using renv package lock files
    • Example code:
    • Comment: renv is currently one of the best approaches for letting others reproduce the analyses in the same R environment, and it requires no additional manual steps unless necessary system dependencies are missing. We therefore implemented this approach.
  • Task4: Add data. Alternative: write a script to download the data ==> to be discussed

    • Considerations: Dataset should be included alongside code if possible (?) <- BUT there are data management and technical reasons not to store data on git
    • Ideas: Write an R script that uses the R package rdryad to automatically download and store the data, with all data files listed in the .gitignore file
    • Example code:
    • Comment: We added the data file into the _data folder.
  • Task 5: Keep all objects in the environment

    • Considerations: When objects are created and later deleted using rm in the same script, it can be confusing for users - it makes it hard to go back and look at objects from a previous step. Created objects should be preserved within a single script.
    • Comment: We removed the rm calls and made sure those variables weren't reused in a way that might cause any issues.

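The Task3 idea of scripting the package check could look roughly like the sketch below. It is a minimal base-R illustration only, not what the group implemented (that was renv). The helper name find_missing_packages() and the package vector are hypothetical.

```r
# Hypothetical helper: report which required packages are not installed.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Assumed package list, for illustration only; replace it with the
# packages the analysis script actually uses.
required_pkgs <- c("stats", "utils")

# Stop early with a readable message instead of failing halfway through.
missing_pkgs <- find_missing_packages(required_pkgs)
if (length(missing_pkgs) > 0) {
  stop(
    "Please install the following packages before running the analysis: ",
    paste(missing_pkgs, collapse = ", ")
  )
}
```

A check like this only tells the user what is missing; it does not pin package versions. With renv, running renv::restore() in the cloned project reinstalls the versions recorded in the renv.lock file.
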
Group 3: Reliable

  • Task1: Check if the columns of the data are correctly selected. If not, edit

    • Considerations: It is better to select variables explicitly than by index.
    • Ideas:
    • Example code:
      • good: selected_data <- data[, c("size", "color")]
      • bad: selected_data <- data[, c(3, 7)]
  • Task2: Avoid overwriting columns/data objects

    • Considerations: Avoid overwriting columns under the same name, especially when using factor() with the labels argument. Keep the data.frame as it is after reading it in and create a new, wrangled data.frame; the two can then be compared and used for unit checks.
    • Ideas: Use a slightly different name and keep original data column names consistent, to ensure that the name pairings have not been incorrectly overwritten
    • Example code: mutate(Treatment=factor(Treatment, levels=c("ChangDay-4", "ChangDay-3", "ChangDay-2", "ChangDay-1", "ChangDay0", "ChangDay+1", "ChangDay+2", "ChangDay+3", "ChangDay+4", "ChangDay+5", "ConstDay-4", "ConstDay-2", "ConstDay0", "ConstDay+2", "ConstDay+4")), ... could use Treatment2 instead -> a more descriptive name is needed, e.g. Treatment_relevelled
  • Task3: Check that main decisions are clear to find. If not, add

    • Considerations: Important configuration values/parameters are stored in variables and these important variables appear at the top of the code/section
    • Ideas: see "How-to guide for code sharing in biology paper" under Useful links below for an example
    • Example code:
  • Task4: Does the code include "unit tests" or other checks that verify the code is working as intended? If not, add

    • Considerations: For example, following a bit of data wrangling or transformation, is there code that checks whether the transformation has been accurately implemented?
    • Ideas: If there is an error/mismatch in a unit test, break/stop the code + throw an error (with a useful error message)
    • Example code:
      Check if the variables have the right structure:
      validation <- validate::validator(is.character(id), is.integer(X), is.factor(group))
      data_check <- validate::confront(data, validation, raise = "all")
      validation_summary <- summary(data_check)
      if (any(validation_summary$fails > 0 | validation_summary$error > 0)) {
        warning("Error message")
      }
  • Task5: Add comments on what the desired output is of the code

    • Considerations: Also link to Reported? e.g. output=Table 1
    • Ideas:
    • Example code:
  • Task6: Implement html report?

    • Considerations: Maybe having an html output or something equivalent (using e.g. Rmarkdown, Quarto) is the simplest solution? That way you can directly see what you are supposed to obtain (including graphs, tables etc.) > Could make reliability checks easier, e.g. for a reviewer
    • Ideas:
    • Example code: rmarkdown::render()
  • Task7: Add comments about what values etc to check for statistical analysis

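Several of the Group 3 points (selecting columns by name, keeping the raw data untouched, storing key decisions at the top, and adding a check that stops with a useful message) can be combined in one small sketch. The data below are made up, and the names raw_data, wrangled_data and check_relevelling() are hypothetical; only the treatment labels and the Treatment_relevelled name come from the notes above.

```r
# Toy data standing in for the real dataset (values are made up).
raw_data <- data.frame(
  id = c("a", "b", "c", "d"),
  Treatment = c("ConstDay0", "ChangDay+1", "ChangDay-1", "ConstDay0"),
  size = c(1.2, 0.9, 1.5, 1.1),
  stringsAsFactors = FALSE
)

# Main decisions live in one place at the top of the script.
treatment_levels <- c("ChangDay-1", "ChangDay+1", "ConstDay0")
selected_columns <- c("id", "Treatment", "size")

# Select columns by name, and leave raw_data untouched.
wrangled_data <- raw_data[, selected_columns]

# Store the relevelled factor under a new, descriptive name.
wrangled_data$Treatment_relevelled <- factor(
  wrangled_data$Treatment,
  levels = treatment_levels
)

# Check: relevelling must not create NAs or change any value.
check_relevelling <- function(original, relevelled) {
  if (anyNA(relevelled[!is.na(original)])) {
    stop("Relevelling produced NA: some treatment values are not in `levels`.")
  }
  if (!identical(as.character(relevelled), as.character(original))) {
    stop("Relevelled values no longer match the original column.")
  }
  invisible(TRUE)
}

check_relevelling(raw_data$Treatment, wrangled_data$Treatment_relevelled)
```

Because the original column is kept, the check can compare the two directly. A misspelled level would stop the script at this point rather than silently turning values into NA.
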
Group 4: Reproducible

  • Task 1: Test and fine-tune renv implementation

    • Considerations:

        1. Make sure renv installs the correct package versions!
          How-to: Install packages with the right versions (code does not need to be included in repo?) and reinitialise renv / update the renv.lock file.
          The package versions used in the original analysis are in the _scr folder.
        2. Check that all necessary files have been included in the repo to easily recreate the analysis environment with renv after cloning the repo (and make sure all unnecessary renv files are listed in the .gitignore file).
          How-to: Clone the repo and run the code.

    • Question: How to work in an R version other than the one installed on your machine?
      Answer: Using a newer version should not be a problem with renv; it might be a problem when using an older version than the one used to produce the code.

  • Task2: Are the results and conclusions reproducible from the code as provided?

    • Considerations: Run the code and check that the results (and figures) match the ones reported in the paper.
    • Ideas: It will be useful if the code has comments linking the code to the paper results (figures, sections for example) (task 3)
    • Example code:
  • Task3: Does the code include clear documentation that detail the rationale behind it? If not, add

    • Considerations: e.g., rationale behind data wrangling decisions and analytical approaches should be documented
    • Ideas:
    • Example code:
  • Task4: Is the whole workflow code/script-based? If not, write code for the manual manipulation parts

    • Considerations: The workflow should be self contained (e.g., the code does not involve steps outside the script or pipeline, such as manual manipulation in other software like Excel).
    • Hint: Have a look at the Results section 3(a) in the paper - final paragraph, can you find the code to reproduce the percentages fitness loss mentioned there?
    • Ideas:
    • Example code:
  • Task5: Use of multiple R versions:

  • Task6: Check the need to set a seed

    • Considerations: Check if the analysis has any random processes that need a seed to be set for reproducibility.
    • Ideas:
    • Example code:
  • Task7: Check the possibility of using Docker

    • Considerations: To have a fully reproducible environment we can use Docker, which captures the R version and packages but also the operating system. How to implement this for R still needs to be checked.
    • Ideas:
    • Example code:

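For Task6, a small sketch of why a seed matters for random processes. The bootstrap below is a hypothetical stand-in, not a step from the winter moth analysis, and the seed value is arbitrary.

```r
# Hypothetical bootstrap of a mean, used only to illustrate seeding.
bootstrap_mean <- function(x, n_boot, seed) {
  set.seed(seed)  # fix the random number stream for this call
  vapply(
    seq_len(n_boot),
    function(i) mean(sample(x, replace = TRUE)),
    numeric(1)
  )
}

analysis_seed <- 20241016  # arbitrary value, stored once at the top
values <- c(2.1, 3.4, 1.9, 4.2, 2.8)

run_1 <- bootstrap_mean(values, n_boot = 100, seed = analysis_seed)
run_2 <- bootstrap_mean(values, n_boot = 100, seed = analysis_seed)

# Same seed, same resamples: the two runs are identical.
stopifnot(identical(run_1, run_2))
```

In an analysis script it is usually enough to call set.seed() once near the top. Setting it inside the function here just keeps the example self-contained.
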
Group 5: Organisation & Structure

  • Task1: Edit Repository structure

    • Considerations: Is the repository well organized? If not, edit
    • Ideas:
    • Example code:
  • Task2: Should the script be split into multiple scripts, one per main task?

    • Considerations: Discuss if it is better to have a single script (long) with all the steps or if it is better to have multiple scripts where each one produces an output that is used in the next step.
    • Ideas: If the script is split, then data preparation (01_data_prep.R), modeling (02_modeling.R) and summarizing results (03_summarizing_results.R) could be kept in separate scripts
    • Example code:
  • Task3: Add a consistent code style

    • Considerations: Good readability of code is very important in enabling effective code review. Change the code to have an easy-to-read and consistent style. As a suggestion, you can use the tidyverse style.
    • Ideas: Keep snake_case vs. camelCase
  • Task4: Improve code readability

    • Considerations:
    • Ideas: Improve readability sticking to right-hand line margins when possible and using line breaks. Additional line breaks allow more opportunities for comments/notes.
    • Example code:
df |>
  mutate(name = "var",
         name2 = "var")
  • Task5: Does the code use informative names for objects/variables? If not, edit

    • Considerations:
    • Ideas:
    • Example code:
  • Task5: [description]

    • Considerations:
    • Ideas:
    • Example code:

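If the script is split as suggested in Task2, a short driver can run the numbered scripts in order. The sketch below assumes the 01_, 02_, 03_ naming from the Ideas above. The helper run_pipeline() and the throwaway demo scripts are hypothetical.

```r
# Hypothetical helper: run all numbered scripts in a folder, in order.
run_pipeline <- function(script_dir) {
  scripts <- sort(list.files(
    script_dir,
    pattern = "^[0-9]{2}_.*\\.R$",
    full.names = TRUE
  ))
  if (length(scripts) == 0) {
    stop("No numbered scripts found in: ", script_dir)
  }
  pipeline_env <- new.env()
  for (script in scripts) {
    message("Running ", basename(script))
    source(script, local = pipeline_env)
  }
  invisible(pipeline_env)
}

# Demo with throwaway scripts written to a temporary folder.
demo_dir <- file.path(tempdir(), "pipeline_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("prepared <- c(1, 2, 3)", file.path(demo_dir, "01_data_prep.R"))
writeLines("model_mean <- mean(prepared)", file.path(demo_dir, "02_modeling.R"))
writeLines(
  "result_text <- paste('Mean:', model_mean)",
  file.path(demo_dir, "03_summarizing_results.R")
)

results <- run_pipeline(demo_dir)
results$result_text
```

For brevity the scripts here pass objects through a shared environment. In a real repository each script might instead save its output to a file that the next script reads, so every step can also be run on its own.
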
Group 6: Other (opinionated) considerations for public code sharing

  • Task1: Rewrite the code to increase efficiency

    • Considerations: For example, you can implement functional programming (e.g., use generalised, custom functions or for loops to repeat processes)
    • Ideas:
    • Example code:
  • Task2: Edit the code so that when calling functions, it explicitly calls the package namespace (i.e., package::function())

    • Considerations: More transparent and clearer which functions came from which package
    • Ideas:
    • Example code:
  • Task3: Make sure the code contains no hard-coding (i.e. the code includes no sections that assign fixed values or data directly rather than using variables)

    • Considerations:
    • Ideas:
    • Example code:
  • Task4: Make sure documentation (metadata, README) includes clear citations to related materials like data and the article

    • Considerations:
    • Ideas:
    • Example code:
  • Task5: [description]

    • Considerations:
    • Ideas:
    • Example code:

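Tasks 1 to 3 of this group can be illustrated together: one generalised function instead of repeated code, explicit package::function() calls, and named values instead of hard-coded ones. The data are made up and the linear model is only a stand-in; it is not the model used in the paper.

```r
# Toy data (made-up values) standing in for the real dataset.
toy_data <- data.frame(
  treatment = rep(c("control", "shifted"), each = 5),
  weight = c(10.1, 9.8, 10.4, 10.0, 9.9, 8.7, 9.1, 8.9, 9.3, 8.8),
  eggs = c(52, 48, 55, 50, 49, 40, 43, 41, 45, 39)
)

# Values that would otherwise be hard-coded are named once, at the top.
response_variables <- c("weight", "eggs")
predictor <- "treatment"

# One generalised function instead of a copy-pasted model per response.
fit_response <- function(response, predictor, data) {
  model_formula <- stats::reformulate(predictor, response = response)
  stats::lm(model_formula, data = data)
}

# Apply the function to every response; package namespaces are explicit.
models <- lapply(
  response_variables,
  fit_response,
  predictor = predictor,
  data = toy_data
)
names(models) <- response_variables

# Extract the treatment coefficient from each fitted model.
treatment_effects <- vapply(
  models,
  function(m) unname(stats::coef(m)[2]),
  numeric(1)
)
```

Adding another response variable then only requires extending response_variables, and the stats:: prefixes show where each function comes from.
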
Add here any links to resources you think could be useful for the Hackathon.

Other TO-DOs

Become SORTEE Code Club Leader 2025

Are you interested in Open Science practices related to code and code review? Would you like to learn more? We are looking for you to lead Code Club in 2025!

No prior experience needed, just a willingness to learn and invest some of your time.

Perks: meeting lots of nice and like-minded people, the chance to develop your leadership skills, and planning Code Club sessions on topics that you’d like to learn more about.
Tasks: scheduling Code Club meetings, planning topics and potential speakers, and writing debriefs.

Applications to volunteer for SORTEE in 2025 are closed, but please feel free to contact the chairs of the committees you're interested in directly if you'd still like to apply!

Add to SORTEE's Library of Code Mistakes
The SORTEE Library of Code Mistakes is open for editing!

If you feel comfortable, please add your coding mistakes (and if possible how to recognize them). This way we can build a resource of (common) code mistakes that people can use during coding and code review.
NB: You can make your mistake as anonymous as you like.

Want to contribute to Code Peer Review in the future?
Add your details to the Find an ORTEE Code Reviewer list

Give feedback

Any feedback on this Hackathon or Code Club in general is welcome! Things you liked, things that could be improved, topics you would like to see in upcoming Code Clubs etc.

Feedback Session #1:

  • Nicely organised hackathon! It substantially improved my familiarity with code review
  • I did not know about HackMD and will start using it in my teaching. Thank you!
  • The time for discussion at the end was a bit short, but looking forward for the code club meeting.
  • it was a really nice idea for a hackathon

Feedback Session #2:
