---
title: Board Meetings
tags: board, statistical-software
robots: noindex, nofollow
---
# Agendas and Notes for Statistical Software Board Meetings
These notes are included in a [`hackmd.io`
organisation](https://hackmd.io/@stat-software) which contains draft standards
for the categories selected for prototype development.
## Overall Project Timetable:
- End August 2020: Draft standards for all categories
- End September 2020: Revised standards, prepare for public release and publication
- End October 2020: First running demonstration of testing and reporting tools
- End November 2020: Public Announcement of impending opening of system
- End 2020: Demonstrated system with both publicly accessible API and local
tools (as R packages) for assessment and reporting on category-specific
software.
- Jan 2021: Begin accepting packages for review
---
## Agenda 4th May 2021
Meeting will step through [this demonstration of editorial process](https://github.com/ropenscilabs/statistical-software-review/issues/7). Note that your roles will be as handling editors. We anticipate most submissions obtaining no red crosses in the initial section, and so being passed from the Editor-in-Chief straight to you, where you may have to consider some of the details of the package report.
In the context of this role, we will ask each of you for feedback on the following questions:
1. Are there any aspects of the general review process we might have missed?
2. Is the information contained within the initial automated report sufficient? Is it clear? Could anything be added or removed?
3. What do you anticipate being the most likely difficulties we may face when following this process?
4. And finally ... Do you think that process is sufficient for us to announce a public launch? If not, what else do we need to develop in order to do so?
RK:
- Likely need to go through review process to really know
- 2 reviewers is fine; editors may act as reviewers as last resort
BB:
- Agree with RK
- Suggestion that we all do an internal submission/review of our own packages?
LCT:
- Having the check package available for general use will be useful. Example: [`biocthis`](https://bioconductor.org/packages/biocthis) enables creating a GitHub Actions workflow with `biocthis::use_bioc_github_action()` that runs `BiocCheck::BiocCheck()`, which helps you prepare your package before submitting to Bioconductor (i.e. make sure you are passing the checks before you submit). A short sketch of this workflow follows these notes.
- Having versioned releases of the check system will be useful (seems doable if it's a package). That is, as someone submitting a package, I'd like to be able to reproduce the reported errors/warnings on my own computer, verify that my changes address them, and then re-submit (though you could also just trigger a new build).
TH:
- fix bugs in the check system and we're good to go!
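For reference, a minimal sketch of the Bioconductor workflow LCT describes above, assuming the `biocthis` and `BiocCheck` packages are installed (the workflow file and options are whatever `biocthis` currently defaults to):

```r
# Minimal sketch of the Bioconductor analogue described above: biocthis
# generates a GitHub Actions workflow which (among other steps) runs
# BiocCheck, so authors can confirm checks pass before submission.
# install.packages("BiocManager"); BiocManager::install(c("biocthis", "BiocCheck"))

# From within the package directory: write a GHA workflow under .github/workflows/
biocthis::use_bioc_github_action()

# The same check can also be run locally prior to submission:
BiocCheck::BiocCheck(".")
```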
---
## Agenda 29th March 2021
The three tasks to be discussed and decided upon are:
- Discuss final steps prior to official launch of system
- Agreement on proposed badging system
- Transition of board members from current roles to roles as handling editors
following system launch
### 1. Official Launch
We believe we are ready to launch the system publicly very soon. The general
procedures expected to be followed by submitting authors, by
the editorial team, and by reviewers, are described in Chapters 3--5 of the
[*Statistical Software Review
Book*](https://ropenscilabs.github.io/statistical-software-review-book/index.html). Briefly:
- **Authors** must ensure robustness of their packages via the [`autotest`
tool](https://github.com/ropenscilabs/autotest), and document compliance with
standards via the [`srr`
package](https://github.com/ropenscilabs/srr) (a sketch of this pre-submission
workflow follows this list)
- The automated checking system provides online endpoints, including one which
runs numerous checks to confirm that software may be submitted and/or
considered for review.
- **Editor-in-Chief** generally only needs to confirm that a package passes
that single primary check, then delegates to a handling editor.
- **Handling Editors** clarify the grade (bronze, silver, gold) sought at the
end of review; find and assign reviewers; and, prior to review, address any
aspects of the automated checks which could not be fulfilled.
- **Reviewers** use the [`srr` system](https://github.com/ropenscilabs/srr) to
assess compliance with standards, then proceed to a general review through
addressing specific questions identified in the [*Guide for
Reviewers*](https://ropenscilabs.github.io/statistical-software-review-book/pkgreview.html#general-package-review).
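A minimal sketch of the author-side pre-submission workflow, assuming the development versions of [`autotest`](https://github.com/ropenscilabs/autotest) and [`srr`](https://github.com/ropenscilabs/srr) are installed (function names are those currently exported by the two packages, and may change):

```r
# Sketch of the pre-submission checks expected of authors; assumes the
# development versions of autotest and srr from the repositories linked above.
# remotes::install_github(c("ropenscilabs/autotest", "ropenscilabs/srr"))
library(autotest)
library(srr)

# Run autotest over the local package source to assess robustness:
res <- autotest_package(".", test = TRUE)
print(res)

# Generate a report of standards compliance from the @srrstats tags
# embedded in the package's documentation:
srr_report(".")

# Confirm that no standards remain in a 'TODO' state prior to submission:
srr_stats_pre_submit(".")
```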
**Questions to be Discussed**
- What else do we need to have in place prior to official launch?
- Is there anything missing from the general procedure? Anything we might have
failed to consider?
- Are the [general questions for
reviewers](https://ropenscilabs.github.io/statistical-software-review-book/pkgreview.html#general-package-review)
sufficient? Could anything be improved?
- Is the proposed system for stating and documenting a [*Software Life
Cycle*](https://ropenscilabs.github.io/statistical-software-review-book/pkgdev.html#pkgdev-lifecycle)
sufficient?
### 2. Badging System
We propose a system of **bronze**, **silver**, and **gold** badges, as
described in the [*Guide for
Authors*](https://ropenscilabs.github.io/statistical-software-review-book/pkgdev.html#pkgdev-badges).
- Might there be any preferable or alternative systems?
- Are there any potential issues with using the terminology of bronze, silver,
gold?
- Are the requirements for the [silver
grade](https://ropenscilabs.github.io/statistical-software-review-book/pkgdev.html#pkgdev-silver)
both appropriate and sufficiently clear?
- Could the distinction between these three grades be formulated differently? If so, how?
### 3. We want you as Editors!
Which of you are willing to act as handling editors following launch? The envisioned
roles of handling editors are described in the [*Guide for
Editors*](https://ropenscilabs.github.io/statistical-software-review-book/pkgsubmission.html#handling-editor). While we are ultimately looking for editors to serve two-year terms, we are currently asking members of this board to act as interim handling editors over the next six months or so as we test the system.
Beyond that, note that there are currently five of you, while standards have
been developed for seven categories. Please add your name to any category for
which you agree to act as a handling editor, and please reach out to anybody
else who you know who might be able to help with any of the missing categories.
1. Bayesian and Monte Carlo Software - Ben (one of), Paula
2. Exploratory Data Analysis - Paula, Leo
3. Machine Learning Software
4. Regression and Supervised Learning Software - Ben (one of), Rebecca
5. Spatial Software - Mark, Paula
6. Time Series Software - Rebecca
7. Dimensionality Reduction, Clustering, and Unsupervised Learning Software - Stephanie (only starting in Aug 2021 though)
_Leo_: anything that involves some genomics/bioinformatics. (if that's allowed)
_Rebecca_: also happy to handle more environmental and health related submissions.
The following standards will be developed following launch:
1. Wrapper Packages
2. Network Analysis Software
3. Probability Distributions
4. Workflow Support Software
---
## Agenda 16th Feb 2021
**NOTE**: Please attend our upcoming [Community
Call](https://ropensci.org/commcalls/) on March 2nd 2021.
Meeting will be divided into two parts:
**First 20-30 minutes**: Demonstration and discussion on tools for developers
to assess their software prior to submission.
- Input, feedback, opinions on system for "injecting" standards into code,
system for developers to address those standards, and system for
automatically collating reports on standards adherence.
- Meeting will start with a short (5-10 min) demonstration of the system.
- Concrete questions will include:
- The current system uses three forms of tags (default, meaning the standard is
complied with; NA; and TODO); a sketch follows this list. Is this sufficient?
Any other suggestions?
- Current system relies on `roxygen2` "roclets" to record and process
standards; are there better approaches?
- [`autotest`](https://github.com/ropenscilabs/autotest) and
[this system](https://github.com/ropenscilabs/srr) together present quite
a high hurdle to gain initial entry to *start* a review process. Is this
likely to act as a deterrent? If so, what can or should we do to negate
any deterring effects?
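To make the tag discussion concrete, a minimal sketch of how the three forms currently appear as `roxygen2` tags under the [`srr` system](https://github.com/ropenscilabs/srr); the standard numbers and block layout here are illustrative only:

```r
# Illustrative sketch of the three srr tag forms in roxygen2 blocks.
# Standard numbers are placeholders; real blocks reference actual standards.

#' my_function
#'
#' @srrstats {G1.0} This standard is complied with; the tag documents where and how.
#' @export
my_function <- function(x) {
  x + 1
}

#' NA_standards
#'
#' @srrstatsNA {G2.3} This standard is judged not applicable, with justification here.
#' @noRd
NULL

#' srr_stats_TODO
#'
#' @srrstatsTODO {G3.0} This standard is still to be addressed before submission.
#' @noRd
NULL
```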
**Next 20-30 minutes**: Discussion of remaining tasks prior to launch.
Priority tasks in no particular order include:
- [ ] Resolve levels for green/silver/gold badging, which means only resolving
what "silver" means. *Proposal*:
- **Green** = decision to accept
- **Silver** is a statement of *intent* rather than status, that developers
have made some progress towards gold standard, with more to come.
- **Gold** = All standards deemed by reviewers to be *potentially*
applicable have been adhered to, and `autotest` passes cleanly.
- [ ] See which board members want to serve as associate editors for first few months
(together with Mark and Noam).
- [ ] Finish remaining standards (*Probability Distributions*, *Wrapper
Packages*, *Networks*, and *Workflow Support*).
- [ ] Decide whether packages will generally be integrated into
[`github.com/ropensci`](https://github.com/ropensci), or whether that will be
optional only, with authors able to retain packages in their own
organizations.
Non-verbal update - upcoming automation work:
- [ ] Integrate [`autotest`](https://github.com/ropenscilabs/autotest) output
with [`srr`](https://github.com/ropenscilabs/srr) to automatically
pre-populate standards checklist items.
- [ ] Complete extraction of [`srr`](https://github.com/ropenscilabs/srr)
documentation from code to populate report on standards compliance.
- [ ] Ensure stable prototype server to deliver combined
[`autotest`](https://github.com/ropenscilabs/autotest) and
[`srr`](https://github.com/ropenscilabs/srr) reports via bot
([`buffy`](https://github.com/openjournals/buffy)) command.
## Agenda Dec 1st 2020
- Introductions of new members (Paula, Leo)
- Two (plus one) main discussion points:
1. How will the process be organised?
2. What does "acceptance" look like?
3. (If time permits) What happens after acceptance?
Explicit questions to be addressed are *in italics*.
**1. How will the process be organised?**
Note #1: The current rOpenSci system is primarily organised by an occasionally
revolving team of around six editors (with no "specialist" or subject
editors). Their primary tasks are (i) to determine whether software is in
scope, (ii) to find and assign reviewers, and (iii) to manage the review
process. A number of aspects might be adopted and/or modified for our
statistical software systems, for which important questions include:
- *Do we integrate within current system, with perhaps a few more editors to
handle increased workload?* or,
- *Do we integrate within current system, but have statistics-specific editors
for any statistical packages?*, or
- *Do we keep submission and review system for statistical software separate
from previous system?*
Responses:
- SH: Leverage the system/people that you have, bring in additional editors as needed
- BB: Existing editors should know to recognize/assign, opportunity to expand the brand
- RK: Keep infrastructure, bring in editor to spearhead/be figurehead
- SH: What would be the role of statistical editors?
- NR: probably a little more outreach and recruitment than current editors, especially for a statistical "editor-in-chief"
If the system is to be kept separate:
- *How do we "brand" accepted packages?*
- *With current rOpenSci badge plus additional one for standards?* or
- *Modified version of current branding and badging that includes reference to statistical software project/programme?* or,
- *Branding and badging distinct and separate from current rOpenSci system?*
Responses:
- RK: What about multiple badges for different standard categories?
- NR: We should record which categories are reviewed, and badge could link to checklist/cover
- MK: What about post-review development
- NR/MP: We'll return to this
- BB: Passing at different levels is important
- RK: We definitely need badging as an incentive
- NR: How should we do graded badging?
- BB: Need to have the subjective approach included, maybe not go to the edge of quantifying everything
- RK: Guidance for reviewers - make checklist enable _eligibility_ for certain level, then reviewers can approve that level or lower
- SH: How do we ensure uniformity? Automation is hard, but we need explicit and specific badges to ensure it. Need very clear guidance, at least for the base level, to make sure people know how to reach them.
- PM: Need to be clear that these are only for software quality, not methodological correctness/usefulness
- BB: a couple of thoughts: (1) a **verbal** rubric for bronze/silver/gold, indicating in some detail (but *qualitatively*) what's expected at each level; (2) bronze/silver/gold is culturally very clear, but it would be nice to have slightly more granularity (a 5-point Likert scale with appropriate labels?)
- SH: Very quantitative for minimum bar, more subjectivity higher up.
- NR: Some of the gold/novel type practices are not very subjective, but may be onerous or are currently rare.
- RK:
Current system includes a couple of options for submissions to be considered
part of a review phase for subsequent submission to journals (notably Journal
of Open Source Software and Methods in Ecology and Evolution).
- *Should we endeavour to have our review process recognised as an initial part
of review for subsequent journal submissions?*
- *If so, which journals should we consider contacting?*
- *The Journal of Statistical Software*? (via Rebecca?) *R Journal*?
**2. What does "acceptance" look like?**
Note #1: The primary judgement in rOpenSci's current system is whether a
submission is in scope. If so, submission is invited, following which reviewers
and editors generally work with developers to get a package accepted. Actual
"rejection" is rare. The model for the statistical software system will be
similar in that rejection should be rare.
Note #2: Relative to the current system, the statistical standards are much more "leading edge,"
in that most current packages are unlikely to meet them. Current RO standards are more "lagging"
in that they are more adoptions of already-evolved best practices.
Should acceptance be indicated by:
- *A simple badge as with current system?*
- *Step-wise badging (bronze/passing, silver, gold, as in
[coreinfrastructure](https://bestpractices.coreinfrastructure.org/en/criteria))?*
- *Graduated badging similar to code coverage?*
Possible templates are the [Criteria
Statistics](https://bestpractices.coreinfrastructure.org/en/criteria_stats) for
Core Infrastructure badges.
Further important question:
- *How should "not applicable" standards be considered?*
Current standards suggest the following:
1. Checking boxes of all standards which are applicable and which are met
2. Checking boxes of all standards which it is okay to consider not applicable
(and appending those with **N/A** to aid machine parsing of reviews).
3. Leaving unchecked boxes of all standards not met, as well as of all
standards currently unable to be applied, yet which should ideally be
applied.
**3. What happens after acceptance?**
Note #1: rOpenSci currently offers/requires that developers transfer accepted
repositories to the `ropensci` organization, which also turns on some automated
and non-automated maintenance, checking, and documentation processes. The model
for the statistical software system need not follow that pattern.
Note #2: We will likely require developers to submit an expression of envisioned
lifecycle or future development plans.
- *How might a lifecycle plan best be incorporated within both the review
process, and an R package structure?*
- *How might we ask reviewers to consider these statements? How, for example,
should a reviewer judge a statement that a package represents the completion
of a unit of work, and so no further development is anticipated?*
## Agenda: 2020 Sept 01
- Walk through applications of standards to
[`lme4`](https://hackmd.io/VZ-wgQtZRV2pb-wFZNDM5g), with reference both to
[General Standards](https://hackmd.io/gVjTHFupS-GCy4qdjfn8gg) and
[Regression-specific Standards](https://hackmd.io/ipvRsLU-ShSNi7n2skS-6A)
- Discuss thoughts and opinions on general approach to standards thus far
- In particular, on the relative paucity of algorithm-specific standards
within each category (for example, compare
[Regression Standards](https://hackmd.io/ipvRsLU-ShSNi7n2skS-6A) with
[Unsupervised Learning Standards](https://hackmd.io/KHzx4Sq-SnOaEQ8N9-7qvA)).
- Brief concluding discussion about soft "launch" in order to invite initial
submissions/enquiries from developers of software in the first 5 categories.
- FYI: Invitations sent to potential new board members:
- [Sarah Romanes](https://sarahromanes.github.io/) - U Sydney; statistical machine learning
- [Paula Moraga](http://www.paulamoraga.com/) - King Abdullah Uni Saudi Arabia; geostatistics, epidemiology
- [Leonardo Collado-Torres](http://lcolladotor.github.io/) - Lieber Inst for Brain Development; genetics, bioconductor stuff
### Notes from meeting 2020 Sept 01
ben: need to consider implications of backward compatibility for standards
ben: leave distinction between warning and error more flexible
G2.13 stephanie: "appropriately handle" seems strange - "adequately handle"
G4.4b Parameter recovery tests should use multiple seeds (see the sketch below)
- stephanie: important to acknowledge time trade-off, and possibility of relegating to extended rather than regular tests
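As a concrete illustration of the G4.4b suggestion, a minimal sketch of a parameter recovery test run over several seeds; `simulate_data()` and `fit_model()` are hypothetical stand-ins for a package's own simulation and estimation functions:

```r
# Sketch of a parameter-recovery test over multiple seeds (G4.4b).
# simulate_data() and fit_model() are hypothetical stand-ins for a package's
# own functions; such loops may belong in extended rather than regular tests.
library(testthat)

test_that("parameters are recovered across multiple seeds", {
  for (seed in c(1L, 42L, 2020L)) {
    set.seed(seed)
    dat <- simulate_data(n = 1000, beta = 2)
    fit <- fit_model(dat)
    expect_equal(coef(fit)[["beta"]], 2, tolerance = 0.1)
  }
})
```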
RE1.2 ben a bit concerned about that requirement, but maybe because it's not
currently sufficiently clear that it pertains to documentation only. TODO:
Clarify
RE2.4 co-linearity: Max
- very hard to do, maybe not always possible
- e.g. if the rank returned by the `qr()` function (from the `Matrix` package) is less than the number of columns? (see the sketch below)
- provide some boilerplate to get going
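A minimal boilerplate sketch of the `qr()`-based check suggested above (this uses base R's `qr()` for simplicity; thresholds, messaging, and any `Matrix`-specific handling would be package-specific):

```r
# Boilerplate sketch for detecting rank deficiency (perfect collinearity) in a
# model matrix via its QR decomposition, as suggested above. Illustrative only.
check_collinearity <- function(X) {
  rank_X <- qr(X)$rank
  if (rank_X < ncol(X)) {
    warning("Model matrix is rank-deficient; some predictors may be collinear.")
  }
  invisible(rank_X)
}

# Example: the third column is an exact linear combination of the first two.
X <- cbind(1, rnorm(10))
X <- cbind(X, X[, 1] + 2 * X[, 2])
check_collinearity(X)  # triggers the warning
```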
4.1 - standards for algorithmic control? Would be a good thing to have in general
RE4.14-15 Forecasting (see the sketch after these comments). Comments by Ben:
- Very hard except by parametric bootstrapping
- Such things are not part of `lme4` because we've never been able to do it
- General hat-matrix method applicable to most linear models
- prediction for new subjects ought also be part of it
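For reference, a minimal sketch of the parametric-bootstrap approach mentioned above, using `lme4::bootMer()` with the built-in `sleepstudy` data (illustrative only; not a proposed standard):

```r
# Sketch of parametric-bootstrap prediction intervals for an lmer model,
# the approach referred to in the comments above. Illustrative only.
library(lme4)

fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Parametric bootstrap of population-level predictions (re.form = NA):
bb <- bootMer(fit, FUN = function(m) predict(m, re.form = NA), nsim = 200)

# 95% interval for each observation's predicted value:
ci <- apply(bb$t, 2, quantile, probs = c(0.025, 0.975))
head(t(ci))
```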
RE6.3 Visualisation of forecast values. Comments by Ben:
- `predict` merely takes new data, and not necessarily extrapolation
- sounds very much just TS specific
- Max: How about just "feasible", or "when possible"
### TODO List:
- [x] Clarify that RE1.2 pertains only to documentation
- [x] Change or clarify "appropriately handle" in G2.13
- [x] Indicate that G4.4b may require relegation to extended tests
- [ ] Some boiler-plate examples for RE2.4
- [ ] Sections 4.1: Standards for algorithmic control?
- [x] RE4.14-15 Clarify that this may not be possible
- [x] RE4.14-15 Add new standard for prediction using new subjects/groups (where applicable)
- [x] RE6.3 Clarify that this only applies "where feasible" or "where possible"
## Agenda: 2020 July 14
1. Discuss high-level conceptual approach to Standards thus far, particularly
the comparably well developed initial versions for [`Time
Series`](https://hackmd.io/uu8AJDGnStmaNTfFd0SZ-g) and
[`Bayesian`](https://hackmd.io/38W9pcE3TWGawcAcBbFlNg) software.
2. Discuss the relative paucity of detail in the core *algorithmic* sections of
these categories, noting the following:
- Many standards beyond these core algorithmic sections might be ultimately
merged into more general, higher-level standards, and so not end up being
category specific at all.
- We have aimed with these first cuts to be as general as possible, and to
avoid conditional clauses as far as possible ("*If your software within
this category is of this sub-type, then ...*"). Such conditional clauses
will ultimately be necessary, but how much might be too much? We'd like
to briefly discuss approaches to the development of category-specific
standards as an exercise in identifying and specifying sub-categories.
3. Current initial standards for the EDA category are notably different from
those for other categories, and are likely to remain so. These standards are
more *qualitative*, and suggest that developers should identify things like
target audiences and key questions. Many items in the standards for other
categories are intentionally more *quantitative*, partly reflecting our
attempts to develop standards able to be assessed in a (semi-)automatic way.
We'd like to discuss the issue of the potentially greater burden placed on
both developers and reviewers by these kinds of qualitative standards,
including such questions as:
- How much is too much?
- What are the relative advantages and disadvantages of qualitative versus
quantitative standards?
4. Remaining Categories are:
- Dimensionality Reduction, Clustering, Unsupervised Learning
- Machine Learning
- Probability Distributions
- Wrapper Packages
- Networks
- Workflow Support
- Spatial Analyses
The standards thus far likely provide good templates for most of those
remaining categories, although perhaps less so for the *Machine Learning*
category. We'd like to briefly discuss ideas for how we might address core
*algorithmic* standards in that category.
5. General logistical issues for brief discussion:
- Workflow from here: more regular Board meetings
- General timeline - what stage should standards be at in order to start submissions?
- Martin has decided he won't be able to participate in the future. Who should we invite in his place?
## Notes: 2020 July 14
- Noam mentioned Alex Hayes's four "Types of Tests" (correctness, parameter recovery, convergence, identification).
- Max: Worth putting in somewhere regardless, to at least get people thinking about them
- Rebecca: They are quite generic, so maybe not directly useful?
- Agreement that board members will nominate a package to help step through standards
- Next Categories?
- Clustering
- Probability Distributions
- Tasks:
1. Nominate package to be assessed
2. Nominate potential new board members
3. Actively engage with standards in current form
- Next meeting: Single task of walking through assessment of one package