Hackathon: Optimising Research Dissemination and Curation
This is the summary page for our hackathon! Ideas and actions will be added and updated here throughout the week.
Interim update 10 November.
At the end of our second meeting (see notes below), we progressed our goal of developing guidelines for training students to review pre-prints by putting together a preliminary rubric that captures the ideas from our discussions (recorded on our whiteboard) and incorporates elements of the Outbreak Science rubric. We will continue to comment on this shared document over the week.
Our goal: Transform this into a working prototype that can be piloted in the upcoming academic term.
Actions for the week:
- Team to comment on the different assessment criteria, especially those we did not have time to discuss in the meeting.
- Add any items that have been missed.
- Add thoughts about the grouping of criteria. (They are sorted into related topics that could translate into an overall quick-label at a later stage.)
- Post overall thoughts about the rubric.
- Dawn will also start an OSF project as a place to develop this rubric and further study protocols arising from the hackathon.
Discussion notes
What do we need to rate in a paper: what are the important criteria?
- We could learn from the Octopus system: 3 ratings per paper section. Should different qualities apply to different sections?
- Another way to approach this: what should you target in a review if you only have a short period of time? What is the priority for review?
- What is the claim?
- How does the claim hold based on the data?
- What is the outcome of interest?
- Reliability is important; maybe we should not worry about novelty, or whether the literature review is complete, especially since students may lack the experience to know what has been left out of a literature review.
- Impact and advancement of the literature are still points of interest. If we can show we are able to measure them, it adds leverage to what we do and makes our work more beneficial.
- As a research question, we might want to ask: can training processes (e.g., walking people through rubrics and metrics) help to improve reviewing quality?
- This is related to PREreview's work. They would likely be interested in how to demonstrate the effectiveness of peer-review training.
- What should go into a structured form / a developing review rubric?
- Trade-off between high-level peer review questions, and being able to elicit specific things about claims (e.g., how generalisable, is sample size large enough?)
- Rubrics and checklists work best when the topic is narrow, e.g., clinical trials, where someone can review specific things even without experience.
- A rubric for all of research may be difficult because the subject area is too wide.
- With checklists for systematic reviews and clinical trials, the process began with small groups of researchers who looked at how things were done and had a protocol for putting the checklist together.
- In PREreview the ratings were a top-down decision.
- Should we then develop different rubrics for each different kind of research? Different types of studies?
- A possibility is to develop an experimental line of meta-research to assess which aspects are of interest to different fields (e.g., surveying researchers, having them rate criteria, and using cluster analysis to identify the main factors people are interested in).
- Pragmatically speaking, we need a flexible system that could change according to each field's needs.
- What about decision trees (start with certain types of research, then route to things you need to consider there).
- The problem with decision trees is that they remove the possibility of dividing the review into dimensions and allowing people to specialise and work on particular categories.
- Splitting up the review process could be good in terms of division of labour, but it is not always possible to make a comment on one section without looking at the others (e.g., results not comprehensible without methods).
- Merits of splitting up the review process: reviewing a whole paper takes a very long time, but there is a culture that no input or comment is valuable unless you judge a paper as a whole. If we can change that, it might realistically be better for getting large numbers of people to weigh in on different sections of a paper.
- Returning to the process of training students:
- Students' perspective: we need a guided process, with clear foci and clarity about what each component of the guidelines means.
- A recognition scheme for students trained in pre-print reviewing (e.g., certification) could help encourage students to engage more.
- If one has a set of dimensions involved in reviewing certain parts of a paper, we could train reviewers by asking them to answer questions about 10 pre-prints.
- Should we collect data on expert judgements to train new reviewers?
- In statistics for example, there has been scholarship on coding papers for statistical mistakes.
- A training programme should highlight things people may not have previously focused on, but they learn to do as part of the programme. We should be able to quantify the effects of the training.
- One way to measure this: self-reflection pieces/surveys at the beginning of training, when students first do a review, followed by a survey and review at the end for comparison.
- It may be worthwhile to clarify what kind of teaching we are aiming at. Is this a course for postgrads? Could it be part of an already existing course, where students review papers related to the course? We could create assessments that are smaller in scope but provide the initial training for students.
- Undergrads could classify studies and get some basic review training from the process.
- Could some of this information just be source-able from abstracts?
- From a pragmatic perspective: we need a quick and dirty list of descriptors to describe research, followed by evaluative categories (either to rate or provide feedback on).
- We should try these things first and see if they work, before trying a complicated study design to test the educational value. Start with modest goals, evaluate the enterprise, and develop it from there.
- This could be a road map for where and when we will do different steps and set different goals.
- We should also make sure we are open to changing our goals in the road map.
- What if we focus on specific sections of the paper (e.g., intro and discussion), for which it might be easier to have general rubrics, rather than the methods and results, which are more difficult and differ across fields?
- The introduction can be hard to evaluate if one isn't an expert. How would UG students know if a literature review is accurate, and if it covers everything?
- There is also a question of how we will popularise the initiative so that people participate. Who will implement it with students? Which instructors will incorporate it into their modules?
- Can we approach methods instructors at universities? Start small while perfecting the rubric.
- Maybe one or two people can try it to start, then reflect on how the trial has gone?
- There are lots of initiatives now about improving online teaching; we could add to that.
- This could be adapted to content modules as well; it does not need to be a special research methods module.
- There is a network here, CREPS, who might be interested in collaborating.
- Potential issues in trialing the assessment rubric:
- We need to consider the cross-disciplinary element. Students may be facing studies they are not used to, so we shouldn't be overly prescriptive.
- Outbreak Science's rubric has some good questions that are not too wide in scope.
- Impact is difficult to judge if students lack experience and don't have knowledge of the most up-to-date work. How do we build their awareness here?
- Assuming we can get this running at a university, in an academic context students would do it because they have to. But we could also try community-based engagement focused around creating values (e.g., open science groups). It's like creating a culture: we are training a new generation of students to get into the right mindset.
- It is a good start to put this into a module. The incentive structures are not there for academics to take this on (no time, little incentives to review). If it is in a module as a starter, it's bait to get people doing stuff.
- Discussion on moving the rubric forward:
- Have a general set of guidelines that can be modified by individual instructors.
- We could have the same rubric for UGs vs. PGs, but be more stringent and expect more development/depth with PG students.
- Let's establish a rough framework that can be altered.
- Another stakeholder in this could be researchers who do power analyses and meta-analyses. If we are coding all this general information about pre-prints, we then have meta-data that, at the meta-research level, allows people to design better studies from this information.
- For example: if students are assigned a random sample of work within a topic, and they classify all their sample sizes, we get a sense of what general sample sizes look like for new research in that topic.
- How would students be able to comment on the pre-prints?
- Hypothes.is could be a useful tool (SciBeh is using this in the knowledge base).
- Verbal comments should be pre-moderated by the instructor (e.g., handed in offline, with feedback received). Only when students get the go-ahead should they put them online.
- But what is the incentive to make the review/annotation public if students have already done it and got their feedback? Perhaps annotations could be private at first and made public after moderation? (Dawn to look into this; a rough sketch of one possible workflow follows these notes.)
- The Review Framework:
- Categories should be amended as appropriate.
- All categories must have an OTHER response for free text input.
- Stress to students that they don't have to fill in everything.
- It could be an iterative process: students attempt it, then receive training and explanations of how it works, and then follow up with further assessment against the criteria.
- People should not feel compelled to address all the criteria at once.
- Vera is getting permission to include such a rubric in her assessment structure for next term.
- We should set up an OSF/GitHub project to turn this into a living document/resource that can evolve over time.
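As a follow-up to the annotation question above, here is a rough sketch of how a "private first, public after moderation" workflow might look via the Hypothes.is web API (the tool SciBeh already uses in the knowledge base). This is a sketch under stated assumptions, not a tested implementation: the API token, the example pre-print URL, the username, and the exact permissions values are placeholders to be verified against the current Hypothes.is API documentation as part of Dawn's follow-up.

```python
# Sketch only: a possible "private first, public after moderation" flow using
# the Hypothes.is web API. API_TOKEN, the pre-print URL, the username and the
# permissions values are placeholders/assumptions to check against the API docs.
import requests

API_ROOT = "https://api.hypothes.is/api"
API_TOKEN = "REPLACE_WITH_DEVELOPER_TOKEN"  # hypothetical token
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}


def post_private_comment(preprint_url: str, comment: str, username: str) -> str:
    """Post a student's comment so that, initially, only the author can read it."""
    payload = {
        "uri": preprint_url,
        "text": comment,
        # Restricting 'read' to the author's own account keeps the annotation private.
        "permissions": {"read": [f"acct:{username}@hypothes.is"]},
    }
    response = requests.post(f"{API_ROOT}/annotations", json=payload, headers=HEADERS)
    response.raise_for_status()
    return response.json()["id"]


def publish_after_moderation(annotation_id: str) -> None:
    """Once the instructor approves the comment, widen read access to everyone."""
    payload = {"permissions": {"read": ["group:__world__"]}}
    response = requests.patch(
        f"{API_ROOT}/annotations/{annotation_id}", json=payload, headers=HEADERS
    )
    response.raise_for_status()


if __name__ == "__main__":
    annotation_id = post_private_comment(
        "https://psyarxiv.com/example",  # hypothetical pre-print URL
        "The sample size justification in the Methods section is unclear.",
        "student123",
    )
    # ... instructor reviews the comment offline and gives the go-ahead ...
    publish_after_moderation(annotation_id)
```

Whether Hypothes.is permissions can be widened after posting in exactly this way is one of the things to confirm; if not, an alternative is to post into a private course group first and re-post approved annotations publicly.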
Interim update 9 November.
Following discussions in our first meeting (see notes below), we decided to address guidelines for training students to review pre-prints and to identify which aspects of a research output's quality need to be evaluated and communicated.
We started by brainstorming what elements should go into a quick label that communicates the essence of a research paper and its evaluation. We identified three areas of high importance:
- Transparency: this goes beyond open science; it means noting what is not there as well as what is there.
- Appropriateness for answering the research question: e.g., methods may be sophisticated but wrong for the question, or simple but right for the question.
- Clarity of communication of ideas: for example, can people follow it? What terminology is used, bearing in mind that similar terms may be used differently across areas and subdisciplines: are there clear definitions?
There are more areas we can pick out from our brainstorming (e.g., impact, application potential) that we did not have time to dig into further.
Next we need to develop these into 'rubrics' that would be helpful in teaching a student how to do a review that addresses these matters. The other elements brought up in the brainstorming could be categorised as relevant to our main areas of importance.
Jay has also shared an example of what a quick label could look like.
Proposed actions for 10 Nov
- Translate these identified areas into an assessment rubric for training reviewers.
- Identify platforms we can collaborate with to trial student assessments of pre-prints.
- Work out a plan and timeline to put a trial of this idea (students reviewing pre-prints using our developed rubric) into practice. (And we should probably pre-register that!)
Discussion notes
Problems with the review & publishing process:
- Academia has a warped incentive structure, where review is not properly built into the academic workload.
- Academics do not consistently receive proper training for peer review. We should have this from the time we are students; it should be part of the teaching curriculum. This way, we would also be training up a new generation in these skills and practices.
- Do we have a set of guidelines that anyone interested can refer to for what constitutes peer review? (followed up later in these notes)
- What about a theory-based way to think about quality assurance? Are there clever design principles we could look to, from industry, psychology, and biology?
- A 'nutrition label' for research papers could be a good way to communicate the quality of research, or where it lies on certain aspects/metrics, but this should be understandable.
- Examples of existing rating schemes: BMJ groups evidence into 3 levels; NASA has a 9-step technology readiness scale; Chris Chambers proposed a 5-level quality pyramid; ScienceOpen uses 5-point ratings, ScienceMatters uses 1-10 ratings, and PREreview uses yes/no/unsure ratings.
- Could one annotate a pre-print in order for AI to come up with labels, and how?
- How do reviewers make decisions? Theory and research on fast-and-frugal heuristics brings up the use of checklists to boost people's decision-making processes and enable quality control (e.g., in medical fields).
- We need to distinguish whether we are talking about (1) researchers making decisions on/judging other researchers' work, or (2) journalists/the public making decisions from/judging research.
- Research on grant reviewers' decision-making processes shows we don't actually know a lot about how reviewers make decisions: https://www.torrproject.org/
- How do we remove reviewer bias? Rubin's theory of evaluation has been suggested, with 4 different ways of evaluating contributions.
- For research communications to the public, levels of quality evaluation need to be understandable for people.
- Pre-print servers don't have the resources to do quality checks of submitted work beyond a simple topic/copyright proof check (and even then, they have a huge backlog).
- We should try to tap into existing structures and activities that are already going on, e.g., journal clubs. They are already doing evaluation; can we incentivise them to take notes and post them?
- How do we consolidate links to discussions when they do happen?
- If people are discussing and commenting on research, why are there so few comments on the pre-print servers themselves? Why are they elsewhere?
- It doesn't actually take more time to review research than to produce it, yet we are producing more than we have time to review. This suggests an incentive/value problem around what we are willing to, or rewarded for, spending time on. Changing this is difficult given incentives from the institutional side.
- If we are developing guidelines for how to review, could we get students involved? This would mean training the next generation as well as widening the pool of potential reviewers. Maybe this could bridge the gap between resources being available for pre-print review and people actually putting them into practice.
- We can start with existing resources for reviewing (from PREreview, journal clubs, PLoS, journals).
- Goals articulated:
- Let's formulate a teaching aid/rubric to get students to write comments on pre-prints, and test it in the new term.
- We could use the rubric to develop important metrics that should populate a quick label.
- BMJ has a guide for journalists with a basic taxonomy to use in press releases (e.g., study type, classifications). Could we have a snapshot overview of a paper, e.g., classification of methods, DV, sample size? (These might in themselves tell people the associated limitations of the study.)
- These and other aspects are used by scientists in replication markets (i.e., how likely other scientists think the work is to replicate).
- We will each come up with what quick-label criteria should entail; we can then reverse-engineer this to develop our rubric for training students.
- On transparency: it is a better term than 'open science' because people (the public) have a better idea of what it entails. It also goes beyond evaluating what is missing: with novel methodologies especially, a reviewer may find it impossible to know what else should be included. If there are logs in the public domain about the research process, these could be followed to ascertain that the researchers were being transparent throughout.
- For metrics with categories (e.g., low, medium, high) we should also allow open categories for whatever the reviewer (or student) feels is appropriate as a descriptor.
- However, ensuring things are consistent and machine-readable would be good long-term thinking, for when we may want this to be searchable or classifiable by AI. (A minimal sketch of one possible machine-readable representation follows below.)
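To illustrate the machine-readability point, here is one hypothetical way a single rubric criterion (a fixed rating scale plus an open 'OTHER' descriptor, as discussed in the 10 November notes) could be represented. The field names and allowed values are invented for illustration only; they are not part of any agreed rubric or standard.

```python
# Hypothetical structure for one machine-readable rubric criterion.
# Field names and allowed ratings are illustrative only, not an agreed standard.
from dataclasses import dataclass
from typing import Optional

ALLOWED_RATINGS = ("low", "medium", "high")


@dataclass
class RubricEntry:
    criterion: str                 # e.g. "Transparency"
    rating: Optional[str] = None   # one of ALLOWED_RATINGS, or None if skipped
    other_descriptor: str = ""     # free-text 'OTHER' response
    comment: str = ""              # optional written feedback

    def __post_init__(self):
        # Keep ratings consistent so entries stay searchable/classifiable later.
        if self.rating is not None and self.rating not in ALLOWED_RATINGS:
            raise ValueError(f"rating must be one of {ALLOWED_RATINGS} or None")


# Example: a student rates transparency but uses the open field for nuance.
entry = RubricEntry(
    criterion="Transparency",
    rating="medium",
    other_descriptor="data shared, but analysis code not available",
)
```

Keeping the fixed categories and the open descriptor in separate fields like this would let students answer freely while still leaving the ratings consistent enough to search or classify later.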