# CW22 Hack Day: CFF in the wild
- [Original pitch document (Raven)](https://docs.google.com/document/d/1F5MX1sWXV5HbAr3VUpoqggoHm9TatUErTzx-eUE_cOw/edit#heading=h.njbyweqpatro)
- [Team Slack channel `#cff-in-the-wild`](https://collabw22.slack.com/archives/C03ACBUQ06P)
- GitHub-Repo: <https://github.com/sdruskat/cff-in-the-wild>
## Team members
- Stephan Druskat, GH: @sdruskat, stephan.druskat@dlr.de
- File cleaning and mapping
- Jez Cope: GH: @jezcope, jez.cope@bl.uk, ORCID: 0000-0003-3629-1383
- ...
- Sam Harrison, GH: @samharrison7, ORCID: 0000-0001-8491-4720
- CFF Parsing
- Mark Basham, GH: @markbasham email: mark.basham@rfi.ac.uk ORCID: 0000-0002-8438-1415
- QA, integration, code review, presentation
- Hugo Gruson, GH: @Bisaloo, hugo.gruson@normalesup.org, 0000-0002-4094-1476
- Repo Analysis
- Amal Alghamdi, GH: @amal-ghamdi amal.m.alghamdi@gmail.com, https://orcid.org/0000-0003-0145-5296
- CFF Parsing, Data Analysis
- Saranjeet Kaur Bhogal, GH: @SaranjeetKaur (Unfortunately, I won't be joining this one as it is clashing with my pitch for the Hack Day. Good luck!!) Thank you :) Thanks!
- ...
## Planning
0. [ ] Talk about [project setup](https://hackmd.io/jsPBZ6ynQF6IeUlhWvd9AA?both#Project-setup)
1. [ ] Create issues
2. [ ] Collating results from the search query
3. [ ] Recovering original repos from files (SD)
1. Write a CSV?
4. Clean dataset (SD)
1. [ ] Detect non-CFF files
5. Analysis
1. Research questions
1. CFF analysis
- [ ] Are all files called CITATION.cff
- [ ] What is the ratio of valid:invalid CFF files
- [ ] How many files have been created using [cffinit](https://bit.ly/cffinit) (judging by comment in file)
- Adherence to the [software citation principles](https://peerj.com/articles/cs-86/): What are the ratios for:
- [ ] providing `version`
- [ ] providing `repository-code` only
- [ ] providing `doi` or `identifiers/doi` and others
- [ ] providing `preferred-citation`
- [ ] Usage of `type: dataset` vs. (`type: software` || `None`)
- [ ] Usage of references (avg. number of references, reference type distribution)
- [ ] Usage of non-standard fields. Such fields might reveal a gap in the information stored by the default schema and might inform future development for the CFF schema
2. Repository analysis
- [ ] What programming languages use CFF files?
- [ ] Are there other metadata files present in the repo (`codemeta.json`, `CITATION`, `.zenodo.json`, `*.bib`)?
6. PROPOSAL: Repo Checker
1. A simple script which can check a repository for compliance with the above CFF rules
2. Also look to add other SSI questionnaire parameters
3. README
4. LICENCE
7. Write a speed blog about the analysis done above so that it can be reproduced at a later stage too
8. Submit PRs to fix broken CFF files?
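
The per-file checks from the research questions above could be sketched roughly as follows. This is a minimal sketch, not the code in the team repository: the function names are hypothetical, the input is assumed to be a CFF file already parsed into a dictionary, "compliant" is assumed to mean that all four checks pass, and a file is assumed to come from cffinit if any comment line mentions `cffinit`.

```python
from typing import Any, Dict, Mapping


def has_doi(cff: Mapping[str, Any]) -> bool:
    """True if a top-level `doi` or an `identifiers` entry of type `doi` exists."""
    if cff.get("doi"):
        return True
    identifiers = cff.get("identifiers")
    if not isinstance(identifiers, list):
        # Missing or malformed `identifiers` (e.g. in an invalid file).
        return False
    return any(
        isinstance(item, Mapping)
        and item.get("type") == "doi"
        and bool(item.get("value"))
        for item in identifiers
    )


def check_principles(cff: Mapping[str, Any]) -> Dict[str, bool]:
    """Report which of the fields listed in the research questions are present."""
    checks = {
        "version": bool(cff.get("version")),
        "repository-code": bool(cff.get("repository-code")),
        "doi": has_doi(cff),
        "preferred-citation": bool(cff.get("preferred-citation")),
    }
    # Assumption: a file counts as compliant only if every check passes.
    checks["compliant"] = all(checks.values())
    return checks


def mentions_cffinit(raw_text: str) -> bool:
    """Heuristic: does any comment line in the raw file mention cffinit?"""
    return any(
        line.lstrip().startswith("#") and "cffinit" in line.lower()
        for line in raw_text.splitlines()
    )
```

Calling `check_principles` on each parsed file gives one row of booleans per file, which fits the CSV-based interoperability idea. `mentions_cffinit` needs the raw file text, because comments are lost once the YAML is parsed.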
## Project setup
- Workflow?
- Single tasks, single solutions
- Knit them together w/ a Makefile or similar for repro
- `main` protected, work through PRs with code review
- Interoperability?
- CSV
- Licenses?
- Apache v2
- CC-BY / CC0
- Testing?
- Feel free to :)
- Anonymisation?
- Only publish links to files
- Split up searching and downloading and provide code to reuse
- Presentation?
- Ignite talk? (MB) :tada:
## Links
- cffconvert (Python): https://github.com/citation-file-format/cff-converter-python
- schema: https://github.com/citation-file-format/citation-file-format
- guide: https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md
- Tools: https://github.com/citation-file-format/citation-file-format#tools-to-work-with-citationcff-files-wrench
# KEY takeaways
Full sample size: `3307`. Only 20% of those (647) were completely valid CFF files. About a third of these (240) were created with cffinit.
Initial Software Citation Principles compliance stats:
```
Sample size: 2793
Has version: 2021 (72.4%)
Has code repo: 785 (28.1%)
Has DOI: 1187 (42.5%)
Has citation: 469 (16.8%)
Compliant: 46 (1.6%)
```
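
The percentages in the block above are each count divided by the sample size of 2793, rounded to one decimal place. A minimal sketch of that formatting step, with `share` as a hypothetical helper name:

```python
def share(count: int, total: int) -> str:
    """Format a count together with its share of the total, e.g. '46 (1.6%)'."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{count} ({count / total:.1%})"


SAMPLE_SIZE = 2793

# Counts taken from the stats block above.
counts = {
    "Has version": 2021,
    "Has code repo": 785,
    "Has DOI": 1187,
    "Has citation": 469,
    "Compliant": 46,
}

for label, count in counts.items():
    print(f"{label}: {share(count, SAMPLE_SIZE)}")
```

Keeping the formatting in one helper makes it easy to regenerate the figures when the dataset is refreshed.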
## Presentation
1. Novelty, creativity, coolness and/or usefulness
Can you clearly define the problem that is being solved and how are you trying to solve it?
Are you doing something new, better, slick or really useful to yourself or others?
Is your solution purely self-serving, or is it enabling in some other way? You need to provide reasons as to how your Hack Day project benefits a wider community of potential users/developers to get the best marks during assessment.
The advice here is indicative; other justifications in this space are welcome (within the constraints of presenting).
2. Implementation and infrastructure
Are you following research software best practice for the use of infrastructure? Is a source code repository being used? Is there documentation? Are appropriate services and infrastructure being used (e.g. cloud computing, databases)?
If you are building on existing work, it’s essential that you are clear about what was done during the Hack Day in terms of adding features and functionality etc. (If this is not clear you will lose marks).
Does your solution work for the stated purpose - can this be shown during the demo?
If your team is developing a standard, are you using collaborative techniques and tools to allow contribution from the whole team?
For paper hackathons involving presentation of data or analysis, are you using reproducible frameworks for the paper authoring?
For other research software related hacks, is it clear you are using best practice in the construction of the work?
3. Demo and presentation
Did the presentation and demo show how your hack has fulfilled the judging criteria?
Did your team communicate the essence of why they did what they did and why it was important?
If your team were demonstrating results (e.g. from an analysis), were they appropriate for the data chosen?
4. Project transparency
Was your source code available on an open repository at presentation time? Teams may choose to work open or work closed. If you happen to decide that you want a publication from this work then you may choose to be open about your methods but not your data, for example. However, building and being able to build on each other's work during the Hack Day will be viewed favourably.
Ideally your repository should contain a README covering configuration, make and run instructions included with the code. In addition there should be a brief description of the project and what the software/scripts do, along with a license.
These criteria may not be directly relevant for certain categories of entry; in this case other aspects of transparency and openness will be used as decided upon by the judging panel.
5. Future potential
Was it clear how your work could be taken forward in the future, could it modify existing work, or be part of a new paper, initiative or bid?
Were ideas of future steps provided?
Was it mere fun or did the idea show usefulness in the long term?
6. Team work
Was your team led well, were they able to involve all interested team members?
Were non-technical members directed towards meaningful contributions; e.g. documentation, testing, usability and logo design in the case of more software-related hacks?
Did your team’s software practices support synchronised working and decrease duplication? Did your team achieve more together than would have been possible separately?
Was your team atmosphere healthy: disagreements are fine, but were they conducted agreeably?
Did it appear enjoyable and/or fun to be part of your team?
Point 1 -> How we manage project
Point 2 -> Tools ->
Point 3 -> Results ->
- S01: cff-in-the-wild
- S02: Team cff-in-the-wild
- S03: Aims
- S04:
- S05:
- S06: Point 1
- S07:
- S08:
- S09:
- S10: Point 2
- S11:
- S12:
- S13:
- S14: Point 3
- S15:
- S16:
- S17:
- S18: Conclusions
- S19:
- S20: Thanks