# Discussion notes about possible RSE book topics
# Green computing, ethics, ~~law, and AI~~
Suggestion: exclude "law" from "ethical computing" because law is too large a topic in itself. AI is likewise considered only in ethical terms, i.e. whether the results justify the resource consumption.
- while we can follow/implement a particular research idea, the question is, should we be doing it?
- energy / power consumption
- how much water is used to cool down an HPC center
- ethical concerns
- how much does this idea actually advance research?
- should we require more justification for resource consumption and implementation choices
- who is going to decide about access to compute resources?
- at KCL and HUB it seems to work that every member gets a basic amount of resources for free (e.g. storage, compute, CPU/GPU cores, VMs/containers, ...)
## culture problem
- it seems to be a common problem that, after some bureaucracy (ticking checkboxes), researchers tend to abuse resources once they have access to them
- ? do we have references for that?
- low hanging fruit for efficiency: people don't use HPC resources effectively, or don't do checkpointing, small tests, etc
- set lower limits by default?
- compare ethics applications: a low threshold, above which you need to justify your research properly
- who sets the limits? how do you choose the thresholds?
- talking to the HPC group of a CS department, it became clear that all professors book resources but do not free them up if not needed, so the machines sometimes run idle (anecdotal evidence)
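The checkpointing point above can be made concrete with a small sketch. This is only an illustration under assumptions: the function names (`save_checkpoint`, `load_checkpoint`, `run`), the JSON file format, and the toy workload are all hypothetical and not taken from any HPC system discussed here. It shows a job that periodically saves its state so that an interrupted run resumes instead of starting over.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so a crash mid-write cannot corrupt the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # atomic rename within the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"next_step": 0, "total": 0}
    with open(path) as handle:
        return json.load(handle)


def run(path, n_steps, checkpoint_every=10, stop_after=None):
    """Sum the squares of 0..n_steps-1, resuming from the last checkpoint.

    stop_after (hypothetical parameter) simulates the job being killed
    after that many steps in the current run.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    for step in range(state["next_step"], n_steps):
        if stop_after is not None and done_this_run >= stop_after:
            return None  # simulated interruption; last checkpoint stays on disk
        state["total"] += step * step  # stand-in for expensive work
        state["next_step"] = step + 1
        done_this_run += 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

Calling `run` a second time with the same path picks up from the last saved step, so only the work done since the most recent checkpoint is repeated.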
## transparency of consumption
- would dashboards help, e.g. by showing how much energy you've consumed by training an LLM?
- > While a mundane search query finds existing data from the Internet, she says, applications like AI Overviews must create entirely new information; Luccioni’s team has estimated it costs about 30 times as much energy to generate text versus simply extracting it from a source.
https://www.scientificamerican.com/article/what-do-googles-ai-answers-cost-the-environment/
- > When comparing the average electricity demand of a typical Google search (0.3 Wh of electricity) to OpenAI’s ChatGPT (2.9 Wh per request),
in “International Energy Agency Report 2024”
https://iea.blob.core.windows.net/assets/6b2fd954-2017-408e-bf08-952fdd62118a/Electricity2024-Analysisandforecastto2026.pdf
- *A bottle of water per email: the hidden environmental costs of using AI chatbots* [link to WaPo article](https://www.washingtonpost.com/technology/2024/09/18/energy-ai-use-electricity-water-data-centers/)
- would researchers change their behavior if they saw how many resources they utilized?
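To make the numbers above easier to compare, here is a rough sketch of the arithmetic a consumption dashboard might do. The two per-request figures come from the IEA quote above; the function names, the request count, and the 300 W / 24 h job are hypothetical values chosen purely for illustration.

```python
# Per-request figures as quoted from the IEA report above.
SEARCH_WH = 0.3   # typical Google search
CHATGPT_WH = 2.9  # ChatGPT request


def ratio(a_wh, b_wh):
    """How many times more energy a uses than b."""
    return a_wh / b_wh


def total_kwh(per_request_wh, n_requests):
    """Total energy in kWh for n_requests at per_request_wh each."""
    return per_request_wh * n_requests / 1000.0


def job_energy_kwh(average_power_watts, runtime_hours):
    """Energy of a compute job from its average power draw and runtime."""
    return average_power_watts * runtime_hours / 1000.0


if __name__ == "__main__":
    print(f"ChatGPT vs search: {ratio(CHATGPT_WH, SEARCH_WH):.1f}x")
    # Hypothetical workload: 1000 requests of each kind.
    print(f"1000 searches: {total_kwh(SEARCH_WH, 1000):.2f} kWh")
    print(f"1000 ChatGPT requests: {total_kwh(CHATGPT_WH, 1000):.2f} kWh")
    # Hypothetical job: a node drawing 300 W on average for 24 hours.
    print(f"job: {job_energy_kwh(300, 24):.2f} kWh")
```

With the quoted figures, a ChatGPT request comes out at roughly 9.7 times the energy of a search. A real dashboard would need measured power draw per job rather than assumed averages.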
## funder's interest
- funders are starting to care about energy consumption
- would it be helpful to have certificates for using only *green/sustainable* energy?
- Sustainable research computing certification: https://www.software.ac.uk/GreenDiSC
- KCL data center is using certified renewable energy
- York Uni has [moved their data centre](https://energyadvicehub.org/york-university-reduce-carbon-footprint-supercomputer/) to a location with greener energy and less need for cooling
## societal impact
- data centres can impact the local population/environment: water use, electrical network, etc
- the UK has a new top-down policy with regional energy pricing -- this is an attempt to move industry to the north of the UK, and it may be how data centers end up moving north
- one HUB data center has already exhausted the capacity of its power line and would need new power lines into the building in order to expand its HPC offerings and other services
- public cloud infrastructure allows you to choose where to run certain jobs
- so it would be possible to choose locations based on solar power
- An interesting podcast on the topic of data centre locations: https://techwontsave.us/episode/243_data_vampires_opposing_data_centers_episode_2
## law
- what is the current interpretation of copyright law in terms of genAI
- if you train a model do you have any rights on its output?
- German law allocates no rights on prompted output
- if LLM output can be reused but has no academic value assigned in terms of licenses or rights, is it ethical to reuse it?
- [James] please can you ask how Open Source works in Germany with distinction between the two different kinds of authorship rights?
# Career development for RSE roles
distinction between in-house and funding programmes
How it works in UK:
- Academic vs researcher vs professional services contract
- Job titles communicate expectations, but aren't particularly significant - easy to change
- Main problem seems to be that we're often in "professional service" roles without progression mechanisms
How it works in Germany:
- Position on salary scale determines "researcher" vs "technician"
- Contract doesn't say much about what you actually do
- You have a short document which describes your work area
- Job titles carry a lot of weight - need justification to change
- Often have to define a specific role to justify it
- Main problem is lack of standardisation - no ability to benchmark
- "Beamte" (civil servants) - equivalent to "tenure" in the US
- Gives legal protection / responsibility to professors
- No willingness by universities to have this discussion about roles as it would mean increasing salaries
Resources
- UKRI roles in funding applications: https://www.ukri.org/publications/roles-in-funding-applications/roles-in-funding-applications-eligibility-responsibilities-and-costings-guidance/
- UK RSE role profiles: https://github.com/RSE-leaders/evidence-bank
- UK standard academic role profiles - nationally agreed by the unions representing university staff
- https://www.ucu.org.uk/media/3540/UCU-model-academic-related-job-family-role-profiles-Oct-09/pdf/ucu_arprofiles_oct09.pdf
- Comes from negotiation between UCEA (represents university senior management) and UCU (represents primarily academic and research staff)
* UK has role understanding
* HUB does not; discussing/implementing a role is a long process
* usually an unofficial designation
What would we like a pathway to achieve?
- Support people to come from different backgrounds - academic and industry
- Transitions between roles
- Promotion within a family of roles
What software cultures exist?
- We have a reasonably uniform culture within UK academia - you can expect a university to work in certain ways
- Culture in industry very variable
Roles within projects:
- In industry we'd probably want to talk to "product owners" - they're responsible for collaboration
- KDL uses Agile DSDM
- Describes roles and responsibilities within project
- King's Digital uses Scrum - RSE group in Manchester does too
- Adaptations based on size and engagement of project
- RSE as researcher vs RSE as service provider
- Would this affect promotion pathways? Would need to be general enough to work for both
# Developing/running training courses for researchers
## Stefan's ideas:
- everyone should be proficient in git
- using modern features of programming languages
- teaching algorithmic thinking from the beginning
- Continuous Integration (Continuous Deployment is less important, but very important for reusability)
## back to general overview:
- King's does regular workshops, open for interested researchers
- introductory courses mostly built upon Software Carpentry materials
- also develops additional intermediate courses (e.g. Python profiling & optimization)
- by the central RSE team
- "born out of pain" - the need for additional custom materials became apparent
- no mandatory training for HPC users
- hard to strike a balance between annoying people and covering the things they need to know
- already established groups/communities, born out of necessity
- also some online+async training materials in existence
introductory materials
- maybe change/improve them?
- what are the actual goals of those materials?
- taking away people's fear
- basic concepts
- dealing with errors & error messages
- teaching computational thinking
Digital History at HU
- intro courses: Python, Data Literacy
- no pre-requisites, diverse backgrounds: no common level of background knowledge
- a lot of pair programming
- a lot of self-organization
- it very much depends on students being able to work with/train other students
- **idea**: telling people upfront about the expectation re pair programming & providing train-the-trainers materials
Other departments at HU also offer training
- but not clear what topics/how
- IZ is working to improve communication
General issues:
- departments/institutes/etc. do not know what others are doing
- people giving courses are not necessarily trained in teaching
Prerequisites?
- tension between scaring people away, not having enough people joining because of that & people needing to know how to use tools
- idea: making this tool/infrastructure-dependent; courses are without prerequisites, but the usage of HPC et al. has some
- those would be one or two half-day workshops
- courses using programming / data processing need some training built in
- need flexibility to add additional training where required based on the participants
- needs the lecturers to be willing and able to provide this training!
- need to catalogue the tools and concepts that are required for the course
- then you can do a survey in the first week & edit the course plan accordingly
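The catalogue-then-survey idea above could be sketched as follows. This is an assumption-laden illustration, not an agreed process: the function name `training_gaps`, the 50% threshold, and the example topics and responses are all hypothetical.

```python
def training_gaps(required, responses, threshold=0.5):
    """Return the required topics that at least `threshold` of participants lack.

    required:  set of tools/concepts catalogued for the course
    responses: one set per participant, listing what they already know
    """
    if not responses:
        return sorted(required)  # no survey data: assume everything needs covering
    gaps = []
    for topic in sorted(required):
        missing = sum(1 for known in responses if topic not in known)
        if missing / len(responses) >= threshold:
            gaps.append(topic)
    return gaps


if __name__ == "__main__":
    # Hypothetical course catalogue and first-week survey answers.
    required = {"git", "python", "shell"}
    responses = [{"python"}, {"python", "git"}, {"shell", "python"}, set()]
    print(training_gaps(required, responses))
```

Topics returned by the function are candidates for extra sessions in the course plan; the threshold is a judgment call for the lecturers.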
# AI and software