# AaltoSciComp/AaltoRSE presentation to Data Agents

## Basics

- Science-IT: not SCI, not IT
- Who we are: 8 staff (2 RSEs, or "we are all RSEs")
  - Q: Aren't we 3 RSEs, or do you exclude yourself, Richard?
  - A: Yes, we are more than two. But if you include me, then we have to include others, and in effect we all are. Two specifically hired, full-time RSEs.
- History
  - 200x: M-grid (cluster for materials science)
  - ????-2021: Finnish Grid [and Cloud] Infrastructure: Academy infrastructure
  - 2021-????: Finnish Computing Competence Infrastructure: emphasize the usage and support.
- We are Aalto and national infrastructure

## What is our goal?

~~5 years ago: Provide computing infrastructure~~ or
Now: Support computational research

## Aalto SciComp

"Aalto Scientific Computing" (ASC), officially known as Science-IT, is the main team.

- Organizationally: Technical Services in the School of Science.
- Where does the funding come from?
  - Hardware
    - Academy of Finland (periodic FIRI (Finnish research infrastructure) grants)
    - SCI
    - U-level
  - Staff
    - Member departments (mainly CS, PHYS, NBE)
      - Who is related to PHYS? OK -> Plausible PHYS data agents(?)
    - SCI
    - RSE projects/funders
- Which Aalto departments do most customers currently come from?
  - Users: ![](https://lh4.googleusercontent.com/rEhYcvpnLu_0wBQHCgshCzGznJg9luImPzw4XztNHc-iQYMzPFRAPpDJUxy4hkKIUQLS8iViKckTEZx0qbfBRkTzcsLIOg378_zydCnbTPkfnab3ssYzv-4SIY75JJi9jctTy__v86Y)
  - Types of users: ![](https://lh6.googleusercontent.com/_1fqECvD8JAzSVxXq6yET1DkezmNBa7qSaRpbkcLfidPxuFUq8mx6k1yy68ATGogkUf2wwxy5EngXem-i59NU-kkwOaTUtKFjjAr5S0EXjw5R7iFUVIi3mAr3rSOILmL8JyO4gKzoPw)
- What are the most common customer scenarios or projects where the ASC is currently helping?
  - See below

## Aalto RSEs

A "Research software engineer" is a bridge between *Research* and *Technology* (not just software). Engineering is about applying science to practical applications.
In particular, we focus on the *integrity* of the output: does it use best practices, and in the context of open science, is it FAIR?

- Where does the funding come from?
  - 2020: We started with two initial RSEs as a seed:
    - SCI (50%)
    - CS, NBE, PHYS (50%)
  - 2021: ITS digi pilot grant
  - Ongoing: Individual projects
- We have advertised RSEs throughout Aalto (e.g. BIZ)
  - Q: Is this OK, can e.g. BIZ researchers expect RSE services?
  - A: Yes, definitely OK! Due to the ITS grant, but in general we try to help wherever possible.
- How does project funding work?
  - Within member units (SCI, whole Aalto with ITS grant):
    - 1 month free, then project should fund out of its grants
  - Other units:
    - ???, small service for free
  - Other universities
    - According to contract research policies (= €€€€€)
- What are the most common customer scenarios or projects where the RSEs are currently helping?
  - See below
- Could we have an example of a project done by RSEs for a customer?
  - See below
- Which Aalto departments do most customers currently come from?
  - CS probably dominates, then other SCI departments for most of the rest. Some across Aalto (includes at least ELEC, ENG, BIZ, CHEM).
- What are the cool things you are currently working on?
  - See below
- What are the challenges?
  - Getting our message out to researchers(, while at the same time...)
  - ... Running out of time to do existing projects
  - Cross-university collaboration (organization silos of funding)

## Most important internal projects and services

This is a tour of our most important projects.

### Triton

- Computer cluster
- Each week, we spend about 100 years of CPU time.
- "Ship of Theseus": continually upgraded (unlike most clusters that are fully recycled periodically)

### Data storage

- On Triton: Lustre
  - 5 PB
  - Very fast for big data and computations
  - Not great for many small files
- Department Teamwork storage
  - Managed by department IT (but in essence by Science-IT team, too)
  - Smaller but backed up
- Data vision: data is the most important part and needs a vision of easy accessibility.

### scicomp-docs

- scicomp.aalto.fi
- FAIR: open-source, GitHub-managed, findable and reusable. Often high in web search rankings.
- Lots of information on Triton and more.
- Some duplicated info compared to aalto.fi: how should they be compared?
- License: CC-BY 4.0 for text and CC0 for code snippets

### Daily garage

- Zoom, every day at 13:00
- Anyone may come and ask any question
- "Front desk" to many larger problems
- We direct other tricky issues here for discussion.
- Also internal networking/teaching/socialization time

### Accessible computing

- Working on one's own computer isn't enough (or FAIR enough)
- Clusters are traditionally hard to use
- We are trying hard to make computing *accessible*:
  - Rethink cluster usability as much as we can. Design to be easy to use.
  - Interactive tools, such as Jupyter or Open OnDemand
  - Training and support where needed

### Teaching: CodeRefinery, Kickstart, and other teaching

- These days, big gap between *complexity of tools* and *education*
  - Academic vs practical education
  - Cancelled "Computer as a tool" course
- Extensive course lineup for all levels of researchers and students
- Courses target the practical tools of researchers, not duplicating academic work.
- All course material is FAIR (GitHub, open source, collaborative)
- Teaching processes are also open: https://coderefinery.github.io/manuals (+ community teaching training)

## Other services

???
## Example RSE projects and ASC support

### ParallelFTDT

- Group had a software tool, designer has left
- New researcher needed to use it but it wouldn't install
- We spent several days/weeks trying to get it installed
- Realize "hm, this is made internally. The software is the problem"
- Create RSE project, fix software, it is still used.

### Polar actiwatch data collection

- Researcher wanted to do a major study using wrist-worn Polar smart watches
- All data collected in one place
- RSEs write data collection platform
- And set up data pipeline to process and transfer to Triton in a usable form.

### Software installation

- Most scientific software is written quite badly
- Common tools exist, but are often not used well by developers (or users don't know how to use them)

### Genomics data upload

- Researcher has issue with data upload to a repository
- Requires special program ("Aspera") which doesn't work on Aalto networks
- ITS wasn't helpful in un-blocking firewalls because of specifics of this program
- Make work-around within CS department
- In 2022, finally get a permanent solution in Aalto firewall.

### KVKL data management

- ENG research group has some very confidential data that is manually preprocessed
- We help to automate this processing

### Custom courses: Julia

- MS department group requested a course on Julia
- RSEs found existing CSC course, requested it to be open-licensed, then presented the course internally.

### Code efficiency improvements

- Getting existing, working code to run more efficiently, either by moving to cluster usage or by improving the code

### Some patterns in RSE projects

- Few that are creating large software from scratch
- Many in using tools and best practices in existing projects
- Many about data collection
- Optimizing or improving existing code
- Releasing and reusability.

### Most common garage support cases

- Can't install software
- Can't transfer data
- Unix problems using our tools

## Relationships

- ASC and RSEs?
  - RSEs are a subteam within ASC: for all practical purposes, the same team
- ASC and ITS
  - ASC isn't U-level. ITS provides many tools we use. We repackage and support others using ITS tools.
- ASC and RES/IES
  - We work closely (most is centralized to Data Agents and related work now)
- ASC and other schools
  - Other schools don't fund support or staff, but sometimes fund our hardware. U-level funds us for all
- ASC and department ITs
  - Overlap

## Open science in ASC

- Most tools developed open by default
- Always prefer to use open tools
- Cross-organization collaboration

## Challenges

- Increased complexity of computing makes a larger gap between best practices and what people do.
- Support of researchers and science outside of SCI
  - Even being able to reach them. And should we reach them?
- Ensuring that researchers know our place and value.

## Future

- Where do **you** think we should focus?
  - Reaching out to new departments and groups outside SCI, I believe you're helpful there
  - Stay flexible, stay available
- How can we (both Data Agents and RSE) manage to become the "go-to" point for any questions concerning our field? Essentially: How do we get more visibility for our services?
  - Get local contact points: people in departments and research groups who support their own groups and use you as the expert resource, transferring difficult issues to RSEs
  - It would be great if essentially any new researcher gets an "If you have difficult questions with regard to this/that, go there".