# Responsible use of generative AI in assisted coding
Lecture development // planning document
**Agenda and notes**
Meeting 1 (8/12/2025)
- What is this working group about? (Enrico)
- Lesson on "hands-on AI" for CodeRefinery workshop -> connected to other topics we teach in the workshop
- Intro round - what would you like to do here and what can you do timewise in next few months?
- Enrico, Aalto University, AI + research integrity/ethics is my thing
- Bjørn Lindi, NTNU, AI + HPC projects
- Richard Darst, Aalto Univ,
- Thomas Pfau, Aalto RSE, generative AI API expert
- Pedro Silva, Aalto Univ, AI in research (co-teaching with Enrico at Aalto)
- Johan Hellsvik, KTH, ML and synthetic data in materials science
- Bahareh Tasdighi, Århus University, reinforcement learning
- Ina Pohner, UEF, AI in research, risks of generated code on HPCs
- Ebba Þora Hvannberg, Univ of Iceland, interests in IDEs + gen AI
- Ashwin Mohanan, Mimer AI factory, critical user of genAI coding (MCP, more predictable)
- Yonglei Wang, ENCCS, Mimer AI factory
- Hemanadhan Myneni, Univ of Iceland,
- Dhanya Pushpadas, Sigma2
- Ideal deadline for lesson ready: March workshop
- What already exists and what is missing (Enrico)
- Planning
- start can happen asynchronously over December
- From plans to action: we meet again in mid-January
- Actual work (Jan/Feb)
- Open questions:
- Do we write materials for a longer workshop or do we keep it short and sweet?
- Do we keep it super practical only?
**Notes**
- Enrico described the 3 scenarios: 1. manual; 2. IDE + extension; 3. full agentic workflow
- The scenario "in the middle" (IDE + AI tool) gets more useful results out of the model and benefits from having the project context available. Another +1 for the IDE + AI tool. Important to understand what goes on under the hood (what is going to be uploaded to some remote cloud system).
- We can start small with the practical goal of the March CodeRefinery workshop and focus on the IDE + AI tool
- Teaching more ethical tools vs. the most popular in the industry (Microsoft/Google, etc.)
- Local LLM with IDE can also be an option
- E.g. Codex + VS Code allows quite good integration, where you can change the AI backend to e.g. APIs provided by your university and only use the "prompts"/agent instructions provided in the framework.
- Agentic workflows, although more "risky", can add big value (when is it good to automate more, e.g. CI/CD)
- If we use the "fix/update CI" suggestion, we need to make sure that we also teach where the risks in these things are. +1
- Local LLMs and their risks are good to mention
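The backend-swap point above (pointing an IDE extension at a university-provided API or a local LLM instead of a commercial cloud) can be sketched: for OpenAI-compatible servers, switching backends usually means changing only the base URL and model name, and inspecting the request payload shows exactly what gets uploaded. A minimal sketch in Python; the endpoint, model name, and key below are hypothetical placeholders:

```python
import json
import urllib.request

# Hypothetical university-hosted endpoint; any OpenAI-compatible server
# (commercial cloud, university API, or local LLM) works the same way.
BASE_URL = "https://llm.example-university.edu/v1"


def build_chat_request(prompt, model="local-code-model"):
    """Build the JSON payload an IDE extension would send to its AI backend.

    Swapping the backend typically changes only BASE_URL and the model
    name; the payload format stays the same for OpenAI-compatible servers.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a coding assistant."},
            {"role": "user", "content": prompt},
        ],
    }


def send_chat_request(payload, api_key="YOUR_KEY"):
    # This is what actually leaves your machine: inspecting the payload
    # shows which code and context get uploaded to the remote system.
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    return urllib.request.urlopen(req)  # network call, not executed here
```

Printing or logging the payload before sending it is one concrete way to teach "what goes under the hood" in the IDE + AI tool scenario.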
---
# 1. Project description and milestones
**Who is involved?**
- Enrico Glerean (project manager, basics + ethics + security aspects)
- Bjørn Lindi, NTNU (potato)
- Add your names
- ..
- …
- …
**Planning and preparation (all)**
==PART 1, to be done by mid January:==
- Task 1: Read and expand the literature (see below)
- Task 2: Create a few user stories using the Bloom taxonomy (Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation), e.g. "I want to know what other researchers use to code with AI", "I want to understand what happens when I type a prompt", "I want to apply this to my project", ...
PART 2
- Synthesise user stories into max 5 learning objectives
- Define lecture format and structure
- Outline lecture materials detailed skeleton
**Developing (subgroups)**
- For each section defined in the skeleton, add learning materials
- Revise
---
# 2. Background literature and existing materials (annotated list)
1. Enrico+Pedro's slides on the training we give to Aalto researchers: old version [https://zenodo.org/records/14032261](https://zenodo.org/records/14032261) new version being archived soon (we are recording the youtube videos) [https://drive.google.com/file/d/1\_gK7cnsZVqgP7\_8KEdu8zpmq3ZaA1PLQ/view](https://drive.google.com/file/d/1_gK7cnsZVqgP7_8KEdu8zpmq3ZaA1PLQ/view)
1. There are some "simple rules" for AI and coding, but it is very limited on those aspects.
2. [https://carpentries-incubator.github.io/gen-ai-coding/](https://carpentries-incubator.github.io/gen-ai-coding/)
1. Outdated since Codeium no longer exists under that name (but it should work with the new Windsurf). Good background materials on the options for AI assistants; the ethical part is also well thought out.
3. [https://github.com/dlab-berkeley/AI-Assisted-Coding-In-R](https://github.com/dlab-berkeley/AI-Assisted-Coding-In-R)
1. Berkeley, R. Focuses only on the IDE. No ethics, security, or other risks covered.
4. [https://zenodo.org/records/17093197](https://zenodo.org/records/17093197)
1. Too generic and basic. There are some good slides on prompt engineering, but not specific enough for the main lecture (more like an "info box").
5. San Diego Supercomputer Center (SDSC)
1. [https://www.sdsc.edu/education/on-demand-learning/index.html](https://www.sdsc.edu/education/on-demand-learning/index.html)
2. [https://www.youtube.com/watch?v=USk\_4Ckl05A](https://www.youtube.com/watch?v=USk_4Ckl05A)
3. [https://github.com/nrp-nautilus/fall25training](https://github.com/nrp-nautilus/fall25training)
6. [https://www.cisl.ucar.edu/events/genai-series-how-use-github-copilot-ncar-supercomputers-derechocasper](https://www.cisl.ucar.edu/events/genai-series-how-use-github-copilot-ncar-supercomputers-derechocasper)
1. Oldish course by the NCAR supercomputing centre; not reviewed yet.
7. [https://erashuttle.eu/ai-workshop-notes/](https://erashuttle.eu/ai-workshop-notes/)
1. Notes from a workshop, need to check deeper if there are aspects not covered
8. [https://www.researchgate.net/publication/397584324\_Vibe\_Coding\_For\_AI-Driven\_Development\_An\_Introduction\_for\_Research\_and\_Learning](https://www.researchgate.net/publication/397584324_Vibe_Coding_For_AI-Driven_Development_An_Introduction_for_Research_and_Learning)
1. Have not read, not peer reviewed
9. [https://advait.org/files/sarkar\_2025\_vibe\_coding.pdf](https://advait.org/files/sarkar_2025_vibe_coding.pdf)
1. Have not read, not peer reviewed
10. [https://talmo.uk/2025/slides/evkaya.pdf](https://talmo.uk/2025/slides/evkaya.pdf)
1. Not hands-on, but good background points that are missing in other slides/courses.
11. [https://ideas-productivity.org/events/hpcbp-092-genai-coding](https://ideas-productivity.org/events/hpcbp-092-genai-coding)
1. A talk on the topic of AI for coding scientific software. I did not watch it so will report later.
12. [https://oerc.ox.ac.uk/ai-centre/ai-guides/getting-started-with-ai-for-coding](https://oerc.ox.ac.uk/ai-centre/ai-guides/getting-started-with-ai-for-coding)
1. Generic page but still good collection of links (related [https://oerc.ox.ac.uk/ai-centre/ai-guides/getting-started-with-ai-for-researchers](https://oerc.ox.ac.uk/ai-centre/ai-guides/getting-started-with-ai-for-researchers) )
13. An article about the lack of precision in AI-generated text, comparing AI-generated text with students' writing (in Norwegian):
https://www.universitetsavisa.no/forskning-forskningno-kunstig-intelligens/jeg-tror-vi-ma-tilbake-til-penn-og-papir/445585
14. Some thoughts from the life-science community in the UK that could be useful: https://elixiruknode.org/articles/2025/community-insights-on-ai-all-hands-2025/
15. [JupyterCon 2025: Real-time Collaboration Is Not Just for Humans Anymore - Zach Sailer & Abigayle Mercer](https://www.youtube.com/watch?v=l5DhqN3su0o)
- Chat fatigue, and need to reprompt
- Jupyter-AI and https://github.com/jupyter-ai-contrib/jupyter-ai-jupyternaut demo
- "Agents": CodeCellEditor, BugFixer
Enrico: I have more links saved, will finish pasting later.
---
# 3. Blog post: Responsible use of generative AI in assisted coding (potential blog post for CodeRefinery)
In the context of generative artificial intelligence used for text-to-text synthesis, code generation from natural-language queries has advanced to the point where it is possible to *vibe-code* a full software application without necessarily knowing how to code, or how the underlying infrastructure works. In academic research, both beginners and more experienced coders have benefited from such tools, especially given the lack of software-engineering background in most fields of scientific research. Researchers have been empowered by these tools (citation needed) and can now write working data-analysis pipelines with code generated by AI assistants based on large language models fine-tuned for code generation.
However, the use of such technology does not come without risk. AI-assisted coding is not simply the generation of code from a given *prompt*, as in the interaction with a chatbot: the generated code becomes executable on computer systems (from a researcher's own computer to large shared HPC systems), with potential safety and security risks, especially if a less experienced researcher is not able to verify each step that the code executes. It is also becoming common to grant more autonomy to external AI tools, so that, within the IDE interface, automated *AI agents* can perform code edits, git commits, pull requests, dependency installation, system-configuration changes, and all sorts of software-engineering tasks, without necessarily being supervised by a human.
More broadly, there are also concerns about the sustainability of such AI tools, ethical considerations in how they were created, and possible risks for end users, as these tools can erode critical thinking and basic coding skills.
In the new proposed lesson *Responsible use of generative AI in assisted coding* we want to make sure that our workshop participants will:
- Understand conceptually how text-to-code AI systems work, and where risks can emerge.
- Learn AI-assisted coding with increasing automation delegated to the external AI tools:
  1. manual coding with AI chatbots
  2. integrated development with an AI tool in the IDE
  3. full agentic workflows
- Explore various prompting techniques, IDE workflows, and external tools (e.g. Google AI Studio)
- Self-evaluate when it is OK to use AI, and when it is better to consider more *old-school* approaches; make informed decisions if their own ethical principles do not match those of the AI tool providers.
*The blog post can start by going into the aspects of tools and risks, and for example explore the data used for fine-tuning open LLMs for code.*
---