# Summary of ENCCS 2023 Training hackathon

Full notes from the hackathon: https://hackmd.io/@enccs/training-hackathon-2023

## Topics

1. [MOOC](#1.-MOOC)
2. [RISE Software Bootcamp](#2.-RISE-Software-Bootcamp)
3. [HackMD alternatives](#3.-HackMD-alternatives)
4. [Event pages](#4.-Event-pages)
5. [Lessons](#5.-Lessons)
6. [Registration questions](#6.-Registration-questions)
7. [Learning personas](#7.-Learning-personas)
8. [LUMI Intro lesson](#8.-LUMI-Intro-lesson)
9. [New lessons](#9.-New-lessons)
10. [Instructor training](#10.-Instructor-training)
11. [CoE training collaborations](#11.-CoE-training-collaborations)
12. [Publish lessons on Zenodo](#12.-Publish-lessons-on-Zenodo)

---

## Priorities

Each topic (or subtopic) gets a priority factor:

- 1: highest priority, urgent
- 2: less urgent but high priority
- 3: important but can wait
- 4: remains to be seen when we have time for it

---

## 1. MOOC

**Priority:** 2

**Goal:** We develop a MOOC based on all our GPU programming material. This will include:

- choose a MOOC platform, e.g. Moodle or Thinglink
- collaborate with colleagues at CSC, especially Tiina Leiponen
- record 10-minute videos covering portions of the material
- make the corresponding portions available in a relevant format inside the platform
- develop challenges that learners need to solve to proceed to the next section
- design some sort of final examination problem

**Execution:**

- The MOOC can be open for 3 months or so
- Learners register and study at their own pace
- We have hands-on support or Q&A sessions at regular intervals
- We use a chat inside the platform for Q&A

**Challenges:**

- GPU programming requires access to a GPU!
  - Using an ENCCS training project on e.g. LUMI or Leonardo might be risky: participants could misuse the resources, deplete the quota, mine bitcoin, etc.
  - Is there a mechanism in SLURM or otherwise to limit the amount of resources available to each user in a project?
  - Consider using a cloud interface, e.g. Meluxina-cloud
  - Alternative: use ICE?
  - Alternative: LUMI has an OpenShift/Kubernetes container cloud platform for running microservices. Use Kubernetes to manage JupyterHub for single-user notebooks connected to resources?
  - Alternative: use Google Cloud, or another cloud provider where we pay for single-user access to 1 GPU for each participant?

**Todo:**

- Figure out GPU access for ~100 MOOC participants
- Decide on platform
- Contact Tiina and start collaboration
- Attend other MOOCs ourselves to get inspiration and see what works well
- Start designing material based on existing lessons

---

## 2. RISE Software Bootcamp

Takes place Nov 6-10

**Priority:** 1

**Background:**

- The RISE HR department is funded to develop a software bootcamp to onboard new staff and develop the competences of experienced staff
- ENCCS was asked to develop this concept and deliver a full-week bootcamp
- If we do this well, it can become a "product" that we reuse in other contexts
- Will take place Nov 6-10, full week, full days
- Tentative event description and agenda: https://hackmd.io/MP4uinHaQWWkSgVTwa2OFQ
- Expected participant background:
  - data scientists, ML engineers, data analysts
  - typically PhDs from natural sciences, environmental scientists

**Challenges:**

- How deep should the material go?
- How do we cater to participants with varying backgrounds?
- Which materials do we rely on, and do we need to develop something new?
- Should we classify topics into different levels, 1-5?

**Todo:**

- think about use cases and learner personas, and backward-design the curriculum from them
- decide prerequisites
- decide topics
  - Git (most important): levels 2-3. Intro and collaborative Git from CodeRefinery
  - Unix shell: level 1
    - standard intro, file system, files and folders, basic commands, editors
    - good to also include ssh, and the basics of the internet (IP addresses, ports, cloud, remote servers)
    - derive from the Software Carpentry lesson
  - Programming in Python (levels 1-3): require some homework ahead of the bootcamp so we don't have to start from scratch? Focus on analysing data.
    - Carpentry lessons
    - https://aaltoscicomp.github.io/python-for-scicomp/
    - level of material could be adapted to the audience
    - CodeRefinery lessons: https://coderefinery.org/lessons/from-coderefinery/
    - another very relevant resource: https://carpentries-incubator.github.io/python-intermediate-development/
- decide in-person / remote ratio
- access to ICE
  - we could use ICE resources (e.g. Jupyter notebooks) during the bootcamp
  - this also serves as internal outreach for ICE
  - learning experience: accessing remote resources, Jupyter
  - need detailed installation instructions
- book room Knuth in Kista, but which days?

## 3. HackMD alternatives

**Priority:** 4

**Notes:**

- HackMD drawbacks: not possible to create private notes under @ENCCS; sometimes glitchy
- HedgeDoc hosted on ICE has its own drawbacks: it needs to be maintained and will ultimately cost money
- personal HackMD notes (as opposed to organisational ones under @ENCCS) can be made private and manually shared with selected registered users
- for now, we continue using HackMD

## 4. Event pages

**Priority:** 2

**Background:**

- our event pages should be better and more complete
- intended learning outcomes (ILOs) are missing, as is information that helps readers figure out whether the training is for them

**Todo:**

- add intended learning outcomes
  - can copy from the training material if ILOs are present there
  - if not, write ILOs for the lesson and reuse them on the event page
- who is the course for? People reading the course description should be able to tell if it's something for them
  - we want the right people to attend the right training
  - we don't want to scare people away, and we don't want over-qualified people
  - describe what novices get out of the training, what intermediate practitioners get out of it, etc.
- add a link to the training material for workshops whenever possible

## 5. Lessons

Refer to the [hackathon HackMD](https://hackmd.io/@enccs/training-hackathon-2023) for lesson notes.

## 6. Registration questions

**Priority:** 3

- we should ask more questions to find out who is attending our workshops
- relate the questions to learner personas that we design in advance
- possible questions to add:
  - why attend the workshop?
    - why are you attending this workshop?
    - what do you expect to be able to do after the workshop?
    - how do you think this training will help you in your project?
    - could be a multiple-choice question, with an extra free-text field
  - question to find out the type of person
    - learner persona
    - out of these four "learner personas", who are you?
  - why is it important to you to take this workshop?
- is there any research out there on what questions to ask to characterise learners?

## 7. Learning personas

**Priority:** 3

### Personas for HPC training

We asked GPT-3.5 to generate some personas that could be used to design our training material. Not all personas will be relevant for all courses. This is just a starting point, and we should rewrite and refine them based on our experiences from previous courses.

> These personas showcase the diverse range of individuals involved in education related to High Performance Computing, each with unique motivations, goals, and backgrounds. Designing educational programs and resources that cater to these personas can help foster a well-rounded HPC learning ecosystem.

#### The Ambitious Student Researcher:

This persona represents a driven undergraduate or graduate student who is passionate about pushing the boundaries of computational science and technology. They have a strong background in computer science, engineering, or a related field. They are eager to learn about the latest advancements in HPC, parallel programming, and optimizing algorithms. They often participate in HPC-related research projects, attend workshops, and engage with the HPC community. They are proactive in seeking out mentors and networking opportunities to enhance their knowledge and skills. This persona is hungry to make a significant contribution to the world of high performance computing and may go on to pursue a career in academia, industry research, or technology leadership.

#### The Industry Professional Upgrader:

This persona represents a mid-career professional working in industries like aerospace, finance, energy, or healthcare, where HPC plays a crucial role in solving complex problems. They recognize the need to enhance their skills to stay competitive in their field. This persona seeks out continuing education programs, online courses, and certifications related to HPC. They want to learn about parallel computing, GPU programming, and optimizing code for better performance. They might attend industry conferences, webinars, and networking events to connect with experts and peers. The Industry Professional Upgrader is motivated by the prospect of applying HPC techniques to streamline processes, improve efficiency, and drive innovation within their organization.

#### The Novice Researcher Exploring HPC:

This persona represents a researcher from a non-computational background, such as a biologist, social scientist, or humanities scholar, who recognizes the potential of High Performance Computing to enhance their research but has limited experience with coding and computers. They are curious and open to learning, but they might find the technical aspects of HPC overwhelming. This persona seeks accessible and beginner-friendly resources to understand fundamental HPC concepts, terminology, and tools. They might enroll in entry-level online courses or workshops specifically tailored for beginners in HPC. The Novice Researcher Exploring HPC is interested in understanding how HPC can accelerate data analysis, simulations, or modeling in their domain, and they hope to collaborate with computational experts to bridge the gap between their research expertise and HPC capabilities.

> This persona highlights the importance of creating resources and educational materials that cater to individuals with varying levels of technical proficiency. By providing accessible pathways for novices to engage with HPC, the research community can expand its reach and foster interdisciplinary collaborations.

#### The Industry Innovator:

This persona represents an experienced professional working in a specialized industry, such as automotive engineering, where a specific project could greatly benefit from High Performance Computing. They have a clear project goal that involves complex simulations, such as crash testing or aerodynamic analysis, which requires immense computational power and precision. The Industry Innovator is aware that leveraging HPC can significantly speed up the simulation process, leading to quicker product development cycles and competitive advantage. However, they might lack the in-depth technical knowledge of HPC implementation. This persona seeks tailored consulting services or collaboration with HPC experts who can guide them through setting up and executing their simulations efficiently. They are results-driven and eager to see how HPC can transform their project, making it more accurate and cost-effective.

> The Industry Innovator persona underscores the importance of connecting domain experts with HPC professionals who can translate industry-specific challenges into effective computational solutions. This collaboration can lead to groundbreaking advancements in sectors that heavily rely on precise simulations and data analysis.
#### Refining personas

These can be fleshed out further, as in this example for the Ambitious Student Researcher:

##### Background and Characteristics:

The Ambitious Student Researcher is an undergraduate or graduate student with a strong background in computer science, engineering, or a related field. They exhibit a deep fascination with the field of High Performance Computing (HPC) and are motivated by the prospect of exploring the limits of computational science. This persona possesses a solid foundation in programming languages, algorithms, and computer architecture, which allows them to engage with complex HPC concepts.

##### Passions and Goals:

This persona is driven by an insatiable curiosity to understand and apply HPC techniques to solve intricate problems. They view HPC as a means to accelerate scientific discoveries, tackle real-world challenges, and drive innovation across industries. Their primary goal is to contribute meaningfully to the realm of HPC, be it through academic research, industry collaboration, or developing cutting-edge software tools.

##### Activities:

- Actively participates in HPC-related research projects, often collaborating with professors, mentors, and peers to explore advanced topics like parallel computing, distributed systems, and GPU programming.
- Attends workshops, conferences, and seminars related to HPC to stay up-to-date with the latest advancements, tools, and techniques in the field.
- Engages with the broader HPC community through online forums, social media, and open-source projects to share knowledge, seek advice, and collaborate on shared interests.
- Undertakes personal projects, such as optimizing algorithms for parallel processing or experimenting with new architectures to enhance their practical skills.
- Seeks out mentors, both within academia and industry, to gain insights and guidance on building a successful career in HPC.
##### Skills and Aspirations:

This persona is proficient in programming languages such as C/C++, Python, and perhaps CUDA or OpenCL for GPU programming. They have a knack for breaking down complex problems into manageable components and devising optimized solutions. The Ambitious Student Researcher aspires to publish research papers in top conferences and journals, contribute to open-source HPC projects, and eventually pursue a career in academia, industry research, or technology leadership roles.

##### Challenges and Growth:

While their technical skills are strong, the Ambitious Student Researcher may face challenges in balancing coursework, research commitments, and personal projects. Time management and maintaining a work-life balance are areas where they can continue to develop. Additionally, networking and communication skills are crucial for this persona, as collaborations and connections within the HPC community play a pivotal role in their growth.

Overall, the Ambitious Student Researcher persona embodies a passion for pushing the boundaries of technology, a dedication to continuous learning, and a desire to shape the future of High Performance Computing.

## 8. LUMI Intro lesson

**Priority:** 1

**Background:**

- For some time there's been a discussion within EuroCC about developing "canonical" HPC intro material
- Now, a collaboration between NCC-Finland, CSC, ENCCS, NCC-Czechia, NCC-Poland and the LUST team will develop this material
  - geared specifically towards LUMI but general enough for other EuroHPC systems
- The material should be suitable for self-learning and should contain videos
- It will be based on HPC Carpentry material: https://carpentries-incubator.github.io/hpc-intro/

**Todo:**

- participate in meetings with the LUMI collaboration
- develop material at https://lumi-supercomputer.github.io/lumi-self-learning/
  - first task: finish porting from Carpentry markdown to sphinx-lesson markdown
  - after that: adapt and improve the material
- record videos that will be embedded within the lesson material, inside each episode
- perhaps give a massive online workshop at some point

## 9. New lessons

### Deep learning intro:

- **Priority:** 3
- Carpentry incubator lesson: https://carpentries-incubator.github.io/deep-learning-intro/
- Thor and Hossein taught it once and it went well
- does this lesson fit with the ENCCS roadmap?
- we could use it for outreach: attract new participants thanks to the AI hype, and use the opportunity to tell them about HPC and ENCCS
- Martin to consider adopting and teaching it

### C++ lesson

- **Priority:** 3
- possible people involved:
  - Yonglei Wang
  - Jonas Lindemann
  - Sandipan Mohanty
  - Johan Kristiansson
- existing material:
  - https://www.fz-juelich.de/en/ias/jsc/news/events/training-courses/2023/hpc-cplusplus
    - Sandipan was going to look into open-sourcing his training material
    - Thor to connect Yonglei and Sandipan for future collaboration
  - https://hpc-portal.eu/node/1798
  - https://hpc-portal.eu/node/1792
  - https://hpc-portal.eu/node/1555
- Johan can contribute:
  - Eigen
  - Python bindings with pybind11
  - gtest
- we can't really prioritise this project right now because of everything else
  - ask Sandipan to give a first workshop with us
  - consider adopting and building on Sandipan's material if he open-sources it

### Performance programming:

**Priority:** 1

- Developers: Karl-Filip; also include Wei Z
- todo:
  - Karl-Filip and Wei Z meet and plan
  - decide on structure, add material, add exercises
    - reference book (Hager & Wellein, *Introduction to High Performance Computing for Scientists and Engineers*):
      - https://www.adlibris.com/se/bok/introduction-to-high-performance-computing-for-scientists-and-engineers-9781439811924
      - https://www.amazon.se/Intro-High-Performance-Computing-Hager/dp/0367221306/ref=monarch_sidesheet
  - set date for workshop
    - week 46, November 13-17
  - consider memory aspects
  - timing, performance measuring, overhead, instrumentation
  - dynamic memory aspects?

## 10. Instructor training

**Priority:** 2

All ENCCS staff who teach or develop training material should attend the [instructor training workshop](https://enccs.se/events/best-practices-in-online-and-in-person-training/) and/or go through the [instructor training material](https://enccs.github.io/instructor-training/).

## 11. CoE training collaborations

**Priority:** 3

**Background:**

- We have worked with many CoEs in the past on training workshops and material
- All collaborations have been successful, and many CoEs are interested in repeating them
- But we don't have enough resources or time to work with them all

**Todo:**

- first priority: focus on new CoEs
  - CEEC CoE (exascale CFD)
  - Plasma-PEPSC (plasma simulations)
  - both coordinated from KTH
  - write to Niclas and Stefano and get the ball rolling
- second priority: work with other CoEs that want to work with us, when the topic is relevant to our target groups
- idea: "CoE month" or "CoE spring" next year
  - for ~1-2 months, we organise multiple CoE workshops
  - involve other NCCs in this?
  - use EuroHPC
  - coordinate with CASTIEL

## 12. Publish lessons on Zenodo

**Priority:** 3

**Background:**

- purpose: give authors credit, adhere to FAIR principles, make training material citable
- CodeRefinery is currently doing this for their lessons: https://github.com/coderefinery/documentation/pull/270

**Todo:**

- compile an authors' list into a CONTRIBUTORS file, based on GitHub contribution stats
- start with lessons that are fully mature: connect them to Zenodo and make GitHub releases
- obtain DOIs and perhaps list them somewhere; report them as EuroCC deliverables
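The CONTRIBUTORS step could be partly automated. Below is a minimal sketch, assuming that commit counts from `git shortlog` are an acceptable proxy for "GitHub contribution stats". They are not the same thing: issues, reviews and other non-commit contributions are ignored, and an author who committed under several names or emails appears more than once unless the repository has a `.mailmap`. The helper names `read_shortlog`, `parse_shortlog`, `format_contributors` and `write_contributors` are hypothetical, not part of any existing ENCCS or CodeRefinery tooling.

```python
"""Hypothetical helper: build a CONTRIBUTORS file from `git shortlog` output.

Assumption: commit counts stand in for GitHub contribution stats.
"""
import subprocess
from pathlib import Path


def read_shortlog(repo: str = ".") -> str:
    """Return raw `git shortlog -s -n -e HEAD` output for a repository.

    HEAD is passed explicitly because shortlog reads stdin otherwise
    when it is not attached to a terminal.
    """
    return subprocess.run(
        ["git", "-C", repo, "shortlog", "-s", "-n", "-e", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout


def parse_shortlog(text: str) -> list[tuple[int, str]]:
    """Parse lines like '    42\tJane Doe <jane@example.org>' into (count, author)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        count, _, author = line.partition("\t")
        entries.append((int(count), author.strip()))
    # Most commits first; ties are broken alphabetically by author.
    entries.sort(key=lambda entry: (-entry[0], entry[1]))
    return entries


def format_contributors(entries: list[tuple[int, str]]) -> str:
    """One author per line, in the order given."""
    return "".join(f"{author}\n" for _, author in entries)


def write_contributors(repo: str = ".", filename: str = "CONTRIBUTORS") -> None:
    """Write the CONTRIBUTORS file at the root of `repo`."""
    entries = parse_shortlog(read_shortlog(repo))
    (Path(repo) / filename).write_text(format_contributors(entries), encoding="utf-8")
```

For example, calling `write_contributors(".")` from the root of a lesson repository writes the file; the result should still be reviewed by hand before tagging the GitHub release that Zenodo archives.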