![all-hands-meeting](https://hackmd.io/_uploads/rJmVuZrTa.jpg)

<p style="text-align: center"><b><font size=5 color=blueyellow>ENCCS All-Hands Meeting - Training Session (250131)</font></b></p>

**Contents of this document and quick links**:

[TOC]

## <span style="background-color: cyan">1. Python HPDA retrospective</span>

### 1.1 Reflections from participants

==the second episode (efficient array computing) was quite packed==
- keep the current format
    - numpy: teaching 20 min + exercises 10 min
    - pandas + scipy: teaching 20 min + exercises 10 min
- we may consider separating numpy and pandas/scipy into two episodes
    - numpy: teaching 30-35 min + exercises 15-20 min
    - pandas + scipy: teaching 25-30 min + exercises 15-20 min

==May be more useful- If we can have more practical examples comparing the performance of python with Fortran/c (in terms of speed)==
- a new episode or some materials/exercises for this topic?

==more description for the parallel computing episode==
- some terms and concepts were rather new to beginners; we should probably give a more detailed introduction, e.g. to threads and processes
- exercises will be further improved

==it would then be very helpful if you provided a few real-world examples as extra material==
- maybe we can consider using real-world examples instead of generated data

### 1.2 Survey results (including those from previous workshops on 2022-05-18 and 2023-09-05)

==What did you like best regarding event organization? Where should we improve?==
- Having it over multiple days and only for half a day helped to have enough time to go through the content smoothly and also work on it individually in the evenings, before the next session.
- The topic in this course should be split into two or more courses. Maybe Numba should be a course by itself.
- it would be helpful to have a list of all modules, libraries, etc, that might be needed during workshop. This way those who do not use your package/LUMI but rather stick with the preferred one at local computer do not lose time installing things during sessions.
- maybe you should include the GPU programming as this is very important
- I like the format with two sessions in a day.

==What did you like best about the lesson material, exercises and teaching? Where should we improve?==
- The exercises were well structured and organized. Having the workshop over multiple days also helped to cover content without hurrying through it.
- It would be great to have even more real life examples.
- I think you should explicitly ask participants to read the material before each day.
    - I will address the lesson materials in the welcome email
- The course material was best. It would be perhaps a good improvement to include a base of a shortcut tips and tricks as a summary: what to use when you're not sure how to approach problem.
- Some of the material was difficult to understand as someone without a computer science/engineering background. Maybe add some extra boxes explaining some of the terms and concepts? Several exercises were too difficult to practice the actual concept.
- ==I liked the "slower" session, when we had time to do the exercises during the session.==

==Which topics would you be most interested in learning about in future training events?==
- 220518-Python-HPDA
    - 1. Cython; 2. Advanced scientific data analysis with python; 3. How to write a small but well organised scientific application in python/C/C++;
    - **Machine learning** - sound/language processing - generative? (something like deep fakes)
    - Tips and tricks for effective application of taught libraries, rather than general introduction and comparison
    - **ML/AI**, large scale data storage/databases. cassandra or such DBs, distributed storage (Lustre, S3 such), performance optimization, how to use them from python with performance optimization.
    - **Custom deep learning things**, like multiple inputs and outputs and input of varying sizes
    - GPU parallelization, more on optimising python code / using numba or cython. Even on using other languages like Fortran or Julia.
    - High computation performance for specific fields: **Machine learning and computer vision**
    - I would like to deepen my knowledge in numba. Maybe some course in advanced visualization techniques.
    - **Machine learning for geospatial data**
- 230905-Python-HPDA
    - Looking forward to Julia courses
    - Access to cluster and practice jobs in python in a supercomputer cluster is mandatory....
    - **Artificial Neural Network (ANN)**
    - Python HPDA on different platforms specificities
- 250121-Python-HPDA
    - **gpu programming, machine learning**
    - **Would actually appreciate one more day where you talk about GPU**
    - A use case I don't see covered frequently is profiling and parallelising when using python packages such as **interpretml**. What should one do if they can't change the packages easily but they need to speed up their code?
    - I would suggest extending course content a little bit but I feel that most of the topics related to computing were covered

==Any other comments?==
- **Breakout rooms** are always a challenge in this type of training sessions. Many people don't like to talk a lot or collaborate with strangers, and that is understandable. Maybe participants should be asked to raise their hands if they want to be sent to a breakout room to work together with other people and the rest should stay quiet in the main room.
### 1.3 Personal opinions for teaching and organizing events

- it is better to hold workshops in the morning (9:00-12:00 or 9:30-12:30)
- YL will send the lesson materials in the welcome email so that participants can skim them before the workshop and see which topics will be covered each day
- balance teaching and exercises; avoid leaving all exercises to the end (for long episodes)
    - for a 50-minute session (XX:00-XX:50)
        - 1 round of teaching and exercises (maximum 2 rounds)
        - the lecture can be 25-30 min and the exercises 20-25 min
    - for an 80-minute session (XX:00-XX+1:20)
        - there can be 2 lecture sessions and 2 exercise sessions
        - 1st round: lecture 20-25 min, then exercises 15-20 min
        - 2nd round: similar arrangement, depending on the teaching contents
- the instructor for each episode (except for the 1st episode) should say a few words as a general description of the current episode +1 Francesco
    - correlations between episodes
- a short recap of each episode when it ends, maybe some description of the exercises (Ashwin) +1 Francesco
- at least two people for each episode, one as instructor and the other as support/helper +1 Francesco

### 1.4 Expansion of Python workshop

Expansion to three workshops: https://hackmd.io/@yonglei/python-workshops
- 1. Python HPDA
    - If we split them, we could also cover more things about big data storage and retrieval (S3, databases, tips on parallel file systems...) (Francesco)
- 2. Python HPC
    - it would be beneficial to have a comparison of all parallelization methods and their best use cases as a summary
        - like [an overview of common data formats](https://enccs.github.io/hpda-python/scientific-data/#an-overview-of-common-data-formats)
    - Someone in the feedback mentioned having a KB of tips and tricks, sounds interesting! (Francesco)
- 3. Python ML/DL

**Publication for the lesson material**
- Zenodo
- JOSE

## <span style="background-color: lime">2. Arrangement for workshops and webinars</span>

<iframe src="https://calendar.google.com/calendar/embed?height=500&wkst=1&ctz=Europe%2FBerlin&src=NWQ5NWNiNWI4ZWQ1ZDhmZjBkNDliNDVlMjIyNDQ3ZTQ2MjAxMDY2NDZmYTMxZjhjY2VkMjRhZWVmZGRlMjZkZUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t&color=%23F6BF26" style="border:solid 1px #777" width="800" height="500" frameborder="0" scrolling="no"></iframe>

- Feb. 04-07, Julia High-Performance Data Analytics
- Mar. 03-07, MultiGPU Train-the-Trainers Course
    - Yonglei and Ashwin will teach basics of deep learning (~ 3h)
    - [intro to deep learning](https://enccs.github.io/deep-learning-intro/)
    - [schedule](https://docs.google.com/document/d/1ztkd5I2k40QetHLwKdnOw4d6Ub_BsrR2epV2dt0wV3E/edit?tab=t.0)
- Mar. 12, Training Hackathon
- Mar. 18-20, EuroHPC Summit
- Mar. 25/27, Practical Intro to Machine Learning
- Apr. 08-09, NVIDIA N-ways Bootcamp
- Apr. 29, Practical Intro to GPU Programming with CUDA
- Apr. 30, ENCCS Industry Days
- May 12-16, Introduction to Deep Learning
- May 27-28 (Jun. 3-4), NVIDIA AI for Science Bootcamp
    - there will be a hackathon in Sept./Oct.
    - can we attend this event as participants(?)
- Jun. 17-18, NVIDIA MultiGPU Bootcamp
- Jul. 09-10, NVIDIA AI Multinode Profiling Bootcamp

Waiting list:
- workshops
    - [OpenFOAM](https://enccs.github.io/openfoam/) (April?)
        - contact Karim from NCC-France
    - CEEC CoE
        - contact Niklas again
    - ==Week 20, intro to deep learning==
        - contacting NCC Romania for detailed dates
        - basics of deep learning and two/three application cases
    - CR workshop in KTH? (streaming using YouTube?)
    - April, May, potential collaboration with Hyperight
    - A one-day workshop with Frank
        - OpenACC and using a graphical interface to manage input/output from compilation
- webinars
    - ==Week 13, Practical Intro to machine learning (YW)==
    - ==Week 18, Practical Intro to GPU Programming with CUDA (YW)==
    - Practical Intro to GPU Programming with OpenACC (==???==)
    - MoroccoHPC webinars
    - Array computing using Python (???)
    - Robert Luciani -- Julia and HPC, or general AI topics
    - Ashwin – Using MLFlow on LUMI
    - ==Johan – ColonyOS (April)==
        - remind Johan to set a specific date
    - Francesco – Julia and ML
    - Thor – some general EuroHPC
    - Introduction to supercomputing for AI (???)

## <span style="background-color: magenta">3. MOOC project</span>

## <span style="background-color: orange">4. Training hackathon and reorganization of GitHub repositories</span>

[**Working on the GitHub repos**](https://hackmd.io/@yonglei/enccs-github-repos)

:::danger
:::