# NSF Project Report - Cybertraining 2022

# Accomplishments

## What are the major goals of the project?

Our project goal is to effectively and equitably build the computational workforce needed for the next generation of dark matter experiments. We proposed doing this by:

* Working with our individual collaborations and the collaborative team of this proposal to unify training.
* Having experts within our groups develop missing material across a range of expertise levels.
* Holding two workshops per year, geographically distributed and representative, to consistently train new junior scientists so that they can run their own workshops in the future.

## What was accomplished under these goals and objectives?

This year we ran our first in-person workshop, the "DANCE/CoDaS workshop", during the Snowmass community planning event in collaboration with IRIS-HEP. The students who attended came from the community we were targeting. See https://indico.cern.ch/event/1151329/timetable/ for the agenda and the attached documents for the survey and workshop assessment.

Our goals are:

1. Identifying shared educational and training needs in the dark matter community
   - The DANCE/CoDaS workshop brought multiple experimental physics communities together: dark matter, neutrino, astrophysics, and high-energy.
   - The positive student response to the DANCE/CoDaS workshop, run in collaboration with IRIS-HEP, supports the idea that students across the dark matter field need computational skills training and share much of this need with neutrino, astronomy, and high-energy physicists.
3. Broadening Adoption of Advanced CI
   - For training at a level above the summer school, Peibo An and Victoria Pillazio both created instructional materials showcasing ML methods.
   - Chris Tunnell led the DANCE/CoDaS workshop organization, which featured X ML sessions.
   - Chris Tunnell led an open session where students chose their own ML project and asked questions of the ML experts in attendance.
   - Amy Roberts led an open discussion session where students identified their analysis goals and what CI was necessary to get to their final plot.
5. Integration of CI Skills into Curriculum/Instructional Material Fabric:
   - The Zenodo community DANCE-Edu hosts tested training materials (https://zenodo.org/communities/dance-edu/?page=1&size=20).
   - Peibo An's and Olivia Piazza's works are publicly available through the Zenodo link above.
   - Chris Tunnell's tutorial on convolutional neural nets is publicly available on GitHub: https://github.com/DANCEOrg/DANCE-EDU-CNN-Tutorial_March2022

Please see the attached documents for the workshop evaluation.

## What opportunities for training and professional development has the project provided?

Twenty early-career researchers participated in a workshop to learn the basics of Python library use and machine learning. In addition, early-career scientists new to SuperCDMS (five in the last year) have all used the tutorials created previously under this grant.

## Have the results been disseminated to communities of interest? If so, please provide details

The in-person training at Snowmass was advertised to, and open to, anyone in the dark matter and neutrino communities. The material is still online at https://indico.cern.ch/event/1151329.

We hosted a visiting-scientist fellowship for early-career researchers to visit a host institution. The idea is that each fellow learns a skill over the course of 10 days and then passes it on to the community they return to. At this point, we have had two fellows, who gave their own trainings to groups of 10 and 50 people, respectively.

Please see the Changes/Problems section for details on what we plan to do in the upcoming year.

## What do you plan to do during the next reporting period to accomplish the goals?

We plan to run another cross-experiment workshop.

We plan to continue the fellows program, with an emphasis on core knowledge that is necessary for research in the field yet difficult to obtain. Examples are the fundamentals of data acquisition systems and methods like MCMC that are useful for estimating parameter uncertainties.

In addition, we plan to work with QUBES, IRIS-HEP, and the Science Gateways Community Institute to create a website that makes it easy to advertise and find material.

# Impact

## What is the impact on the development of the principal discipline(s) of the project?

The materials produced so far, particularly the limit-setting tutorial, give early-career scientists a friendly entry point into the fundamental knowledge of the field. We have found no other publicly available materials like this, which suggests a substantial amount of "hidden knowledge" that scientists must learn individually, either by struggling with the literature and legacy code bases or through discussion with experts in the field. This often slows down scientists new to the field, places a burden on experts that the small groups in dark matter can rarely afford, and gives nearly every group the role of gatekeeper.

One way we can measure success is to ask: do early-career scientists who are struggling locally to get the information they need know they have another option for learning this material? Can they find what they need?

## What is the impact on other disciplines?

There is overlap between the training materials needed for the dark matter field and those needed by other compute-heavy scientific disciplines, specifically for the software engineering skills that bridge the gap between the basic Software Carpentry introduction and the skills needed by scientists working on experiment code.
Therefore, working on gaps in the training materials for the dark matter community is an opportunity to address the gaps in software engineering training across multiple scientific disciplines. The challenge here is to coordinate with already-existing efforts. So far we have identified IRIS-HEP as another project in this space, and they have been enthusiastic about collaborating.

## What is the impact on the development of human resources?

People participating in workshops gain concrete software engineering and analysis skills. In addition, fellows who have prepared material have learned about online publishing tools like GitHub and Whole Tale.

## What was the impact on teaching and educational experiences?

Making hidden knowledge available has had a positive effect on educational experiences. For example, a high-school student who attends a school with a limited science program recently joined my group. He began by working through the Software Carpentry curriculum, then worked through the materials created by this grant together with the IRIS-HEP software engineering materials. This has given him sufficient background knowledge to begin reading papers and looking through analysis code, making it possible for him to begin working on an active analysis project.

Without these materials he might still have reached this point, but only through many meetings with me (so far he has been too shy to reach out to other collaborators, which is not atypical even for graduate students). It would have taken him much longer and required a significant time investment from me. I asked him to join the group because I knew I had material to give him that would save me time and make it possible for me to mentor him within my scheduling constraints.

## What is the impact on physical resources that form infrastructure?

Nothing to report

## What is the impact on institutional resources that form infrastructure?
Nothing to report

## What is the impact on information resources that form infrastructure?

Nothing to report

## What is the impact on technology transfer?

Nothing to report

## What is the impact on society beyond science and technology?

Nothing to report

## What percentage of the award's budget was spent in a foreign country?

None

# Changes/Problems

## Changes in approach and reasons for change

To better share materials between dark matter collaborations, we plan to create a website based on the initial model of QUBES, a website and organization created to share computational biology training materials. The QUBES website was originally supported by the Science Gateways Community Institute and is particularly easy to maintain as new materials are added. We will reach out to QUBES and also to IRIS-HEP, which hosts educational material for computational high-energy physics, to determine whether we can reuse their website infrastructure for our material.

The materials created by the fellows have been useful for training in my collaboration, and we plan to issue another call for fellows. We will also solicit interest in developing lessons suitable for cross-listing on the IRIS-HEP lesson page, which is gaining visibility. Examples are:

- using MCMC for parameter width estimation with Python libraries like emcee and the BAND framework
- version control skills needed by feature developers, such as pulling frequently from the main branch

Finally, the in-person workshop was very well received, and we plan to run another in-person workshop at the upcoming SuperCDMS collaboration meeting. This workshop will be limited primarily to one collaboration but will give us the opportunity to test existing materials and identify improvements.

## Actual or Anticipated problems or delays and actions or plans to resolve them

The return to travel has been slow. Now that people are gathering again, it is clear that in-person workshops that support remote attendance are an important aspect of the training we need to do.

Unforeseen circumstances have resulted in UofH no longer being able to participate in this project. UofH will not use its Y3 funds for this award (Y3 was the year in which workshop money went to UofH) or any of its other unspent funds. Each institute has two key roles, so the impact on the project has two parts: material development and hosting a workshop in a certain year of the award. UofH will subaward the unused balance to Rice University, or return the money to NSF if this fails.

Material development: Due to personnel turnover, UofH will not be able to develop material in Y3 (and part of Y2) of the award, which reduces the total material development for advanced cybertraining. However, this has had only a small impact on Y2, because partnerships with other entities (CoDaS/FIRST-HEP) and our pay-it-forward fellowship model during the pandemic have resulted in more total material enabled by the project than we expected. This can be seen in our ability to run a multiday workshop with trained analysts. Rice and UDenver will continue with material development as planned in Y3. UofH will subaward its material development budget to Rice, which will use it to expand neutrino-oriented material toward larger neutrino experimental communities such as DUNE.

Workshops: Due to the pandemic, we underspent in Y1: participant support spending on smaller workshops with our pay-it-forward fellowships was naturally lower than it would have been for a large workshop. The UDenver workshop funds are mostly spent and the Rice funds are partially spent.
UofH will send its participant support (PS) funds to Rice to organize workshops toward the end of this award (or will return the money to NSF).

## Changes that have significant impact on expenditures

The pandemic travel restrictions resulted in underspend of participant support. We had planned to rotate payment for workshops among the collaborating institutions in this project: Denver covered the last workshop and Rice will pay for the next one. However, both institutions have also been paying for visiting fellows, and the rate of that spending is limited because each visit requires a lot of individual training.

## Evaluation

Our main goal for the project was hosting advanced cyberinfrastructure training, and our flagship activity of this reporting period was a summer school co-located with Snowmass. For our workshop DANCE/CoDaS@Snowmass in the summer of 2022, we performed an intake and an outtake assessment.

- For the intake assessment, we collected information on the participants and their backgrounds.
- 27 people applied to the training and we accepted 23; those not accepted were either too inexperienced or had been accepted at another similar school.
- Of the applicants, 80% were graduate students, mostly a year or two into their programs, and 20% were postdoctoral researchers.
- The participants came largely from neutrino, astroparticle, and similar particle physics experiments, which represented an opportunity to train students from subfields not typically served by existing schools in related subfields.
- In terms of knowledge, we followed survey best practices by asking how often participants used certain technologies, such as C++, Python, parallel programming, machine learning tools, version control systems, and others. Participants were given the options of once per day, week, month, year, or never. This is better than asking participants whether they are good at a skill. We also asked what the longest program the participant had written was and what it did.
  We also requested their GitHub ID (if applicable) to review previous work.
- As this is an advanced training, more than two thirds of applicants used C++ and Python daily. Most participants did parallel programming at least once per month. Machine learning backgrounds varied widely, split evenly between daily, weekly, monthly, and yearly/never use.
- We did not explicitly collect diversity information by asking participants to self-certify whether they belonged to an underrepresented group. However, five students communicated during the training their interest in getting mentorship from Dr. Aaron Higuera (who is paid from this grant) to discuss the challenges of being Hispanic in physics (this is outside our official scope, but the mentorship did take place).
- Outtake: We had 20 responses in our outtake assessment; the 3 missing responses were due to either COVID or travel conflicts.
  - The full response is shared as a separate document.
  - Scale: 1 is insufficient and 5 is excellent.
  - Overall program of the school: 50% rated it 4 ("very good") and 50% rated it 5 ("excellent").
  - Participant comments:
    - "hands-on experience on python/ML was great."
    - "This was an excellent training. I feel like I have learned a lot during this last week! All the instructors were great, the materials excellent."