## The goal

The goal of my ECA was to develop a scalable, high-quality machine-learning-based data reconstruction method for neutrino experiments that use Liquid Argon Time Projection Chamber (LArTPC) detectors, including the Short-Baseline Neutrino program and the Deep Underground Neutrino Experiment.

## Reco chain in a nutshell

Physics data reconstruction is naturally hierarchical: it starts from low-level signal processing, proceeds to the identification of individual particle trajectories and the inference of particle-level information, and ends with event-level information inferred from correlations between multiple particles.

## Progress in the past

A full reconstruction chain was completed by the end of the third year. In this fourth year, we introduced the capability to quantify calibrated uncertainties for key physics outputs. Like a traditional reconstruction method, the chain consists of multiple ML algorithm modules that perform individual reconstruction tasks, and the whole chain combines these algorithms in a hierarchical manner based on domain physics knowledge. The full chain was presented at the NeurIPS Machine Learning and the Physical Sciences workshop in 2020, and individual algorithms have been published in multiple PRD papers, some of which received an Editors' Suggestion and were featured as DOE science highlights.

## Accessible research

To make our work easily accessible and reproducible, we published all elements of it: the code repositories, software containers, trained ML models, and all datasets used for training and analysis. Documentation is provided in the form of a paper and a data-repository wiki page. This is the first public dataset for LArTPC experiments, and its use has reached beyond neutrino physics, including computer science projects funded by the NSF and DOE.

## Collaboration building

Along the same lines, I led much of the effort to build an ML community within neutrino physics and to connect it with the rest of HEP and beyond by organizing international ML schools and workshops. Outcomes include inter-experimental collaborations and workshops such as Neutrino Physics and Machine Learning, a satellite workshop of Neutrino2020. At SLAC, I took a leading role in forming the ML Initiatives group, which now hosts an ML school annually. It is a great pleasure to see that those who attended as students three or four years ago are now leading ML projects in experiments and in industry.

## Last year in ECA

Our current focus is the integration of the developed ML techniques into the experiments, including MicroBooNE, SBND, ICARUS, and DUNE. Our software is scalable and will enable the use of HPC centers for data reconstruction. In parallel, we are optimizing the ML algorithms for each experiment. This includes optimizing the physics models in simulation so that they represent real data with accuracy and precision. Discrepancies between data and simulation affect the performance of our reconstruction methods because they are trained on simulation. An effective solution to this challenge would have a huge impact on neutrino experiments, and it is part of my future projects.

## Differentiable simulator

One solution is to automate the process of tuning detector physics models. We have developed a differentiable LArTPC detector simulator that can be used for the simultaneous optimization of detector physics parameters using a control dataset, as in the sketch below. This can automate detector physics analyses, dramatically reducing the time and human effort required in an experiment.
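To illustrate the idea, here is a minimal sketch of gradient-based detector calibration. The forward model, parameter names (electron lifetime, recombination factor), and numerical scales are illustrative placeholders rather than our production simulator; the point is that, because the simulation is differentiable, detector parameters can be fit to a control sample by back-propagating through it.

```python
import torch

# Toy differentiable "detector model": maps an energy deposit and its drift
# time to an observed charge, given two detector physics parameters.
# The functional form and parameter names are illustrative only.
def forward_model(dedx, drift_time, lifetime, recomb):
    q_produced = dedx * recomb                              # local recombination survival
    return q_produced * torch.exp(-drift_time / lifetime)   # attenuation during drift

# A toy "control dataset" standing in for well-understood samples such as
# cosmic-ray muon tracks; generated here from target parameter values plus noise.
torch.manual_seed(0)
dedx = 2.1 + 0.1 * torch.randn(1000)                        # toy dE/dx values
drift_time = torch.rand(1000)                               # toy drift times [ms]
q_data = forward_model(dedx, drift_time, torch.tensor(3.0), torch.tensor(0.70))
q_data = q_data + 0.01 * torch.randn_like(q_data)           # measurement noise

# Detector parameters to calibrate, starting from a deliberately wrong guess.
lifetime = torch.tensor(2.0, requires_grad=True)            # electron lifetime [ms]
recomb = torch.tensor(0.50, requires_grad=True)             # recombination survival fraction
opt = torch.optim.Adam([lifetime, recomb], lr=1e-2)

for step in range(3000):
    opt.zero_grad()
    q_sim = forward_model(dedx, drift_time, lifetime, recomb)
    loss = torch.mean((q_sim - q_data) ** 2)                # match simulation to control data
    loss.backward()                                          # gradients flow through the simulator
    opt.step()

print(f"fitted lifetime = {lifetime.item():.2f} ms, recombination = {recomb.item():.3f}")
```

Because gradients propagate through the simulator end to end, the same machinery extends from two toy parameters to the many parameters of a realistic detector model, which is what makes an automated detector physics analysis practical.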
Moreover, this software can be used, directly or indirectly, to solve an inverse imaging problem for LArTPC detectors: we can unfold the detector physics effects from an image of a neutrino interaction. This allows a direct comparison of near- and far-detector neutrino interactions at the event-by-event level. Such an analysis has long been a dream in our community but has never been realized; if successful, ours will be the first of its kind.

## Foundation Models

Finally, we are exploring a new direction: the use of Foundation Models in neutrino experiments. Foundation Models can be largely trained directly on real data and are thus much more robust against the same challenge. Moreover, Foundation Models aim to learn all of the features needed to represent the data, so they can serve as a common base for all data reconstruction tasks and may be shared across experiments. This is particularly impactful for small experiments that may not have the resources to develop and optimize large ML models. There is strong interest in Foundation Model research in the AI community, and HEP datasets and physics knowledge bring unique elements to accelerate HEP-for-AI research.
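As a minimal sketch of this direction (the masked-reconstruction pretext task, the module names, and the toy data are assumptions for illustration, not our production setup), a shared encoder can be pretrained on unlabeled data with a self-supervised objective and then reused by lightweight task-specific heads:

```python
import torch
import torch.nn as nn

# Shared encoder: the "foundation" backbone. A toy MLP over flattened inputs;
# a real LArTPC model would be a sparse CNN or transformer over 3D space points.
class Encoder(nn.Module):
    def __init__(self, in_dim=256, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))
    def forward(self, x):
        return self.net(x)

# Self-supervised pretext task: mask part of each input and reconstruct it.
# No labels are required, so this stage can run directly on real data.
def pretrain(encoder, unlabeled, epochs=5, mask_frac=0.5):
    decoder = nn.Linear(64, unlabeled.shape[1])
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(epochs):
        mask = (torch.rand_like(unlabeled) > mask_frac).float()
        recon = decoder(encoder(unlabeled * mask))
        loss = ((recon - unlabeled) ** 2 * (1 - mask)).mean()   # score only the masked entries
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder

# Downstream task head: a small classifier (e.g., particle species) trained on a
# limited labeled sample on top of the frozen shared encoder.
def finetune_head(encoder, labeled_x, labeled_y, n_classes=5, epochs=20):
    head = nn.Linear(64, n_classes)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    for _ in range(epochs):
        with torch.no_grad():
            feats = encoder(labeled_x)                           # frozen backbone features
        loss = nn.functional.cross_entropy(head(feats), labeled_y)
        opt.zero_grad(); loss.backward(); opt.step()
    return head

# Usage with toy tensors standing in for detector images.
unlabeled = torch.rand(512, 256)                                 # large unlabeled (real-data) sample
labeled_x, labeled_y = torch.rand(64, 256), torch.randint(0, 5, (64,))
encoder = pretrain(Encoder(), unlabeled)
head = finetune_head(encoder, labeled_x, labeled_y)
```

The design choice is that the expensive, data-hungry component (the encoder) is trained without labels and can therefore learn from real detector data and be shared, while each experiment or task only needs to train a small head on top of it.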