# Metatrain hackathon

GOAL: make it easier to use and develop metatrain

NO NEW FEATURES, refactor and documentation

## Organization

- Tue, 10am: explain the concept & distribute work
- Tue, 1pm: pizza
- Tue, 5pm: wrap-up
- Wed, 10am: start, stand up, distribute new work as needed
- Wed, 1pm: burgers
- Wed, 5pm: wrap-up

## Needs to happen before the hackathon

- figure out remote attendance?
    - zoom with breakout rooms
    - pizza remote?
- merge scaler PR
- maybe the new dataloader?

## Tasks

### Review needed

- https://github.com/metatensor/metatrain/pull/792 (this one is waiting for #802 to add forces)
- [merged] https://github.com/metatensor/metatrain/pull/798
- https://github.com/metatensor/metatrain/pull/799 [name=Sofiia]
- https://github.com/metatensor/metatrain/pull/801
- [merged] https://github.com/metatensor/metatrain/pull/802 [name=Joe]
- [merged] https://github.com/metatensor/metatrain/pull/804

### Code tasks

- [name=johannes will not investigate further, we don't care for now] Look into repository size and try to reduce it
    - apparently the documentation is included
    - some branches contain big data files, like `src/metatrain/experimental/phace/modules/physical_LE/eigenvectors.npy` (15 MiB) on `origin/hartmut-cgnet,origin/hartmut-transformer,origin/phace,origin/phace-hartmut` or `tests/resources/water_4fs.zip` (11 MiB) on `origin/flashmd-with-pet`
    - [name=Rohit] overall repo size is large, ~124MB
- Deprecate NanoPET [name=joe - done]
- [name=johannes - done] Remove old PET architecture
- [name=joe - done] Update architecture to use new CompositionModel
    - [name=joe - done] and then remove the old one
- [name=joe - done] Remove the old loss functions
- [name=Rocco] Check why CSCS CI takes more than 1h when the main one takes ~10min
    - :white_check_mark: Reduce number of `pytest` workers
    - :white_check_mark: Run `tox` in-memory
    - :clock1: Ensure CUDA is used by the `tox` tests; ARM+CUDA CI is too brittle
        - :white_check_mark: Updated `torch` and CUDA versions
        - :clock1: Add check that GPU is used
- [name=Sofiia, in review] Collect and show code coverage for architectures
    - Then try to improve it as relevant
- Change tests using `mtt train` & co to run in the same process instead of `subprocess.run()`. Maybe some `mtt_run(*args)` function? [name=Guillaume]
- [Setup regression tests on CSCS](https://github.com/metatensor/metatrain/issues/705)
- Look through all code in `utils`, and add/improve docstrings
    - [name=Pol] and [name=Paolo] will do
- Update type hints to use builtin types: `list[int]` instead of `List[int]` [name=Qianjun - to do when there won't be merge conflicts]
- [name=johannes wondering what's going on - done, waiting for 786 by rohit] Only train models once when building docs?
- [name=johannes - done] Remove arg 'sliding_factor' from loss definitions: https://github.com/metatensor/metatrain/issues/542
- Change minimal supported Python to v3.10 now that 3.9 is EOL

### Documentation tasks

- Overall proposed new structure:
    - Installation
    - Getting started
        - Quick Start
        - Configuration and Units
        - Available Architectures
    - Tutorials
        - Beginner
        - Advanced
    - Concepts and Design
    - Citing
    - FAQ
    - Developer documentation
- Create tutorial categories (beginner/intermediate/expert)
    - [name=Hanna] and [name=Cesare] will do this
- Rethink "getting started" section: move tutorials from getting started to beginner tutorials, e.g. split "Advanced Base Configuration" into maybe "choose a device and precision", "run a reproducible training run", "use wandb for training logging", etc.
    - [name=Hanna] and [name=Cesare] can handle the restructuring of tutorials (both tasks above)
- Add new tutorials:
    - training a MLIP from scratch
        - [name=Qianjun, done]
    - link to the cookbook fine-tuning example from metatrain
    - comparing different architectures on a single dataset
    - visualization
        - [name=Markus]
    - run metatrain with lammps? or at least link to it somewhere <= should be a link to the metatomic examples
    - [name=Egor] data validation with parity plots for energies and forces
- Go over the hyperparameter reference and improve it (at least for PET/SoapBPNN)
    - [name=Raymond] and [name=Alessandro] will handle hyperparameter doc improvement for SOAP-BPNN and LLPR -- DONE
- Create training decision tree (see below)
    - [name=Rohit] consider the super fancy https://twinery.org
    - [name=Rohit] finds the state of interactive JS storytelling to be quite ugly, between sphinx-needs (draft PR) and the "story driven" renpy js / inky / twinery / monogatari / etc which seem to be geared at a.. different set of people...
        - maybe even just a sphinx-design based dropdown / tab thing...
        - am experimenting with a revealjs style presentation per user story thing
    - TOX has a flowchart -> https://tox.wiki/en/latest/user_guide.html
- [name=Philip] Documentation page about the supported data formats
- Look at the overall doc organization and decide on changes to be made
- Create FAQ/troubleshooting section
    - [name=Cesare] and [name=Hanna] will create this and first questions, then feel free to edit
- Add questions to FAQ
- Restructure the examples directory.
    - [name=Philip], [name=Cesare], and [name=Hanna] will look at this on 8.10.2025 morning or afternoon. Let's see
- [name=Rohit] would like to help
- [name=Rohit] has several questions about the design and CLI guidelines, but these are wider restructures, e.g. using `uv` or something to dispatch so dependencies of architectures are decoupled from the main set, or using `pydantic` for schema based option / validation / documentation. `tox -e tests` takes a long time and uses a lot of resources; can we have a smaller subset?
- [name=Rohit] will update contributing
    - in particular, how do we want people to tell us / contribute that they use `metatrain`?
    - xref `Contributing.rst`: "The first and best way to contribute to metatrain is to use it and advertise it"

### Maybe?
- metatensor.org landing page
    - [name=Michele] first working version
    - DNS records broken and then fixed
    - Google search console property created
- move metatensor docs to docs.metatensor.org/metatensor/?
- move metatrain documentation to docs.metatensor.org/metatrain/

## Needs to happen AFTER the hackathon

- Anything not done needs to be an issue

-----------------

# Random notes

## User stories

- I'm a new student, and I want to go from `ssh kuma` to training start in 1h
- I'm an expert, I want to find the name of a parameter to change it myself
- I'm a model developer, I have a finished and published model I want to put in metatrain
- (NOT CURRENTLY A GOAL) I'm a model developer, I have an idea about a new model I want to develop with metatrain

### New documentation

#### categorize tutorials into beginner/advanced

- getting started
- tutorials
    - beginner:
        - train new MLIP from scratch [name=Qianjun, done]
        - fine tune PET-MAD [name=Markus]
    - advanced
        - compare architectures on a given dataset
        - hyper-parameters sweeping [name=Rohit]
- advanced concepts
    - multi-gpu training [name=Qianjun]

#### Currently supported data inputs, help people prepare data

- how to prepare input data from DFT data: create an xyz file with array fields, info fields, etc.
- Maybe use a DiskDataset if the dataset is BIG

#### model training decision tree

- What do you want to do?
    - Run MD
    - Predict properties of a dataset
    - Train a new potential
- What’s your target?
    - Energy only
    - Energy + forces
    - Long-range properties (dipoles, charges, etc.)
- What’s your training data situation?
    - I have no data :(
    - I have a small dataset (<10k structures)
    - I have a large dataset
- What resources do you have?
    - CPU only
    - Single GPU
    - Multiple GPUs / HPC cluster
- Map to recommendations: based on the answers, guide them to:
    - Use case: e.g. “Fine-tune an existing pretrained water potential”
    - Suggested method: e.g. “Try architecture X with descriptor Y”
    - Next step in docs: link to tutorial or config template.

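One possible way to prototype the decision tree before picking a docs widget is a small lookup table. The sketch below is only an illustration: the questions and options come from the list above, while `Recommendation`, `RULES`, `FALLBACK`, `recommend`, and every recommendation string are hypothetical placeholders, not agreed project guidance.

```python
from dataclasses import dataclass

# Questions and allowed answers, taken from the decision tree notes.
QUESTIONS = {
    "goal": ["Run MD", "Predict properties of a dataset", "Train a new potential"],
    "target": ["Energy only", "Energy + forces", "Long-range properties"],
    "data": ["No data", "Small dataset (<10k structures)", "Large dataset"],
    "resources": ["CPU only", "Single GPU", "Multiple GPUs / HPC cluster"],
}


@dataclass(frozen=True)
class Recommendation:
    use_case: str
    suggested_method: str
    next_step: str


# Hypothetical rules: each maps a partial set of answers to a recommendation.
# All recommendation texts are placeholders to be replaced by the docs team.
RULES = [
    (
        {"goal": "Train a new potential", "data": "Small dataset (<10k structures)"},
        Recommendation(
            use_case="Fine-tune an existing pretrained potential",
            suggested_method="Try architecture X with descriptor Y",
            next_step="<link to fine-tuning tutorial>",
        ),
    ),
    (
        {"data": "Large dataset", "resources": "Multiple GPUs / HPC cluster"},
        Recommendation(
            use_case="Train from scratch on a large dataset",
            suggested_method="Try architecture X",
            next_step="<link to multi-gpu training tutorial>",
        ),
    ),
]

# Returned when no rule matches, so gaps in the tree are visible.
FALLBACK = Recommendation(
    use_case="<not covered yet>",
    suggested_method="<not covered yet>",
    next_step="<link to FAQ>",
)


def recommend(answers: dict[str, str]) -> Recommendation:
    """Validate the answers, then return the first rule that matches them."""
    for key, options in QUESTIONS.items():
        if answers.get(key) not in options:
            raise ValueError(f"{key!r} must be one of {options}, got {answers.get(key)!r}")
    for conditions, recommendation in RULES:
        if all(answers[key] == value for key, value in conditions.items()):
            return recommendation
    return FALLBACK
```

Rules are checked in order and the first match wins, so more specific rules would need to come first. The same table could later feed a sphinx-design dropdown page or any of the other presentation options discussed above.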
#### FAQ, how to troubleshoot common issues

- Training diverges
- data creation
- convergence issues
- GPU out of memory
