# Metatensor-models
Plan for unifying metatensor-based models
> **note:**
>
> mlelec would provide another set of tools to define models, used as a dependency for some models in metatensor-models
## Existing stuff
### MLIPs
- https://github.com/lab-cosmo/equisolve (public)
- Linear models with numpy, energies and forces
- Converters from ASE to metatensor for properties
- RMSE on metatensor
- blocks to build a model
- https://github.com/abmazitov/torch_alchemical (public)
- Metatensor wrappers for linear layers, activation functions, graph convolutions, capable of doing forward pass directly on metatensor.torch.TensorMap objects
- Features calculators from either torch_spex or rascaline [only torch_spex at the moment]
- Power Spectrum, Behler-Parrinello Power Spectrum and Alchemical models based on torch_spex Spherical Expansion
- Convenient training tools from PyTorch Lightning (allows for training callbacks, model checkpoints, multi-GPU training, etc.)
- Automatic logging of the experiments with Weights & Biases
- TorchScript compatibility
- https://github.com/Luthaf/alchemical-learning (public)
- only alchemical models
- train model with either linear algebra or gradient descent
- very bad, don't use it
- https://github.com/bananenpampe/H2O/tree/move-rascaline (public)
- SOAP/LODE/RS BPNN with rascaline-torch (energies, forces)
- roughly oriented on Schnet (aggregation, feature, response torch layers)
- and torch
- atomic properties (chemical shieldings)
- i-pi driver
- uncertainty quantification (UQ)
- UQ biased PES
- uses PyTorch Lightning to simplify training
### Electronic structure
- https://github.com/jwa7/rho_learn (public)
- torch/metatensor equivariant global model, wrapping linear/arbitrary nonlinear block models
- torch dataset + dataloaders, L2 loss for scalar fields and arbitrary-rank tensors
- Parsers + calculators for FHI-aims, specifically for RI fitting of scalar fields
- End-to-end predictions (i.e. integrated w/ AIMS)
- Examples for learning tensors and fields
- Assumptions: input = ASE frames, reps w/ rascaline, targets in angular basis, train w/ torch by gradient descent
- https://github.com/curiosity54/mlelec (private)
- Tangential stuff to unify electronic structure outputs (multicenter), including Hamiltonian stuff/interface with electronic structure codes, i.e. a self-contained codebase to compute the desired target, train, and provide the output back to the electronic structure code (with focus on PySCF for now)
- interface with other (non acdc) models
- torch based
- https://serfg.github.io/pet/
## Goals & non-goals
We want something for two classes of users: ML developers who want to create new models/new architectures; and ML users who want to train existing models/architectures on new datasets.
We will have two libraries:
- Building blocks for ML developers, living in the metatensor repository
- End to end models for ML users, living in [metatensor-models](https://github.com/lab-cosmo/metatensor-models)
Both will follow all the usual software engineering practices:
- pip installable
- conda installable
- documentation
- tests
- examples
- CI
- *etc.*
### Building blocks library
#### Goals
- Make prototyping of new models easy
- Provide most of `torch.nn`
- Provide kernel & linear models
- Support both linear algebra and gradient descent for training
#### Non-Goals
- Containing code used by a single application/model. Such code will live either in metatensor-models (for stable stuff) or in standalone repositories
### End to end architecture and model training library
Nomenclature: the `architecture` is defined by the code in `forward()` and the training procedure, while the `model` is an architecture trained with a specific dataset (i.e. a model is something that can be used as-is for MD, an architecture needs to be trained first).
#### Goals
- Use existing architectures with new system/datasets
- Allow us to market our models/architectures in conferences, have a single place to point people to if they want to try our models
- Invariant & equivariant predictions
- Per-structure and per-atoms predictions
- Allow external contributions of new architectures with clear guidelines
- Provide pre-trained models (as GitHub release artifacts); automatically download them on user request
- Does not require users to write any code to train a model
- Allow less technical users to train models and compare different models
- Make it clear which models are stable/experimental
- we will include stuff used in a paper
- we will include stuff in development for a paper
- we will **not** include tools to build new models
- Some facilities to compose existing architectures together, not necessarily easy to use by end users
- Minimal building blocks to allow someone else to build active learning on top of this repo
- regression tests with 0/1 epoch training + fixed seed random weights, only for non-experimental models
- per-architecture dependencies management:
- Each architecture comes with a `requirements.txt` or `requirements.py` defining the dependencies for this architecture. Users can install metatensor-models alone (and get an import error when trying to use some architectures); or install metatensor-models + the dependencies for some architectures:
```
pip install metatensor-models
pip install metatensor-models[allegro]
pip install metatensor-models[allegro,mesh_lode]
pip install metatensor-models[allegro_mesh_lode]
```
- it must be possible to install all architectures at the same time, meaning two different architectures can not have incompatible dependencies
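A minimal sketch of how the "import error when the extras are missing" behaviour could look, using only the standard library. The `ARCHITECTURE_REQUIREMENTS` mapping, the architecture names and both function names are hypothetical; in the plan above this information would come from each architecture's requirements file.

```python
import importlib.util

# Hypothetical mapping from architecture name to the packages it needs.
# "json" is a stdlib stand-in so that the example runs anywhere.
ARCHITECTURE_REQUIREMENTS = {
    "soap_bpnn": ["json"],
    "allegro": ["package_that_is_not_installed"],
}


def missing_dependencies(architecture):
    """Return the required top-level packages that cannot be imported."""
    return [
        name
        for name in ARCHITECTURE_REQUIREMENTS[architecture]
        if importlib.util.find_spec(name) is None
    ]


def check_architecture(architecture):
    """Raise a helpful ImportError if an architecture's extras are missing."""
    missing = missing_dependencies(architecture)
    if missing:
        raise ImportError(
            f"architecture '{architecture}' requires {', '.join(missing)}; "
            f"try `pip install metatensor-models[{architecture}]`"
        )
```

Calling `check_architecture(...)` before importing an architecture would turn a bare `ModuleNotFoundError` into a message that points the user at the matching extra.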
#### Non-goals
- Anything that is not PyTorch
- Jupyter notebooks
- Reproducing results from papers
- Architectures that are not usable for research (i.e. small building blocks for larger models, such as atomic composition)
- Full active learning loop
## Requirements/technical decisions
### Building blocks
#### Make prototyping of new models easy
- Simple custom dataloader & dataset (prototype in: Joe's rho learn)
- support both loading from memory & from disk
- support both pre-computed features & systems as input
- loading arbitrary number of data in each minibatch
- Loss functions
- MSE/MAE/L2/...
- **PROTOTYPE** loss on values and gradients
- Export metatensor System data to torch-geometric format
- Building blocks that can handle equivariant properties
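As an illustration of the "loss on values and gradients" item, here is a minimal sketch with plain `torch` tensors instead of `TensorMap` objects. The function name, the `force_weight` parameter and the toy harmonic energy are assumptions made for the example, not part of any existing prototype.

```python
import torch


def value_and_gradient_loss(
    energy, positions, target_energy, target_forces, force_weight=1.0
):
    """MSE on energies plus weighted MSE on forces (forces = -dE/dpositions)."""
    # create_graph=True keeps the graph, so the force term can itself be
    # differentiated with respect to the model parameters during training
    (gradient,) = torch.autograd.grad(energy.sum(), positions, create_graph=True)
    forces = -gradient
    energy_loss = torch.mean((energy - target_energy) ** 2)
    force_loss = torch.mean((forces - target_forces) ** 2)
    return energy_loss + force_weight * force_loss


# Toy "model": harmonic energy E = k/2 * sum(r^2), so the forces are -k * r
torch.manual_seed(0)
positions = torch.randn(5, 3, requires_grad=True)
k = torch.tensor(2.0, requires_grad=True)
energy = 0.5 * k * (positions**2).sum()

# Targets taken from the model itself, so the loss should vanish
target_energy = energy.detach()
target_forces = (-k * positions).detach()
loss = value_and_gradient_loss(energy, positions, target_energy, target_forces)
```

In a real training loop `loss.backward()` would then propagate through both terms to the model parameters (here only `k`).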
#### Provide most of `torch.nn`
- Wrapper for something similar to ModuleDict, applying the module to blocks one by one
- Some well used models should also be directly provided (mainly `LinearModel`)
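The block-by-block wrapper could be sketched roughly as follows. To keep the example free of any assumption about the metatensor API, a plain `dict` of tensors stands in for a `TensorMap`, and the class name `BlockwiseModules` and the block keys are made up for illustration.

```python
import torch


class BlockwiseModules(torch.nn.Module):
    """Apply one module per block, with blocks identified by a string key.

    A plain dict of tensors plays the role of a TensorMap in this sketch.
    """

    def __init__(self, modules):
        super().__init__()
        # ModuleDict registers the parameters of every per-block module
        self.block_modules = torch.nn.ModuleDict(modules)

    def forward(self, blocks):
        return {
            key: self.block_modules[key](values) for key, values in blocks.items()
        }


torch.manual_seed(0)
model = BlockwiseModules(
    {
        "species_1": torch.nn.Linear(4, 2),
        "species_8": torch.nn.Linear(4, 2),
    }
)
features = {"species_1": torch.randn(10, 4), "species_8": torch.randn(3, 4)}
outputs = model(features)  # one output tensor per input block
```

Each block keeps its own number of samples, and only the module registered under the same key is applied to it.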
#### Support both linear algebra and gradient descent for training
- Option to solve with numpy export to Torch?
- Option to solve with Torch directly for GPU support
- Explicit user choice for one or the other linalg backend
### End to end models
#### Use existing models with new dataset/system/research topic
- consistent API boundary: input & output are well defined
#### Allow us to market our architectures in conferences, have a single place to point people to if they want to try our architectures
- give each architecture a name
- Have an example of how to train a model with a specific architecture, export it & use it
#### Invariant & equivariant predictions
#### Per-structure and per-atoms predictions
- architectures need to define what properties they can train on & what they can output
#### Store architecture & way to train this architecture into a model together
- two files per architecture, one for the architecture itself, the second one containing a trainer
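One possible way for a single entry point to find both files is sketched below. It assumes the `MODEL_CLASS` / `MODEL_TRAINER` module-level names that appear in the folder layout later in this note; the `load_architecture` function and the `architectures` package name are hypothetical, and fake in-memory modules replace real files so that the sketch runs on its own.

```python
import importlib
import sys
import types


def load_architecture(name, package="architectures"):
    """Return (model_class, trainer_class) for the architecture called `name`."""
    model_module = importlib.import_module(f"{package}.{name}.model")
    trainer_module = importlib.import_module(f"{package}.{name}.trainer")
    return model_module.MODEL_CLASS, trainer_module.MODEL_TRAINER


def _register_fake_module(name, **attributes):
    """Register an in-memory module, standing in for a file on disk."""
    module = types.ModuleType(name)
    module.__dict__.update(attributes)
    sys.modules[name] = module


class FirstModel:
    """Placeholder for the class defined in model.py."""


class FirstTrainer:
    """Placeholder for the class defined in trainer.py."""


_register_fake_module("architectures")
_register_fake_module("architectures.first_model")
_register_fake_module("architectures.first_model.model", MODEL_CLASS=FirstModel)
_register_fake_module("architectures.first_model.trainer", MODEL_TRAINER=FirstTrainer)

model_class, trainer_class = load_architecture("first_model")
```

With real files, only `load_architecture` would be needed; the architecture name would typically come from the user's configuration.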
#### Allow external contributions of new architectures with clear guidelines
- Document + stupidly easy example of how to add architecture & what's required (tests, comments, docs, …)
#### Does not require users to write any code to train a model with existing architectures
- Single `train.py config.yml` script that handles all architectures
- Single `eval.py model.pt dataset.xyz --output excel --output chemiscope --output txt,csv` command
- + tools to do this in
- Extracting properties from XYZ/AIMS/VASP/... to `TensorMap`
- Extracting systems data to `metatensor.torch.System`
- Convert from cartesian `torch.Tensor` to spherical `TensorMap`
- Integration with TensorBoard/Weights & Biases/...
- Everything should still work without any of these tools
- Workflow for training:
```
pip install metatensor-models[allegro,mesh-lode]
cd where-the-training-data-and-logs-and-anything-else-should-live/
vim config.yml
train-metatensor-model config.yml
# create checkpoints folder + log file + log to stdout
export-metatensor-model config.yml --epoch=100
# export the best model from checkpoint
cp exported/my-model.pt lammps-simulation-folder
cp -r exported/my-model/torch-extensions lammps-simulation-folder/torch-extensions
```
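Since each architecture ships default hyperparameters and the user only edits `config.yml`, the train script needs some way to overlay one on the other. The sketch below shows a plain recursive merge on already-parsed dictionaries; the `merge_config` name and all keys and values are invented for illustration, and a configuration tool such as hydra (listed under the milestones) may end up handling this instead.

```python
import copy


def merge_config(defaults, overrides):
    """Recursively overlay user `overrides` on top of architecture `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # descend into nested sections instead of replacing them wholesale
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical contents of an architecture's defaults and of a user config,
# shown as dicts (i.e. after the YAML files have been parsed)
defaults = {
    "model": {"cutoff": 5.0, "hidden_sizes": [32, 32]},
    "training": {"epochs": 100},
}
user_config = {"model": {"cutoff": 4.0}, "training": {"epochs": 10}}

config = merge_config(defaults, user_config)
```

Options the user does not mention (here `hidden_sizes`) keep their default value, and `defaults` itself is left untouched.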
#### Allow less technical users to train models and compare different models
- How does user interact with the code?
1. single script for the repo w/ configuration
2. import from python
3. ~~one script per model~~
#### Make it clear which models are stable/experimental
- we will include stuff used in a paper
- we will include stuff in development for a paper
- we will **not** include tools to build new models
- have an `experimental` folder for new architectures being beta-tested
- have a `deprecated` folder for old stuff that will be removed if it becomes a maintenance issue
#### Some facilities to compose existing architectures together, not necessarily easy to use by end users
- you can import from one architecture into another one, no additional structure requirements
#### Minimal building blocks to allow someone else to build active learning on top of this repo
- initially only making the trainer script importable, long term to be determined
## Milestones
Team leads for the libraries:
- `metatensor-learn`: Arslan, Alex (1/2), Joe (1/2)
- `metatensor-models`: Philip, Filippo
### Minimal Viable Product
#### Metatensor-learn
Decide name
Python package scaffolding
ModuleMap (i.e. ModuleDict working on TensorMap blocks) ([prototype with tests](https://github.com/lab-cosmo/equisolve/blob/main/src/equisolve/nn/module_tensor.py))
- Dataloader, once metatensor #405 is merged
- Neighborlist computation via `rascaline` ATM
Folder structure
```
src/metatensor/learn
data/
__init__.py
dataset.py
dataloader.py
nn/
__init__.py
module_map.py
utils/
__init__.py
collate_fn.py # A utility for a dataloader
tests/
test_dataloader.py
test_module_map.py
```
DEADLINE: Christmas 2023
- Alex
- Arslan
- Joe
#### Metatensor-models
END GOAL of MVP: train on QM7, get a %RMSE from CLI
Python package scaffolding
Single architecture: SOAP Power Spectrum + BPNN
Corresponding trainer code
Read energies from ASE to TensorMap
Convert positions+cell from ASE to System (re-use rascaline for now, will be removed)
Single train script + config + checkpoint
- look into hydra
Basic evaluation script
Use `rascaline.torch.System` in place of `metatensor.torch.atomistic.System` until [metatensor#405](https://github.com/lab-cosmo/metatensor/pull/405) is merged
Folder structure
```
bin/
train-metatensor-model.py
eval-metatensor-model.py
export-metatensor-model.py
src/metatensor/models/
utils/
__init__.py
...
experimental/
bad-model/
worst-model
next-sota??/
actually-its-wrong/
first-model/
__init__.py # empty
model.py # MODEL_CLASS = Something
trainer.py # MODEL_TRAINER = Something
defaults.yml # Default values for hypers
requirements.py
tests/
regression_tests.py
unit_tests.py
second-model/
__init__.py
model.py
trainer.py
defaults.yml
requirements.py
combined-model/
from ..first_model import ModelPart1, ModelPart2
from ..second_model import ModelPart1, ModelPart2
requirements.py # re-export from the two others
```
DEADLINE: Christmas 2023
- Filippo
- Philip
- Matthias
### Second milestone
#### Metatensor-models
Per-architecture CI setup (dynamically generate jobs for each architecture in GitHub Actions)
Look into lightning
Write documentation on what can go in experimental & what's required to move out of experimental