# AtomsBase.jl 0.4.x Revision Proposal

These notes summarize discussions with several people about a potential revision of `AtomsBase.jl` that may become v0.4.x. We are now collecting community feedback before making a PR to implement those revisions.

## Short Summary

- Rework the boundary conditions interface, introduce a cell type, and remove all usage of infinities.
- Introduce a "chemical element" / "atom id" type that is `isbits`, and therefore computationally efficient, yet sufficiently flexible for most use cases.
- Slim down the core interface; in particular, make dynamic getters such as `system[:position, i]` part of an extended interface.
- Introduce setters as a second extension to the interface.
- Move the reference implementations to an unexported submodule.

#### Note on the reference implementation

There is a tension between providing an interface and providing a reference implementation. If the reference implementation is exposed, then it will be taken as the *de facto* standard, and shortcomings in that implementation reflect poorly on the `JuliaMolSim` ecosystem. There seems to be general agreement to move the reference implementation into an unexported submodule or into a different package (e.g. `AtomsBuilder.jl` or an entirely separate one). On the other hand, an exposed or recommended reference implementation would strengthen the ecosystem. This point need not be settled in the current revision, but opinions on it would be welcome.

## Acronyms, Notation

- AB = AtomsBase.jl
- AC = AtomsCalculators.jl
- `sys` is always an `AbstractSystem`
- `x` is always a particle of unspecified type
- `LT` is a unitful float type (length type)
- `MT` is a unitful float type (mass type)
- `D` is the dimension of the space in which particles are embedded (usually `D = 3`)

## Minimal (Read-Only) Interface

A type or package can be said to implement the `AtomsBase.jl` interface if it provides the following minimal functionality.
```julia
# access system properties
# (see extended interface for use of cell)
length(sys)::Integer
bounding_box(sys)::NTuple{D, SVector{D, LT}}
periodicity(sys)::NTuple{D, Bool}

# access particle properties
# i may be an Integer, an AbstractVector{<: Integer}, or a Colon
position(sys, i)::SVector{D, LT}
particle_id(sys, i)    # type of particle, e.g. ChemicalElement
mass(sys, i)::MT       # atomic_mass = mass becomes an alias
# atomic_number, chemical_symbol, etc. can be derived

# global particle properties
positions(sys)
particle_ids(sys)
masses(sys)
```

#### Notes

- This is sufficient to provide all information required to perform a large number (but not all!) of the atomistic simulations we envision. A simulation code can simply read the information into whatever format it uses internally.
- Replacing `position(sys)` with `positions(sys)` etc. is suggested for readability: `position(sys)` suggests that a single position is read.
- I do not think that the bounding box and periodicity should have arithmetic defined on them, hence tuples rather than vectors. This could be revisited.
- `isinfinite` is removed since a single boolean is dangerous. Every molecular simulation code I know simply uses `pbc = (T/F, T/F, T/F)`, i.e. specifies periodic/infinite in each direction. It could be reintroduced as `!any(periodicity(sys))`, but ideally it should then be deprecated.
- The rationale for replacing `atomic_mass` with `mass` is the desire to keep the interface general. If the interface is used for other particles or for coarse-grained atoms, then `atomic_mass` becomes the wrong concept.
- Replacing `atomic_number` etc. with `particle_id` does two things: (i) it generalizes the interface; (ii) it indicates that a categorical variable is returned.
  When `particle_id` returns a `ChemicalElement`, then `atomic_number`, `atomic_symbol`, `chemical_element` are defined for that type, and the corresponding system-level accessors can be derived from them, e.g.,
  ```julia
  atomic_number(sys, i) = atomic_number(particle_id(sys, i))
  ```
- Question: should `velocity` and `momentum` be added to this minimal interface, or should they be part of an extended interface (see the section at the end)?

## Cell

We propose to introduce a computational cell interface and to provide one or two reference implementations:

```julia
struct PCell
    # ... cell vectors and PBC flags
end

# a cell that is infinite in all directions
struct OpenSystem end
```

The proposed interface is simply

```julia
cell = get_cell(sys)
set_cell!(sys, cell)
```

An `AbstractSystem{D}` that has a cell should be a subtype of

```julia
abstract type SystemWithCell{D} <: AbstractSystem{D} end
```

Defaults for `bounding_box` and `periodicity` can then be provided, e.g.,

```julia
bounding_box(sys::SystemWithCell) = bounding_box(get_cell(sys))
periodicity(sys::SystemWithCell) = periodicity(get_cell(sys))
```

## `ChemicalElement` and `Atom`

We propose to define (some variants of)

```julia
using StaticArrays  # provides SVector

struct ChemicalElement
    Z::UInt16     # atomic number (number of protons)
    N::UInt16     # number of neutrons (distinguishes isotopes)
    id::UInt32    # other flexible info
end

# separate type parameters for length and mass, since with unitful
# quantities a position (e.g. Å) and a mass (e.g. u) have different types
struct Atom{D, LT, MT}
    id::ChemicalElement
    position::SVector{D, LT}
    m::MT
end
```

Usage of these types would be opt-in but recommended. Both are `isbits`, provide the information needed for many use cases, and can be used to share information across packages in a unified way. The current `Atom` type, which stores its properties in a `Dict`, is too flexible for most use cases and very inefficient.

## Extended Interface: Setters

For a variety of use cases a setter interface will be useful, e.g., in `GeometryOptimization`, `AtomsBuilder`, ... For each system property `prop` there should be

```julia
set_prop!(sys, p)
```

And for each particle property `prop`, e.g.
`prop = position`, there should be

```julia
set_position!(sys, i, r)
set_position!(sys, inds, Rs)
set_positions!(sys, Rs)
```

Like all aspects of the interface, the setter interface is opt-in.

#### Notes

There was also a proposal to provide non-mutating variants, e.g.,

```julia
new_sys = set_position(sys, i, r)
new_sys = set_position(sys, :, Rs)
new_sys = set_positions(sys, Rs)
```

In principle this could be considered, but in my view it would create too much additional complexity. Instead, we could provide a recommended convention for constructors that build new systems from existing ones, e.g.,

```julia
NewSystem(oldsystem; positions = Rs)
```

I believe something like this is already implemented, but it should be reviewed carefully and clearly documented during the PR (in what way is this part of the interface?).

## Extended Interface: Flexible/Dynamic Getters and Setters

This is the least clear aspect of the interface, and some discussion and evolution of it may be unavoidable. The current AB interface provides alternative accessors via symbols. The recommendation for the next version is that this part of the interface becomes more of an implementation detail, but that we document it regardless, aiming for some uniformity across implementations.

Suppose that a system `sys` has a property `:prop` (e.g. a particle property such as `:position` or a system property such as `:periodicity`); then

```julia
sys[:prop]           # for system properties
sys[:prop, i]        # for particle properties
sys[:prop, inds]     # for particle properties
sys[:prop] = ...
sys[:prop, i] = ...
```

allows access to this property. To provide a list of all accessible properties one can use

```julia
keys(sys)            # return list of symbols / keys
has_key(sys, :prop)
```

More fine-grained control can be provided via

```julia
system_keys(sys)
particle_keys(sys)
```

For example, `periodicity` is a system property, while `position` is a particle property.
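As an illustration of how the dynamic accessors could be implemented, here is a minimal sketch on a hypothetical `ToySystem` type that stores its properties in two `Dict`s. `ToySystem` is an assumption made only for this example, not a proposed type; `has_key`, `system_keys` and `particle_keys` follow the names used in this proposal, and units and `StaticArrays` are dropped so that the code runs without package dependencies.

```julia
# Hypothetical toy system (not part of AtomsBase.jl) illustrating the
# proposed dynamic accessors. Units and StaticArrays are omitted so that
# the sketch runs without any package dependencies.
struct ToySystem
    system_data::Dict{Symbol, Any}               # e.g. :periodicity
    particle_data::Dict{Symbol, AbstractVector}  # e.g. :position, :mass
end

# key queries, using the names from the proposal
system_keys(sys::ToySystem) = collect(keys(sys.system_data))
particle_keys(sys::ToySystem) = collect(keys(sys.particle_data))
Base.keys(sys::ToySystem) = vcat(system_keys(sys), particle_keys(sys))
has_key(sys::ToySystem, key::Symbol) = key in keys(sys)

# sys[:prop] reads a system property
Base.getindex(sys::ToySystem, key::Symbol) = sys.system_data[key]
# sys[:prop, i] reads a particle property; i may be an Integer, a vector of
# indices, or a Colon, since it is passed straight through to the vector
Base.getindex(sys::ToySystem, key::Symbol, i) = sys.particle_data[key][i]

# sys[:prop] = value and sys[:prop, i] = value
Base.setindex!(sys::ToySystem, value, key::Symbol) =
    (sys.system_data[key] = value)
Base.setindex!(sys::ToySystem, value, key::Symbol, i) =
    (sys.particle_data[key][i] = value)

# usage: an H2 molecule without periodic boundaries (units dropped)
sys = ToySystem(
    Dict{Symbol, Any}(:periodicity => (false, false, false)),
    Dict{Symbol, AbstractVector}(
        :position => [[0.0, 0.0, 0.0], [0.74, 0.0, 0.0]],
        :mass     => [1.008, 1.008],
    ),
)

sys[:periodicity]       # (false, false, false)
sys[:position, 2]       # [0.74, 0.0, 0.0]
sys[:mass, 1] = 2.014   # e.g. replace the first atom by deuterium
sys[:mass, :]           # [2.014, 1.008]
has_key(sys, :charge)   # false
```

Storing everything in dictionaries keeps the sketch short, but it is exactly the kind of flexible, non-`isbits` layout that is inefficient for core data; a real implementation might keep `position` and `mass` in typed fields and reserve a `Dict` only for additional data.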
#### Note on use-cases

The main use case for this extended interface is to enable additional data to be stored without needing to extend the interface. This can be useful, e.g., when structures are read from or written to files, to store training data for ML models in datasets, or to store metadata about a simulation inside a structure.

## Future Extensions

In a future PR I suggest that we reserve a longer list of standard properties, such as

- `position`
- `velocity`
- `momentum`
- `mass`
- `charge`
- `charge_dipole`
- `energy`
- `spin`
- ...

Then, for each such property, we provide prototypes for analogous getters and setters. The documentation should also provide a clear specification of the expected behaviour, e.g. output types, units, etc. Moreover, it should be clearly documented which functions are considered "core" interface and which are extended interface. I do not recommend including this in the next update.

In [`DecoratedParticles.jl`](https://github.com/ACEsuit/DecoratedParticles.jl) (DP) I am experimenting with an efficient implementation of such a general interface. To add new properties, one simply "registers" them with the package and the accessors are generated automatically.
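The idea of registering a property and generating its accessors automatically can be illustrated with Julia's `@eval` metaprogramming. The following is a minimal sketch under stated assumptions, not the DecoratedParticles.jl implementation: `DictSystem` and `STANDARD_PROPERTIES` are hypothetical names, units are dropped, and only singular getters/setters are generated (plural forms such as `momenta` would need an explicit name table).

```julia
# Hypothetical sketch of "registering" standard properties via
# metaprogramming. This is NOT the DecoratedParticles.jl API; names such
# as STANDARD_PROPERTIES and DictSystem are made up for illustration.

# Minimal stand-in system: particle properties stored as a Dict of vectors,
# accessed through the dynamic interface sys[:prop, i].
struct DictSystem
    data::Dict{Symbol, AbstractVector}
end
Base.getindex(sys::DictSystem, key::Symbol, i) = sys.data[key][i]
Base.setindex!(sys::DictSystem, x, key::Symbol, i) = (sys.data[key][i] = x)

# "Register" properties: for each name generate a getter prop(sys, i) and a
# setter set_prop!(sys, i, x), both forwarding to the dynamic interface.
const STANDARD_PROPERTIES = (:velocity, :charge, :spin)

for prop in STANDARD_PROPERTIES
    setter = Symbol("set_", prop, "!")
    key = QuoteNode(prop)   # the literal symbol :prop inside generated code
    @eval begin
        $prop(sys, i) = sys[$key, i]
        $setter(sys, i, x) = (sys[$key, i] = x; sys)
    end
end

# usage (units dropped)
sys = DictSystem(Dict{Symbol, AbstractVector}(
    :velocity => [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]],
    :charge   => [-0.8, 0.4],
))

charge(sys, 1)             # -0.8
set_charge!(sys, 2, 0.5)
charge(sys, :)             # [-0.8, 0.5]
velocity(sys, 2)           # [0.1, 0.0, 0.0]
```

Forwarding the generated accessors to `sys[:prop, i]` is just one possible design choice: any system that implements the dynamic interface would then gain the named getters and setters without further code.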