GROMACS public API (preliminary discussions)

# GROMACS public API (preliminary discussions)

Sebastian, Joe, and Eric to meet 7 June: https://kth-se.zoom.us/j/64841600243

Joe coordinating follow up to
* https://docs.google.com/presentation/d/1jDezKC2FbNm_FGHh3cXo2jF8-W843K8wZO1IbiiBQLc/edit?usp=sharing
* open questions: what are we building and how, who are we building it for, and who is "we"?
* follow up via email.

## Notes

Review from Joe:
* there is some idea that things should be more decentralized
* don't know to what extent things are actually different versus just a change in rhetoric.

Sebastian's timeline:
* July: start coding for enhanced sampling in GROMACS with a REST2 extension
  * #4431
  * deliverables targeting beta release, with continued work through the end of the year
* September through December:
  * effort available for collaborative API work
* starting next year (January):
  * high level enhanced sampling stuff #4432
  * low level resource and input/output/state management

Joe:
* out June 23 until August 14
* fall: LUMI best practice guide
* porting strategy
* NBLib port to AMD GPUs

Other internal stuff:
* Mark and Erik _may_ be working on things that might motivate interfaces in some aspects

Joe: "Two criteria for removal: no longer maintained or in the way."

Theoretically, some decision-making has been delegated, but we're not sure how that will pan out. We can try to announce some concrete plans and see what happens.

### User feedback
* most support will be for API features that allow access to parts of GROMACS that are not accessible another way.
* more access to force calculation would be appreciated, but the ability to read/gather calculated forces is not easily achievable in the near term (either refinement of ModularSimulator external access, or some other internal "role" would need to be established in the library.)

### Lowest-hanging fruit

Handles to simulation/trajectory "state" for input, output, snapshot enable a lot of methods: MC/MD, exchange, trajectory forking, etc.
How do we deliver this in a way that is sustainable and accepted?

## Prep

MEI requested agenda items:
* arrange next meeting/follow up (2 min)
* overview: What do the statements from Erik and Paul mean? What has changed on the Stockholm side (if anything)? (5 min)
* What is the scope of proposed work? (15 min)
* Who has applicable funded deliverables and what are they? (What are the constraints on available effort?) (5 min)
* Logistically, what collaboration tools will we use in the current phase? What sort of tracking and reporting can/should/may we use? (5 min)
* Can we identify some targets / milestones? What are our metrics for success? (15 min)

## To do

Clarify
* What is different between this and previous efforts/products (libgromacs, libgmxapi, libnblib), if anything?
* stakeholders (and funding)
* participants
* attribution, licensing, branding(?)(, continuity?)
* scope
  * functional goals
  * technical constraints
* process
  * collaboration
  * design
  * approval
  * development
  * review
  * testing
  * release
  * maintenance
  * **continuity**
* timeline

## Stakeholders
* contributors of developer time
  * KTH
  * PDC
  * UVa
* funders
  * NIH [R01GM115790](https://reporter.nih.gov/search/bEtVcD7duU6iwYAKDXQMiw/project-details/10145987) (UVa, possible renewal)
  * NSF [1835780](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1835780) (UVa, possible renewal)
  * Sebastian's funding?
  * Cathrine's funding?
  * Joe's funding?
  * BioExcel?
  * others?
* GROMACS core developers
* Traditional GROMACS users (uninterested in tradeoffs of expected CLI syntax/behavior/performance)
* Traditional "GROMACS legacy" API users
* external contributing developers / collaborators
  * Kasson Lab
  * Colvars?
  * Mimic?
  * QM/MM people?
* research users (authors of scientific software not intended for distribution)
* third-party authors of dependent software projects
* users of third-party GROMACS clients / extensions

## Participants

Who do we expect to be contributing
* design effort?
* design review and approval?
* software development?
* code review?
* early use?
* infrastructure maintenance?
* long term software maintenance?

## Scope
- [ ] Replace/subsume/extend NB-Lib
- [ ] Replace/subsume/extend libgmxapi
- [ ] Replace/subsume/extend libgromacs
- [ ] Establish one or more new binary libraries
- [ ] Introduce new tools / technologies (versus)
- [ ] Repackage the library functionality for external access (to allow introduction / application of new tools / technologies)
- [ ] Update versus supplement library implementation
- [ ] functional frameworks like TAF or MDModules
- [ ] development frameworks like Options, Serializer, Notifier. i.e. expressing component interfaces, plus UI helpers
- [ ] file I/O
- [ ] stdio (and logging, etc)
- [ ] program control flow
- [ ] various abort and hard exit conditions
- [ ] global variables; resource initialization/finalization
- [ ] reentrancy; reusability
- [ ] New module(s)
- [ ] New namespace(s), binary targets, header paths

## Functional goals

Possible high level functional goals---which may be accepted, rejected, or more granularly scoped---will guide further planning.
- [ ] Be able to implement `gmx` command line entry point. (and subcommands/tools?)
- [ ] Be able to implement `MDModule` or other library interfaces.
- [ ] Be able to implement `mdrun` (`gmx::MdRunner::mdrunner()`)
- [ ] Be able to implement `do_force()`
- [ ] Extend MD integration loop (e.g. gmxapi pluggable restraints)
- [ ] Provide the C++ interface against which Python package is written
- [ ] Be able to manage multiple simulations from a client process (library is reentrant, stateless, or has a _complete_ specification for managing program/resource state), and/or MPI-linked array of processes(?) (resource-sharing API)
- [ ] Define a unified representation (or protocol) for simulation input.
- [ ] Define a unified representation (or protocol) for simulation results.
- [ ] Allow a handle for an optimizable snapshot of simulator state.
## Other goals
- [ ] Allow sustainable software to be built against installed GROMACS.
- [ ] Allow sustainable software to rely on forking GROMACS.
- [ ] Recruit core developers.
- [ ] Recruit a more extensive and active community of library users and contributors.
- [ ] Find sustainable software enhancement processes.
- [ ] Provide effective documentation for the right target audience.

## Technical constraints
- [ ] zero-copy data exchange
- [ ] (we **must** accommodate distributed data at several scales, but we *should*) *facilitate* data locality optimization
- [ ] facilitate task-based parallelism

### Client build support

How much support do we want to provide or not provide for client software building? (This could affect the scope of supported architectures/environments, and tightness-of-coupling to tool chains.)

For comparison, take a look at these documentation entry points.
* https://grpc.io/docs/languages/cpp/quickstart/ starts with build system and installation details
* https://zeromq.org/get-started/ notes the distinction between the core library and its low-level API, vs. various choices of tool kit implementing the high level API specification.
  * https://zguide.zeromq.org/docs/chapter1/
  * https://github.com/zeromq/libzmq/tree/master/doc -> http://api.zeromq.org
* https://pybind11.readthedocs.io/
  * [Installing the library](https://pybind11.readthedocs.io/en/stable/installing.html)
  * [Build Systems](https://pybind11.readthedocs.io/en/stable/compiling.html)
* https://abseil.io/docs/cpp/

## Process: Collaboration

GitLab branch? fork? separate subproject? Kanban board? Scrum? Wiki? Issue tracking? Google docs? HackMD? Slack?

How Agile do we want to be?

Are there constraints on the openness of the project or the scope of recruitment or outreach?

## Lessons learned, or principles to be applied

### NB-lib insights?
### Suggestions from Eric Irrgang
- [ ] the installed headers should represent the specification, which should be minimal, but complete
- [ ] fully specified types
- [ ] heavily focus on free functions to establish a stable specification quickly
- [ ] No external dependencies _in the interface_ without a strong ABI guarantee. (see footnote)
- [ ] Allow template header interface to bridge between a more stable high level interface and a less stable low level interface, where necessary for forward motion or for user convenience.
- [ ] C linkage
- [ ] default hidden symbols
- [ ] _start_ with the API documentation, and go from there.
- [ ] Use a feature-based versioning API (with monotonic feature versions) until semantic versioning makes sense.
- [ ] Aim for a quarterly "release" schedule to force reevaluation of milestones, to provide a regular schedule for synchronizing project tracking information, and to remind us to document progress
- [ ] Regression test from day 1.

#### footnote on dependencies in the interface

1. Headers for interacting with the installed library should not support cases in which a symbol defined outside of those headers might be referenced by binaries that have different definitions for that symbol.
2. Especially with external namespaces (e.g. `::std`), we need to be careful about aliases that might translate to different ABIs in the library's and client's (or other client dependency's) tool chain.

In other words:
* respect the One Definition Rule,
* avoid brittle assumptions about the ABI that will be produced.

Note that this is easy enough to achieve.
* Some potential dependencies _do_ have strong ABI specifications.
* Dependencies that don't can be wrapped in types defined in our headers.
* Objects that are created, dereferenced, and destroyed in a single binary are generally out of scope of the above admonitions.
  * example: a header-only layer (or tool kit compiled into client code)
  * example: opaque types accessible only to the client, only to the library, or only to some other API, such as in a class member type or a template instantiation that is used on only one side of the API boundary.
  * example: private members that we know will be directly accessed and deallocated in the same binary where they were allocated
  * example: temporary objects that are not passed directly across library boundaries

Examples of problematic scenarios:
* `MPI_Comm` is almost always an alias. What it aliases can be completely incompatible across MPI implementations.
* Much of the (top level of the) `::std` namespace is actually aliases. The aliases are different in different standard library implementations (causing linking errors). In the worst cases, even the actual symbol may have a different ABI with two tool chains or set of build options on the same machine.

Example: If the gmxapi Python package is built against GROMACS built with one tool chain, but the `libgmxapi.so` dynamically loaded at run time was built with a different tool chain, you get something like this:

```
>>> import gmxapi._gmxapi
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dlopen(/.../gmxapi/_gmxapi.so, 0x0002): Symbol not found: __ZN12gmxapicompat11readTprFileERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
  Referenced from: /.../gmxapi/_gmxapi.so
  Expected in: /path/to/gromacs/lib/libgmxapi_mpi_d.0.3.1.dylib
```

where

```
$ c++filt __ZN12gmxapicompat11readTprFileERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
```

gives

```
gmxapicompat::readTprFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
```

We can potentially deal with this (in a somewhat rigid way) by trying to have rigorously matched tool chains for client software.
But we can't do anything about other dependencies the client may have, which may leak definitions for `::std` (or other libraries) into the process space.

It's just easier if we don't create that problem for ourselves or our users.
* Don't leave ourselves vulnerable to namespace pollution by other client dependencies.
* Don't leak symbols that aren't part of our public API specification (hidden symbols).
* Don't use binary interfaces we can't trust.
* Don't try to pass objects across binary interfaces unless we control their definitions and know how they will be used (where they will be dereferenced, where they will be destroyed and deallocated)

*update (mei)* I found an old email conversation with Roland Schulz, who considered that it would be too inconvenient to keep `::std` out of the API entirely. He reasons:

> I don't think it is a problem with the compiler but that you have to make sure that the same standard library is used and with the same setting (in particular `_GLIBCXX_USE_CXX11_ABI`). You can either provide for the user cmake scripts to make providing the same standard library flags as GROMACS easy. Or you could recommend using the same compiler (not because you need the same compiler but then picking the same std library is easier). Or you can recommend them to use something like conan which has that automated.

However, this does not seem to address the case in which a client package has dependencies on both gromacs and some other C++ based library, and both are already provided (built) in the user's computing environment.
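
As a discussion aid, here is a minimal sketch of what the suggestions above (C linkage, opaque types, default hidden symbols, feature-based versioning, a header-only client layer) might look like together. Every name in it (`GmxSketchState`, the `gmxsketch_*` functions, `StateHandle`, the `"state_handle"` feature string) is hypothetical and is not part of libgromacs, libgmxapi, or NB-Lib. The sketch assumes a GCC- or Clang-style visibility attribute, and it puts the "installed header", the library side, and the client side in one file purely so that it compiles on its own.

```cpp
// Sketch only: all names are hypothetical, not an existing GROMACS interface.
#include <cassert>
#include <cstring>
#include <stddef.h>
#include <string>
#include <vector>

// ---- "Installed header": C linkage, fully specified types, no ::std ----
#if defined(_WIN32)
#define GMXSKETCH_API
#else
// Assumes the library is built with -fvisibility=hidden, so only these
// explicitly marked symbols are exported.
#define GMXSKETCH_API __attribute__((visibility("default")))
#endif

extern "C" {
// Opaque handle: clients never see the definition, so its layout (and any
// ::std members) never crosses the binary boundary.
typedef struct GmxSketchState GmxSketchState;

GMXSKETCH_API GmxSketchState* gmxsketch_state_create(const char* label);
GMXSKETCH_API void gmxsketch_state_destroy(GmxSketchState* state);
// Copies up to capacity-1 bytes plus a terminator into client-owned memory
// and returns the full label length, so the client can size its buffer.
GMXSKETCH_API size_t gmxsketch_state_label(const GmxSketchState* state,
                                           char* buffer, size_t capacity);
// Feature-based versioning: monotonic version of a named feature, 0 if absent.
GMXSKETCH_API unsigned gmxsketch_feature_version(const char* feature);
}

// ---- Library side: ::std is fine here because these objects are created,
// ---- dereferenced, and destroyed inside the same binary.
struct GmxSketchState {
    std::string label;
};

extern "C" {
GmxSketchState* gmxsketch_state_create(const char* label) {
    if (label == nullptr) return nullptr;
    // Exceptions must not escape through a C interface.
    try { return new GmxSketchState{label}; } catch (...) { return nullptr; }
}

void gmxsketch_state_destroy(GmxSketchState* state) { delete state; }

size_t gmxsketch_state_label(const GmxSketchState* state, char* buffer, size_t capacity) {
    if (state == nullptr) return 0;
    if (buffer != nullptr && capacity > 0) {
        const size_t length = state->label.size();
        const size_t n = length < capacity - 1 ? length : capacity - 1;
        std::memcpy(buffer, state->label.data(), n);
        buffer[n] = '\0';
    }
    return state->label.size();
}

unsigned gmxsketch_feature_version(const char* feature) {
    if (feature == nullptr) return 0;
    if (std::strcmp(feature, "state_handle") == 0) return 1;
    return 0;
}
}

// ---- Header-only client layer: compiled into the client, so it uses the
// ---- client's own standard library and only C types cross the boundary.
class StateHandle {
public:
    explicit StateHandle(const std::string& label)
        : handle_(gmxsketch_state_create(label.c_str())) {}
    ~StateHandle() { gmxsketch_state_destroy(handle_); }
    StateHandle(const StateHandle&) = delete;
    StateHandle& operator=(const StateHandle&) = delete;

    bool valid() const { return handle_ != nullptr; }

    std::string label() const {
        assert(handle_ != nullptr);
        // First call asks for the length; second call fills the buffer.
        std::vector<char> buffer(gmxsketch_state_label(handle_, nullptr, 0) + 1);
        gmxsketch_state_label(handle_, buffer.data(), buffer.size());
        return std::string(buffer.data());
    }

private:
    GmxSketchState* handle_;
};
```

A client would construct `StateHandle state("replica-0");`, check `state.valid()`, and could gate optional code paths on `gmxsketch_feature_version("state_handle") >= 1`. The trade-off is the inconvenience noted in the email exchange above: string and container data have to be copied through plain buffers at the boundary, in exchange for not depending on a matching standard library ABI.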
