digital-twin-working-group
We have quite a few digital twins at BlockScience, and Danilo pioneered the format of the cadCAD version of this. Its execution logic consists of six steps (referenced below as steps 1-6).
It may be very helpful in the long run to come to a consensus on the format of digital twins. There may also be a good way to integrate that format into cadCAD or cadCAD tools for ease of use by other users. Standardized digital twins lower the learning curve for everyone else.
A standardized model would also make it much easier to build out things like the integration testing that Sean/Emanuel are looking to work on for the Filecoin digital twin.
Summary: Defining a general best-practice data hierarchy can lead to cleaner code and support easier documentation and testing.
Using the Filecoin Digital Twin as an example, the hierarchy used there separated the pipeline into distinct layers: the data pull, the data processing, and the data aggregation.
This might seem excessive, but structuring it this way lets us document each part of the pipeline and build tests that isolate where the data went wrong: at the data pull (a field could be wrong), in the processing (e.g., dropping null values might eject data that should be kept), or at the aggregation level (e.g., a left join where one dataset is missing dates will silently drop the corresponding rows from the other query).
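As a rough illustration of this layering (the function names and fields here are hypothetical stand-ins, not the actual Filecoin code), each layer becomes a separate, individually testable function:

```python
import pandas as pd


def pull_sector_data() -> pd.DataFrame:
    """Layer 1 - data pull. Tests here catch a wrong or missing field."""
    # Stand-in for an API or database query returning raw records.
    return pd.DataFrame({"date": ["2021-01-01", "2021-01-02"],
                         "power": [10.0, None]})


def process_sector_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Layer 2 - processing. Tests here catch rows dropped by mistake."""
    processed = raw.copy()
    processed["date"] = pd.to_datetime(processed["date"])
    return processed.dropna(subset=["power"])


def aggregate_data(processed: pd.DataFrame, other: pd.DataFrame) -> pd.DataFrame:
    """Layer 3 - aggregation. Tests here catch rows lost to joins."""
    # A left join silently drops rows of `other` whose dates are missing here.
    return processed.merge(other, on="date", how="left")
```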
Summary: Defining a set of Jupyter notebook documentation guides, with examples/templates to build from, would set a known expectation of what to build. A tentative set of guides would be: data, PSUBs, and inputs/parameters. Every digital twin would get these as templates to be built out as the model is iterated on.
This can be better defined after initial conversations, but examples for data/PSUBs can be found within the Filecoin digital twin. These documents would be live in the sense that they describe what is going on, display the source code of the functionality using inspect, and then run live examples of each piece so a user can see how the underlying functionality works.
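A minimal sketch of that live pattern (the `clip_negative` function is a made-up stand-in for a real processing step): a notebook cell renders the source with `inspect.getsource` and then runs the function so behavior sits right next to the code.

```python
import inspect


def clip_negative(values):
    """Toy stand-in for a real processing step being documented."""
    return [max(v, 0) for v in values]


# Render the exact source of the step in the notebook output...
print(inspect.getsource(clip_negative))

# ...then run a live example so behavior sits right next to the source.
print(clip_negative([-1, 2, 3]))  # -> [0, 2, 3]
```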
Summary: An object-oriented approach could allow a clear designation of what functionality should be passed into the model. The caveat is that it might reduce flexibility.
The idea behind a class for the digital twin is that either an instance could be created or another class could inherit from the main class; functionality would then be filled in, and different pieces of the digital twin could be tested individually.
For example, one input might be the data function(s) that pull in the data for the model. Another input could be the mapping of notebooks to build into HTML. With that kind of setup, the components would be self-contained, and the class's execute function would call steps 1-6, which could also easily be called individually when working through different pieces of the model.
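A rough sketch of what such a class might look like; the class name, constructor arguments, and method bodies are assumptions for illustration, and the middle steps are only stubbed since they are not enumerated in these notes.

```python
from typing import Callable, Dict

import pandas as pd


class DigitalTwin:
    """Hypothetical base class: create an instance directly or subclass it."""

    def __init__(
        self,
        data_pulls: Dict[str, Callable[[], pd.DataFrame]],
        notebook_map: Dict[str, str],
    ):
        self.data_pulls = data_pulls      # e.g. {"sectors": pull_sector_data}
        self.notebook_map = notebook_map  # notebook path -> HTML output path

    def pull_data(self) -> Dict[str, pd.DataFrame]:
        # Step 1 (illustrative): run every registered data pull.
        return {name: pull() for name, pull in self.data_pulls.items()}

    def build_docs(self) -> None:
        # Final step (illustrative): convert the mapped notebooks to HTML.
        for notebook, html in self.notebook_map.items():
            print(f"would convert {notebook} -> {html}")

    def execute(self) -> None:
        # Calls steps 1-6 in order; each step can also be called on its own
        # when working through an individual piece of the model.
        self.pull_data()
        # ... steps 2-5 (processing, simulation, post-processing) go here ...
        self.build_docs()
```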
Summary: Defining a set of tests we want built for models, and then the process to set up integration testing (testing which only approves branches that pass), can help avoid breaking models.
Sean/Emanuel are actively working on this, building out integration testing for the Filecoin digital twin, and they can speak to it. The basic idea is to have data functionality tests on both data pulls and data processing, as well as tests for the mechanisms used in the PSUBs.
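A hedged sketch of such tests under pytest, reusing the hypothetical pull/process functions from the data-hierarchy sketch above (the import path and the mechanism function are likewise stand-ins); in CI, branches would only be approved when these pass.

```python
import pandas as pd
import pytest

# Hypothetical import path; these are the pull/process sketches from above.
from twin.data import pull_sector_data, process_sector_data


def mint_tokens(state: dict, amount: float) -> dict:
    """Stand-in for a state-update function used inside a PSUB."""
    return {**state, "tokens": state["tokens"] + amount}


def test_data_pull_schema():
    # Catches a wrong or missing field at the data-pull layer.
    raw = pull_sector_data()
    assert {"date", "power"} <= set(raw.columns)


def test_processing_keeps_valid_rows():
    # Catches over-aggressive cleaning at the processing layer.
    raw = pd.DataFrame({"date": ["2021-01-01"], "power": [1.0]})
    assert len(process_sector_data(raw)) == 1


def test_mechanism_updates_supply():
    # Mechanism-level test for a PSUB state update.
    assert mint_tokens({"tokens": 100.0}, 5.0)["tokens"] == pytest.approx(105.0)
```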