# Feasibility of a Block-Fidelity Filecoin Digital Twin
###### tags: `Digital Twin`
## Executive Summary
Refactoring the current daily-fidelity Digital Twin (dfDT) into a block-fidelity DT (bfDT) is technically possible but not necessarily feasible: the resource commitment is estimated to be significant, spanning a period of at least one quarter.
## Motivation
The primary goal of creating a bfDT is for the DT to surface a closer representation of actual Filecoin ecosystem dynamics, incorporating block-level state variables and their associated changes, propagation mechanisms, and dependencies.
As a motivating example, it was observed during development of the Dynamic Batch Balancer that a dfDT could not accurately represent the Base Fee updating algorithm at daily fidelity, as the Base Fee is updated block-by-block. Instead, a historical representation was used to aggregate up to daily fidelity, introducing a source of uncertainty into the dfDT extrapolation that would have been largely mitigated with a bfDT.
Providing a bfDT would allow the Base Fee to be updated precisely as in the actual ecosystem, capturing the intra-day volatility of Base Fee movements (it often moves by factors of 100 to 1,000 over a matter of hours). Volatility capture is important because it directly impacts the volume of FIL transactions, and can be used to calibrate a bfDT more effectively than is possible at daily fidelity.
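For illustration, the following is a minimal sketch of an EIP-1559-style per-epoch Base Fee update of the kind a bfDT could apply directly, rather than approximating via daily aggregates. The parameter values and starting conditions are illustrative placeholders, not protocol constants.

```python
# Minimal sketch of an EIP-1559-style per-epoch Base Fee update, of the kind
# a bfDT could evaluate epoch-by-epoch instead of aggregating to daily fidelity.
# Parameter values below are illustrative placeholders, not protocol constants.

GAS_TARGET = 5_000_000_000     # assumed per-epoch gas usage target
MAX_CHANGE_DENOM = 8           # assumed cap on relative change per epoch (1/8)
MIN_BASE_FEE = 100             # assumed floor, in attoFIL

def update_base_fee(base_fee: int, gas_used: int) -> int:
    """Return the next epoch's Base Fee given this epoch's gas usage."""
    delta = base_fee * (gas_used - GAS_TARGET) // GAS_TARGET // MAX_CHANGE_DENOM
    return max(base_fee + delta, MIN_BASE_FEE)

# A dfDT only sees daily aggregates; a bfDT replays the update epoch by epoch:
base_fee = 1_000_000  # attoFIL, illustrative starting value
for gas_used in [9_500_000_000, 8_000_000_000, 1_000_000_000, 500_000_000]:
    base_fee = update_base_fee(base_fee, gas_used)
    print(base_fee)
```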
## Challenges
The primary challenge in the _development_ of the bfDT will be the acquisition and ingestion of block-level data into the digital twin. This will require refactoring existing data, adding new block-level data, and providing a data infrastructure capable of processing both the refactored and the new data. Moreover, the digital twin's model will need to be updated to accommodate block-level dynamics (cf. the Base Fee discussion above).
The primary challenge in the _implementation and execution_ of the bfDT will be ensuring that the updated model reflects what is intended, via an updated testing path, and that the execution infrastructure can handle the increased fidelity of a bfDT. At the very least, moving to a bfDT increases the number of timesteps by a factor of 2,880 (a daily data interval vs. a 30-second block interval), and hardware requirements will need to be adjusted accordingly if timely simulations are to provide the desired insights. Innovations such as platform-agnostic digital twin containerization can also be developed and incorporated into the execution infrastructure, to help stakeholders access and run the bfDT while at the same time providing a verifiable execution workflow.
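The 2,880 factor follows directly from the ratio of a day to a 30-second epoch; a back-of-the-envelope calculation (with an illustrative one-year simulation horizon) shows how quickly the timestep count grows:

```python
# Back-of-the-envelope scaling of timestep counts when moving from daily
# fidelity (dfDT) to block fidelity (bfDT). The one-year horizon is illustrative.
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400
EPOCH_DURATION_S = 30                 # one Filecoin epoch (block) every 30 seconds

timesteps_per_day = SECONDS_PER_DAY // EPOCH_DURATION_S   # 2,880
horizon_days = 365

dfdt_timesteps = horizon_days                       # 365 at daily fidelity
bfdt_timesteps = horizon_days * timesteps_per_day   # 1,051,200 at block fidelity

print(f"fidelity factor: {timesteps_per_day}x")
print(f"dfDT: {dfdt_timesteps:,} timesteps, bfDT: {bfdt_timesteps:,} timesteps")
```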
## Project Overview
The following section provides the proposed work streams, broken down into:
1. *summary* of the work stream;
2. *difficulty*, i.e. technical and contextual expertise required;
3. *time required*, i.e. the bandwidth commitment required;
4. *necessity*, i.e. the immediacy of the required work stream in the creation of a viable bfDT.
For the latter three categories, qualitative rankings of "Low", "Medium" and "High" are used. These should be read as relative to the resources to be allocated to the work stream; e.g. a "High" ranking for difficulty or time required corresponds to a high investment of existing resource capacity in that stream. It is not known at present how to translate these qualitative rankings into exact person-hours of work, but it is expected that "Low" time and complexity would allow the existing BlockScience resources to engage in other tasks concurrently, while "High" time and complexity would, as noted above, likely involve a near-or-at-capacity resource investment until task completion.
It should also be noted that most of the work streams rank both "time required" and "necessity" as "High", indicating that this project requires a large resource investment in components that are necessary for a bfDT to achieve a quality threshold commensurate with the current dfDT. Proposed intermediate steps along this trajectory would include:
1. **Proof of Concept (PoC)**: A block-level implementation of a selection of policies and behaviors, with no expectation of matching historical data at high fidelity. The goal would be to ensure that model changes are internally consistent, and testing would cover e.g. run speeds and memory constraints to understand scale restrictions and the resulting hardware requirements. Data ingestion would be limited to flat files (e.g. CSV), with data pre-saved rather than pulled 'live' from the ecosystem (see the ingestion sketch after this list).
2. **Minimum Viable Product (MVP)**: Following the PoC, all current simulation model policies and behaviors would be converted to block-level versions. The focus would then be on ensuring that historical and backtested results are of high fidelity at the block level. The required data and hardware infrastructures would be developed contemporaneously to ensure this high-fidelity system can be assessed.
3. **Final bfDT**: A fully integrated bfDT would extend the MVP to provide extrapolation measures in addition to backtesting results, allowing simulation scenarios and 'what-if?' questions to be addressed. The data and hardware infrastructures would be optimized and tuned to the requirements for extrapolation, and the possible incorporation of live data would extend the functionality of the bfDT to a true "operational digital twin" capable of being run side-by-side with the Filecoin ecosystem.
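As a concrete illustration of the PoC's flat-file ingestion, the following is a minimal sketch; the file name and column names are hypothetical stand-ins for whatever pre-saved epoch-level extract is used.

```python
import pandas as pd

# Hypothetical pre-saved, epoch-level extract (flat CSV); column names such as
# "epoch", "timestamp" and "base_fee" are illustrative, not a fixed schema.
epochs = pd.read_csv("epoch_level_extract.csv", parse_dates=["timestamp"])

# The PoC would consume this frame directly as the exogenous signal driving the
# block-level policies, rather than querying the live ecosystem.
epochs = epochs.sort_values("epoch").set_index("epoch")
base_fee_signal = epochs["base_fee"].to_dict()
```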
## Proposed Work Streams
### *<u>Data</u>*
#### Data refactoring
- **Summary**: Current data pulls are based on daily data; specifically, block-level data are aggregated to daily values through summations and averages. For a bfDT, these pulls would need to be refactored to the _block_ (epoch-by-epoch) level (see the sketch after this list).
- **Difficulty**: Low
- **Time required**: Low
- **Necessity**: High
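A minimal sketch of the difference, using hypothetical column names and placeholder values: the current dfDT-style pull aggregates epoch-level data up to daily values, whereas the refactored pull would retain the epoch granularity.

```python
import pandas as pd

# Hypothetical epoch-level frame (two days of 30-second epochs); the column
# names and constant values are placeholders for real chain data.
df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=2 * 2880, freq="30s"),
    "base_fee": 100.0,            # attoFIL, placeholder
    "gas_used": 5_000_000_000,    # placeholder
})

# Current dfDT-style pull: aggregate block-level data up to daily values.
daily = df.set_index("timestamp").resample("1D").agg(
    {"base_fee": "mean", "gas_used": "sum"}
)

# bfDT refactoring: retain epoch-by-epoch granularity instead of aggregating.
epoch_level = df.set_index("timestamp")
```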
#### Data acquisition
- **Summary**: Achieving genuine block-level fidelity requires highly detailed data. Epoch-level tracking of the Collaterals and the Fees offers the biggest gain in representation accuracy, and tracking the Sector States would also benefit from epoch-level data. Acquiring this data is extremely computationally expensive: much of the work involves retrieving and parsing sector-level actions, and the data volume is of the order of hundreds of TB. Sacrificing this acquisition likely means sacrificing the high fidelity along with it.
- **Difficulty**: High
- **Time required**: High
- **Necessity**: High
#### Data infrastructure
- **Summary**: The data infrastructure performs all of the processing and curation of data for a DT. Local storage will not be sufficient to hold the data needed for a bfDT. In addition, previous experience indicates that 3rd-party data sources (e.g. Sentinel) have had data-pruning issues. Operational costs are also a consideration: exact figures are unknown, with lower estimates around USD 250/month.
- **Difficulty**: Medium
- **Time required**: High
- **Necessity**: High
### *<u>Model</u>*
- **Summary**: At block fidelity, the model will need to be rewired at every cadCAD substep, and additional substeps would likely be required at the block level. Compute speed depends on the number of substeps, and hardware requirements will need to be specified once the model is rewired (see the sketch after this list).
- **Difficulty**: Medium
- **Time required**: High
- **Necessity**: High
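As an illustration of the rewiring involved, here is a minimal cadCAD-style partial state update block in which the Base Fee is advanced once per epoch. The function names, parameter keys, and gas-usage policy are hypothetical; this is a sketch of the structure, not the current dfDT model code.

```python
# Sketch of a cadCAD-style partial state update block for a single epoch.
# Names and the gas-usage policy are hypothetical placeholders.

def p_gas_usage(params, substep, state_history, prev_state):
    """Policy: determine gas used in this epoch (from data or agent behavior)."""
    return {"gas_used": params["gas_target"]}  # placeholder behavior

def s_base_fee(params, substep, state_history, prev_state, policy_input):
    """State update: apply the per-epoch Base Fee rule."""
    base_fee = prev_state["base_fee"]
    delta = (base_fee * (policy_input["gas_used"] - params["gas_target"])
             // params["gas_target"] // params["max_change_denom"])
    return "base_fee", max(base_fee + delta, params["min_base_fee"])

partial_state_update_blocks = [
    {
        "policies": {"gas_usage": p_gas_usage},
        "variables": {"base_fee": s_base_fee},
    },
    # ...additional substeps (fees, collateral, sector state) would follow here.
]
```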
### *<u>Implementation</u>*
#### Testing
- **Summary**: Testing is a requirement for any significant refactoring project, particularly as moving from a dfDT to a bfDT is a novel challenge. Integration testing is the priority here (see the example after this list).
- **Difficulty**: Low
- **Time required**: Medium
- **Necessity**: High
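For example, an integration-style test could assert that replaying the per-epoch Base Fee update over a full day reproduces a reference daily aggregate within tolerance. The imported module path, reference value, and tolerance are all hypothetical placeholders tied to the earlier sketch.

```python
import pytest

from model.base_fee import update_base_fee  # hypothetical module path

def test_daily_aggregate_matches_epoch_replay():
    """Replaying 2,880 epoch-level updates (one day of 30-second epochs) should
    reproduce a reference daily mean within a chosen tolerance."""
    base_fee = 1_000_000            # attoFIL, illustrative starting value
    trajectory = []
    for _ in range(2_880):
        base_fee = update_base_fee(base_fee, gas_used=5_000_000_000)
        trajectory.append(base_fee)

    daily_mean = sum(trajectory) / len(trajectory)
    reference_daily_mean = 1_000_000  # placeholder for a historical daily value
    assert daily_mean == pytest.approx(reference_daily_mean, rel=0.05)
```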
#### Docker integration
- **Summary**: Containerizing cadCAD in e.g. Docker gives stakeholders platform-agnostic operability, in addition to a stable environment for e.g. 3rd-party library requirements and integration testing (cf. above).
- **Difficulty**: Low
- **Time required**: Medium
- **Necessity**: Low