# May 18th Filecoin Indexing

## Agenda

1. Any questions/discussion for schema?
2. Review Milestone 2 progress

## Progress

1. Database abstraction (more so part of Milestone 3)
    * Partial implementation for Postgres (pgx and sqlx drivers)
    * Partial implementation for CSV output
2. Extractor interfaces
3. Extractor implementations
    * Historical - done
    * Live/head-tracking - in-progress

## TODO

* Finish live/head-tracking extractor implementation
* Snapshotting service
* Finish tests and documentation
* Figure out how best to extract internal HAMT nodes (HAMT nodes that contain a link to another node and not a value/bucket entry)
    * The state.Tree exposed by Lotus only provides an iteration interface for leaf nodes (`tree.ForEach(func(addr address.Address, act *types.Actor) error {})`)
    * Same for the adt.Map implementation (`ForEach(ctx context.Context, f func(k string, val *cbg.Deferred) error)`)
    * Possible to address this without changing the tree interface or writing a new tree implementation, by backing the adt.Map with an adt.Store that uses our ipld.blocks table as its backing blockstore
    * But this breaks from the extractor interface used everywhere else...

## Extractor interface

The goal is to use go-ipld-prime to develop an extensible system for indexing *any* ipld.Node. Using the ipld.Node `MapIterator` interface, we can traverse all fields in an ipld.Node, automatically extracting the field names and values and indexing them into the database. To do so we need to map the field names to column names. This mapping will not always be one-to-one, and the adaptation is managed by the database abstraction. This has admittedly added significant complexity for this "one-off" implementation, but the goal is that this can be reused as we go on to index other chains as IPLD (vulcanize/laconic's internal intent).

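
The field-to-column traversal described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `mapIterator` is a local stand-in loosely modeled on go-ipld-prime's `MapIterator` (which yields ipld.Node keys and values, not strings), and `sliceIterator`, `extractRow`, and the field and column names are all hypothetical. Skipping unmapped fields is just one possible policy for the not-one-to-one case.

```go
package main

import (
	"errors"
	"fmt"
)

// mapIterator is a local stand-in loosely modeled on go-ipld-prime's
// MapIterator. Keys are plain strings and values are `any` so that the
// sketch has no external dependencies (assumption, not the real API).
type mapIterator interface {
	Next() (key string, value any, err error)
	Done() bool
}

// field is one name/value pair of a hypothetical node.
type field struct {
	name  string
	value any
}

// sliceIterator walks a fixed list of fields in order.
type sliceIterator struct {
	fields []field
	pos    int
}

func (s *sliceIterator) Done() bool { return s.pos >= len(s.fields) }

func (s *sliceIterator) Next() (string, any, error) {
	if s.Done() {
		return "", nil, errors.New("iterator exhausted")
	}
	f := s.fields[s.pos]
	s.pos++
	return f.name, f.value, nil
}

// extractRow visits every field and stores its value under the mapped
// column name. Fields without a mapping are skipped.
func extractRow(it mapIterator, columns map[string]string) (map[string]any, error) {
	row := make(map[string]any)
	for !it.Done() {
		name, value, err := it.Next()
		if err != nil {
			return nil, fmt.Errorf("iterating fields: %w", err)
		}
		col, ok := columns[name]
		if !ok {
			continue
		}
		row[col] = value
	}
	return row, nil
}

func main() {
	// Hypothetical field and column names, for illustration only.
	it := &sliceIterator{fields: []field{
		{"ExitCode", int64(0)},
		{"GasUsed", int64(1234)},
		{"Return", []byte{}},
	}}
	columns := map[string]string{
		"ExitCode": "exit_code",
		"GasUsed":  "gas_used",
	}
	row, err := extractRow(it, columns)
	if err != nil {
		panic(err)
	}
	fmt.Println(row["exit_code"], row["gas_used"], len(row)) // prints "0 1234 2"
}
```

In a real implementation the column mapping would presumably live in the database abstraction, so the same traversal could feed either the Postgres or the CSV backend.
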
Another goal of this interface is for the extractor for each type (table) to function independently, such that we can compose together and run different subsets of the suite of extractor-indexers. This adds some additional complexity and overhead to each individual extractor, as each extractor must return complete data models for each table; it cannot rely on the calling code to fill in missing data/associations. For example, when extracting Receipts for a range of epochs, we need to return not only the receipt ipld.Nodes themselves but also their related "block_cid" and "message_cid". These cannot be derived from the receipt itself and would be very expensive for the calling code to rediscover: it would basically have to repeat the lookup process that the extractor code already performed.

## Notes

Look into https://github.com/filecoin-project/go-hamt-ipld/pull/103

We will need a concurrent iterator; look into adding the required interface on top of the above.
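
One possible shape for the concurrent iterator is to wrap a serial `ForEach`-style callback and fan its entries out to a bounded pool of workers. The sketch below assumes that shape only. It does not reflect what the linked PR implements or the go-hamt-ipld API, every name in it (`forEachFunc`, `concurrentForEach`, `sumValues`) is hypothetical, and the values are plain `int`s standing in for deferred CBOR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// forEachFunc has the shape of a serial leaf iterator: it calls visit once
// per entry and is expected to stop early if visit returns an error.
type forEachFunc func(visit func(key string, val int) error) error

type entry struct {
	key string
	val int
}

// errStopped is returned to the serial iterator once a handler has failed.
var errStopped = errors.New("iteration stopped after handler error")

// concurrentForEach drives the serial iterator on the calling goroutine and
// hands each entry to one of `workers` goroutines. It returns the first
// handler error, or otherwise whatever the serial iterator returned.
func concurrentForEach(iterate forEachFunc, workers int, handle func(key string, val int) error) error {
	if workers < 1 {
		workers = 1
	}
	entries := make(chan entry)
	done := make(chan struct{}) // closed on the first handler error
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Entries already sent may still be handled after a failure.
			for e := range entries {
				if err := handle(e.key, e.val); err != nil {
					once.Do(func() {
						firstErr = err
						close(done)
					})
				}
			}
		}()
	}
	iterErr := iterate(func(key string, val int) error {
		select {
		case entries <- entry{key, val}:
			return nil
		case <-done:
			return errStopped
		}
	})
	close(entries)
	wg.Wait()
	if firstErr != nil {
		return firstErr
	}
	return iterErr
}

// sumValues is a small usage example: it adds up all values concurrently.
func sumValues(data map[string]int, workers int) (int, error) {
	iterate := func(visit func(string, int) error) error {
		for k, v := range data {
			if err := visit(k, v); err != nil {
				return err
			}
		}
		return nil
	}
	var mu sync.Mutex
	total := 0
	err := concurrentForEach(iterate, workers, func(_ string, v int) error {
		mu.Lock()
		defer mu.Unlock()
		total += v
		return nil
	})
	return total, err
}

func main() {
	total, err := sumValues(map[string]int{"a": 1, "b": 2, "c": 3}, 2)
	fmt.Println(total, err) // prints "6 <nil>"
}
```

Note that this only parallelizes the per-entry handling. The traversal itself stays serial, so it would not help if walking the HAMT is the bottleneck.
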