Indexer Ideas

Index.py

This script currently (on a polling loop every ~1 second):

  • queries block table in postgres to find last_processed_block
  • queries blockchain to find latest_block_height
  • iterates through each block height btw last_processed_block and latest_block_height, running process_block() to index each block into postgres
    • writes to: block, tx, msg, msg_event, msg_event_attr

Some ETL process

Keeping "retirements table" as our use case in mind

Want to write to a table that looks like:

Table name: retirements

Required data field	Location in GetTxsEvent
1	retirement date	(Postgresql/block)
2	project id	(Postgresql/msg_event_attr for newer batch_ids, otherwise REST API)
3	batch id	(postgresql/msg_event_attr)
4	credit class id	(postgresql/msg_event_attr)
5	amount retired	(postgresql/msg_event_attr)
6	issuer of the batch (REST API request)
7	batch start date (postgresql/msg_event_attr)
8	batch end date (postgresql/msg_event_attr)
9	retirement location	(postgresql/msg_event_attr)
10	project location (REST API request)
11	retired by (postgresql/msg_event_attr)
12	retirement reason (postgresql, tx.memo or event)

Three approaches:

  1. single ETL batch job that retirements table that sources data from postgres and REST API on every row that is written
  2. Two separate tables
    a. event_retire which is a pure SQL ETL (INSERT INTO SELECT)
    b. retirements which polls new rows in event_retire and joins them with extra data from a REST API to populate retirements table
  3. single retirements table sourcing only from postgres, and all non tx/event data is queried from the client

In any above case, we will need to answer questions like:

  • Manage how much data has been processed from each ETL process
    • setting up a last_processed table, which store "table_name" and last processed block height for each ETL process OR
    • do some joins / queries to figure that information out

Appendix

Example SQL that can help us generate some kind of table:

INSERT INTO SELECT chain_num, block_height, tx_idx, msg_idx, MAX(CASE WHEN key = 'owner' THEN value ELSE null END) as owner, MAX(CASE WHEN key = 'batch_denom' THEN value ELSE null END) as batch_denom, MAX(CASE WHEN key = 'amount' THEN value ELSE null END) as amount, MAX(CASE WHEN key = 'jurisdiction' THEN value ELSE null END) as jurisdiction, MAX(CASE WHEN key = 'reason' THEN value ELSE null END) as reason, (SUM(CASE WHEN key = 'amount' THEN 1 ELSE 0 END) > 1) has_duplicates from msg_event_attr where type like 'regen.ecocredit.v1.EventRetire' AND block_height >= '$START_HEIGHT' AND block_height < '$END_HEIGHT' group by 1,2,3,4