# Indexer Ideas ## Index.py This script currently (on a polling loop every ~1 second): - queries `block` table in postgres to find `last_processed_block` - queries blockchain to find `latest_block_height` - iterates through each block height btw `last_processed_block` and `latest_block_height`, running `process_block()` to index each block into postgres - writes to: `block`, `tx`, `msg`, `msg_event`, `msg_event_attr` ## Some ETL process Keeping "retirements table" as our use case in mind... Want to write to a table that looks like: ``` Table name: retirements Required data field Location in GetTxsEvent 1 retirement date (Postgresql/block) 2 project id (Postgresql/msg_event_attr for newer batch_ids, otherwise REST API) 3 batch id (postgresql/msg_event_attr) 4 credit class id (postgresql/msg_event_attr) 5 amount retired (postgresql/msg_event_attr) 6 issuer of the batch (REST API request) 7 batch start date (postgresql/msg_event_attr) 8 batch end date (postgresql/msg_event_attr) 9 retirement location (postgresql/msg_event_attr) 10 project location (REST API request) 11 retired by (postgresql/msg_event_attr) 12 retirement reason (postgresql, tx.memo or event) ``` Three approaches: 1. single ETL batch job that `retirements` table that sources data from postgres and REST API on every row that is written 2. Two separate tables a. `event_retire` which is a pure SQL ETL (`INSERT INTO SELECT`) b. `retirements` which polls new rows in `event_retire` and joins them with extra data from a REST API to populate `retirements` table 3. single `retirements` table sourcing only from postgres, and all non tx/event data is queried from the client In any above case, we will need to answer questions like: - Manage how much data has been processed from each ETL process - setting up a `last_processed` table, which store "table_name" and last processed block height for each ETL process OR - do some joins / queries to figure that information out ## Appendix Example SQL that can help us generate some kind of table: ```sql= INSERT INTO SELECT chain_num, block_height, tx_idx, msg_idx, MAX(CASE WHEN key = 'owner' THEN value ELSE null END) as owner, MAX(CASE WHEN key = 'batch_denom' THEN value ELSE null END) as batch_denom, MAX(CASE WHEN key = 'amount' THEN value ELSE null END) as amount, MAX(CASE WHEN key = 'jurisdiction' THEN value ELSE null END) as jurisdiction, MAX(CASE WHEN key = 'reason' THEN value ELSE null END) as reason, (SUM(CASE WHEN key = 'amount' THEN 1 ELSE 0 END) > 1) has_duplicates from msg_event_attr where type like 'regen.ecocredit.v1.EventRetire' AND block_height >= '$START_HEIGHT' AND block_height < '$END_HEIGHT' group by 1,2,3,4 ```