# Indexer Ideas
## Index.py
This script currently (on a polling loop every ~1 second):
- queries `block` table in postgres to find `last_processed_block`
- queries blockchain to find `latest_block_height`
- iterates through each block height btw `last_processed_block` and `latest_block_height`, running `process_block()` to index each block into postgres
- writes to: `block`, `tx`, `msg`, `msg_event`, `msg_event_attr`
## Some ETL process
Keeping "retirements table" as our use case in mind...
Want to write to a table that looks like:
```
Table name: retirements
Required data field Location in GetTxsEvent
1 retirement date (Postgresql/block)
2 project id (Postgresql/msg_event_attr for newer batch_ids, otherwise REST API)
3 batch id (postgresql/msg_event_attr)
4 credit class id (postgresql/msg_event_attr)
5 amount retired (postgresql/msg_event_attr)
6 issuer of the batch (REST API request)
7 batch start date (postgresql/msg_event_attr)
8 batch end date (postgresql/msg_event_attr)
9 retirement location (postgresql/msg_event_attr)
10 project location (REST API request)
11 retired by (postgresql/msg_event_attr)
12 retirement reason (postgresql, tx.memo or event)
```
Three approaches:
1. single ETL batch job that `retirements` table that sources data from postgres and REST API on every row that is written
2. Two separate tables
a. `event_retire` which is a pure SQL ETL (`INSERT INTO SELECT`)
b. `retirements` which polls new rows in `event_retire` and joins them with extra data from a REST API to populate `retirements` table
3. single `retirements` table sourcing only from postgres, and all non tx/event data is queried from the client
In any above case, we will need to answer questions like:
- Manage how much data has been processed from each ETL process
- setting up a `last_processed` table, which store "table_name" and last processed block height for each ETL process OR
- do some joins / queries to figure that information out
## Appendix
Example SQL that can help us generate some kind of table:
```sql=
INSERT INTO SELECT
chain_num, block_height, tx_idx, msg_idx,
MAX(CASE WHEN key = 'owner' THEN value ELSE null END) as owner,
MAX(CASE WHEN key = 'batch_denom' THEN value ELSE null END) as batch_denom,
MAX(CASE WHEN key = 'amount' THEN value ELSE null END) as amount,
MAX(CASE WHEN key = 'jurisdiction' THEN value ELSE null END) as jurisdiction,
MAX(CASE WHEN key = 'reason' THEN value ELSE null END) as reason,
(SUM(CASE WHEN key = 'amount' THEN 1 ELSE 0 END) > 1) has_duplicates
from msg_event_attr
where type like 'regen.ecocredit.v1.EventRetire'
AND block_height >= '$START_HEIGHT'
AND block_height < '$END_HEIGHT'
group by 1,2,3,4
```