changed 2 years ago

Indexer missing events report

The teritori indexer has had several ways to miss events since its inception. This report explains what the indexer does, what goes wrong, and the ideas we have for fixing it

The goal

The teritori indexer is an event-sourced system that sequentially reads all transactions of the teritori chain from genesis in order to populate a database with aggregated data that can't be properly queried from the chain and smart contracts directly

The naive flow

    1. Choose a chunk_size (max number of blocks queried and processed in one database transaction)
    2. Read or initialize a field in the database that keeps track of the height of the next block to process (height_cursor)
    3. Declare chunk_end = height_cursor + chunk_size
    4. Query the latest_height from a teritori node; if this height is lower than chunk_end, set chunk_end = latest_height + 1
    5. Start a database transaction
    6. Use the tendermint tx_search rpc to query a page of transactions contained in blocks in the half-open range [height_cursor, chunk_end[ (chunk_end excluded). See https://docs.tendermint.com/v0.34/rpc/#/Info/tx_search and https://docs.tendermint.com/v0.34/app-dev/indexing-transactions.html
    7. Process transactions in the page
    8. Go back to 6) to query the next page until there is no transaction left in the chunk
    9. Update the height_cursor to chunk_end
    10. Commit the database transaction
    11. Go back to 3) until killed
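
The flow above can be sketched as follows. This is only an illustration under stated assumptions: `FakeNode` and `FakeDB` are hypothetical in-memory stand-ins for a teritori node and the indexer database, not the real tendermint client or the indexer's actual code, and "processing" a transaction is reduced to recording its hash.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in for a node: maps block height -> list of tx hashes."""
    blocks: dict[int, list[str]]

    def latest_height(self) -> int:
        return max(self.blocks)

    def tx_search(self, start: int, end: int, page: int, per_page: int) -> list[str]:
        """Return one page (1-based) of the txs in the half-open range [start, end)."""
        txs = [tx for h in sorted(self.blocks) if start <= h < end for tx in self.blocks[h]]
        first = (page - 1) * per_page
        return txs[first:first + per_page]


@dataclass
class FakeDB:
    """Hypothetical stand-in for the database; commit() applies a whole chunk at once."""
    height_cursor: int = 1
    processed: list[str] = field(default_factory=list)

    def commit(self, txs: list[str], new_cursor: int) -> None:
        self.processed.extend(txs)
        self.height_cursor = new_cursor


def index_one_chunk(node: FakeNode, db: FakeDB, chunk_size: int, per_page: int = 2) -> bool:
    """Index one chunk. Returns False when there is no new block to process."""
    chunk_end = db.height_cursor + chunk_size
    latest_height = node.latest_height()
    if latest_height < chunk_end:
        chunk_end = latest_height + 1
    if chunk_end <= db.height_cursor:
        return False
    pending: list[str] = []  # plays the role of the open database transaction
    page = 1
    while True:
        txs = node.tx_search(db.height_cursor, chunk_end, page, per_page)
        if not txs:
            break  # no transaction left in the chunk
        pending.extend(txs)  # "process" the page
        page += 1
    db.commit(pending, chunk_end)  # cursor update and data land together
    return True


node = FakeNode({1: ["a"], 2: [], 3: ["b", "c", "d"]})
db = FakeDB()
while index_one_chunk(node, db, chunk_size=2):
    pass
```

In this sketch the cursor update and the processed data are committed together, so a crash in the middle of a chunk would leave the cursor untouched and the chunk would simply be replayed.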

We use tendermint tx_search because cosmos' GetTxsEvent (tx_search wrapper) is broken in our version of cosmos-sdk. See https://github.com/cosmos/cosmos-sdk/issues/11538

Problems

Too many txs

If we were to query all transactions in the chain, the indexer would be too slow to replay the whole chain

Fortunately, the tx_search rpc allows filtering on events, so we can request only cosmwasm-related transactions and thus greatly reduce the number of transactions to process
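
As an illustration, a filtered query for one chunk could be built like this. The helper name and the `message.module = 'wasm'` filter are assumptions made for the example (the exact event used by the indexer is not stated in this report); only the `AND`-joined conditions and the `tx.height` comparisons rely on the tendermint query syntax.

```python
def build_tx_search_query(start_height: int, end_height: int, event_filter: str) -> str:
    """Build a tx_search query for the half-open height range [start_height, end_height).

    Hypothetical helper: event_filter is a single condition such as
    "message.module = 'wasm'".
    """
    return f"{event_filter} AND tx.height >= {start_height} AND tx.height < {end_height}"


# Assumption: cosmwasm messages are tagged with message.module = 'wasm'.
query = build_tx_search_query(100, 200, "message.module = 'wasm'")
```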

Limited queries

While the tx_search call allows some filtering of transactions, its query language is very limited: conditions can only be joined with AND, there is no OR, so a single query can't return both cosmwasm and airdrop messages. Because of this, we could not implement the Claim airdrop quest

Load balanced nodes desync

With the naive implementation, if the teritori endpoint is load balanced, nodes behind the load balancer might not all be at the same height.
Due to this, there is a race condition between the height query and the tx_search queries when:

  • the indexer queries a first node, which is ahead, for the latest_height
  • it then queries another node for the txs up to that height, but this node doesn't have these txs yet and returns an empty chunk
  • the height_cursor is updated past the latest_height and the txs are lost
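
The scenario above can be reproduced with a small simulation. The `Node` class is a hypothetical in-memory fake, and the routing of each query to a different node is hard-coded to mimic an unlucky load balancer.

```python
class Node:
    """Hypothetical fake node that only knows blocks up to its own height."""

    def __init__(self, chain, height):
        self.chain = chain
        self.height = height

    def latest_height(self):
        return self.height

    def txs_in_range(self, start, end):
        """Txs in [start, end) that this node has already seen."""
        return [tx for h in range(start, end) if h <= self.height
                for tx in self.chain.get(h, [])]


chain = {1: ["a"], 2: ["b"], 3: ["c"]}
ahead = Node(chain, height=3)   # already has block 3
behind = Node(chain, height=2)  # has not received block 3 yet

start_cursor = 3
# The height query lands on the node that is ahead...
chunk_end = ahead.latest_height() + 1
# ...but the tx query lands on the node that is behind and returns an empty chunk.
processed = behind.txs_in_range(start_cursor, chunk_end)
height_cursor = chunk_end

# Txs in the range the cursor has skipped over that were never processed.
missed = [tx for h in range(start_cursor, height_cursor)
          for tx in chain[h] if tx not in processed]
```

Here the cursor moves to 4 while the tx in block 3 was never seen, which is exactly the loss described above.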

We fixed this by using a websocket client that ensures we connect to the same node for all queries. A side benefit of using a websocket client is that we reuse the connection

Tendermint tx indexer race condition

There is a race condition if we query the height and the txs between the moment the node has processed a block and the moment the tendermint tx indexer has finished indexing that block's transactions. See https://docs.tendermint.com/v0.34/app-dev/indexing-transactions.html for details about the tendermint tx indexer.

We have not found a proper fix for this yet, so we decided to wait one block before querying transactions, which decreases the chances of hitting the race condition
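
One possible reading of this mitigation is that the indexer stays one block behind the node's latest height. The sketch below compares the naive chunk end with a delayed one; the function names and the `delay` parameter are hypothetical and only illustrate the arithmetic.

```python
def chunk_end_naive(height_cursor: int, chunk_size: int, latest_height: int) -> int:
    """Exclusive end of the next chunk in the naive flow."""
    return min(height_cursor + chunk_size, latest_height + 1)


def chunk_end_delayed(height_cursor: int, chunk_size: int, latest_height: int,
                      delay: int = 1) -> int:
    """Exclusive end of the next chunk when staying `delay` blocks behind the node.

    If the result is not greater than height_cursor, there is nothing to
    process yet and the indexer should wait for more blocks.
    """
    return min(height_cursor + chunk_size, latest_height + 1 - delay)


# With the node at height 100, the naive flow processes block 100 right away,
# while the delayed flow stops at block 99 and leaves block 100 for later.
naive_end = chunk_end_naive(95, 10, 100)
delayed_end = chunk_end_delayed(95, 10, 100)
```

This only narrows the window: if the tx indexer lags by more than one block, txs can still be missed.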

Ideas

Use the psql tx indexer

Tendermint provides another implementation of the tx indexer that populates a PostgreSQL database, so we could run advanced SQL queries instead of using tx_search

This would be the optimal solution as it would grant us a lot more freedom on how we query transactions (and thus implement the Claim airdrop quest for example)
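
To show the kind of query this would unlock, here is a sketch that selects transactions matching either of two event types in one SQL statement. It uses an in-memory SQLite database and a simplified, hypothetical `demo_events` table so that it runs anywhere; the real schema of the psql indexer is different, and the `claim_airdrop` event type is an invented placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: one row per (tx, event type).
conn.execute("CREATE TABLE demo_events (height INTEGER, tx_hash TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO demo_events VALUES (?, ?, ?)",
    [
        (1, "a", "wasm"),
        (2, "b", "transfer"),
        (3, "c", "claim_airdrop"),
        (4, "d", "wasm"),
    ],
)

# One query covering both kinds of transactions, which tx_search cannot express.
rows = conn.execute(
    "SELECT DISTINCT height, tx_hash FROM demo_events "
    "WHERE height >= ? AND height < ? AND (type = ? OR type = ?) "
    "ORDER BY height",
    (1, 5, "wasm", "claim_airdrop"),
).fetchall()
```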

We started a node that uses the psql indexer, but it's very slow: we estimate it will take more than a month to finish replaying the chain

Patch tendermint to be able to query the tx indexer height

This would fix the tendermint tx indexer race condition but not the limited filter problem

Keep a tx hash cursor

We could

  • keep the height and hash of the last transaction processed by the teritori indexer
  • query transactions from this height
  • ignore transactions until the one after the last processed transaction hash

This would fix the tendermint tx indexer race condition but not the limited filter problem
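
A minimal sketch of the tx hash cursor idea, assuming the node returns transactions in chain order as `(height, hash)` pairs starting at the cursor height. The function name and the decision to raise when the cursor tx is not found are our own assumptions for the example.

```python
from typing import List, Optional, Tuple

Tx = Tuple[int, str]  # (block height, tx hash)


def skip_processed(txs: List[Tx], last_height: int, last_hash: Optional[str]) -> List[Tx]:
    """Drop every tx up to and including the last processed one.

    txs must be in chain order and start at last_height. With no cursor yet
    (last_hash is None), everything is returned.
    """
    if last_hash is None:
        return txs
    for i, (height, tx_hash) in enumerate(txs):
        if height == last_height and tx_hash == last_hash:
            return txs[i + 1:]
    # Assumption: not finding the cursor tx means the query or the node is
    # inconsistent, so we refuse to guess rather than reprocess or skip txs.
    raise ValueError("last processed tx not found in query result")


# Block 5 was queried again: "a" was already processed, "b" was indexed late.
txs = [(5, "a"), (5, "b"), (6, "c")]
remaining = skip_processed(txs, last_height=5, last_hash="a")
```

Because the next query starts again at the height of the last processed tx, a tx of that block that the tendermint tx indexer had not indexed yet during the previous pass is still picked up.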
