# Integration Engineering Exercise: GMX Data Pipeline

### What are we given?

1. hash (= tx-hash)
2. block_number
3. block_timestamp
4. log_index
5. hash_log_index (= hash || '_' || log_index)
6. address (= GMX Vault CA "0x489ee077994B6658eAfA855C308275EAd8097C4A")
7. protocol (= "GMX")
8. contract (= "Vault")
9. event_name (UpdatePosition, IncreasePosition, LiquidatePosition, DecreasePosition, ClosePosition)
10. args:
    - case = UpdatePosition: key, size, collateral, realisedPnl, averagePrice, reserveAmount, entryFundingRate
    - case = IncreasePosition: fee, key, price, isLong, account, sizeDelta, indexToken, collateralDelta, collateralToken
    - case = LiquidatePosition: key, size, isLong, account, markPrice, collateral, indexToken, realisedPnl, reserveAmount, collateralToken
    - case = DecreasePosition: fee, key, price, isLong, account, sizeDelta, indexToken, collateralDelta, collateralToken
    - case = ClosePosition: key, size, collateral, realisedPnl, averagePrice, reserveAmount, entryFundingRate

Remarks:

Positions are identified by their keys, but a new key is not necessarily generated for every new position. For example, if a USD-collateralised ETH long of account X is liquidated, the next position it opens will have the same key, so keys alone cannot be used to identify trades effectively. Additionally, an account cannot have more than one position that matches in all four aspects (account, collateralToken, indexToken, isLong); a new position with an existing key is regarded as a change to the existing position. Every account therefore has at most $|\text{collateral tokens}| \cdot |\text{index tokens}| \cdot 2$ keys.

![Screenshot 2023-12-21 at 00.01.50](https://hackmd.io/_uploads/BkavGxWwa.png)

Size/sizeDelta refers to the (change in) notional size. Collateral refers to the value of the collateral in USD. RealisedPnl refers to the PnL in USD. AveragePrice is the weighted entry/exit price. ReserveAmount corresponds to the available amount, in tokens, that can be used for the position. EntryFundingRate is the funding rate when the position is opened/updated. Fee refers to the fee paid for the position. Price corresponds to the market price when the order was executed. IsLong is a boolean that indicates the direction. Account refers to the address that took the trade. IndexToken is the address of the token that is traded. CollateralDelta refers to the change in collateral in USD. CollateralToken is the address of the token used as collateral. MarkPrice is the price at which the liquidation took place.

### How to determine trades based on event logs?

There are multiple events in the logs that refer to the same transaction and the same "position update".

![Screenshot 2023-12-20 at 22.57.36](https://hackmd.io/_uploads/SJR87JWDa.png)

As an example, for the hash 0x70ea1e4178ae897aaee2d9190be0375f03a08279ab391020a12d61b8e0eb0ada we have two events in the log: an IncreasePosition, followed by an UpdatePosition, which updates parameters such as the average price and the PnL based on the change in the position. A ClosePosition follows a DecreasePosition, where the size of the ClosePosition equals the sizeDelta of the DecreasePosition.

![Screenshot 2023-12-20 at 23.42.00](https://hackmd.io/_uploads/HJB66ybPT.png)

The important observation we will use is that the number of keys per account is strictly bounded.
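Since UpdatePosition and ClosePosition args carry only the key, it can help to keep a small lookup from each key to the (account, collateralToken, indexToken, isLong) tuple it stands for, built from the events that do carry those fields. The sketch below is only illustrative: the function name `build_key_lookup` and the assumption that events are available as dicts with `event_name` and parsed `args` are mine, not part of the existing pipeline.

```python
from typing import Dict, Tuple

# Events whose args carry account / collateralToken / indexToken / isLong;
# UpdatePosition and ClosePosition only carry the key itself.
KEY_CARRYING_EVENTS = {"IncreasePosition", "DecreasePosition", "LiquidatePosition"}

def build_key_lookup(events) -> Dict[str, Tuple[str, str, str, bool]]:
    """Map each position key to its (account, collateralToken, indexToken, isLong)."""
    lookup: Dict[str, Tuple[str, str, str, bool]] = {}
    for ev in events:
        if ev["event_name"] not in KEY_CARRYING_EVENTS:
            continue
        args = ev["args"]
        lookup[args["key"]] = (
            args["account"],
            args["collateralToken"],
            args["indexToken"],
            bool(args["isLong"]),
        )
    return lookup
```

Per account this lookup holds at most $|\text{collateral tokens}| \cdot |\text{index tokens}| \cdot 2$ entries, which is exactly the bound stated above.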
Here is how we can build each trader profile as shown on perps.ai and additionally implement an efficient way to process new events:

1. We sort the rows with the following query so that it is easier to differentiate between trades: `select args->>'key' as key, event_name, block_timestamp from protocols.arbi_gmx_r order by args->>'key', block_timestamp;`

![Screenshot 2023-12-21 at 01.06.13](https://hackmd.io/_uploads/Hy7KWWbDp.png)

2. This form makes our job much easier because the events that constitute a trade are now sequential and each consecutive row is related.

3. Our job is to find the index set $I$ such that every $i \in I$ denotes the start event (row) of a trade. As long as we have the starts, we also have the events that end the trades: they are simply the last events with the same key before the next start event.

4. Now assume we analyze the events of each key independently, in order. As mentioned, a ClosePosition always comes right after a DecreasePosition, with matching size and sizeDelta arguments. LiquidatePositions are independent, and the key in the args identifies which position the liquidation corresponds to. Therefore, a position for a key becomes active when we have an IncreasePosition followed by an UpdatePosition. A position for a key closes when we have a DecreasePosition followed by a ClosePosition, or when we have a LiquidatePosition. Also, when we have a DecreasePosition with sizeDelta $\approx$ size that is not followed by a ClosePosition, we can still interpret that as the position being closed. This way we can determine $I$ with a simple loop over the events (see the sketch at the end of this section).

5. The main issue with the order described in step 1 is that the trade_no is no longer obvious. Fortunately, as long as we have $I$ we can sort the events in $I$ by args->>'account' and block_timestamp, set trade_no to 0 for the first trade of each account, and increment it for each further trade of the same account. The remaining columns are computed per trade interval, i.e. over the events between consecutive start rows $r_i, r_{i+1} \in I$:
   - volume: sum of sizeDelta over the events in the interval.
   - last_size_token: 0 for inactive trades; otherwise size/averagePrice as reported by the last UpdatePosition event.
   - last_size_usd: size reported by the last UpdatePosition event.
   - collateral: collateral reported by the last UpdatePosition event.
   - leverage: size/collateral reported by the last UpdatePosition event.
   - realised_pnl: realisedPnl reported by the last UpdatePosition event.
   - pct_profit: realisedPnl/collateral reported by the last UpdatePosition event.
   - total_fees: sum of fee over the events in the interval.
   - max_collateral: maximum collateral reported by UpdatePosition in the interval.
   - max_size: maximum size reported by UpdatePosition in the interval.
   - open_block: block_number of the start event, i.e. $\text{block\_number}(i)$ for each $i \in I$.
   - close_block: as described, the end events can be determined easily once the start events are known, so close_block is derived analogously to open_block if the position is closed; otherwise it is undefined.
   - open_ts / close_ts: determined analogously to open_block / close_block.
   - entry_price: price reported by the first UpdatePosition in the interval.
   - close_price: price reported by the last non-UpdatePosition event in the interval.
   - last_avg_price: reported by the last UpdatePosition event in the interval.
   - liquidation_mark_price: defined if the position is liquidated; extracted from the markPrice field of the LiquidatePosition event.
   - is_open: false if we detect the position closing the way described in step 4.
   - last_update_ts / last_update_block: block_timestamp / block_number of the last event in the interval.
   - market: determined by a custom mapping for each indexToken.
   - is_long: found by checking the args->>'isLong' field of any event in the interval; all events in an interval have the same isLong field, since keys determine intervals and keys depend on the isLong field.

#### The overall complexity to set up the database based on past events is $O(n \log n)$, where $n$ is the number of events.

### Updating DB Based On New Event Logs

When a new event arrives, we check its key. If the key is already in the DB, we can determine which account, direction, collateralToken and indexToken the event's key corresponds to, and we can directly decide whether it implies a new trade, the end of a trade, or a change to an existing trade. If it is a new trade or the end of a trade, the steps are trivial; otherwise it is an Increase-/DecreasePosition followed by an UpdatePosition, and we then need to update the fields in our own DB using the fields (plus args) of the UpdatePosition. If the key has not been seen before, it trivially implies a new trade and we follow the steps described in the previous points. This way we can update our database in $O(\log n)$ time whenever a new event arrives. If we implement the trigger the right way, we can simply insert the past events sequentially into our database without having done any of the preprocessing explained in the previous part. The runtime complexity would once again be $O(n \log n)$, but the implementation would be considerably easier.
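To make steps 3–5 concrete (this is the sketch referenced in step 4), here is a minimal Python version of the boundary scan and the trade_no assignment. It assumes the rows have already been ordered by the query from step 1 and are available as dicts carrying `key`, `event_name`, `block_timestamp` and the parsed `args`; the names `find_trade_starts` and `assign_trade_no` are illustrative, not part of the existing pipeline.

```python
from typing import Dict, List

def find_trade_starts(events: List[dict]) -> List[int]:
    """Return the index set I over `events` sorted by (key, block_timestamp) as in step 1.

    Rules from step 4: an IncreasePosition on an inactive key starts a trade
    (strictly the position becomes active once its UpdatePosition follows, but the
    IncreasePosition row is taken as the start of the trade), while a ClosePosition
    or a LiquidatePosition ends it. The DecreasePosition-with-sizeDelta ~ size
    shortcut from step 4 is omitted here for brevity.
    """
    starts: List[int] = []
    active: Dict[str, bool] = {}  # key -> is a trade currently open for this key?
    for i, ev in enumerate(events):
        key, name = ev["key"], ev["event_name"]
        if name == "IncreasePosition" and not active.get(key, False):
            starts.append(i)
            active[key] = True
        elif name in ("ClosePosition", "LiquidatePosition"):
            active[key] = False
    return starts

def assign_trade_no(events: List[dict], starts: List[int]) -> Dict[int, int]:
    """Map each start index to its trade_no: 0 for an account's first trade,
    incremented for every further trade of the same account (step 5)."""
    ordered = sorted(
        starts,
        key=lambda i: (events[i]["args"]["account"], events[i]["block_timestamp"]),
    )
    trade_no: Dict[int, int] = {}
    prev_account, counter = None, 0
    for i in ordered:
        account = events[i]["args"]["account"]
        counter = counter + 1 if account == prev_account else 0
        trade_no[i] = counter
        prev_account = account
    return trade_no
```

The remaining columns from step 5 (volume, total_fees, the last_* fields, and so on) are then simple folds over the events between consecutive start rows with the same key.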
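For the incremental path, the per-event decision described above can be sketched as a single-event handler. In a real deployment this would be a trigger or an upsert against tables indexed by key, which is where the $O(\log n)$ per event comes from; here a Python dict stands in for that index, the function name `handle_new_event` is hypothetical, and the row's column names merely mirror the fields described earlier.

```python
def handle_new_event(ev: dict, open_positions: dict, closed_trades: list) -> None:
    """Apply one freshly ingested event to the maintained state.

    open_positions: key -> mutable row of the columns kept for the active trade.
    closed_trades:  list of finished trade rows.
    """
    name, args = ev["event_name"], ev["args"]
    key = args["key"]
    row = open_positions.get(key)

    if name == "IncreasePosition":
        if row is None:
            # Unseen (or currently inactive) key -> a brand new trade.
            row = {
                "key": key,
                "account": args["account"],
                "index_token": args["indexToken"],
                "collateral_token": args["collateralToken"],
                "is_long": bool(args["isLong"]),
                "open_block": ev["block_number"],
                "open_ts": ev["block_timestamp"],
                "volume": 0.0,
                "total_fees": 0.0,
                "is_open": True,
            }
            open_positions[key] = row
        row["volume"] += float(args["sizeDelta"])
        row["total_fees"] += float(args["fee"])

    elif name == "DecreasePosition" and row is not None:
        row["volume"] += float(args["sizeDelta"])
        row["total_fees"] += float(args["fee"])

    elif name == "UpdatePosition" and row is not None:
        # Change of an existing trade: refresh the "last reported" columns.
        row["last_size_usd"] = float(args["size"])
        row["collateral"] = float(args["collateral"])
        row["realised_pnl"] = float(args["realisedPnl"])
        row["last_avg_price"] = float(args["averagePrice"])
        row["last_update_block"] = ev["block_number"]
        row["last_update_ts"] = ev["block_timestamp"]

    elif name in ("ClosePosition", "LiquidatePosition") and row is not None:
        # End of the trade: mark it closed and move it out of the active set.
        row["is_open"] = False
        row["close_block"] = ev["block_number"]
        row["close_ts"] = ev["block_timestamp"]
        if name == "LiquidatePosition":
            row["liquidation_mark_price"] = float(args["markPrice"])
        closed_trades.append(open_positions.pop(key))
```

Replaying all historical events through such a handler in order reproduces the batch setup in $O(n \log n)$, as noted above, without a separate preprocessing pass.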