# IDKit Statistics - Under the hood
## Intro
This is a comprehensive guide describing how the statistics of the application work, how the pieces fit together, and the concepts behind them, so future devs don't break things.
#### The big picture
When a Transaction is closed (ended, completed or expired), we trigger a number of wind-down mechanisms, `Stats/Compute` being one of them, which creates and stores analytics data for that transaction in the `Stats/Buffer` table.
The statistics from a transaction and the transaction itself are stored separately so they can live independently from each other. If transactions are deleted, we still have the analytics data associated with them. It is also important to note that a statistics record has no tie/connection/reference to the transaction from which it was computed, with the exception of being stored under the `transaction` resolution (explained further below).
Following that, we have `Stats/Downsample`, which is scheduled to run every minute. Downsampling is the process by which we aggregate the statistics data by period to reduce the overall size of the data. Once downsampled, the records are stored in the `Stats/Main` table.
###### For example:
> If 3 transactions were closed today, *Compute* will create 3 records, each with its own data. However, when running analytics reports for the week, we do not need to know that the data came from 3 different records. Instead, we add up the data from all 3 and store it as "the statistics of that Tuesday" in a single record. (This is explained further in the *Downsample* section below.)
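To make that concrete, here is a minimal TypeScript sketch of the idea (the field names are illustrative, not the real schema):

```ts
// Hypothetical shape of a per-transaction statistics record; the real
// Compute output has more fields, but the aggregation idea is the same.
interface DayStats {
  completed: number;
  expired: number;
  totalDurationMs: number;
}

// Sum a set of per-transaction records into a single day record.
function aggregate(records: DayStats[]): DayStats {
  return records.reduce(
    (acc, r) => ({
      completed: acc.completed + r.completed,
      expired: acc.expired + r.expired,
      totalDurationMs: acc.totalDurationMs + r.totalDurationMs,
    }),
    { completed: 0, expired: 0, totalDurationMs: 0 },
  );
}

// Three transactions closed on the same Tuesday collapse into one record.
const tuesday = aggregate([
  { completed: 1, expired: 0, totalDurationMs: 42_000 },
  { completed: 1, expired: 0, totalDurationMs: 35_000 },
  { completed: 0, expired: 1, totalDurationMs: 120_000 },
]);
// tuesday -> { completed: 2, expired: 1, totalDurationMs: 197000 }
```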
The aggregated data is what we consider the final form of the analytics data, and it is what the `Stats/Query` lambda relies on to show analytics data in the IDKit frontend.
#### Where Rebuild comes in
The need to create `Stats/Rebuild` emerged from the fact that the requirements for the Reports module changed, and we wanted to keep track of a new set of information we previously did not track. All the records in the `Stats/Main` table should have a consistent schema, so we needed to be able to get the app to essentially re-generate all the analytics data it had up until now to accommodate the added columns.
While the *Rebuild* is not part of the main flow (it can only be triggered locally), it is still an important element of the architecture and we will likely have to run rebuilds in the future, although the less often the better.
### Compute
Stats/Compute is the lambda function which is responsible for calculating the various statistics from the Transaction that has been closed.
The Compute function is called from the Transaction/Completion lambda (the one responsible for "closing" transactions) although that could change in the future.
It is called with the Transaction record and the cause for closing in the request body. The function takes the transaction and scans through the object properties to extract the ones relevant for analytics purposes. It also performs a number of simple manipulations (which can be seen in the `Compute` class).
Once it is done creating the statistics record, the record is stored in the `Stats/Buffer` table with `resolution: transaction`.
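A rough sketch of what that looks like, assuming simplified property names (the real extraction lives in the `Compute` class):

```ts
// Rough sketch of Compute's input and output; property names are
// illustrative, not the actual schema.
interface ComputeRequest {
  transaction: Record<string, unknown>;
  cause: string; // why the transaction was closed (ended, completed, expired)
}

interface BufferRecord {
  resolution: 'transaction';
  period: string;   // when the transaction was closed
  tenantId: string;
  cause: string;
  // ...plus whatever other properties the Compute class extracts
}

// Extract only the analytics-relevant properties from the transaction.
function buildBufferRecord({ transaction, cause }: ComputeRequest): BufferRecord {
  return {
    resolution: 'transaction',
    period: new Date().toISOString(),
    tenantId: String(transaction['tenantId'] ?? 'unknown'),
    cause,
  };
}
// The resulting record is then written to the Stats/Buffer table.
```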
#### What's a resolution?
Resolution is the way in which we aggregate the stats data, and it serves as half of the primary key in DynamoDB. The possible values are:
* month
* day
* hour
* minute
* transaction
A record with the `hour` resolution contains the aggregated analytics for that hour for a given tenant (or tenant group like resellers, global, etc.). The same goes for all other resolutions except `transaction`, which is the smallest unit: it refers to the stats of a single transaction.
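As a sketch, a `Stats/Main` key could look like the following. The exact partition/sort key composition is an assumption here; the point is that resolution makes up one half of the primary key:

```ts
// The possible resolutions, from coarsest to finest.
type Resolution = 'month' | 'day' | 'hour' | 'minute' | 'transaction';

// Hypothetical key layout; the real attribute names and composition may differ.
interface StatsMainKey {
  // Partition key: the tenant (or tenant group such as a reseller, global, etc.)
  tenantId: string;
  // Sort key: the resolution plus the period it covers.
  resolutionPeriod: `${Resolution}#${string}`;
}

// Example: the aggregated stats of tenant-123 for 09:00-10:00 on 2024-05-14.
const exampleKey: StatsMainKey = {
  tenantId: 'tenant-123',
  resolutionPeriod: 'hour#2024-05-14T09',
};
```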
### Downsample
Stats/Downsample is the lambda function which is responsible for aggregating the `Stats/Buffer` data into time-based resolutions (their final form) and storing them in the `Stats/Main` table.
It is triggered in two places:
* Automatically, from a scheduled event run every minute
* From `Stats/Rebuild`, when we are recreating the stats records
/!\ There are different behaviors based on the caller /!\
The same manipulations are performed on the records but the difference is in the *source* of the records.
##### Scheduled Runs
During scheduled runs, downsample will attempt to fetch new records since the last run.
##### Rebuild
During rebuild, downsample will use the records it has received in the request body.
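A minimal sketch of how the two entry points could differ only in the source of the records (the event shape and helper names are assumptions):

```ts
// Sketch of how the two entry points differ only in where the records
// come from; the event shape and the helper below are assumptions.
type BufferRecordLike = Record<string, unknown>;

interface DownsampleEvent {
  // Present when invoked from Stats/Rebuild; absent on scheduled runs.
  records?: BufferRecordLike[];
}

// Placeholder for the query the scheduled run performs against Stats/Buffer.
async function fetchBufferRecordsSince(_since: Date): Promise<BufferRecordLike[]> {
  return [];
}

async function resolveSourceRecords(event: DownsampleEvent, lastRun: Date): Promise<BufferRecordLike[]> {
  // Rebuild: records arrive in the request body.
  if (event.records?.length) return event.records;
  // Scheduled run: fetch everything written to Stats/Buffer since the last run.
  return fetchBufferRecordsSince(lastRun);
}
```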
Downsample computes all resolutions in this order:
```
- minute
- hour
- day
- month
```
For each resolution, it takes the records and downsamples them into the next greater resolution. Once the records have been downsampled, it deletes the source records that were used to make the aggregation.
###### For example:
> First step is to grab all the records with `resolution: hour` for all hours of the day today.
> Then we aggregate that data, and store it into a record with `resolution: day` of that date.
> Once that aggregation is successful, we delete the `resolution: hour` records after a set delay (2 weeks in the case of `hour` resolutions).
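Putting it together, here is a simplified sketch of the cascade. The exact pairing of the four steps, the record shape, and the period truncation are assumptions used for illustration:

```ts
type Res = 'transaction' | 'minute' | 'hour' | 'day' | 'month';
type TimeRes = Exclude<Res, 'transaction'>;

interface StatsRecord {
  resolution: Res;
  period: string; // ISO-8601 timestamp, or an already-truncated period string
  count: number;  // stand-in for the real aggregated fields
}

// Each step aggregates records of the source resolution into the target.
const CASCADE: Array<[source: Res, target: TimeRes]> = [
  ['transaction', 'minute'],
  ['minute', 'hour'],
  ['hour', 'day'],
  ['day', 'month'],
];

// Truncate an ISO timestamp to the period string of the target resolution,
// e.g. "2024-05-14T09:32:10Z" -> "2024-05-14T09" for the hour bucket.
function periodOf(target: TimeRes, iso: string): string {
  const cut: Record<TimeRes, number> = { minute: 16, hour: 13, day: 10, month: 7 };
  return iso.slice(0, cut[target]);
}

// One pass of the cascade: group source records by target period and sum them.
function downsampleOnce(records: StatsRecord[], source: Res, target: TimeRes): StatsRecord[] {
  const buckets = new Map<string, StatsRecord>();
  for (const record of records) {
    if (record.resolution !== source) continue;
    const period = periodOf(target, record.period);
    const bucket = buckets.get(period) ?? { resolution: target, period, count: 0 };
    bucket.count += record.count;
    buckets.set(period, bucket);
  }
  // In the real lambda the aggregates are written to Stats/Main and the
  // source records are deleted after a set delay.
  return [...buckets.values()];
}

// Run all four steps in order; the output of one step feeds the next.
// (Simplified: the real lambda also deletes source records and merges
// with data already in Stats/Main.)
function downsampleAll(records: StatsRecord[]): StatsRecord[] {
  let all = [...records];
  for (const [source, target] of CASCADE) {
    all = all.concat(downsampleOnce(all, source, target));
  }
  return all;
}
```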
### Query
Stats/Query is the lambda function which is responsible for fetching the statistics data from the *Stats/Main* table and formatting it for consumption in the frontend of the IDKit portal app.
As the name indicates, it fetches the data from the `Stats/Main` table, shapes it in a way that is easily consumed by the frontend, and returns that response.
We use it for 2 reasons:
* Fetch analytics & usage data
* Fetch data for billing reporting
In both cases we fetch the same (or almost the same) data. The main difference is in the primary key and index we use: when fetching for billing purposes, we use the billing index.
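A hedged sketch of what such a query could look like with the AWS SDK (table, index and attribute names below are placeholders, not the real schema):

```ts
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, QueryCommand } from '@aws-sdk/lib-dynamodb';

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function queryStats(tenantId: string, forBilling: boolean) {
  const { Items } = await doc.send(
    new QueryCommand({
      TableName: 'Stats-Main',
      // Billing reports go through a dedicated index; analytics/usage
      // queries go through the table's primary key.
      IndexName: forBilling ? 'billing-index' : undefined,
      KeyConditionExpression: 'tenantId = :tenant',
      ExpressionAttributeValues: { ':tenant': tenantId },
    }),
  );
  return Items ?? [];
}
```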
### Rebuild
Stats/Rebuild is the ~~lambda function~~ script which is responsible for regenerating the analytics data from the transaction data present.
This is how it works (a code sketch follows the list):
* Pause the `Stats/Downsample` scheduled runs
* Delete all the `Stats/Buffer` and `Stats/Main` table data
* Fetch all the transaction records
* Send them all to the `Stats/Compute` lambda for processing
* Fetch all the records from `Stats/Buffer` table
* Send those records to `Stats/Downsample` lambda for final storage
* Resume the `Stats/Downsample` scheduled runs
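A high-level sketch of that flow in code, with every helper stubbed out as a placeholder (the real script talks to the scheduler, DynamoDB and Lambda):

```ts
// Placeholders standing in for the real operations.
const pauseDownsampleSchedule = async () => {};
const resumeDownsampleSchedule = async () => {};
const clearTable = async (_table: string) => {};
const scanTable = async (_table: string): Promise<object[]> => [];
const fetchAllTransactions = async (): Promise<object[]> => [];
const invokeLambda = async (_name: string, _payload: unknown) => {};

async function rebuild(): Promise<void> {
  await pauseDownsampleSchedule();                  // stop the scheduled runs

  await clearTable('Stats/Buffer');
  await clearTable('Stats/Main');

  // Re-run Compute for every transaction we still have.
  for (const transaction of await fetchAllTransactions()) {
    await invokeLambda('Stats/Compute', { transaction });
  }

  // Feed the freshly computed buffer records to Downsample for final storage.
  const bufferRecords = await scanTable('Stats/Buffer');
  await invokeLambda('Stats/Downsample', { records: bufferRecords });

  await resumeDownsampleSchedule();                 // back to normal operation
}
```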