# Chain Monitor Design Paper v0.2

#### by Georgios Delkos

May 2020

georgios.delkos@certik.io

### Intro

The chain monitor is a collection of microservices that gather all the information the system needs and release alerts or actions on demand. Rather than just monitoring the chain and reporting or alerting on changes to it, our system goes a step further and lets the user gather information about malicious actions and prepare or achieve early mitigation of new attacks. In the following sections, we present the user workflow and the modules, and explain the metrics and usage of those modules.

### Chain Support

As different chain protocols provide different metrics and attack surfaces, the chain monitor needs to adapt to those different scenarios. The main difference between protocols is the set of rules that can be applied to the monitor to record events. While the wire listener remains the same and keeps track of the incoming requests, the metrics and the rest of the modules adapt automatically to the protocol.

### The User Workflow

The workflow of the system should be very simple, based on a central dashboard where all the functionality is accessible to the user. At a high level, it is a combination of 3 steps (all of this takes place while the user is already registered and logged in).

In the first step, the user is prompted to create a new monitoring project for their chain by choosing the chain type from a list of supported chains and naming the project. Next, they upload their binary (here we should validate the binary hash) and choose settings for the number of nodes to spawn and their locations.

In the second step, the user is prompted to set up some initial rules for the monitoring system. Depending on the system type (PoW, PoS), different sets of rules are provided by the team as starting points.
The user can tweak the given rules, adding to or removing from them at will to suit their needs. Those rules are derived from the metrics section and are tweakable by value (over the threshold, percentage change, on/off) according to the type of the value.

Finally, the third step covers alerting and actions: the user is prompted to tell the system which alerting type to use for each scenario and, additionally, to set the system to take action on the issue. This should be a tiered system in which the user chooses, for each severity, the desired way to receive alerts and the actions to take. The tiers are:

* [Extreme]
* [High]
* [Medium]
* [Low/Info]

### System Modules:

#### The Spawner

The spawner takes the user's binary (node) and spawns several replicas on demand and in collocation, all of which are monitored. The goal is to let the user easily monitor different parts of the network and get more accurate results for the complete network. This system should be exposed as an upload function and a set of settings for the number of replicas to spawn and their locations.

#### The Chain Watcher

The chain watcher is the module that records all the on-chain information that the node receives from the network. It receives all the data that the node gets from the networking stack and all the outputs that the node produces.
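
Many of the chain watcher's metrics share one shape: a current value, an average per X blocks, and a percentage change per X blocks. A minimal Python sketch of that pattern follows. The class name `RollingMetric`, its methods, and the choice to compare the latest window of X blocks against the window before it are assumptions made for illustration; the design does not prescribe how the percentage change is computed.

```python
from collections import deque
from statistics import mean


class RollingMetric:
    """Tracks one per-block value (e.g. block time) over the last `window` blocks."""

    def __init__(self, window: int):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        # Keep two consecutive windows so the newer one can be compared to the older one.
        self.values = deque(maxlen=2 * window)

    def add(self, value: float) -> None:
        """Record the value observed for the newest block."""
        self.values.append(value)

    def current(self) -> float:
        """Most recently recorded value (requires at least one recorded value)."""
        return self.values[-1]

    def average(self) -> float:
        """Average over the most recent `window` values (fewer if not yet available)."""
        return mean(list(self.values)[-self.window:])

    def percentage_change(self):
        """Percent change of the latest window average versus the window before it.

        Returns None until two full windows exist or if the older average is zero.
        """
        if len(self.values) < 2 * self.window:
            return None
        data = list(self.values)
        previous = mean(data[:self.window])
        latest = mean(data[self.window:])
        if previous == 0:
            return None
        return (latest - previous) * 100.0 / previous


# Example with block times in seconds and X = 3 (values are made up).
block_time = RollingMetric(window=3)
for seconds in [10, 10, 10, 12, 12, 12]:
    block_time.add(seconds)
print(block_time.current(), block_time.average(), block_time.percentage_change())
```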
Metrics presented by the chain watcher are the following:

#### [Chain]

##### Blocktime

* Current block time
* Average block time per X blocks
* Percentage change in block time per X blocks

##### Hashrate (PoW)

* Current hashrate
* Average hash rate per X blocks\*
* Percentage change in the network's hash rate per X blocks\*
* Hashrate distribution
* Complete hash rate distribution monitoring with information about participants

##### Difficulty

* Difficulty
* Average difficulty per X blocks\*
* Percentage change in difficulty per X blocks\*

##### Block propagation

* Block signer
* List of block signers with all the relevant info per X blocks\*
* Continuous block propagation from a signer

##### Block size

* Current block size
* Average block size per X blocks\*
* Percentage change in block size per X blocks\*

##### Fees

* Fees usage
* Average fee usage per X blocks\*
* Percentage change in fees per X blocks\*

##### Reorgs

* Number of reorgs and the number of blocks being reorganized
* Average number of reorgs per X blocks\*
* Percentage change in reorgs per X blocks\*

##### Uncle Blocks

* Current uncle blocks count
* Average uncle blocks per X blocks\*
* Percentage change in uncle blocks per X blocks\*

##### Orphan blocks

* Current orphan blocks
* Average orphan blocks per X blocks\*
* Percentage change in orphan blocks per X blocks\*

#### [Transactions]

* Number of transactions per block
* Number of transactions per hour
* Number of transactions per X blocks\*
* Average number of transactions per X blocks\*
* Percentage change in transaction count per X blocks\*
* Complete transactions list
* Large transactions\*\*
* Non-validated transactions with info about the nodes that broadcast them
* Average transaction cost in the current block
* Average transaction cost per X blocks\*
* Percentage change in average transaction cost per X blocks\*
* Total transaction value in BTC/USD (more pairs)
* Total average transaction value per X blocks in BTC/USD
* Percentage change in average transaction value per X blocks in BTC/USD

#### [Nodes]

* Active nodes in the system
* Average active nodes per X blocks\*
* Percentage change in active nodes per X blocks\*
* Complete nodes list including all information
* Count of new nodes in the system
* Complete log from each node in the system

#### [Node System]

* CPU usage
* Memory usage
* Storage usage
* Network traffic in / out
* Network status

#### [Network]

* Network latency
* Average network latency per X blocks\*
* Percentage change in network latency per X blocks\*
* Tiered representation of the nodes depending on latency

#### [Smart Contracts]

* Smart contracts count
* Smart contract deployments
* Average smart contract deployments per X blocks\*
* Percentage change in smart contract deployments per X blocks\*
* Complete list of contracts and their usage

#### [Validators]

* Number of validators
* Number of new validators per X blocks
* Percentage change in validators count per X blocks\*

#### [Addresses]

* Number of active addresses on the system
* New addresses in the last X blocks\*
* Unique addresses on the system

#### [Chain economics]

* Total supply
* Inflation
* Rewarding list with the complete list of validators rewarded
* Slashing list with the complete list of validators slashed

#### [Errors]

* Complete monitoring of errors derived from the debug output and stderr of the node
* Errors pass through the reporter module and get sorted for reporting

#### [System Health Overview]

* General view of the network with the most important values

\* dependent on the chain type

\*\* over threshold

### The Wire Listener

The wire listener is the listening module of the system that attaches to the node port and records all the incoming unrecognized requests to the port. This gives the user an insight into what malicious users are trying against the node. We consider this functionality of great importance since the information it captures is of the highest value.
Being able to monitor and understand what malicious users are trying against a live system can lead the user to many early mitigations, since they will know in real time what the malicious actors are going after and what they are aiming for. The module records those requests and passes them, sorted, to the reporter module so that they can be presented more accurately to the user for evaluation.

Step by step:

* A malicious user sends an unknown request to the system.
* Record the unknown request if it targets a critical system.
* Move it to the evaluation system.
* Evaluate via the automated system what kind of issue the request presents.
* Record it in the incident table in a human-readable form with clear information.

### The Reporter

The reporter module takes data from the other modules as input and is responsible for creating human-readable reports for the user on demand. Input is represented in JSON format, and the reporter determines the severity of the issue from the JSON itself. The output is presented on the dashboard in a well-organized, sorted fashion, and the user can download the reports. The user can query the reporter for old reports and for every kind of report that the system provides, such as the health report.

### The Chain Health Module

The chain health module creates and runs a health check over the system, on demand or periodically. It contains sets of default rules against which those network health checks are performed. Those rules can be tweaked, removed, or added by the user. The module pushes the data to the reporter module after the health check completes.

### The Alerts Module

The alerts module alerts the user via email, message, or any other API that it can leverage to deliver the alerts. It lets the user create custom alerts depending on the issue and its severity.
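
The tiered alerting described in the user workflow can be sketched as a small routing table from severity to delivery channels. This is an illustrative Python sketch only: the names `Severity`, `AlertPolicy`, `set_channels`, and `route`, and the channel labels `"email"` and `"sms"`, are hypothetical, and actual delivery is left out.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    # The four tiers named in the user workflow.
    LOW_INFO = 1
    MEDIUM = 2
    HIGH = 3
    EXTREME = 4


@dataclass
class AlertPolicy:
    # Hypothetical per-tier settings: which channels to use for each severity.
    channels: dict = field(default_factory=dict)

    def set_channels(self, severity: Severity, channels: list) -> None:
        """Store the delivery channels the user chose for one severity tier."""
        self.channels[severity] = list(channels)

    def route(self, severity: Severity, message: str) -> list:
        """Return (channel, message) pairs to deliver for an issue of this severity."""
        return [(channel, message) for channel in self.channels.get(severity, [])]


# Example: extreme issues go to two channels, unconfigured tiers produce no alerts.
policy = AlertPolicy()
policy.set_channels(Severity.EXTREME, ["email", "sms"])
print(policy.route(Severity.EXTREME, "Large reorg detected"))
print(policy.route(Severity.LOW_INFO, "New node joined"))
```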
The alerts module can also notify a list of contacts that the user adds, and it can tier those contacts for better management of the alerting system.

### The Action Module

The action module acts on a detected issue according to the user's settings. This can include on-chain actions where possible (sudo chains) or off-chain actions such as informing exchanges about the state of the chain.

### The Default Rules

The default rules are a set of combinations of events that are already considered malicious. They could be a combination of known issues in a given time frame that indicates a certain attack. All those collections can be fully customized by the user to fit their needs, and they can additionally be routed to any alert or action. The team will update them on a regular basis.

## Chain Specific Rules

### Proof of work

#### Important metrics

* Hashrate
* Difficulty
* Chain Re-orgs
* Block Propagation
* Transaction Propagation
* Network Latency
* Peer Count
* Wire Listener

### Attack vectors and settings

#### DDoS

[Events]

* Increase in network latency
* High resource usage on nodes
* Slow block propagation
* Highly available nodes going offline

#### 51% Attack

[Events]

* Increase in the global hash rate
* A large reorg of the blockchain
* A single user proposing a large number of blocks as the legitimate chain
* Large transactions (most probably a deposit to an exchange)

#### Sybil Attack

[Events]

* Increase in the peer count
* New peers are closely tied in topology
* New peers are proposing a different chain

#### Eclipse Attack

[Events]

* Increase in the peer count
* New peers are closely tied in topology
* New peers are proposing a different hash for the latest block

#### Consensus Delay Attack

[Events]

* Increase in block broadcasting, mostly of invalid blocks
* Slow block propagation
* Loss of peers

#### Spam Attack

[Events]

* Increase in invalid requests on the network
* Slow block propagation
* High count of transactions
* High resource usage on nodes

### Proof of stake

#### Important metrics

* Total Supply
* Circulating Supply
* Validators Count
* Validators Weight
* Network Latency
* Peer Count
* Rewarding
* Slashing
* Node Health
* Wire Listener

While proof of stake networks do not have the wide attack surface of proof of work mechanisms, attacks still exist, mostly in the form of denial of service or excaution, and monitoring specific metrics will allow the user to detect them as soon as possible. Monitoring weight concentration in the network consensus might also give the user a clearer view of a malicious actor in the network trying to accumulate voting power. Finally, the wire listener will help the user detect and analyze new attempts against the system and mitigate an issue early by updating or fixing the components under attack.
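
Weight concentration can be checked with very little code. The Python sketch below computes each validator's share of the total stake and flags any validator above a configurable share. The function names, the input format (validator name mapped to stake), and the default threshold of one third are assumptions chosen for illustration, not values taken from this design.

```python
def weight_shares(stakes: dict) -> dict:
    """Map each validator to its fraction of the total staked weight."""
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    return {validator: stake / total for validator, stake in stakes.items()}


def concentration_alerts(stakes: dict, max_share: float = 1 / 3) -> list:
    """Return (validator, share) pairs whose share exceeds `max_share`, largest first."""
    shares = weight_shares(stakes)
    flagged = [(v, s) for v, s in shares.items() if s > max_share]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Example with made-up stakes: only validator "a" holds more than one third.
stakes = {"a": 50, "b": 30, "c": 20}
print(concentration_alerts(stakes))
```

A rule built on such a check could be routed to the alerts or action modules whenever the returned list is non-empty, with the threshold left as a user-tweakable value like the other rules.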