The following metrics, alerts, and safeguards are used to monitor and mitigate the risks involved with automated trading systems.

## Performance

#### Monitoring

- Portfolio net leverage: (total notional long minus total notional short) divided by net equity, where net equity is margin plus unrealized PnL
- Portfolio gross leverage: (total notional long plus total notional short) divided by net equity
- Portfolio PnL
- Live vs theoretical PnL
- Instrument/asset PnL (identify which assets or instruments are losing money due to idiosyncratic properties such as volatility, mean reversion, rug pulls, tick size, ...)
- Return after fill at different frequencies (identify bad quoting)

#### Alerting

- Alert on net leverage, gross leverage, or PnL limits
- Alert on underperforming assets

## Infrastructure

#### Monitoring

- Trading instance health (memory, CPU usage, syscall usage, GC time)

#### Alerting

- Trading instance heartbeat (ping)
- Trading instance running out of memory
- Trading instance high CPU usage

#### Safeguards

- Backup instance with whitelisted IPs for emergency closing

## Execution

#### Monitoring

- Orders per minute per instrument
- Volume per minute per instrument
- Order expiration rate
- Live vs theoretical portfolio allocation (portfolio distance metric)
- Order slippage: difference between the expected price of a trade and the price at which the order is actually executed
- Trade fill rate: percentage of orders that are successfully filled
- Time to execution: time between order placement and execution
- Trade size distribution: distribution of executed trade sizes
- Trade error rate: ratio of failed order placement requests to successful ones

#### Alerting

- High or low instrument volume
- High or low instrument order placement
- Biased order placement
- High order expiration rate
- High distance from theoretical portfolio
- Out-of-distribution trade size
- High trade error rate

#### Safeguards

- Execution layer enters reduce-only mode if an individual position breaches a net/gross exposure soft limit
- Execution layer reduces the position if it breaches a net/gross exposure hard limit
- Soft/hard limits depend on coin liquidity and volatility
- Soft/hard limits on margin utilization: enter global reduce-only mode (soft) or global position reduction mode (hard)
- Reduce-only kill switch
- Open-order kill switch

## Market

#### Monitoring

- Market volatility
- Asset/instrument volatility
- Asset correlations

#### Safeguards

- Block execution or switch to reduce-only on abnormal market activity
- Block execution or switch to reduce-only on abnormal instrument activity

## Exchange

#### Monitoring

- Exchange API disconnection or unresponsiveness
- Exchange API latency
- Instrument state (post only, ...)

#### Alerting

- Alert on high latency
- Alert on high failure rates

#### Safeguards

- Rebuild positions live using fills and compare periodically with position snapshot
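
As an illustration, three of the items above can be sketched in a few lines of Python: the net and gross leverage definitions, the soft/hard limit escalation, and the comparison of positions rebuilt from fills against a snapshot. This is a minimal sketch under stated assumptions, not a reference implementation. All names (`Mode`, `Limits`, `leverage`, `risk_mode`, `rebuild_positions`, `position_mismatches`) are hypothetical, the threshold values are placeholders, and fills are assumed to arrive as `(instrument, signed_quantity)` pairs.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    REDUCE_ONLY = "reduce_only"          # soft limit breached: no new exposure
    REDUCE_POSITION = "reduce_position"  # hard limit breached: cut exposure


@dataclass
class Limits:
    # Hypothetical thresholds; in practice they would depend on
    # instrument liquidity and volatility.
    soft: float
    hard: float


def leverage(notionals, equity):
    """Return (net, gross) leverage from signed position notionals.

    notionals: signed notional per position (long > 0, short < 0).
    equity: net equity, i.e. margin plus unrealized PnL.
    """
    if equity <= 0:
        raise ValueError("net equity must be positive")
    total_long = sum(n for n in notionals if n > 0)
    total_short = sum(-n for n in notionals if n < 0)
    net = (total_long - total_short) / equity
    gross = (total_long + total_short) / equity
    return net, gross


def risk_mode(value, limits):
    """Map an exposure metric to an execution mode via soft/hard limits."""
    magnitude = abs(value)
    if magnitude >= limits.hard:
        return Mode.REDUCE_POSITION
    if magnitude >= limits.soft:
        return Mode.REDUCE_ONLY
    return Mode.NORMAL


def rebuild_positions(fills):
    """Accumulate signed fill quantities per instrument."""
    positions = {}
    for instrument, signed_qty in fills:
        positions[instrument] = positions.get(instrument, 0.0) + signed_qty
    return positions


def position_mismatches(rebuilt, snapshot, tol=1e-9):
    """Return {instrument: (rebuilt_qty, snapshot_qty)} where they disagree."""
    mismatches = {}
    for instrument in set(rebuilt) | set(snapshot):
        ours = rebuilt.get(instrument, 0.0)
        theirs = snapshot.get(instrument, 0.0)
        if abs(ours - theirs) > tol:
            mismatches[instrument] = (ours, theirs)
    return mismatches
```

For example, `leverage([150.0, -50.0], 100.0)` gives a net leverage of 1.0 and a gross leverage of 2.0, and `risk_mode(2.0, Limits(soft=2.0, hard=3.0))` returns `Mode.REDUCE_ONLY`. A non-empty result from `position_mismatches` would be a natural trigger for an alert or for one of the kill switches.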