# JIP-3 - Structured Logging Codes based on JAMSNP
### JAM Network Monitoring
For monitoring a JAM network (tiny to full) with [JIP-3: Structured Logging](https://hackmd.io/@polkadot/jip3), I suggest we reuse [JAMSNP](https://github.com/zdave-parity/jam-np/blob/main/simple.md)'s exact encoding method to the fullest extent possible, for both sends and receives, to support rapid implementation and publication of a network's logs, with the following trivial modifications:
* the code used for responses is the request code minus 128, and the notion of "sender_id" is replaced with "receiver_id" (sketched below)
* UP 0 uses 130 for sending the Handshake and 2 for the Announcement
Any changes going from JAMSNP to JAMNP, including the codec encoding of the JAM representations, would be expected to carry over into JIP-3, making telemetry effectively "whatever you do in QUIC, you also send to the telemetry server".
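As a minimal sketch of this code mapping (the helper name is illustrative, not normative):

```rust
/// Sketch (not normative): map a send/request code to the code used for the
/// matching receive/response record, per the "minus 128" rule above.
fn response_code(request_code: u8) -> Option<u8> {
    if request_code >= 128 {
        // e.g. CE 128 -> 0, CE 137 -> 9; UP 0 Handshake 130 -> Announcement 2
        Some(request_code - 128)
    } else {
        // codes below 128 are already receive/response codes
        None
    }
}
```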
This enables any team with a working tiny testnet to share its entire logs with another team, prior to working with them directly, simply by providing its telemetry server (and its genesis state, which includes the validator set). Each team can then check whether the other team's messaging is fully compatible and adjust implementations until it is. In addition, this can be used as a prerequisite for entering a larger network, including the "full" JAM Toaster.
| Request (Code + Name) | Request | Response |
|------------------------|-----------|---------------------|
| [CE 128: Block request](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-128-block-request) | `128: Header Hash ++ Direction ++ Maximum Blocks` | `0: [Block]` |
| [CE 129: State request](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-129-state-request) | `129: Header Hash ++ Key (Start) ++ Key (End) ++ Maximum Size` | `1: [Boundary Node]`, `[Key ++ Value]` |
| [UP 0: Block announcement](https://github.com/zdave-parity/jam-np/blob/main/simple.md#up-0-block-announcement) | `130: Handshake` | `2: Announcement`; <br/><br/> Metadata: `Slot`<br/> `Microseconds to author block` |
| [CE 131/132: Safrole ticket distribution](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-131132-safrole-ticket-distribution) | `131 (or 132): Epoch Index ++ Ticket (Epoch index should identify the epoch that the ticket will be used in)` <br/><br/> Metadata: <br/>`Microseconds to generate ticket` | — |
| [CE 133: Work-package submission](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-133-work-package-submission) | `133: Core Index ++ Work-Package`, `[Extrinsic...]`; <br/><br/> Metadata: `WorkPackageHash` `Microseconds to refine WP` | — |
| [CE 134: Work-package sharing](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-134-work-package-sharing) | `134: Core Index ++ Segments-Root Mappings`, `Work-Package Bundle`; <br/><br/> Metadata: `WorkPackageHash` `Microseconds to Encode Bundle` | `6: Work-Report Hash ++ Ed25519 Signature`; <br/><br/> Metadata: `Microseconds to refine WP` |
| [CE 135: Work-report distribution](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-135-work-report-distribution) | `135: Guaranteed Work-Report`; <br/><br/> Metadata: `WorkPackageHash` | — |
| [CE 136: Work-report request](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-136-work-report-request) | `136: Work-Report Hash` | `8: Work-Report` |
| [CE 137: Shard distribution](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-137-shard-distribution) | `137: Erasure-Root ++ Shard Index`; <br/><br/> Metadata: `WorkPackageHash` | `9: Bundle Shard`, `[Segment Shard]`, `Justification`; <br/><br/> Metadata: `WorkPackageHash` `Microseconds to Generate Justification` |
| [CE 138: Audit shard request](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-138-audit-shard-request) | `138: Erasure-Root ++ Shard Index`; <br/><br/> Metadata: `WorkPackageHash` | `10: Bundle Shard`, `Justification`; <br/><br/> Metadata: `WorkPackageHash` |
| [CE 139/140: Segment shard request](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-139140-segment-shard-request) | `139/140: [Erasure-Root ++ Shard Index ++ len ++ [Segment Index]]` | `11/12: [Segment Shard]`, (for 140) `[Justification]`; <br/><br/> Metadata: `WorkPackageHash` |
| [CE 141: Assurance distribution](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-141-assurance-distribution) | `141: Assurance`, `Slot`, `HeaderHash` | — |
| [CE 142: Preimage announcement](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-142-preimage-announcement) | `142: Service ID ++ Hash ++ Preimage Length` | - |
| [CE 143: Preimage request](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-143-preimage-request) | `143: Hash` | `15: Preimage` |
| [CE 144: Audit announcement](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-144-audit-announcement) | `144: Header Hash ++ Tranche ++ Announcement`, `Evidence` | `—` |
| [CE 145: Judgment publication](https://github.com/zdave-parity/jam-np/blob/main/simple.md#ce-145-judgment-publication) | `145: Epoch Index ++ Validator Index ++ Validity ++ Work-Report Hash ++ Ed25519 Signature` <br/><br/> Metadata: `WorkPackageHash` | `—` |
Since JAMSNP already has each message transmitted as the size of the message followed by the message content, multi-part telemetry messages (those shown with a ", ") are already well handled. If the telemetry endpoint hits an error for some part of the received data, it should simply ignore the rest of the data and record a warning. This is simpler than devising frame separators, escaping the separators if they appear in the content, and so on.
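A minimal sketch of that behaviour, assuming each part is framed as a 4-byte little-endian length followed by that many content bytes (the framing itself follows JAMSNP; the helper name is illustrative):

```rust
/// Split one received telemetry record into its size-prefixed parts.
/// On a malformed or truncated part, record a warning and ignore the rest,
/// as described above.
fn split_parts(mut data: &[u8]) -> Vec<&[u8]> {
    let mut parts = Vec::new();
    while data.len() >= 4 {
        let len = u32::from_le_bytes([data[0], data[1], data[2], data[3]]) as usize;
        if data.len() < 4 + len {
            eprintln!("warning: truncated telemetry part; ignoring remaining bytes");
            return parts;
        }
        parts.push(&data[4..4 + len]);
        data = &data[4 + len..];
    }
    parts
}
```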
### Code Usage
Following the request/response rule above, codes 128-255 are used for sends (requests) and codes 0-127 for the corresponding receives (the request code minus 128); a lookup sketch follows the specific assignments below.
* 0-31 - Reserved for JAMSNP/JAMNP (some used above)
* 32-63 - Reserved for implementers
* 64-95 - Reserved for GRANDPA 1.0 + 2.0 Finality
* 96-112 - Reserved for BLS / Beefy
* 113-127 - Reserved for future JAM protocol components
* 128-159 - Reserved for JAMSNP/JAMNP (some used above)
* 160-191 - Reserved for implementers
* 192-223 - Reserved for GRANDPA 1.0/2.0 (including syncing; see below)
* 224-240 - Reserved for BLS/Beefy
* 241-255 - Reserved for future JAM protocol components
#### GRANDPA
* 192 VoteMessage
* 193 CommitMessage
* 194 NeighborMessage
* 195 Catchup
* 196-207 Reserved for GRANDPA
#### Syncing
* 208 Sync Start
* 209 Sync Finish
* 210 Warp Sync Start
* 211 Warp Sync Finish
* 212-223 Reserved for Syncing
#### BLS/Beefy
* 224 BLS Signature Publication
* 225 BLS Aggregate Signature Publication
* 226-240 Reserved for BLS/Beefy
#### Other
* 241-255 Reserved for future JAM sub-protocols
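A sketch of the range allocation above as a lookup (the helper name is illustrative; the ranges are exactly those listed):

```rust
/// Sketch of the code-range allocation above. Codes 0..=127 are the receive
/// half; adding 128 gives the send half, so the same ranges apply modulo 128.
/// The specific assignments listed above (e.g. 192 VoteMessage, 208 Sync
/// Start, 224 BLS Signature Publication) fall inside these reserved ranges.
fn code_category(code: u8) -> &'static str {
    match code % 128 {
        0..=31 => "JAMSNP/JAMNP",
        32..=63 => "Implementers",
        64..=95 => "GRANDPA 1.0/2.0 finality and syncing",
        96..=112 => "BLS / Beefy",
        _ => "Future JAM protocol components",
    }
}
```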
## Distributed Tracing
The following timings of key JAM processes could be self-reported along with the above:
* 0 - Block announcement should include timing to _author_ the block
* 131/132 - Ticket generation
* 133/134/145 - Work package refinement + auditing
* 137/138/139/140 - DA encoding and Justification generation
* 209+211 - Syncing operations
* 224/225 - BLS Signature Generation
A simple 4-byte integer prefix representing the number of microseconds will be sufficient, but for syncing the units may best be milliseconds or seconds.
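A sketch of that prefix (little-endian encoding and saturation at `u32::MAX` are assumptions of this sketch, not something JIP-3 has fixed):

```rust
use std::time::Duration;

/// Encode an elapsed time as the 4-byte microsecond prefix described above.
fn timing_prefix(elapsed: Duration) -> [u8; 4] {
    u32::try_from(elapsed.as_micros()).unwrap_or(u32::MAX).to_le_bytes()
}

/// Telemetry-server side decode.
fn read_timing_prefix(bytes: [u8; 4]) -> Duration {
    Duration::from_micros(u32::from_le_bytes(bytes) as u64)
}
```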
The telemetry server can then usefully pipe this into Jaeger tracing (or a similar open-source visualization) to show connected spans of computation. For this purpose, however, it is highly desirable to instrument the work-package hash (133/134/145/137/138/139/140) and the header hash (0/192/193/224/225), so the telemetry instrumentation doesn't have to do any JAM-specific indexing to support visualization of a "trace" for any work-package hash (refining and auditing) or header hash (authoring, validating, finalizing).
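One way to avoid that indexing entirely, sketched here under the assumption of 32-byte JAM hashes and 16-byte Jaeger/OpenTelemetry trace ids, is to derive the trace id directly from the instrumented hash, so spans reported by different validators group into one trace:

```rust
/// Sketch: derive a 16-byte trace id from the 32-byte work-package hash or
/// header hash carried in the metadata. Simple truncation is an assumption
/// of this sketch, not part of the proposal.
fn trace_id_from_hash(hash: &[u8; 32]) -> [u8; 16] {
    let mut id = [0u8; 16];
    id.copy_from_slice(&hash[..16]);
    id
}
```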
### Key Performance Bottlenecks
It may be useful to instrument the following known performance bottlenecks (a minimal timer sketch follows this list).
* Erasure Decoding and Encoding (see RFC-139 - Faster Erasure Coding)
- Time to fetch import segments
- Time to fetch bundle shards
- Time to reconstruct work package bundle
* Block operations:
- Time to Validate a block
- Time to Assure data
- Time to read state during accumulation
- Time to write state post-accumulate
- Time spent on each service in ordered accumulation
* Key event failures
- Timeouts of work packages
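A minimal timer guard one might wrap around these operations (names are illustrative, not part of the proposal); the elapsed time can then be emitted as the 4-byte microsecond metadata sketched earlier:

```rust
use std::time::{Duration, Instant};

/// Illustrative timer for instrumenting the bottlenecks listed above,
/// e.g. erasure decoding or block validation.
struct Timed<'a> {
    label: &'a str,
    start: Instant,
}

impl<'a> Timed<'a> {
    fn start(label: &'a str) -> Self {
        Self { label, start: Instant::now() }
    }

    /// Stop the timer, log the duration, and return it for reporting.
    fn stop(self) -> Duration {
        let elapsed = self.start.elapsed();
        eprintln!("{}: {} us", self.label, elapsed.as_micros());
        elapsed
    }
}

// Usage (hypothetical call site):
// let t = Timed::start("reconstruct work package bundle");
// ...reconstruct the bundle...
// let elapsed = t.stop();
```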
### JAM Implementer Leaderboard
With benchmark work packages + services, an implementer leaderboard should be developed, focussing on KPIs central to meeting the M3/M4 "Kusama-performance" and "Polkadot-performance" milestones -- where we need guidance on what this means exactly, so that it falls out of the telemetry server dataset.