# Provenance Marks vs. Cryptographic Event Logs

Wolf McNally and Christopher Allen

© 2025 Blockchain Commons

[Provenance Marks](https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-2025-001-provenance-mark.md) and [Cryptographic Event Logs](https://digitalbazaar.github.io/cel-spec/) both deliver decentralized, tamper-evident methods to link data changes over time.

## Cryptographic Event Logs (CEL)

CEL stores a sequence of “events” in a JSON or CBOR log. Each event has:

- A reference (`previousEvent`) to the prior event’s hash.
- Witness signatures that prove the event existed at a specific time, without revealing the data’s content to the witnesses (“oblivious” signatures).
- An “operation” field (create, update, deactivate) and a “proof” array for multi-party attestation.

CEL prescribes a specific and opinionated structure: you embed your update or data reference in the `event` object, store multiple witness signatures in the `proof` array, and keep chunking references in `previousLog` to avoid gigantic files. This is a robust approach for multi-witness validation and chunked distribution of large logs.

## Provenance Marks (PM)

PM aims for a minimal, flexible mechanism for anchoring authenticity. Each PM is a compact binary structure that commits to:

- A current random `key` (drawn from a secret seed’s PRNG).
- A `hash` that depends on the next key (which remains hidden until the next mark).
- A shared chain identifier (`id`), sequence number, and date stamp.
- An optional open-ended `info` field that can hold arbitrary dCBOR data.

A key difference is that PM pre-commits to the next mark. In other words, each mark’s `hash` includes the *undisclosed next key*, so no one (including the author) can alter the chain order without invalidating previously published marks. That’s not something CEL, with its mono-directional “previousEvent” references, can enforce.
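
The forward commitment can be illustrated with a minimal Python sketch. This is not the Provenance Mark wire format: the key derivation (HMAC-SHA-256 over the sequence number), the field order inside the hash, the length prefixes, and the names `derive_key`, `commitment`, `make_mark`, and `precedes` are all assumptions made for illustration. Only the idea comes from the description above: each mark's `hash` is computed over its own fields plus the next, still-hidden key.

```python
import hashlib
import hmac
from dataclasses import dataclass


def derive_key(seed: bytes, seq: int) -> bytes:
    """Hypothetical stand-in for the seeded PRNG: the key for mark number `seq`."""
    return hmac.new(seed, seq.to_bytes(4, "big"), hashlib.sha256).digest()


def commitment(chain_id: bytes, seq: int, date: int, key: bytes,
               next_key: bytes, info: bytes) -> bytes:
    """Assumed hash image: the mark's own fields plus the still-hidden next key."""
    h = hashlib.sha256()
    parts = (key, next_key, chain_id, seq.to_bytes(4, "big"),
             date.to_bytes(8, "big"), info)
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))  # length prefix keeps fields unambiguous
        h.update(part)
    return h.digest()


@dataclass(frozen=True)
class Mark:
    chain_id: bytes    # shared chain identifier (`id`)
    seq: int           # sequence number
    date: int          # date stamp, simplified here to Unix seconds
    key: bytes         # key revealed by this mark
    hash: bytes        # commits to the next mark's key without revealing it
    info: bytes = b""  # optional payload (dCBOR in the real format)


def make_mark(seed: bytes, chain_id: bytes, seq: int, date: int,
              info: bytes = b"") -> Mark:
    """Only the seed holder can call this, because it needs the next key."""
    key = derive_key(seed, seq)
    next_key = derive_key(seed, seq + 1)
    digest = commitment(chain_id, seq, date, key, next_key, info)
    return Mark(chain_id, seq, date, key, digest, info)


def precedes(prev: Mark, nxt: Mark) -> bool:
    """True if `nxt` reveals exactly the key that `prev` committed to."""
    expected = commitment(prev.chain_id, prev.seq, prev.date,
                          prev.key, nxt.key, prev.info)
    return (nxt.chain_id == prev.chain_id
            and hmac.compare_digest(prev.hash, expected))


seed = b"demo seed, not a real secret"
chain = [make_mark(seed, b"demo-chain", i, 1_700_000_000 + 60 * i) for i in range(3)]
forged = make_mark(b"attacker seed", b"demo-chain", 1, 1_700_000_060)

print(precedes(chain[0], chain[1]))  # genuine successor
print(precedes(chain[0], chain[2]))  # a mark was skipped
print(precedes(chain[0], forged))    # same position, but made from a different seed
```

In this sketch a verifier needs no signature key: checking a link only requires recomputing one hash from two published marks, while producing an acceptable successor requires the seed.
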
Because each mark pre-commits to the next key, publishing a new mark reveals the key that the previous mark’s hash committed to, and that revelation takes the place of a cryptographic signature. This provides security while keeping provenance marks lightweight in both storage and computational cost.

## Replicating CEL’s Features with PM

Despite PM’s minimal default structure, it can implement everything CEL does by embedding rich metadata in the `info` field:

1. **Multi-Witness Signatures**: Collect third-party signatures that attest to a hash of the key to be revealed in the soon-to-be-published mark. These are oblivious signatures, and can be returned as signed Gordian Envelopes that contain the signer's identity and timestamp. You then include those Envelopes in the `info` field Envelope. Because `info` is included in the image used to generate the mark's hash, each mark cryptographically commits to those external attestations.
2. **Data References and Chunking**: PM can store the same references to large external data (URI lists, media types, etc.) in `info`, either as raw data or metadata. Collections of sequential marks can be chunked into log "pages". The difference from CEL is that baseline PM doesn’t prescribe how you do it; you define the chunking logic in `info`.
3. **Multiple “Create/Update/Deactivate” Operations**: PM is content-agnostic. Store a “type” or “operation” field in `info` that describes whether this mark is creating, updating, or deactivating the tracked asset. PM is flexible enough to let you replicate CEL’s entire operation-based approach.

## Additional PM Strength: Pre-Commitment

CEL references the prior event hash, but does not automatically commit to the next event’s key. By contrast, a PM always includes a forward hash binding the next mark’s unrevealed key into the current mark, which means you can’t slip in an event out of order or shuffle the chain without invalidating prior marks.
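
The same forward hash also binds whatever metadata is placed in `info`. The sketch below combines the `info` techniques listed earlier with that binding. Everything concrete in it is an assumption for illustration: sorted-key JSON stands in for dCBOR and Gordian Envelope, HMAC stands in for a witness's public-key signature, and the field names (`operation`, `data`, `witnesses`) and helpers (`witness_attest`, `build_info`, `mark_hash`) are hypothetical. The hash image is also reduced to the two keys and `info`, leaving out the chain identifier, sequence number, and date.

```python
import hashlib
import hmac
import json

OPERATIONS = {"create", "update", "deactivate"}


def witness_attest(witness_secret: bytes, name: str, key_digest: bytes,
                   timestamp: int) -> dict:
    """Hypothetical witness. It sees only a digest, never the key or the data.
    HMAC stands in for a real public-key signature."""
    message = key_digest + timestamp.to_bytes(8, "big")
    tag = hmac.new(witness_secret, message, hashlib.sha256).hexdigest()
    return {"witness": name, "timestamp": timestamp, "signature": tag}


def build_info(operation: str, data_refs: list, attestations: list) -> bytes:
    """Serialize metadata deterministically (sorted-key JSON stands in for dCBOR)."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    payload = {"operation": operation, "data": data_refs, "witnesses": attestations}
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def mark_hash(key: bytes, next_key: bytes, info: bytes) -> bytes:
    """Simplified hash image: current key, hidden next key, then `info`.
    Both keys are 32 bytes long, so plain concatenation is unambiguous."""
    return hashlib.sha256(key + next_key + info).digest()


# Stand-in keys; in practice both would be derived from the secret seed.
key = hashlib.sha256(b"demo key n").digest()
next_key = hashlib.sha256(b"demo key n+1").digest()

# Witnesses attest to a hash of the key before the mark is published.
key_digest = hashlib.sha256(key).digest()
attestations = [
    witness_attest(b"secret-1", "witness-1", key_digest, 1_700_000_000),
    witness_attest(b"secret-2", "witness-2", key_digest, 1_700_000_005),
]
data_refs = [{"url": "https://example.com/report-v2.pdf",
              "mediaType": "application/pdf"}]

info = build_info("update", data_refs, attestations)
published_hash = mark_hash(key, next_key, info)

# Changing the operation or dropping a witness yields a different hash,
# so the published mark no longer matches the altered metadata.
tampered = build_info("deactivate", data_refs, attestations[:1])
print(mark_hash(key, next_key, tampered) == published_hash)
```

Deterministic serialization matters in this design: if the same metadata could encode to different bytes, an honest verifier could compute a different hash from unaltered data.
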
This pre-commitment is a distinctive security feature CEL doesn’t inherently replicate.

The pre-commitment to the key means that, unlike CEL, which inherently requires third parties to attest to each new event, PMs are "self-attesting": only the seed-holder can generate the next mark. In addition, to be valid, the next mark in the chain must carry the next sequence number in monotonically increasing order and a date equal to or later than that of the previous mark. The main guarantee here is sequencing, not date verification, which if desired can still be accomplished with witnesses (described above).

## Witness Mechanisms

Both PM and CEL aim to create tamper-evident records of changes over time, but they take different approaches to preventing history revision through external verification.

CEL builds witnessing directly into its protocol, requiring dedicated services that provide oblivious signatures, that is, signatures over cryptographic hashes made without seeing the underlying content. This creates a need for a new class of service providers whose incentives and sustainability need careful consideration. While these services could potentially charge fees or build reputation in the ecosystem, the model requires explicit answers to questions about who pays for the service and who ensures witness reliability.

PM takes a different approach, leveraging public transparency as its primary defense against history revision. The specification recommends publishing mark chains to existing public repositories like GitHub, where their visibility makes any attempt to rewrite history detectable. This piggybacks on infrastructure that already has aligned incentives: platforms whose core business is reliable data storage and access. While specialized PM chain hosting services could emerge, the system doesn't require them, as it can utilize existing platforms whose incentives for reliability and availability are already well-established.

This difference highlights an interesting architectural choice: CEL embeds its trust mechanism in the protocol itself through active witnesses, while PM achieves trust through passive witnessing via public infrastructure, whether existing platforms or potential future specialized services. The PM approach offers more flexibility in how witnessing is implemented while potentially providing a more sustainable model by leveraging existing incentive structures. This design choice reflects PM's philosophy of minimal core mechanisms that can be extended and implemented in various ways, contrasting with CEL's more prescriptive approach to how trust should be established and maintained.

## Architectural Complexity

The contrast in architectural complexity reveals a fundamental difference between the two approaches. PM operates with minimal required components: essentially just a seed for key generation and existing public infrastructure like GitHub for transparency. It achieves its security goals without requiring coordination with third parties or maintenance of witness services. While PM can accommodate additional complexity through its flexible `info` field, this remains optional and can be added only when needed.

In contrast, CEL requires several interconnected components to function: controller keys must be maintained, multiple witness services must remain active and honest, and there needs to be ongoing coordination with these services to obtain signatures. The witness services themselves introduce additional complexity, as they must maintain their own keys, provide reliable API endpoints, and ensure consistent uptime. This is reflected in CEL's more elaborate JSON/CBOR structure with its prescribed fields.

## Recovery Mechanisms

When it comes to recovery scenarios, the two approaches again show a marked difference in their level of preparation. PM builds recovery directly into its core specification through a deliberate seed rotation mechanism.
This provides a clear, documented process for transitioning to new seeds, whether in response to potential compromises or as part of regular security maintenance. Importantly, this rotation capability allows for updating secrets while maintaining the chain's integrity and forward-binding security.

CEL, on the other hand, takes a less structured approach to recovery. While it acknowledges the possibility of key compromise and vaguely references the potential use of recovery keys, it doesn't specify concrete recovery mechanisms in its specification. Instead, it leaves these critical concerns to be handled at the application level. This means there's no standardized process for handling compromised controller keys or witness keys, potentially leaving systems vulnerable or requiring the recreation of entire chains in worst-case scenarios.

## Conclusion

The comparison between Provenance Marks and Cryptographic Event Logs reveals two distinct philosophies in approaching tamper-evident data authentication. CEL provides a comprehensive, prescribed solution with built-in witnessing and a structured format for event logging. This approach offers immediate utility but comes with the operational complexity of maintaining witness services and coordinating with multiple parties.

PM takes a minimalist approach, providing a lean core mechanism that can be extended as needed. Its innovative pre-commitment feature offers stronger ordering guarantees than CEL, while its reliance on existing infrastructure for transparency potentially offers greater sustainability. PM's built-in seed rotation mechanism also demonstrates a more thorough consideration of long-term maintenance and recovery scenarios.

The key insight is that both systems ultimately solve similar problems but make different trade-offs. CEL optimizes for immediate completeness with its prescribed structure, while PM optimizes for simplicity and flexibility with its minimal core and extensible design.
For applications requiring a full-featured, ready-to-use solution, CEL provides a robust framework. For those valuing simplicity, flexibility, and stronger ordering guarantees, PM offers a more adaptable foundation that can grow with evolving needs while maintaining a smaller operational footprint. The choice between them ultimately depends on specific requirements: whether the overhead of maintaining witness services is acceptable, how important built-in recovery mechanisms are, and whether the application benefits more from a prescriptive structure or the flexibility to evolve custom solutions over time.