# Considerations: Best Practices for On-Chain Identity

Over the last 9 months, an important and productive outgrowth of prototyping and design conversations in Verite has been the On-Chain Best Practices Working Group, which has met weekly to compare notes on VC implementations, identity-token implementations, and all the mechanics of registries and records intended for on-chain consumption. What follows is an overview of a few heuristics that we've landed on as independent criteria for assessing and designing end-to-end systems connecting real-world identities to on-chain pseudonyms. Think of them as orthogonal axes, which might be more or less important in a given situation or use-case. We do not believe all of them can be perfectly "solved for"; if anything, identity is a realm of trade-offs, where scoring high marks on one axis usually requires cutting corners on another.

## Axis X: Granularity of Data Points

Too many conversations about on-chain identity take a *binary* view of "personally identifiable information", in which any given piece of information is either "personal" or "opaque", i.e., not-personal. That isn't [how adtech and industrialized digital surveillance works today](https://techcrunch.com/2019/07/24/researchers-spotlight-the-lie-of-anonymous-data/), so it shouldn't be how we conceptualize the privacy threat model; all static data, particularly "strong" data, is a "data point", and any strong linkage between datasets in different contexts builds a graph expanding out to every other data point linked to *those* data points... For that reason, we have discussed on-chain markers as additive, i.e., adding an [immutable] data point to all the others that would be included in an NSA, Chainalysis, or adtech profile about a person, once its linkage to the rest of the profile can be established probabilistically or inferentially.
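To make the additive nature of "data points" concrete, here is a minimal TypeScript sketch (all names, attributes, and values are invented for illustration) of how each disclosed attribute, however coarse on its own, shrinks the anonymity set a profiler must search:

```typescript
// Hypothetical illustration: each disclosed attribute, however "coarse" in
// isolation, narrows the set of candidate identities a profiler considers.
type Profile = { zip: string; birthYear: number; device: string };

// A toy population of pseudonymous profiles (values are invented).
const population: Profile[] = [
  { zip: "94103", birthYear: 1990, device: "iOS" },
  { zip: "94103", birthYear: 1990, device: "Android" },
  { zip: "94103", birthYear: 1985, device: "iOS" },
  { zip: "10001", birthYear: 1990, device: "iOS" },
];

// Each predicate is one "data point"; linking them intersects candidate sets.
function anonymitySet(
  pop: Profile[],
  ...predicates: ((p: Profile) => boolean)[]
): Profile[] {
  return predicates.reduce((candidates, pred) => candidates.filter(pred), pop);
}

const oneAttr = anonymitySet(population, (p) => p.zip === "94103");
const threeAttrs = anonymitySet(
  population,
  (p) => p.zip === "94103",
  (p) => p.birthYear === 1990,
  (p) => p.device === "iOS",
);
// One coarse attribute leaves several candidates; three together single one out.
```

The point is not the toy data but the shape of the attack: linkage is an intersection of sets, so every additional immutable marker published on-chain makes the intersection smaller.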
In this way, what matters is not the binary personal/impersonal distinction, but how useful a given data point would be to a *probabilistic* linking mechanism, which is what matters in privacy modeling. This is often called "coarseness" of information, i.e., saying as little as possible, and differentiating a pseudonym as little as possible from the millions of others in a system.

![Swirlds Labs](https://hackmd.io/_uploads/ryUDJBU1h.png)

Keith from Swirlds Labs made this amazing diagram illustrating coarseness as a spectrum for a presentation about on-chain identity. The further left, the less damaging and de-pseudonymizing. Anywhere on this spectrum, you're making trade-offs, but that is the nature of engineering: there are no silver bullets in identity, or in law.

## Axis X: Audit Trails

One way to use maximally coarse data on-chain is to combine it with data held elsewhere at the time that data is validated. Whether this validation uses cryptographic Zero-Knowledge approaches or just secure channels and trusted intermediaries, they share a basic requirement with all KYC use-cases-- in a situation of sufficiently severe retrospective scrutiny (a federal court case, discovery in a major lawsuit, etc.), all decisions and information flows will need to be replayable from archives. To keep on-chain decision-making records coarse, they *MUST* link to off-chain records of finer-grained information and decisions in a non-repudiable way. Put another way, privacy and pseudonymity can only be reconciled with data custodians acting under some kind of fiduciary obligation. Some versions of this are contractual and involve lots of humans in the loop (who pick up the phone when judges, those other humans always in the loop, issue court orders); other versions imagine impersonal trusted parties, data vaults with a glass panel on which is written, "break in case of emergency".
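As a minimal sketch of that linkage requirement-- the record shapes, field names, and hashing choice below are illustrative assumptions, not any real system's schema-- a custodian keeps the fine-grained record, and only a non-repudiable hash commitment to it, plus a maximally coarse boolean, is published on-chain:

```typescript
import { createHash, randomUUID } from "crypto";

// Hypothetical fine-grained record held off-chain by a fiduciary custodian.
interface OffChainRecord {
  id: string; // UUID the custodian can be asked (or ordered) to look up
  subjectDid: string; // real identifiers never leave the custodian
  decision: string; // the full decision, evidence, and policy version
}

// Hypothetical coarse on-chain stub: commits to the record without revealing it.
interface OnChainStub {
  recordHash: string; // non-repudiable commitment to the off-chain record
  approved: boolean; // the only (maximally coarse) fact published on-chain
}

function hashRecord(record: OffChainRecord): string {
  return createHash("sha256").update(JSON.stringify(record)).digest("hex");
}

const record: OffChainRecord = {
  id: randomUUID(),
  subjectDid: "did:example:alice",
  decision: "KYC passed per policy v2",
};
const stub: OnChainStub = { recordHash: hashRecord(record), approved: true };

// An auditor replaying the decision fetches the record from the custodian and
// recomputes the hash to verify the breadcrumb trail end-to-end.
const verifiable = stub.recordHash === hashRecord(record);
```

Because the hash binds the stub to exactly one record, the custodian cannot later substitute a different story-- which is what makes the coarse on-chain record safe to rely on.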
To make this tangible in Verite's [sample implementation on github](https://github.com/centrehq/verite/), there are `verificationRecords` serving as on-chain stubs for off-chain processes, and there are `verificationResults`, which are like events in a log of business-logic decisions. Whichever comes first is linked from the other by UUID or hash, and in either case the holder of the latter is always linked from the former. This is the crucial breadcrumb trail that allows on-chain data to be coarse, because a trusted intermediary can be queried for the "real records". For more on how auditability works across architectures, see the "Replayability" sections under each architecture in [our recent RWoT paper](https://github.com/WebOfTrustInfo/rwot11-the-hague/blob/master/draft-documents/onchain_identity_verification_flows.md) on on-chain identity architectures.

## Axis X: Accountability

There is a reciprocal relationship between auditability and accountability: building one without the other is quite hard. An audit trail that is completely replayable end-to-end lets each "step" in a chain be precisely timestamped and verified-- which can be really dangerous or scary to an intermediary that doesn't have its precise responsibility and liability clearly defined, lest that verifiable (i.e. cryptographically non-repudiable!) evidence of its actions become evidence in a lawsuit or regulatory penalty. On the other hand, knowing that every *other* actor in a chain of reliance or quality assurance has its accountability clearly defined can be very reassuring against the vast per-customer or per-counterparty liabilities that regulation entails. In any real-world chain of reliance, confidentiality, and/or quality assurance, digital or otherwise, there are some basic failure modes to be considered and "papered over," i.e. contractually defined for all parties.
These include one entity in the chain (a service provider, a process, a custodian) going out of business, going insolvent, or having its control frozen by law-enforcement intervention. In the pseudonymous cryptocurrency environment, additional failure modes need to be considered:

1. If one link in the chain of reliance is "anon", i.e. an unconventional legal entity without classic accountability anchors in case of legal emergency, we have to add the "rugpull" variant, which can be harder to mitigate than a bankruptcy or insolvency. While anon founders and all-anon protocols appear on any DeFi leaderboard, many actors in the reg-tech space have opted to limit their risk by excluding them entirely from the scope of their solutions.
2. Regulation or case law can change suddenly, making a chain of dependencies suddenly less viable, sometimes even *retroactively* in cases of transactions already processed. "Sunset periods" and spin-down plans are worth including in both technical designs and contracts.
3. Hard forks-- while the state of a given blockchain is often the "source of truth" for economic systems, most blockchain runtimes and consensus mechanisms allow for forks, after which two copies exist of that chain's state and of all the contracts and accounts it contains, right down to identity tokens.
4. Even without a fork, most deployed smart contracts can be "bricked" (rendered incapable of intervention or upgrading), or "drained" (losing operating capital or collateral to hacks or other forms of attack exploiting their hard-coded behaviors).
5. On a more basic level, the private keys, hardware signers, or other mechanisms for controlling accounts can be lent, lost, or stolen.
While blockchain engineering tends to *assume* they won't be (call this happy-path engineering), failure to maintain or prove identity assurance across sessions, or to provide other proofs of continuous off-chain control, is in some cases a valid trigger to suspend or alter the terms of reliance up and down the chain. Architecting the legal system that can handle all of these possibilities and permutations can be just as complex as architecting the technical system... and in the latter case as much as in the former, pseudonymity and control structures based on asymmetric cryptography layer on many additional complexities.

## Axis X: Consent-by-Design

In some ways, the intense power and information asymmetries of Web2 made "privacy by design" the watchword of ethical software design in the 2010s. As web3's "transparent by default" and in-the-open design patterns have inverted many of those patterns, however, the urgency of informational asymmetries has taken a backseat to UX concerns. Furthermore, as user-dapp and user-blockchain interaction is in many ways legal *terra incognita*, with so many "at your own risk" and "there is no warranty" disclaimers at play, not just consent but *meaningful* consent has become crucial to all web3 design. One strong example of this is NFTs-- anyone who has purchased NFTs has watched in horror as their wallet fills with unwanted additional NFTs put there by aspiring entrepreneurs, scammers, spammers, and even phishermen trying to get the NFT crowd to click dubious links.
The Ethereum blockchain has seen so many airdrops-as-advertising, ["dust attacks"](https://blockworks.co/news/defi-web-apps-block-users-hit-by-tornado-cash-dust-attack), and even [malicious ENS registrations](https://cointelegraph.com/news/mark-cuban-issues-burn-notice-on-offensive-ens-domain) in recent years that some even question whether consent needs to be built in at lower levels of the permissionless protocol before attempting to onboard the next billion users. In the context of on-chain artefacts like non-transferable tokens that mark a wallet as controlled by a Verite account-holder/credential-holder, the classic "on-chain" proof of consent is a signature, particularly what Ethereum calls an "EOA" signature (i.e., manually requested, manually authorized, and wallet-driven). Like any other EVM-compatible ERC-721 NFT, whether or not such tokens are "burnable" (i.e., removable from chainstate and thus "from the wallet"), and by whom, is determined by the individual NFT smart contract; Violet.co, for example, has advocated publicly for any and all identity tokens, whether transferable or not, to be burnable by both issuer and holder, as well as making mutual consent a pre-condition of issuing their own identity tokens.

## Axis X: Forward-Secrecy - Better versus Best Account Abstraction patterns?