---
robots: noindex, nofollow
---

# NSF Project Pitch: Gordian Envelope (2/24)

## First Page

### 1. Email

ChristopherA@lifewithalacrity.com

### 4. Phone Number

510-908-1066

### 5. Company Name

Blockchain Commons

### 6. Company ZIP Code

82009

### 7. Company State

WY

### 8. Company website (if applicable)

https://www.blockchaincommons.com/

### 9. Please pick the SBIR/STTR topic that best fits your project's technology

Cybersecurity and Authentication

---

### 10. Is this Project Pitch for a technology or project concept that was previously submitted as a full proposal by your company to the NSF SBIR/STTR Phase I Program – and was not awarded?

NO

### 11. Has your company received a prior NSF SBIR or STTR award?

NO

### 12. Does your company currently have a full Phase I SBIR or STTR proposal under review at NSF?

NO

---

## 13. The Technology Innovation.
(Up to 500 words)

The Gordian Envelope project introduces hashed elision of data as a method to preserve privacy. This technology was created as a response to the increased release of data onto public networks, including health records, educational records, verifiable credentials, shipping records, and many other types of data, much of it confidential and/or proprietary. Completely releasing data, which is the current default methodology, causes notable problems such as: disclosure, where more information is released than was needed; secondary use, where data is put to a use other than the one originally intended; and correlation, where data from separate places is combined to give composite information about a person, business, or other entity. These problems all negatively impact privacy and can threaten a person or other entity in a variety of ways, potentially leading to undesired public revelation, discrimination, financial threat, or even physical threat.
To resolve these problems, a method needs to simultaneously: minimize transmitted data; prove the existence of that minimized data; and maintain valid digital signatures over that data when needed (for use cases such as credentials), whether some of it was minimized or not.

The innovation to do this is a method we call "hashed data elision", which is the heart of our Gordian Envelope structured data format. Elision means removing some datums from a package, so that the reduced content properly matches the needs of a recipient. The technological innovation is to maintain a cryptographic hash of all data, whether it's been elided or not, using a trusted hash methodology such as SHA-256. Those hashes are then arranged into a Merkle Tree, which hashes together hashes of leaf data until a single root hash is made to cryptographically represent the entire tree. Any digital signatures are then made over the root hash, rather than the original data. This ensures signature validity even if some data is later elided (because the hashes remain).

As an alternative to full elision, data can instead be encrypted, once more with hashes of the original data kept in place. This allows for recovery of the original data in certain circumstances while preserving validation even without the cryptographic keys.

Though the components of Gordian Envelope are known technologies, their combination in this fashion is not: hashed data elision opens up new possibilities to normalize data protection and privacy. Even greater innovations are possible when hashed data is used to create "inclusion proofs" of data content or when data subjects are protected by embedding root hashes of their data into large data structures to create "herd privacy". These more expansive possibilities are the most unproven and risky parts of the innovation and require the most additional study to fulfill the project, as discussed below.

## 14. The Technical Objectives and Challenges.
(Up to 500 words)

A draft of Gordian Envelope was published as an Internet-Draft through the IETF (https://datatracker.ietf.org/doc/draft-mcnally-envelope/). However, additional work is needed to turn it into a more encompassing data-protection specification that would be advantageous to a variety of industries. The five largest technical challenges are:

1. Integration with Data Providers. Many industries have requirements served by hashed data elision. We have already produced use cases for some (https://www.blockchaincommons.com/introduction/Gordian-Envelope-Use-Cases/), such as a set of case studies and presentations that we wrote for healthcare, which describe how to maintain confidential health data while simultaneously sharing it with providers and even making it available for public health needs. However, more are needed. The challenge is to work with data providers from a variety of industries. We have highlighted shipping and software release as two important fields for additional study. The goal is to collect knowledge from these and other industries, write use cases to reflect it, and incorporate the resultant requirements into either revisions of Gordian Envelope or discussions of best practices.

2. Work with Standards Groups. Standardization is required for Gordian Envelope to be viable for many commercial uses. This will also enable discussions of the methodology, which will allow us to further improve it.

3. R&D for Data Security. The security of hashes is well understood, but additional R&D is required to ensure the overall security of hashed data elision. The largest known issue is correlation: as currently designed, it's possible for multiple holders, each of which possesses data that's elided in different ways, to combine their copies. There may be other issues.
The goal for this research is to determine whether there are solutions that can resolve security issues without undercutting the fundamental advantages of hashed data elision and to incorporate them into improved specs or best practices.

4. R&D for Inclusion Proofs. Hashed data elision allows a holder to elide data and then to prove that some datum exists by revealing the datum and a path of hashes that leads from the Merkle Tree root to the datum. Alternatively, a third party can prove that a datum exists in a hashed Merkle Tree if they know the structure of the datum and its Merkle branch. However, there are wider possibilities. Can multiple datums be simultaneously proved? Can data that lies at different levels of the Merkle Tree be proved? Can data be proved from a branch that has unknown leaves below it? The goal for this research is to build on new use cases created in conjunction with data providers, to determine which possibilities for inclusion proofs are viable and which are not, and then to revise Gordian Envelope or write best practices, as needed.

5. R&D for Herd Privacy. Herd privacy allows large amounts of data to be sent out together, with individual datums only revealed if the subject desires (at which point they reveal inclusion proofs). The goal for this research is to better understand herd privacy and its advantages and to ensure that Gordian Envelope maximizes its possibilities.

## 15. The Market Opportunity.
(Up to 250 words)

Data providers or holders are our customers. Their pain point is a potential data breach, which can cost them reputation, reparation, and lost data. The liabilities can quickly stack up, particularly as new legislation puts increasing value on data protection. Any reduction of data content reduces these liabilities. Specific examples include:

* HEALTHCARE SYSTEMS. Data elision can be used to offer protection of data required by HIPAA.
It can also minimize data seen by insurance companies or less-trusted providers, while still offering proof of care or relevant data. Herd privacy can be used to share data with public health studies. Data from wellness trackers can be imported in signed, encrypted form.

* SHIPPING & LOGISTICS SYSTEMS. Data elision can maintain confidentiality of shipments while inclusion proofs and signatures can verify contents upon receipt.

Though our customers will be businesses with strong commercial incentives to adopt hashed data elision, the biggest beneficiaries (and thus the largest societal impact) will likely be data subjects. For individuals, this might mean protection from discrimination (because appropriate data such as a resume won't be correlated with discriminatory data such as age or race), from financial harm (because data won't be correlated to allow identity theft), or from physical harm (because an online identity won't be correlated with a real-life address). Today, the ability to correlate personal information using computers is an ever-increasing disaster. The extant problems could soon become much worse. Hashed data elision is a solution.

## 16. The Company and Team.
(Up to 250 words)

Christopher Allen is the architect of Gordian Envelope. Christopher is also the co-author of the IETF TLS standard, foundational to secure online commerce. As of 2021, 63% of the top million web sites used TLS. As of 2022, 76% of companies used TLS for communication with remote employees. Christopher's work on TLS demonstrates his ability to shepherd a standard into the widest commercial use possible. Christopher was also a performer on the DHS S&T SBIR (Contract #HSHQDC-16-C-00061; Program Manager: Anil John) that laid the foundation for Decentralized Identifiers (DIDs) to become globally adopted. He is also the co-author of the W3C standard for DIDs.
This demonstrates his ability to work with the NSF on a specification of global importance and to bring it to a successful conclusion.

Wolf McNally is the lead researcher and engineer for Gordian Envelope. Wolf is a skilled developer who is the co-author of the Gordian Envelope specification as well as the co-author of the deterministic CBOR specification, which we expect to become an IETF RFC this year. Wolf is also the creator of the LifeHash specification, a digital hash method meant to improve the accessibility and recognition of digital assets. He's also developed a large suite of cryptographic digital-asset libraries and applications.

Shannon Appelcline is the lead technical writer for Gordian Envelope. Shannon is an experienced writer with several non-fiction publications in both technical and non-technical fields. His focus is on making complex technical processes easy to understand and thus improving their usage and penetration.

## Bonus Question: How is This Innovative?

Gordian Envelope builds on mature technologies such as Merkle Trees and cryptographic hashes, but it's definitely not just an evolutionary development. It offers several major innovations that make it a quantum leap forward, differing from traditional uses of hashes to validate or checksum data. These innovations include:

* VARIABLE LEVELS OF CORRELATION. By default, hashed data is uncorrelatable, since hashes are a one-way function. Envelope data can be made more correlatable by sharing information about the structure and contents, allowing a third party with existing knowledge to verify its existence even after it's been elided; or it can be made less correlatable by salting the data to deny its recovery unless the secret of the salt is shared.

* PRACTICAL USE OF DATA MINIMIZATION. Though data minimization has increasingly been the focus of Internet RFCs (RFC 6973) and general regulations (GDPR, CCPA), general elision of data has not been practical due to its destruction of data context.
Envelope turns this real-world need into reality with a system that preserves the authenticity and integrity of the data.

* HOLDER ELISION OF VALIDATED DATA. Traditionally, only issuers of signed data have been able to elide it, though some newer technologies allow subjects to do so. Gordian Envelope allows any data holder to elide. As an example, this can allow a company that holds credentials for an employee and wants to share those credentials, perhaps as part of a bid, to do so without sharing the PII. It supports decentralized control of data.

* INCLUSION PROOFS FOR DATA. This is the most far-reaching and innovative aspect of the technology. Holders can generate inclusion proofs prior to the elision of data (or even just from knowing the contents of the data) that allow for proof of the existence of that data even after elision. This allows for the publication of elided data and then for that publication to stand as a root of trust for the future revelation of that data, either publicly or privately. It also allows for the creation of powerful herd-privacy systems where only a root hash is published. Individual participants can then choose to reveal their content or not.

These innovations speak to many of the challenges of this work. The usage and viability of herd privacy is largely untested, so we will be breaking new ground with its use. Similarly, the ability for any holder to elide validated data and for issuers to create variable levels of correlation are largely untested. We expect to find and overcome challenging issues while perfecting these systems.

But these innovations also speak to the competitive advantages of Gordian Envelope. There are few viable methods for data minimization despite increasing requirements. No mainstream system offers holder-based elision of validated data. No system considers how to both enable and deter correlation.
Finally, no one is offering herd privacy, which is growing in importance as larger swaths of data are released onto the internet (or even held privately).
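
As a rough illustration of the hashed data elision and inclusion proofs described in this pitch, the following Python sketch hashes each datum with SHA-256, folds the hashes into a Merkle Tree root, elides one datum by keeping only its hash, and then proves that datum's existence against the unchanged root. It is not the Gordian Envelope format or API: the function names, the `leaf:`/`node:` prefixes, the rule for odd-sized levels, and the sample fields are all assumptions made for this example.

```python
import hashlib

def leaf_hash(datum: str) -> bytes:
    """Hash one datum (the b"leaf:" prefix is an illustrative assumption)."""
    return hashlib.sha256(b"leaf:" + datum.encode("utf-8")).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash two child hashes into their parent hash."""
    return hashlib.sha256(b"node:" + left + right).digest()

def item_hash(item) -> bytes:
    """Revealed items are hashed; elided items already hold their hash."""
    kind, value = item
    return leaf_hash(value) if kind == "data" else value

def next_level(level):
    """Pair up hashes, duplicating the last one when the count is odd."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(hashes) -> bytes:
    """Fold leaf hashes pairwise until a single root hash remains."""
    level = list(hashes)
    while len(level) > 1:
        level = next_level(level)
    return level[0]

def elide(package, index):
    """Return a copy of the package with one datum replaced by its hash."""
    out = list(package)
    out[index] = ("elided", item_hash(package[index]))
    return out

def inclusion_proof(hashes, index):
    """Collect sibling hashes on the path from one leaf up to the root."""
    proof, level = [], list(hashes)
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))  # (hash, sibling is on the left)
        level = next_level(level)
        index //= 2
    return proof

def verify_inclusion(datum: str, proof, root: bytes) -> bool:
    """Rebuild the root from a revealed datum and its sibling path."""
    current = leaf_hash(datum)
    for sibling, sibling_is_left in proof:
        current = node_hash(sibling, current) if sibling_is_left else node_hash(current, sibling)
    return current == root

# Hypothetical credential fields, used only for this example.
package = [("data", "name: Alice"), ("data", "degree: BSc"), ("data", "birthdate: 1990-01-01")]
root = merkle_root(item_hash(i) for i in package)

shared = elide(package, 2)  # the holder removes the birthdate before sharing
shared_root = merkle_root(item_hash(i) for i in shared)
proof = inclusion_proof([item_hash(i) for i in shared], 2)

print(root == shared_root)  # a signature made over the root would still apply
print(verify_inclusion("birthdate: 1990-01-01", proof, root))
```

Because the elided entry keeps its hash, the root is the same before and after elision, which is why a signature over the root would survive. Note that a low-entropy datum such as a birthdate could be guessed from its bare hash; the salting mentioned above is the kind of measure a real design would need, and it is left out of this sketch.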