# (Draft Blog Post) Verifiable Health Records: Setting [W3C] Verifiable Credentials on [HL7] FHIR

## Background: Health data APIs are here

With widespread adoption of Electronic Health Records, patients have increasingly reliable access to summary healthcare records through online portals. Many provider-hosted portals today also offer API access through HL7 FHIR; and given this year's expanded requirements from the Office of the National Coordinator for Health IT, all certified EHR vendors will offer FHIR APIs by 2022. Convenient API access paves the way for patients to aggregate their own health data and share these data downstream -- making it easier to seek a second opinion, share vaccination records, or send health history to a new care team.

As API access expands, we have a critical opportunity to improve API functionality in tandem. In this post, we'll explore functionality that helps patients act as trusted intermediaries.

## Patients as trusted intermediaries

With today's FHIR implementations, data that land in a consumer-controlled app can be shared, but it's hard to achieve any guarantee of authenticity. For example, when a patient saves records from one healthcare provider and forwards these records to a new provider for a second opinion, there's no way to tell if the data provided are complete, or whether specific details have been omitted or altered. This model works just fine for most healthcare scenarios, where regardless of any technical controls, a foundation of social trust must exist between patient and provider. But in some use cases, like a parent storing a child's vaccination records and sending them along to a school, there's a stronger societal need to pass along not just healthcare records, but *authenticated, tamper-evident* records -- in other words, *verifiable data* that a recipient can tell is genuine.
(Similar to proof of vaccination for participation in school activities, there's growing societal interest in how patients can share history of COVID-19 infection, recovery, and immunity with various people and organizations. Given the rapidly evolving scientific and social perspectives, we won't discuss these issues further in this blog post, but we wanted to highlight the relevance of COVID-19 use cases for patient-mediated exchange of verifiable healthcare data.)

### Making clinical data "verifiable"

One common way to make data "verifiable" is to have the author of the data provide a digital signature alongside the data, computed using the private portion of an asymmetric (public/private) keypair. Verifying parties can establish the authenticity of the data by checking the signature against the author's public key. The nice thing about such signature schemes is that the data can pass through any number of intermediaries without interfering with the recipient's ability to validate the signature.

For such schemes to work, the recipient *does* need some reliable way to determine a sender's public key, and many protocols exist for key "discovery" (e.g., in healthcare, the Direct Project defines a secure e-mail protocol where a sender's keys can be discovered via DNS query on the sender's domain).

(In this blog post, we'll leave key discovery out of scope, but the approaches we'll explore here are designed to work hand-in-hand with the W3C Decentralized Identifiers specification, allowing anyone to publish keys through open, secure, and globally available protocols.)

## Leveraging W3C Verifiable Credentials Data Model

The W3C Verifiable Credentials 1.0 Data Model (VC) specification provides a standard way to create a signed **credential** (e.g., a set of relevant clinical data).
The credential is created and signed by an **issuer** (e.g., a health system responsible for maintaining clinical records), then passed along to a **holder** for storage (e.g., a patient keeping records in a mobile health app). Then, when the holder chooses, they can present the credential to a **verifier** (e.g., a school administrator confirming vaccination history).

![Figure 1, from W3C Verifiable Credentials Data Model specification](https://i.imgur.com/RSna0uY.png)

In the VC data model, an issuer asserts specific "claims" about a "subject", wrapping these claims up into a credential. (Often the subject is the same person as the holder described above, but if, for example, credentials for a child's immunization history are managed by a parent, then the child would be the subject of the claims, and the parent would be the holder of the credential.) The details of claims data models are, by design, out of scope for W3C; the VC spec is designed to be used for claims in scenarios that range from retail to healthcare to finance and beyond. So if we want a way to make healthcare-specific claims, we need to look toward domain-specific healthcare standards. HL7 Fast Healthcare Interoperability Resources (FHIR) is a natural fit because it provides detailed, concrete data models and JSON syntax that can be layered on top of (or, if you prefer a different metaphor, slotted into) the VC model.

## Designing VC schema for healthcare

Perhaps the most important factor when designing a schema for Verifiable Credentials is the decision about which specific data elements to include in the credentials. We can think about data elements in a healthcare credential as belonging to two categories: **clinical content** and **identity**. Which elements should be required, allowed, and prohibited?
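As a rough sketch of how these pieces might fit together, the Python snippet below assembles an unsigned credential whose claims carry both categories of FHIR resources. The `@context`, `type`, `issuer`, `issuanceDate`, and `credentialSubject` properties come from the VC 1.0 data model; the `fhirBundle` property name, the `build_credential` helper, and the example issuer URL are assumptions made purely for illustration and are not defined by either specification. Signing is deliberately left out.

```python
from datetime import datetime, timezone


def build_credential(issuer, identity_resources, content_resources, issued_at=None):
    """Assemble an unsigned, VC-shaped dictionary (hypothetical helper).

    The "fhirBundle" key under credentialSubject is an assumed name chosen
    for this sketch; the VC data model leaves the claims schema open.
    """
    # Normalize the issuance time to UTC so the trailing "Z" is accurate.
    issued_at = (issued_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    resources = list(identity_resources) + list(content_resources)
    return {
        "@context": ["https://www.w3.org/2018/credentials/v1"],
        "type": ["VerifiableCredential"],
        "issuer": issuer,
        "issuanceDate": issued_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "credentialSubject": {
            "fhirBundle": {
                "resourceType": "Bundle",
                "type": "collection",
                "entry": [{"resource": r} for r in resources],
            }
        },
    }


# Identity: who the claims are about. Content: what is being claimed.
patient = {"resourceType": "Patient", "name": [{"given": ["Ana"], "family": "Example"}]}
immunization = {"resourceType": "Immunization", "status": "completed"}

credential = build_credential(
    "https://issuer.example.org",  # placeholder issuer identifier
    identity_resources=[patient],
    content_resources=[immunization],
    issued_at=datetime(2020, 5, 1, tzinfo=timezone.utc),
)
```

Keeping identity and content as separate arguments makes the design question explicit: each list can be reviewed on its own for what is required, allowed, or prohibited.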
In general, once a VC is offered to a holder, the holder can only present the entire VC as a unit; there's no way to break the credential into smaller pieces for selective disclosure while preserving the integrity of the signature. There's a trade-off between the amount of data baked into a VC and the privacy afforded at the time of presentation. As such, a guiding principle is that the issuer should create "minimal" VCs, and potentially create many of them, so that the holder can mix and match them at the time of presentation, proving what's necessary while preserving privacy.

Another way to say this is that **a VC should follow the "just enough" principle, including just enough data to serve its purpose**. (More formally, this is often called "minimum necessary".) This principle highlights the importance of understanding the purpose of a VC; it's not just a heap of data, but rather a specifically curated set of claims designed to be presented in certain contexts.

To make this concrete, let's consider the example of an immunization record. It would be possible to roll up a patient's entire immunization history in a single Verifiable Credential, providing an easy way to demonstrate many vaccinations at once. But it's more flexible to provide smaller, discrete VCs -- so that, for example, if a medical student needs to demonstrate a history of Varicella immunization, she can present the relevant vaccines or serology results without incidentally including extraneous details about other immunizations.

(Note: Zero-Knowledge Proofs offer novel capabilities that allow claims in single credentials to be mixed and matched more easily, but support for ZKPs is still nascent, and as such, we'll leave ZKPs out of scope for this blog post.)

To create a VC using FHIR resources, we'll need to make decisions about which clinical data to include in the package, and then how to bind the package to the subject's real-world identity.
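One way an issuer might apply the "just enough" principle to an immunization history is to group records by vaccine and prepare one small credential payload per group. The sketch below shows only that grouping step, under stated assumptions: the helper names are hypothetical, the sample records are invented, and the CVX codes used (`21` for varicella, `03` for MMR) are included as illustrative values.

```python
from collections import defaultdict

CVX_SYSTEM = "http://hl7.org/fhir/sid/cvx"


def cvx_code(immunization):
    """Return the CVX code on a FHIR Immunization resource, or None if absent."""
    for coding in immunization.get("vaccineCode", {}).get("coding", []):
        if coding.get("system") == CVX_SYSTEM:
            return coding.get("code")
    return None


def split_by_vaccine(immunizations):
    """Group Immunization resources so each group can back its own small VC."""
    groups = defaultdict(list)
    for imm in immunizations:
        groups[cvx_code(imm)].append(imm)
    return dict(groups)


def make_immunization(code, date):
    """Build a minimal Immunization resource for this example (invented data)."""
    return {
        "resourceType": "Immunization",
        "status": "completed",
        "vaccineCode": {"coding": [{"system": CVX_SYSTEM, "code": code}]},
        "occurrenceDateTime": date,
    }


history = [
    make_immunization("21", "2001-03-04"),  # varicella, dose 1
    make_immunization("03", "2001-03-04"),  # MMR
    make_immunization("21", "2005-06-07"),  # varicella, dose 2
]

per_vaccine = split_by_vaccine(history)
varicella_only = per_vaccine["21"]  # the only records a Varicella VC would need
```

A holder who receives one credential per group can then present the varicella credential on its own, without revealing the MMR record.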
### Packaging a relevant set of clinical data with "FHIR Content Resources"

For a medical student with a clinical history of chickenpox (Varicella), we might choose a set of **FHIR content resources** that includes `Immunization` resources conveying immunizations administered, `Observation` resources conveying immunoassay results, and `DiagnosticReport` resources conveying interpretations or conclusions. For best results, these resources should be mapped to common profiles such as [US Core](http://hl7.org/fhir/us/core/), which improves consistency of data elements, identifiers, and coded terms. For example, US Core profiles ensure that Immunizations include CVX codes and Observations include LOINC codes.

Following the "just enough" principle, when we're designing a Varicella History credential, we should focus on the specific elements required for demonstrating immunity (e.g., vaccinations administered, tests performed and clinical results obtained, as well as effective dates and times and overall interpretations).

### Binding the VC to a subject's identity with "FHIR Identity Resources"

For a VC to be useful, it must include some information about the subject for whom it was issued -- otherwise, there's nothing to stop the holder from passing the credential along to someone else, and having that person present it instead. From a security analysis perspective, this "impersonation" behavior is an example of the kind of "threat" that systems need to anticipate and mitigate when accepting credentials from a user. Similar threats arise in some types of insurance fraud, where one person might try to receive care under another person's insurance plan. A proper analysis takes the broader context into account: what's at risk if a mistake is made, who takes on the risk, and what are the opportunity costs of *not* accepting a credential?

Binding a credential to a real-world identity can be a surprisingly deep challenge, and here we'll just skim the surface.
Fundamentally, presenting VCs in the real world needs to give the verifier some level of confidence that the person presenting the credential is the subject (or acting on behalf of the subject).

For this blog post, we'll briefly note that one of the most useful ways to bind a VC to a real-world identity is by using the W3C Decentralized Identifiers specification, where a user generates a DID (decentralized identifier) and attaches various VCs capturing different aspects of identity, potentially from various issuers. The broad usefulness of binding VCs to DIDs holds great promise -- but there are also some use cases where directly binding FHIR content to a few simple identifiers can suffice.

In our example of a medical student presenting evidence of Varicella immunity, the student already has a direct relationship with her medical school administration; binding a VC with lab results to identifiers like the student's name and phone number might suffice. Similarly, for scenarios where a VC will be presented in person, binding a VC to a photograph of the subject can go a long way to provide assurance to an in-person verifier; all the verifier needs to do is compare the physical appearance of the holder with the photo bound in the VC. Again, following the "just enough" principle, we should identify use cases for presentation and aim to include just enough links to the subject's real-world identity to enable verification.
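The two simple bindings described above (name plus phone number, and name plus photograph) could each be expressed as a small FHIR `Patient` resource. The following is a minimal sketch, assuming a hypothetical `identity_patient` helper and placeholder values; `name`, `telecom`, and `photo` are standard `Patient` elements, but which of them a given credential should carry remains a design decision.

```python
import base64


def identity_patient(given, family, phone=None, photo_jpeg=None):
    """Build a minimal FHIR Patient carrying only the chosen identifiers.

    Hypothetical helper for illustration: pass `phone` for a name-and-phone
    binding, or `photo_jpeg` (raw JPEG bytes) for a name-and-photo binding.
    """
    patient = {
        "resourceType": "Patient",
        "name": [{"given": [given], "family": family}],
    }
    if phone is not None:
        patient["telecom"] = [{"system": "phone", "value": phone}]
    if photo_jpeg is not None:
        # FHIR Attachment.data holds base64-encoded content.
        patient["photo"] = [{
            "contentType": "image/jpeg",
            "data": base64.b64encode(photo_jpeg).decode("ascii"),
        }]
    return patient


# Placeholder values; a real issuer would use verified identity data.
with_phone = identity_patient("Ana", "Example", phone="+1-555-0100")
with_photo = identity_patient("Ana", "Example", photo_jpeg=b"\xff\xd8 placeholder bytes")
```

Building each variant separately keeps the phone number out of the photo-bound resource and vice versa, in line with the "just enough" principle.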
In this discussion, we'll focus on two common scenarios where **FHIR identity resources** can bind the FHIR content resources to an external identity system:

* For **Online Presentation**, we expect that the consumer will present the VC to an organization that has a pre-existing relationship with the consumer, so it's sufficient to bind the VC to a real-world name and verified phone number:
    * `Patient` with name and phone number
* For **In-person Presentation**, we expect that the consumer will present the VC at an in-person interaction alongside a physical photo ID, so it's sufficient to bind the VC to a facial image that matches the consumer's physical photo ID:
    * `Patient` with name and a photo of sufficient quality

For the use cases considered here (e.g., presenting vaccination history to a school), these identifiers (name, phone number) are unlikely to risk over-sharing, since the school will already know these names and phone numbers; including these details provides just enough information to match records between the issuing lab and the school. In other scenarios, e.g., showing a VC to a security guard on the way into a building, including a phone number might be entirely inappropriate. There is no one-size-fits-all approach here, but it's useful to call out common scenarios to drive toward consistent practices.

*Note that if we want to support both presentation contexts, **we'll need to generate two VCs (one per presentation context)**, so the consumer can share the appropriate VC for any given context.*

#### Example

The example is a FHIR `Bundle` containing one `DiagnosticReport` linking to an `Observation` resource with lab results for Varicella Zoster IgG, plus a `Patient` resource containing name and phone number, suitable for online presentation.

Note in the following example that all references between FHIR resources have been converted to UUID-based Bundle-internal references.
This is to avoid conveying unnecessary detail about the information source or its base URLs (if any), and to focus only on the data and relationships at stake in this VC.

* [See example covid19 VC](https://github.com/microsoft-healthcare-madison/health-wallet-demo/blob/master/src/fixtures/vc.json)

The issuer would create a VC in this form and sign it by following the rules for creating a JSON Web Token with a signature (JWS), as described at https://www.w3.org/TR/vc-data-model/#jwt-encoding.

## Diverse presentation scenarios

Above, we discussed two common scenarios for online and in-person presentation. While VCs work beautifully in combination with mobile app-based presentations, it's also possible to present them through any number of means, including direct uploads, email, or print-and-display QR codes. We'll leave the details of how these presentations are done to a follow-on blog post.

## Conclusions

VCs and FHIR are a natural combination: the standards can be combined to express a complete set of details about who is asserting what about whom. As consumer adoption of FHIR grows, the VC model provides an important solution to the challenge of "how can I share data when authenticity matters?" In this post, we've explored the design considerations for using FHIR content resources and identity resources to provide "just enough" data to ensure useful verifiable credentials.