# Understanding FHIR & Provenance Resource

...in an attempt to achieve subject-level data lineage

---

[toc]

---

# **What is FHIR?**

* Fast Healthcare Interoperability Resources (FHIR) by HL7
* Supported by major vendors (e.g., Epic, Cerner, Apple)
* Standard for healthcare data exchange based on RESTful APIs
* CRUD operations (Create, Read, Update, Delete)
* Supports JSON & XML formats
* Designed for web-based use at scale
* Modular data units called "resources"
* Highly extensible via extensions and profiles

---

> RESTful API is an interface that two computer systems use to exchange information securely over the internet.
>
> https://aws.amazon.com/what-is/restful-api/

---

![image](https://hackmd.io/_uploads/BJ64FM9Zlg.png)

[Link](https://build.fhir.org/overview-arch.html#frameworks)

---

# FHIR Resources

**Definition:** Modular components that represent specific healthcare concepts. See https://build.fhir.org/resourcelist.html

**Example resources:**

* `Patient`: Demographic and administrative information [Link](https://build.fhir.org/patient.html)
* `Observation`: Vital signs, lab results [Link](https://build.fhir.org/observation.html)
* `Medication`: Medication details [Link](https://build.fhir.org/medication.html)
* `Encounter`: Interaction between patient and healthcare provider [Link](https://build.fhir.org/encounter.html)

---

Each resource:

* Has a unique `id`
* Is accessible via REST API (`GET [base]/Patient/[id]`)
* Can reference other resources (`Observation.subject → Patient`)
* Can be grouped in a `Bundle` for transactions or documents

---

# Example: RESTful API Impact on Data Lineage

## Scenario

A clinical system uses a RESTful API to transmit lab test results from an EHR to a central research database.

## Data Flow

1. **System A (EHR)** sends data via RESTful API:

   ```http
   POST /lab-results HTTP/1.1
   Content-Type: application/json

   {
     "patient_id": "1234",
     "test": "Hemoglobin",
     "result": "13.5",
     "unit": "g/dL"
   }
   ```

2. **System B (Lab Aggregator)** processes the data:
   - Maps `"Hemoglobin"` to standardized LOINC code `"718-7"`.
   - Converts units if needed.
   - Applies business rules (e.g., discards low-confidence entries).
   - Forwards processed data to System C.

3. **System C (Research DB)** receives:

   ```json
   {
     "patient_id": "1234",
     "test_code": "718-7",
     "result": "13.5",
     "unit": "g/dL"
   }
   ```

## Problem

- **System C** has no visibility into:
  - The original value (`"Hemoglobin"`).
  - The transformation logic applied.
  - Metadata about the API call (timestamp, sender, endpoint).
- **No audit trail or transformation log** is preserved.

## Impact on Lineage

- Loss of:
  - **Source attribution** (Who sent the data? From what system?).
  - **Transformation traceability** (How was `"Hemoglobin"` mapped to `"718-7"`?).
  - **Change auditability** (What rules or filters were applied?).

## Consequences

- Inability to:
  - Reconstruct original data lineage.
  - Validate or trace back results.
  - Ensure transparency, reproducibility, and accountability.

---

# What is Provenance?

**Based on W3C Provenance Ontology (PROV-O)**

![image](https://hackmd.io/_uploads/BkY9N3KZee.png)

* **Agent:** Person or system initiating an activity
* **Activity:** Action that generated or changed data
* **Entity:** Data that was used or created

**Relationships:**

* `wasGeneratedBy`, `wasAttributedTo`, `used`

---

## What is the Provenance Resource?

**Definition:** A FHIR resource that records the **origin, authorship, and context** of another resource.

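
To make the definition concrete, here is a minimal Python sketch that assembles a Provenance resource as a plain dictionary for the hemoglobin result from the earlier scenario. The helper name `build_provenance`, the references `Observation/123`, `Device/lab-aggregator` and `Binary/hl7v2message`, the free-text agent type, and the timestamp are illustrative assumptions, not taken from a real system. The element names follow the FHIR Provenance resource, but the output should still be validated against whichever FHIR version your server supports.

```python
import json
from datetime import datetime, timezone


def build_provenance(target_ref, agent_ref, source_ref, occurred, activity_text):
    """Return a minimal FHIR Provenance resource as a dict (hypothetical helper)."""
    return {
        "resourceType": "Provenance",
        # WHAT: the resource whose origin is being documented
        "target": [{"reference": target_ref}],
        # WHEN: when the activity happened, and when this record was captured
        "occurredDateTime": occurred.isoformat(),
        "recorded": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        # HOW: free-text description of the activity (no coded value assumed)
        "activity": {"text": activity_text},
        # WHO: the system that performed the activity (free-text type is a placeholder)
        "agent": [{"type": {"text": "assembler"}, "who": {"reference": agent_ref}}],
        # Source material the target was produced from
        "entity": [{"role": "source", "what": {"reference": source_ref}}],
    }


provenance = build_provenance(
    target_ref="Observation/123",          # assumed id of the stored lab result
    agent_ref="Device/lab-aggregator",     # assumed id for System B
    source_ref="Binary/hl7v2message",      # assumed id of the original payload
    occurred=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),  # made-up time
    activity_text='Mapped local test name "Hemoglobin" to LOINC 718-7',
)
print(json.dumps(provenance, indent=2))
```

If System B in the scenario stored a record like this next to each transformed result, System C could recover the sender, the original payload, and the mapping step that the plain REST hand-off drops.
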
**Purpose:**

* Identify WHO created, updated, or deleted data
* Capture WHEN and HOW data was generated or modified
* Link back to original sources (e.g., scanned forms, HL7 messages)

**Applications:**

* Data transformation or import (e.g., from HL7 V2 or CDA)
* Trust and verification in clinical decision-making
* Legal audit trails

---

## How Provenance Links to FHIR Resources

**Target Field (`Provenance.target`)**

* Points to one or more FHIR resources whose origin it documents
* Examples: `Observation/123`, `MedicationAdministration/45`

**Multi-resource Support:**

* Multiple resources can be tracked in a single Provenance
* Useful in batch imports or bulk operations

**Practical Use:**

* Every POST/PUT in a system can be paired with a Provenance to maintain traceability

---

# Key Fields in Provenance

| Field         | Description                                                     |
| ------------- | --------------------------------------------------------------- |
| `target`      | Resource(s) being documented                                    |
| `agent`       | The person/system/device that performed the activity            |
| `entity`      | Source material used in the activity (e.g., a scanned doc)      |
| `recorded`    | Timestamp of when the provenance was captured                   |
| `occurred[x]` | When the actual activity took place (can be DateTime or Period) |
| `reason`      | Justification for the activity                                  |
| `activity`    | Describes the action taken (e.g., creation, update, import)     |
| `signature`   | Cryptographic signature (optional)                              |

---

## Capturing Who, What, When, How

**Who:** `agent.who` (Practitioner, Patient, Device, Organization)

**What:** `target` (the FHIR resource being tracked)

**When:** `occurredDateTime` or `occurredPeriod`, and `recorded`

**How:** `activity`, `agent.type`, `entity.role`

---

## Example of Tracing a Lab Result

**Use Case:** Trace the origin of a Hemoglobin lab result (`Observation/123`)

**Provenance Fields:**

* **Agent:** Practitioner/Dr. Smith (`agent.who`)
* **Activity:** Entry from Lab System (`activity = record`)
* **Entity:** Source material, `Binary/hl7v2message` or a DocumentReference
* **Target:** `Observation/123`

**Insight Gained:** You know who entered the lab value, when, and using what source.

---

## Chained Provenance Support

**What is Chaining?**

* Multiple Provenance records linked through `entity` and `target`: the `target` of one record appears as an `entity` of the next
* Useful when a derived resource is created from another FHIR resource or external content

**Example:**

* Step 1: HL7 V2 message → Provenance A → Patient resource
* Step 2: Patient → Provenance B → Derived QuestionnaireResponse

**FHIR Support:** Chaining is expressed only through multiple Provenance instances

**Challenges:**

* Requires careful modeling
* Visualization and debugging may be complex

---

## Limitations of Provenance

| Limitation           | Details                                                                   |
| -------------------- | ------------------------------------------------------------------------- |
| Version Handling     | Hard to create a version-specific reference to a resource before it exists |
| HTTP Header Limits   | A Provenance sent in the `X-Provenance` header may exceed server limits   |
| Server Inconsistency | Not all FHIR servers support recording or interpreting provenance         |
| Real-time Capture    | Not equivalent to full real-time audit logging                            |
| Chain Management     | Tracing across multiple layers (import → transform → create) is complex   |

---

h/t Trinath Panda

---