# Understanding FHIR & the Provenance Resource
...in an attempt to achieve subject-level data lineage
---
[toc]
---
# What is FHIR?
* Fast Healthcare Interoperability Resources (FHIR) by HL7
* Supported by major vendors (e.g., Epic, Cerner, Apple)
* Standard for healthcare data exchange based on RESTful APIs
* CRUD operations (Create, Read, Update, Delete)
* Supports JSON & XML formats
* Designed for web-based use at scale
* Modular data units called "resources"
* Highly extensible via extensions and profiles
---
> RESTful API is an interface that two computer systems use to exchange information securely over the internet. https://aws.amazon.com/what-is/restful-api/
---

[FHIR architecture overview: frameworks](https://build.fhir.org/overview-arch.html#frameworks)
---
# FHIR Resources
**Definition:** Modular components that represent specific healthcare concepts.
See https://build.fhir.org/resourcelist.html
**Example resources:**
* `Patient`: Demographic and administrative information [Link](https://build.fhir.org/patient.html)
* `Observation`: Vital signs, lab results [Link](https://build.fhir.org/observation.html)
* `Medication`: Medication details [Link](https://build.fhir.org/medication.html)
* `Encounter`: Interaction between patient and healthcare provider [Link](https://build.fhir.org/encounter.html)
---
Each resource:
* Has a unique `id`
* Is accessible via REST API (`GET [base]/Patient/[id]`)
* Can reference other resources (`Observation.subject → Patient`)
* Can be grouped in `Bundle` for transactions or documents
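
The properties above can be sketched with plain Python dictionaries. This is a minimal illustration rather than a full FHIR client: the ids, the base URL `https://example.org/fhir`, and the helper `read_url` are assumptions made for the example, and the Bundle uses the `collection` type only because it needs no per-entry request details.

```python
# Hypothetical resources; ids and values are illustrative only.
patient = {"resourceType": "Patient", "id": "1234"}

observation = {
    "resourceType": "Observation",
    "id": "123",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
    # Observation.subject -> Patient, expressed as a relative reference.
    "subject": {"reference": f"Patient/{patient['id']}"},
    "valueQuantity": {"value": 13.5, "unit": "g/dL"},
}


def read_url(base: str, resource: dict) -> str:
    """Build the REST read URL: GET [base]/[type]/[id]."""
    return f"{base}/{resource['resourceType']}/{resource['id']}"


# Group both resources in a Bundle.
bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [{"resource": patient}, {"resource": observation}],
}

print(read_url("https://example.org/fhir", observation))
print(observation["subject"]["reference"])
```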
---
# Example: RESTful API Impact on Data Lineage
## Scenario
A clinical system uses a RESTful API to transmit lab test results from an EHR to a central research database.
## Data Flow
1. **System A (EHR)** sends data via RESTful API:
   ```http
   POST /lab-results HTTP/1.1
   Content-Type: application/json

   {
     "patient_id": "1234",
     "test": "Hemoglobin",
     "result": "13.5",
     "unit": "g/dL"
   }
   ```
2. **System B (Lab Aggregator)** processes the data:
   - Maps `"Hemoglobin"` to standardized LOINC code `"718-7"`.
   - Converts units if needed.
   - Applies business rules (e.g., discards low-confidence entries).
   - Forwards processed data to System C.
3. **System C (Research DB)** receives:
   ```json
   {
     "patient_id": "1234",
     "test_code": "718-7",
     "result": "13.5",
     "unit": "g/dL"
   }
   ```
## Problem
- **System C** has no visibility into:
  - The original test label (`"Hemoglobin"`).
  - The transformation logic applied.
  - Metadata about the API call (timestamp, sender, endpoint).
- **No audit trail or transformation log** is preserved.
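
A rough sketch of what System B might do makes the gap concrete. The function `aggregate` and the lookup table `LOINC_BY_NAME` are hypothetical; only the `Hemoglobin` to `718-7` mapping and the payload shapes come from the scenario. Nothing in the returned dictionary records which rule ran, when, or on whose behalf.

```python
from typing import Optional

# Hypothetical lookup table; the scenario only specifies this one mapping.
LOINC_BY_NAME = {"Hemoglobin": "718-7"}


def aggregate(payload: dict) -> Optional[dict]:
    """Stand-in for System B: map the test label to a LOINC code."""
    code = LOINC_BY_NAME.get(payload["test"])
    if code is None:
        return None  # stand-in for a business rule that discards an entry
    return {
        "patient_id": payload["patient_id"],
        "test_code": code,
        "result": payload["result"],
        "unit": payload["unit"],
    }


incoming = {"patient_id": "1234", "test": "Hemoglobin", "result": "13.5", "unit": "g/dL"}
outgoing = aggregate(incoming)

# Keys present in the source payload but absent from what System C receives.
lost_keys = sorted(set(incoming) - set(outgoing))
print(outgoing)
print(lost_keys)
```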
## Impact on Lineage
- Loss of:
  - **Source attribution** (Who sent the data? From what system?).
  - **Transformation traceability** (How was `"Hemoglobin"` mapped to `"718-7"`?).
  - **Change auditability** (What rules or filters were applied?).
## Consequences
- Inability to:
  - Reconstruct original data lineage.
  - Validate or trace back results.
  - Ensure transparency, reproducibility, and accountability.
---
# What is Provenance?
**Based on W3C Provenance Ontology (PROV-O)**

* **Agent:** Person or system initiating activity
* **Activity:** Action that generated or changed data
* **Entity:** Data that was used or created
**Relationships:**
* `wasGeneratedBy`, `wasAttributedTo`, `used`
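
One way to picture these relationships is as subject/predicate/object triples. The sketch below is not an RDF implementation; the identifiers (`observation-123`, `lab-import`, and so on) and the helper `sources_of` are hypothetical and exist only to show the direction of each relation.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers for the lab-result scenario.
relations = [
    Relation("observation-123", "wasGeneratedBy", "lab-import"),   # Entity -> Activity
    Relation("lab-import", "used", "hl7v2-message"),               # Activity -> Entity
    Relation("observation-123", "wasAttributedTo", "lab-system"),  # Entity -> Agent
]


def sources_of(entity: str) -> List[str]:
    """Entities used by the activities that generated `entity`."""
    activities = [
        r.obj for r in relations
        if r.subject == entity and r.predicate == "wasGeneratedBy"
    ]
    return [
        r.obj for r in relations
        if r.subject in activities and r.predicate == "used"
    ]


print(sources_of("observation-123"))
```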
---
## What is the Provenance Resource?
**Definition:** A FHIR resource that records the **origin, authorship, and context** of another resource.
**Purpose:**
* Identify WHO created, updated, or deleted data
* Capture WHEN and HOW data was generated or modified
* Link back to original sources (e.g., scanned forms, HL7 messages)
**Applications:**
* Data transformation or import (e.g., from HL7 V2 or CDA)
* Trust and verification in clinical decision-making
* Legal audit trails
---
## How Provenance Links to FHIR Resources
**Target Field (`Provenance.target`)**
* Points to one or more FHIR resources whose origin it documents
* Examples: `Observation/123`, `MedicationAdministration/45`
**Multi-resource Support:**
* Multiple resources can be tracked in a single Provenance
* Useful in batch imports or bulk operations
**Practical Use:**
* Every POST/PUT in a system can be paired with Provenance to maintain traceability
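
A common way to pair a write with its Provenance is a single transaction Bundle, where the Provenance targets the new resource through a temporary `urn:uuid:` identifier that the server resolves on commit. The sketch below only builds the Bundle and sends nothing; the helper name `with_provenance` and the agent reference `Device/lab-aggregator` are assumptions for illustration.

```python
import uuid
from datetime import datetime, timezone


def with_provenance(resource: dict, agent_ref: str) -> dict:
    """Wrap a new resource and a Provenance that targets it in one transaction."""
    full_url = f"urn:uuid:{uuid.uuid4()}"  # temporary id for the new resource
    provenance = {
        "resourceType": "Provenance",
        "target": [{"reference": full_url}],
        "recorded": datetime.now(timezone.utc).isoformat(),
        "agent": [{"who": {"reference": agent_ref}}],
    }
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {
                "fullUrl": full_url,
                "resource": resource,
                "request": {"method": "POST", "url": resource["resourceType"]},
            },
            {
                "fullUrl": f"urn:uuid:{uuid.uuid4()}",
                "resource": provenance,
                "request": {"method": "POST", "url": "Provenance"},
            },
        ],
    }


tx = with_provenance(
    {"resourceType": "Observation", "status": "final"},
    "Device/lab-aggregator",  # hypothetical agent
)
print([entry["request"]["url"] for entry in tx["entry"]])
```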
---
# Key Fields in Provenance
| Field | Description |
| ------------- | --------------------------------------------------------------- |
| `target` | Resource(s) being documented |
| `agent` | The person/system/device that performed the activity |
| `entity` | Source material used in the activity (e.g., a scanned doc) |
| `recorded` | Timestamp of when the provenance was captured |
| `occurred[x]` | When the actual activity took place (`dateTime` or `Period`)    |
| `reason` | Justification for the activity |
| `activity` | Describes the action taken (e.g., creation, update, import) |
| `signature` | Cryptographic signature (optional) |
---
## Capturing Who, What, When, How
**Who:** `agent.who` — Practitioner, Patient, Device, Organization
**What:** `target` — FHIR resource being tracked
**When:** `occurredDateTime` or `occurredPeriod`, and `recorded`
**How:** `activity`, `agent.type`, `entity.role`
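
The mapping above can be read straight off a Provenance instance. The following sketch assumes a Provenance already parsed into a Python dictionary; the helper `summarize` and the sample values are hypothetical, and real data may carry `agent.who` as an identifier rather than a reference.

```python
def summarize(prov: dict) -> dict:
    """Pull who / what / when / how out of a Provenance dictionary."""
    return {
        "who": [a.get("who", {}).get("reference") for a in prov.get("agent", [])],
        "what": [t.get("reference") for t in prov.get("target", [])],
        # Prefer when the activity occurred; fall back to when it was recorded.
        "when": prov.get("occurredDateTime")
        or prov.get("occurredPeriod")
        or prov.get("recorded"),
        "how": {
            "activity": prov.get("activity", {}).get("text"),
            "entity_roles": [e.get("role") for e in prov.get("entity", [])],
        },
    }


# Hypothetical sample instance.
sample = {
    "resourceType": "Provenance",
    "target": [{"reference": "Observation/123"}],
    "recorded": "2024-01-15T09:31:00Z",
    "activity": {"text": "import"},
    "agent": [{"who": {"reference": "Device/lab-system"}}],
    "entity": [{"role": "source", "what": {"reference": "Binary/hl7v2message"}}],
}

summary = summarize(sample)
print(summary)
```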
---
## Example of Tracing a Lab Result
**Use Case:** Trace the origin of a Hemoglobin lab result (`Observation/123`)
**Provenance Fields:**
* **Agent:** Practitioner/Dr. Smith (`agent.who`)
* **Activity:** Entry from Lab System (`activity = record`)
* **Entity:** Source material, e.g. `Binary/hl7v2message` or a `DocumentReference` (`entity.what`)
* **Target:** `Observation/123`
**Insight Gained:** You know who entered the lab value, when, and using what source.
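
Written out as a resource, the use case might look like the sketch below. The timestamps and the id `Practitioner/dr-smith` are made up for illustration, and `activity` is given as free text to mirror the slide rather than a coded value. The check on `target`, `recorded`, and `agent` reflects the elements that are mandatory in FHIR R4; other versions may differ.

```python
# Hypothetical Provenance for the Hemoglobin result in Observation/123.
provenance = {
    "resourceType": "Provenance",
    "target": [{"reference": "Observation/123"}],
    "occurredDateTime": "2024-01-15T09:30:00Z",  # illustrative timestamp
    "recorded": "2024-01-15T09:31:00Z",          # illustrative timestamp
    "activity": {"text": "record"},
    "agent": [{"who": {"reference": "Practitioner/dr-smith"}}],
    "entity": [{"role": "source", "what": {"reference": "Binary/hl7v2message"}}],
}

# Elements assumed mandatory here, following FHIR R4.
REQUIRED = ("target", "recorded", "agent")


def missing_required(prov: dict) -> list:
    """Return the required elements that are absent or empty."""
    return [field for field in REQUIRED if not prov.get(field)]


print(missing_required(provenance))
```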
---
## Chained Provenance Support
**What is Chaining?**
* Multiple Provenance records link together: the `target` of one record appears as an `entity` of the next
* Useful when a derived resource is created from another FHIR resource or external content
**Example:**
* Step 1: HL7 V2 message → Provenance A → Patient resource
* Step 2: Patient → Provenance B → Derived QuestionnaireResponse
**FHIR support:** only via multiple Provenance instances
**Challenges:**
* Requires careful modeling
* Visualization and debugging may be complex
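
Walking such a chain amounts to repeatedly following `target` back to `entity.what`. The sketch below does this over an in-memory list of Provenance dictionaries. The function `trace` and the ids (`Patient/1234`, `QuestionnaireResponse/qr-1`) are hypothetical, and a real system would query the server for Provenance records by target instead.

```python
from typing import List, Optional, Set


def trace(target_ref: str, provenances: List[dict], seen: Optional[Set[str]] = None) -> List[str]:
    """Return source references reachable from target_ref, nearest first."""
    seen = set() if seen is None else seen
    lineage: List[str] = []
    for prov in provenances:
        if any(t.get("reference") == target_ref for t in prov.get("target", [])):
            for entity in prov.get("entity", []):
                source = entity["what"]["reference"]
                if source not in seen:  # guard against cycles
                    seen.add(source)
                    lineage.append(source)
                    lineage.extend(trace(source, provenances, seen))
    return lineage


# Step 1: HL7 V2 message -> Provenance A -> Patient
prov_a = {
    "resourceType": "Provenance",
    "target": [{"reference": "Patient/1234"}],
    "entity": [{"role": "source", "what": {"reference": "Binary/hl7v2message"}}],
}
# Step 2: Patient -> Provenance B -> derived QuestionnaireResponse
prov_b = {
    "resourceType": "Provenance",
    "target": [{"reference": "QuestionnaireResponse/qr-1"}],
    "entity": [{"role": "source", "what": {"reference": "Patient/1234"}}],
}

chain = trace("QuestionnaireResponse/qr-1", [prov_a, prov_b])
print(chain)
```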
---
## Limitations of Provenance
| Limitation | Details |
| -------------------- | ----------------------------------------------------------------------- |
| Version Handling     | Hard to create a version-specific reference to a resource before it exists |
| HTTP Header Limits | `X-Provenance` header size may exceed server limits |
| Server Inconsistency | Not all FHIR servers support recording or interpreting provenance |
| Real-time Capture | Not equivalent to full real-time audit logging |
| Chain Management | Tracing across multiple layers (import → transform → create) is complex |
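
The header-size limitation can be checked on the client before a request is sent. In the sketch below, the 8192-byte threshold is an assumed value, since actual limits depend on the server and any proxies in between, and `provenance_header` is a hypothetical helper, not part of any FHIR library.

```python
import json

MAX_HEADER_BYTES = 8192  # assumed limit; real servers and proxies vary


def provenance_header(prov: dict, limit: int = MAX_HEADER_BYTES) -> dict:
    """Serialize a Provenance for the X-Provenance header, or fail if too large."""
    # Compact separators keep the value on one line and as small as possible.
    value = json.dumps(prov, separators=(",", ":"))
    size = len(value.encode("utf-8"))
    if size > limit:
        raise ValueError(
            f"X-Provenance value is {size} bytes, over the {limit}-byte limit; "
            "consider sending the Provenance in a transaction Bundle instead"
        )
    return {"X-Provenance": value}


small = {
    "resourceType": "Provenance",
    "agent": [{"who": {"reference": "Device/lab-aggregator"}}],  # hypothetical agent
}
headers = provenance_header(small)
print(len(headers["X-Provenance"]))
```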
---
h/t Trinath Panda
---