---
# System prepended metadata

title: Understanding FHIR & Provenance Resource
tags: [RWE, CDISC, FHIR]

---

# Understanding FHIR & Provenance Resource
...in an attempt to achieve subject-level data lineage 


---


[toc]


---

# **What is FHIR?**

* Fast Healthcare Interoperability Resources (FHIR) by HL7
* Supported by major vendors (e.g., Epic, Cerner, Apple)
* Standard for healthcare data exchange based on RESTful APIs
    * CRUD operations (Create, Read, Update, Delete)
    * Supports JSON & XML formats
    * Designed for web-based use at scale
* Modular data units called "resources"
    * Highly extensible via extensions and profiles
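
The CRUD operations map onto FHIR's RESTful interactions roughly as follows. This is a minimal Python sketch: `fhir_request` and the base URL are hypothetical, and the code only builds the method and URL without calling a server.

```python
from typing import Optional

# HTTP method used by the FHIR RESTful interaction that matches each CRUD operation.
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def fhir_request(op: str, base: str, rtype: str, rid: Optional[str] = None) -> tuple:
    """Return (HTTP method, URL) for a CRUD operation on a FHIR resource type."""
    if op not in CRUD_TO_HTTP:
        raise ValueError(f"unknown operation: {op}")
    if op == "create":
        # On create the server assigns the id, so the URL ends at the type: [base]/[type]
        return CRUD_TO_HTTP[op], f"{base}/{rtype}"
    if rid is None:
        raise ValueError(f"{op} needs a resource id")
    # read, update and delete address one instance: [base]/[type]/[id]
    return CRUD_TO_HTTP[op], f"{base}/{rtype}/{rid}"


base = "https://fhir.example.org/r4"  # hypothetical server base URL
for op in ("create", "read", "update", "delete"):
    rid = None if op == "create" else "1234"
    method, url = fhir_request(op, base, "Patient", rid)
    print(method, url)
```

The loop prints `POST` against `.../Patient` for create, then `GET`, `PUT` and `DELETE` against `.../Patient/1234`.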

---

> RESTful API is an interface that two computer systems use to exchange information securely over the internet. https://aws.amazon.com/what-is/restful-api/



---

![image](https://hackmd.io/_uploads/BJ64FM9Zlg.png)
[Link](https://build.fhir.org/overview-arch.html#frameworks)


---

# FHIR Resources

**Definition:** Modular components that represent specific healthcare concepts.
See https://build.fhir.org/resourcelist.html

**Example resources:**

* `Patient`: Demographic and administrative information [Link](https://build.fhir.org/patient.html)
* `Observation`: Vital signs, lab results [Link](https://build.fhir.org/observation.html)
* `Medication`: Medication details [Link](https://build.fhir.org/medication.html)
* `Encounter`: Interaction between patient and healthcare provider [Link](https://build.fhir.org/encounter.html)

---

Each resource:
* Has a unique `id`
* Is accessible via REST API (`GET [base]/Patient/[id]`)
* Can reference other resources (`Observation.subject → Patient`)
* Can be grouped in a `Bundle` for transactions or documents
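
Each resource is plain JSON. The sketch below builds a minimal `Patient`, an `Observation` whose `subject` points to it, and a transaction `Bundle` that carries both, using only the Python standard library. The names, values and temporary `urn:uuid` ids are illustrative, and a real server or profile may require more elements.

```python
import json
import uuid

# Temporary id for the new Patient; inside a transaction Bundle, other entries
# can reference it, and the server rewrites it to Patient/[assigned id].
patient_url = f"urn:uuid:{uuid.uuid4()}"
patient = {"resourceType": "Patient", "name": [{"family": "Doe", "given": ["Jane"]}]}

observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "718-7",
                         "display": "Hemoglobin [Mass/volume] in Blood"}]},
    "subject": {"reference": patient_url},  # Observation.subject -> Patient
    "valueQuantity": {"value": 13.5, "unit": "g/dL",
                      "system": "http://unitsofmeasure.org", "code": "g/dL"},
}

bundle = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {"fullUrl": patient_url, "resource": patient,
         "request": {"method": "POST", "url": "Patient"}},
        {"fullUrl": f"urn:uuid:{uuid.uuid4()}", "resource": observation,
         "request": {"method": "POST", "url": "Observation"}},
    ],
}

print(json.dumps(bundle, indent=2))
```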

---

# Example: RESTful API Impact on Data Lineage

## Scenario
A clinical system uses a RESTful API to transmit lab test results from an EHR to a central research database.

## Data Flow

1. **System A (EHR)** sends data via RESTful API:

    ```http
    POST /lab-results HTTP/1.1
    Content-Type: application/json

    {
      "patient_id": "1234",
      "test": "Hemoglobin",
      "result": "13.5",
      "unit": "g/dL"
    }
    ```

2. **System B (Lab Aggregator)** processes the data:
    - Maps `"Hemoglobin"` to standardized LOINC code `"718-7"`.
    - Converts units if needed.
    - Applies business rules (e.g., discards low-confidence entries).
    - Forwards processed data to System C.

3. **System C (Research DB)** receives:

    ```json
    {
      "patient_id": "1234",
      "test_code": "718-7",
      "result": "13.5",
      "unit": "g/dL"
    }
    ```
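
The lineage gap can be reproduced in a few lines. In this hypothetical sketch of System B, `transform` and the mapping table are illustrative, and unit conversion and filtering are left out. The function returns exactly the payload System C receives, and nothing in its output records what came in or which mapping was applied.

```python
# Illustrative local-name -> LOINC mapping table used by System B.
LOCAL_TO_LOINC = {"Hemoglobin": "718-7"}


def transform(payload: dict) -> dict:
    """Map the local test name to LOINC and forward only the mapped fields."""
    return {
        "patient_id": payload["patient_id"],
        "test_code": LOCAL_TO_LOINC[payload["test"]],
        "result": payload["result"],
        "unit": payload["unit"],
    }


from_ehr = {"patient_id": "1234", "test": "Hemoglobin", "result": "13.5", "unit": "g/dL"}
to_research_db = transform(from_ehr)
print(to_research_db)
# The original test name is no longer anywhere in the forwarded payload.
print("Hemoglobin" in to_research_db.values())  # → False
```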

## Problem

- **System C** has no visibility into:
  - The original value (`"Hemoglobin"`).
  - The transformation logic applied.
  - Metadata about the API call (timestamp, sender, endpoint).
- **No audit trail or transformation log** is preserved.

## Impact on Lineage

- Loss of:
  - **Source attribution** (Who sent the data? From what system?).
  - **Transformation traceability** (How was `"Hemoglobin"` mapped to `"718-7"`?).
  - **Change auditability** (What rules or filters were applied?).

## Consequences

- Inability to:
  - Reconstruct original data lineage.
  - Validate or trace back results.
  - Ensure transparency, reproducibility, and accountability.


---


# What is Provenance?

**Based on W3C Provenance Ontology (PROV-O)**

![image](https://hackmd.io/_uploads/BkY9N3KZee.png)


* **Agent:** Person, organization, or system responsible for an activity
* **Activity:** Action that generated or changed data
* **Entity:** Data that was used or created

**Relationships:**

* `wasGeneratedBy`, `wasAttributedTo`, `used`
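
Applied to the lab scenario, these relationships can be written as subject/predicate/object statements. This sketch uses plain Python tuples instead of an RDF library. The `ex:` identifiers are hypothetical, and `prov:wasAssociatedWith` (activity to agent) is one further PROV-O relation not listed above. Following `wasGeneratedBy` and then `used` backwards recovers where a value came from.

```python
# PROV-O-style statements (prov: = http://www.w3.org/ns/prov#); ex: ids are hypothetical.
triples = [
    ("ex:hb-result-loinc", "prov:wasGeneratedBy", "ex:loinc-mapping"),
    ("ex:loinc-mapping", "prov:used", "ex:hb-result-local"),
    ("ex:loinc-mapping", "prov:wasAssociatedWith", "ex:lab-aggregator"),
    ("ex:hb-result-loinc", "prov:wasAttributedTo", "ex:lab-aggregator"),
    ("ex:hb-result-local", "prov:wasAttributedTo", "ex:ehr-system"),
]


def lineage(entity: str) -> list:
    """Walk wasGeneratedBy -> used edges backwards from an entity to its sources."""
    path = [entity]
    current = entity
    while True:
        activity = next((o for s, p, o in triples
                         if s == current and p == "prov:wasGeneratedBy"), None)
        if activity is None:
            return path  # no generating activity recorded: this is a source
        path.append(activity)
        source = next((o for s, p, o in triples
                       if s == activity and p == "prov:used"), None)
        if source is None:
            return path
        path.append(source)
        current = source


print(" <- ".join(lineage("ex:hb-result-loinc")))
# → ex:hb-result-loinc <- ex:loinc-mapping <- ex:hb-result-local
```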


---

## What is the Provenance Resource?

**Definition:** A FHIR resource that records the **origin, authorship, and context** of another resource.

**Purpose:**

* Identify WHO created, updated, or deleted data
* Capture WHEN and HOW data was generated or modified
* Link back to original sources (e.g., scanned forms, HL7 messages)

**Applications:**

* Data transformation or import (e.g., from HL7 V2 or CDA)
* Trust and verification in clinical decision-making
* Legal audit trails

---

## How Provenance Links to FHIR Resources

**Target Field (`Provenance.target`)**

* Points to one or more FHIR resources whose origin it documents
* Examples: `Observation/123`, `MedicationAdministration/45`

**Multi-resource Support:**

* Multiple resources can be tracked in a single Provenance
* Useful in batch imports or bulk operations

**Practical Use:**

* Every POST/PUT in a system can be paired with Provenance to maintain traceability
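
One way to pair a write with its Provenance is to submit both in a single transaction `Bundle`, with `Provenance.target` pointing at the new resource's temporary `urn:uuid` so that the server resolves it to the id it assigns. This is a minimal sketch: `with_provenance` and the agent reference are hypothetical, and only a few Provenance elements are filled in.

```python
import json
import uuid
from datetime import datetime, timezone


def with_provenance(resource: dict, agent_ref: str) -> dict:
    """Wrap a new resource and a Provenance that targets it in one transaction Bundle."""
    resource_url = f"urn:uuid:{uuid.uuid4()}"  # temporary id, resolved by the server
    provenance = {
        "resourceType": "Provenance",
        "target": [{"reference": resource_url}],
        # instant with timezone, e.g. 2024-05-01T09:30:05+00:00
        "recorded": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "agent": [{"who": {"reference": agent_ref}}],
    }
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {"fullUrl": resource_url, "resource": resource,
             "request": {"method": "POST", "url": resource["resourceType"]}},
            {"fullUrl": f"urn:uuid:{uuid.uuid4()}", "resource": provenance,
             "request": {"method": "POST", "url": "Provenance"}},
        ],
    }


obs = {"resourceType": "Observation", "status": "final",
       "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]}}
bundle = with_provenance(obs, "Device/lab-aggregator")  # hypothetical agent id
print(json.dumps(bundle, indent=2))
```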

---

# Key Fields in Provenance

| Field         | Description                                                     |
| ------------- | --------------------------------------------------------------- |
| `target`      | Resource(s) being documented                                    |
| `agent`       | The person/system/device that performed the activity            |
| `entity`      | Source material used in the activity (e.g., a scanned doc)      |
| `recorded`    | Timestamp of when the provenance was captured                   |
| `occurred[x]` | When the actual activity took place (can be DateTime or Period) |
| `reason`      | Justification for the activity                                  |
| `activity`    | Describes the action taken (e.g., creation, update, import)     |
| `signature`   | Cryptographic signature (optional)                              |

---

## Capturing Who, What, When, How

**Who:** `agent.who` — Practitioner, Patient, Device, Organization
**What:** `target` — FHIR resource being tracked
**When:** `occurredDateTime` or `occurredPeriod`, and `recorded`
**How:** `activity`, `agent.type`, `entity.role`
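
The same mapping can be read off a Provenance resource mechanically. This sketch groups a parsed Provenance's elements by the question each answers. `summarize` is a hypothetical helper, and the sample (a device agent, an `occurredPeriod`, made-up ids and times) is illustrative only.

```python
from pprint import pprint


def summarize(prov: dict) -> dict:
    """Group a Provenance's elements by the question they answer (who/what/when/how)."""
    agents = prov.get("agent", [])
    return {
        "who": [a.get("who", {}).get("reference") for a in agents],
        "what": [t.get("reference") for t in prov.get("target", [])],
        "when": {
            # occurred[x] appears as either occurredDateTime or occurredPeriod
            "occurred": prov.get("occurredDateTime") or prov.get("occurredPeriod"),
            "recorded": prov.get("recorded"),
        },
        "how": {
            "activity": [c.get("code") for c in prov.get("activity", {}).get("coding", [])],
            "agent_type": [c.get("code") for a in agents
                           for c in a.get("type", {}).get("coding", [])],
            "entity_role": [e.get("role") for e in prov.get("entity", [])],
        },
    }


sample = {  # illustrative values only
    "resourceType": "Provenance",
    "target": [{"reference": "Observation/123"}],
    "occurredPeriod": {"start": "2024-05-01T08:00:00Z", "end": "2024-05-01T08:05:00Z"},
    "recorded": "2024-05-01T08:06:00Z",
    "activity": {"coding": [{"system": "http://terminology.hl7.org/CodeSystem/v3-DataOperation",
                             "code": "CREATE"}]},
    "agent": [{"type": {"coding": [{"system": "http://terminology.hl7.org/CodeSystem/provenance-participant-type",
                                    "code": "author"}]},
               "who": {"reference": "Device/lab-analyzer-7"}}],
    "entity": [{"role": "source", "what": {"reference": "Binary/hl7v2message"}}],
}
pprint(summarize(sample))
```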


---

## Example of Tracing a Lab Result  

**Use Case:** Trace the origin of a Hemoglobin lab result (`Observation/123`)

**Provenance Fields:**

* **Agent:** Practitioner/Dr. Smith (`agent.who`)
* **Activity:** Entry from Lab System (`activity = record`)
* **Entity:** Source data (`entity.what`), e.g., the original HL7 V2 message as `Binary/hl7v2message`, or a `DocumentReference`
* **Target:** `Observation/123`

**Insight Gained:** You know who entered the lab value, when, and from what source.
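
Written out as JSON, the Provenance for this use case might look like the following sketch. The ids (`Observation/123`, `Practitioner/dr-smith`, `Binary/hl7v2message`) and the timestamps are hypothetical. The activity is coded as `CREATE` from the HL7 v3 DataOperation code system, an assumption standing in for the informal `record` above.

```python
import json

provenance = {
    "resourceType": "Provenance",
    "target": [{"reference": "Observation/123"}],      # what
    "occurredDateTime": "2024-05-01T09:30:00Z",         # when the value was entered
    "recorded": "2024-05-01T09:30:05Z",                 # when this record was written
    "activity": {"coding": [{                           # how (assumed code)
        "system": "http://terminology.hl7.org/CodeSystem/v3-DataOperation",
        "code": "CREATE"}]},
    "agent": [{"who": {"reference": "Practitioner/dr-smith", "display": "Dr. Smith"}}],  # who
    "entity": [{"role": "source",                       # from what source
                "what": {"reference": "Binary/hl7v2message"}}],
}

print(json.dumps(provenance, indent=2))
```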

---

## Chained Provenance Support

**What is Chaining?**

* Multiple Provenance records linked through the resources they share: the `target` of one record appears as an `entity` of the next
* Useful when a derived resource is created from another FHIR resource or external content

**Example:**

* Step 1: HL7 V2 message → Provenance A → Patient resource
* Step 2: Patient → Provenance B → Derived QuestionnaireResponse

**FHIR support:** Chaining is expressed only through multiple, separate Provenance instances

**Challenges:**

* Requires careful modeling
* Visualization and debugging may be complex
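
Because the chain exists only as separate records, reconstructing it means walking from a resource to the Provenance that targets it, then to that record's source entity, and repeating. This sketch works over a hypothetical in-memory list that mirrors the two-step example. For simplicity it follows only the first entity of each record. Against a live server, each lookup could instead be a search on the Provenance `target` parameter.

```python
# Hypothetical Provenance records for: HL7 V2 message -> Patient -> QuestionnaireResponse
provenances = [
    {"resourceType": "Provenance", "id": "A",
     "target": [{"reference": "Patient/p1"}],
     "entity": [{"role": "source", "what": {"reference": "Binary/hl7v2-msg"}}]},
    {"resourceType": "Provenance", "id": "B",
     "target": [{"reference": "QuestionnaireResponse/qr1"}],
     "entity": [{"role": "source", "what": {"reference": "Patient/p1"}}]},
]


def trace(reference: str, store: list) -> list:
    """Follow target -> entity links backwards until no Provenance targets the reference."""
    chain = [reference]
    seen = {reference}
    while True:
        prov = next((p for p in store
                     if any(t["reference"] == chain[-1] for t in p.get("target", []))), None)
        if prov is None or not prov.get("entity"):
            return chain
        source = prov["entity"][0]["what"]["reference"]  # simplification: first entity only
        if source in seen:  # stop on cycles in malformed data
            return chain
        chain += [f"Provenance/{prov['id']}", source]
        seen.add(source)


print(" <- ".join(trace("QuestionnaireResponse/qr1", provenances)))
# → QuestionnaireResponse/qr1 <- Provenance/B <- Patient/p1 <- Provenance/A <- Binary/hl7v2-msg
```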

---

## Limitations of Provenance

| Limitation           | Details                                                                 |
| -------------------- | ----------------------------------------------------------------------- |
| Version Handling     | Hard to give a version-specific `target` before the server creates it   |
| HTTP Header Limits   | `X-Provenance` header size may exceed server limits                     |
| Server Inconsistency | Not all FHIR servers support recording or interpreting provenance       |
| Real-time Capture    | Not a substitute for real-time audit logging (see FHIR `AuditEvent`)    |
| Chain Management     | Tracing across multiple layers (import → transform → create) is complex |
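
The header limit is easy to hit because the whole Provenance travels as JSON in a single header value. This sketch serializes a Provenance compactly and checks it against a size budget before sending. `HEADER_LIMIT_BYTES` is a hypothetical value, because real limits depend on the server and any proxies in front of it. The Provenance omits `target`, on the assumption that the server fills it in from the request.

```python
import json

# Hypothetical per-header budget; actual limits vary by server and proxy.
HEADER_LIMIT_BYTES = 8 * 1024


def provenance_header(provenance: dict) -> dict:
    """Serialize a Provenance into an X-Provenance header, failing if it is too large."""
    # Compact separators keep the value short; json.dumps escapes non-ASCII by default,
    # which keeps the header value ASCII-only.
    value = json.dumps(provenance, separators=(",", ":"))
    size = len(value.encode("utf-8"))
    if size > HEADER_LIMIT_BYTES:
        raise ValueError(f"X-Provenance header is {size} bytes; budget is {HEADER_LIMIT_BYTES}")
    return {"X-Provenance": value}


prov = {"resourceType": "Provenance",
        "recorded": "2024-05-01T09:30:05Z",
        "agent": [{"who": {"reference": "Practitioner/dr-smith"}}]}
headers = {"Content-Type": "application/fhir+json", **provenance_header(prov)}
print(headers["X-Provenance"])
```

Failing early on the client gives a clear error instead of an opaque rejection from the server or a proxy.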

---

h/t Trinath Panda

---
