# FHIR Bulk Publish Operation - Draft

### Summary of FHIR Bulk Data Operations

| **Category** | **Bulk Export** | **Bulk Submit** | **Bulk Publish** |
|---|---|---|---|
| **Driving Use Case** | Replace ad-hoc CSV exports with a standard API and data structures | Submit pre-coordinated datasets to an external organization | Make a large, relatively static dataset available in a standard format |
| **Example** | Export clinical data on a research cohort from an EHR for analysis | Provide required clinical data to a regulatory agency annually | Post provider directory and scheduling data on the web for downstream apps |
| **Cohort and data elements** | **Recipient specifies** | Provider defines | Provider defines |
| **Kick-off workflow** | Recipient pull | **Provider push** | Recipient pull |
| **Cardinality** | One provider to one recipient | One provider to one recipient | **One provider to many recipients** |
| **Feedback channel** | Out of band | **In band** | Out of band |

### Audience and Scope

This implementation guide is intended to be used by developers of backend services (clients) and data providers (servers) that aim to interoperate by sharing large FHIR datasets. The guide defines the application programming interfaces (APIs) through which a client may retrieve FHIR bulk-data files from a server. These files may be provided at an open endpoint, or may require the client to authenticate and authorize access to retrieve the data.

In contrast to the [Bulk Export Operation](https://build.fhir.org/ig/HL7/bulk-data/en/export.html), the Bulk Publish operation (`$bulk-publish`) returns static manifests and bulk data files, and does not provide a mechanism for a client to retrieve a filtered subset of the available data.
Systems that return infrequently updated reference information may wish to use the Bulk Publish operation instead of the Bulk Export operation to reduce the complexity and cost involved in hosting and providing this information. Expected use cases include the publication of provider directory information, formulary information and open scheduling slots.

The Bulk Publish API does not require a FHIR server implementation, and many deployments may be a simple HTTP server that returns a Bulk Publish manifest in response to GET requests at a path that ends in `/$bulk-publish`, and a set of HTTP endpoints that serve the bulk data files referenced from that manifest.

### Security Considerations

All exchanges described herein between a client and a server SHOULD be secured using [Transport Layer Security (TLS) Protocol Version 1.2 (RFC5246)](https://tools.ietf.org/html/rfc5246) or a more recent version of TLS. Use of mutual TLS is OPTIONAL.

With each of the requests described herein, implementers MAY implement OAuth 2.0 access management in accordance with the [SMART Backend Services Authorization Profile](authorization.html).

### Manifest Request

Request for a fully static or periodically updated dataset in FHIR format.

GET `[base]/$bulk-publish`

- A server SHALL support retrieval of a Bulk Publish manifest through an HTTP GET request to an endpoint that terminates with a `$bulk-publish` segment.
- A client SHOULD include the conditional request HTTP header `If-None-Match` with each request to avoid retrieving data when nothing has changed since the last request. Servers SHOULD support the use of this header.
- When the `If-None-Match` value matches the current `ETag`, servers MAY return `304 Not Modified`; otherwise, they return `200 OK` with the manifest.

#### Response - Error

- HTTP status code of `4XX` or `5XX`
- `Content-Type` header of `application/fhir+json`
- The body of the response SHOULD be a FHIR `OperationOutcome` resource in JSON format.
  If this is not possible (for example, the infrastructure layer returning the error is not FHIR aware), the server MAY return an error message in another format and include a corresponding value for the `Content-Type` header.

#### Response - Output Manifest

- HTTP status of `200 OK`
- `Content-Type` header of `application/fhir+json`
- `ETag` header that changes when the manifest body changes
- Body of output manifest (see below)

The output manifest is a JSON object providing metadata and links to the generated FHIR Bulk Data files. These files SHALL be accessible to the client at the URLs advertised. The manifest and these URLs MAY be served by file servers other than a FHIR-specific server.

The server MAY update the manifest at any time and SHALL use the `transactionTime` element to indicate when the files were generated. The response SHOULD NOT include any FHIR resources modified after this instant, and SHALL include any matching resources modified up to and including this instant. File URLs SHALL NOT be reused between updates unless their contents have remained the same, and files SHOULD remain available for a grace period following an update to avoid interrupting downloads that are in progress. The server SHOULD populate the `extension.updateCadence` period to indicate the frequency with which the server expects to update the manifest.

Servers SHOULD set a reasonable `Cache-Control` header on the manifest (e.g., `public, max-age=10`) and SHOULD serve immutable files with long-lived caching headers (e.g., `public, max-age=31536000, immutable`).

##### Incremental Updates

The server MAY incrementally update a manifest by adding data files to the `output` array element that contain new resources and/or resources that replace versions of the resources in earlier files in the `output` array that have the same resource id.
Additionally, the server MAY add files with Bundle resources indicating resources that have been deleted to the `deleted` array element (see details below), and MAY add files to the `error` array element.

When generating a manifest that will be subsequently updated with these incremental changes, the server SHALL populate an `extension.epochStartTime` element. When initially published, this value SHALL have the same value as the `transactionTime` element. Subsequently, adding files to the `output` array, `deleted` array and `error` array of a manifest will cause the `transactionTime` element for that manifest to advance, but the `extension.epochStartTime` value will remain the same. Periodically, the server MAY generate a manifest that is a complete snapshot of the data (a new epoch), updating the `output` array and `error` array, emptying the `deleted` array, and setting new `extension.epochStartTime` and `transactionTime` values.

When a manifest is incrementally updated, apart from when it is reset to a new epoch, the order of files in the `output`, `deleted`, and `error` arrays in the manifest SHALL NOT change, the file contents SHALL NOT change, and the files SHALL remain retrievable. Servers SHALL structure the manifests such that a client can obtain a complete data set when processing a manifest by (1) inserting or updating all FHIR resources in files in the `output` array that haven't been previously processed, followed by (2) deleting all resources listed in files in the `deleted` array that haven't been previously processed.

##### Elements

`operationDefinition` - 1..1 canonical
Official URL for the OperationDefinition associated with the generation of this manifest. For Bulk Publish this SHALL be `http://hl7.org/fhir/uv/bulkdata/OperationDefinition/bulk-publish|1.0.0` or an operation based on this operation.

`transactionTime` - 1..1 FHIR instant
Indicates the server's time when the query is run.
The response SHOULD NOT include any resources modified after this instant, and SHALL include any matching resources modified up to and including this instant. Note: To properly meet these constraints, a FHIR server might need to wait for any pending transactions to resolve in its database before starting an export process.

`extension.epochStartTime` - 0..1 FHIR instant
The timestamp of the start of the current epoch: the point at which clients reset to files that incorporate deleted items and past incremental updates. When the epoch changes, the `epochStartTime` and the `transactionTime` SHALL be identical. Subsequently, the server may update the manifest's output file list, deleted file list, and error file list. Once returned in a manifest, a file's contents SHALL NOT be changed and, outside of an epoch change, a file SHALL NOT be removed from a manifest file list. Servers that incrementally update a manifest and periodically reset to a snapshot of the data SHALL populate this element. Servers that always return a manifest that's a snapshot of the data MAY populate this element or MAY omit this element. Within an epoch, file lists are append-only and file contents are immutable; an epoch reset establishes a new baseline.

`extension.updateCadence` - 0..1 [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations)
The typical interval at which new files will be added to the manifest (e.g., `"PT1H"`).

`outputFormat` - 0..1 MIME type
Format of the Bulk Data output files. If omitted, defaults to `application/fhir+ndjson`. Corresponds to the `_outputFormat` parameter in a Bulk Export operation.

`requiresAccessToken` - 1..1 boolean
Indicates whether downloading the generated files requires the same authorization mechanism as accessing a `$bulk-publish` manifest. Value SHALL be `true` if the file server controls access using OAuth 2.0 bearer tokens.
Value MAY be `false` for file servers that use access-control schemes other than OAuth 2.0, such as downloads from Amazon S3 bucket URLs or verifiable file servers within an organization's firewall. When `false`, output URLs function as capability URLs.

`outputOrganizedBy` - 0..1 FHIR Resource Type
When resources in the output files are organized by *instances* of a resource type and include a header for each resource of the type, followed by the resource and resources in the output that contain references to that resource, that resource type SHALL be specified in this element (see <a href="#bulk-data-output-file-organization">details on this approach</a> below). Alternatively, when each output file in the output contains a single resource type, this element SHALL be omitted and an individual `type` element SHALL be included for each file in the output array.

`extension.outputOrganizedByDetail` - 0..1 string
Narrative text that provides detail on the organizing resource listed in the `outputOrganizedBy` element (for example, when the output is organized by organization resources for parent-level organizations). This element SHALL NOT be populated in the absence of the `outputOrganizedBy` element.

`output` - 1..1 array
An array of file items with one entry for each file being provided. If no files are available, the server SHOULD return an empty array. The `url` element SHALL be populated for each output item. When the manifest does not include `outputOrganizedBy`, the `type` element SHALL be populated for each item. When the manifest includes `outputOrganizedBy`, the `type` element SHALL NOT be populated. When the manifest includes `outputOrganizedBy` and resources associated with a single organizing resource span multiple files, the `continuesInFile` element SHALL be populated with the URL of the subsequent file.

- `type` - 0..1 FHIR Resource Type
  FHIR resource type that is contained in the file.
- `url` - 1..1 URL
  Absolute path to the file.
- `continuesInFile` - 0..1 URL
  URL of the next output file, populated when the resources associated with a FHIR resource of the type specified in the `outputOrganizedBy` element continue from this file into another file. See “Bulk Data Output File Organization” below.
- `count` - 0..1 integer
  The number of resources in the file, represented as a JSON number.

`deleted` - 0..1 array
An array of deleted file items following the same structure as the `output` array. This array SHALL be populated with output files containing FHIR Transaction Bundles that indicate which FHIR resources are present in earlier files during the current epoch and were subsequently deleted. If no resources have been deleted, the server MAY omit this key or MAY return an empty array. Resources that appear in the `deleted` section of the manifest SHOULD also be present in the `output` section of the manifest.

Each line in the output file SHALL contain a FHIR Bundle with a type of `transaction` which SHALL contain one or more entry items that reflect a deleted resource. In each entry, the `request.url` and `request.method` elements SHALL be populated. The `request.method` element SHALL be set to `DELETE`.

Example deleted resource bundle (represents one line in output file):

```js
{
  "resourceType": "Bundle",
  "id": "bundle-transaction",
  "meta": {"lastUpdated": "2020-04-27T02:56:00Z"},
  "type": "transaction",
  "entry":[{
    "request": {"method": "DELETE", "url": "Patient/123"},
    ...
  }]
}
```

`error` - 1..1 array
Array of message file items following the same structure as the `output` array. Error, warning, and information messages related to the resources included in the manifest SHOULD be included here (not in output). If there are no relevant messages, the server SHOULD return an empty array. Only the FHIR `OperationOutcome` resource type is currently supported, so the server SHALL generate files in the same format as Bulk Data output files that contain FHIR `OperationOutcome` resources.
The server MAY include an extension for each item in the `error` section named `countSeverity` containing an object with keys of the `OperationOutcome.severity` codes present in that file and values of the number of instances of each code.

`extension` - 0..1 object
The manifest reserves the name `extension`, allowing server implementations to use it to provide custom behavior and information. For example, a server may choose to provide a custom extension that contains a decryption key for encrypted NDJSON files. The value of an extension element SHALL be a pre-coordinated JSON object. Extensions MAY be added to any JSON object in the manifest.

Example manifest at the epoch start:

```js
{
  "operationDefinition": "http://hl7.org/fhir/uv/bulkdata/OperationDefinition/bulk-publish|1.0.0",
  "extension": {
    "epochStartTime": "2021-01-01T00:00:00Z",
    "updateCadence": "PT1H"
  },
  "transactionTime": "2021-01-01T00:00:00Z",
  "requiresAccessToken" : false,
  "output" : [{
    "type" : "Organization",
    "url" : "https://example.com/output/organization_1.ndjson"
  },{
    "type" : "Organization",
    "url" : "https://example.com/output/organization_2.ndjson"
  }],
  "error" : []
}
```

Example manifest after first incremental update:

```js
{
  "operationDefinition": "http://hl7.org/fhir/uv/bulkdata/OperationDefinition/bulk-publish|1.0.0",
  "extension": {
    "epochStartTime": "2021-01-01T00:00:00Z",
    "updateCadence": "PT1H"
  },
  "transactionTime": "2021-01-01T01:00:00Z",
  "requiresAccessToken" : false,
  "output" : [{
    "type" : "Organization",
    "url" : "https://example.com/output/organization_1.ndjson"
  },{
    "type" : "Organization",
    "url" : "https://example.com/output/organization_2.ndjson"
  },{
    "type" : "Organization",
    "url" : "https://example.com/output/organization_3.ndjson"
  }],
  "deleted": [{
    "type" : "Bundle",
    "url" : "https://example.com/output/deleted_org_1.ndjson"
  }],
  "error" : []
}
```

---

### Bulk Data Output File Request

Using the URLs supplied by the server in the manifest, a client MAY download the generated Bulk Data files.
If the `requiresAccessToken` element in the manifest is set to `true`, the request SHALL include a valid access token. See [Security Considerations](#security-considerations) above.

If the `requiresAccessToken` element is set to `false` and no additional authorization-related extensions are present in the manifest's output entry, then the output URLs SHALL be dereferenceable directly (a "capability URL"). A client SHALL NOT provide a SMART Backend Services access token when dereferencing an output URL where `requiresAccessToken` is `false`.

The data files SHALL include only the most recent version of any resources unless the client explicitly requests different behavior in a fashion supported by the server. Inclusion of the `Resource.meta` information in the resources is at the discretion of the server (as it is for all FHIR interactions).

A client SHOULD provide an `Accept-Encoding` header when requesting output files and SHOULD include `gzip` compression as one of the encoding options in the header. A server SHALL provide output files as uncompressed, with `gzip` compression, or with another compression format from the `Accept-Encoding` header. When compression is used, a server SHALL communicate this to the client by including a `Content-Encoding` header in the response. A client SHALL accept files that are uncompressed or encoded with `gzip` compression, and MAY accept files encoded with other compression formats.

#### Endpoint

`GET [url from manifest output element]`

#### Headers

- `Accept` (optional, defaults to `application/fhir+ndjson`)
  Specifies the format of the file being requested.

#### Response - Success

- HTTP status of `200 OK`
- `Content-Type` header that matches the file format being delivered.
  For files in NDJSON format, SHALL be `application/fhir+ndjson`
- Body of FHIR resources in newline-delimited JSON ([NDJSON](https://github.com/ndjson/ndjson-spec)) or other requested format

#### Response - Error

- HTTP status code of `4XX` or `5XX`

### Bulk Data Output File Organization

Output files may be organized by resource type, or by instances of a resource type specified in the `outputOrganizedBy` element.

When the `outputOrganizedBy` element in the manifest is not populated, each output file SHALL contain resources of only one type, and a server MAY create more than one file for each resource type returned. The number of resources contained in a file MAY vary between servers and files.

When the `outputOrganizedBy` element is populated with a resource type, the output files SHALL be populated with blocks consisting of a header `Parameters` resource containing a parameter named `header` with a reference to a resource of the type specified by `outputOrganizedBy`, followed by the resource referenced in this header and resources that reference the resource referenced in the header (together a "resource block"). Each output file MAY contain multiple resource blocks and, when possible, a single resource's block SHOULD NOT be split across files. If a resource block does span more than one file, the header SHALL be repeated at the start of each file where the block continues, and the association between these files SHALL be documented in the manifest using the `continuesInFile` element in the relevant `output` array items.

Resources that would otherwise be included in the data set, but do not have references to the resource type specified in the `outputOrganizedBy` element MAY be included in resource blocks that contain resources they reference, MAY be repeated in every resource block, or MAY be omitted from the data set.
<div class="stu-note">
When the <code>outputOrganizedBy</code> element is set to <code>Patient</code>, servers SHOULD use the <a href="https://www.hl7.org/fhir/compartmentdefinition-patient.html">Patient Compartment Definition</a> to determine a base set of related resources to include in a resource block, though other resources may also be included. For other resource types, we are soliciting feedback on the best approach for documenting the set of resources in a resource block. Implementation Guides MAY reference a <a href="https://www.hl7.org/fhir/compartmentdefinition.html">Compartment Definition</a>, populate a <a href="https://www.hl7.org/fhir/graphdefinition.html">GraphDefinition Resource</a>, include narrative text, or use another approach.
</div>

Example NDJSON file when the manifest does not include `outputOrganizedBy`:

```js
{"id":"p-1","resourceType":"Patient", "name":[{"given":["Brenda"],"family":"Jackson"}],"gender":"female", ...}
{"id":"p-2","resourceType":"Patient", "name":[{"given":["Bram"],"family":"Sandeep"}],"gender":"male", ...}
{"id":"p-3","resourceType":"Patient", "name":[{"given":["Sandy"],"family":"Hamlin"}],"gender":"female", ...}
{...}
```

<a name="organize-output-by-file-example"></a>
Example NDJSON file when `outputOrganizedBy` is set to `Patient`:

```js
{"resourceType": "Parameters", "parameter": [{"name": "header", "valueReference": {"reference": "Patient/p-1"}}]}
{"id": "p-1", "resourceType": "Patient", ...}
{"id": "c-1", "resourceType": "Condition", "subject":{"reference": "Patient/p-1"}, ...}
{"id": "o-1", "resourceType": "Observation", "subject":{"reference": "Patient/p-1"}, ...}
{...}
{"resourceType": "Parameters", "parameter": [{"name": "header", "valueReference": {"reference": "Patient/p-2"}}]}
{"id": "p-2", "resourceType": "Patient", ...}
{"id": "c-101", "resourceType": "Condition", "subject":{"reference": "Patient/p-2"}, ...}
{"id": "o-102", "resourceType": "Observation", "subject":{"reference": "Patient/p-2"}, ...}
{...}
```
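As an illustration of how a client might read a file organized by `outputOrganizedBy`, the following Python sketch groups NDJSON lines into resource blocks. It is not part of the specification: the function names `iter_resource_blocks` and `_header_reference` are hypothetical, and the sketch assumes every header is a `Parameters` resource carrying a `header` parameter with a `valueReference`, as in the example above.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def _header_reference(resource: dict) -> Optional[str]:
    """Return the reference from a header Parameters resource, else None."""
    if resource.get("resourceType") != "Parameters":
        return None
    for param in resource.get("parameter", []):
        if param.get("name") == "header":
            return param.get("valueReference", {}).get("reference")
    return None


def iter_resource_blocks(lines: Iterable[str]) -> Iterator[Tuple[str, List[dict]]]:
    """Yield (header reference, resources) for each block in one NDJSON file."""
    header_ref: Optional[str] = None
    block: List[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        resource = json.loads(line)
        ref = _header_reference(resource)
        if ref is not None:
            # A new header closes the previous block, if any.
            if header_ref is not None:
                yield header_ref, block
            header_ref, block = ref, []
        elif header_ref is not None:
            block.append(resource)
        else:
            raise ValueError("resource encountered before the first header")
    if header_ref is not None:
        yield header_ref, block
```

Because a header is repeated at the start of each file in which a block continues, a client that processes several files can merge blocks that share the same header reference, following the `continuesInFile` links in the manifest.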
#### Attachments

If resources in an output file contain elements of the type `Attachment`, the server SHOULD populate the `Attachment.contentType` code as well as either the `data` element or the `url` element.

If the `data` element is not populated and the `url` element is populated, the `url` element SHALL be an absolute URL that can be dereferenced to the attachment's content. When the `url` element is populated with an absolute URL and the `requiresAccessToken` element in the manifest is set to `true`, the URL location must be accessible by a client with a valid access token, and SHALL NOT require the use of additional authentication credentials. When the `url` element is populated and the `requiresAccessToken` element in the manifest is set to `false`, the URL location must be accessible by a client without an access token.

Note that if a server copies files to the Bulk Data output endpoint or proxies requests to facilitate access from this endpoint, it may need to modify the `Attachment.url` element when generating the Bulk Data output files.

### Client Workflow

```mermaid
flowchart TD
    initial_req(("Initial request"))
    initial_req --> get_req[GET /$bulk-publish]
    get_req --> has_update_cadence{updateCadence in manifest?}
    has_update_cadence -- Yes --> set_interval[Choose polling interval ≤ updateCadence]
    has_update_cadence -- No --> default_interval[Use a default interval]
    get_req --> download_all[Download all files included in manifest]
    download_all --> output_files[Process output array files in order. If resource with the same id has been processed, keep the last instance from the last file.]
    output_files --> deleted_files[Process deleted array files, removing resources]
    deleted_files --> error_files[Review error array files]
    error_files --> done(((Done)))

    subsequent_req(("Subsequent requests"))
    subsequent_req --> poll_req_sub[GET /$bulk-publish with If-None-Match = prior request ETag]
    poll_req_sub -- 304 Not Modified Header --> stop(((Done)))
    poll_req_sub -- 200 OK Header --> new_epoch{epochStartTime in manifest AND matches the value in the prior request?}
    new_epoch -- No --> clean_up[Remove prior data]
    clean_up --> download_all
    new_epoch -- Yes --> download_changed[Download only files not in the manifest of a prior request]
    download_changed --> output_files
```

#### Initial request to a server

- GET `/$bulk-publish`
- Download all files in the `output` array and process FHIR resources, being aware that the same resource may appear in the files more than once. In this case, the last instance of the resource in the last file in the array where it is present should be retained.
- If present, download and process files in the `deleted` array to remove resources.
- If present, download and review the contents of files in the `error` array.
- If `extension.updateCadence` is populated, choose a polling interval for subsequent requests using a value ≤ `extension.updateCadence`.

#### Subsequent request to a server

- GET `/$bulk-publish` with the `If-None-Match` header set to the `ETag` header value from the prior request.
- `304 Not Modified` response status → nothing to do.
- `200 OK` response status and `extension.epochStartTime` **is not** present in manifest OR does not match the value in the prior request:
  - Remove stored data
  - Follow steps in "initial request to server" flow above.
- `200 OK` response status and `extension.epochStartTime` **is** present in manifest AND matches the value in the prior request:
  - Process only new tail file URLs not present in prior request:
    - Download new files in the `output` array and process FHIR resources, being aware that the same resource may appear in the files more than once and an earlier version of a resource may already have been processed. In this case, the last instance of the resource in the last file should be retained.
    - If present, download and process new files in the `deleted` array to remove resources.
    - If present, download and review the contents of new files in the `error` array.

#### Error handling

- If any referenced file returns 404/410 while the epoch hasn't changed, the server is violating the requirement that files remain retrievable within an epoch; clients MAY retry and/or alert.
- If the manifest becomes temporarily unreachable (e.g., 5xx), back off and retry (exponential backoff bounded by the `extension.updateCadence`).
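The decision logic in the workflow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: `plan_update` and `apply_files` are hypothetical names, manifests are assumed to be already parsed into dictionaries, files are assumed to be already downloaded and parsed into lists of resources, and the local store is a plain dictionary keyed by `ResourceType/id`.

```python
from typing import Dict, Iterable, List, Optional

FILE_LISTS = ("output", "deleted", "error")


def plan_update(previous: Optional[dict], current: dict) -> dict:
    """Decide whether to discard local data and which file URLs to fetch.

    `previous` is the manifest from the prior request (None on the first run).
    """
    prev_epoch = (previous or {}).get("extension", {}).get("epochStartTime")
    curr_epoch = current.get("extension", {}).get("epochStartTime")
    # Same epoch only if epochStartTime is present and matches the prior value.
    same_epoch = (
        previous is not None and curr_epoch is not None and curr_epoch == prev_epoch
    )
    plan: dict = {"reset": not same_epoch}
    for key in FILE_LISTS:
        seen = {item["url"] for item in previous.get(key, [])} if same_epoch else set()
        # Keep manifest order; within an epoch only unseen (tail) files remain.
        plan[key] = [i["url"] for i in current.get(key, []) if i["url"] not in seen]
    return plan


def apply_files(
    store: Dict[str, dict],
    output_files: Iterable[List[dict]],
    deleted_files: Iterable[List[dict]],
) -> None:
    """Apply output files in order, then deletions, to an in-memory store."""
    for resources in output_files:
        for resource in resources:
            # Later instances of the same resource overwrite earlier ones.
            store[f'{resource["resourceType"]}/{resource["id"]}'] = resource
    for bundles in deleted_files:
        for bundle in bundles:
            for entry in bundle.get("entry", []):
                request = entry.get("request", {})
                if request.get("method") == "DELETE":
                    store.pop(request["url"], None)
```

When `plan["reset"]` is true the client would clear its stored data before downloading; otherwise only the newly listed files need to be fetched and passed to `apply_files`. Processing all `output` files before any `deleted` files mirrors the two-step order the manifest rules require.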