# Bulk Optimize Notes

## Maintenance Changes

### Add capability URL guidance to spec
- https://confluence.hl7.org/display/FHIRI/Capability+URLs+for+Download+Links
- When requiresAccessToken is false and no additional authorization-related extensions are present on a Bulk Data completion manifest's output entry, then the output URLs SHALL be dereferenceable directly, and SHALL follow the expiration timing requirements that have been documented for bearer tokens in SMART Backend Services (specifically: "SHALL be short-lived").
- Clients MAY re-fetch the output manifest if output links have expired.
- Clients MAY use the Expires header on the output response, when present, as a hint to know when capability URLs will expire.
- Clients SHALL NOT provide a SMART Backend Services access token when dereferencing an output URL where requiresAccessToken is false.
- As long as servers are following relevant security guidance, they MAY choose to generate output manifests where requiresAccessToken is true or false; this applies even for servers available on the public internet.
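A minimal client-side sketch of the capability URL guidance above, using only the Python standard library. The function names (`download_headers`, `expiry_hint`, `should_refetch_manifest`) are hypothetical, and the sketch assumes the Expires header has already been read into a plain mapping under the exact key `Expires`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Mapping, Optional


def download_headers(requires_access_token: bool, access_token: Optional[str]) -> Dict[str, str]:
    """Build request headers for dereferencing one manifest output URL."""
    headers = {"Accept": "application/fhir+ndjson"}
    if requires_access_token:
        if not access_token:
            raise ValueError("manifest requires an access token but none was supplied")
        headers["Authorization"] = f"Bearer {access_token}"
    # When requiresAccessToken is false, the URL itself is the capability,
    # so no Authorization header is sent even if a token is available.
    return headers


def expiry_hint(response_headers: Mapping[str, str]) -> Optional[datetime]:
    """Parse an Expires header, if present, as a hint for link expiry."""
    raw = response_headers.get("Expires")
    if raw is None:
        return None
    try:
        parsed = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None  # unparseable header: treat as "no hint"
    if parsed.tzinfo is None:
        # Assumption: a date without an offset is interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def should_refetch_manifest(response_headers: Mapping[str, str], now: datetime) -> bool:
    """True when the hint says the output links have probably expired."""
    hint = expiry_hint(response_headers)
    return hint is not None and now >= hint
```

Because Expires is only a hint, a client following this sketch would still need to handle a failed download by re-fetching the manifest.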
### Add page for Bulk Publish pattern
- Bulk export for pre-generated bulk files
- https://jira.hl7.org/browse/FHIR-43285

### Technical Corrections
- https://jira.hl7.org/browse/FHIR-44602 - clarify capability documentation guidance on resources per file
- https://jira.hl7.org/browse/FHIR-43053 - show cardinalities in parameter table
- https://jira.hl7.org/browse/FHIR-39538 and https://jira.hl7.org/browse/FHIR-34235 (duplicates) - operation name in Capability statement
- https://jira.hl7.org/browse/FHIR-39537 - add instantiates element to Capability statement

## Potential New Functionality

### `_typeFilter` parameter
- Move from experimental to optional since it's been implemented in a number of production systems including those from Cerner, Epic, Microsoft, and HAPI
- Clarify guidance on which search parameters may be included
    - Is there a way to express this that's not version specific?
    - Maybe exclude [Search Result Parameters](https://build.fhir.org/search.html#modifyingresults)?
    - Cerner:
      > _typeFilter supports queries with the search parameters supported for the R4 resources with the exception of the following search parameters: _count, _contained, _containedType, _elements, _id, _include, _lastUpdated, _revinclude, _sort, _summary, _total, encounter, identifier, patient, subject.
        - Should `_lastUpdated` be excluded?
    - Epic:
      > Note that the following are not supported by _typeFilter:
      > - Searching by patient, subject, or _id
      > - Query strings with search result parameters, such as _count, _include, or _revInclude
      > - Query strings for the Patient resource
      > - Query strings for resources that don't contain patient information
      > - Applying the search query strings to resources included by reference
    - What about `_id`, or Patient parameters?
- Document how `_typeFilter` works with `_type`
    - ```mermaid
      flowchart TB
          A[Start Bulk-export Request] --> B{Any _type supplied?}
          B -- Yes --> C{{For each _type 't'\nsupplied}}
          C --> D{Any _typeFilter\nsupplied\nfor type 't'?}
          D -- Yes --> E1{{For each _typeFilter 'f'\nsupplied for type 't'}}
          E1 --> G[Emit all resources\n- of type 't'\n- matching filter 'f'\n- not yet emitted] --> K
          D -- No --> H[Emit all resources\n- of type 't']
          B -- No --> J[Emit all resources]
          H --> K
          J --> K[Include additional resources as determined by server]
          K --> L[End of Process]
      ```
    * e.g.
        - `_type=Patient,Observation&_typeFilter=Observation?category=vitals&_typeFilter=Observation?date=2020` means "all Patient resources, and all Observation resources that are vitals, and all Observation resources from 2020"
        - `_type=Patient,Observation&_typeFilter=Observation?category=vitals&date=2020` means "all Patient resources, and all Observation resources that are vitals from 2020"
        - Note that in both examples, the contents of the `_typeFilter` parameters should be URL encoded, but are not here for readability.

### `_until` parameter
- Pain points
    - Exports that could have an upper bound currently need to retrieve more data than required and filter client side
    - There's no way for a client to request a subset of a large request at a time (paging).
      Seems like `_until` probably isn't the best option for this since dates don't correspond directly to resource counts, but may be an opportunity for one new parameter to handle multiple use cases
- Current use
    - DaVinci ATR - "Resources updated before the specified time will be included in the response. When omitted, the Server is free to export all data for the members until the time the export operation is started. This is a new parameter added to the operation that is not present in the Bulk Data Export operation."
    - Regenstrief - implemented for paging requests to their bulk data export facade server.
    - [Customer request](https://chat.fhir.org/#narrow/stream/179250-bulk-data/topic/_endTime.20parameter/near/424954691) for CMS Bulk API
- Notes:
    - Need to align with guidance on `_since` as will have same issues if lastUpdated isn't being tracked

### `exportType` parameter (can we find a name without "type" in it? exportRules? exportBehavior? followingIG?)
- Pain points
    - Single server endpoint can't handle different types of bulk exports (e.g., patients from claim data, vs. patients from EHR data) or ensure that type of export it's providing aligns with the client's expectations
- Current use
    - DaVinci ATR: "The code that indicates the type of export to be performed. For e.g hl7.fhir.us.davinci-atr to indicate Member Attribution export, and hl7.fhir.us.davinci-pdex for a PDex Export. Servers are supposed to provide the export guidance in the individual IGs."
- Questions: can this change behavior of other params? Can it be used with them? Does it all depend on the export type?
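Returning to the `_until` notes above: a minimal sketch of how a server might bound an export by `meta.lastUpdated` when both `_since` and `_until` are supplied. It assumes both bounds are exclusive ("after" for `_since`, "before" for `_until`), that timestamps are in a form `datetime.fromisoformat` accepts once a trailing `Z` is rewritten, and, as a labeled assumption only, that resources with no `lastUpdated` are kept. The helper names are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def parse_instant(value: str) -> datetime:
    """Parse a FHIR instant; 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def in_window(resource: Dict[str, Any],
              since: Optional[datetime],
              until: Optional[datetime]) -> bool:
    """True when the resource falls inside the (since, until) window."""
    raw = resource.get("meta", {}).get("lastUpdated")
    if raw is None:
        # Assumption: untracked resources are included rather than dropped.
        # The notes flag this case as an open issue for both parameters.
        return True
    updated = parse_instant(raw)
    if since is not None and updated <= since:
        return False
    if until is not None and updated >= until:
        return False
    return True
```

A client wanting date-based slices could then issue consecutive requests with adjoining windows, though as noted above, equal date ranges will not yield equal resource counts.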
### `outputGrouping` parameter
- Pain points
    - Clients need to first retrieve and load entire dataset to work with data for a single patient or encounter
    - Servers may be doing unnecessary processing by breaking up patient records by resource just for a client to reassemble
- Proposal - `outputGrouping` kickoff parameter
    - Parameter to control resource split between files and grouping within files
      ```
      outputGrouping = `resourceType` (default) | `patient` | `encounter`
      ```
    - Q. does "resourceType" imply that all Patients are in a single file? No. Servers can split as needed, e.g. to enforce size limits.
    - Q. does "patient" imply each patient's stuff is in one file? Yes, we think. Single patients should not span file boundaries.
    - Q. does "patient" imply each file has only one patient's stuff? No, we think, since this could result in an unmanageable number of files.
    - Q. should we support the return of partial manifests when only a subset of the files are generated (i.e., 202 with manifest vs. 200 with manifest)?
    - Support custom values in the valueset for extensibility if they're preceded by `__`?
    - Support partial completion manifest response while job is still in 202 status?
    - Addresses https://jira.hl7.org/browse/FHIR-38075 - group by patient and encounter in addition to resource type

### `acceptableLagDays` parameter
- Pain point
    - Clients don't have a way to select between slow, real time exports from a transactional database and exports from a data warehouse that are faster but may have some limitations (e.g., 24 hour delay)
- Proposal - `acceptableLagDays` kickoff parameter
    - When set to `0` (default), server SHALL return data that reflects the current state of the production system or return an error
    - When set to another value, server MAY return data prior to the current state of the system by the specified period

### `patientFilter` parameter
- Pain points
    - MITRE DEQM project is looking for ways to filter patients without having to manually set up groups at each site or manage dozens of groups per site
    - DaVinci wants to have bulk exports of the intersect of multiple groups (e.g., members of a payer and patients with HTN)
    - Researchers want a standard way to create condition and demographic specific groups across multiple health systems
    - May also be a way to get at https://jira.hl7.org/browse/FHIR-42055 - provide a way to indicate why a patient was included in a group export
    - **Group creation for specific use cases such as member attribution is addressed through IGs like the [Da Vinci ATR IG](https://build.fhir.org/ig/HL7/davinci-atr/) and group export for an entire group is addressed in the current Bulk IG. However, there is a gap around filtering the group to export data on a subset (e.g., attributed members with dx of DM or patients a client can access that have a lab result with a specific LOINC code).**
- Success criteria
    - Can EHRs implement this in a performant manner?
    - Will multiple systems be able to implement this to return the same result set?
    - How can we dramatically reduce the amount of data in a bulk export?
    - **Objective is to support some coarse grained filtering to reduce amount of irrelevant data in the export. Clients will *still have to do additional filtering* and/or calculation.**
- Proposal - `_patientFilter` kickoff parameter. Limits the set of patients included in the response
    - Only affects `Patient` outputs and resources in the [patient compartment](https://build.fhir.org/compartmentdefinition-patient.html) (e.g., `Observation`, `Condition`, `Encounter`, ...)
    - Included resource types not in the patient compartment aren't impacted by this parameter (e.g., `Organization`, `Location`, ...)
    - MAY be used in
        - `/$export`
        - `/Patient/$export`
        - `/Group/:id/$export`
    - As with `_typeFilter`, search result parameters such as `_include`, `_revinclude`, `_contained` are not supported
    - Implementations may choose to support only US Core search parameter combinations
    - Processing Model
        1. Consider all Group members (defaulting to all patients)
        2. For each `_patientFilter`
            - 2a. Run the query, determine a list of all `subject`s of query results, and restrict the group members to that list
        3. Proceed with this restricted Group, applying `_type` and `_typeFilter` to restrict the *data* returned for these patients
    - E.g., to export demographics for patients who had an ED visit in January the export could look something like the following (with proper escaping):
      ```
      $export?
        _type = Patient
        &_patientFilter = Encounter.class=http://terminology.hl7.org/CodeSystem/v3-ActCode|EMER
        &Encounter.date=ge2024-01-01
        &Encounter.date=le2024-01-31
      ```
      To return just recent encounters for these patients, `Encounter` could be added to `_type` and a `_typeFilter` parameter with the same criteria as the `_patientFilter` could be added.
- Alternate Proposal - profile group management as part of the Bulk IG
    - Creation of dynamic groups using characteristics
        - This seems poorly specified in FHIR docs and may not be specific enough?
        - `determinedByExpression` of type `text/x-fhir-query` in CI build seems right, but lots of syntax and complexity!
    - Creation of dynamic groups that are the union and intersect of other group references
        - Maybe new group that has `member` with references to other groups, but that won't address group intersects?
    - Creation of static groups using identifiers, e.g., [DPC Attribution Groups](https://dpc.cms.gov/docsV1.html#groups-attribution)
    - Listing and searching for groups using the REST API
- Alternate Proposal - pre-agreed cohort filters that can be used to subset groups
    - `/Group/myroster/$export?_intersectWith=https://any-uri/ed-patients, https://any-uri/visit-last-30days`
    - e.g. could have pre-agreed filters for components necessary to calculate https://ecqi.healthit.gov/sites/default/files/ecqm/measures/CMS165v10.html#toc

### `_since` parameter
- Current guidance ties this parameter to `lastUpdated` which is difficult for some servers to track.
- Consider redefining `_since` or adding a search parameter to help with limiting the data (e.g., `_knownUpdated`)
- Align with USCore work here?
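Looking back at the `_patientFilter` processing model in the `patientFilter` section, one possible reading of its three steps is sketched here over in-memory resources. Everything in it is an assumption made for illustration: each filter is modeled as a plain predicate standing in for a FHIR search, multiple filters are treated as successive restrictions (an intersection), only `subject` references are followed, and `_typeFilter` is left out for brevity. All function names are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Set

Resource = Dict[str, Any]
Query = Callable[[Resource], bool]  # stand-in for one _patientFilter search


def patient_of(resource: Resource) -> Optional[str]:
    """Patient id a resource belongs to, or None if it has no subject."""
    if resource.get("resourceType") == "Patient":
        return resource.get("id")
    ref = (resource.get("subject") or {}).get("reference", "")
    return ref.split("/", 1)[1] if ref.startswith("Patient/") else None


def restrict_patients(members: Set[str],
                      resources: Iterable[Resource],
                      patient_filters: List[Query]) -> Set[str]:
    """Steps 1 and 2: start from all members, then narrow per filter."""
    pool = list(resources)
    selected = set(members)  # step 1
    for matches in patient_filters:  # step 2
        # Step 2a: subjects of the query results restrict the member set.
        subjects = {patient_of(r) for r in pool if matches(r)}
        selected &= subjects
    return selected


def select_export(resources: Iterable[Resource],
                  members: Set[str],
                  patient_filters: List[Query],
                  types: Set[str]) -> List[Resource]:
    """Step 3: apply _type to the data of the restricted patient set."""
    pool = list(resources)
    selected = restrict_patients(members, pool, patient_filters)
    out = []
    for r in pool:
        if r.get("resourceType") not in types:
            continue
        owner = patient_of(r)
        # Resources without a patient subject are not narrowed by the filter.
        if owner is None or owner in selected:
            out.append(r)
    return out


def ed_visit_in_january(r: Resource) -> bool:
    """Predicate mirroring the ED-visit example: EMER class, January 2024."""
    start = r.get("period", {}).get("start", "")[:10]
    return (r.get("resourceType") == "Encounter"
            and r.get("class", {}).get("code") == "EMER"
            and "2024-01-01" <= start <= "2024-01-31")
```

Calling `select_export(data, members, [ed_visit_in_january], {"Patient"})` would return only the `Patient` resources for members with a matching encounter, and adding `"Encounter"` to the type set would also return all of their encounters, which is why the notes suggest pairing it with a `_typeFilter`.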