# Extensions for ML-Derived FHIR Resources
*Call 1 Meeting Notes*
## Introductions
* Josh: Microsoft Health Standards Interop / SMART Health IT
* Bin Mao: Boston Children's data engineer
* Danny: Microsoft data scientist
* Donny: Software engineering lead for Google Cloud Healthcare NLP
* Gino Canessa: Microsoft engineer on healthcare interop
* Guy: Microsoft Health product manager, works in language understanding
* Hadas: Microsoft Health AI, NLP
* Siddhi: AWS Health AI, leads science and eng for NLP in medical domain
* Dan (after intros): Central Square Solutions / SMART Health IT
## Existing Capabilities (inputs, outputs, provenance/tracking, standards)
Bin: Today we convert NLP results into FHIR resources. We see a need to identify spans of text in the source material, and we'd like a clear, top-level designation that a resource was derived by NLP, to avoid confusion. We'd also like to track information about the engine or algorithm used to perform the derivations, for reproducibility. Currently we populate `modifierExtension` with some of this information, as well as indications of negation.
Dan: Working with Bin on a project that intermingles EHR-sourced FHIR data with NLP-derived data; we want to combine these when defining cohorts, and to understand our confidence in these data. As a research project, we're also experimenting with different NLP engines/versions and want to track this in outputs so we can filter/compare.
Donny: Currently performing NLP but not translating results into FHIR. We extract information and share text-span type provenance, but we think more details are needed for labeling, and tracing where data came from (multiple sources, image bounding boxes for OCR, etc). Would like to ensure standards are rich enough to encompass the things in this set. Today's outputs are custom JSON data structures to categorize text spans with entity types and metadata, confidence scores, etc.
Siddhi: Similar to what Donny described. Thinking about deriving document sections, e.g. family history or social determinants of health. Secondarily, thinking about structuring the data within those sections. Outputs aren't FHIR yet; the focus is on the superstructure/scaffold of sections rather than the detailed clinical content within them. Common theme: coverage and accuracy go hand in hand and can vary by specialty or other context, so we're always making decisions about what to optimize for (generalization has been difficult).
Guy: Looking at ways to reference text when deriving FHIR data; the key goal is pointing to one or more relevant spans of text, for explainability and validation.
## Current FHIR definitions
https://build.fhir.org/extension-derivation-reference.html
#### Standalone text source
```json
{
  "resourceType": "Condition",
  "id": "123",
  "extension": [{
    "url": "http://hl7.org/fhir/StructureDefinition/derivation-reference",
    "extension": [{
      "url": "reference",
      "valueReference": {
        // assuming this is a text-oriented source represented
        // as a lone `DocumentReference.content.attachment` element
        "reference": "DocumentReference/123",
        "display": "Discharge summary from hospital"
      }
    }, {
      "url": "offset",
      "valueInteger": 300
    }, {
      "url": "length",
      "valueInteger": 25
    }]
  }]
}
```
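A consumer that wants to display the cited span has to turn `offset` and `length` back into text. The Python sketch below assumes the source text is inlined as base64-encoded UTF-8 in `DocumentReference.content[0].attachment.data`, and that `offset`/`length` count 0-based characters; both assumptions should be checked against the extension definition and the producer. The helper names and the sample note are hypothetical.

```python
import base64
from typing import Any, Dict, Optional

DERIVATION_URL = "http://hl7.org/fhir/StructureDefinition/derivation-reference"


def sub_value(ext: Dict[str, Any], name: str) -> Optional[Any]:
    """Return the value[x] of the named sub-extension, or None if it is absent."""
    for sub in ext.get("extension", []):
        if sub.get("url") == name:
            for key, value in sub.items():
                if key.startswith("value"):
                    return value
    return None


def attachment_text(doc_ref: Dict[str, Any]) -> str:
    """ASSUMPTION: the first attachment inlines base64-encoded UTF-8 text in `data`."""
    attachment = doc_ref["content"][0]["attachment"]
    return base64.b64decode(attachment["data"]).decode("utf-8")


def resolve_span(ext: Dict[str, Any], source_text: str) -> str:
    """Slice the cited span, ASSUMING 0-based character offsets."""
    offset = sub_value(ext, "offset")
    length = sub_value(ext, "length")
    if offset is None or length is None:
        raise ValueError("derivation-reference has no offset/length")
    if offset < 0 or length < 0 or offset + length > len(source_text):
        raise ValueError("span falls outside the source text")
    return source_text[offset:offset + length]


# Usage with a hypothetical DocumentReference/123 and sample note text.
note = "Patient reports wheezing. Assessment: mild persistent asthma."
doc_ref = {
    "resourceType": "DocumentReference",
    "id": "123",
    "content": [{"attachment": {
        "contentType": "text/plain",
        "data": base64.b64encode(note.encode("utf-8")).decode("ascii"),
    }}],
}
ext = {
    "url": DERIVATION_URL,
    "extension": [
        {"url": "reference", "valueReference": {"reference": "DocumentReference/123"}},
        {"url": "offset", "valueInteger": 38},
        {"url": "length", "valueInteger": 22},
    ],
}
print(resolve_span(ext, attachment_text(doc_ref)))  # prints: mild persistent asthma
```

If a producer counts bytes or UTF-16 code units instead of characters, the slice will drift on non-ASCII text, which is one reason the counting unit is worth pinning down.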
#### FHIR Narrative source
E.g., for a Composition like http://hl7.org/fhir/composition-example-mixed.json.html
```json
{
  "resourceType": "Condition",
  "id": "123",
  "extension": [{
    "url": "http://hl7.org/fhir/StructureDefinition/derivation-reference",
    "extension": [{
      "url": "reference",
      "valueReference": {
        "reference": "Composition/456",
        "display": "Discharge summary from hospital"
      }
    }, {
      "url": "path",
      "valueString": "section[3].text.div"
    }, {
      "url": "offset",
      "valueInteger": 50
    }, {
      "url": "length",
      "valueInteger": 10
    }]
  }]
}
```
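For a narrative source, a consumer first follows `path` into the resource and then applies `offset`/`length`. This Python sketch handles only simple `name` and `name[index]` steps (0-based, as in FHIRPath), so it is not a FHIRPath engine. It also assumes, as an open choice rather than settled behavior, that offsets count characters of the div's text content with the XHTML markup removed. Function names and the sample Composition are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Any, Dict

# One path step: an element name with an optional 0-based [index].
STEP = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)(?:\[(\d+)\])?$")


def walk_path(resource: Dict[str, Any], path: str) -> Any:
    """Follow a simple dotted path such as 'section[3].text.div'.

    Handles only 'name' and 'name[index]' steps; this is NOT a FHIRPath engine.
    """
    node: Any = resource
    for step in path.split("."):
        match = STEP.match(step)
        if match is None:
            raise ValueError(f"unsupported path step: {step!r}")
        node = node[match.group(1)]
        if match.group(2) is not None:
            node = node[int(match.group(2))]
    return node


def narrative_text(div_xhtml: str) -> str:
    """Concatenate the text content of an XHTML div, dropping the markup."""
    return "".join(ET.fromstring(div_xhtml).itertext())


def resolve_narrative_span(resource: Dict[str, Any], path: str,
                           offset: int, length: int) -> str:
    # ASSUMPTION: offset/length count characters of the tag-stripped text.
    text = narrative_text(walk_path(resource, path))
    if offset < 0 or length < 0 or offset + length > len(text):
        raise ValueError("span falls outside the narrative text")
    return text[offset:offset + length]


def section(title: str, body: str) -> Dict[str, Any]:
    """Build a sample Composition.section with generated narrative."""
    div = '<div xmlns="http://www.w3.org/1999/xhtml">' + body + "</div>"
    return {"title": title, "text": {"status": "generated", "div": div}}


# Hypothetical Composition/456 with four sections; section[3] is the fourth.
composition = {
    "resourceType": "Composition",
    "id": "456",
    "section": [
        section("Reason for admission", "<p>Fever and cough.</p>"),
        section("Medications", "<p>None.</p>"),
        section("Allergies", "<p>No known allergies.</p>"),
        section("Assessment", "<p>Impression: <b>community-acquired pneumonia</b></p>"),
    ],
}
print(resolve_narrative_span(composition, "section[3].text.div", 12, 28))
# prints: community-acquired pneumonia
```

Counting against the raw XHTML instead would make offsets sensitive to markup changes that do not alter the visible text; whichever choice is made probably needs to be stated explicitly.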
## Discussion
Dan: If you have a procedure with a date and a code derived from a clinical note... would this be a derived resource, or where would the extensions go?
```json
{
  "resourceType": "Procedure",
  "id": "123",
  "extension": [{
    "url": "http://hl7.org/fhir/StructureDefinition/derivation-reference",
    "extension": [{
      "url": "reference",
      "valueReference": {
        "reference": "DocumentReference/456",
        "display": "Discharge summary from hospital"
      }
    }, {
      "url": "offset",
      "valueInteger": 50
    }, {
      "url": "length",
      "valueInteger": 10
    }]
  }],
  "occurrenceDateTime": "2022-10-01",
  "code": {
    "text": "",
    "coding": [], // e.g., 3 different coding systems here
    "extension": [{
      "url": "http://hl7.org/fhir/StructureDefinition/derivation-reference",
      "extension": [{
        "url": "offset",
        "valueInteger": 55
      }, {
        "url": "length",
        "valueInteger": 5
      }]
    }]
  }
}
```
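In the Procedure example, the element-level extension on `code` carries no `reference`, which seems to imply that a reader should reuse the resource-level source. The Python sketch below collects every derivation reference in a resource together with a simple location string, and fills a missing `reference` from the resource-level extension. That inheritance rule is an assumed convention for illustration, not something these notes establish; the function names are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

DERIVATION_URL = "http://hl7.org/fhir/StructureDefinition/derivation-reference"


def sub_values(ext: Dict[str, Any]) -> Dict[str, Any]:
    """Map each sub-extension url to its value[x] for one derivation-reference."""
    out: Dict[str, Any] = {}
    for sub in ext.get("extension", []):
        for key, value in sub.items():
            if key.startswith("value"):
                out[sub["url"]] = value
    return out


def walk(node: Any, location: str) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (location, sub-extension values) for every derivation-reference found."""
    if isinstance(node, dict):
        for ext in node.get("extension", []):
            if ext.get("url") == DERIVATION_URL:
                yield location, sub_values(ext)
        for key, child in node.items():
            if key != "extension":  # extensions on this node were handled above
                yield from walk(child, f"{location}.{key}")
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from walk(child, f"{location}[{i}]")


def collect_derivations(resource: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List all derivation references; a missing `reference` is filled from the
    resource-level extension (an ASSUMED inheritance convention)."""
    root = resource["resourceType"]
    found = list(walk(resource, root))
    default_ref: Optional[Dict[str, Any]] = next(
        (subs["reference"] for loc, subs in found
         if loc == root and "reference" in subs), None)
    rows = []
    for location, subs in found:
        row = {"location": location, "reference": default_ref}
        row.update(subs)  # an element's own reference overrides the default
        rows.append(row)
    return rows


procedure = {
    "resourceType": "Procedure",
    "id": "123",
    "extension": [{"url": DERIVATION_URL, "extension": [
        {"url": "reference", "valueReference": {"reference": "DocumentReference/456"}},
        {"url": "offset", "valueInteger": 50},
        {"url": "length", "valueInteger": 10},
    ]}],
    "occurrenceDateTime": "2022-10-01",
    "code": {"coding": [], "extension": [{"url": DERIVATION_URL, "extension": [
        {"url": "offset", "valueInteger": 55},
        {"url": "length", "valueInteger": 5},
    ]}]},
}
for row in collect_derivations(procedure):
    print(row["location"], row["reference"]["reference"], row["offset"], row["length"])
# Procedure DocumentReference/456 50 10
# Procedure.code DocumentReference/456 55 5
```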
Josh: Today there's flexibility about whether to provide information at the Resource level, the Element level, or both. We may want specific advice about what to do when many elements have been derived from (slightly different places in) the same source document: how to avoid repetition while still communicating these details.
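One possible producer-side pattern for the repetition problem is to cite the source document once in a resource-level extension and put only `offset`/`length` on each derived element. The Python sketch below is hypothetical, not agreed guidance, and `annotate` is an illustrative name. It does follow FHIR JSON's existing rule that extensions on a primitive element (such as `occurrenceDateTime`) live in a sibling `_occurrenceDateTime` object.

```python
import copy
import json
from typing import Any, Dict, Optional, Tuple

DERIVATION_URL = "http://hl7.org/fhir/StructureDefinition/derivation-reference"


def derivation(reference: Optional[str] = None, offset: Optional[int] = None,
               length: Optional[int] = None) -> Dict[str, Any]:
    """Build one derivation-reference extension, omitting absent sub-extensions."""
    subs = []
    if reference is not None:
        subs.append({"url": "reference", "valueReference": {"reference": reference}})
    if offset is not None:
        subs.append({"url": "offset", "valueInteger": offset})
    if length is not None:
        subs.append({"url": "length", "valueInteger": length})
    return {"url": DERIVATION_URL, "extension": subs}


def annotate(resource: Dict[str, Any], source: str,
             spans: Dict[str, Tuple[int, int]]) -> Dict[str, Any]:
    """HYPOTHETICAL pattern: cite `source` once at the resource level and put only
    offset/length on each derived top-level element."""
    out = copy.deepcopy(resource)
    out.setdefault("extension", []).append(derivation(reference=source))
    for element, (offset, length) in spans.items():
        if isinstance(out[element], dict):
            target = out[element]
        else:
            # FHIR JSON puts extensions on a primitive in a sibling "_<name>" object.
            target = out.setdefault("_" + element, {})
        target.setdefault("extension", []).append(derivation(offset=offset, length=length))
    return out


procedure = {"resourceType": "Procedure", "id": "123",
             "occurrenceDateTime": "2022-10-01", "code": {"text": "appendectomy"}}
annotated = annotate(procedure, "DocumentReference/456",
                     {"occurrenceDateTime": (50, 10), "code": (55, 5)})
print(json.dumps(annotated, indent=2))
```

The trade-off is that a consumer must know to look upward for the source; spelling out that lookup rule would be part of any guidance along these lines.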
Donny: You mentioned use cases like "this piece of information was derived from a specific text source"; but when sticking this into a resource there may not always be a direct mapping back to a single source of evidence. Might mark a diagnosis of asthma as "severe" based on several data points, e.g. putting together a package of information as an explainer for a conclusion. Need to be able to point to multiple points of evidence as justification. Do we think about packaging up inferences into a conclusion as a different sort of case?
Josh: I think we're focused here on the "low-level" pointers from structured data back to sources; not about higher-level clinical reasoning-type conclusions.
Donny: We may need different extensions or a different modeling approach.
Random Modeling Questions (Gino):
* FHIRPath vs. `element.id`
  * Reasoning: `element.id` may not be present on a resource that cannot be updated
  * Note: one way is likely better than two options here (simplicity in implementation)
* offset + length vs. Range
  * Range (https://build.fhir.org/datatypes#Range) has `low` and `high`
  * Felt like an abuse of Range; not really what it is intended for
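To make the Range concern concrete: FHIR `Range` bounds are `SimpleQuantity` values and are inclusive, so a character span `[offset, offset + length)` maps to `low = offset` and `high = offset + length - 1`, the quantities have no natural unit, and an empty span cannot be represented at all. The Python sketch below shows that mapping; the function names are hypothetical.

```python
from typing import Any, Dict, Tuple


def span_to_range(offset: int, length: int) -> Dict[str, Any]:
    """Express the character span [offset, offset + length) as an inclusive Range."""
    if length < 1:
        # Inclusive bounds need at least one character: high would fall below low.
        raise ValueError("an inclusive Range cannot represent an empty span")
    # No unit is set: there is no natural Quantity unit for a character position.
    return {"low": {"value": offset}, "high": {"value": offset + length - 1}}


def range_to_span(rng: Dict[str, Any]) -> Tuple[int, int]:
    """Invert span_to_range, returning (offset, length)."""
    low = rng["low"]["value"]
    high = rng["high"]["value"]
    return low, high - low + 1


# The standalone-text example cites offset 300, length 25.
rng = span_to_range(300, 25)
print(rng)                 # {'low': {'value': 300}, 'high': {'value': 324}}
print(range_to_span(rng))  # (300, 25)
```

The off-by-one adjustment in both directions is an easy place for producers and consumers to disagree, which supports keeping the simpler offset + length pair.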
Bin: What about also excerpting the spans into the extension, instead of just a reference?
Josh/Siddhi: This might get verbose and hard to interpret out of context. Consumers really will need access to the source files to evaluate these results meaningfully.
Donny: this could be an optional addition. Would get very large very quickly.
Guy: Could Resource.text be used to capture this?
Josh: Yes, but it would prevent other use of that field.
Josh: does anyone currently point back to algorithm or engine in their outputs?
Bin: We use cTAKES with two different configs; we like to point back to this information (model + config) with a version, which helps us reproduce results.
Josh: so you wrap up all the alg/config info into a URI, and leave the semantics for that up to the producer?
Bin: yes.
Siddhi: We also sometimes derive information from ontologies in addition to the textual sources; as new versions of ontologies or metathesaurus data are published, results can change. So "versioning" would need to encompass more than just ML algorithm.
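Bin's approach of folding model and configuration (and, per Siddhi, ontology versions) into one producer-defined URI could look like the Python sketch below. It hashes a canonical JSON serialization so that any change in engine version, configuration, or ontology release yields a new URI. The base URL, the hashing scheme, the function name, and the sample values (`ctakes`, `4.0.0`, the UMLS release labels) are illustrative assumptions, not an agreed format.

```python
import hashlib
import json
from typing import Any, Dict

# HYPOTHETICAL producer-controlled namespace.
BASE_URI = "https://nlp.example.org/derivations/"


def engine_uri(engine: str, version: str, config: Dict[str, Any],
               ontologies: Dict[str, str]) -> str:
    """Return a URI that changes whenever the engine version, its configuration,
    or any ontology release changes."""
    payload = {"engine": engine, "version": version,
               "config": config, "ontologies": ontologies}
    # sort_keys plus fixed separators yields a canonical serialization, so
    # equal settings hash identically regardless of dict key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{BASE_URI}{engine}/{version}/{digest}"


config = {"pipeline": "clinical-default", "negation": True}  # sample values
a = engine_uri("ctakes", "4.0.0", config, {"UMLS": "2022AA"})
b = engine_uri("ctakes", "4.0.0", {"negation": True, "pipeline": "clinical-default"},
               {"UMLS": "2022AA"})
c = engine_uri("ctakes", "4.0.0", config, {"UMLS": "2022AB"})
print(a == b, a == c)  # True False
```

A hash alone is opaque to consumers, so the semantics still rest with the producer, as Bin describes.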
Josh: is there middle ground, like a URI that points to a "for more info" webpage?
(Note: suggestion from Google along these lines as well: https://jira.hl7.org/browse/FHIR-34475?focusedCommentId=195005&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-195005 )
Siddhi: Potentially, yes.
Gino: For derivation references, should these be versioned?
Josh: Yeah, could add SHOULD-level guidance here.
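FHIR already supports version-specific literal references of the form `[type]/[id]/_history/[vid]`, which is one way to follow such SHOULD-level guidance. The Python sketch below pins a reference to the source's `meta.versionId` and parses the result; the helper names are hypothetical, and the regular expression only covers relative references, not absolute URLs.

```python
import re
from typing import Any, Dict, Optional, Tuple

# Relative literal reference, optionally version-specific; the id pattern
# follows FHIR's id datatype ([A-Za-z0-9\-\.]{1,64}).
REFERENCE = re.compile(
    r"^([A-Z][A-Za-z]+)/([A-Za-z0-9\-.]{1,64})(?:/_history/([A-Za-z0-9\-.]{1,64}))?$")


def pinned_reference(source: Dict[str, Any]) -> str:
    """Build a version-specific reference from the source's meta.versionId, if any."""
    ref = f"{source['resourceType']}/{source['id']}"
    version = source.get("meta", {}).get("versionId")
    return f"{ref}/_history/{version}" if version else ref


def parse_reference(ref: str) -> Tuple[str, str, Optional[str]]:
    """Split a relative reference into (type, id, version or None)."""
    match = REFERENCE.match(ref)
    if match is None:
        raise ValueError(f"not a relative literal reference: {ref!r}")
    return match.group(1), match.group(2), match.group(3)


doc = {"resourceType": "DocumentReference", "id": "456", "meta": {"versionId": "3"}}
pinned = pinned_reference(doc)
print(pinned)                   # DocumentReference/456/_history/3
print(parse_reference(pinned))  # ('DocumentReference', '456', '3')
```

Pinning matters because offsets are only meaningful against the exact text they were computed from; a later edit to the source document could silently shift every span.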
## Goals and Timelines
* Explainability
* Validation by customers/users
## Follow-up
* Should we define different approaches for higher level conclusions? Evidence resources, Assessment resources, new extensions, etc?
* Propose an approach for 2D bounding boxes
* Propose an approach for pointing config URIs
## Other Topics?