# Life GPT literature review
> Example questions for reviewing a paper:
> - what do they do with the input data?
> - what model do they use (what's the underlying self-supervised learning task)?
> - what do they obtain? (it seems they often obtain a BERT model, which is encoder-only, so you get a set of embeddings)
> - what do they test it on?
## Shaky foundations
The review examines 84 foundation models trained on non-imaging EHR data, creating a taxonomy of their architectures, training data, and potential use cases.
Most models are trained on small clinical datasets like MIMIC-III or broad biomedical corpora like PubMed and are evaluated on tasks that do not necessarily reflect their utility in health systems.
### Categories of Clinical FMs:
- Clinical Language Models (CLaMs): Specialized in clinical text and capable of extracting information, summarizing medical dialogues, and predicting mechanical ventilation needs.
- Foundation Models for EMRs (FEMRs): Trained on a patient's entire medical history, outputting machine-understandable representations for downstream prediction tasks.

https://www.nature.com/articles/s41746-023-00879-8
## Med-BERT
### Input data
Med-BERT uses structured electronic health records (EHRs) from a dataset of 28,490,650 patients. The records consist of structured diagnosis data (ICD-9 and ICD-10 codes).
Med-BERT uses code embeddings, visit embeddings, and serialization embeddings to represent EHR data, capturing the interrelations between clinical codes within visits.
### Data processing:
"given the semantic differences between EHR and text, adapting the BERT methodology to structured EHR is non-trivial. For example, while the input modality of the original BERT was a 1-D sequence of words, our input modality is structured EHR which is recorded in a multilayer and multi-relational style. There are no clear rules on how to flatten the structured EHR into a 1-D sequence and how to encode the “structures” of the structured EHR in the BERT transformer architecture.”"
### Evaluation
Med-BERT was evaluated through fine-tuning on two disease-prediction tasks: predicting heart failure among patients with diabetes and the onset of pancreatic cancer.
Pretraining used the Masked Language Model (Masked LM) objective together with prediction of prolonged length of stay in hospital to capture contextual semantics in EHR data.
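A hedged sketch of how the two pretraining heads could share one encoder (illustrative PyTorch; the class name, layer sizes, and mean pooling are assumptions, not Med-BERT's actual architecture):
```python
import torch
import torch.nn as nn

class MedBertStylePretrainer(nn.Module):
    """Toy two-objective pretrainer: masked code prediction plus a
    sequence-level prolonged length-of-stay classifier (assumed design)."""

    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mlm_head = nn.Linear(dim, vocab_size)  # recover masked codes
        self.los_head = nn.Linear(dim, 1)           # prolonged-LOS logit

    def forward(self, code_ids: torch.Tensor):
        h = self.encoder(self.emb(code_ids))        # (batch, seq, dim)
        return self.mlm_head(h), self.los_head(h.mean(dim=1))

model = MedBertStylePretrainer(vocab_size=5000)
codes = torch.randint(1, 5000, (8, 64))             # fake code-id sequences
mlm_logits, los_logits = model(codes)
# Training would apply cross-entropy on masked positions and BCE on LOS labels.
```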

## Large language models encode clinical knowledge
https://www.nature.com/articles/s41586-023-06291-2
Explores the use of large language models (LLMs) in clinical settings, focusing on their capabilities and limitations.
- Objective: presents MultiMedQA, a benchmark combining six existing medical question-answering datasets with a new dataset, HealthSearchQA.
- Evaluation: proposes a human evaluation framework that assesses model answers along multiple axes, including factuality, comprehension, reasoning, possible harm, and bias.
- Models evaluated:
    - PaLM: a 540-billion-parameter LLM.
    - Flan-PaLM: an instruction-tuned variant achieving state-of-the-art accuracy on multiple medical question-answering datasets.
## CPLLM: Clinical Prediction with Large Language Models
https://openreview.net/forum?id=fnBYPL5Ged
"We present Clinical Prediction with Large Language Models (CPLLM), a method that involves fine-tuning a pre-trained Large Language Model (LLM) for clinical disease prediction. We utilized quantization and fine-tuned the LLM using prompts, with the task of predicting whether patients will be diagnosed with a target disease during their next visit or in the subsequent diagnosis, leveraging their historical diagnosis records. We compared our results to various baselines, including Logistic Regression, RETAIN, and Med-BERT, which is the current state-ofthe-art model for disease prediction using temporal structured EHR data. Our experiments have shown that CPLLM surpasses all the tested models in terms of PR-AUCandROC-AUCmetrics, displaying noteworthy enhancements compared to the baseline models.
...
However, we want to harness the power of LLMs in understanding sequences of tokens derived from structured EHR data, specifically to train prediction models. We represent the structured data as text, where each medical concept corresponds to a word, admissions are treated as visits, and a patient's history is considered a document. The objectives of this study are to develop a novel method for using LLMs to train clinical predictors and to evaluate the performance of this method on real-world datasets.
We used two different LLMs: Llama2, a general LLM (Touvron et al., 2023b), and BioMedLM, which was trained on biological and clinical text (Venigalla et al., 2022). We used three prediction tasks and two datasets and compared the performance to three baseline models."
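A minimal sketch of the concept-to-text mapping described in the quote (the code descriptions, prompt wording, and `build_prompt` helper are hypothetical; CPLLM's exact prompt template is in the paper):
```python
# Hypothetical prompt construction: codes become words, admissions become
# visits, and the patient history becomes a document ending in a yes/no question.
code_to_text = {"E11.9": "type 2 diabetes", "I10": "essential hypertension"}

def build_prompt(visits: list[list[str]], target_disease: str) -> str:
    history = " ".join(
        f"Visit {i + 1}: {', '.join(code_to_text.get(c, c) for c in visit)}."
        for i, visit in enumerate(visits)
    )
    return (f"Patient history: {history} Will the patient be diagnosed with "
            f"{target_disease} at the next visit? Answer yes or no:")

print(build_prompt([["E11.9"], ["E11.9", "I10"]], "heart failure"))
# The LLM is then fine-tuned (with quantization) so its answer matches the label.
```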
## Language models are an effective representation learning technique for electronic health record data
https://www.sciencedirect.com/science/article/pii/S1532046420302653
The aim is to demonstrate that patient-representation schemes inspired by natural language processing (NLP) techniques can improve the accuracy of clinical prediction models by transferring information from a larger patient population to a smaller, relevant subset.
Proposes using clinical language model-based representations (CLMBR) to leverage the structure and sequence of EHR data.
Empirically evaluates the effectiveness of CLMBR for five prediction tasks and compares it with standard baselines and other representation learning techniques.
"The study was done with approval by Stanford University’s Institutional Review Board. We treated each patient’s record as a sequence of days 𝑑1,…, 𝑑𝑁, ordered by time. Each day consists of a set of medical codes for diagnoses, procedures, medication orders, and laboratory test orders (ICD10, CPT or HCPCS, RXCUI, and LOINC codes respectively) recorded on that day. Fig. 3 illustrates an example patient record annotated with our notation. In this study, we did not use quantitative information such as laboratory test results or vital sign measurements. We also did not use clinical notes (i.e. textual documents), images, or explicit linkages between codesas they were not available in our
de-identified EHR data due to logistical and IRB related issues."

"As described in Section 2.1, our EHR data consisted of sequences of days 𝑑1…𝑑𝑁 where each day is comprised of a set of medical codes that represents the events of that particular day. The goal of building a clinical language model is to construct a model that can predict the
probability of these sequences of days 𝑝(𝑑1,…, 𝑑𝑁). As is standard for many other sequence models, we factorized the probability distribution over the sequence into a series of predictions where only a single
element of the sequence is predicted at a given time. In EHR data, this corresponds to predicting the next day in a patient record given the previous days, i.e., 𝑝(𝑑𝑖|𝑑1,…, 𝑑𝑖−1). Because each day 𝑑𝑖 consists of a
set of medical codes, this problem is a set prediction problem which is also known as multi-label prediction [44]. We solved this set prediction problem in two steps: First, we constructed a model for computing fixed length patient representation given days of history and second, we constructed a set predictor that predicts the set of codes for thefollowing day given that patient representation.

"
# Predicting Risk of Alzheimer’s Diseases and Related Dementias with AI Foundation Model on Electronic Health Records
https://www.medrxiv.org/content/10.1101/2024.04.26.24306180v1
A large-scale EHR dataset contains rich information on patients. To enable the model to understand EHR best, we designed a prediction framework with two stages (Figure 1a): (1) pretraining, where we pretrained a foundation model for EHR with a Transformer architecture on the pretraining cohort; the model was trained without labels, merely by reconstructing randomly masked information from the EHR; (2) fine-tuning, where we fine-tuned the model with the medical history and AD/ADRD/MCI outcomes in the fine-tuning cohort to identify high-risk patients (see the Method section for more details).
We conducted pretraining only on the pretraining cohort. We fine-tuned the model with the training set of the AD/ADRD/MCI fine-tuning cohort and used the validation set to examine performance under different hyperparameter settings, which guided model selection. The performance of the selected models was evaluated on the fully held-out validation set of patients and reported as an estimate of performance in new patient cohorts.
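A hedged end-to-end sketch of the two stages (illustrative PyTorch; the mask-token id, layer sizes, and pooling are assumptions, not the paper's implementation):
```python
import torch
import torch.nn as nn

dim, vocab = 64, 1000
emb = nn.Embedding(vocab, dim)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)

# Stage 1: label-free pretraining by reconstructing randomly masked codes.
codes = torch.randint(1, vocab, (8, 32))            # fake EHR code sequences
mask = torch.rand(codes.shape) < 0.15
masked = codes.masked_fill(mask, 0)                 # 0 = assumed [MASK] id
recon_head = nn.Linear(dim, vocab)
logits = recon_head(encoder(emb(masked)))
pretrain_loss = nn.functional.cross_entropy(logits[mask], codes[mask])

# Stage 2: fine-tune the same encoder against AD/ADRD/MCI outcome labels.
risk_head = nn.Linear(dim, 1)
risk_logits = risk_head(encoder(emb(codes)).mean(dim=1))
labels = torch.randint(0, 2, (8, 1)).float()        # fake outcomes
finetune_loss = nn.functional.binary_cross_entropy_with_logits(risk_logits, labels)
```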

# CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines
https://arxiv.org/abs/2402.04400
**"To our knowledge, this is the first attempt to utilize GPT for generating time-series EHR data.""**
- We design a novel patient representation that captures visit types, discharge facilities for inpatient visits, and all temporal data, such as starting year, age, intervals between visits, and inpatient visit duration. To our knowledge, this is the first instance of fully preserving such temporal information (achieved by introducing artificial time tokens (ATTs) between neighboring visits).
- We treat patient sequence generation as a language modeling problem, which allows us to use the state-of-the-art Generative Pre-trained Transformer (GPT) to learn the distribution of patient sequences and generate new synthetic sequences [8, 9].

"Because the patient representation encodes all the temporal information in the sequence, the trained GPT model could be used potentially for time-sensitive forecasting. We could prompt the trained GPT model with a patient history and estimate the time of the next visit via a Monte Carlo Sampling approach"
Link to the baseline embedding model: https://arxiv.org/pdf/2111.08585
# Foresight—a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study
https://www.thelancet.com/journals/landig/article/PIIS2589-7500(24)00025-6/fulltext
# Foundation Model for Advancing Healthcare: Challenges, Opportunities and Future Directions
https://arxiv.org/abs/2404.03264
# BLOOM
https://arxiv.org/abs/2211.05100
# Forward citation review articles for the "shaky foundations" paper
Levan: I searched for articles citing the "shaky foundations" review paper and manually selected the review articles among them. Sorry for the overload, I will sift through these.
* Augmented non-hallucinating large language models as medical information curators #review
* A Systematic Review of Testing and Evaluation of Healthcare Applications of Large Language Models (LLMs) #review
* Recent Advances in Large Language Models for Healthcare #review
* Generative AI and large language models in health care: pathways to implementation #review
* Large language models in medical and healthcare fields: applications, advances, and challenges #review
* Leveraging foundation and large language models in medical artificial intelligence #review
* Generative artificial intelligence and ethical considerations in health care: a scoping review and ethics checklist #review
* Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review #review
* Advancing healthcare: the role and impact of AI and foundation models #review
* Potential of Large Language Models in Health Care: Delphi Study #review
* Unlocking the potential of large language models in healthcare: navigating the opportunities and challenges #review
# 2024 Moor et al - Foundation models for generalist medical artificial intelligence