# Predicting Partially Observed Long-Term Outcomes with Adversarial Positive-Unlabeled Domain Adaptation
**Authors:** Mengying Yan, Meng Xia, Wei A. Huang, Chuan Hong, Benjamin A. Goldstein, Matthew M. Engelhard
**Affiliations:** Duke AI Health, Department of Biostatistics and Bioinformatics, Department of Electrical and Computer Engineering, Duke University School of Medicine
---

## **Understanding the Research**
This research addresses a real-world challenge in healthcare: predicting **long-term patient outcomes** (e.g., 1-year mortality) in **recent patient cohorts** for whom such outcomes are not yet fully available. Traditional predictive models struggle when applied across time due to **changes in clinical practice**, **patient populations**, and **label availability**. To overcome this, the authors introduce an approach that combines **adversarial domain adaptation** and **positive-unlabeled (PU) learning** to enable prediction using partially labeled data.
---
## **Motivation**
Predicting long-term outcomes is crucial in clinical settings, yet:
* Recent patient data often lacks full 1-year outcome labels due to insufficient follow-up time.
* There is a **distribution shift** between historical and contemporary patient cohorts.
* Standard supervised models trained on old data fail to generalize to new data with partial labels.
This research proposes a solution tailored to these operational and data limitations.
---
## **Core Idea**
The study proposes an **adversarial positive-unlabeled domain adaptation** method that **transfers knowledge** from historical data (with full labels) to recent data (with partial labels) by **aligning feature distributions**, so that long-term outcomes can be predicted even when they have not yet been observed.
---
## **Methods**
* **Data**:
    * **Source domain**: 2018 ED visits with full 1-year mortality labels.
    * **Target domain**: 2021 ED visits with only partial 1-year outcomes (e.g., 7, 30, or 90-day mortality known).
* **Learning Framework**:
    * **PU Learning**: Target domain includes known positives (observed deaths), while unlabeled patients may be either positive or negative.
    * **Domain Adaptation**: Aligns source and target distributions via three-level feature alignment:
        1. **Overall alignment**: Aligns general feature distributions using adversarial losses.
        2. **Partial alignment**: Separates and aligns positive and negative source examples to the target using KL divergence and reverse-GAN loss.
        3. **Conditional alignment**: Supervises model using known positives from the target domain.
    * **Model initialization**: Uses pretrained models trained on source data.
    * **Loss Functions**: A composite of the above, with weighting hyperparameters to balance objectives.
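
To make the positive-unlabeled setup concrete, the sketch below shows one way target-domain labels could be derived from a limited follow-up window, and how a weighted composite of the three alignment terms might be combined. It is an illustration under stated assumptions, not the authors' implementation: the `Visit` class, the `days_to_death` field, the function names, and the weight names are all hypothetical, and the loss values passed in are placeholders rather than real adversarial or KL terms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Visit:
    # Hypothetical field: days from the ED visit to death,
    # or None if no death has been recorded.
    days_to_death: Optional[int]


def pu_labels(visits: List[Visit], follow_up_days: int) -> List[int]:
    """Return 1 for known positives (death observed within the follow-up
    window) and 0 for unlabeled visits. A 0 does NOT mean survival: the
    patient may still die within one year."""
    return [
        1 if v.days_to_death is not None and v.days_to_death <= follow_up_days else 0
        for v in visits
    ]


def composite_loss(
    overall: float,
    partial: float,
    conditional: float,
    w_overall: float = 1.0,
    w_partial: float = 1.0,
    w_conditional: float = 1.0,
) -> float:
    """Weighted sum of the three alignment terms. The weights stand in for
    the weighting hyperparameters; their names and defaults are assumed."""
    return w_overall * overall + w_partial * partial + w_conditional * conditional


# Four illustrative visits: deaths at day 3, day 45, day 200, and no death.
visits = [Visit(3), Visit(45), Visit(200), Visit(None)]
labels_7 = pu_labels(visits, follow_up_days=7)
labels_90 = pu_labels(visits, follow_up_days=90)
total = composite_loss(1.0, 2.0, 3.0, w_partial=0.5, w_conditional=2.0)
```

In this toy example, the visit with death at day 200 is a true 1-year positive but stays unlabeled under both a 7-day and a 90-day window, which is exactly the ambiguity PU learning has to handle.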
## **Results**
Applied to predict 1-year mortality for ED patients in 2021, where only partial follow-up is available:
* Only **51.7%** of patients had known outcomes at 90 days, dropping to **17.3%** at 7 days.
* The proposed method outperforms baseline models (source-only, naïve PU learning) in **AUROC**.
* Even with only **50% label availability**, the model approaches the performance of fully supervised models trained on complete labels.
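
For readers less familiar with the metric, the following is a minimal sketch of how AUROC can be computed from predicted risk scores, assuming complete 1-year labels are available for the evaluation set. The function name and the example scores are made up for illustration and are not the study's data; in practice a library routine would normally be used instead.

```python
from typing import List


def auroc(y_true: List[int], scores: List[float]) -> float:
    """AUROC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative labels (1 = died within one year) and predicted risks.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
result = auroc(y_true, scores)  # 5 of 6 positive-negative pairs ranked correctly
```

The pairwise form is quadratic in the number of patients, so it is suited to explanation rather than to large cohorts.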
## **Why This Matters**
This study shows that long-term outcomes such as 1-year mortality can be predicted in recent patient cohorts even when follow-up is limited and full labels are not yet available.
The method also addresses distribution shift between historical and current populations, which often degrades model performance in real-world clinical applications.
It consistently outperforms standard positive-unlabeled and domain adaptation baselines and remains robust whether 90, 30, or only 7 days of outcome data are available.
These results point to a practical modeling approach for evolving clinical settings where outcome labels are incomplete and data distributions change over time.
This approach is particularly relevant for:
* Real-time hospital triage tools
* Early evaluation of post-pandemic care
* Adaptive learning systems that evolve as more outcome labels become available
---
## **Acknowledgment**
This work was conducted by researchers at Duke University’s AI Health initiative, with affiliations across the School of Medicine, the Department of Biostatistics and Bioinformatics, and the Department of Electrical and Computer Engineering.
*Presented as part of a poster session highlighting machine learning methods addressing high-impact healthcare prediction challenges using imperfect real-world data.*