# Predicting Partially Observed Long-Term Outcomes with Adversarial Positive-Unlabeled Domain Adaptation

**Authors:** Mengying Yan, Meng Xia, Wei A. Huang, Chuan Hong, Benjamin A. Goldstein, Matthew M. Engelhard

**Affiliations:** Duke AI Health, Department of Biostatistics and Bioinformatics, Department of Electrical and Computer Engineering, Duke University School of Medicine

---

![Adversarial-Positive-Unlabeled-Domain-Adaptation](https://hackmd.io/_uploads/ByBlM4fIgl.jpg)

## **Understanding the Research**

This research addresses a real-world challenge in healthcare: predicting **long-term patient outcomes** (e.g., 1-year mortality) in **recent patient cohorts** for whom such outcomes are not yet fully available. Traditional predictive models struggle when applied across time because of **changes in clinical practice**, **patient populations**, and **label availability**. To overcome this, the authors introduce an approach that combines **adversarial domain adaptation** and **positive-unlabeled (PU) learning** to enable prediction from partially labeled data.

---

## **Motivation**

Predicting long-term outcomes is crucial in clinical settings, yet:

* Recent patient data often lack full 1-year outcome labels because follow-up time is insufficient.
* There is a **distribution shift** between historical and contemporary patient cohorts.
* Standard supervised models trained on old data fail to generalize to new data with partial labels.

This research proposes a solution tailored to these operational and data limitations.

---

## **Core Idea**

The study proposes an **adversarial positive-unlabeled domain adaptation** method that **transfers knowledge** from historical data (with full labels) to recent data (with partial labels) by **aligning feature distributions**, so that long-term outcomes can be predicted even before they can be fully observed.

---

## **Methods**

* **Data**:
  * **Source domain**: 2018 emergency department (ED) visits with full 1-year mortality labels.
  * **Target domain**: 2021 ED visits with only partial 1-year outcomes (e.g., 7-, 30-, or 90-day mortality known).
* **Learning Framework**:
  * **PU Learning**: The target domain includes known positives (observed deaths), while unlabeled patients may be either positive or negative.
  * **Domain Adaptation**: Aligns source and target distributions via three-level feature alignment:
    1. **Overall alignment**: Aligns general feature distributions using adversarial losses.
    2. **Partial alignment**: Separates and aligns positive and negative source examples to the target using KL divergence and reverse-GAN loss.
    3. **Conditional alignment**: Supervises the model using known positives from the target domain.
* **Model initialization**: Starts from models pretrained on source data.
* **Loss Functions**: A composite of the above, with weighting hyperparameters to balance objectives.

---

## **Results**

The method was applied to predict 1-year mortality for ED patients in 2021, where only partial follow-up is available:

* Only **51.7%** of patients had known outcomes at 90 days, dropping to **17.3%** at 7 days.
* The proposed method outperforms baseline models (source-only, naïve PU learning) in **AUROC**.
* Even with only **50% label availability**, the model approaches the performance of fully supervised models trained on complete labels.

---

## **Why This Matters**

This study shows that it is possible to predict long-term outcomes even when follow-up is limited and target distributions differ from the source. The proposed method enables early prediction of outcomes like 1-year mortality in recent patient cohorts, where full labels are not yet available. It addresses shifts between historical and current populations, which often degrade model performance in real-world clinical applications. The method consistently outperforms standard positive-unlabeled and domain adaptation baselines and remains robust even when only 90, 30, or 7 days of outcome data are available.
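
To make the partial-label setting concrete, here is a minimal sketch of how positive-unlabeled labels could be derived from a limited follow-up window. It is an illustration under stated assumptions, not the authors' code: the function names `make_pu_labels` and `label_frequency`, the `days_to_death` encoding (`None` for patients with no recorded death), and the toy cohort values are all hypothetical.

```python
from typing import List, Optional, Tuple

HORIZON_DAYS = 365  # 1-year outcome horizon described in the text


def make_pu_labels(
    days_to_death: List[Optional[int]], followup_days: int
) -> Tuple[List[int], List[int]]:
    """Return (observed, true_1yr) labels for a toy cohort.

    days_to_death: days from the ED visit to death, or None if no death
        occurred (hypothetical encoding, assumed for this sketch).
    followup_days: follow-up available so far (e.g. 7, 30, or 90),
        assumed to be no longer than HORIZON_DAYS.

    observed[i] == 1 marks a known positive (death seen within follow-up).
    observed[i] == 0 marks an unlabeled patient, NOT a negative: the
    patient may still die before the 1-year horizon.
    true_1yr is only knowable retrospectively; it is returned here so the
    toy example can show what the partial labels hide.
    """
    observed, true_1yr = [], []
    for d in days_to_death:
        dies_within_year = d is not None and d <= HORIZON_DAYS
        seen_so_far = d is not None and d <= followup_days
        true_1yr.append(int(dies_within_year))
        observed.append(int(seen_so_far))
    return observed, true_1yr


def label_frequency(observed: List[int], true_1yr: List[int]) -> float:
    """Fraction of true 1-year positives that are already observed."""
    n_pos = sum(true_1yr)
    return sum(observed) / n_pos if n_pos else 0.0


# Toy cohort (made-up values, for illustration only).
cohort = [3, 20, 60, 200, 400, None, None, None]

for window in (7, 30, 90):
    obs, truth = make_pu_labels(cohort, window)
    print(window, obs, label_frequency(obs, truth))
```

In this toy cohort, shorter follow-up windows leave more of the eventual positives unlabeled. Treating those unlabeled patients as negatives would understate mortality, which is the gap that PU learning is meant to handle.
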

These results indicate a practical modeling approach for evolving clinical settings where outcome labels are incomplete and data distributions change over time. This approach is particularly relevant for:

* Real-time hospital triage tools
* Early evaluation of post-pandemic care
* Adaptive learning systems that evolve as more outcome labels become available

---

## **Acknowledgment**

This work was conducted by researchers at Duke University’s AI Health initiative, with affiliations across the School of Medicine and the Departments of Biostatistics and Bioinformatics and of Electrical and Computer Engineering.

*Presented as part of a poster session highlighting machine learning methods addressing high-impact healthcare prediction challenges using imperfect real-world data.*