#### CSE 6250 Project Proposal

*Team B2 - Ngok Chao HO, Zeyu Wei*

##### Paper 1: Index 113, **Automated LOINC Standardization Using Pre-trained Large Language Models**

Google Research. Author(s): Tao Tu, Eric Loreaux, Emma Chesley, Adam D. Lelkes, Paul Gamble, Mathias Bellaiche, Martin Seneviratne, Ming-Jun Chen

1) **Task**: Map free-text lab test descriptions to standard **LOINC** codes.
2) **Innovation**: Uses embeddings from a pre-trained **LLM**, so only minimal labeled data is needed.
3) **Pros/Cons**: Learns from small labeled datasets, but requires substantial computational resources.
4) **Data**: Yes, **MIMIC-III** ([Link](https://paperswithcode.com/dataset/mimic-iii)).
5) **Code**: Not provided.

##### Paper 2: Index 75, **Generative Biomedical Entity Linking via Knowledge Base-Guided Pre-training and Synonyms-Aware Fine-tuning**

Tsinghua University. Author(s): Hongyi Yuan, Zheng Yuan, Sheng Yu, Zifeng Wang

1) **Task**: Link biomedical entity mentions, including their synonyms, to concepts in knowledge bases such as **UMLS**.
2) **Innovation**: Generative entity linking with knowledge base-guided pre-training and synonyms-aware fine-tuning.
3) **Pros/Cons**: State-of-the-art results, but high computational cost because it relies on large pre-trained models.
4) **Data**: Yes ([Link](https://drive.google.com/file/d/1JWYMdwxp7_ZZRGAO-ENmgUNirx9-nX32/view)).
5) **Code**: Yes ([GitHub](https://github.com/Yuanhy1997/GenBioEL)).

##### Paper 3: Index 2, **SurvTRACE: Transformers for Survival Analysis with Competing Events**

University of Illinois Urbana-Champaign. Author: Zifeng Wang

1) **Task**: Predict time-to-event outcomes in the presence of **competing events**.
2) **Innovation**: A Transformer-based model that uses inverse propensity score (**IPS**) weighting to address selection bias.
3) **Pros/Cons**: Captures complex feature interactions. It is computationally intensive on large datasets, but replication is still feasible.
4) **Data**: Yes, **SUPPORT**, **METABRIC**, **SEER** ([pycox](https://github.com/havakv/pycox), [SEER](https://seer.cancer.gov/)).
5) **Code**: Yes ([GitHub](https://github.com/RyanWangZf/SurvTRACE)).
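The concordance index (C-index) is the metric proposed for this replication. It checks whether subjects with higher predicted risk experience the event earlier. The block below is a minimal pure-Python sketch of Harrell's C-index. It assumes that, with competing events, each event type is scored separately and subjects with any other outcome are treated as censored. This is a cause-specific simplification, not necessarily the exact SurvTRACE evaluation protocol, which may use a time-dependent variant. `harrell_c_index` and its arguments are hypothetical names for illustration. They are not part of the SurvTRACE code or pycox.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
    event_of_interest: int = 1,
) -> float:
    """Hypothetical helper: Harrell's C-index for a single event type.

    events[i] == 0 means subject i was censored; any other value is an
    event type. Events of other types are treated as censored, which is a
    cause-specific simplification (assumption). A higher risk score means
    the subject is predicted to experience the event of interest earlier.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Only subjects who experienced the event of interest anchor a pair.
        if events[i] != event_of_interest:
            continue
        for j in range(n):
            # Subject j is comparable if it was still event-free strictly
            # after subject i's event time (this also excludes j == i).
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # correctly ranked pair
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs for this event type")
    return concordant / comparable


# Toy data: 0 = censored; 1 and 2 are two competing event types.
times = [2.0, 5.0, 3.0, 8.0, 6.0]
events = [1, 2, 1, 0, 1]
# A competing-risks model would output one risk score per event type.
# A single vector is reused here only to keep the example short.
risk = [0.9, 0.4, 0.7, 0.1, 0.8]

print(harrell_c_index(times, events, risk, event_of_interest=1))  # 0.875
print(harrell_c_index(times, events, risk, event_of_interest=2))  # 0.5
```

For event type 1, seven of the eight comparable pairs are ranked correctly. The one discordant pair is the subject with an event at time 3, who is scored below the subject who survives event-free to time 6. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.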
##### Target Paper: **SurvTRACE: Transformers for Survival Analysis with Competing Events**

##### Why we chose this paper

Applying a Transformer to survival analysis with competing events is innovative. SurvTRACE also requires fewer computational resources than the two large pre-trained model papers, which makes replicating it on AWS feasible.

##### Hypothesis to verify

We will test the hypothesis that **SurvTRACE outperforms traditional survival analysis models** in handling competing events, using the **concordance index (C-index)** as the performance metric.

##### Data and Computational Resources

- **Data**: **SUPPORT** and **METABRIC**, obtained through pycox.
- **Computational Resources**: **AWS**.
- **Software**: Open-source libraries such as PyTorch.

#### Appendix

- **SUPPORT**: Available from [pycox GitHub](https://github.com/havakv/pycox).
- **METABRIC**: Available from [pycox GitHub](https://github.com/havakv/pycox).
- **SEER**: Available from [SEER](https://seer.cancer.gov/).

#### References

1. Tao Tu et al., "Automated LOINC Standardization Using Pre-trained Large Language Models," Google Research.
2. Hongyi Yuan et al., "Generative Biomedical Entity Linking via Knowledge Base-Guided Pre-training and Synonyms-Aware Fine-tuning," Tsinghua University.
3. Zifeng Wang, "SurvTRACE: Transformers for Survival Analysis with Competing Events," University of Illinois Urbana-Champaign.