#### CSE 6250 Project Proposal
*Team B2 - Ngok Chao HO, Zeyu Wei*
##### Paper 1: Index 113, **Automated LOINC Standardization Using Pre-trained Large Language Models**, Google Research, Author(s): Tao Tu, Eric Loreaux, Emma Chesley, Adam D. Lelkes, Paul Gamble, Mathias Bellaiche, Martin Seneviratne, Ming-Jun Chen
1) **Task**: Tool for mapping general text to **LOINC**.
2) **Innovation**: Uses embeddings from pre-trained **LLM** with minimal labeled data.
3) **Dis/Adv**: Learns from small datasets but requires substantial computational resources.
4) **Data**: Yes, **MIMIC-III** ([Link](https://paperswithcode.com/dataset/mimic-iii)).
5) **Code**: Not provided.
##### Paper 2: Index 75, **Generative Biomedical Entity Linking via Knowledge Base-Guided Pre-training and Synonyms-Aware Fine-tuning**, Tsinghua University, Author(s): Hongyi Yuan, Zheng Yuan, Sheng Yu, Zifeng Wang
1) **Task**: Map biomedical synonyms to knowledge bases like **UMLS**.
2) **Innovation**: Generative entity linking with knowledge base-guided pre-training.
3) **Dis/Adv**: Achieves state-of-the-art results, but at high computational cost due to large pre-trained models.
4) **Data**: Yes, [Link](https://drive.google.com/file/d/1JWYMdwxp7_ZZRGAO-ENmgUNirx9-nX32/view).
5) **Code**: Yes, [GitHub](https://github.com/Yuanhy1997/GenBioEL).
##### Paper 3: Index 2, **SurvTRACE: Transformers for Survival Analysis with Competing Events**, University of Illinois Urbana-Champaign (UIUC), Author(s): Zifeng Wang, Jimeng Sun
1) **Task**: Predict time-to-event outcomes with **competing events**.
2) **Innovation**: Transformer-based model that uses inverse propensity score (**IPS**) weighting to address selection bias.
3) **Dis/Adv**: Handles complex feature interactions; computationally intensive on large datasets, though still feasible.
4) **Data**: Yes, **SUPPORT**, **METABRIC**, **SEER** ([pycox](https://github.com/havakv/pycox), [SEER](https://seer.cancer.gov/)).
5) **Code**: Yes, [GitHub](https://github.com/RyanWangZf/SurvTRACE).
##### Target Paper: **SurvTRACE: Transformers for Survival Analysis with Competing Events**
##### Why we chose this paper:
Applying a Transformer to survival analysis with competing events is innovative, and of the three candidate papers this one requires the fewest computational resources, which makes it feasible to replicate on AWS.
##### Hypothesis to verify:
We will verify the hypothesis that **SurvTRACE outperforms traditional survival analysis models** in handling competing events, using the **concordance index (C-Index)** as the performance metric.
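To make the metric concrete, the following is a minimal pure-Python sketch of the plain Harrell form of the C-Index. It is an illustration under stated assumptions, not the evaluation code of the paper: the function name `concordance_index`, its argument names, and the choice to skip pairs with tied times are ours, and the paper's own evaluation may use a different (for example, time-dependent) variant.

```python
from itertools import combinations


def concordance_index(durations, events, risk_scores):
    """Plain Harrell C-index: share of comparable pairs ranked correctly.

    durations   : observed time for each subject
    events      : 1 if the event was observed, 0 if censored
    risk_scores : higher score means the model expects an earlier event

    Assumption (ours): pairs with tied observed times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        # Order the pair so that `a` is the subject with the shorter time.
        a, b = (i, j) if durations[i] < durations[j] else (j, i)
        # Comparable only if times differ and the shorter time is an
        # observed event (a censored shorter time tells us nothing).
        if durations[a] == durations[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0      # correctly ranked pair
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5      # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 means every pair is inverted. Comparing SurvTRACE with a baseline then amounts to computing this score for each model on the same held-out split.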
##### Data and Computational Resources:
- **Data**: We will use **SUPPORT** and **METABRIC** from pycox.
- **Computational Resources**: We plan to use **AWS**.
- **Software**: Open-source libraries like PyTorch.
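
The data plan above can be sketched as follows. This is a hedged illustration: `load_pycox_dataset` and `event_summary` are hypothetical helper names of ours, the sketch assumes pycox is installed and can download the data on first use, and it assumes the returned frames expose `duration` and `event` columns.

```python
def load_pycox_dataset(name):
    """Return SUPPORT or METABRIC as a pandas DataFrame via pycox.

    pycox is imported lazily so that event_summary below can be used
    without pycox installed.
    """
    from pycox import datasets

    loaders = {"support": datasets.support, "metabric": datasets.metabric}
    if name not in loaders:
        raise KeyError(f"unknown dataset: {name}")
    return loaders[name].read_df()


def event_summary(durations, events):
    """Summarize sample count, observed events, and censoring rate."""
    n = len(durations)
    if n == 0:
        raise ValueError("empty dataset")
    observed = sum(1 for e in events if e)
    return {
        "n": n,
        "events": observed,
        "censoring_rate": (n - observed) / n,
    }
```

For example, calling `df = load_pycox_dataset("support")` and then `event_summary(df["duration"].tolist(), df["event"].tolist())` would give a quick check of how heavily each dataset is censored before any model is trained.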
#### Appendix
- **SUPPORT**: Available from [pycox GitHub](https://github.com/havakv/pycox).
- **METABRIC**: Available from [pycox GitHub](https://github.com/havakv/pycox).
- **SEER**: Available from [SEER](https://seer.cancer.gov/).
#### References
1. Tao Tu et al., "Automated LOINC Standardization Using Pre-trained Large Language Models," Google Research.
2. Hongyi Yuan et al., "Generative Biomedical Entity Linking via Knowledge Base-Guided Pre-training and Synonyms-Aware Fine-tuning," Tsinghua University.
3. Zifeng Wang and Jimeng Sun, "SurvTRACE: Transformers for Survival Analysis with Competing Events," University of Illinois Urbana-Champaign (UIUC).