<style>
img {
display: block;
margin-left: auto;
margin-right: auto;
}
</style>
> [Paper link](https://arxiv.org/abs/2310.12798) | [Code link](https://github.com/acharkq/MolCA) | EMNLP 2023
:::success
**Thoughts**
This study proposes a method called MolCA.
MolCA is a molecular graph-language modeling method that enables LMs to perceive 2D molecular graphs for molecule-to-text generation.
:::
## Abstract
Language models lack the ability to perceive the 2D graph structures of molecules.
This study proposes **Mol**ecular Graph-Language Modeling with **C**ross-Modal Projector and Uni-Modal **A**dapter (MolCA).
The cross-modal projector enables an LM to understand both text- and graph-based molecular content, while the uni-modal adapter allows the LM to adapt efficiently to downstream tasks.
## Background
Language Models (LMs) have demonstrated significant achievements across various domains.
This study aims to utilize LMs for molecule understanding. Existing methods feed molecules to LMs as 1D SMILES (Simplified Molecular Input Line Entry System) strings.
However, LMs cannot perceive 2D graph representations, which are crucial for human professionals to comprehend molecular structures.

## Method
MolCA’s architecture has three key components (a minimal code sketch follows the list):
1) a graph encoder for 2D structure understanding
2) an LM for text generation
3) a cross-modal projector to connect the graph encoder and the LM
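
The following is a minimal sketch of how these components connect, assuming PyTorch; the module names and dimensions (`d_graph`, `d_lm`, `n_queries`) are illustrative placeholders, not the paper's actual implementation. The projector's learnable query tokens cross-attend to the graph encoder's node embeddings, producing a fixed number of soft-prompt tokens that are prepended to the LM's text embeddings.

```python
# Illustrative sketch of MolCA's component wiring (not the official code).
import torch
import torch.nn as nn

class CrossModalProjector(nn.Module):
    """Q-Former-style projector: learnable query tokens cross-attend
    to the node embeddings produced by the graph encoder."""
    def __init__(self, d_graph=300, d_lm=768, n_queries=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, d_lm))
        self.cross_attn = nn.MultiheadAttention(
            d_lm, num_heads=8, kdim=d_graph, vdim=d_graph, batch_first=True)
        self.proj = nn.Linear(d_lm, d_lm)

    def forward(self, node_embs):                       # (B, N, d_graph)
        q = self.queries.unsqueeze(0).expand(node_embs.size(0), -1, -1)
        out, _ = self.cross_attn(q, node_embs, node_embs)
        return self.proj(out)                           # (B, n_queries, d_lm)

# Graph tokens are prepended to the text embeddings as a soft prompt.
B, N, T, d_graph, d_lm = 2, 30, 16, 300, 768
node_embs = torch.randn(B, N, d_graph)   # stand-in for the graph encoder's output
text_embs = torch.randn(B, T, d_lm)      # stand-in for the LM's token embeddings
soft_prompt = CrossModalProjector(d_graph, d_lm)(node_embs)
lm_inputs = torch.cat([soft_prompt, text_embs], dim=1)  # input to the LM
```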

In MolCA’s pretrain stage 1, the graph encoder and the cross-modal projector (i.e., Q-Former) are jointly optimized using three cross-modal tasks: molecule-text contrasting, molecule-text matching, and molecule captioning.
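To make the first task concrete, below is a minimal sketch of a molecule-text contrasting objective in the InfoNCE style, assuming paired molecule and text features within a batch; the function name and temperature `tau` are illustrative, not taken from the paper.

```python
# Illustrative InfoNCE-style molecule-text contrasting loss.
import torch
import torch.nn.functional as F

def mol_text_contrastive_loss(mol_feats, txt_feats, tau=0.1):
    mol = F.normalize(mol_feats, dim=-1)   # (B, d) molecule features
    txt = F.normalize(txt_feats, dim=-1)   # (B, d) text features
    logits = mol @ txt.t() / tau           # (B, B) pairwise similarities
    labels = torch.arange(mol.size(0), device=mol.device)  # diagonal = positives
    # Symmetric cross-entropy: match molecules to texts and texts to molecules.
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2
```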

In MolCA’s pretrain stage 2, the cross-modal projector is aligned with the LM through molecule captioning, while the LM itself is adapted efficiently via the uni-modal adapter (LoRA).
In the fine-tune stage, MolCA is trained on downstream molecule-to-text generation tasks.
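
Since the uni-modal adapter is LoRA, the sketch below shows a generic LoRA-wrapped linear layer; the rank `r` and scaling `alpha` are illustrative hyperparameters, and this is not MolCA's actual adapter code.

```python
# Illustrative LoRA adapter on a frozen linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False        # the pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: starts as identity
        self.scale = alpha / r

    def forward(self, x):
        # Low-rank update B @ A added on top of the frozen projection.
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale
```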

## Experiment
MolCA is evaluated on molecule captioning using the PubChem324k and CheBI-20 datasets.
