Augmented FACTA+
===
## Introduction:
To support biomedical research on COVID-19 with information science, we are developing an AI system called **[Augmented FACTA+](https://www.cl.ecei.tohoku.ac.jp/~ryo-t/facta/index.html)** that automatically analyzes 30K academic papers related to COVID-19 together with 25 million abstracts of other biomedical papers. The system augments the existing knowledge base by automatically extracting information and knowledge that may be relevant to COVID-19 from this data. It also infers potential relationships among genes/proteins, drugs, diseases, symptoms, enzymes, etc., based on associations within the extracted knowledge and on reference relationships between papers. The system presents this information to researchers and supports access to academic papers, knowledge, and a "big picture" that can contribute to research on mechanisms, drug repurposing, diagnostics, etc.
## Usage:
![](https://i.imgur.com/a9SbFLh.jpg =400x)
(Figure1)
**[Augmented FACTA+](https://www.cl.ecei.tohoku.ac.jp/~ryo-t/facta/index.html)** (shown in Figure1) is a tool that, given a target concept (e.g., "COVID-19"), generates potentially related concepts and the corresponding explanatory inferences by reasoning over knowledge extracted from the [MEDLINE](https://www.nlm.nih.gov/databases/download/pubmed_medline.html) corpus, the [CORD-19](https://www.semanticscholar.org/cord19/download) corpus, and the [UMLS](https://www.nlm.nih.gov/research/umls/index.html) knowledge base.
![](https://i.imgur.com/AL7wpN6.jpg)
(Figure2)
As shown in Figure2, searching for a target concept (e.g., "COVID-19") displays the candidate concepts (the third column), their Predicted Relationship with the target concept (the first column), and the Reasoning Paths that connect the target and candidate concepts. Each Reasoning Path (row) consists of multiple Pivot Concepts and the Pivot Relationships among them. Clicking a Pivot Concept shows the papers in which that concept appears, and clicking a Pivot Relationship in a Reasoning Path shows the papers in which the connected concept pair co-occurs.
## Algorithm:
To identify biomedical concepts in the text corpora, we apply [scispaCy](https://allenai.github.io/scispacy/), a state-of-the-art biomedical named entity recognizer, to detect UMLS entities in the text.
To predict potentially related candidate concepts and generate the supporting explanation (Reasoning Path), we first represent the knowledge extracted from the existing structured knowledge base and from unstructured text as a **Universal Knowledge Graph**. In the **Universal Knowledge Graph**, each node represents a UMLS concept and each edge represents either a UMLS relationship or a textual relationship, as shown in Figure3. In the future, we will also incorporate multi-modal data, such as molecular structures, into the **Universal Knowledge Graph**.
![](https://i.imgur.com/e6o0LLW.jpg =600x)
(Figure3)
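The graph described above can be pictured as a set of labeled, directed edges between concepts. The following is a minimal sketch of that data structure, not the system's actual implementation: the `Edge` type, the `build_graph` helper, the `source` field, and the relation name `textual_relation` are all hypothetical, and the toy edges are illustrative rather than taken from the system's output.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One labeled edge; all field names are assumptions for illustration."""
    head: str      # concept the edge starts from
    relation: str  # UMLS relationship or textual relationship label
    tail: str      # concept the edge points to
    source: str    # "umls" (structured) or "text" (extracted from papers)


def build_graph(edges):
    """Index edges by head concept so neighbors can be looked up quickly."""
    graph = defaultdict(list)
    for edge in edges:
        graph[edge.head].append(edge)
    return graph


# Toy edges for illustration only.
EDGES = [
    Edge("Chloroquine", "may_treat", "Malaria", "umls"),
    Edge("Chloroquine", "textual_relation", "SARS-CoV-2", "text"),
    Edge("SARS-CoV-2", "textual_relation", "COVID-19", "text"),
]

graph = build_graph(EDGES)
neighbors = [(e.relation, e.tail, e.source) for e in graph["Chloroquine"]]
print(neighbors)
```

Keeping the `source` on each edge is one way to let structured and text-derived knowledge live in the same graph while staying distinguishable.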
Next, our system uses a state-of-the-art Tail Prediction Model (Takahashi et al., 2018) that learns to project the **Universal Knowledge Graph** into a single continuous vector space and predicts candidate concepts through vector calculations.
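To give a feel for what "predicting by vector calculation" means, here is a small sketch that ranks candidate tail concepts with a translation-style score, where a triple scores well when head + relation lands close to the tail. This is a deliberately simpler stand-in, not the model of Takahashi et al. (2018). The vectors are hand-picked rather than learned, and the names `ENTITY_VECS`, `RELATION_VECS`, `score`, `predict_tails`, `DrugA`, `DrugB`, and `candidate_relation` are hypothetical.

```python
import math

# Hand-picked 2-D vectors for illustration; a real model learns these.
ENTITY_VECS = {
    "COVID-19": (1.0, 0.0),
    "DrugA": (0.0, 1.0),
    "DrugB": (3.0, 3.0),
}
RELATION_VECS = {"candidate_relation": (-1.0, 1.0)}


def score(head, relation, tail):
    """Translation-style score: higher when head + relation is near tail."""
    h = ENTITY_VECS[head]
    r = RELATION_VECS[relation]
    t = ENTITY_VECS[tail]
    translated = [hi + ri for hi, ri in zip(h, r)]
    return -math.dist(translated, t)


def predict_tails(head, relation, k=2):
    """Rank every other entity as a candidate tail and keep the top k."""
    candidates = [e for e in ENTITY_VECS if e != head]
    ranked = sorted(candidates, key=lambda t: score(head, relation, t), reverse=True)
    return ranked[:k]


print(predict_tails("COVID-19", "candidate_relation"))
```

In this toy space "DrugA" ranks first because the translated head vector coincides with it exactly.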
Then, our system searches the **Universal Knowledge Graph** for multi-hop reasoning paths that connect the candidate and target concepts. Finally, it applies a new Relation Prediction Model (Dai et al., 2019) to classify (or re-examine) the relationship between the target and candidate concepts based on these multi-hop reasoning paths, and evaluates the contribution of each path via a Knowledge Graph-based attention mechanism.
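The two steps above, path search and path weighting, can be sketched as follows. This is a minimal illustration under several assumptions: edges are treated as directed, paths are enumerated depth-first up to a hop limit, and the attention is reduced to a plain softmax over per-path relevance scores that are simply given as input. The actual Relation Prediction Model (Dai et al., 2019) learns its attention, and the names `find_paths`, `attention_weights`, `P1`..`P3`, and `rel_a`..`rel_f` are hypothetical placeholders.

```python
import math
from collections import defaultdict

# Toy directed triples; concept and relation names are placeholders.
TRIPLES = [
    ("COVID-19", "rel_a", "P1"),
    ("P1", "rel_b", "DrugA"),
    ("COVID-19", "rel_c", "P2"),
    ("P2", "rel_d", "P3"),
    ("P3", "rel_e", "DrugA"),
    ("COVID-19", "rel_f", "DrugB"),
]


def find_paths(triples, source, target, max_hops=3):
    """Enumerate simple paths from source to target with at most max_hops edges."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
    paths = []
    stack = [(source, [], {source})]
    while stack:
        node, path, visited = stack.pop()
        if node == target and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue
        for relation, tail in adjacency[node]:
            if tail not in visited:  # avoid revisiting a concept
                stack.append((tail, path + [(node, relation, tail)], visited | {tail}))
    return paths


def attention_weights(scores):
    """Softmax over path scores, so the weights are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


paths = find_paths(TRIPLES, "COVID-19", "DrugA")
weights = attention_weights([1.0] * len(paths))  # equal scores, for illustration
for path, weight in zip(paths, weights):
    print(round(weight, 2), path)
```

The hop limit keeps the search tractable on a large graph, and normalizing the scores makes the contribution of each path directly comparable.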
## Statistics:
[**Statistics**](https://hackmd.io/@CTChU39nQwq3Xqziu97IOA/HkncogKc8)
## Case on COVID-19:
Drug repurposing is a drug development strategy that identifies novel uses for existing approved and investigational drugs beyond their original indication. One potential use of our system is to repurpose existing drugs for treating "COVID-19" by searching for candidate concepts that have a "may_treat" relationship with "COVID-19". For instance, as shown in Figure2, our system hypothesizes that "Chloroquine may_treat COVID-19". The hypothesis is based on multiple reasoning paths, such as those shown in Figure4 and Figure5. It was recently reported to be promising in a [paper](https://www.kansensho.or.jp/uploads/files/topics/2019ncov/covid19_casereport_200519_2.pdf) published by [The Japanese Association for Infectious Diseases](http://www.kansensho.or.jp/modules/topics/index.php?content_id=31).
![](https://i.imgur.com/THrGqlQ.png =400x)
(Figure4)
![](https://i.imgur.com/jBFIULO.png =400x)
(Figure5)
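Selecting drug repurposing hypotheses amounts to filtering the predicted candidates by relationship. The sketch below shows that filtering step on toy data. It assumes each prediction carries a confidence score, which the source does not state, and the `Hypothesis` type, the `candidates_with_relation` helper, the score values, and the names `ConceptA`, `ConceptB`, and `rel_x` are all hypothetical.

```python
from typing import NamedTuple


class Hypothesis(NamedTuple):
    """One predicted link to the target concept (fields are assumptions)."""
    relation: str   # predicted relationship with the target concept
    candidate: str  # candidate concept
    score: float    # hypothetical confidence; values below are made up


def candidates_with_relation(hypotheses, relation):
    """Keep hypotheses with the given relation, highest score first."""
    matches = [h for h in hypotheses if h.relation == relation]
    return sorted(matches, key=lambda h: h.score, reverse=True)


# Toy predictions for the target "COVID-19"; scores are illustrative only.
HYPOTHESES = [
    Hypothesis("may_treat", "Chloroquine", 0.8),
    Hypothesis("rel_x", "ConceptA", 0.9),
    Hypothesis("may_treat", "ConceptB", 0.3),
]

drug_candidates = [h.candidate for h in candidates_with_relation(HYPOTHESES, "may_treat")]
print(drug_candidates)
```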
## Members:
This project is led by [Kentaro Inui](http://www.cl.ecei.tohoku.ac.jp/~inui/) with team members [Qin Dai](http://www.cl.ecei.tohoku.ac.jp/~dq/) and [Ryo Takahashi](https://reiyw.com/) of Tohoku University, and is supported by the JST CREST Project on [Scientific Paper Analysis](https://www.jst.go.jp/kisoken/crest/en/project/44/15656596.html) (Research Director: [Yuji Matsumoto](https://cl.naist.jp/staff/matsu/home-e.html)). Special thanks to [Yoshimasa Tsuruoka](https://www.logos.ic.i.u-tokyo.ac.jp/~tsuruoka/) and [Akiko Aizawa](https://www.nii.ac.jp/en/faculty/digital_content/aizawa_akiko/) for their help and advice in building the system.
### References:
- Ryo Takahashi, Ran Tian and Kentaro Inui. 2018. Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder. In *Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)*, pages 2148–2159.
- Qin Dai, Naoya Inoue, Paul Reisert, Ryo Takahashi and Kentaro Inui. 2019. Incorporating Chains of Reasoning over Knowledge Graph for Distantly Supervised Biomedical Knowledge Acquisition. In *Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC33)*, pages 19–28.