<style> img { display: block; margin-left: auto; margin-right: auto; } </style>

> [Paper link](https://ojs.aaai.org/index.php/AAAI/article/view/29859) | [Code link](https://github.com/seukgcode/OntoFact) | AAAI 2024

:::success
**Thoughts**
This study tackles the primary challenges of factuality evaluation for large language models (LLMs) and introduces OntoFact, an adaptive framework that **detects inaccuracies through an ontology-driven reinforcement learning (ORL) mechanism**. The framework aims to uncover knowledge deficiencies at the ontology level.
:::

## Abstract

Large language models (LLMs) excel at information retrieval but often generate inaccurate responses, a problem known as intrinsic hallucination. The issue stems from the vague and unreliable fact distribution inside LLMs trained on massive corpora. To address this, the authors introduce OntoFact, an adaptive framework designed to detect unknown facts by exploring ontology-level knowledge gaps.

## Background

LLMs are often criticized for their lack of factual accuracy. Existing studies on factuality detection typically adopt a question-answering setup.

![image](https://hackmd.io/_uploads/S1eiyWq9C.png)

**Text-driven methods** probe the limitations of LLMs with natural-language text to identify knowledge gaps. **KG-driven methods** build test cases from knowledge graphs (KGs) by combining instance-level entities and relations. However, both lines of work are usually evaluated only on small-scale datasets covering a few typical domains. Moreover, instance-level knowledge graphs themselves contain a significant number of inaccuracies.

## Method

This study uses ontology-level triples as probes to make test cases more reliable. Suppose a KG $\mathcal{G} = \{ \mathcal{G}_f, \mathcal{G}_o \}$ is given, where $\mathcal{G}_f = \{ (h_f, r_f, t_f) \}$ is the instance-level sub-graph and $\mathcal{G}_o = \{ (h_o, r_o, t_o) \}$ is the ontology-level sub-graph; $h$ denotes the head entity, $r$ the relation, and $t$ the tail entity.

![image](https://hackmd.io/_uploads/SylcH-qqA.png)

In the first stage, OntoFact **initializes test cases** by combining single instance-level triples (a minimal sketch of this triple-to-probe conversion appears at the end of this note). In the second stage, OntoFact leverages the ORL mechanism to wander along the KG toward a wide range of ontologies and instances, **adaptively producing error-prone test cases**.

![image](https://hackmd.io/_uploads/S1hjSb9c0.png)

In the last stage, OntoFact **feeds the test cases into the hallucination-free detection module** to obtain unbiased results.

## Experiment

This study uses three large-scale KGs:

1. DBpedia (EN)
2. YAGO (EN)
3. CN-DBpedia (CN)

On the model side, 20 LLMs are evaluated.

> Hallucination-free detection (HFD)

For HFD, they report the **error proportion (EP)** of ontology-level triples as the evaluation metric (a small sketch of this metric also appears at the end of this note).

> Ontology-driven reinforcement learning (ORL)

For ORL, they report **accuracy** (ACC), **precision** (P), **recall** (R), and the corresponding **F1-score**.

![image](https://hackmd.io/_uploads/BkOv2-5qA.png)

![image](https://hackmd.io/_uploads/HkNu3-5qC.png)
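As a rough illustration of the first-stage test-case initialization, here is a minimal Python sketch that renders an instance-level triple $(h_f, r_f, t_f)$ as a true/false probe for an LLM. The class, function name, and prompt template are all hypothetical illustrations of the general idea, not the OntoFact implementation.

```python
# Minimal sketch of stage 1: turning KG triples into factuality probes.
# All names and the template here are hypothetical, not the OntoFact API.

from dataclasses import dataclass


@dataclass
class Triple:
    head: str       # h: head entity
    relation: str   # r: relation
    tail: str       # t: tail entity


def make_test_case(triple: Triple) -> str:
    """Render an instance-level triple (h, r, t) as a yes/no probe.

    OntoFact's actual templates may differ; this only shows the
    general shape of KG-driven test-case initialization.
    """
    return (
        f"Is the following statement true or false? "
        f"{triple.head} {triple.relation} {triple.tail}."
    )


if __name__ == "__main__":
    t = Triple("Alan Turing", "birthPlace", "London")
    print(make_test_case(t))
    # -> Is the following statement true or false? Alan Turing birthPlace London.
```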
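And here is a hedged sketch of the error proportion (EP) idea: for each ontology-level triple, compute the fraction of its instance-level test cases that the LLM answers incorrectly. The exact definition in the paper may differ in detail; the grouping key and helper name are illustrative assumptions only.

```python
# Hedged sketch of the error proportion (EP) metric: for each
# ontology-level triple, the fraction of its instance-level test cases
# the LLM answered incorrectly. Not the paper's exact formulation.

from collections import defaultdict


def error_proportion(results):
    """results: iterable of (ontology_triple, is_correct) pairs, where
    ontology_triple is a hashable (h_o, r_o, t_o) key and is_correct is
    a bool for one instance-level test case under that ontology triple."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for onto_triple, is_correct in results:
        totals[onto_triple] += 1
        if not is_correct:
            errors[onto_triple] += 1
    return {k: errors[k] / totals[k] for k in totals}


if __name__ == "__main__":
    demo = [
        (("Person", "birthPlace", "City"), True),
        (("Person", "birthPlace", "City"), False),
        (("Film", "director", "Person"), False),
    ]
    print(error_proportion(demo))
    # -> {('Person', 'birthPlace', 'City'): 0.5, ('Film', 'director', 'Person'): 1.0}
```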