<style>
img {
display: block;
margin-left: auto;
margin-right: auto;
}
</style>
> [Paper link](https://aclanthology.org/2023.emnlp-main.234/) | EMNLP 2023
:::success
**Thoughts**
This study uses reinforcement learning (RL) to identify relevant and irrelevant tags in the conversation context, improving the retrieval of FAQ questions.
:::
## Abstract
In online customer service, efficiently retrieving FAQ questions is crucial.
Existing methods enhance the semantic association between the user query and FAQ questions using dynamic conversation context.
This study introduces the use of tags for FAQ questions to help eliminate irrelevant information.
## Background
Conversational FAQ retrieval aims to find FAQ questions that align with users' intent during user-system interactions.
Current methods prioritize modeling the semantic information in the conversation context.
However, users may click on questions containing information irrelevant to their intent, whether out of unfamiliarity with the domain or by accident.
Such fine-grained pieces of information are referred to as "tags"; irrelevant tags introduce noise into the conversation context and degrade retrieval performance.

*(Figure: an example conversation from the paper, where the **red** and **purple** texts mark the **relevant and irrelevant tags** in the context, respectively.)*
## Method
They propose a **tag-aware reinforcement learning strategy** that models the dynamic changes in the irrelevance of tags to achieve successful FAQ retrieval with minimal interaction turns.
In this study, the authors define question–answer pairs as $\text{FAQ} = \{ (q_1, a_1), \dots, (q_n, a_n) \}$. Each question $q_i$ is associated with a set of tags $P_{q_i} = \{ p_{i1}, p_{i2}, \dots, p_{im} \}$.
The conversation context at turn $t$ consists of:
- $H_t$: the user queries $u_i$ issued so far.
- $Q_{\text{click}}^t$: the FAQ questions the user has clicked on.
- $P_{\text{click}}^t$: the tags of the clicked questions.
- $Q_{\text{rej}}^t$: the FAQ questions the user has ignored.
- $P_{\text{rej}}^t$: the tags of the ignored questions.
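
A minimal sketch of this context as a plain data structure (the class name, field names, and the `tags_of` lookup are illustrative, not from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class ConversationContext:
    """Conversation context at turn t, mirroring the notation above."""
    H: list[str] = field(default_factory=list)        # user queries u_i
    Q_click: list[str] = field(default_factory=list)  # clicked FAQ questions
    P_click: set[str] = field(default_factory=set)    # tags of clicked questions
    Q_rej: list[str] = field(default_factory=list)    # ignored FAQ questions
    P_rej: set[str] = field(default_factory=set)      # tags of ignored questions

    def observe(self, query: str, clicked: list[str], rejected: list[str],
                tags_of: dict[str, list[str]]) -> None:
        """Fold one turn's interactions into the context."""
        self.H.append(query)
        self.Q_click += clicked
        self.Q_rej += rejected
        for q in clicked:
            self.P_click |= set(tags_of[q])
        for q in rejected:
            self.P_rej |= set(tags_of[q])
```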

This study formulates **TRAVEL** as a multi-turn tag-aware reinforcement learning framework.
It aims to learn an optimal policy $\pi^\ast = \underset{\pi \in \Pi}{\arg\max}\, \mathbb{E}\left[\sum_{t=0}^{T} r(s_t, \mathbf{a}_t)\right]$, where the state $s_t$ captures the conversation context at turn $t$, and $\mathbf{a}_t$ indicates which FAQ question to retrieve from the candidates.
Here are the five different rewards designed in the study:
1. $r_{\text{click\_suc}}$: A positive reward when the user clicks on a question. If the clicked questions contain irrelevant tags, the value of this reward is reduced.
2. $r_{\text{click\_fail}}$: A negative reward when the user does not click on any question.
3. $r_{\text{ret\_suc}}$: A strong positive reward when the user successfully retrieves their target question.
4. $r_{\text{extra\_turn}}$: A negative reward incurred for each additional turn, encouraging shorter conversations.
5. $r_{\text{quit}}$: A strong negative reward incurred when reaching the maximum number of turns.
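
As a rough illustration of how these five signals could combine into a scalar reward per turn, here is a sketch; the constants and the event encoding are assumptions, not the paper's values:

```python
def compute_reward(event: str, clicked_tags=(), irrelevant_tags=(),
                   turn: int = 0, max_turns: int = 10) -> float:
    """Combine the five reward signals described above (illustrative values)."""
    R_CLICK, R_FAIL, R_RET, R_TURN, R_QUIT = 0.5, -0.5, 2.0, -0.1, -2.0
    reward = R_TURN  # r_extra_turn: every turn costs a small penalty
    if event == "click":
        # r_click_suc: reduced by the fraction of clicked tags that are irrelevant
        noise = len(set(clicked_tags) & set(irrelevant_tags)) / max(len(clicked_tags), 1)
        reward += R_CLICK * (1.0 - noise)
    elif event == "no_click":
        reward += R_FAIL  # r_click_fail
    elif event == "retrieved":
        reward += R_RET   # r_ret_suc
    if turn >= max_turns:
        reward += R_QUIT  # r_quit: the user gives up
    return reward
```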
### TRAVEL
The **Tag-Level State Representation** component focuses on estimating irrelevant tags within the conversation context and transforming this context into a state representation.
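
The paper learns this irrelevance estimate; as a simple stand-in, one could score each tag by how often it appears among rejected versus clicked questions (this heuristic, the `tags_of` mapping, and the threshold are assumptions for illustration):

```python
from collections import Counter

def estimate_irrelevant_tags(clicked_qs: list[str], rejected_qs: list[str],
                             tags_of: dict[str, list[str]],
                             threshold: float = 0.5) -> set[str]:
    """Flag tags that occur mostly in rejected questions as irrelevant."""
    rej = Counter(t for q in rejected_qs for t in tags_of[q])
    clk = Counter(t for q in clicked_qs for t in tags_of[q])
    return {t for t in rej if rej[t] / (rej[t] + clk[t]) >= threshold}
```

With the `ConversationContext` sketch above, this would be called as `estimate_irrelevant_tags(ctx.Q_click, ctx.Q_rej, tags_of)` at each turn.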
The **Conversational Retrieval Strategy Optimization** component utilizes the state to determine a strategy for FAQ retrieval using Q-Learning, with the goal of achieving accurate retrieval in minimal turns.
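
A minimal DQN-style sketch of that optimization step, assuming the state has already been encoded as a vector and each candidate FAQ question is one action; the choice of PyTorch, the network shape, and the hyperparameters are assumptions, not the paper's setup:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per candidate FAQ question."""
    def __init__(self, state_dim: int, num_questions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_questions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def td_update(q_net, target_net, optimizer, batch, gamma: float = 0.99) -> float:
    """One temporal-difference update on a batch of (s, a, r, s', done)."""
    states, actions, rewards, next_states, done = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * q_next * (1.0 - done)
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```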
## Experiment
### Dataset
They use their own dataset, which contains 72,013 conversation sessions.
Each session is formulated as $(u_i, q_i, H_i)$.
The dataset contains 1,449 FAQ questions and 1,201 tags.
### Baseline
They compare TRAVEL with two classes of baseline methods:
1. FAQ retrieval
- BERT_TSUBAKI
- SBERT_FAQ
- DoQA
- CombSum
2. Question Suggestion
- CFAN
- KnowledgeSelect
- DeepSuggest
### Evaluation Metrics
- **Recall@5**: Measures the proportion of relevant FAQ questions retrieved in the top 5 results.
- **SR@k**: Success Rate at k, indicating the percentage of conversations where the correct FAQ is found within the top k results.
- **AT (Average Turn)**: The average number of turns it takes to retrieve the correct FAQ question.
- **AS (Average Shown)**: The average number of FAQ questions displayed to the user before reaching the correct one.
- **hNDCG@(T, K)**: Hierarchical Normalized Discounted Cumulative Gain, which evaluates ranking quality across the conversation by considering both the turn at which a question is shown (up to T) and its position in the ranked list (up to K).
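
Recall@5 and AT are straightforward to compute; a small sketch, assuming each session has exactly one target question and a fixed turn budget:

```python
def recall_at_k(ranked: list[str], target: str, k: int = 5) -> float:
    """Recall@k for a session with a single target FAQ question."""
    return float(target in ranked[:k])

def average_turn(success_turns: list[int], max_turn: int = 10) -> float:
    """AT: mean turns to reach the target; failed sessions count as max_turn."""
    return sum(min(t, max_turn) for t in success_turns) / len(success_turns)
```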
