# Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies

###### tags: `筆記`, `NLP`

## Abstract

- Motivation: The Differentiable Search Index (DSI) was recently proposed for document retrieval; it maps queries directly to relevant document identifiers within a single neural model. This work introduces the Semantic-Enhanced DSI model (SE-DSI), inspired by Learning Strategies from Cognitive Psychology, to improve on the original DSI with better document identification and memorization.
- SE-DSI outperforms prevailing baselines in retrieval through two main strategies: Elaboration Strategies, which produce meaningful document identifiers, and Rehearsal Strategies, which select fine-grained semantic features to improve document memorization.
- ![image](https://hackmd.io/_uploads/B1CmlAFA6.png)

## Introduction

- Document retrieval is crucial for search and question answering systems, but traditional term-matching algorithms suffer from vocabulary mismatch. Dense retrieval is effective, yet its separate "index-retrieval" pipeline is complex and hard to optimize as a whole.
- The DSI paradigm consolidates indexing and retrieval into a single model, aiming for end-to-end optimization. The remaining challenge is designing an effective generative retrieval model that can efficiently encode and memorize the entire corpus within its parameters.

## Methodology

- Method Name: Semantic-Enhanced Differentiable Search Index (SE-DSI)
- ![image](https://hackmd.io/_uploads/B1LyxRKRa.png)
- SE-DSI improves the original DSI by applying Learning Strategies from Cognitive Psychology through two components, Elaborative Description (ED) and Rehearsal Contents (RCs). ED uses **query generation** to create **meaningful document identifiers**, while RCs are key semantic features selected from each document to aid memorization.
  These strategies aim to strengthen the model's ability to recall and retrieve documents.

## Experiments

- Datasets: Experiments were conducted on two representative document retrieval datasets, **MS MARCO** and **NQ** (Natural Questions), where SE-DSI showed significant improvements over strong baselines.
- Metrics: Evaluation used hit ratio (**Hits@N**) and Mean Reciprocal Rank (**MRR@N**); the results demonstrate SE-DSI's improved retrieval performance over existing methods.

> The contents shared herein are quoted verbatim from the original author and are intended solely for personal note-taking and reference purposes following a thorough reading. Any interpretation or annotation provided is strictly personal and does not claim to reflect the author's intended meaning or context.
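
The Methodology bullets describe ED and RCs only at a high level. Below is a minimal Python sketch of how the two ideas could shape DSI-style training data, under loud assumptions: `generate_query` is a hypothetical stand-in for a real query-generation model (the paper's generator is not reproduced here), and rehearsal contents are approximated by naive fixed-size word passages plus regex-split sentences, which is not necessarily how SE-DSI selects them. Each piece of a document is paired with that document's generated docid (indexing), and each labeled query is paired with the docid of its relevant document (retrieval).

```python
import re
from typing import Callable, Dict, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_passages(text: str, words_per_passage: int = 64) -> List[str]:
    """Split a document into consecutive fixed-size word windows (hypothetical passage granularity)."""
    words = text.split()
    return [" ".join(words[i:i + words_per_passage]) for i in range(0, len(words), words_per_passage)]


def build_training_pairs(
    corpus: Dict[str, str],
    generate_query: Callable[[str], str],
    labeled_queries: List[Tuple[str, str]],
    words_per_passage: int = 64,
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Return (docid per document, list of (input text, target docid) pairs)."""
    # Elaborative Description (sketch): a generated query serves as the document's textual identifier.
    # Note: identical generated queries for different documents are NOT disambiguated here.
    docids = {key: generate_query(text) for key, text in corpus.items()}

    pairs: List[Tuple[str, str]] = []
    for key, text in corpus.items():
        # Rehearsal Contents (approximation): passage- and sentence-level pieces of the
        # document, each trained to decode the same docid.
        for piece in split_passages(text, words_per_passage) + split_sentences(text):
            pairs.append((piece, docids[key]))

    # Retrieval examples: a labeled query maps to the docid of its relevant document.
    for query, key in labeled_queries:
        pairs.append((query, docids[key]))
    return docids, pairs
```

For a quick trial, a deliberately trivial `generate_query` such as "first three lowercased words" is enough to see the pair structure; a real setup would swap in a trained query generator and feed the pairs to a sequence-to-sequence model.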
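
For reference, a minimal Python sketch of the two evaluation metrics, assuming each query has exactly one relevant docid and the model returns a ranked list of docids per query (the note does not spell out these details). Hits@N is the fraction of queries whose relevant docid appears in the top N; MRR@N averages the reciprocal rank of the relevant docid, counting 0 when it falls outside the top N.

```python
from typing import Sequence


def hits_at_n(rankings: Sequence[Sequence[str]], relevant: Sequence[str], n: int) -> float:
    """Fraction of queries whose relevant docid appears among the top-n predictions."""
    hits = sum(1 for ranked, gold in zip(rankings, relevant) if gold in ranked[:n])
    return hits / len(relevant)


def mrr_at_n(rankings: Sequence[Sequence[str]], relevant: Sequence[str], n: int) -> float:
    """Mean of 1/rank of the relevant docid, counting 0 if it is not in the top n."""
    total = 0.0
    for ranked, gold in zip(rankings, relevant):
        top = list(ranked[:n])
        if gold in top:
            total += 1.0 / (top.index(gold) + 1)  # ranks are 1-based
    return total / len(relevant)
```

With `rankings = [["d3", "d1", "d7"], ["d2", "d5", "d9"], ["d4", "d6", "d8"]]` and `relevant = ["d1", "d2", "d0"]`, Hits@1 is 1/3, Hits@3 is 2/3, and MRR@3 is (1/2 + 1 + 0) / 3 = 0.5.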