# Recommender Systems with Generative Retrieval
###### tags: ```Notes```, ```NLP```
> https://arxiv.org/pdf/2305.05065.pdf
## Abstract
- Modern recommender systems typically perform large-scale retrieval by embedding queries and item candidates in a unified space, using approximate nearest neighbor search to select top candidates given a query embedding.
- This paper introduces a novel generative retrieval approach, using a retrieval model that decodes the identifiers of target candidates autoregressively. Semantic IDs are created as semantically meaningful tuples of codewords for each item. A Transformer-based model is trained to predict the next item's Semantic ID in a user session, significantly outperforming current state-of-the-art models on various datasets.
- **Motivation**: The work addresses the challenge of retrieving relevant items effectively and efficiently in large-scale recommender systems. The proposed generative retrieval approach aims to better predict items a user is likely to interact with, including items that have no prior interaction history, by assigning semantically meaningful identifiers (Semantic IDs) to items, thereby improving the retrieval accuracy and generalization capability of the recommender system.
## Introduction
- Recommender systems are crucial in helping users discover content across various domains such as videos, apps, products, and music. These systems typically adopt a retrieve-and-rank strategy, which makes it critical that the retrieval stage surfaces highly relevant candidates.
- This paper proposes a new paradigm for building generative retrieval models for sequential recommendation, leveraging Transformer memory as an end-to-end index for retrieval. This involves assigning Semantic IDs to items and using these IDs to train a retrieval model.
## Methodology
- The paper introduces the Transformer Index for GEnerative Recommenders (TIGER) framework, which consists of two stages: Semantic ID generation and generative retrieval over those IDs.
- **Semantic ID Generation**: Semantic IDs are generated by encoding item content features into a semantic embedding and quantizing this embedding into a tuple of semantic codewords. This approach allows for semantically meaningful representations of items (see the quantization sketch after this list).
- **Generative Retrieval with Semantic IDs**: A generative approach is adopted where a Transformer model is trained on the sequential recommendation task using sequences of Semantic IDs. This allows for direct prediction of the next item's Semantic ID.
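
A minimal sketch of the residual-quantization step that turns a content embedding into a tuple of codewords, assuming the codebooks have already been learned (the paper trains them with an RQ-VAE; the random codebooks, sizes, and dimensions below are illustrative only):

```python
import numpy as np

def residual_quantize(embedding, codebooks):
    """Quantize an item embedding into a tuple of codeword indices.

    At each level, pick the codeword nearest to the current residual,
    then subtract it and pass the remainder to the next level.
    """
    residual = embedding.copy()
    semantic_id = []
    for codebook in codebooks:                      # codebook shape: (num_codes, dim)
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))                 # nearest codeword index
        semantic_id.append(idx)
        residual = residual - codebook[idx]         # leftover residual for next level
    return tuple(semantic_id)

# Toy usage with random codebooks standing in for trained RQ-VAE codebooks.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 64)) for _ in range(3)]  # 3 levels, 256 codes each
item_embedding = rng.normal(size=64)                        # e.g. from a content encoder
print(residual_quantize(item_embedding, codebooks))         # a 3-codeword Semantic ID
```

Because each level quantizes the residual left over by the previous one, the codewords form a coarse-to-fine hierarchy, which is what makes the resulting Semantic IDs semantically meaningful rather than arbitrary item IDs.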
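
A companion sketch of how a user session of Semantic IDs can be flattened into one token sequence for the retrieval model; the per-level vocabulary offsets and the helper `flatten_semantic_ids` are assumptions of this sketch, not the paper's exact scheme:

```python
def flatten_semantic_ids(items, codebook_size=256):
    """Flatten Semantic ID tuples into a single token sequence.

    Each quantization level is shifted into its own vocabulary range so the
    decoder can distinguish levels (an assumption of this sketch).
    """
    tokens = []
    for semantic_id in items:
        for level, code in enumerate(semantic_id):
            tokens.append(level * codebook_size + code)
    return tokens

# A session of three items, each represented by a 3-codeword Semantic ID.
history = [(17, 203, 94), (5, 12, 250), (88, 40, 7)]
next_item = (17, 203, 95)                 # target the decoder should generate

input_tokens = flatten_semantic_ids(history)
target_tokens = flatten_semantic_ids([next_item])
# A sequence-to-sequence Transformer is then trained to generate `target_tokens`
# autoregressively given `input_tokens`, i.e. to decode the next item's Semantic ID.
```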
## Experiments
- **Datasets**: Evaluations are conducted on three public real-world benchmarks from the Amazon Product Reviews dataset, specifically "Beauty", "Sports and Outdoors", and "Toys and Games".
- **Metrics**: The framework's performance is assessed using top-k Recall (Recall@K) and Normalized Discounted Cumulative Gain (NDCG@K) with K = 5, 10.
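
For reference, a small sketch of how Recall@K and NDCG@K are typically computed per user in this single-ground-truth setting and then averaged over users (the function name and list-based interface are illustrative):

```python
import math

def recall_and_ndcg_at_k(ranked_ids, true_id, k):
    """Per-user Recall@K and NDCG@K when there is one ground-truth next item."""
    top_k = ranked_ids[:k]
    if true_id in top_k:
        rank = top_k.index(true_id)            # 0-based position in the top-K list
        return 1.0, 1.0 / math.log2(rank + 2)  # DCG of a single hit; ideal DCG = 1
    return 0.0, 0.0

# Example: the true item appears at position 3 (0-based index 2) of the retrieved list.
print(recall_and_ndcg_at_k(["a", "b", "c", "d", "e"], "c", k=5))  # (1.0, 0.5)
```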
> STATEMENT: The contents shared herein are quoted verbatim from the original author and are intended solely for personal note-taking and reference purposes following a thorough reading. Any interpretation or annotation provided is strictly personal and does not claim to reflect the author's intended meaning or context.