# Sentence-T5 (ST5): Scalable Sentence Encoders from Pre-trained Text-to-Text Models
###### tags: ```筆記```, ```NLP```, ```ACL 2022```
## Abstract
- Motivation: Exploring sentence embeddings from T5 models, including the impact of scaling up to 11B parameters.
- This paper is the first to explore sentence embeddings from T5; it also introduces the SentGLUE benchmark for sentence representation evaluation.
## Introduction
- Sentence embeddings are crucial for many language processing tasks. The paper explores generating these embeddings from T5 models.
- Investigates three methods to construct ST5 models using the T5 encoder and encoder-decoder, with significant performance improvements noted.
## Methodology
- Encoder-only and Encoder-decoder methods
- (Figure omitted: the ST5 architecture diagram that the panel labels (b), (c), (d) below refer to.)
- (b), (c): pooling strategies widely used in encoder-only pre-trained models such as BERT.
- (d): Unlike BERT models, **T5 models do not have a 'CLS' token** at the beginning of each sentence. For T5 encoder-decoder models, the authors **assume** the decoder is aware of the semantics of the entire input sentence when generating its first token prediction; if so, the first decoder output embeddings (i.e., the input to the softmax layer) might **naturally capture the sentence semantics** (see the extraction sketch after this list).
- Introduces a new sentence representation transfer benchmark, SentGLUE, extending SentEval with tasks from GLUE for comprehensive evaluation.
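A minimal sketch of how these embeddings could be extracted with Hugging Face `transformers`. The checkpoint name and the choice of the last decoder hidden state as the pre-softmax embedding are my assumptions for illustration, not the paper's released code:

```python
import torch
from transformers import T5Tokenizer, T5EncoderModel, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")  # checkpoint is illustrative
inputs = tokenizer(["A sentence to embed."], return_tensors="pt", padding=True)

# Encoder-only strategies: first-token pooling and mean pooling
encoder = T5EncoderModel.from_pretrained("t5-base")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state        # (batch, seq_len, dim)
first_tok_emb = hidden[:, 0, :]                         # first encoder token
mask = inputs.attention_mask.unsqueeze(-1)              # mask out padding
mean_emb = (hidden * mask).sum(1) / mask.sum(1)         # mean pooling

# Encoder-decoder strategy (d): run a single decoding step and take the
# first decoder output embedding as the sentence representation.
model = T5ForConditionalGeneration.from_pretrained("t5-base")
start = torch.full((inputs.input_ids.size(0), 1),
                   model.config.decoder_start_token_id)
with torch.no_grad():
    out = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                decoder_input_ids=start,
                output_hidden_states=True)
dec_first_emb = out.decoder_hidden_states[-1][:, 0, :]  # (batch, dim)
```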
## Experiments
- Datasets: Utilized SentEval, SentGLUE, and various GLUE benchmark tasks.
- Metrics: Used classification accuracy for sentence transfer tasks and **Spearman correlation** for STS tasks (a small evaluation sketch follows this list).
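For STS, systems are ranked by how well the cosine similarities of embedding pairs correlate (Spearman) with human similarity scores. A minimal sketch of that protocol; the helper function below is mine, not from SentEval:

```python
import numpy as np
from scipy.stats import spearmanr

def sts_spearman(emb_a, emb_b, gold_scores):
    """Spearman correlation between cosine similarities of sentence-pair
    embeddings and human-annotated STS scores."""
    emb_a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    emb_b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos_sims = (emb_a * emb_b).sum(axis=1)   # cosine of each pair
    corr, _ = spearmanr(cos_sims, gold_scores)
    return corr
```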
## Takeaways
- Even without fine-tuning, encoder-only ST5 models perform well on sentence transfer tasks, outperforming the previous state-of-the-art models.
- The encoder-decoder sentence embedding model establishes a new state of the art on STS.
- Contrastive learning is especially effective when fine-tuning T5-style pre-trained models, particularly with the proposed two-stage contrastive learning approach (a loss sketch follows this list).
- Training ST5 longer with a contrastive loss and on more data yields consistent gains on both sentence transfer and STS tasks.
- A new sentence representation transfer benchmark, SentGLUE, is introduced, extending the SentEval toolkit with nine tasks from the GLUE benchmark.
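The contrastive objective treats each sentence's paired sentence as the positive and the other pairs in the batch as negatives. A minimal PyTorch sketch of such an in-batch contrastive loss; the function name and temperature value are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(src_emb, pos_emb, temperature=0.05):
    """For each source embedding, its paired embedding is the positive;
    every other pair in the batch serves as an in-batch negative."""
    src = F.normalize(src_emb, dim=1)            # cosine similarity via
    pos = F.normalize(pos_emb, dim=1)            # normalized dot products
    logits = src @ pos.t() / temperature         # (batch, batch) similarities
    labels = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, labels)       # diagonal entries = positives
```

In the two-stage recipe, the first stage applies this kind of loss to mined question-answer pairs and the second stage fine-tunes on NLI entailment pairs; the paper additionally uses contradiction pairs as hard negatives, which this sketch omits.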
> The contents shared herein are quoted verbatim from the original author and are intended solely for personal note-taking and reference purposes following a thorough reading. Any interpretation or annotation provided is strictly personal and does not claim to reflect the author's intended meaning or context.