The Semantic Textual Similarity benchmark Turkish (STSb-TR) dataset is a machine-translated version of the English STS benchmark (STSb) dataset, produced with the Google Cloud Translation API. No human corrections have been made to the translations. The official website for the STSb Turkish dataset can be found at STSb-TR.
The dataset consists of 8628 machine-translated Turkish sentence pairs. Each pair carries the semantic similarity score that crowdsourced annotators assigned to the original English pair. Scores range from 0 (no semantic similarity) to 5 (semantically equivalent).
An example from the STSb-TR dataset is given below.
Example:
field | dtype | Explanation |
---|---|---|
genre | string | Genre of the text |
dataset | string | Original name of the dataset |
year | string | Year of the original dataset |
sid | string | Sentence id |
score | float | Semantic similarity score. 0.0 is the lowest and 5.0 is the highest score. |
sentence1 | string | First sentence |
sentence2 | string | Second sentence |
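
The field table above can be mirrored as a small typed record. The following is a minimal sketch, not part of the dataset's tooling: the class name `StsbTrRecord` is hypothetical, and the sample values (including both Turkish sentences) are invented placeholders rather than rows taken from STSb-TR. The only rule enforced is the 0.0 to 5.0 score range stated in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StsbTrRecord:
    """One sentence pair, using the field names and dtypes from the table."""

    genre: str
    dataset: str
    year: str
    sid: str
    score: float
    sentence1: str
    sentence2: str

    def __post_init__(self) -> None:
        # The card states that scores lie between 0.0 and 5.0 inclusive.
        if not 0.0 <= self.score <= 5.0:
            raise ValueError(f"score out of range: {self.score}")


# Invented placeholder values, for illustration only.
record = StsbTrRecord(
    genre="main-captions",
    dataset="MSRvid",
    year="2012test",
    sid="0001",
    score=3.8,
    sentence1="Bir adam gitar çalıyor.",
    sentence2="Bir adam bir enstrüman çalıyor.",
)
```

Keeping `year` and `sid` as strings follows the dtypes listed in the table rather than converting them to numbers.
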
Training | Validation | Test | Total |
---|---|---|---|
5749 | 1500 | 1379 | 8628 |
Semantic textual similarity is widely studied for English, and this work relies on datasets annotated by humans. Annotation, however, is costly and time-consuming. With recent advances in machine translation, it has become possible to reuse such datasets by translating them from one language to another.
The original STSb dataset is a selection of the English datasets used in SemEval STS studies between 2012 and 2017. It includes text from image captions, news headlines, and user forums.
Each sentence pair was annotated by crowdsourcing via Amazon Mechanical Turk. Five scores were collected for each pair and gold scores were obtained by averaging them.
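The averaging step described above can be illustrated with a short sketch. The function name `gold_score` and the five ratings are hypothetical; the source only states that five scores were collected per pair and averaged, so any further filtering or normalisation the original annotation pipeline may have applied is not modelled here.

```python
from typing import Sequence


def gold_score(ratings: Sequence[float], expected: int = 5) -> float:
    """Average the crowdsourced ratings collected for one sentence pair."""
    if len(ratings) != expected:
        raise ValueError(f"expected {expected} ratings, got {len(ratings)}")
    if any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 0 and 5")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from five annotators for a single pair.
example_ratings = [4, 5, 4, 3, 4]
example_gold = gold_score(example_ratings)
```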
The quality of the translations was tested as follows. We randomly selected 50 sentence pairs (100 sentences), keeping the proportion of each category in the dataset, so 6, 19 and 25 pairs were chosen from the forum, caption and news categories respectively. These sentences were translated by three native Turkish speakers who are fluent in English. We evaluated the system translations against the three reference translations and obtained a BLEU score of 60.21, which indicates that the system translations can be considered very high quality. Therefore, no changes have been made to the translations.
This version of the dataset was retrieved from the original GitHub repository on 2022.04.30, at commit id 5546780.
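The 6/19/25 split of the 50 sampled pairs can be reproduced with a proportional allocation. The sketch below rests on two assumptions that are not stated in this card: the per-genre pair counts (1079 forum, 3250 caption, 4299 news) are taken from the original English STS benchmark documentation, and the rounding rule used here (largest remainder) is a guess, since the authors do not say how they rounded. The function name `allocate_sample` is hypothetical.

```python
from typing import Dict


def allocate_sample(counts: Dict[str, int], sample_size: int) -> Dict[str, int]:
    """Split sample_size across categories in proportion to their counts."""
    total = sum(counts.values())
    # Exact (fractional) share of the sample for each category.
    quotas = {name: sample_size * n / total for name, n in counts.items()}
    # Start from the integer part of each share.
    allocation = {name: int(q) for name, q in quotas.items()}
    leftover = sample_size - sum(allocation.values())
    # Hand the remaining slots to the largest fractional remainders.
    by_remainder = sorted(
        quotas, key=lambda name: quotas[name] - allocation[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Assumed per-genre pair counts (not given in this card).
genre_counts = {"forum": 1079, "caption": 3250, "news": 4299}
sample_plan = allocate_sample(genre_counts, 50)
```

Under these assumptions, `sample_plan` matches the counts reported above: 6 forum, 19 caption and 25 news pairs.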
“Published by Figen Beken Fikri, Kemal Oflazer, Berrin Yanıkoğlu.”
Please cite the following paper if you find this dataset useful:
Figen Beken Fikri, Kemal Oflazer, Berrin Yanıkoğlu. Semantic Similarity Based Evaluation for Abstractive News Summarization. In Proceedings of ACL-IJCNLP Workshop GEM: Natural Language Generation, Evaluation, and Metrics, August 6, 2021.