# SemEval-2016 ABSA Restaurant Reviews-Turkish Train Data (Subtask 1)
This dataset is a collection of restaurant reviews. It was created primarily for training Aspect-Based Sentiment Analysis (ABSA) models.
## Dataset Details
This dataset consists of 320 restaurant reviews comprising 1317 sentences. Each sentence is annotated at the token level with the aspects it mentions and the sentiment expressed toward each aspect.
There are 12 aspect categories.
| Aspect Categories |
|-------------------|
| AMBIENCE#GENERAL |
| DRINKS#PRICES |
| DRINKS#QUALITY |
| DRINKS#STYLE_OPTIONS |
| FOOD#PRICES |
| FOOD#QUALITY |
| FOOD#STYLE_OPTIONS |
| LOCATION#GENERAL |
| RESTAURANT#GENERAL |
| RESTAURANT#MISCELLANEOUS |
| RESTAURANT#PRICES |
| SERVICE#GENERAL |
Each aspect can be labeled with one of three sentiments.

| Sentiments |
|------------|
| Positive |
| Neutral |
| Negative |
### Samples
A sample data instance from the dataset:
```
<Review rid="1000">
    <sentences>
        <sentence id="1000:0">
            <text>Manzara sahane evet ama servis rezalet.</text>
            <Opinions>
                <Opinion target="servis" category="SERVICE#GENERAL" polarity="negative" from="24" to="30"/>
                <Opinion target="Manzara" category="AMBIENCE#GENERAL" polarity="positive" from="0" to="7"/>
            </Opinions>
        </sentence>
        ...
    </sentences>
</Review>
```
### Fields
The fields of each instance are described below.
| field | dtype | content |
|----------|---------|---------|
| text | string | Raw text of the sentence. |
| target | string | Text span in which the aspect is mentioned. |
| category | string | Category of the mentioned aspect. |
| polarity | string | The reviewer's opinion of the mentioned aspect. |
| from | integer | Starting character index of *target* in *text*. |
| to | integer | Ending character index (exclusive) of *target* in *text*. |
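The fields above map directly onto attributes of the `<Opinion>` elements in the XML, so they can be extracted with the Python standard library. A minimal sketch, parsing an inline copy of the sample above (for the real dataset you would call `ET.parse(...)` on the downloaded XML file, whose reviews sit under a root `<Reviews>` element; the wrapper here is an assumption matching the SemEval layout):

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample instance shown above, wrapped in a root element.
xml_data = """
<Reviews>
  <Review rid="1000">
    <sentences>
      <sentence id="1000:0">
        <text>Manzara sahane evet ama servis rezalet.</text>
        <Opinions>
          <Opinion target="servis" category="SERVICE#GENERAL" polarity="negative" from="24" to="30"/>
          <Opinion target="Manzara" category="AMBIENCE#GENERAL" polarity="positive" from="0" to="7"/>
        </Opinions>
      </sentence>
    </sentences>
  </Review>
</Reviews>
"""

root = ET.fromstring(xml_data)
records = []
for sentence in root.iter("sentence"):
    text = sentence.findtext("text")
    for opinion in sentence.iter("Opinion"):
        start, end = int(opinion.get("from")), int(opinion.get("to"))
        records.append({
            "text": text,
            "target": opinion.get("target"),
            "category": opinion.get("category"),
            "polarity": opinion.get("polarity"),
            "from": start,
            "to": end,
        })
        # `from`/`to` are character offsets into the sentence text, with
        # `to` exclusive: slicing recovers the annotated target span.
        assert text[start:end] == opinion.get("target")

print(records[0]["target"], records[0]["polarity"])  # servis negative
```

The slice assertion makes the offset convention explicit: `text[24:30]` yields `"servis"`, confirming that `to` points one past the last character of the target.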
### Splits
Train and test splits were officially released by the shared task committee.
| | Training | Test |
|-|----------|------|
| **# of Reviews** | 300 | 39 |
| **# of Sentences** | 1104 | 124 |
## Dataset Creation
### Curation Rationale
Explain the motivation behind creating this dataset. Example:
"The dataset is motivated by the desire to advance sentiment analysis and text classification in other (non-English) languages."
### Data Source
Indicate where the data is gathered from. Example:
"The authors gathered the reviews from the marketplaces in the US, Japan, Germany, France, Spain, and China for the English, Japanese, German, French, Spanish, and Chinese languages, respectively."
### Annotations
Explain the annotation process and describe the annotators. Note any deviations from the standard annotation procedure. Example:
"Each of the fields included are submitted by the user with the review or otherwise associated with the review. No manual or machine-driven annotation was necessary."
### Quality
Comment on the dataset quality. Include details about the cleanness of the dataset and the quality of the annotations (try to find inter-annotator agreement info).
"We observed that this dataset contains duplications. The text samples seem clean. The inter-annotator agreement (IAA) rate was measured by the authors, who reported `Cohen's kappa = 0.83` as the IAA rate."
### Personal and Sensitive Information
Indicate if any personal and/or sensitive information is present in the dataset. Example:
"Amazon Reviews are submitted by users with the knowledge and intention of being public. The reviewer IDs included in this dataset are quasi-anonymized, meaning that they are disassociated from the original user profiles. However, these fields would likely be easy to de-anonymize given the public and identifying nature of free-form text responses."
## Considerations
### Social Impact of Dataset
The expected impact of the dataset on society: what change is it aiming to bring about? Example:
"This dataset is part of an effort to encourage text classification research in languages other than English. Such work increases the accessibility of natural language technology to more regions and cultures. Unfortunately, each of the languages included here is relatively high resource and well studied."
### Discussion of Biases
Indicate if any bias is present within the dataset. Example:
"The data included here are from unverified consumers. Some percentage of these reviews may be fake or contain misleading or offensive language."
### Other Known Limitations
Point out the limitations that are not or not appropriate to be specified above. Example:
"The dataset is constructed so that the distribution of star ratings is balanced. This feature has some advantages for purposes of classification, but some types of language may be over- or underrepresented relative to the original distribution of reviews to achieve this balance."
## Additional Information
### Dataset Curators
List the names of the creators of the dataset. Example:
"Published by Phillip Keung, Yichao Lu, György Szarvas, and Noah A. Smith. Managed by Amazon."
### Citation Information
Include a way of citing the information given with the dataset. Example:
Please cite the following paper (arXiv) if you found this dataset useful:
Phillip Keung, Yichao Lu, György Szarvas, and Noah A. Smith. "The Multilingual Amazon Reviews Corpus." In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020.
```
@inproceedings{marc_reviews,
title={The Multilingual Amazon Reviews Corpus},
author={Keung, Phillip and Lu, Yichao and Szarvas, György and Smith, Noah A.},
booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},
year={2020}
}
```