# HisNet
In our study, we present an extensive polarity dictionary for Turkish, which dictionary-based sentiment analysis studies have long needed. Our primary objective is to provide a more refined and extensive polarity dictionary than the previous SentiTurkNet. To do so, we relied on a different network than the referenced study. We identified approximately 76,825 synsets from KeNet, which were then manually labeled as positive, negative or neutral by three native speakers of Turkish. Subsequently, the positive and negative words were labeled a second time as strong or weak, based on their degree of positivity or negativity.
The original paper is available [here](https://www.researchgate.net/publication/348264482_HisNet_A_Polarity_Lexicon_based_on_WordNet_for_Emotion_Analysis), and the original repository is [TurkishSentiNet](https://github.com/StarlangSoftware/TurkishSentiNet).
## Dataset Details
In this study, we identified approximately 76,825 synsets from KeNet. All of these synsets were then manually labeled as positive, negative or neutral by three native speakers of Turkish. After the first labeling, a second labeling round was conducted for the words that had been labeled as positive or negative in the first round: these words were re-labeled as strong or weak, based on their degree of positivity or negativity.
The following table shows the number of synsets in each category:
| Polarity Level | # of SynSets |
| -------- | -------- |
| Strongly positive (1.00) | 1038 |
| Very positive (0.75) | 451 |
| Positive (0.50) | 456 |
| Weakly positive (0.25) | 1234 |
| Objective (0.00) | 65767 |
| Strongly negative (-1.00) | 4430 |
| Very negative (-0.75) | 1465 |
| Negative (-0.50) | 1238 |
| Weakly negative (-0.25) | 3360 |
### Samples
A single sample from the dataset is shown below:
```
<SYNSET><ID>TUR10-0754670</ID><PSCORE>0.0</PSCORE><NSCORE>0.0</NSCORE></SYNSET>
```
```ID``` is the ID of the synset in the KeNet dataset; the ```PSCORE``` and ```NSCORE``` values show the positive and negative scores of the synset, respectively.
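
As a minimal sketch, a record in the format shown above can be parsed with Python's standard XML library. The `SynsetPolarity` class and the `parse_synset` function are hypothetical names introduced here for illustration; they are not part of the original repository, and the sketch assumes each record is a well-formed `<SYNSET>` element like the sample.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SynsetPolarity:
    """Hypothetical container for one polarity record."""
    synset_id: str
    pscore: float
    nscore: float


def parse_synset(xml_text: str) -> SynsetPolarity:
    """Parse one <SYNSET> element into a typed record."""
    elem = ET.fromstring(xml_text)
    return SynsetPolarity(
        synset_id=elem.findtext("ID"),
        pscore=float(elem.findtext("PSCORE")),
        nscore=float(elem.findtext("NSCORE")),
    )


# The sample record shown above.
sample = (
    "<SYNSET><ID>TUR10-0754670</ID>"
    "<PSCORE>0.0</PSCORE><NSCORE>0.0</NSCORE></SYNSET>"
)
record = parse_synset(sample)
print(record.synset_id, record.pscore, record.nscore)
```
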
### Fields
Each instance has the following fields:
| field | dtype |
|----------|------------|
| ID | string |
| PSCORE | float |
| NSCORE | float |
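
The two score fields can be related to the polarity levels listed in the category table. The sketch below assumes, without confirmation from the paper, that the signed value in that table equals `PSCORE - NSCORE`; the `polarity_level` function is a hypothetical helper, not part of the original repository.

```python
# Level names and signed values as listed in the category table.
LEVELS = {
    1.00: "Strongly positive",
    0.75: "Very positive",
    0.50: "Positive",
    0.25: "Weakly positive",
    0.00: "Objective",
    -0.25: "Weakly negative",
    -0.50: "Negative",
    -0.75: "Very negative",
    -1.00: "Strongly negative",
}


def polarity_level(pscore: float, nscore: float) -> str:
    """Map a (PSCORE, NSCORE) pair to a polarity level name.

    Assumption: the signed polarity is PSCORE minus NSCORE.
    """
    signed = round(pscore - nscore, 2)
    if signed not in LEVELS:
        raise ValueError(f"unexpected polarity value: {signed}")
    return LEVELS[signed]


print(polarity_level(0.0, 0.0))
print(polarity_level(0.75, 0.0))
print(polarity_level(0.0, 1.0))
```

If the assumption does not hold for the actual data, only the line computing `signed` needs to change.
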
## Dataset Creation
### Curation Rationale
The paper explains the motivation behind creating this dataset as follows:
> In recent years, sentiment analysis studies have gained significance in NLP applications. Currently, popular sentiment analysis applications frequently employ data regarding product interpretation, film interpretation, service evaluation and political events, mostly extracted from social media platforms. The aim of sentiment analysis is to reveal all emotions and commentary present in the data examined. There are several applicable methods for this purpose, one of which is the dictionary-based method where a polarity dictionary is employed.
>
> Exploiting a dictionary-based method necessitates the construction of a specific polarity dictionary in the same language as the data-to-be analyzed. The reason behind this necessity stems from the improbability of creating a universal polarity dictionary due to both grammatical and cultural asymmetries between languages. For instance, a certain historical event can have positive connotations in one culture and negative connotations in another culture. Thus, it is an essential step to create a language specific polarity dictionary.
>
> In our study, we present a polarity dictionary to provide an extensive polarity dictionary for Turkish that dictionary-based sentiment analysis studies have been longing for. Our primary objective is to provide a more refined and extensive polarity dictionary than the previous SentiTurkNet.
### Data Source
The synsets are taken from KeNet, which uses the 2011 print of the Contemporary Dictionary of Turkish (CDT), published by the Turkish Language Institute (TLI), as its data source.
### Annotations
The paper describes the main part of the annotation process as follows:
> As the first step of our project, we have identified approximately 76,825 synsets from Kenet. Subsequently, all of these synsets were manually labeled as positive, negative or neutral by three native speakers of Turkish. Following the first labelling, a second labelling process was conducted for the words which were labeled as positive and negative in the first round. To be more specific, the words were re-labeled based on the degree of their positivity or negativity as strong or weak. There was no second labeling on objective words.
For the complete annotation process, please refer to the [original paper](https://www.researchgate.net/publication/348264482_HisNet_A_Polarity_Lexicon_based_on_WordNet_for_Emotion_Analysis).
### Quality
Fleiss' kappa values computed across the three annotators:
| Polarity | Kappa | Strength |
| -------- | -------- | -------- |
| Positive | 0.618 | Good |
| Negative | 0.652 | Good |

Kappa values for each pair of annotators:

| Polarity | Annotator pair | Kappa | Strength |
| -------- | -------- | -------- | -------- |
| Positive | 1-2 | 0.694 | Good |
| Positive | 1-3 | 0.461 | Moderate |
| Positive | 2-3 | 0.695 | Good |
| Negative | 1-2 | 0.720 | Good |
| Negative | 1-3 | 0.534 | Moderate |
| Negative | 2-3 | 0.701 | Good |
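
For readers unfamiliar with the statistic, the following is a minimal sketch of how Fleiss' kappa is computed from per-item labels. It is the textbook formula, not the authors' evaluation script, and the toy ratings are invented for illustration. The `agreement_strength` thresholds are an assumption: they follow one common convention that happens to be consistent with the "Good" and "Moderate" labels in the tables above, but the source does not state which scale was used.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of rater labels.

    Every item must be labeled by the same number of raters.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("all items need the same number of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of agreeing rater pairs for this item.
        agreement_sum += (
            sum(c * c for c in counts.values()) - n_raters
        ) / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    total = n_items * n_raters
    # Agreement expected by chance from the overall label proportions.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is used")
    return (observed - expected) / (1.0 - expected)


def agreement_strength(kappa):
    """Verbal label for a kappa value (assumed thresholds)."""
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"


# Toy example: four synsets, three annotators each (invented labels).
toy_ratings = [
    ["positive", "positive", "positive"],
    ["negative", "negative", "neutral"],
    ["neutral", "neutral", "neutral"],
    ["positive", "neutral", "positive"],
]
kappa = fleiss_kappa(toy_ratings)
print(round(kappa, 3), agreement_strength(kappa))
```
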
## Additional Information
### Version
This dataset is taken from the original repository with commit id ```e2933f8``` on 16 Oct 2022.
### Dataset Curators
Merve Özçelik, Bilge Nas Arıcan, Özge Bakay, Elif Sarmış, Nilgün Güler Bayazıt, Özlem Ergelen, Olcay Taner Yıldız
### Citation Information
Please cite the following paper if you find this dataset useful:
Merve Özçelik, Bilge Nas Arıcan, Özge Bakay, Elif Sarmış, Özlem Ergelen, Nilgün Güler Bayezit, and Olcay Taner Yıldız. 2021. HisNet: A Polarity Lexicon based on WordNet for Emotion Analysis. In Proceedings of the 11th Global Wordnet Conference, pages 157–165, University of South Africa (UNISA). Global Wordnet Association.
```
@inproceedings{ozcelik-etal-2021-hisnet,
title = "{H}is{N}et: A Polarity Lexicon based on {W}ord{N}et for Emotion Analysis",
author = {{\"O}z{\c{c}}elik, Merve and
Ar{\i}can, Bilge Nas and
Bakay, {\"O}zge and
Sarm{\i}{\c{s}}, Elif and
Ergelen, {\"O}zlem and
Bayezit, Nilg{\"u}n G{\"u}ler and
Y{\i}ld{\i}z, Olcay Taner},
booktitle = "Proceedings of the 11th Global Wordnet Conference",
month = jan,
year = "2021",
address = "University of South Africa (UNISA)",
publisher = "Global Wordnet Association",
url = "https://aclanthology.org/2021.gwc-1.18",
pages = "157--165",
abstract = "Dictionary-based methods in sentiment analysis have received scholarly attention recently, the most comprehensive examples of which can be found in English. However, many other languages lack polarity dictionaries, or the existing ones are small in size as in the case of SentiTurkNet, the first and only polarity dictionary in Turkish. Thus, this study aims to extend the content of SentiTurkNet by comparing the two available WordNets in Turkish, namely KeNet and TR-wordnet of BalkaNet. To this end, a current Turkish polarity dictionary has been created relying on 76,825 synsets matching KeNet, where each synset has been annotated with three polarity labels, which are positive, negative and neutral. Meanwhile, the comparison of KeNet and TR-wordnet of BalkaNet has revealed their weaknesses such as the repetition of the same senses, lack of necessary merges of the items belonging to the same synset and the presence of redundant narrower versions of synsets, which are discussed in light of their potential to the improvement of the current lexical databases of Turkish.",
}
```
Uploaded and documented by Arda Goktogan: `ardagoktogan gmail com`.