
# Turkish FrameNet

Introduced in 1997, FrameNet (Lowe, 1997; Baker et al., 1998; Fillmore and Atkins, 1998; Johnson et al., 2001) has been developed by the International Computer Science Institute in Berkeley, California. It is a growing computational lexicography project that offers in-depth semantic information on English words and predicates. Based on Fillmore's theory of Frame Semantics (Fillmore and others, 1976; Fillmore, 2006), FrameNet provides semantic information on predicate-argument structure in a way that is loosely similar to WordNet (Kilgarriff and Fellbaum, 2000).

In FrameNet, predicates and related lemmas are categorized under frames. Frame Semantics describes a frame as a schematic representation of an event, state, or relationship. These packets of semantic information consist of individual lemmas (also known as Lexical Units, or LUs) and frame elements (such as agent, theme, instrument, duration, manner, and direction). Frame elements can be described as the semantic roles associated with a frame. Each Lexical Unit is linked to a frame through a single sense. For instance, the lemma "roast" can mean either to criticise harshly or to cook by exposing to dry heat; only in the latter sense does "roast" belong to the Apply_Heat frame.

With this version of Turkish FrameNet, we aimed to cover at least a considerable majority of the most frequent predicates, so that the resource is valuable and practical from day one. Because Turkish is a low-resource language, it was important for the FrameNet to have enough coverage to be incorporated into NLP solutions as soon as it was released to the public. The original paper can be found [here](https://aclanthology.org/2021.gwc-1.14.pdf), and the original repository is available at [TurkishFrameNet](https://github.com/StarlangSoftware/TurkishFrameNet).
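To make the structure above concrete: in this release each frame is stored as an XML `FRAME` element whose `LEXICAL_UNIT` children hold KeNet synset IDs and whose `FRAME_ELEMENT` children hold role names (see Samples). The following is a minimal Python sketch, using only the standard library, that loads such elements into a small data type and looks up which frames a synset belongs to. The `Frame` class and the `parse_frames` and `frames_of_synset` helpers are hypothetical names for illustration, not part of the original repository. The sketch assumes only the element and attribute names shown in the sample and does not assume any particular root element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame: its name, the lexical units (KeNet synset IDs) that evoke it,
    and the frame elements (semantic roles) it defines."""
    name: str
    lexical_units: list[str] = field(default_factory=list)
    frame_elements: list[str] = field(default_factory=list)


def parse_frames(xml_text: str) -> dict[str, Frame]:
    """Collect every <FRAME> element in `xml_text`, keyed by its NAME attribute.

    Element.iter() also visits the root itself, so this works whether the root
    is a single <FRAME> or some wrapper element around several of them.
    """
    root = ET.fromstring(xml_text.strip())
    frames = {}
    for node in root.iter("FRAME"):
        name = node.get("NAME")
        frames[name] = Frame(
            name=name,
            lexical_units=[(lu.text or "").strip() for lu in node.iter("LEXICAL_UNIT")],
            frame_elements=[(fe.text or "").strip() for fe in node.iter("FRAME_ELEMENT")],
        )
    return frames


def frames_of_synset(frames: dict[str, Frame], synset_id: str) -> list[str]:
    """Names of all frames that list `synset_id` among their lexical units."""
    return [f.name for f in frames.values() if synset_id in f.lexical_units]


# A shortened copy of the Apply_Heat sample shown in the Samples section.
sample = """<FRAME NAME="Apply_Heat">
  <LEXICAL_UNITS>
    <LEXICAL_UNIT>TUR10-0942600</LEXICAL_UNIT>
    <LEXICAL_UNIT>TUR10-0271270</LEXICAL_UNIT>
  </LEXICAL_UNITS>
  <FRAME_ELEMENTS>
    <FRAME_ELEMENT>Cook</FRAME_ELEMENT>
    <FRAME_ELEMENT>Food</FRAME_ELEMENT>
    <FRAME_ELEMENT>Heating_instrument</FRAME_ELEMENT>
  </FRAME_ELEMENTS>
</FRAME>
"""

frames = parse_frames(sample)
print(frames["Apply_Heat"].frame_elements)        # ['Cook', 'Food', 'Heating_instrument']
print(frames_of_synset(frames, "TUR10-0271270"))  # ['Apply_Heat']
```

Because a Lexical Unit is tied to a single sense, looking frames up by synset ID rather than by a surface lemma such as "roast" keeps a word's unrelated senses apart.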
## Dataset Details

In this study, a total of 139 frames in 8 domains were created. 16 of these frames were created specifically for Turkish, while the remaining 123 were translated from English FrameNet. These frames include a total of 2769 synsets (see the table below). Because we used the Turkish WordNet and PropBank repositories, the Lexical Units consist of WordNet synsets; consequently, some LUs contain more than one predicate. In total, 4080 predicates were annotated into their respective frames, and the sample sentences for all of them were marked up with the specific roles they contain.

| Attribute | Count |
| -------- | -------- |
| Total Frames | 139 |
| Turkish-specific Frames | 16 |
| Synsets (LUs) | 2561 |
| Individual Predicates | 4080 |
| Frame Elements | 203 |

### Samples
The structure of a sample frame is as follows:

```
<FRAME NAME="Apply_Heat">
  <LEXICAL_UNITS>
    <LEXICAL_UNIT>TUR10-0942600</LEXICAL_UNIT>
    <LEXICAL_UNIT>TUR10-0271270</LEXICAL_UNIT>
    <LEXICAL_UNIT>TUR10-0458750</LEXICAL_UNIT>
    <LEXICAL_UNIT>TUR10-0811060</LEXICAL_UNIT>
    <LEXICAL_UNIT>TUR10-1175470</LEXICAL_UNIT>
    <LEXICAL_UNIT>TUR10-0943830</LEXICAL_UNIT>
    <LEXICAL_UNIT>TUR10-0354260</LEXICAL_UNIT>
    <LEXICAL_UNIT>TUR10-1154650</LEXICAL_UNIT>
    <LEXICAL_UNIT>TUR10-1196810</LEXICAL_UNIT>
    ...
  </LEXICAL_UNITS>
  <FRAME_ELEMENTS>
    <FRAME_ELEMENT>Co-participant</FRAME_ELEMENT>
    <FRAME_ELEMENT>Container</FRAME_ELEMENT>
    <FRAME_ELEMENT>Cook</FRAME_ELEMENT>
    <FRAME_ELEMENT>Degree</FRAME_ELEMENT>
    <FRAME_ELEMENT>Duration</FRAME_ELEMENT>
    <FRAME_ELEMENT>Food</FRAME_ELEMENT>
    <FRAME_ELEMENT>Heating_instrument</FRAME_ELEMENT>
    <FRAME_ELEMENT>Manner</FRAME_ELEMENT>
    <FRAME_ELEMENT>Means</FRAME_ELEMENT>
    <FRAME_ELEMENT>Medium</FRAME_ELEMENT>
    <FRAME_ELEMENT>Place</FRAME_ELEMENT>
    <FRAME_ELEMENT>Purpose</FRAME_ELEMENT>
    <FRAME_ELEMENT>Temperature_setting</FRAME_ELEMENT>
    <FRAME_ELEMENT>Time</FRAME_ELEMENT>
  </FRAME_ELEMENTS>
</FRAME>
```

Note that, to make this dataset compatible with the Turkish WordNet KeNet, the Lexical Units are identified by the synset IDs used in KeNet.

### Fields

Each instance has the following fields:

| field | dtype | description |
|----------|------------|------------|
| NAME | string | Frame name (attribute of the `FRAME` element) |
| LEXICAL_UNIT | string | KeNet synset ID of a lexical unit in the frame |
| FRAME_ELEMENT | string | Name of a frame element (semantic role) of the frame |

## Dataset Creation

### Curation Rationale

The motivation behind creating this dataset is explained in the paper as follows:

> "With this study, we aim to take the first step towards creating a comprehensive and coherent Turkish FrameNet that is able to illustrate the semantic richness and the typological properties of Turkish language. We intend to provide a certain level of correspondence between Turkish FrameNet and English FrameNet to allow using Turkish FrameNet in machine translation tasks and various other multilingual NLP processes.
> Another aspiration of ours is to build a FrameNet for Turkish that can be interconnected with other NLP resources in Turkish like PropBank (Kara et al., 2020) and WordNet (KeNet) (Bakay et al., 2019; Ehsani, 2018; Ehsani et al., 2018; Parlar et al., 2019; Bakay et al., 2020) in order to create state-of-the-art parsers, semantic role labelling tools and similar NLP applications with high accuracy and speed."

### Data Source

During the annotation of lexical units into frames, annotators made use of English FrameNet, TRopBank, and KeNet. For the example sentences, annotators either used the TDK dictionary or came up with novel examples of their own.

### Annotations

The main part of the annotation process is described in the paper as follows:

> In our study, we opted out of this workflow. Instead, we divided annotators to four teams of two. Each annotator was given a domain. Their duty was creating frames within that domain by translating and adopting related frames from English FrameNet.

For the complete annotation process, please refer to the [original paper](https://aclanthology.org/2021.gwc-1.14.pdf).

### Quality

To ensure inter-annotator agreement, the annotator teams maintained close collaboration throughout the annotation process, and a second check was applied to every frame to eliminate inconsistencies. For further details, please refer to the [original paper](https://aclanthology.org/2021.gwc-1.14.pdf).

## Additional Information

### Version

This dataset is taken from the original repository at commit `a63bac3` (16 Oct 2022).

### Dataset Curators

Büşra Marşan, Neslihan Kara, Merve Özçelik, Bilge Nas Arıcan, Neslihan Cesur, Aslı Kuzgun, Ezgi Sanıyar, Oğuzhan Kuyrukçu, Olcay Taner Yıldız

### Citation Information

Please cite the following paper if you find this dataset useful:

Büşra Marşan, Neslihan Kara, Merve Özçelik, Bilge Nas Arıcan, Neslihan Cesur, Aslı Kuzgun, Ezgi Sanıyar, Oğuzhan Kuyrukçu, and Olcay Taner Yildiz. 2021.
Building the Turkish FrameNet. In Proceedings of the 11th Global Wordnet Conference, pages 118–125, University of South Africa (UNISA). Global Wordnet Association.

```
@inproceedings{marsan20,
  title = {{B}uilding the {T}urkish {F}rame{N}et},
  year = {2021},
  author = {B. Marsan and N. Kara and M. Ozcelik and B. N. Arican and N. Cesur and A. Kuzgun and E. Saniyar and O. Kuyrukcu and O. T. Y{\i}ld{\i}z},
  booktitle = {Proceedings of GWC 2021}
}
```

Uploaded and documented by Arda Goktogan: `ardagoktogan gmail com`.