The 90 Column Writings dataset contains 90 Turkish column writings from 9 different authors, 10 per author. The dataset was generated by the Kemik Natural Language Processing Group.
It was created for Text2arff, a text representation library presented in Can and Amasyalı (2016). The average text length is 466 words.
A sample instance is presented below.
Example:
17 bin 500 gaziyi defterden sildiler MİLLİ Savunma Bakanlığı, yeni bir yönerge çıkararak, sosyal güvencesi olan gazilerden Silahlı Kuvvetler'e ait sağlık muayene fiş ve cüzdanlarını iade etmelerini istedi. Toplam 42 bin 227 savaş gazisinden 17 bin 500'ünü kapsayan bu kararla birlikte, gazilerin askeri hastanelerde ücretsiz muayene olma, tedavi görme ve ilaç katkı payı alma hakkı ellerinden alınmış oldu. TÜRK Silahlı Kuvvetleri'nin (TSK) sağlık muayene fişleri ve sağlık cüzdanlarıyla ilgili yönergesinde değişiklik yapan Milli Savunma Bakanlığı, bir anda 42 bin 227 savaş gazisinden 17 bin 500'ünü defterden sildi. Yeni yönergeye göre, SSK, Bağ-Kur ve Emekli Sandığı gibi kurumların herhangi birinden sosyal güvencesi olan gaziler, Silahlı Kuvvetler'e ait sağlık muayene fiş ve cüzdanlarını iade etmek zorunda kalacak. Bu da, gazilerin askeri hastanelerden ücretsiz yararlanma hakkının elinden alınmasına yol açacak. Bugüne kadar herhangi bir ayrım yapılmaksızın asker, askeri personel ve emeklilerine tanınan tüm sağlık hizmetlerinden gazi ve eşleri de yararlanabiliyordu. Gaziler, askeri hastanelerde ücretsiz muayene olup, tedavi görebiliyor ve ilaç katkı payı alabiliyordu. HAKLARINI KAYBETTİLER Ancak, Milli Savunma Bakanlığı Türk Silahlı Kuvvetleri'nin sağlık muayene fiş ve sağlık cüzdanlarıyla ilgili yönergesinde değişiklik yaparak, gazilere tanınan sağlık hizmetlerinde bazı kısıtlamalara gitti. Bundan böyle, TSK'ya ait sağlık muayene fişi ve cüzdanı kullanma hakkından sadece, hiç bir sağlık güvencesi olmayan gaziler yararlanabilecek. SSK, Bağ-Kur ve Emekli Sandığı gibi herhangi bir sosyal güvenlik kurumuna tabii olan gazi ve gazi eşleri ise artık bu haktan yararlanamayacak. Sosyal güvencesi olan gazi ve eşlerine bundan böyle, sağlık muayene fişi ve sağlık cüzdanı da verilmeyecek. Ayrıca, daha önce verilmiş olan muayene fişi ve sağlık cüzdanlarını da askerlik şubelerine iade etmek zorunda kalacaklar. 
GAZİLERE ÇOK GÖRDÜLER Arkadaşımız Ayşegül Akyarlı, yeni yönergeyle, Milli Savunma Bakanlığı'nın 'defterden sildiği' SSK'dan emekli 72 yaşındaki Kore Gazisi Ömer Korkut'un şikayetiyle ilgili olarak bir araştırma yaptı. Bu araştırmaya göre, şu anda Türkiye'de 20'si Kurtuluş Savaşı, 9 bin 486'sı Kore Savaşı, 32 bin 721'i de Kıbrıs Savaşı olmak üzere toplam 42 bin 227 savaş gazi bulunuyor. Bunların eş ve dul eşleriyle birlikte gazilere tanınan haklardan yararlananların sayısı ise, 53 bin 36'ya ulaşıyor. Yeni yönergeden, 17 bin 500'ü gazi olmak üzere toplam 22 bin kişi etkileniyor. Şimdi, askerlik şubelerinden sosyal güvencesi olan bu 22 bin gazi ve gazi eşine ayrı ayrı mektup gönderilerek, ellerindeki Silahlı Kuvvetler'e ait muayene fişi ve sağlık karnelerini iade etmeleri isteniyor. Erhan Abiyle bir ilgim yok BAZI bilgisayar firmaların oluşturduğu Bilgisayar Kullanıcılarını Bilinçlendirme Platformu'nun (BKM) geçtiğimiz günlerde gazetelere verdiği ilanlar, sizin de dikkatinizi çekmiştir. Bu ilanlarda, toplama bilgisayarlarla ilgili tüketici şikayetlerinin iletileceği bir de internet adresi veriliyor. Hürriyet'in Tüketici Köşesi'nden esinlenerek oluşturulan bu www.tuketicinin-erhanabisi.com adlı siteyle hiçbir ilgim yoktur. Sadece toplama bilgisayarlar için değil, her türlü tüketici şikayetiyle ilgili olarak bize ulaşmak isterseniz, www.erkanabi.com adresini kullanabilir, tuketici@hurriyet.com.tr'den de e-postalarınızı gönderebilirsiniz.
Each file contains one column writing, and column writings by the same author are stored in the same directory.
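A minimal sketch of how this layout could be read, assuming one subdirectory per author under a root folder and UTF-8 text files. The function name `load_columns`, the root path, and the encoding are assumptions for illustration and are not part of the dataset distribution; the original files may use a different encoding.

```python
from pathlib import Path


def load_columns(root, encoding="utf-8"):
    """Return (text, author) pairs, using each subdirectory name as the author label."""
    samples = []
    for author_dir in sorted(Path(root).iterdir()):
        if not author_dir.is_dir():
            continue  # skip stray files next to the author directories
        for path in sorted(author_dir.iterdir()):
            if path.is_file():
                # errors="replace" keeps loading even if the assumed encoding is wrong
                text = path.read_text(encoding=encoding, errors="replace")
                samples.append((text, author_dir.name))
    return samples
```

Calling `load_columns("path/to/dataset")` (a placeholder path) would give one labeled pair per file, with the directory name serving as the class label.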
No split is provided by the dataset creators.
The main goal of this dataset is author classification: predicting the author of a given text.
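Since no official split exists, one option is a per-author (stratified) split, so that every author appears in both the training and test sets. The sketch below uses only the standard library; the function name `stratified_split`, the two held-out texts per author, and the seed are illustrative assumptions, not a split recommended by the dataset creators.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_per_author=2, seed=0):
    """Split (text, author) pairs so each author contributes the same number of test items."""
    by_author = defaultdict(list)
    for text, author in samples:
        by_author[author].append(text)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for author in sorted(by_author):
        texts = by_author[author][:]
        rng.shuffle(texts)
        test += [(t, author) for t in texts[:test_per_author]]
        train += [(t, author) for t in texts[test_per_author:]]
    return train, test


# Toy stand-in for the corpus: 9 authors x 10 placeholder texts.
toy = [(f"text {i} by author {a}", f"author_{a}") for a in range(9) for i in range(10)]
train, test = stratified_split(toy)
print(len(train), len(test))  # 72 18
```

With only 10 texts per author, a single split gives a small test set, so repeating the split with different seeds or using cross-validation may give more stable accuracy estimates.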
The dataset creators collected the texts from online news sources between 2004 and 2005.
All of the articles included had already been published publicly. Although some of them may contain personal information, all of it was published within a legal framework.
This dataset is part of an effort to encourage text classification research in languages other than English. Such work increases the accessibility of natural language technology to more regions and cultures.
The data included here come from news sources. Some of the articles presented may have been disclaimed.
Published by Ender Can and Mehmet Fatih Amasyalı.
@article{ender2016Text2arff,
title={Text2arff: A text representation library},
author={Can, E and Amasyalı, MF},
journal={SİU},
pages={197-200},
year={2016},
address={Zonguldak, Turkey},
doi={10.1109/SIU.2016.7495711}
}