270 Column Writings

# 270 Column Writings

The 270 Column Writings dataset contains 270 Turkish column writings from 18 different authors, with 15 column writings per author. The dataset was generated by the [Kemik Natural Language Processing Group](http://www.kemik.yildiz.edu.tr/).

## Dataset Details

The selected texts comprise essays on politics, magazine, and medical topics. The average length of the texts is 456 words.

### Samples

A sample instance is presented below:

```
Terörist mi, çeteci mi? 3713 sayılı kanunun 7. maddesi silahlı olmayan terör örgütleriyle ilgili düzenleme de yapmıştır. TCK'nın 168. maddesi devletin emniyetine karşı çete kurmak başlığı altında, cürümleri işlemek için cemiyet kuranları, bunlara girenleri cezalandırmaktadır. Terör ile Mücadele Yasası'nın 7. maddesindeki suçun oluşması için, iki ya da daha fazla kimsenin Anayasa'da belirtilen cumhuriyetin niteliklerini, siyasi, hukuki, sosyal, laik, ekonomik düzenini değiştirmek, devletin ülkesi ve milletiyle bölünmez bütünlüğünü bozmak, Türk devletinin veya cumhuriyetinin varlığını tehlikeye düşürmek, devlet otoritesini zaafa uğratmak veya yıkmak veya ele geçirmek, temel hak ve hürriyetleri yok etmek, devletin iç ve dış güvenliği, kamu düzenini veya genel sağlığı bozmak amacı ile birleşmesi gerekiyor Ancak, Çeçenler ile ilgili eylemlerde yukarıda belirtilen ilke ve unsurları yok etmeye yönelik bir hareket mevcut değil. Sanıkların eylemlerinin amacı, kendi ifadelerine göre Çeçenistan'da yapılan zulmü dünyaya duyurmak ve dünyanın ilgisini çekmek.
```

### Fields

Each file contains one column writing, and the column writings that belong to the same author are stored in the same directory.

### Splits

The paper that presents this dataset claims that "Training set contains 15 and test set contains 5 different texts for each of 18 authors." However, we were only able to obtain the training part; the test split is missing.

## Dataset Creation

### Curation Rationale

The main goal of this dataset is classifying texts by their authors.

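
Because the files are grouped into one directory per author, the directory name can serve as the class label for author classification. The snippet below is a minimal sketch of such a loader, not an official loading script: the function name `load_column_writings`, the `root` path, and the default UTF-8 encoding are assumptions, and the actual files may use a different encoding.

```python
from pathlib import Path


def load_column_writings(root, encoding="utf-8"):
    """Read a directory-per-author corpus into parallel lists.

    `root` is assumed to contain one subdirectory per author, each holding
    one file per column writing. The encoding is an assumption and may need
    to be changed for the actual files.
    """
    texts, authors = [], []
    # Sort directories and files so the output order is reproducible.
    for author_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for file_path in sorted(p for p in author_dir.iterdir() if p.is_file()):
            texts.append(file_path.read_text(encoding=encoding))
            # The directory name is used as the author label.
            authors.append(author_dir.name)
    return texts, authors
```

If the downloaded copy matches the description above, calling this function on the dataset root should yield 270 texts and 18 distinct author labels, which makes a quick sanity check before training a classifier.
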
### Data Source

The authors gathered the news from [Hurriyet](www.hurriyet.com.tr) between .

### Personal and Sensitive Information

All the news articles presented here have already been published publicly. Even though some personal information might appear in the magazine articles, all of the information present is within a legal framework.

## Considerations

### Social Impact of Dataset

This dataset is part of an effort to encourage text classification research in languages other than English. Such work increases the accessibility of natural language technology to more regions and cultures.

### Discussion of Biases

The data included here are from the news. Some of the presented articles may have been disclaimed.

### Dataset Curators

Published by Diri B., Amasyalı M. F.

### Citation Information

```
@article{amasyali2002otomatik,
  title={Automatic Author Detection for Turkish Texts},
  author={Amasyalı, MF and Diri, B},
  journal={ICANN/ICONIP},
  pages={138-141},
  year={2003},
  address={Istanbul}
}
```