140 Poem

The 140 Poem dataset contains 140 Turkish poems by 7 different authors, with 20 poems per author. The dataset was created by the Kemik Natural Language Processing Group.

Dataset Details

The dataset consists of 140 singly authored documents written by 7 different authors, with 20 different texts written by each author. The average length of texts is 109 words.

Samples

A sample instance is presented below.

Example:

Acının tutanakçısıyım

Acının tutanakçısıyım
Anlatıp dururum aşkları
Ayrılıkları ve o destan
Yalnızlığını ömrümüzün

Göçebe, Gezgin ve Aylak
Birmiydim aklıma gelmedi 
ir çingeneyle bir bilici
Hep ayni şeydi bildiğim

Ve serseriliğimdi aşklar
Bir masalcıydım belki de
Yaşadım o büyük serüvenleri
Yolculuklar tarihimdi benim

Acılar yaşanıyordu yurdumda
Pespese yakılıyordu kentler
Bense hep oralardaydım
Daha yangın başlamadan önce

Fields

Each file contains a single poem, and poems by the same author are stored in the same directory.
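The author-per-directory layout described above can be read into (text, author) pairs with a few lines of Python. This is a minimal sketch, not the curators' own loader: the function name `load_poems`, the root path, and the UTF-8 default encoding are assumptions made for illustration. If the files are stored in another encoding, pass it explicitly.

```python
from pathlib import Path


def load_poems(root, encoding="utf-8"):
    """Return a list of (text, author) pairs.

    Assumes (hypothetically) that `root` contains one subdirectory per
    author and that every file inside a subdirectory is one poem.
    The directory name is used as the author label.
    """
    samples = []
    for author_dir in sorted(Path(root).iterdir()):
        if not author_dir.is_dir():
            continue  # skip stray files next to the author directories
        for poem_file in sorted(author_dir.iterdir()):
            if poem_file.is_file():
                text = poem_file.read_text(encoding=encoding)
                samples.append((text, author_dir.name))
    return samples
```

Calling `load_poems("path/to/140_poem")` (a placeholder path) on the full dataset should yield 140 pairs with 7 distinct author labels, which is a quick sanity check on the download.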

Splits

No split is provided by the dataset creators.
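Since no official split exists, users have to create their own. One option, sketched below, is a per-author (stratified) hold-out so that every author is equally represented in the test set. The function name `stratified_split`, the default of 4 held-out poems per author, and the fixed seed are illustrative assumptions, not a recommendation from the dataset creators.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_per_author=4, seed=0):
    """Split (text, author) pairs into train and test lists.

    Each author contributes exactly `test_per_author` poems to the test
    set (fewer only if the author has fewer poems in total).
    """
    by_author = defaultdict(list)
    for text, author in samples:
        by_author[author].append(text)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for author in sorted(by_author):
        texts = by_author[author][:]
        rng.shuffle(texts)
        test += [(t, author) for t in texts[:test_per_author]]
        train += [(t, author) for t in texts[test_per_author:]]
    return train, test
```

With 20 poems for each of the 7 authors and `test_per_author=4`, this produces 112 training poems and 28 test poems. Given only 20 texts per author, cross-validation over such splits may be preferable to a single hold-out.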

Dataset Creation

Curation Rationale

The main purpose of this dataset is classifying texts by their authors.

Data Source

The authors gathered the poems from poetry websites.

Considerations

Social Impact of Dataset

This dataset is part of an effort to encourage text classification research in languages other than English. Such work makes natural language technology accessible to more regions and cultures. It is also important for studies of non-formal representations of the language.

Dataset Curators

Published by Banu Diri and Fatih Amasyali

Citation Information

@article{diri2006identifying,
  author   = {Diri, B. and Amasyalı, M. F.},
  title    = {Identifying the poets of the anonymous poems},
  journal  = {TAINN},
  address  = {Canakkale},
  year     = {2003}
}