# FabulaNet Literature and relevant resources #
```yaml
author: Yuri
updated: Nov. 17th
```
---
#### 1. **EMNLP21** ####
##### **EMNLP21 papers** #####
**Guilt by Association: Emotion Intensities in Lexical Representations**
https://aclanthology.org/2021.emnlp-main.781.pdf
This is a paper about the emotional information implied in words' distributional profiles. Main take-away: using word embeddings can be even more efficient than using machine-learned lexica. Can be very important if we want to go language-agnostic and unsupervised.
**On the cross-lingual transferability of contextual sense embeddings**
https://aclanthology.org/2021.mrl-1.10.pdf
Might be interesting if we want to go language-agnostic with contextual embeddings.
**Language-Agnostic Representation from Multilingual Sentence Encoders for Cross-Lingual Similarity Estimation**
https://aclanthology.org/2021.emnlp-main.612.pdf
An improved version of language-agnostic sentence embeddings, à la LASER. Code available, but no pretrained models.
**Emotion Classification in German Plays with Transformer-based LMs**
https://aclanthology.org/2021.latechclfl-1.8/
**Narrative Embedding: Re-Contextualization through Attention**
https://aclanthology.org/2021.emnlp-main.105.pdf
An attempt at building narrative-aware embeddings. Similar to contextualized embeddings, but sensitive to the narrative of the context.
**Casting the Same Sentiment Classification Problem**
https://aclanthology.org/2021.findings-emnlp.53.pdf
A dataset to evaluate Sentiment Analysis models.
**Stylometric Literariness Classification: the Case of Stephen King**
https://aclanthology.org/2021.latechclfl-1.21/
The authors claim that a unigram-based classifier can tell high-brow from low-brow literature, and report that it passed their tests with flying colours. They apply it to Stephen King's novels. The closest project to Fabula that I found at the conference.
##### **EMNLP21 demos and systems** #####
EMNLP21 had a host of demos about the interpretability of transformers and neural networks. They can be relevant to Fabula when we start using such tools on large-scale corpora.
*Thermostat*
Explainability tool, apparently easy Python code
https://underline.io/events/192/posters/8536/poster/38760-thermostat-a-large-collection-of-nlp-model-explanations-and-analysis-tools
*LMdiff*
Interface to compare language models. Strong affiliations
https://underline.io/events/192/posters/8536/poster/38993-lmdiff-a-visual-diff-tool-to-compare-language-models
*Datasets*
Big datasets from Hugging Face, easy code
https://underline.io/events/192/posters/8536/poster/38996-datasets-a-community-library-for-natural-language-processing
*T3-Vis*
Training and tuning Transformers
https://underline.io/events/192/posters/8536/poster/38998-t3-vis-visual-analytic-for-training-and-fine-tuning-transformers-in-nlp
#### 3. **Human annotations** ####
*Affect in text and speech*
Chapter 3 describes the annotated corpus of interest to us (Andersen's, Grimms' and Potter's fairy tales). Each sentence is annotated for *primary emotion* and *mood*: mood is the overall "feeling" of the sentence, while primary emotion requires assigning the emotion to a "feeler" (often a main character of the text).
https://www.proquest.com/openview/9eaecb093476e91ddadbf0b9a0d42c54/1?pq-origsite=gscholar&cbl=18750
*Characteristics of high agreement affect annotation in text*
Describes the characteristics of high-agreement annotations in an Andersen subset.
https://aclanthology.org/W10-1815.pdf
*Affect data*
Alm's annotations are available for download here:
http://people.rc.rit.edu/~coagla/affectdata/index.html
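
To make the annotation scheme concrete, here is a minimal sketch of how one annotated sentence could be represented. The class, field names, and label strings are hypothetical and do not reflect the file format of the downloadable data. The sketch assumes several annotators per sentence and treats a sentence as "high agreement" when all primary-emotion and mood labels coincide; that criterion is an assumption for illustration, so check the paper for the exact definition.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AnnotatedSentence:
    """One sentence with per-annotator labels (hypothetical layout)."""
    text: str
    primary_emotions: Tuple[str, ...]  # one primary-emotion label per annotator
    moods: Tuple[str, ...]             # one mood label per annotator

    def is_high_agreement(self) -> bool:
        # Assumed criterion: every label, across both fields, is identical.
        labels = set(self.primary_emotions) | set(self.moods)
        return len(labels) == 1


def high_agreement_subset(sentences: List[AnnotatedSentence]) -> List[AnnotatedSentence]:
    """Keep only the sentences on which all labels coincide."""
    return [s for s in sentences if s.is_high_agreement()]
```

A filter like this would yield a smaller but cleaner subset, which may be useful as a first evaluation set for sentence-level emotion models.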
#### 4. **SentiArt** ####
A method for computing affective-aesthetic potential (AAP) and other valence scores based on vector-space models instead of an emotion dictionary.
Suggests that AAP is a better predictor of human valence ratings than other methods.
https://www.frontiersin.org/articles/10.3389/frobt.2019.00053/full
https://www.frontiersin.org/articles/10.3389/fnhum.2017.00622/full
https://www.mdpi.com/2673-2688/1/1/2/pdf
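
A minimal sketch of the vector-space idea, not the SentiArt implementation: a word is scored by its mean cosine similarity to a set of positive label words minus its mean similarity to a set of negative label words. The toy 3-d vectors, the label lists, and all function names are illustrative assumptions; a real setup would load pretrained embeddings and use the label words given in the papers.

```python
import math

# Toy 3-d "embeddings"; hypothetical values for illustration only.
VECTORS = {
    "joy": (0.9, 0.1, 0.0),
    "love": (0.8, 0.2, 0.1),
    "fear": (0.1, 0.9, 0.0),
    "hate": (0.0, 0.8, 0.2),
    "sunshine": (0.7, 0.2, 0.1),
    "funeral": (0.1, 0.7, 0.3),
}

POSITIVE_LABELS = ["joy", "love"]   # assumed positive label words
NEGATIVE_LABELS = ["fear", "hate"]  # assumed negative label words


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(word, labels, vectors):
    """Average similarity of `word` to each label word."""
    sims = [cosine(vectors[word], vectors[label]) for label in labels]
    return sum(sims) / len(sims)


def valence_score(word, vectors=VECTORS):
    """Positive-label similarity minus negative-label similarity."""
    pos = mean_similarity(word, POSITIVE_LABELS, vectors)
    neg = mean_similarity(word, NEGATIVE_LABELS, vectors)
    return pos - neg


def text_score(tokens, vectors=VECTORS):
    """Mean word score over tokens that have a vector; 0.0 if none do."""
    known = [t for t in tokens if t in vectors]
    if not known:
        return 0.0
    return sum(valence_score(t, vectors) for t in known) / len(known)
```

Because no dictionary lookup is involved, any word with an embedding receives a score, which is what makes the approach attractive for out-of-lexicon vocabulary.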
#### 5. **NLP4DH Resources** ####
(Another) diachronic English BERT project https://macberth.netlify.app
Which segments of literary works attract critics' attention? https://pages.cms.hu-berlin.de/schluesselstellen/annette/index.html?lang=en
Curveship: a framework for automatic narrative style variation https://nickm.com/curveship/
#### 6. **Relevant Papers with Relevant Datasets (from no specific conference)** ####
- Fractality in English Canonical Fiction - *almost* Fabula https://www.frontiersin.org/articles/10.3389/fpsyg.2021.599063/full#h14
- Review of Book Recommendation Systems - good review and important dataset with 5 million Goodreads ratings already scraped: https://aclanthology.org/R19-2009.pdf
#### 7. **Fiction-editing tools** ####
Jockers, Matthew. Marlowe : https://authors.ai/marlowe/
Suggestions based on scores of bestselling authors, said to "understand the traditions, tropes, and reader expectations that come with each genre"
What does it do?
- Visualizes plot shape alongside the most similar of 7 core narrative archetypes (dips/highs in emotional valence) - HP: having not only ups/downs in beginning/end but throughout is better
- Shows the book in Marlowe's library with most similar plotline
- Visualizes "narrative beats" (positive/negative), where "significant points of conflict are resolved and normalcy is restored (positive beats) or where new conflict is introduced and characters are sent into new turmoil (conflict beats)" - shows interval between such beats (e.g. one beat every 10%) - HP: regular beats are better
- Visualizes "pacing", "simulates the experience readers have as they move through your narrative". Identifies changes in pace, e.g. passages with shorter sentences/passages with longer sentences compared to all others
- Characters' personality traits based on their actions
- Visualizes distribution of dialogue vs. narrative
- Visualizes primary emotions (Plutchik's 8 primary emotions)
- Major subjects, 10 major subjects compared to the distributions of bestselling books. HP: bestselling works have 2-3 major subjects and 2-3 minor subjects accounting for around 30% of overall topical make-up: "To get to 40% of the average novel, a bestseller uses only four topics. A non-bestseller, on average, uses six"
- Explicit language, flags explicit (violent/sexual) language for your consideration.
- Possible clichés, flags clichés
- Repetitive phrases, flags recurring phrases
- Visualizes sentence stats and readability scores compared to top-selling books. Average sentence length HP: "popular fiction market tend to have a distribution where about 70% of their sentences are between 1 and 15 words long". Also shows reading-grade level and syntactic complexity. HP: "books in the popular fiction market tend to have an average complexity score between 2.0 and 3.0".
- Frequent adverbs compared to frequencies in top-selling books. "Stephen King reminds us that “the road to hell is paved with adverbs”"
- Frequent adjectives compared to frequencies in top-selling books.
- Verb choice and passive voice compared to frequencies in top-selling books. HP: auxiliary verbs should be used sparingly
- Punctuation data compared to frequencies in top-selling books.
- Possible misspellings
- Subject matter book comps. Compares book to Marlowe library neighbours in terms of subject matter.
- Stylistic book comps and authorial voice. Compares book to Marlowe library neighbours in terms of style.
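
Two of the quantities quoted in the list above are simple enough to sketch. The snippet below is not Marlowe's code; it is a rough illustration that assumes a naive regex sentence splitter and a hypothetical list of per-topic proportions. It computes the share of sentences with 1 to 15 words and the number of topics needed to cover 40% of a book.

```python
import re


def split_sentences(text):
    """Naive split after '.', '!' or '?'; a real pipeline would use a proper tokenizer."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_lengths(text):
    """Word count per sentence, using whitespace tokenization."""
    return [len(s.split()) for s in split_sentences(text)]


def share_short_sentences(text, max_words=15):
    """Fraction of sentences that are between 1 and max_words words long."""
    lengths = sentence_lengths(text)
    if not lengths:
        return 0.0
    short = sum(1 for n in lengths if 1 <= n <= max_words)
    return short / len(lengths)


def topics_to_reach(proportions, threshold=0.40):
    """Number of largest topics whose cumulative share reaches the threshold.

    Returns None if the proportions never add up to the threshold.
    """
    total = 0.0
    for count, share in enumerate(sorted(proportions, reverse=True), start=1):
        total += share
        if total >= threshold:
            return count
    return None


if __name__ == "__main__":
    sample = "He ran. She laughed loudly! " + " ".join(["word"] * 20) + "."
    print(share_short_sentences(sample))
    # Hypothetical topic proportions for one manuscript.
    print(topics_to_reach([0.15, 0.12, 0.08, 0.06, 0.05]))
```

Comparable statistics computed over the Fabula corpus would let us check whether the quoted heuristics hold outside Marlowe's own library.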
#### 8. **Relevant book recommendation systems** ####
- Alharthi, H. & Inkpen, D. Study of Linguistic Features Incorporated in a Literary Book Recommender System : https://dl.acm.org/doi/pdf/10.1145/3297280.3297382
Abstract: *As reading for pleasure is declining, particularly among the young, we are motivated to investigate the textual elements that may play a role in generating high-quality book recommendations. Two recommendation algorithms were trained on more than a hundred features learned from the full text of the books. Both algorithms generate more accurate lists of suggested books than many competitive recommender systems. Exploration of the key textual elements is conducted together with qualitative analysis.*
#### 9. **Literary quality** ####
- van Peer, W. (Ed.) *The Quality of Literature.* John Benjamins Publishing Company, 2008. https://benjamins.com/catalog/lal.4
- Wang, X, B. Yucesoy, O. Varol, T. Eliassi-Rad and A. Barabási. "Success in Books: Predicting Book Sales Before Publication". EPJ Data Science 8(31), 2019, n.p.
*Reading remains a preferred leisure activity fueling an exceptionally competitive publishing market: among more than three million books published each year, only a tiny fraction are read widely. It is largely unpredictable, however, which book will that be, and how many copies it will sell. Here we aim to unveil the features that affect the success of books by predicting a book’s sales prior to its publication. We do so by employing the Learning to Place machine learning approach, that can predicts sales for both fiction and nonfiction books as well as explaining the predictions by comparing and contrasting each book with similar ones. We analyze features contributing to the success of a book by feature importance analysis, finding that a strong driving factor of book sales across all genres is the publishing house. We also uncover differences between genres: for thrillers and mystery, the publishing history of an author (as measured by previous book sales) is highly important, while in literary fiction and religion, the author’s visibility plays a more central role. These observations provide insights into the driving forces behind success within the current publishing industry, as well as how individuals choose what books to read.*
#### 10. **Literary Quality Projects** ####
- Novel Perceptions (UK): http://novel-perceptions.thememorynetwork.com/about/
- The Riddle of Literary Quality (NL): https://literaryquality.huygens.knaw.nl
- Impact & Fiction (NL): https://impactandfiction.huygens.knaw.nl/?page_id=2
#### 11. **Gender bias in literature** ####
See: https://hackmd.io/2Vs_MdL4T6y1bqI4AdN2lg