Final Project - DeepNovel

---
lang: en
dir: ltr
breaks: true
title: Final Project - DeepNovel
image: https://i.imgur.com/7P0W1kn.png
tags: 04. Final Project
---

<center>
<img id="Demo" src="https://i.imgur.com/rm76dU6.jpg" width="100%">
</center>

# DeepNovel

by Tobias Becher

Accompanies <a href="https://github.com/TB-DevAcc/DeepNovel">this repository</a>.

## Abstract

This project is concerned with building an artificial intelligence **application** that can **write novels and illustrate** them. In this document we outline the models and techniques used to produce output that should be as hard to distinguish from human writing as possible. We provide examples of text generated from short text input, as well as images generated from samples of that text. We also discuss the limitations and possible future improvements.

## Introduction

Writing has always been a creative and complex endeavour and has therefore been regarded as inherently reserved for humans. In recent years, artificial intelligence has become better at understanding [[0]](https://openreview.net/forum?id=BygzbyHFvB), processing [[1]](https://arxiv.org/abs/1810.04805) and creating new text [[2]](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf). The results so far are certainly impressive, but they cannot compete with human authors because they lack context and overall storytelling. We aim to improve on prior work, such as OpenAI's GPT-2 [[3]](https://openai.com/blog/gpt-2-1-5b-release/) and Srinivasan et al.'s [NLG using RL with external Rewards](https://arxiv.org/pdf/1911.11404v1.pdf), to create an application that is able to write a novel. Our application aims to help writers and authors **rapidly decrease** their **prototyping time**.
In addition to the generated text, we want to provide illustrations through text-to-image synthesis, as in [[4]](https://arxiv.org/pdf/1710.10916.pdf), to help authors better visualize their documents.

The **challenges** that we face in **text generation** are *Specificity*, *Control*, *Storytelling* and *Consistency*. We want to gain added control over the output through additional data sources from different literary genres, such as **Sci-Fi**, **Fantasy** and **Romance**, while the external rewards for our RL approach should be modelled to improve storytelling. Consistency and Specificity are not addressed as of now.

For **image generation** we want to improve the *Diversity* of available output motifs. Generating **faces**, **fashion** and **landscapes** from text would suit our application.

<center>
<img id="IdeaFlow" src="https://i.imgur.com/NAoPhKX.png" width="100%">
</center>

Figure 1: The chart above illustrates the flow from input to output through the stages of our proposed process.

Since a human-like novel writing tool with text illustration is a **lofty goal**, we base our core models on existing research and implement **milestones** to ensure the completion of a proof of concept and, ideally, a minimum viable product within 3 weeks.

## Methodology

```COMING SOON```

## POC

<center>
<img id="POC-Demo" src="https://i.imgur.com/hSORuLz.jpg" width="100%">
</center>

<center>
<img id="POC-Demo" src="https://i.imgur.com/widcR6D.jpg" width="100%">
</center>

Figure 2: Demonstration of the output on the DeepNovel web app after the novel creation process. The text was generated from the input "This is a" with GPT-2. Illustrations have not been generated yet.
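
The flow from Figure 1 (input text, generated text, text samples, illustrations) could be wired together roughly as follows. This is only a minimal sketch under stated assumptions, not the code from the repository: `Draft`, `split_sentences`, `pick_excerpts` and `write_draft` are hypothetical names, the "longest sentences" heuristic for choosing text samples is an assumption, and the text and image models are passed in as plain callables (for example thin wrappers around GPT-2 and a text-to-image GAN) so that the sketch runs without any model installed.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Draft:
    """Generated text plus references to the illustrations produced for it."""
    text: str
    illustrations: List[str] = field(default_factory=list)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pick_excerpts(text: str, max_excerpts: int = 2) -> List[str]:
    """Assumed heuristic (not from the project): use the longest sentences as image prompts."""
    return sorted(split_sentences(text), key=len, reverse=True)[:max_excerpts]


def write_draft(
    prompt: str,
    generate_text: Callable[[str], str],
    generate_image: Optional[Callable[[str], str]] = None,
    max_excerpts: int = 2,
) -> Draft:
    """Run the text stage, then the optional illustration stage."""
    draft = Draft(text=generate_text(prompt))
    # The illustration stage is skipped when no image model is available,
    # which mirrors the current POC state (text only).
    if generate_image is not None:
        for excerpt in pick_excerpts(draft.text, max_excerpts):
            draft.illustrations.append(generate_image(excerpt))
    return draft


if __name__ == "__main__":
    # Stand-ins for the real models; both are placeholders for illustration.
    def fake_text_model(prompt: str) -> str:
        return prompt + " quiet town. Nobody here remembers the storm that took the old lighthouse."

    def fake_image_model(excerpt: str) -> str:
        return f"<image for: {excerpt}>"

    draft = write_draft("This is a", fake_text_model, fake_image_model, max_excerpts=1)
    print(draft.text)
    print(draft.illustrations)
```

Passing the models in as callables keeps the pipeline independent of any particular framework, so a text-only run simply omits `generate_image`.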
## Data

##### Text Corpora

###### Main Data

[COCA](https://www.english-corpora.org/coca/)
>The Corpus of Contemporary American English is a genre-balanced corpus of American English and probably the most widely-used corpus of English.
>The corpus contains more than **one billion** words of text (20 million words each year 1990-2019) from eight genres: spoken, fiction, popular magazines, newspapers, academic texts, TV and Movies subtitles, blogs, and other web pages.

###### Auxiliary Data

[OpenWebTextCorpus](https://skylion007.github.io/OpenWebTextCorpus/)
>A reproduction of OpenAI’s WebText dataset that includes web pages, Facebook posts and Reddit posts.

[WordNet](https://wordnet.princeton.edu/)
>WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

[NovelTM](https://github.com/tedunderwood/noveltmmeta)
>Collection of 210,305 volumes of English fiction.

[FictionDB](https://www.fictiondb.com/)
>Online database with fiction titles from various genres.

[Project Gutenberg](http://www.gutenberg.org/wiki/Category:Bookshelf)
>Online database with book titles from various languages and genres.

[Scifi stories text corpus](https://www.kaggle.com/jannesklaas/scifi-stories-text-corpus)
>SciFi stories collected largely from the [Pulp Magazine Archive](https://archive.org/details/pulpmagazinearchive).

##### Image Datasets

###### Main Data

[COCO](http://cocodataset.org/)
>Common Objects in Context contains everyday scenes with 91 object categories.

###### Auxiliary Data

[ImageNet](http://www.image-net.org/)
>ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synset". Currently over 14,000,000 images and 21,000 synsets are indexed.

[DeepFashion](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html)
>Contains over **800,000** diverse fashion images ranging from well-posed shop images to unconstrained consumer photos.

[CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)
>Contains more than **200K** celebrity images, each with **40** attribute annotations.

[Landscape Pictures](https://www.kaggle.com/arnaud58/landscape-pictures)
>Contains 7 categories of Flickr landscape images.

## Related Work & Helpful Links

##### NLG

[01 - Article - GPT-2](https://openai.com/blog/gpt-2-1-5b-release/)
[02 - Code - 🤗 Transformers](https://github.com/huggingface/transformers)
[03 - Docs - 🤗 Transformers](https://huggingface.co/transformers/)
[04 - Paper - NLG with RL - Srinivasan et al. 11/2019](https://arxiv.org/pdf/1911.11404v1.pdf)

##### text2image

[01 - Paper - Generative Adversarial Text to Image Synthesis](https://arxiv.org/pdf/1605.05396.pdf)
[02 - Paper - StackGAN I](https://arxiv.org/pdf/1612.03242.pdf)
[03 - Paper - StackGAN II](https://arxiv.org/pdf/1710.10916.pdf)
[04 - Paper - MirrorGAN (text2image2text)](https://arxiv.org/pdf/1903.05854.pdf)
[05 - Paper - HDGAN Photographic Text-to-Image Synthesis](https://arxiv.org/pdf/1802.09178.pdf)
[06 - Paper - OPGAN Semantic Object Accuracy](https://arxiv.org/pdf/1910.13321v1.pdf)

[text2emotion - Paper - Emotional Machines](https://arxiv.org/pdf/1705.07543.pdf)
[Skip-thought Vectors - Paper](https://arxiv.org/pdf/1506.06726.pdf)

## Milestones

Id | Milestone | Explanation | Feasibility | Business Value | Status
---|---|---|---|---|---
1. | Text Generation | Create text from input text | Certain | Low | :heavy_check_mark:
2. | Illustration Generation | Create images from text | Medium | Low | :x:
3. | Custom Text Generation | Improve text generation (control, overall storytelling) | Low | High | :white_check_mark:
4. | Custom Illustration Generation | Adjust image generation for the use case (landscapes, fashion, faces) | Medium | Medium | :x:
5. | Improve WebApp | User management, content for secondary sites, improved application flow, simplicity, speed | High | Medium | :heavy_check_mark:

1. Text Generation
    - [x] Import GPT2
    - [x] Preprocess Input
    - [x] Generate text
2. Illustration Generation
    - [ ] Port StackGAN++ to TF
    - [ ] Import StackGAN++
    - [ ] Preprocess Input
    - [ ] Generate image
3. Custom Text Generation
    - [ ] Build RL model with GPT2 base
    - [x] Preprocess Custom Datasets
    - [x] Build Input Pipeline
    - [ ] Train & Evaluate new model
    - [x] Import new model
    - [x] Generate new text
    - [ ] Include Sliders for custom control
4. Custom Illustration Generation
    - [ ] Preprocess custom image datasets
    - [ ] Build Input Pipeline
    - [ ] Port OPGAN to TF
    - [ ] Build new model with OPGAN base and SOA
    - [ ] Train & Evaluate new model
    - [ ] Generate image
    - [ ] Embed image in Website text
5. Improve WebApp
    - [x] Create Database
    - [x] Create User Management and profiles
    - [ ] Create About Us Page
    - [ ] Create Help Page
    - [ ] Create Legal Pages
    - [ ] Create Jobs Page
    - [ ] Create API
    - [x] Improve Speed
    - [x] Improve Userflow

<iframe class="clickup-embed" src="https://share.clickup.com/c/h/4-3749169-5/7b6265ad34e0489" frameborder="0" onmousewheel="" width="100%" height="700px" style="background: transparent; border: 1px solid #ccc;"></iframe>

## Results

##### Text Generation 01

"This is a" :arrow_right: *see above*

"I want to write a Novel, but" :arrow_right: "*I want to write a Novel, but* only to my wife," says an artist. "My friend is pregnant, and I can't, for some reason or another, let him get married. I'm stuck. I love my wife very much, and would really love to give birth to her, but that's not going to happen. So, we go back to the drawing room and talk about what we will, if we do have children, do to them."

## Acknowledgements

Page-Icon made by <a href="https://www.flaticon.com/free-icon/book_2506510" title="srip">srip</a> from <a href="https://www.flaticon.com/" title="Flaticon"> www.flaticon.com</a>
Main-site Background by [Kaboompics.com](https://www.pexels.com/@kaboompics?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels) from [Pexels](https://www.pexels.com/photo/blank-paper-with-pen-and-coffee-cup-on-wood-table-6357/?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels)