
NaNoGenMo: how computers write stories

November is considered the month of literary creativity. Every year, the NaNoWriMo (National Novel Writing Month) event takes place online. Participants must write a novel of at least 50,000 words by the end of the month. Over its 17 years, more than 20,000 people have taken part.



In 2013, programmers got a similar competition of their own: NaNoGenMo (National Novel Generation Month). The goal of NaNoGenMo is to write a program that generates a novel of 50,000 words or more. The requirements for the "novel" are rather loose: any text of sufficient length will do. As you will see, it can be a storybook, a play, a cookbook, a dictionary, or a travel guide. In fact, the work does not even have to be text.



Graphic novel “Generated Detective”



Writing a program that generates a 50,000-word text is, in itself, easy. This code is enough:



print('word ' * 50000)  # the same word, repeated 50,000 times


On the other hand, such a story becomes boring by the third word. NaNoGenMo participants try to solve exactly this problem: they invent literary and technical devices that keep the reader's attention. That is much harder. In practice, if you can read even a few pages of the novel with interest, it counts as a success.



In this post I want to describe the techniques used to generate literature algorithmically and share the most interesting works of the past three years.



Markov chains



Markov chains are a classic way to generate text. The method is well described here. Markov chains have a problem, though: if you choose a large N for the N-grams (usually 5 or more), large chunks of the source corpus appear verbatim in the output; if you take a smaller N, the result is frankly meaningless.
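To make the trade-off concrete, here is a minimal word-level sketch (an illustration, not any participant's code). `order` is the number of preceding words used as context, so an N-gram model corresponds to order = N - 1. The function names and the toy corpus are hypothetical. With a large `order`, most contexts have only one possible follower, so the walk copies the source almost verbatim. With a small `order`, it wanders freely and loses meaning.

```python
import random
from collections import defaultdict


def build_chain(words, order=2):
    """Map each tuple of `order` consecutive words to the words that follow it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length, rng):
    """Walk the chain, jumping to a random state whenever it hits a dead end."""
    order = len(next(iter(chain)))
    out = list(rng.choice(list(chain)))
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if followers:
            out.append(rng.choice(followers))
        else:
            out.extend(rng.choice(list(chain)))  # dead end: restart elsewhere
    return " ".join(out[:length])


corpus = "the cat sat on the mat and the dog sat on the rug".split()
chain = build_chain(corpus, order=1)
print(generate(chain, 12, random.Random(42)))
```

On a real corpus, try `order=1` and `order=4` on the same text to see both failure modes described above.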



But this problem can be worked around: some genres successfully hide their meaninglessness from the reader. For example, the lines from the dialogues of Socrates and Aristotle by yourpalal are, at first glance, hard to tell apart from real philosophical reflections. Likewise, I would accept greg-kennedy's generated license agreement without hesitation if I saw it in the installer of a program I needed.



Other works based on Markov chains: erotic stories from Agrajag-Petunia, Reagan's speeches (mixed with Schopenhauer's works) from VincentToups, and an autobiography in 19th-century English from lizrush.



Template expansion



Most often, meaningful text fragments are generated using templates. Imagine that we have the following grammar:



sentence = '<greeting>, <world_phrase>!'
greeting = ['Hello', 'Good afternoon']
world_phrase = ['<happy_adj> world', '<sad_adj> world', 'world']
happy_adj = ['beautiful']
sad_adj = ['dull']


From it we can generate many different sentences, such as “Hello, dull world!” or “Good afternoon, beautiful world!”. The richer the grammar, the more interesting the texts it produces.
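A minimal expander for this kind of grammar might look like the sketch below. It assumes the grammar is stored as a Python dict and contains only the words from the two example sentences. The name `expand` is hypothetical. Each `<symbol>` is replaced by a randomly chosen alternative. Because alternatives may contain further symbols, expansion is naturally recursive.

```python
import random
import re

GRAMMAR = {
    "sentence": ["<greeting>, <world_phrase>!"],
    "greeting": ["Hello", "Good afternoon"],
    "world_phrase": ["<happy_adj> world", "<sad_adj> world", "world"],
    "happy_adj": ["beautiful"],
    "sad_adj": ["dull"],
}


def expand(symbol, rng):
    """Pick one alternative for `symbol` and recursively expand every <name> in it."""
    template = rng.choice(GRAMMAR[symbol])
    return re.sub(r"<(\w+)>", lambda m: expand(m.group(1), rng), template)


rng = random.Random(0)
for _ in range(3):
    print(expand("sentence", rng))
```

This small grammar allows only six sentences. Each alternative added to a list multiplies that number.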



In The Gamebook of Dungeon Tropes by maetl, dungeon descriptions are generated using templates.



“Atheists Who Believe in God” by tra38 uses data from a US census. The heroes of the novel are people who said they were atheists but believe in God. One by one, they give speeches made up of their answers to the census questions.



Templates are also used in the novel “Something Something” by BenKybartas, the collection of 5000 stories by tinyworlds, the erotic novel Orgasmotron by enkiv2 (sorry, there will be no more erotica), and the sports commentary of the Creade Federation of False American Football.



Recursion



Templates produce small pieces of readable text. But under the competition rules, the novel must be at least 50,000 words long. Recursion helps increase the volume gracefully.



The simplest option: “The transorbital anaphase provine bif the pure-bred synostosis” by samcoppini takes a single sentence (it appears in the title) and uses a dictionary to define some of its words. The words in those definitions get definitions of their own, and so on until fifty thousand words have been typed.
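The idea can be sketched as follows, assuming a toy dictionary. The entries and the function name are hypothetical, and a real dictionary would be far larger. The sketch uses a breadth-first loop instead of literal recursion. It also skips words it has already defined, because dictionary definitions tend to be circular.

```python
from collections import deque

# Hypothetical toy dictionary
DEFINITIONS = {
    "cat": "a small animal that hunts mice",
    "animal": "a living creature",
    "mice": "small animals",
    "creature": "a living being",
    "living": "having life",
}


def expand_definitions(sentence, target_words):
    """Define words from the sentence, then words from those definitions, and so on."""
    lines = [sentence]
    count = len(sentence.split())
    queue = deque(sentence.split())
    seen = set()
    while queue and count < target_words:
        word = queue.popleft().strip(".,;:!?").lower()
        if word in seen or word not in DEFINITIONS:
            continue
        seen.add(word)
        definition = DEFINITIONS[word]
        line = f"{word.capitalize()}: {definition}."
        lines.append(line)
        count += len(line.split())
        queue.extend(definition.split())  # new words to define later
    return "\n".join(lines)


print(expand_definitions("The cat sleeps.", 1000))
```

In a real entry, `target_words` would be 50,000. The loop stops either at that count or when nothing is left to define.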



MichaelPaulukonis's “Heartless Giant” tells a variation on the Norse tale of a prince who saves his brothers from an evil giant. But in the end, the prince himself becomes a giant and kidnaps the children of another king. Then the whole story repeats, in a slightly different way.



The novel "Hopes and Memories" by cpressey consists of several events and a dialogue between two characters. Over the course of the story, they meet a vampire, a zombie, a dragon, and other monsters. The rest of the time, they recall their encounters and guess whom they will meet next. Then they remember how they remembered some encounter, wonder whether they will recall another one, and so on.



“Redwreath and Goldstar Have Traveled to Deathsgate” by erkyrath is similar in structure. There, too, two characters talk. They are very polite: they ask “May I ask a question?” and “Do I understand correctly that ...”. This goes on until, at some point, the dialogue arrives at a line like this:



"I can answer that?"

“One Hundred and Sixty Five Days of Christmas” by hugovk continues the English folk song “The Twelve Days of Christmas”, in which new gifts keep being given day after day (and their number keeps growing).
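A sketch makes the growth easy to see. Only the first five gifts below are taken from the song. Later days use a hypothetical placeholder, and the day headings use numerals to keep the code short. Because day d repeats every earlier gift, the text grows roughly quadratically with the number of days.

```python
GIFTS = [
    "a partridge in a pear tree",
    "two turtle doves",
    "three French hens",
    "four calling birds",
    "five gold rings",
]


def gift(k):
    """The k-th gift; beyond the known ones, a hypothetical placeholder."""
    return GIFTS[k - 1] if k <= len(GIFTS) else f"{k} hypothetical gifts"


def verse(day):
    """Cumulative verse: the newest gift first, then all earlier ones."""
    lines = [f"On day {day} of Christmas my true love gave to me:"]
    lines += [gift(k) for k in range(day, 0, -1)]
    return "\n".join(lines)


def song(days):
    return "\n\n".join(verse(d) for d in range(1, days + 1))


def total_gifts(days):
    """Day d brings 1 + 2 + ... + d items."""
    return sum(d * (d + 1) // 2 for d in range(1, days + 1))


print(verse(3))
print(total_gifts(12))
```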



Recycling existing works



If a literary work is in the public domain, it can be used in any way, including as material for a new, generated work. The most interesting examples:



Moby Dick in cat language by hugovk. This is one of the best-known NaNoGenMo works; The Guardian, Vox, and The Atlantic wrote about it. Here is a quote:



Meow me meeeeow. Meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow

Compare with the original:



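Call me Ishmael. Some years ago--never mind how long precisely--having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.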
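The exact rules hugovk used are not described here. A length-preserving substitution is one plausible way to get this effect. In the sketch below, which uses a hypothetical rule and is not the original code, every word becomes a "meow" of the same length, with the "e" stretched for longer words. Sentence starts are capitalized. With these rules, the sketch reproduces the quoted opening "Meow me meeeeow." exactly.

```python
import re

SHORT = {1: "m", 2: "me", 3: "mew"}  # hypothetical forms for short words


def meow_word(word):
    """Return a 'meow' with the same number of letters as the word."""
    n = len(word)
    return SHORT.get(n) or "m" + "e" * (n - 3) + "ow"


def meowify(text):
    out = re.sub(r"[A-Za-z']+", lambda m: meow_word(m.group()), text)
    # Capitalize the start of the text and of every new sentence
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), out)


print(meowify("Call me Ishmael."))  # Meow me meeeeow.
```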

"The Adventures of Charlotte Holmes" by emdaniels. The author swapped the gender of every character in the Sherlock Holmes stories. It does not sound very serious, but it is actually a tricky linguistic task, and the author did not fully manage it (so I had to invent the pronoun "herr").



To Charlotte Holmes he is always the man. I have seldom heard of his mentioning him under any other name. In herr eyes he eclipses and predominates the whole of him sex. Ankin for love for Ivan Adler.
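A naive sketch shows why the task is harder than it looks. It uses a word-by-word swap table, which is hypothetical and far from complete. The weak spot is "her", which can correspond to either "his" or "him". Any single choice is wrong somewhere, and the "him sex" in the excerpt above is an error of this kind.

```python
import re

# Hypothetical, deliberately incomplete swap table
SWAP = {
    "he": "she", "she": "he",
    "him": "her", "his": "her",
    "her": "his",  # ambiguous: "her" may need to become "his" or "him"
    "himself": "herself", "herself": "himself",
    "man": "woman", "woman": "man",
}


def swap_gender(text):
    def replace(match):
        word = match.group()
        swapped = SWAP.get(word.lower())
        if swapped is None:
            return word
        # Keep the capitalization of the original word
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"[A-Za-z]+", replace, text)


print(swap_gender("To Sherlock Holmes she is always the woman."))
print(swap_gender("He saw her and his hat."))  # "saw his" should be "saw him"
```

Names such as Sherlock would need a separate mapping. A proper fix for "her" needs part-of-speech information rather than a lookup table.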

In Homer Ultra Violence, Lilinx took all the sentences from the Iliad that contain some form of the word “hit”. The result describes a long, brutal slaughter in which it is impossible to tell who is fighting whom.
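The filtering step itself is small. The sketch below assumes a naive sentence splitter and a hand-picked list of forms (hit, hits, hitting). Both are assumptions and do not describe Lilinx's code.

```python
import re

# Hypothetical list of word forms; matching is case-insensitive
HIT = re.compile(r"\b(hit|hits|hitting)\b", re.IGNORECASE)


def violent_sentences(text):
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if HIT.search(s)]


sample = "Hector hit the shield. Achilles rested. The spear hits hard!"
print(violent_sentences(sample))
```

The word boundaries `\b` keep words like "white" or "whither" out of the result.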



“Moby Dick, or [emoji]” by pteichman is another version of Moby Dick, in which some words are replaced with the corresponding emoji.



Other works: MrDrews's "The Adventures of Tom Sawyer" with characters from the Conan novels; dkurth's "Hamlet", run through several rounds of machine translation; michelleful's "Pride and Prejudice" with vocabulary from Twitter; and jonkagstrom's version of Kafka's "The Metamorphosis", in which words are replaced with more abstract ones.



You do not have to use just one work. For example, hugovk made a collection of short stories consisting of six words each.



And in “Our Arrival” (PDF) by aparrish, sentences describing natural objects or phenomena are selected from the Project Gutenberg corpus. Together, they make up the diary of an expedition. It turned out very nicely; this story is in my personal top list of NaNoGenMo works.



Recycling other data



The source does not have to be a literary work. moonmilk made a novel from tweets by NaNoWriMo participants (PDF) (the competition where people write their own novels), and jimkinsey made one from the questions scientists ask in paper abstracts (PDF):



What is this situation situation? Should there be any public rights law? How many students study abroad? In addition, it can be said that Who are user entrepreneurs?

The news reports generated by enkiv2 for NaNoGenMo are hard to tell apart from real articles.



In samcoppini's “Dictionary of the D'skuban language”, words of a computer-invented language are defined using real vocabulary.



The novel "The Finder" (PDF) by thricedotted is about a computer that learns to do everything people do, with the help of the wikiHow website.



Simulation



A good story needs a plot and development. One way to achieve this is simulation. The author of the program creates a world and describes the rules by which it works, and the program then simply lists the events that happen in that world. If the events are interesting, the plot is interesting too. It resembles a computer game, except that you do not play it; you read its log.



The heroes of mattfister's series of fantasy stories travel to different places. When they run out of food, they go hunting or fishing, and when they are tired, they set up camp. Sometimes they meet enemies and have to fight.
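The toy simulation below shows how a plain log of events already reads like a story. Its rules and numbers are hypothetical: one point of food and energy is spent per day, and there is a 30% chance of meeting an enemy. It is not mattfister's code.

```python
import random
from dataclasses import dataclass


@dataclass
class Hero:
    name: str
    food: int = 3    # hypothetical starting supplies
    energy: int = 3


def simulate(hero, places, days, rng):
    """Run the world for `days` steps and return the event log."""
    log = []
    for day in range(1, days + 1):
        log.append(f"Day {day}: {hero.name} travels to {rng.choice(places)}.")
        hero.food -= 1
        hero.energy -= 1
        if rng.random() < 0.3:  # hypothetical encounter rate
            log.append(f"An enemy blocks the way, and {hero.name} has to fight.")
            hero.energy -= 1
        if hero.food <= 0:
            log.append(f"Out of food, {hero.name} goes {rng.choice(['hunting', 'fishing'])}.")
            hero.food = 3
        if hero.energy <= 0:
            log.append(f"Tired, {hero.name} sets up camp.")
            hero.energy = 3
    return log


for line in simulate(Hero("Aria"), ["a dark forest", "a river", "the hills"], 5, random.Random(1)):
    print(line)
```

Richer rules, such as weather, wounds, or companions, produce more varied logs without any change to the "narration" code.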



Similarly, the characters in flexo's novella roam an arena and fight whenever they meet.



Battles are not the only thing that gets simulated. In “Evening of a Rainy Day” by cpressey, Alice and Bob play cards. In the novel by nothings, Hannah solves the Tower of Hanoi puzzle (twelve disks were enough for 50,000 words). In amarriner's “Flora and Fauna”, a botanist looks for a way out of a maze, meeting animals and plants along the way.
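The Tower of Hanoi fits well because its recursive solution for n disks takes exactly 2^n - 1 moves, so twelve disks give 4095 of them. Below is a sketch that writes one sentence per move. The sentence wording is hypothetical and not taken from the novel.

```python
def hanoi(n, source, target, spare, log):
    """Append one sentence per move needed to shift n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, log)
    log.append(f"Hannah carefully lifts disk {n} from the {source} peg "
               f"and places it on the {target} peg.")
    hanoi(n - 1, spare, target, source, log)


moves = []
hanoi(12, "left", "right", "middle", moves)
print(len(moves), sum(len(m.split()) for m in moves))  # 4095 65520
```

At 16 words per sentence the log passes 50,000 words. Even a 12-word sentence would give about 49,000.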



Sometimes authors pretend to travel the real world. In “Around the World in X Wikipedia Articles” by kevandotorg, Phileas Fogg and Passepartout go on a world tour, reciting facts about the places they visit. greg-kennedy's “Eliza's Book” lays out a detailed route of Moses's forty-year journey through the desert (complete with a map!).



High-level plot generation



Another approach to plotting is to lay out the main plot points first and then expand them into detailed text. cpressey implemented this in “Fate's Time” and described in a post how its plotting works. The story compiler starts with a simple sequence:



 [, *, ] 


In place of an asterisk, the compiler can insert any event, together with its beginning and end:



 [, [, *, , *], ] 


When there are enough events, all the asterisks can be removed from the sequence, and the events themselves can be divided into smaller ones:



  = [, ]  = [, ]  = [, ]  = [, ] 


After that, all that remains is to turn each event into its final description in the novel.
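Below is a rough sketch of this grow-then-refine idea. It is not cpressey's compiler. The event pairs and refinements are hypothetical, and the nested lists are flattened into one list, which preserves the order of events.

```python
import random

# Hypothetical events, each with an opening and a closing beat
EVENT_PAIRS = [
    ("a stranger arrives in town", "the stranger leaves town"),
    ("a storm begins", "the storm ends"),
    ("the heroes quarrel", "the heroes make peace"),
]

# Hypothetical refinements of some beats into smaller ones
REFINEMENTS = {
    "a stranger arrives in town": ["a coach stops at the inn", "a stranger steps out"],
    "the storm ends": ["the rain stops", "the sun comes out"],
}


def grow(steps, rng):
    """Start from [begin, *, end] and repeatedly replace an asterisk with an event."""
    plot = ["the story begins", "*", "the story ends"]
    for _ in range(steps):
        slots = [i for i, beat in enumerate(plot) if beat == "*"]
        i = rng.choice(slots)
        start, end = rng.choice(EVENT_PAIRS)
        # Same shape as [start, *, end, *]: new asterisks leave room for nesting
        plot[i:i + 1] = [start, "*", end, "*"]
    return plot


def compile_plot(steps, rng):
    beats = [beat for beat in grow(steps, rng) if beat != "*"]
    # Split each beat into smaller ones where a refinement exists
    return [sub for beat in beats for sub in REFINEMENTS.get(beat, [beat])]


for beat in compile_plot(3, random.Random(7)):
    print(beat)
```

Turning each remaining beat into prose, for example with templates, is the final step described above.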



Other formats



Sometimes authors depart from the story format altogether. I have already mentioned several such works, but here is another:





Graphic Works



Authors are not always limited to text. For “No People”, doldrumorchids takes pictures from Google Street View panoramas, recognizes objects in them, and then inserts descriptions of similar objects taken from Project Gutenberg works.



“Seraphims” by lizadaly is a mysterious manuscript with pictures, written in an unknown language (with symbols from the Voynich manuscript).



“Something, thanksgiving and nothing” by zachwhalen and “Generated Detective” by atduskgreg are graphic novels (comics). The first has no text; the second does. Both authors stylized their pictures, and the results are atmospheric.



Neural networks



Now for the disappointing part. Neural networks have recently learned a lot: winning at Go, applying filters to photos, sorting cucumbers. Text generation, however, does not seem to be working out for them yet. In 2015, there were several works written by neural networks (a reimagining of Lovecraft from R-Gerard, a reimagining of Jules Verne from estayton, and something graphic from spikelynch). None of them impressed me. I think this means neural networks still have everything ahead of them, and in the future we will see something more meaningful from them.



What's next



Only a few days remain before the new NaNoGenMo starts. Here is its repository. If you want to participate, create an issue there with a title like “intent to participate”. In that issue you can discuss your ideas with other participants, and when you finish, you post links to your code and the generated novel there. Good luck to everyone who takes part!



I would like to try something similar in Russian, but doing it within NaNoGenMo, where everyone speaks English, does not seem quite appropriate to me. If you would like to do the same, write in the comments to this post or send me a private message. I think we could get together and create a separate branch of the competition for the Russian language.



Links



  1. Repositories of previous years: 2013, 2014, 2015.
  2. A series of articles from which I learned about NaNoGenMo: 1, 2, 3, 4. I used some examples from them in this post.






Source: https://habr.com/ru/post/313862/


