
A template-based text generator, Hen Ryaba, and Star Wars

Is it possible, at the current level of computing technology, to solve the problem of generating literary text? It seems possible to me, at least at the level of a theoretical algorithmic description. And what do Hen Ryaba and Star Wars have to do with it?

A little backstory:


I am not an expert in machine intelligence; I have just been interested in AI problems for a long time. I once read an interesting Asimov story about a boy who tried to reprogram his robot and what came of it, and a vague idea came to me. The idea of the algorithm grew in detail and lived somewhere in my head; I even thought of making a profit from it, but then I realized that the idea alone was not enough. So I decided to share the idea with smart people: if generating literary text is possible, then we will all profit, or so it seems to me.

Now, to the essence:


Our imaginary text generation system (SIGET) is built on an expert-modular principle:
1. A set of generation templates
2. Literary corpus:
2.1. Synonym dictionary
2.2. Text base
3. Graph database ("tree of knowledge")
4. Path-length estimation module
5. Sentence generator
Let us describe these points in more detail:

1. A set of generation templates

It all started with the simple idea that since a computer does not know how to think like a person, we should not force it to. We give it a ready-made template, invented by and comprehensible to a human, from which it will produce an original text by its own computer methods.
I will try to illustrate the general principle of creating a template using the fairy tale about Ryaba the Hen as an example:
Source text | Pattern
Lived once | %initial_state
Grandfather and woman | %Mg1 (male, humanoid), %Rh1 (female, humanoid)
And they had | %relation_to
Ryaba the Hen | %LJ1 (female, animal)
She laid an egg | %LJ1 %produces %P01 (object 01)
But not a simple one, a golden one | %P01 differs from an array of arbitrary objects of type %P01 by the characteristic %X01.P01, which is not inherent in %P01 objects
Grandfather beat it, did not break it | %Mg1 %impact on %P01 using method %M01. Result of the %impact is 0
Baba beat it, beat it, did not break it | %Rh1 %impact on %P01 using method %M01. Result of the %impact is 0
A mouse ran by | %LJ2 %any_verb
Waved its tail, the egg fell and broke | %impact on %P01 using method %M02. Result of the %impact equals %destruction of %P01
Grandfather cries, the woman cries | %end_state

I estimate that generating text of arbitrary volume would not require all that many such templates.
The input and output states are needed to "merge" several templates into one text.
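To make this more concrete, here is a minimal sketch in Python of how such templates might be represented and merged by matching states. Everything here (the Template class, the state labels, the merge function) is a hypothetical illustration, not part of the original design:

```python
# A minimal sketch of one possible template representation; all names here
# are hypothetical, invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Template:
    name: str
    input_state: str    # the state a story must be in for this template to apply
    output_state: str   # the state the story is left in afterwards
    lines: list = field(default_factory=list)  # pattern lines with %placeholders

ryaba = Template(
    name="ryaba",
    input_state="initial_state",
    output_state="end_state",
    lines=[
        "Lived once %Mg1 and %Rh1, and they had %LJ1",
        "%LJ1 %produces %P01",
        "%Mg1 %impact on %P01 using method %M01, result 0",
        "%impact on %P01 using method %M02, result %destruction of %P01",
    ],
)

def merge(templates):
    """Glue several templates into one text where output and input states match."""
    for prev, nxt in zip(templates, templates[1:]):
        if prev.output_state != nxt.input_state:
            raise ValueError(f"{prev.name} does not lead into {nxt.name}")
    return [line for t in templates for line in t.lines]
```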
Naturally, templates alone are not enough to get the text, so on to step two:

2. Literary corpus

Naturally, we will need a base of "computer-friendly" texts for many reasons: to search for lexical sequences and to evaluate sentences in the expert system. The synonym dictionary is needed to diversify the computer's vocabulary.
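For illustration, the synonym dictionary could be as simple as a word-to-alternatives map; the entries below are invented for the example:

```python
# Sketch: diversifying vocabulary with a synonym dictionary.
# The dictionary contents are illustrative, not taken from a real corpus.
import random

SYNONYMS = {
    "beat": ["struck", "hit", "pounded"],
    "broke": ["shattered", "cracked"],
}

def diversify(sentence: str) -> str:
    """Replace known words with a randomly chosen synonym to vary the output."""
    return " ".join(random.choice(SYNONYMS.get(word, [word]))
                    for word in sentence.split())

print(diversify("grandfather beat the egg but it never broke"))
```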

3. Graph database ("tree of knowledge")

The database is needed to build the input and output constructs for the templates.
It consists of objects linked to their inherent states, with each link carrying its own weight. For example, the link "egg" - "white" might have a weight of, say, 1 (because eggs are usually white or yellowish), while "egg" - "golden" might have a weight of, say, 3 (since golden eggs usually exist only figuratively). For our fairy-tale template, the key points are that %LJ1 %produces %P01 (object 01), and that %P01 differs from an array of arbitrary objects of type %P01 by the characteristic %X01.P01, which is not inherent in %P01 objects (i.e., right in the template we indicate that "egg" - "white" does not suit us). In our case, %LJ1 = chicken. The list of outgoing states of the object "chicken" might partially look like this: {shed a feather, laid an egg, hatched chicks}, from which the tale's composer long ago picked item 2. But when generating text it may not be a chicken at all but, say, a "fish", which will have a completely different list of outgoing states, and the computer can choose any of them.
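A sketch of how such a weighted graph could be stored and queried in Python; the weights follow the egg example above, while the fish entries and the function names are invented for illustration:

```python
# Sketch of the "tree of knowledge": objects linked to states, each link
# carrying a weight (1 = ordinary association, higher = more unusual).
KNOWLEDGE = {
    ("egg", "white"): 1,              # eggs are usually white or yellowish
    ("egg", "golden"): 3,             # golden eggs exist mostly figuratively
    ("chicken", "shed a feather"): 1,
    ("chicken", "laid an egg"): 1,
    ("chicken", "hatched chicks"): 1,
    ("fish", "laid roe"): 1,          # a different object, different states
    ("fish", "splashed its tail"): 1,
}

def outgoing_states(obj):
    """All states linked to an object, most ordinary (lowest weight) first."""
    found = sorted((w, s) for (o, s), w in KNOWLEDGE.items() if o == obj)
    return [s for _, s in found]

def unusual_states(obj, min_weight=2):
    """States atypical for the object -- what the template's 'not inherent
    in %P01 objects' clause asks for."""
    return [s for (o, s), w in KNOWLEDGE.items() if o == obj and w >= min_weight]

print(outgoing_states("chicken"))  # the generator may pick any of these
print(unusual_states("egg"))       # ['golden']
```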

4. Path-length estimation module

Since we have objects linked to states by weighted connections, a shortest-path search algorithm will help build sentences that are more or less adequate to our perception, without phrases like "toads fly across the sky".
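For illustration, a standard Dijkstra search over such a weighted graph could serve as the estimator: concepts joined by a short, light path are plausible together, while unreachable or expensive pairs are rejected. The tiny graph below is invented for the example:

```python
import heapq

# Illustrative concept graph; edge weight 1 = a common association.
GRAPH = {
    "toad":  {"swamp": 1, "jumps": 1},
    "swamp": {"toad": 1, "water": 1},
    "bird":  {"flies": 1, "sky": 1},
    "sky":   {"bird": 1, "flies": 1},
    "flies": {"bird": 1, "sky": 1},
    "jumps": {"toad": 1},
    "water": {"swamp": 1},
}

def path_cost(start, goal):
    """Total weight of the cheapest path; infinity if the concepts are unrelated."""
    dist = {start: 0}
    queue = [(0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for neighbour, weight in GRAPH.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return float("inf")

print(path_cost("bird", "sky"))  # cheap: a plausible combination
print(path_cost("toad", "sky"))  # infinite: reject "toads fly across the sky"
```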

5. Sentence generator

Markov chains are effective when the nodes are well chosen. By this point we already have rigidly defined key nodes, around which any auxiliary words can be generated. You can use Markov chains or something else; the main thing is that it builds on our "skeleton".
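A sketch of the idea, assuming a word-level Markov chain and a hypothetical bridge function that random-walks from one fixed key node toward the next; the corpus line is invented for the example:

```python
import random
from collections import defaultdict

def build_chain(corpus):
    """Map each word to the words observed to follow it."""
    chain = defaultdict(list)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def bridge(chain, start, goal, max_len=8):
    """Walk the chain from one key node toward the next; give up after max_len."""
    path, word = [start], start
    for _ in range(max_len):
        options = chain.get(word)
        if not options:
            break
        word = random.choice(options)
        path.append(word)
        if word == goal:
            return path
    return None  # no luck this time: retry, or join the key nodes directly

corpus = "the grandfather beat the egg and the egg fell and broke"
chain = build_chain(corpus)
print(bridge(chain, "grandfather", "egg"))
```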

So: the program takes a given premise (theme, direction), selects data within that framework, builds a chain of templates matched by their input and output states, assigns values to the variables, generates the text, and outputs the result.

A bit of literary fantasy about what this might look like:


SIGET version 0.x.x is running.
Select the desired subject from the drop-down list:
Fantasy - Star Wars
Conditional volume (from 1 to 10): 1

Please wait, generating...

RESULT:
Long ago, in a galaxy far, far away, there lived Luke Skywalker and Princess Leia, and they had a robot, C-3PO. One day C-3PO came upon a source of the Force, but not a simple one, an incomprehensible one. Luke Skywalker tried to figure it out and could not; Princess Leia tried to figure it out and could not. Yoda passed by, waved his ear, and the source of the Force vanished.
Luke Skywalker was upset and Princess Leia was upset, but Yoda says:
"Grieve not, young padawan; the Force I see in you."
THE END.


And finally, here are links to various interesting things found while writing this:


The "Pushkinist" generator: pers.narod.ru/php/php_vartext.html
Interesting because it generates verses based on the lines of "I remember a wonderful moment", sometimes very interesting ones.
"Geniot", a generator of syntactically correct Russian text: pers.narod.ru/php/geniot.html
There is no demo directly on the site, but there is a sample of its output; no worse than "Yandex.Referats", and maybe even better.

References:

Theory:

I was prompted to write this by the article "Author Writer" (habrahabr.ru/post/161311) and by reading Isaac Asimov's story "Someday" (lib.ru/FOUNDATION/r_kogda_nibud_.txt_with-big-pictures.html).
Dependency grammar
ru.wikipedia.org/wiki/Грамматика_зависимостей
Prolog
ru.wikipedia.org/wiki/Пролог_(язык_программирования)
Knowledge representation
ru.wikipedia.org/wiki/Представление_знаний

Ready Generators:

Number one, based on Markov chains: www.manhunter.ru/webmaster/358_generator_teksta_na_osnove_cepey_markova.html
Number two, on the same principle: code.google.com/p/pymarkov
Number three, template-based: forumseo.org/showthread.php?t=1608 (there are actually thousands of these)

Source: https://habr.com/ru/post/163727/

