
Perl 6 and Markov sequences

Let's look at a non-numeric sequence built by applying Markov chains to text. Each new character in the sequence is chosen at random based on the previous two, and the distribution follows the patterns found in the source text.

    use v6;
    use List::Utils;

    my $model-text = $*IN.slurp.lc;
    $model-text .=subst(/<[_']>/, "", :global);
    $model-text .=subst(/<-alpha>+/, " ", :global);

    my %next-step;
    for sliding-window($model-text.comb, 3) -> $a, $b, $c {
        %next-step{$a ~ $b}{$c}++;
    }

    my $first = $model-text.substr(0, 1);
    my $second = $model-text.substr(1, 1);
    my @chain := $first, $second, -> $a, $b { %next-step{$a ~ $b}.roll.key } ... *;
    say @chain.munch(80);


After the initial declarations, the code falls into three clearly visible parts.

The first part reads in the model text and strips out non-alphabetic characters. Line 4 uses slurp to read standard input ($*IN) into a single string variable, and lc converts everything to lower case. The first subst deletes all underscores and apostrophes. The second replaces every run of non-alphabetic characters with a single space.
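The cleanup step is easy to try in isolation. Below is a rough sketch in Python rather than Perl 6, so it is an analogue and not the article's code. The name `clean_text` is made up for illustration, and the character class `[\W\d_]` is only an approximation of Perl 6's `<-alpha>`.

```python
import re

def clean_text(raw: str) -> str:
    """Lower-case the text and keep only letters, separated by single spaces."""
    text = raw.lower()
    # Drop underscores and apostrophes outright, so "don't" becomes "dont".
    text = re.sub(r"[_']", "", text)
    # Collapse each run of non-letters (punctuation, digits, whitespace) into one space.
    text = re.sub(r"[\W\d_]+", " ", text)
    return text

print(repr(clean_text("Don't stop -- it's 4 o'clock_now!")))
# 'dont stop its oclocknow '
```

Deleting apostrophes before the second substitution keeps contractions together as one word instead of splitting them at the apostrophe.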
The second part uses the sliding-window function from List::Utils and a bit of Perl magic.

$model-text.comb splits the text into individual characters.

sliding-window walks through the list like a sliding window and yields N elements (here, 3) starting at each element of the list in turn. That is, first you get the 1st, 2nd and 3rd elements, then the 2nd, 3rd and 4th, and so on.
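To make the windowing concrete, here is a small Python sketch of the same idea. It is not the List::Utils implementation. In particular, it assumes that only full windows are produced, which the article does not spell out.

```python
from typing import Iterator, Sequence, Tuple

def sliding_window(items: Sequence[str], size: int) -> Iterator[Tuple[str, ...]]:
    """Yield every run of `size` consecutive items, advancing one item at a time."""
    for start in range(len(items) - size + 1):
        yield tuple(items[start:start + size])

chars = list("hello")  # rough Python counterpart of .comb
for window in sliding_window(chars, 3):
    print(window)
# ('h', 'e', 'l')
# ('e', 'l', 'l')
# ('l', 'l', 'o')
```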

In the loop, we build a hash of hashes. The outer keys are the first two of every three consecutive characters. The inner key is the third character, and its value is the number of times that character follows the first two. For example, if we feed the program the lyrics of Aqualung, %next-step{"qu"} ends up looking like this:

 {"a" => 5, "e" => 2} 


This is what we get if, in the text, “qu” is followed by “a” five times and by “e” twice.
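The counting loop can be sketched in Python as follows. The function name `build_model` and the sample sentence are invented for illustration. The sentence is chosen so that the “qu” entry matches the counts shown above.

```python
from collections import Counter, defaultdict

def build_model(text: str) -> dict:
    """Map each two-character prefix to a Counter of the characters that follow it."""
    next_step = defaultdict(Counter)
    for i in range(len(text) - 2):
        prefix, following = text[i:i + 2], text[i + 2]
        next_step[prefix][following] += 1
    return next_step

model = build_model("quack quest quake queen quart quail quay")
print(dict(model["qu"]))
# {'a': 5, 'e': 2}
```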

The third part of the code uses this data to build the sequence. We take the first two characters of the text, so we know which characters can follow them. Then we create a sequence that starts with these two characters and uses -> $a, $b { %next-step{$a ~ $b}.roll.key } as its generator. The generator uses the previous two characters to look up the frequency hash for the third. The roll method returns one random hash key, according to its weight. In the “qu” example, you can picture rolling a seven-sided die with “a” on five faces and “e” on two. If it is not known which character follows the previous two (for example, because that pair appears only at the very end of the text), an undefined value is returned and the sequence stops.
We take the first 80 characters of the sequence with the munch method.
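The weighted roll and the lazy sequence can be approximated in Python with `random.choices` and a generator. This is a sketch under assumptions, not a port of the Perl 6 code: the names `roll` and `chain` and the toy table are invented here, and the generator simply ends when a prefix has no recorded successor.

```python
import random
from collections import Counter
from itertools import islice
from typing import Iterator

def roll(counts: Counter, rng: random.Random) -> str:
    """Pick one key at random, with probability proportional to its count."""
    keys = list(counts)
    return rng.choices(keys, weights=[counts[k] for k in keys], k=1)[0]

def chain(first: str, second: str, next_step: dict, rng: random.Random) -> Iterator[str]:
    """Lazily yield characters until the last two have no known successor."""
    a, b = first, second
    yield a
    yield b
    while (a + b) in next_step:
        # The right-hand side is evaluated with the old a and b.
        a, b = b, roll(next_step[a + b], rng)
        yield b

# Hand-written toy table; a real one would come from the counting loop.
toy = {"qu": Counter(a=5, e=2), "ua": Counter(c=1), "ue": Counter(s=1)}
rng = random.Random(0)  # fixed seed only so that repeated runs agree
print("".join(islice(chain("q", "u", toy, rng), 80)))
```

With this toy table the output is either “quac” or “ques”, with “quac” being five times as likely as “ques”. `islice` plays the role of taking the first 80 characters and simply yields fewer when the chain ends early.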

Running the script on the lyrics of Aqualung, we get sequences like
“Talalainthan the hell you’ve been singing the pinnest to the laboonfeet” and “the steall gets the creature of the crokin whymn the gook sh an arlieves grac”.

The program has no fixed set of characters that it must work with. Anything Perl 6 recognizes as a character will be processed. Feed it the standard file “Land der Berge”, which p6eval uses as stdin, and you get strings like “laß in ber bist brüften las schören zeites öst froher land der äckerzeichöne lan”.
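The same holds for the Python analogue of the counting step, since Python strings are sequences of Unicode code points. The sample string below is invented for illustration and assumes precomposed characters such as “ö”.

```python
from collections import Counter, defaultdict

text = "schöne äcker schöne länder"  # invented sample, not the p6eval file
next_step = defaultdict(Counter)
for i in range(len(text) - 2):
    # Non-ASCII letters are ordinary characters and ordinary keys here.
    next_step[text[i:i + 2]][text[i + 2]] += 1

print(dict(next_step["ch"]))
# {'ö': 2}
print(dict(next_step["hö"]))
# {'n': 2}
```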

Source: https://habr.com/ru/post/253917/

