An example of using Perl regular expressions for fast text processing

A very useful recipe that makes working with the command line easier is described here, but I could not try it live because there is no Go compiler for my system (OpenIndiana). So the idea arose to rewrite that program in a more universal language, one that is sure to exist on any platform: Perl.

With the resulting code I would like to show how a couple of lines of regular expressions are enough to perform a fast and efficient search.

Implementation


First, let's convert the hints passed in into part of the future regular expression:

$hints =~ s/(.)/$1\.\*\?/g;

Here $hints is a string concatenated from all the hints, for example 'abcd'.
The search pattern (.) matches a single character (every character in turn, thanks to the 'g' flag). Each character is replaced with itself ($1 is the value captured by the first parentheses of the search pattern) followed by the block '.*?', which means: any character ('.'), zero or more times ('*'), plus a marker ('?') that makes the quantifier non-greedy (more on that below).

The resulting string is: 'a.*?b.*?c.*?d.*?'
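To check the conversion in isolation, the substitution can be wrapped in a small function. This is only a minimal sketch: the name hints_to_regex is hypothetical and does not come from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): appends the
# non-greedy block '.*?' to every character of the hint string.
sub hints_to_regex {
    my ($hints) = @_;
    $hints =~ s/(.)/$1\.\*\?/g;   # the same substitution as in the article
    return $hints;
}

print hints_to_regex('abcd'), "\n";   # prints a.*?b.*?c.*?d.*?
```

Note that the hint characters are inserted into the pattern as they are, so a hint such as '.' or '+' would act as a regexp metacharacter. If that matters, escaping each character with Perl's built-in quotemeta before adding the block is one possible option.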

We now move on to the main part, which compares a path from the list of "familiar" folders with the hints. The condition is:

if ($path =~ /^(.*)($hints)$/)

Here the symbol '^' is the "anchor" for the beginning of the line, and the expression in the first parentheses, '(.*)', is the prefix of the string. It is followed by our prepared regexp containing the hints, and the expression ends with the second "anchor", '$', which matches the end of the line.

Since every '*' quantifier in the expression except the first one carries the '?' marker, the only quantifier without this marker is "greedy", i.e. it tries to capture as much of the line as possible.

In our example, the resulting expression is: /^(.*)(a.*?b.*?c.*?d.*?)$/
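The effect of the greedy prefix is easiest to see on a path in which the hints could match in more than one place. The following sketch is for illustration only: the function name split_match and the sample path are made up, not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the condition from the article and returns
# the greedy prefix and the part covered by the hints, or an empty list
# when the path does not match.
sub split_match {
    my ($path, $hints_re) = @_;
    if ($path =~ /^(.*)($hints_re)$/) {
        return ($1, $2);
    }
    return;
}

# Made-up path in which 'abcd' occurs twice.
my ($prefix, $match) = split_match('/home/abcd/work/abcd', 'a.*?b.*?c.*?d.*?');
printf "prefix: %s\n", $prefix;   # prints prefix: /home/abcd/work/
printf "match:  %s\n", $match;    # prints match:  abcd
```

The first occurrence of 'abcd' is swallowed by the greedy prefix, so the hints end up matching the rightmost possible part of the path.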

In effect, the search in this case runs from right to left: first, starting from the end of the line, we look for the closest 'd' character, then to the left of it the closest 'c', and so on; everything to the left of the chosen 'a' falls into the greedy prefix. The position of the result found in the string is determined by the length of this prefix, namely by the statement $w = length($1); (here $1 holds the value from the first parentheses of the previous regexp). The remaining condition (the further to the right, the better) has already been taken care of by the regexp.

All that remains is to add the LoadPaths and add functions and the processing of the launch parameters.
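As an illustration of how the prefix length can serve as a weight, here is a minimal sketch that ranks a few paths. The function name weight, the value -1 for "no match", the sample paths, and the choice to sort by descending weight are all assumptions made for this example, not details of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: the weight of a path is the length of the greedy
# prefix; -1 (an arbitrary choice) marks a path that does not match.
sub weight {
    my ($path, $hints_re) = @_;
    return $path =~ /^(.*)($hints_re)$/ ? length($1) : -1;
}

my $re    = 'a.*?b.*?';                           # converted hints 'ab'
my @paths = ('/abc/xyz', '/xyz/abc', '/xyz/qqq'); # made-up sample paths

# Assumed ranking: the further to the right the match starts, the better.
for my $p (sort { weight($b, $re) <=> weight($a, $re) } @paths) {
    printf "%2d  %s\n", weight($p, $re), $p;
}
```

With these sample paths, '/xyz/abc' gets the largest weight because the hints match closest to the end of the string, and '/xyz/qqq' does not match at all.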

The full text of the script:

hg clone bitbucket.org/eugenet/perlre

Source: https://habr.com/ru/post/151681/

