
Imperative RegExp. Notation

Regular Expressions For All (REFA)

Main idea


There are many systems for finding substrings that match a given pattern. Unfortunately, they lose their power once many factors have to be taken into account: the constructions become cumbersome, incomprehensible, and hard to maintain.
That is why I tried to create an alternative: REFA, regular expressions for everyone.
The idea is as follows: as soon as a regular expression stops being obvious, break it in two. The optimizer will still merge the parts back into one where possible, so no speed is lost, but the code becomes clearer.

For easier reading: ge.tt/9snPkzG/v/0 (.odt format)

Examples


Searching for C++ functions

Find the implementations of all methods of a given class.

The input is assumed to be one large string containing all the project code. It could be read from a file, but that would make the example harder to follow.
PROGRAM “FindMethods”
    ^name^ = ~\w?[\w|\d]*~

    BLOCK “FindClass”
        //
        PUSH
        BLOCKVAR $regexp = “class ”+%classname%+“\s*\{.*\}.*;”
        MATCH $regexp
        CATCH MATCH_FAIL
            RETURN array() AS $list;
            RETURN array() AS $result;
        FINISH
        BLOCKVAR $class_code = MATCHED
        INCOMING = $class_code
        BLOCKVAR $method = ^name^+~\w*~+^name^+~\([\^name\^\w*\^name\^\w*\,?]*\)\w*~
        BLOCKVAR $declarations = array();
        BLOCKVAR $realisations = array();
        TRY
            WHILE 1
                MATCH PASS LIMIT 1 $method
                IF select(0,1) INCOMING != “;”
                    CALL “SearchEndOfFunction” REMAINED
                    $realisations ADD (MATCHED + RESULT $body)
                ELSE
                    $declarations ADD MATCHED
                ENDIF
            END
        ON MATCH_FAIL OR END_OF_STRING
            RETURN $declarations AS $list
            RETURN $realisations AS $result
        FINISH
        POP
    ENDBLOCK

    BLOCK “SearchEndOfFunction”
        BLOCKVAR UINT $level = 0
        MATCH ~[\{|\}]~
        FOREACH ALL_MATCHED AS $t
            IF $t == “{”
                $level++;
            ELSE
                $level--;
            ENDIF
            IF $level == 0
                BLOCKVAR STRING $ret = select(ALL_MATCHED[0], ALL_MATCHED[ITERATION]) INCOMING_BLOCK
                RETURN $ret AS $body
            ENDIF
        END
    ENDBLOCK

    BLOCK “AddClassName”
        MATCH PASS LIMIT 1 ^name^+“\w*”
        BLOCKVAR $ret = MATCHED
        $ret += “[\^name\^\w*::\w*]*”+%classname%+“\w*::\w*”
        $ret += REMAIN
        RETURN $ret
    ENDBLOCK

    BLOCK “SearchDeclaredFunctions”
        BLOCKVAR $dec = %declared%
        IMPLODE ($dec, “|”) $string
        $string = “[”+$string+“]”
        MATCH $string
        BLOCKVAR $realisations = array()
        FOREACH ALL_TILES AS $tile
            IF ITERATION % 2 == 1
                IF select(0,1) INCOMING != “;”
                    CALL “SearchEndOfFunction” ALL_TILES[ITERATION + 1]
                    $realisations ADD (ALL_TILES[ITERATION] + RESULT $body)
                ENDIF
            ENDIF
        END
        RETURN $realisations AS $result
    ENDBLOCK

    //
    BLOCKVAR $classname = $arg1
    CALL “FindClass”
    BLOCKVAR $ret = RESULT $result
    BLOCKVAR $declared = RESULT $list
    CALL “SearchDeclaredFunctions”
    $ret ADD RESULT $result
    RETURN $ret
ENDPROGRAM


The program turned out to be not that small, but it is at least more or less understandable. I would not advise writing the equivalent as a single regular expression.
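For comparison, the brace counting done by the “SearchEndOfFunction” block can be sketched in Python. This is only an illustration of the idea, not part of REFA: the function name find_body_end is hypothetical, and, like the block above, it ignores braces inside string literals and comments.

```python
def find_body_end(text: str, start: int = 0) -> int:
    """Return the index just past the '}' that closes the first '{' found
    at or after `start`. Raises ValueError if the braces never balance."""
    level = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "{":
            level += 1
        elif ch == "}":
            level -= 1
            if level == 0:
                # The nesting depth is back to zero: the body ends here.
                return i + 1
            if level < 0:
                raise ValueError("closing brace without an opening one")
    raise ValueError("unbalanced braces")


code = "int f() { if (x) { y(); } return 0; } int g();"
end = find_body_end(code)
body = code[code.index("{"):end]
print(body)
```

The loop plays the role of FOREACH ALL_MATCHED: every brace changes the nesting level, and the first return to level 0 marks the end of the function body.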

Documentation


Data types

INT

The default type. An integer in the range −2^31 to +2^31−1. The default value is 0.
LONG

An integer in the range −2^63 to +2^63−1. The default value is 0.
UINT

ULONG

STRING

A string. Its maximum length is the maximum UINT value. Private fields: START and COUNT.
There is no default value; relying on one causes an exception.
TILE

Part of a string. Private fields: START, END, COUNT, PARENT_STRING.
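To make the TILE type more concrete, here is a minimal Python sketch of a value with the same four fields. The class name Tile, the text() helper, and the choice of an exclusive END index are assumptions made for the illustration; the description above does not fix these details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    parent_string: str  # PARENT_STRING: the string this tile is a part of
    start: int          # START: index of the first character
    end: int            # END: index just past the last character (assumed exclusive)

    @property
    def count(self) -> int:
        # COUNT: the number of characters covered by the tile
        return self.end - self.start

    def text(self) -> str:
        # The characters are read from the parent; the tile stores no copy.
        return self.parent_string[self.start:self.end]


source = "class Foo { };"
tile = Tile(source, 6, 9)
print(tile.text(), tile.count)
```

Keeping only indices and a reference to the parent means a tile never copies the text it describes.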
Predefined Variables

INCOMING

The string being processed. It is used whenever no variable is specified.
INCOMING is a synonym for INCOMING_CURRENT

MATCHED

The first match found by the last MATCH.
ALL_MATCHED

An array with all the matches of the last expression.
REMAINED

First character after MATCHED
ALL_REMAINED

The first characters after each entry in ALL_MATCHED
ALL_TILES

The odd-numbered elements are ALL_MATCHED. The rest are the unmatched pieces of the string between them, in their original order.
ITERATION

The iteration number in the current loop. To get the iteration number of an outer loop, save it in a separate variable.
CALLSTACK

The call stack, with parameters
QUERY_LOG

A log of the commands that affected the string in one way or another. Copies of strings must be recorded too (in case they are processed later). Incoming data is stored in a single copy.
EXCEPTION_STRING

A string describing the error: where it occurred, the incoming parameters, and the result.
Minimum set

Necessary for the simplest use of the system
MATCH

MATCH [IGNORE {ignore_count | FIRST}] [PASS] [LIMIT {limit_count}] reg_exp [processing_string]
Matches reg_exp against processing_string and, by default, moves its START to MATCHED.
IGNORE - skip the first ignore_count matches. The default is IGNORE 0.
PASS - move START to the last ALL_REMAINED.
LIMIT - the maximum number of matches, after which the subroutine terminates. The default, LIMIT 0, means it runs until the end of the file.
reg_exp - either a regular expression written between ~ characters or a variable.
processing_string - the string to process. The default is INCOMING.
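The IGNORE and LIMIT options can be approximated in Python as follows. This is a rough sketch and not the REFA implementation: the function match and the MatchFail class are hypothetical names, LIMIT is assumed to count matches after the ignored ones, and the PASS option is not modeled.

```python
import re
from itertools import islice


class MatchFail(Exception):
    """Stands in for REFA's MATCH_FAIL exception (name assumed)."""


def match(reg_exp, processing_string, ignore=0, limit=0):
    """Return the matched substrings, skipping the first `ignore` matches
    and keeping at most `limit` of the rest (0 means no limit)."""
    found = re.finditer(reg_exp, processing_string)
    stop = None if limit == 0 else ignore + limit
    matched = [m.group(0) for m in islice(found, ignore, stop)]
    if not matched:
        raise MatchFail(reg_exp)
    return matched


# Roughly: MATCH IGNORE 1 LIMIT 2 ~\d+~
print(match(r"\d+", "a1 b22 c333 d4444", ignore=1, limit=2))
```

The returned list corresponds to ALL_MATCHED, and its first element to MATCHED.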
ECHO

ECHO string
Outputs a string to the result.
The simplest example of a regular-expression replacement:
MATCH PASS ~some_regexp~
FOREACH ALL_TILES AS $tile
    IF ITERATION % 2
        // replace every matched piece with a string
        ECHO “REPLACED”
    ELSE
        // the pieces between the matches are returned unchanged
        ECHO $tile
    ENDIF
END
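The same replacement can be sketched in Python, where re.split with a capturing group produces a list that behaves much like ALL_TILES: unmatched pieces at even positions, matches at odd ones. The helper names tiles and replace_matches are hypothetical, and the sketch assumes the pattern contains no capturing groups of its own and never matches the empty string.

```python
import re


def tiles(reg_exp, text):
    # Wrapping the whole pattern in one capturing group makes re.split
    # keep the matches, so unmatched and matched pieces alternate.
    return re.split(f"({reg_exp})", text)


def replace_matches(reg_exp, text, replacement):
    out = []
    for iteration, tile in enumerate(tiles(reg_exp, text)):
        if iteration % 2:
            # odd position: a matched piece, replaced with the string
            out.append(replacement)
        else:
            # even position: text between matches, kept unchanged
            out.append(tile)
    return "".join(out)


print(replace_matches(r"\d+", "a1b22c", "REPLACED"))
```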
IF ELSE ENDIF

IF expr then [ELSE else] ENDIF
If expr is nonzero, the then code is executed; otherwise the else code is executed.
Extended set

PROGRAM

A program is an atomic set of executable commands that performs a given task. Only programs can have parameters other than the “default” ones.
Generally speaking, a program can be a separate process (or thread) and run in parallel. There is no way to jump from one program into another, but you can use the methods of a neighboring program if they are declared. Programs can call programs.
The program is the scope for all of its blocks.
By default, all commands are enclosed in a program with an empty name (it cannot be called from other programs).
PROGRAM name arg0 [arg1 arg2 ...] code ENDPROGRAM
name - the name of the program
arg0 - the string to be processed. It becomes INCOMING_PROGRAM.
code - the program code, including declarations.
Code blocks are accessed with the construction
program_name::block_name.
BLOCK

BLOCK name [string]
A section of code that is not independent. It is equivalent to two goto jumps. If string is specified, the corresponding INCOMING is replaced before the block starts and restored after it returns.
PUSH POP

PUSH [var1 var2]
Saves the state of the system variables. You can also list local variables to be saved, and explicitly exclude some system variables using !var.
POP - restores the state as it was before the PUSH.
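One way to picture PUSH and POP is as a stack of snapshots. The Python sketch below is only an assumption about how such a mechanism could behave: the SystemState class and its methods are hypothetical, all variables live in a single dictionary, and the exclude argument stands in for the !var syntax.

```python
import copy


class SystemState:
    def __init__(self, **variables):
        self.vars = dict(variables)
        self._stack = []

    def push(self, exclude=()):
        # Save a deep copy of every variable except the excluded ones.
        snapshot = {name: copy.deepcopy(value)
                    for name, value in self.vars.items()
                    if name not in exclude}
        self._stack.append(snapshot)

    def pop(self):
        # Restore the saved values; excluded variables keep whatever
        # value they have now.
        self.vars.update(self._stack.pop())


state = SystemState(INCOMING="abc", MATCHED="")
state.push(exclude={"MATCHED"})
state.vars["INCOMING"] = "xyz"
state.vars["MATCHED"] = "x"
state.pop()
print(state.vars)
```

After pop(), INCOMING is back to its saved value, while MATCHED, which was excluded from the snapshot, keeps the value assigned in between.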
BLOCKVAR

A temporary variable available only in the current scope, and destroyed on exit.
RETURN RESULT

Used to return the values of temporary variables from a block or program.
RETURN name
RESULT name is used to access the variable in the calling construct.
The value remains valid until the next block is called.
Error handling

During script execution, various exceptions may occur that should not disrupt the flow of execution. An exception system exists for this purpose.
exceptions: exception_name [OR exception_name ...]
CATCH FINISH

CATCH exceptions code [CATCH exceptions code ...] FINISH
Catches an error raised by the line immediately preceding the first CATCH block.
It is used when an exception is expected at that point and needs to be handled.
TRY ON FINISH

TRY code ON exceptions code [ON exceptions code ...] FINISH
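The TRY ... ON ... FINISH construct is close to try/except in other languages. The Python sketch below mirrors the pattern used in the C++ example (loop until MATCH_FAIL OR END_OF_STRING); the exception classes and the helpers next_word and collect_words are hypothetical names introduced only for this illustration.

```python
import re


class MatchFail(Exception):
    """Stands in for MATCH_FAIL (name assumed)."""


class EndOfString(Exception):
    """Stands in for END_OF_STRING (name assumed)."""


WORD = re.compile(r"\w+")


def next_word(text, pos):
    if pos >= len(text):
        raise EndOfString
    m = WORD.search(text, pos)
    if m is None:
        raise MatchFail
    return m.group(0), m.end()


def collect_words(text):
    words = []
    pos = 0
    try:                               # TRY
        while True:                    # WHILE 1
            word, pos = next_word(text, pos)
            words.append(word)
    except (MatchFail, EndOfString):   # ON MATCH_FAIL OR END_OF_STRING
        pass
    return words                       # after FINISH


print(collect_words("int main ( )"))
```

Listing two exception classes in one except clause plays the role of OR in the exceptions list.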
THROW

THROW exception
Raises an error manually.
Types of errors


Special constructions

~regexp~

The content is a regular expression.
%name%

At runtime it is replaced by a copy of the value of the $name variable (the closest one on the stack).
#name#

An analogue of define.
^name^

A reference to a regular expression. Inside ~~ it is written as \^name\^
^hello^ = ~hel{2}o~
~\^hello\^ world~
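A possible way to resolve such references is plain textual substitution before the expression is compiled. The Python sketch below illustrates this with the hello example; expand_references is a hypothetical helper, the substitution is a single pass (references nested inside definitions are not expanded), and wrapping each definition in a non-capturing group is a design choice of this sketch rather than something REFA specifies.

```python
import re


def expand_references(reg_exp, definitions):
    """Replace every \\^name\\^ in reg_exp with the named definition."""
    def substitute(m):
        # The group keeps the inserted pattern from merging with its
        # neighbours (for example when it is followed by a quantifier).
        return "(?:" + definitions[m.group(1)] + ")"

    return re.sub(r"\\\^(\w+)\\\^", substitute, reg_exp)


definitions = {"hello": r"hel{2}o"}
pattern = expand_references(r"\^hello\^ world", definitions)
print(pattern)
print(bool(re.fullmatch(pattern, "hello world")))
```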

Working with strings

array{tile} SPLIT (delimiter) [string]
tile SELECT (start, end) [string]
PASS (count) [&string]
CUT (count) [&string]
CUT_AFTER (index) [&string]
IMPLODE (array [, delimiter]) &string

Source: https://habr.com/ru/post/143026/

