
Searching for regular expressions with regular expressions

Greetings, dear readers.



"We drove regular expressions, through regular expressions, see regular expressions, in regular expressions, regular expressions - regular expressions, regular expressions, regular expressions ..."



No, this is not crazy nonsense. That is what I wanted to call my short review of searching for regular expressions with regular expressions. Which, in fact, is no less nonsensical. I don't even know whether it will ever be useful to you. It is better, of course, to avoid situations where you have to look for who knows what, who knows where. What is a regular expression? Well, almost anything!


It may seem strange to you, but:



., ,    :. (     (  )) ~~ <script src="  - ,          .js"> 




But let's not panic. Let's make a start; maybe something decent will come of it.



A regular expression is something in delimiters and possibly with modifiers at the end. For example, something like this:



/regular expression/isux



The delimiter of a PCRE regular expression can be any character that is not a digit, not a letter, not whitespace, and not a backslash: [^\s\w\\] In addition, the character must also be ASCII: [[:ascii:]], otherwise you can run into all sorts of curiosities of the “type” as “like” these ...

Just don't ask me who would ever think of doing that.



There are also paired delimiters: () [] {} <>. That is, the first delimiter cannot be a closing paired delimiter: [^\s\w\\\)\]\}\>]



In total, we have the following search condition for the first delimiter:



(?=[[:ascii:]])[^\s\w\\)\]\}\>]
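To get a feel for that condition, here is a minimal sketch in Python. It is only an approximation: Python's re module has no POSIX classes, so I assume the explicit range [\x00-\x7f] as a stand-in for [[:ascii:]], and the helper name is_opening_delimiter is made up for this illustration.

```python
import re

# Stand-in for (?=[[:ascii:]])[^\s\w\\)\]\}\>]
# Assumption: [\x00-\x7f] replaces [[:ascii:]], which Python's re lacks.
OPENING_DELIMITER = re.compile(r"(?=[\x00-\x7f])[^\s\w\\)\]}>]")


def is_opening_delimiter(ch: str) -> bool:
    """Return True if the single character ch could open a pattern."""
    return len(ch) == 1 and OPENING_DELIMITER.fullmatch(ch) is not None


# Try a few candidates: punctuation, openers, a closer, a letter,
# a space, a backslash and a non-ASCII quote.
for ch in "/#~(<)a \\«":
    print(repr(ch), is_opening_delimiter(ch))
```

The opening halves of the pairs pass, the closing halves do not, and the non-ASCII quote is rejected by the lookahead alone.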



Unfortunately, we cannot check which particular character arrived as the first delimiter, but we can capture the paired characters <, (, [, { separately:



Regular expression to find one regular expression
/(
    (\<)|
    (\()|
    (\[)|
    (\{)|
    ((?=[[:ascii:]])[^\s\w\\)\]\}\>])
)

# Then the pattern, followed by the matching closing delimiter:
(.*)

(?(2)\>)
(?(3)\))
(?(4)\])
(?(5)\})
(?(6)\6)

# Well, and you can decorate the whole thing with modifiers:
([mixXsuUAJ]*)

/xs



This regular expression will find and unpack only one regular expression, into ([1] => delimiter, [7] => pattern, [8] => modifiers). That is because it uses the greedy quantifier .*, which eats everything to the end and then backtracks only as far as the nearest match. If you really want, it can even dissect itself.
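The greedy behaviour is easy to see in a small experiment. The following is a rough port to Python's re module, not the PCRE original: I assume [\x00-\x7f] in place of [[:ascii:]], and the names FIND_ONE and unpack are invented for the sketch. The group numbers are the same as above.

```python
import re

# Rough Python port of the single-expression finder (assumptions noted above).
FIND_ONE = re.compile(r"""
    (                                     # 1: opening delimiter
        (<) | (\() | (\[) | (\{) |        # 2-5: paired openers
        ((?=[\x00-\x7f])[^\s\w\\)\]}>])   # 6: any other ASCII delimiter
    )
    (.*)                                  # 7: pattern (greedy)
    (?(2)>) (?(3)\)) (?(4)\]) (?(5)\})    # closer for a paired opener
    (?(6)\6)                              # otherwise the same character again
    ([mixXsuUAJ]*)                        # 8: modifiers
""", re.X | re.S)


def unpack(text):
    """Return (delimiter, pattern, modifiers) for the first match, or None."""
    m = FIND_ONE.search(text)
    if m is None:
        return None
    return m.group(1), m.group(7), m.group(8)


print(unpack("/foo/i"))         # ('/', 'foo', 'i')
print(unpack("{ab}x"))          # ('{', 'ab', 'x')
print(unpack("/a/i and /b/s"))  # ('/', 'a/i and /b', 's')
```

The last call shows the limitation: the greedy group swallows both expressions and only gives back enough to reach the final delimiter.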



The real nightmare begins when we need to find and unpack more than one regular expression in a single text.



First, we need to use the lazy quantifier (.*?)



Second, we need to match an unescaped delimiter, which, as fate would have it, may also be sitting inside a comment. And how do you like a delimiter with an escaped backslash in front of it? /\\\/\\/is
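What "unescaped" means here comes down to parity: a delimiter is escaped only if an odd number of backslashes stands directly in front of it. A minimal sketch of that rule, written as a plain scan rather than a regular expression (the function names is_escaped and find_closing are hypothetical):

```python
def is_escaped(text: str, pos: int) -> bool:
    """Return True if text[pos] is preceded by an odd number of backslashes."""
    backslashes = 0
    i = pos - 1
    while i >= 0 and text[i] == "\\":
        backslashes += 1
        i -= 1
    return backslashes % 2 == 1


def find_closing(text: str, delimiter: str, start: int = 1) -> int:
    """Return the index of the first unescaped delimiter from start, or -1."""
    for pos in range(start, len(text)):
        if text[pos] == delimiter and not is_escaped(text, pos):
            return pos
    return -1


sample = r"/\\\/\\/is"
print(is_escaped(sample, 4))      # True: three backslashes in front
print(is_escaped(sample, 7))      # False: two backslashes in front
print(find_closing(sample, "/"))  # 7
```

The first inner slash is escaped, the second one is not, so the expression ends at index 7 and "is" is left over as modifiers.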



Welcome to Hell:



((?#ignore comments like this in the regular expression)
(?(6)
(?(?=
(?:(?!\6).|(?<=\\)\6)*[^\\][\(][\?][\#])
(?:(?!\6).|(?<=\\)\6)*[^\\][\(][\?][\#]
[^\)]*
(?-1)
))
.*?)



I will explain this code a little:



1) We cannot write [^\6], because inside a character class the backreference loses its magic power. But thanks to a negative lookahead we can test every character individually: [^\6]* => ((?!\6).)*

2) (?(?=string)string) - it may look pointless, but it is needed in cases where something has to be appended afterwards.

3) (?-1) - if there is a match, check for a match again. In this case we are looking for, say, (?# and, if it is found, we capture up to the closing bracket.
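Point 1 can be tried out in isolation. Here is a small sketch in Python, where (as far as I know) a numeric escape inside a character class is likewise read as a plain character rather than a backreference. The quoted-string pattern and the name QUOTED are just an illustration, not part of the expression above.

```python
import re

# ((?!\1).)* plays the role of the impossible [^\1]*:
# consume any character as long as it is not a repeat of group 1.
QUOTED = re.compile(r"""(["'])((?:(?!\1).)*)\1""")

m = QUOTED.search('say "it\'s fine" now')
print(m.group(1))  # "
print(m.group(2))  # it's fine
```

The apostrophe inside the double-quoted text is allowed through, because the lookahead only rejects the quote character that actually opened the string.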



In total, at the moment, we have the following (upd: there is a modified version at the end of the article):



Regular Expression for Regular Expression
/ # Delimiter 1
((\<)|(\()|(\[)|(\{)|
((?=[[:ascii:]])[^\s\w\\)\]\}\>]))

# Pattern
((?#ignore comments like this in the regular expression)
(?(6)
(?(?=(?:(?!\6).|(?<=\\)\6)*[^\\][\(][\?][\#])
(?:(?!\6).|(?<=\\)\6)*[^\\][\(][\?][\#]
[^\)]*(?-1)))
.*?)

# Delimiter 2
# escaped backslashes +
# unescaped delimiter
(?(2)(?<!(?<!(?<!(?<!(?<!(?<!(?<!(?<!\\)\\)\\)\\)\\)\\)\\)\\)\>)
(?(3)(?<!(?<!(?<!(?<!(?<!(?<!(?<!(?<!\\)\\)\\)\\)\\)\\)\\)\\)\))
(?(4)(?<!(?<!(?<!(?<!(?<!(?<!(?<!(?<!\\)\\)\\)\\)\\)\\)\\)\\)\])
(?(5)(?<!(?<!(?<!(?<!(?<!(?<!(?<!(?<!\\)\\)\\)\\)\\)\\)\\)\\)\})
(?(6)(?<!(?<!(?<!(?<!(?<!(?<!(?<!(?<!\\)\\)\\)\\)\\)\\)\\)\\)\6)

# Pattern modifiers
#PHP [mixXsuUAJ] javascript [gmi] python [gmixsu]
((?(6)(?:[mixXsuUAJ]*)|(?(?=.*?[mixXsuUAJ]+)[mixXsuUAJ]+)))/xs



I have not lost my enthusiasm yet, but work has piled up. If anyone feels like it, you are welcome to suffer through the rest.



Current goals and objectives:



1) The problem with bracket delimiters has not been solved yet. Unfortunately for us, brackets inside the pattern do not have to be escaped:

((regular)(expression))isu

2) Ignore delimiters between a # and the end of the line.

3) Handle closing paired delimiters inside comments.

4) Find an elegant solution to the problem of an escaped \ before the final delimiter.
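To show what goal 1 demands, here is a sketch that deliberately steps outside regular expressions and counts nesting depth with an ordinary loop. It is not a solution to the task as posed, only an illustration of the bookkeeping involved; PAIRS and find_paired_close are names I made up for it, and it ignores comments and character classes entirely.

```python
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def find_paired_close(text: str, start: int = 0) -> int:
    """Return the index of the closer that balances text[start], or -1."""
    opener = text[start]
    closer = PAIRS[opener]
    depth = 0
    pos = start
    while pos < len(text):
        ch = text[pos]
        if ch == "\\":
            pos += 2          # skip the backslash and the escaped character
            continue
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth == 0:
                return pos
        pos += 1
    return -1


sample = "((regular)(expression))isu"
end = find_paired_close(sample)
print(end)               # 22
print(sample[end + 1:])  # isu
```

The inner unescaped brackets raise and lower the depth, and only the bracket that brings it back to zero counts as the closing delimiter.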



Link to the latest version, for those who want to help see this through to the end



Thank you for your close attention, fellow computer maniacs.



Update:



Thanks to ReinRaus, at the moment we have the following picture, a regular expression that pretty much describes itself:

Source: https://habr.com/ru/post/281547/


