📜 ⬆️ ⬇️

Catastrophic backtracking in regular expressions

Is it possible to kill a system with a simple and innocent regular routine? Yes you can.

For example:

>>> r = '(a+)*b' 

Just yes. Innocent - like yes. Certainly stupid, because the brackets and the asterisk are superfluous, but it should work. Let's try:
')
 >>> re.findall('(a+)*b', 'aaaaaab') ['aaaaaa'] 

Cool, working, let's go drink beer.

But no ...

If the string in which we are looking for does not contain the desired pattern, for example, just 'aaaaaa' , the compiler will try to find it with the following algorithm:

1. The whole drain satisfies 'a +' , but we do not find the 'b' , a step back.
2. Now 'a +' satisfies only 'aaaaa' (everything except the last one), the latter satisfies the repetition '*' , 'b' is still not found, a step back.
3. The first 'a +' satisfies only 'aaaa' , the last two 'aa' satisfy the repetition of '*' , 'b' is still not found, a step backwards.
4. The first 'a +' satisfies 'aaaa' , the last but one 'a' satisfies one repeat of '*' , the next 'a' still does not repeat another '*' , 'b' ...

Well, you understand - and so on. Up to complete search of a line.

Some statistics:
 >>> timeit.timeit("import re; re.findall('(a+)*n', 'a'*20)", number=1) 0.24741506576538086 >>> timeit.timeit("import re; re.findall('(a+)*n', 'a'*22)", number=1) 0.99283289909362793 >>> timeit.timeit("import re; re.findall('(a+)*n', 'a'*24)", number=1) 3.9849259853363037 

As you can see, the execution time grows exponentially. For larger strings, Python 2.6 on FreeBSD promptly throws out RuntimeError, but on Windows, that same Python cheerfully looks through all the combinations. Strings with more than 30 characters did not dare to test :)

The post is inspired by today's question on stackoverflow and the long-read and successfully forgotten note Runaway Regular Expressions: Catastrophic Backtracking .

PS It turns out that on Habré there is already an article about backtracking and atomic groups, there the described problem is considered deeper and more severe - Atomic grouping, or Not one step back! :)

Source: https://habr.com/ru/post/131915/


All Articles