
evalidate: safe processing of user-defined expressions

Why you need it


Filtering is everywhere. For example, the netfilter firewall (iptables) has its own syntax for describing packets. Apache's .htaccess file has its own language for deciding who may access a directory and who may not. A DBMS has its own very powerful language (SQL WHERE ...) for filtering records. Mail clients (Thunderbird, Gmail) each have their own interface for describing filters that sort incoming mail into folders.

And each of them reinvents the wheel in its own way.

In an accounting program, it may be convenient to let the user choose who gets paid a salary (all women, plus men aged 25 to 32, or up to 50 if the man's name is Vasya). The user could also give everyone who matches a raise defined by an expression (+2000 rubles + 20% of the previous salary + 1000 rubles for each year of experience).
In an online store (or its admin panel), you might want to find all laptops with 4 to 8 GB of memory and more than 3 units in stock, excluding Acer. Acer laptops should qualify anyway if they cost less than 30,000 rubles.
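To make the salary rule above concrete, here it is written as ordinary Python expressions. The employee records and the field names (name, gender, age, salary, experience) are hypothetical. The strings are run with plain eval() only because they are hard-coded in this sketch. Doing the same with user input is exactly the problem discussed below.

```python
# Hypothetical employee records; the field names are assumptions for illustration.
employees = [
    {"name": "Vasya", "gender": "M", "age": 45, "salary": 50000, "experience": 10},
    {"name": "Petya", "gender": "M", "age": 45, "salary": 50000, "experience": 10},
    {"name": "Masha", "gender": "F", "age": 30, "salary": 40000, "experience": 3},
    {"name": "Kolya", "gender": "M", "age": 28, "salary": 30000, "experience": 2},
]

# Who is paid: all women, men aged 25..32, or men up to 50 named Vasya.
who = "gender == 'F' or (gender == 'M' and (25 <= age <= 32 or (age <= 50 and name == 'Vasya')))"
# The raise: 2000 + 20% of the previous salary + 1000 per year of experience.
bonus = "2000 + 0.2 * salary + 1000 * experience"

# Plain eval() is acceptable here ONLY because these strings are hard-coded;
# it must never be used like this on user input.
env = {"__builtins__": {}}
for person in employees:
    if eval(who, env, person):
        person["salary"] += eval(bonus, env, person)

print([p["salary"] for p in employees])  # [72000.0, 50000, 53000.0, 40000.0]
```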

Of course, you can build your own complex system of filters and criteria and a web interface for them. But wouldn't it be easier to do everything in a couple of lines?

    # notebook is a dict with one laptop's fields: RAM, stock, brand, price
    src = "(RAM>=4 and RAM<=8 and stock>3 and not brand=='Acer') or (brand=='Acer' and price<30000)"
    success, result = evalidate.safeeval(src, notebook)



Tempting, but risky


The obvious way to add arbitrary logic to a program is eval(). It is the simplest and most flexible solution, but it has a serious pitfall: security. What if a user's expression runs os.system('rm -rf /')?

Examples of how Python can be broken through eval():
stackoverflow.com/questions/13066594/is-there-a-way-to-secure-strings-for-pythons-eval
nedbatchelder.com/blog/201206/eval_really_is_dangerous.html (Russian translation: habrahabr.ru/post/221937)
tav.espians.com/a-challenge-to-break-python-security.html
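The sketch below shows why removing builtins does not turn eval() into a sandbox. It follows the kind of trick described in the articles above. With an empty __builtins__, a direct __import__ call fails. Attribute access on a tuple literal still climbs to object, though, and lists every class loaded in the interpreter. The expressions here are harmless; they only collect classes.

```python
no_builtins = {"__builtins__": {}}

# The direct route is blocked: __import__ is not defined without builtins.
try:
    eval("__import__('os')", no_builtins, {})
    blocked = False
except NameError:
    blocked = True
print("direct __import__ blocked:", blocked)

# The attribute route is not: () -> tuple -> object -> all of object's subclasses.
classes = eval("().__class__.__bases__[0].__subclasses__()", no_builtins, {})
print(len(classes), "classes reachable without any builtins")
```

From that list of classes, the linked articles go on to find objects that give access to files and modules. This is why a blocklist of names is not enough.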

The right way


On forums, people often recommend the “correct way”: use Python itself to parse the expression text into an AST, then walk the tree yourself, separating the wheat from the chaff. But how? This is where the main problem of the wheel market steps into the arena. Until you find a suitable wheel, or at least a good blueprint, it is easier to reinvent the wheel yourself.
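Here is a minimal sketch of that approach using only the standard ast module. The function name safe_eval and the whitelist of node types are my own assumptions for illustration, not evalidate's code. The (success, result) return value simply mirrors the interface used later in this article. Anything not explicitly allowed is rejected before execution. That includes function calls (ast.Call), attribute access (ast.Attribute) and names missing from the context.

```python
import ast

# Node types a filter expression may contain. This whitelist is an
# assumption for the sketch, not evalidate's actual default list.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.USub, ast.UAdd, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Mod, ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt,
    ast.GtE, ast.Name, ast.Load, ast.Constant,
)


def safe_eval(src, context):
    """Return (True, value) or (False, error message); never raises."""
    try:
        tree = ast.parse(src, mode="eval")  # a single expression, no statements
    except SyntaxError as exc:
        return False, "syntax error: %s" % exc.msg
    # ast.walk visits every node, including operators and Load contexts.
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False, "operation not allowed: %s" % type(node).__name__
        if isinstance(node, ast.Name) and node.id not in context:
            return False, "unknown name: %s" % node.id
    code = compile(tree, "<usercode>", "eval")
    try:
        # Explicit empty __builtins__: otherwise eval() would insert the real ones.
        return True, eval(code, {"__builtins__": {}}, dict(context))
    except Exception as exc:  # e.g. ZeroDivisionError, TypeError
        return False, "runtime error: %s" % exc


print(safe_eval("stock > 0 and price > 8", {"stock": 4, "price": 12}))
print(safe_eval("__import__('os').system('rm -rf /')", {}))
print(safe_eval("().__class__", {}))
```

A whitelist is the key design choice. Listing the few node types an expression language needs is much easier to get right than enumerating everything dangerous. Even harmless-looking nodes deserve thought, though. ast.Pow is left out here because an expression like 9**9**9 can keep the interpreter busy for a very long time.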

Evalidate


Meet evalidate, my own small reinvented wheel for this job. Some of you may be able to use it as is (I tried to make it fairly flexible). For everyone else, the source code can serve as an example of how this problem can be solved (or of how not to write code, of course).

Install it with pip:
 pip install evalidate 


A simple example:

Suppose we put a text search field on a bookstore's website, and users can search for books by any available criteria in any combination. The field's value ends up in the src variable. Here it is hard-coded so that we don't have to build a whole web application around the example, but it is perfectly safe to take it from the user's request. Instead of separate buttons for “books that are out of stock”, “cheap books” and “expensive books”, there is just one text field. The same field handles even “books by authors who died before World War II and lived in Australia or any African country, which we have in stock in more than 10 copies, and which cost less than $1 per 100 pages”.

    import evalidate

    depot = [
        {'book': 'Sirens of Titan', 'price': 12, 'stock': 4},
        {'book': 'Gone Girl', 'price': 9.8, 'stock': 0},
        {'book': 'Choke', 'price': 14, 'stock': 2},
        {'book': 'Pulp', 'price': 7.45, 'stock': 4},
    ]

    # src = 'stock==0'            # books out of stock
    src = 'stock>0 and price>8'   # expensive books available for sale

    for book in depot:
        success, result = evalidate.safeeval(src, book)
        if success:
            if result:
                print(book)
        else:
            print("ERR:", result)   # on failure, result holds the error message


Here src holds the “user” code, which may be malicious. The example contains two variants of “good” code. The first shows the books that are out of stock; the second shows the expensive books that are available. You might try to slip in bad code: code that does not parse, code that refers to variables missing from the context, or code that uses a disallowed operation such as Call (a function call). In that case success will be False and the program will report an error. It will not crash, and it will not execute the bad code.

Alternatively, you can call evalidate.evalidate() to get the AST produced by ast.parse(). It raises an exception if the code does not parse or contains disallowed operations. You then compile the tree and execute it with eval():
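    # data: dict of variables available to the expression
    node = evalidate.evalidate(src)
    code = compile(node, '<usercode>', 'eval')
    # Pass an explicit empty __builtins__: eval() inserts the real builtins into a globals dict that lacks the key
    result = eval(code, {'__builtins__': {}}, data)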
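This validate-then-compile variant pays off when the same expression is applied to many records. You validate and compile it once, then evaluate the resulting code object for each record. Below is a self-contained sketch run against the bookstore data from the example above. The validate() helper and its deliberately small whitelist are hypothetical, not evalidate's actual API or defaults.

```python
import ast

# Hypothetical minimal whitelist (comparisons and boolean logic only);
# it is not evalidate's own list.
ALLOWED = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Name, ast.Load, ast.Constant,
)


def validate(src):
    """Return the parsed tree, or raise ValueError for a disallowed node.

    SyntaxError from ast.parse is left to propagate to the caller.
    """
    tree = ast.parse(src, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError("operation not allowed: " + type(node).__name__)
    return tree


depot = [
    {"book": "Sirens of Titan", "price": 12, "stock": 4},
    {"book": "Gone Girl", "price": 9.8, "stock": 0},
    {"book": "Choke", "price": 14, "stock": 2},
    {"book": "Pulp", "price": 7.45, "stock": 4},
]

# Validate and compile once...
code = compile(validate("stock>0 and price>8"), "<usercode>", "eval")
# ...then run the same code object against every record. A record that
# lacks a referenced field would raise NameError here.
matches = [b["book"] for b in depot if eval(code, {"__builtins__": {}}, b)]
print(matches)  # ['Sirens of Titan', 'Choke']
```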


Also take a look at the module's code (it's worth it), and build your own :-)

Appeal to the community


Evalidate comes with its own default set of “safe” (?) Python operations. They are safe only in my personal opinion: in 15 minutes I could not think of a way to do something terrible using only these operations. But maybe you can? Or maybe some more operations are worth adding to the list? They could make the default configuration more flexible (allowing a richer expression language) without creating vulnerabilities. Any ideas?
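One possible direction for a richer language is sketched below. It is offered as an assumption, not as something evalidate is known to do. The idea is to allow ast.Call only when the callee is a bare name from an explicit table of harmless functions. Keyword arguments (ast.keyword) and *args (ast.Starred) are not on the whitelist, so such calls are rejected. The names SAFE_FUNCS, check and evaluate are hypothetical.

```python
import ast

# Hypothetical table of helpers exposed to expressions; nothing else is callable.
SAFE_FUNCS = {"abs": abs, "min": min, "max": max, "round": round}

ALLOWED = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.USub, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Name, ast.Load, ast.Constant, ast.Call,
)


def check(src, names):
    """Parse src and reject anything outside the whitelist; return the tree."""
    tree = ast.parse(src, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError("operation not allowed: " + type(node).__name__)
        if isinstance(node, ast.Call):
            # Only f(...) where f is a bare name from SAFE_FUNCS, never obj.method(...).
            if not isinstance(node.func, ast.Name) or node.func.id not in SAFE_FUNCS:
                raise ValueError("call not allowed")
        elif isinstance(node, ast.Name) and node.id not in names and node.id not in SAFE_FUNCS:
            raise ValueError("unknown name: " + node.id)
    return tree


def evaluate(src, context):
    """Evaluate a checked expression with only SAFE_FUNCS and context in scope."""
    tree = check(src, context)
    env = {"__builtins__": {}}
    env.update(SAFE_FUNCS)
    return eval(compile(tree, "<usercode>", "eval"), env, dict(context))


print(evaluate("abs(balance) > 100", {"balance": -150}))  # True
print(evaluate("max(price, 10) * 2", {"price": 7}))       # 20
```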

Source: https://habr.com/ru/post/248117/

