
Template language for the universal signature code analyzer

Signature-based code analysis in our project, PT Application Inspector, is divided into the following steps:


  1. parsing the source into a language-dependent representation (abstract syntax tree, AST);
  2. converting the AST into a language-independent unified format;
  3. matching the result directly against templates described in a DSL.

The first two stages were described in the previous articles "Theory and practice of source code parsing using ANTLR and Roslyn" and "Treating tree structures and unified AST". This article covers the third stage: the possible ways of describing patterns, the design of a domain-specific language (DSL) for describing them, and example patterns written in that language.







Ways to describe templates




Hardcoded


Templates can be written by hand directly in the code, so no parser needs to be developed. This method is not suitable for non-developers, but it can be used for writing unit tests. In addition, adding new templates requires recompiling the entire program.
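As an illustration of the hardcoded approach, here is a minimal Python sketch. The node classes (ObjectCreation, AnyArgs) and the matches function are hypothetical stand-ins invented for this example; the article does not show the real unified AST types.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical node types, assumed for this sketch only.
@dataclass
class AnyArgs:
    """Stands for the `...` placeholder: any number of arguments."""


@dataclass
class ObjectCreation:
    type_name: str
    args: List[object] = field(default_factory=list)


# A template written by hand in the code: changing it means rebuilding the program.
WEAK_RANDOM = ObjectCreation("Random", [AnyArgs()])


def matches(template: ObjectCreation, node: ObjectCreation) -> bool:
    """Return True when `node` is a constructor call described by `template`."""
    if template.type_name != node.type_name:
        return False
    if any(isinstance(arg, AnyArgs) for arg in template.args):
        return True  # the placeholder accepts any argument list
    return template.args == node.args
```

A template defined this way doubles as a unit-test fixture, which is the use the paragraph above mentions.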



JSON, XML or other markup language


Parts of the converted AST can be saved to and loaded from JSON or other formats directly. With this approach templates can be loaded from outside the program, but the syntax is cumbersome and ill-suited to editing by users. The method is still useful for serializing tree structures. (Ways to serialize tree structures in .NET, and the workarounds they require, will be discussed in the next article.)
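To show why this form is cumbersome, the sketch below serializes a single constructor-call pattern to JSON. The field names ("kind", "type", "args") and node kinds are assumptions made for the example and do not reflect the actual format used in the project.

```python
import json

# Hypothetical JSON shape for the one-line pattern `new Random(...)`.
pattern = {
    "kind": "ObjectCreationExpression",
    "type": {"kind": "TypeName", "name": "Random"},
    "args": [{"kind": "AnyArguments"}],
}

text = json.dumps(pattern, indent=2)   # what a user would have to edit by hand
restored = json.loads(text)            # the round trip preserves the tree

print(text)
print(len(text.splitlines()), "lines of JSON for a one-line pattern")
```

The round trip is lossless, which is why the format suits serialization, but even a trivial pattern expands into a nested document.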



A custom template description language (DSL)


The third approach is to develop a dedicated domain-specific language that is easy to edit and concise, yet expressive enough to describe both existing and future templates. The disadvantage of this approach is the need to design a syntax and write a parser for it.



Feasibility


As mentioned in the first article, not all templates can be described simply and conveniently with regular expressions. The DSL is a mixture of regular expressions and commonly used constructs from popular programming languages. It is also intended for a specific subject area and is not meant to become a standard.



Syntax


In the second article of the series, we said that the basic constructs of imperative programming languages are literals (primitive types), expressions, and statements. We followed the same division when designing the DSL. Examples of expressions:



Statements are formed by appending a semicolon to an expression.


Literals are primitive types, such as:



These literals are enough to describe simple constructs, but they cannot express, for example, ranges of numbers or regular expressions. To support such cases, extended constructs were introduced (PatternStatement, PatternExpression, PatternLiteral). They are enclosed in the special brackets <[ and ]>. The syntax was borrowed from Nemerle, where such brackets are used for quasi-quotation, that is, for converting the code inside them into a Nemerle AST.
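The role of the <[ and ]> brackets can be sketched with a small splitter that separates the extended parts of a pattern from the plain ones. This is a simplification written for illustration, not the project's parser: the function name split_pattern is hypothetical, and the sketch assumes that the body of an extended construct never contains the sequence ]> itself.

```python
import re

# Non-greedy match of everything between `<[` and the nearest `]>`.
EXTENDED = re.compile(r"<\[(.*?)\]>")


def split_pattern(pattern: str):
    """Split DSL text into ('plain', text) and ('extended', body) pieces."""
    pieces, pos = [], 0
    for m in EXTENDED.finditer(pattern):
        if m.start() > pos:
            pieces.append(("plain", pattern[pos:m.start()]))
        pieces.append(("extended", m.group(1)))
        pos = m.end()
    if pos < len(pattern):
        pieces.append(("plain", pattern[pos:]))
    return pieces


for piece in split_pattern('Configure.<[(?i)^write$]>("debug", <[1..9]>)'):
    print(piece)
```

The plain pieces would then go to the ordinary expression parser, while each extended body is interpreted as a regular expression, a range, or another special construct.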


Examples of the supported extended constructs are listed below. For some constructs, syntactic sugar is also provided to shorten the notation:




Pattern Examples



Hardcoded password (all languages)


(#.)?<[(?i)password(?-i)]> = <["\w*"]>




Weak random number generator (C#, Java)


new Random(...)


The vulnerability lies in the use of an insecure random number generation algorithm. For now, such cases are detected by searching for calls to the constructor of the standard Random class.



Debug Leakage (PHP)


Configure.<[(?i)^write$]>("debug", <[1..9]>)
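The two extended literals in this pattern, a regular expression for the method name and a numeric range for the argument, can be approximated as follows. The helper names are hypothetical, and treating both bounds of 1..9 as inclusive is an assumption: the article does not define the range semantics.

```python
import re


def parse_range(body: str):
    """Parse the body of a range literal such as '1..9' into (low, high)."""
    low, high = body.split("..")
    return int(low), int(high)


def in_range(body: str, value: int) -> bool:
    low, high = parse_range(body)
    return low <= value <= high  # assumption: both bounds are inclusive


def name_matches(regex_body: str, name: str) -> bool:
    """Check an identifier against the regex body of an extended literal."""
    return re.search(regex_body, name) is not None


# Configure::write('debug', 2) in PHP would satisfy both checks.
print(name_matches("(?i)^write$", "write"), in_range("1..9", 2))
```

Under these assumptions a debug level of 0 falls outside the range, so a call that disables debugging is not reported.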




Insecure SSL connection (Java)


new AllowAllHostnameVerifier(...) <[||]> SSLSocketFactory.ALLOW_ALL_HOSTNAME_VERIFIER


The "logical OR" operator is applied here to whole syntactic constructs.



Password in comments (all languages)


Comment: <[ "(?i)password(?-i)\s*\=" ]>


The pattern searches comments in the source code. In C#, Java, and PHP, single-line comments begin with a double slash //, and in SQL-like languages with a double hyphen --.
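A rough line-based approximation of this check is sketched below. It is deliberately naive: it only looks at single-line comments and would be fooled by // or -- inside a string literal, whereas the real module works on comments already extracted by the parser. Python's re module does not accept a standalone (?-i), so the sketch uses the scoped group (?i:...) instead; the function name is hypothetical.

```python
import re

# `(?i:password)` limits case-insensitivity to the word itself.
PASSWORD = re.compile(r"(?i:password)\s*=")
# Text after the first `//` or `--` on a line is treated as a comment.
LINE_COMMENT = re.compile(r"(?://|--)(.*)$")


def password_comments(source: str):
    """Return 1-based line numbers whose comment looks like `password = ...`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = LINE_COMMENT.search(line)
        if m and PASSWORD.search(m.group(1)):
            hits.append(lineno)
    return hits


sample = 'int x = 1; // Password = "qwerty"\nSELECT 1 -- no secrets here\nString s = "ok";'
print(password_comments(sample))
```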



SQL injection (C#, Java, PHP)


<["(?i)select\s\w*"]> + <[~"\w*"]>


A simple SQL injection: concatenation of a string that starts with select with a non-string expression on the right-hand side.
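The same rule can be expressed over a toy expression tree. The node classes below are hypothetical and exist only for this sketch. The left operand must be a string literal that starts with select, and the right operand must be anything other than a string literal, which mirrors the negated <[~"\w*"]> part of the pattern.

```python
import re
from dataclasses import dataclass


# Hypothetical node types, assumed for this sketch only.
@dataclass
class StringLiteral:
    text: str


@dataclass
class Identifier:
    name: str


@dataclass
class Concat:
    left: object
    right: object


SELECT = re.compile(r"(?i)select\s\w*")


def is_sql_injection(node: Concat) -> bool:
    """Flag `"select ..." + <non-string expression>`."""
    left_ok = (
        isinstance(node.left, StringLiteral)
        and SELECT.match(node.left.text) is not None  # match() anchors at the start
    )
    right_ok = not isinstance(node.right, StringLiteral)
    return left_ok and right_ok
```

Concatenating two string literals is not flagged, because a constant query cannot carry user input.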



Cookies without security attribute (PHP)


session_set_cookie_params(#,#,#)


A cookie is set without the secure flag, which is passed as the fourth argument.



Empty exception block (all languages)


try {...} catch { }


An empty exception handler. In C#, the module finds the following code:


 try { } catch { } 

In T-SQL, this is:


 BEGIN TRY
     SELECT 1/0 AS DivideByZero
 END TRY
 BEGIN CATCH
 END CATCH

And in PL/SQL:


 PROCEDURE empty_default_exception_handler IS
 BEGIN
     INSERT INTO table1 VALUES(1, 2, 3, 4);
     COMMIT;
 EXCEPTION
     WHEN OTHERS THEN NULL;
 END;


Insecure Cookie (Java)


 Cookie <[@cookie]> = new Cookie(...);
 ...
 ~<[@cookie]>.setSecure(true);
 ...
 response.addCookie(<[@cookie]>);


A cookie is added without the secure flag being set. Although this pattern would be implemented more properly with taint analysis, it proved possible to implement it with a more primitive matching algorithm. The pattern uses the bound variable @cookie, negation of an expression, and an arbitrary number of statements.
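One possible reading of this pattern is sketched below over a flat list of statements. The statement model (pairs of operation and variable name) is hypothetical, and the interpretation of the negation is an assumption: a match is reported when addCookie is reached for the bound variable without a setSecure(true) call on it in between. The actual matching algorithm is not described in the article.

```python
from typing import List, Tuple

# (operation, variable name): a hypothetical flat model of method-body statements.
Statement = Tuple[str, str]


def insecure_cookies(statements: List[Statement]) -> List[str]:
    """Return names of cookies added to the response without setSecure(true)."""
    found = []
    for i, (op, var) in enumerate(statements):
        if op != "new Cookie":
            continue
        # From here on, `var` plays the role of the bound variable @cookie.
        for later_op, later_var in statements[i + 1:]:
            if later_var != var:
                continue  # statement on another variable, absorbed by `...`
            if later_op == "setSecure(true)":
                break     # the negated statement is present: no match
            if later_op == "addCookie":
                found.append(var)
                break
    return found


body = [
    ("new Cookie", "a"), ("setMaxAge", "a"), ("addCookie", "a"),
    ("new Cookie", "b"), ("setSecure(true)", "b"), ("addCookie", "b"),
]
print(insecure_cookies(body))
```

Binding the variable is what ties the three statements together: setSecure(true) on a different cookie does not suppress the match.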



Unclosed cursor (PL/SQL, T-SQL)


PL/SQL

 <[@cursor]> = DBMS_SQL.OPEN_CURSOR;
 ...
 <[~]>DBMS_SQL.CLOSE_CURSOR(<[@cursor]>);

T-SQL

 declare_cursor(<[@cursor]>);
 ...
 <[~]>deallocate(<[@cursor]>);

An unclosed cursor can potentially be exploited by a less privileged user.


In T-SQL, the pattern matches code such as:


 DECLARE Employee_Cursor CURSOR FOR
     SELECT EmployeeID, Title FROM AdventureWorks2012.HumanResources.Employee;
 OPEN Employee_Cursor;
 FETCH NEXT FROM Employee_Cursor;
 WHILE @@FETCH_STATUS = 0
 BEGIN
     FETCH NEXT FROM Employee_Cursor;
 END;
 --DEALLOCATE Employee_Cursor; is missing
 GO


Excessive privileges (PL/SQL, T-SQL)


grant_all(...)


The risk of this flaw is that a user may be granted more privileges than required.


The pattern matches code such as:
GRANT ALL ON employees TO john_doe;



Conclusion


To demonstrate how our module works, we have prepared a video that shows the search for specific patterns in code written in various programming languages (C#, Java, PHP) in our product PT Application Inspector. It also demonstrates the correct handling of syntax errors, which was discussed in the first article of the series.






Source: https://habr.com/ru/post/300946/

