
Do WAFs dream of static analyzers?

One of the most popular application protection trends of the current decade is virtual patching (VP), a technology that protects a web application against the exploitation of known vulnerabilities at the level of a web application firewall (here and below, WAF means a dedicated solution running on a separate node between the external network gateway and the web server). VP is based on building HTTP request filtering rules on the WAF side from the results of static application security testing (SAST) tools. However, because SAST and WAF rely on different models of the application and different decision-making methods, there are still no truly effective solutions for their integration on the market. In SAST, the application is treated as a white box and, as a rule, formal approaches are used to search for vulnerabilities in its code. For a WAF, the application is a black box, and heuristics are used to detect attacks. This prevents VP from being used effectively against attacks in which the conditions for exploiting a vulnerability go beyond the trivial `http_parameter = plain_text_attack_vector` scheme.

But what if we could make SAST and WAF "friends", so that the information about the application's internals obtained by SAST became available on the WAF side and gave it the ability to detect attacks on the discovered vulnerabilities, not by guessing, but by proving the fact of an attack?

The splendor and misery of traditional VP


The traditional approach to automating the creation of virtual patches for web applications is to provide the WAF with information about each vulnerability detected by SAST, including:


To determine the values of the HTTP request parameters and the dangerous elements of the vulnerable parameter, either a simple enumeration of all possible elements or a generating function (as a rule, based on regular expressions) can be used. Consider an ASP.NET page snippet vulnerable to XSS attacks:
```csharp
var condition = Request.Params["condition"];
var param = Request.Params["param"];

if (condition == null || param == null)
{
    Response.Write("Wrong parameters!");
    return;
}

string response;
if (condition == "secret")
{
    response = "Parameter value is `" + param + "`";
}
else
{
    response = "Secret not found!";
}

Response.Write("<b>" + response + "</b>");
```

Analysis of this code yields a symbolic formula describing the conditions and the set of possible values for the attack vector:

{condition = "secret" ⇒ param ∈ { XSShtml-text }}, XSShtml-text — XSS- TEXT, HTML.

Both an exploit and a virtual patch can be derived from this formula. Based on the virtual patch descriptor, the WAF generates filtering rules that block all HTTP requests that could lead to exploitation of the detected vulnerability.
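
For this simple example, the resulting virtual patch can be pictured roughly as the following WAF-side filtering logic (a sketch, assuming a regex-based signature; the expression below is a simplified stand-in for a real XSS signature set, and all names are illustrative):

```csharp
using System.Collections.Specialized;
using System.Text.RegularExpressions;

static class VirtualPatch
{
    // Simplified stand-in for a real XSS signature set.
    static readonly Regex XssSignature =
        new Regex(@"<\s*script|on\w+\s*=|javascript:", RegexOptions.IgnoreCase);

    // Block the request if it can drive the code above into the vulnerable branch
    // (condition == "secret") with a dangerous value of `param`.
    public static bool ShouldBlock(NameValueCollection query) =>
        query["condition"] == "secret"
        && query["param"] != null
        && XssSignature.IsMatch(query["param"]);
}
```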

This approach, of course, allows you to defend against a number of attacks, but it also has significant drawbacks:


These shortcomings mean that VP technology, focused on protecting against particular cases, does not effectively protect against all possible attacks on the vulnerabilities detected by SAST tools. In addition, traffic filtering rules built this way often end up blocking legitimate HTTP requests and disrupting the protected application. Let's change the vulnerable code a bit:

```csharp
var condition = Request.Params["condition"];
var param = Request.Params["param"];

if (condition == null || param == null)
{
    Response.Write("Wrong parameters!");
    return;
}

string response;
// CustomDecode implements the Base64 -> URL -> Base64 decoding chain
if (CustomDecode(condition).Contains("secret"))
{
    response = "Parameter value is `" + CustomDecode(param) + "`";
}
else
{
    response = "Secret not found!";
}

Response.Write(response);
```

The only difference from the previous example is that now both query parameters undergo a transformation, and the exact comparison with "secret" is relaxed to a substring check. As a result of analyzing this code, the attack vector formula will look like this:

(String.Contains (CustomDecode (condition)) ("secret")) ⇒ param ∈ (CustomDecode { XSShtml-text })

At the same time, for the CustomDecode function the analyzer will derive, in the corresponding CompFG vertex, a formula describing the Base64-URL-Base64 transformation chain:

(Base64Decode (UrlDecode (Base64Decode argument)))
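
For illustration, a possible implementation of CustomDecode consistent with this formula might look like this (the function body is an assumption; only the decoding chain is taken from the formula above):

```csharp
using System;
using System.Text;
using System.Web;

static class DecodeHelpers
{
    // Hypothetical implementation matching the derived formula
    // (Base64Decode (UrlDecode (Base64Decode argument))).
    public static string CustomDecode(string argument)
    {
        // First Base64 decoding pass
        var once = Encoding.UTF8.GetString(Convert.FromBase64String(argument));
        // URL decoding of the intermediate result
        var urlDecoded = HttpUtility.UrlDecode(once);
        // Second Base64 decoding pass
        return Encoding.UTF8.GetString(Convert.FromBase64String(urlDecoded));
    }
}
```

With such an implementation, CustomDecode("YzJWamNtVjA=") evaluates to "secret", which is exactly the kind of value used in the detection example later in the article.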

From formulas of this kind it is still possible to build an exploit (we described this in detail in one of our previous articles), but the classical approach to building virtual patches no longer works here, because:


Since all the problems of traditional VP stem from the WAF's inability to work with the application as a white box, it is obvious that to eliminate them we need to provide that ability and refine the approach so that:


This is how runtime virtual patching technology was born.

Runtime virtual patching


At the core of runtime virtual patching (RVP) technology is the model of the application under analysis called the computation flow graph (CompFG), used in the PT Application Inspector (PT AI) source code analyzer. This model was described in detail in the AppSec Slums master class at PHDays VII. CompFG is built during application analysis by abstract interpretation of its code in semantics similar to traditional symbolic execution. The vertices of this graph contain generating formulas, expressed in terms of the target language, that define the sets of admissible values of all data flows present at the corresponding execution points. These flows are called execution point arguments. For example, the CompFG vertex of the vulnerable point in the example above looks like this:



One of the properties of CompFG is its concretizability: the ability to compute the sets of concrete values of all arguments at any point of the application, given values for all input parameters.
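
For example (a hand-worked illustration, not analyzer output), concretizing the formula of the first code sample with specific inputs gives the exact string that reaches the vulnerable Response.Write call:

```
condition = "secret", param = "<script>alert(1)</script>"
  ⇒ Response.Write argument = "<b>Parameter value is `<script>alert(1)</script>`</b>"
```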

The RVP workflow is divided into two phases corresponding to stages of the application life cycle: deployment (D steps) and execution (R steps):



Deployment phase


Before deploying the next version of the application, it is analyzed with PT AI; as a result, three formulas are derived for each CompFG vertex describing a vulnerable execution point:

- the condition under which the vulnerable point is reachable from an entry point;
- the condition on the values of its arguments under which an attack is possible;
- the formula for the value of the vulnerable point's argument.

All formula sets are grouped by the web application entry point to whose control flow the vulnerability belongs. The notion of an entry point is specific to each web framework supported by PT AI and is defined in the analyzer's knowledge base.

After that, the report with the detected vulnerabilities and their formulas is exported as code in a special domain-specific language based on S-expression syntax, which allows CompFG formulas to be described in a form independent of the target language. The formula for the value of the vulnerable point's argument in the sample code reviewed earlier looks like this:

(+ ("Parameter value is `") (FromBase64Str (UrlDecodeStr (FromBase64Str (GetParameterData (param))))) ("`")),

and the formula for its reachability is:

(Contains (FromBase64Str (UrlDecodeStr (FromBase64Str (GetParameterData (condition))))) ("secret")).

The resulting report is loaded into PT Application Firewall (PT AF), and a binary module is generated from it that can evaluate all the formulas it contains. The decompiled code for evaluating the reachability condition of the vulnerable point in our example looks like this:


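As a rough approximation, such a generated check could be expressed in C# as follows (a sketch: the class and method shapes are assumptions; only FromBase64Str, UrlDecodeStr, GetParameterData, and the "secret" constant come from the report's formula):

```csharp
using System;
using System.Text;
using System.Web;

static class GeneratedChecks
{
    static string FromBase64Str(string s) => Encoding.UTF8.GetString(Convert.FromBase64String(s));
    static string UrlDecodeStr(string s) => HttpUtility.UrlDecode(s);

    // Evaluator for the reachability formula
    // (Contains (FromBase64Str (UrlDecodeStr (FromBase64Str (GetParameterData (condition))))) ("secret"))
    public static bool IsReachable(Func<string, string> getParameterData)
    {
        var condition = getParameterData("condition");
        if (condition == null)
            return false;

        return FromBase64Str(UrlDecodeStr(FromBase64Str(condition))).Contains("secret");
    }
}
```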

For the formulas to be computable, the PT AF side must have one of the following (or a combination of both):

- its own calculators for every function that can appear in a formula;
- a runtime environment of the target platform that can be called to compute arbitrary functions.

The first option gives maximum speed, but it requires a huge amount of manual work from the WAF developers to describe the calculators (even if limited to standard library functions only). The second option makes it possible to compute any function that may occur in the report, but it increases the processing time of each HTTP request, since the runtime environment has to be invoked for every function call. The optimal scheme is a combination: the first option for the most common functions and the second for everything else.
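
Schematically, this combined scheme can be thought of as a dispatch table of native calculators with a fallback into the target runtime (an illustrative sketch, not the actual PT AF code; all names are assumptions):

```csharp
using System;
using System.Collections.Generic;

sealed class FormulaCalculator
{
    // Native implementations of the most common formula functions.
    readonly Dictionary<string, Func<string[], string>> nativeCalculators;
    // Fallback that invokes the target language runtime for everything else.
    readonly Func<string, string[], string> runtimeFallback;

    public FormulaCalculator(
        Dictionary<string, Func<string[], string>> nativeCalculators,
        Func<string, string[], string> runtimeFallback)
    {
        this.nativeCalculators = nativeCalculators;
        this.runtimeFallback = runtimeFallback;
    }

    // Use the native calculator if present, otherwise make a (much slower) call into the runtime.
    public string Call(string function, params string[] args) =>
        nativeCalculators.TryGetValue(function, out var calc)
            ? calc(args)
            : runtimeFallback(function, args);
}
```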

It is quite possible that a formula will contain a function into which the analyzer cannot "step" (for example, a call to a method from a missing project dependency or to native code) and/or which cannot be computed on the PT AF side either (for example, a function that reads data from external sources or from the server environment). Such functions are marked in the formulas with the unknown flag and are handled in a special way (see below).

Execution phase


During the execution phase, the WAF delegates the processing of each HTTP request to the generated binary module. The module analyzes the request and determines the web application entry point associated with it. For that entry point, the formulas of all the vulnerabilities detected as a result of its analysis are selected and then evaluated in a specific way.

First, the formulas of both conditions are evaluated: the reachability condition of the vulnerable point and the condition on the values of all its arguments. The values of the corresponding request parameters are substituted for the variables in each formula, after which the formula is evaluated. If expressions with the unknown flag are present in a formula, it is handled as follows:


If the evaluation yields false for the original formula, this means that the HTTP request cannot lead the application to the vulnerable point with dangerous values of all its arguments. In that case, RVP simply returns request processing to the main WAF module.

If the attack conditions for a vulnerability are met, it is time to compute the value of the vulnerable point's argument. The algorithm used for this depends on the vulnerability class the point belongs to. The only thing the algorithms have in common is the handling of formulas containing unknown nodes: unlike conditional formulas, such argument formulas cannot be evaluated at all, which is immediately reported to the WAF, after which processing moves on to the next vulnerable point. As an example, let us consider the most complex of these algorithms, used to detect injection-class attacks.

Injection detection


The injection class covers any attack whose purpose is to violate the integrity of text in some formal language (HTML, XML, JavaScript, SQL, URLs, file paths, and so on) built from attacker-controlled data. The attack is carried out by passing the application specially crafted input data whose substitution into the attacked text makes it escape the bounds of a token and introduces syntactic constructs not intended by the application logic.

If the current vulnerable point belongs to this attack class, the value of its argument is computed using a so-called incremental evaluation algorithm with abstract interpretation in taint analysis semantics. The essence of the algorithm is that each expression of the formula is evaluated separately, bottom up, and the result obtained at each step is additionally annotated with the boundaries of its "taint", based on the semantics of each evaluated function and the rules of traditional taint analysis. This makes it possible to identify in the final result all fragments obtained through any transformation of the input data (tainted fragments).
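
The idea can be sketched with a value type that carries taint boundaries along with the computed string (an illustration of the principle, not the actual PT AF implementation; all names are assumptions):

```csharp
using System;
using System.Collections.Generic;

// A computed value together with the boundaries of its tainted (attacker-derived) fragments.
sealed class TaintedString
{
    public string Value { get; }
    public IReadOnlyList<(int Start, int Length)> TaintedRanges { get; }

    TaintedString(string value, IReadOnlyList<(int Start, int Length)> taintedRanges)
    {
        Value = value;
        TaintedRanges = taintedRanges;
    }

    // Data taken directly from an HTTP parameter: fully tainted.
    public static TaintedString FromInput(string value) =>
        new TaintedString(value, new[] { (Start: 0, Length: value.Length) });

    // A constant from the formula itself: no tainted fragments.
    public static TaintedString Constant(string value) =>
        new TaintedString(value, Array.Empty<(int Start, int Length)>());

    // Concatenation propagates taint: ranges of the right operand are shifted
    // by the length of the left operand.
    public static TaintedString Concat(TaintedString left, TaintedString right)
    {
        var ranges = new List<(int Start, int Length)>(left.TaintedRanges);
        foreach (var (start, length) in right.TaintedRanges)
            ranges.Add((start + left.Value.Length, length));
        return new TaintedString(left.Value + right.Value, ranges);
    }
}
```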

For example, for the code above and an HTTP request with the parameters `?condition=YzJWamNtVjA%3d&param=…`, the result of the evaluation will look like this:



Next, the resulting value is split into tokens according to the grammar of the vulnerable point's argument; if any tainted fragment spans more than one token, this is a formal sign of an attack (which is, by definition, an injection):

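The formal criterion itself can be sketched as follows (token boundaries are assumed to come from a grammar-aware lexer for the argument's language; the types here are illustrative):

```csharp
using System.Collections.Generic;
using System.Linq;

static class InjectionCheck
{
    // A token of the target grammar, described by its position in the computed argument value.
    public readonly struct Token
    {
        public Token(int start, int length) { Start = start; Length = length; }
        public int Start { get; }
        public int Length { get; }
    }

    // Formal attack criterion: some tainted fragment covers more than one token.
    public static bool IsInjection(
        IReadOnlyList<Token> tokens,
        IEnumerable<(int Start, int Length)> taintedRanges)
    {
        return taintedRanges.Any(range =>
            tokens.Count(token => Overlaps(token, range)) > 1);
    }

    static bool Overlaps(Token token, (int Start, int Length) range) =>
        token.Start < range.Start + range.Length &&
        range.Start < token.Start + token.Length;
}
```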

Once the formulas of all vulnerabilities related to the current entry point have been evaluated, request processing is handed back to the main WAF module together with the detection results.

Advantages and features of RVP


This approach, protecting the application based on the results of analyzing the security of its code, has a number of significant advantages over traditional VP:


To test the technology and confirm its effectiveness, a prototype of the PT Application Inspector and PT Application Firewall integration module was developed as an IIS HTTP module for the .NET platform. A demonstration of its work with the code sample reviewed above is available on YouTube. Performance tests on a dozen and a half open-source CMSs showed more than acceptable results: the time to process HTTP requests with RVP turned out to be comparable to the time taken by traditional (heuristic) WAF methods. The average slowdown of the web application's response to requests was:


Despite the obvious advantages over traditional VP, RVP still has a number of conceptual limitations that I would like to get rid of:


However, these shortcomings could be eliminated by shifting part of the RVP functionality to the application side and applying the technologies behind runtime application self-protection (RASP).

But more about that in the second part of the article :)

Source: https://habr.com/ru/post/338110/

