Protecting a website against attacks with Nemesida WAF: from signatures to artificial intelligence

This article discusses the practice of protecting a vulnerable web application with a Web Application Firewall (WAF), from the signature method to artificial intelligence, using both a commercial and an open-source product. Nemesida WAF serves as the commercial solution and NAXSI as the non-commercial one. The article covers general and technical information on how a WAF works, compares attack detection methods, and analyzes their features and shortcomings.

Attack detection

The first and main task of any WAF is to detect attacks as accurately as possible with a minimum number of false positives. NAXSI has only a signature-based detection mechanism (its behavioral analysis is at an early stage, so we will not consider it); Nemesida WAF has three: signature analysis, behavioral analysis, and machine learning. By the combined method of attack detection we mean these three methods working together. Why three? Let's see.

Signature method for determining attacks

Despite the rapid development of technology, most attacks are still detected by signatures, and the quality of every method built on signature analysis (including machine learning) depends on how well the signatures are written. Consider an example of detecting a web application attack by signature:

index.php?id=-1'+union+select+1,2,3,4,5+--+1

In this case, the attack signature is the occurrence of the "union + select" chain.

An example of an attack that NAXSI will miss:

index.php?id=-1'+Union+Select+1,2,3,4,5+--+1

NAXSI will miss this attack: because of an error in its code, a “stop word” whose first letter is in upper case is not handled during request processing, so the request does not match the “union” and “select” signatures.
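The case problem described above can be illustrated with a minimal sketch. This is not NAXSI or Nemesida WAF code; the pattern and the function name `matches_union_select` are hypothetical, chosen only to show a signature for the union/select chain that ignores letter case.

```python
import re

# Hypothetical signature for the "union ... select" chain.
# re.IGNORECASE makes "Union+Select" match as well as "union+select";
# [\s+]+ accepts spaces or literal "+" signs between the two keywords.
UNION_SELECT = re.compile(r"\bunion\b[\s+]+\bselect\b", re.IGNORECASE)


def matches_union_select(query: str) -> bool:
    """Return True if the query string contains the union/select chain."""
    return UNION_SELECT.search(query) is not None


print(matches_union_select("index.php?id=-1'+union+select+1,2,3,4,5+--+1"))
print(matches_union_select("index.php?id=-1'+Union+Select+1,2,3,4,5+--+1"))
```

Both calls report a match, because the comparison no longer depends on the case of any letter in the keywords.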

NAXSI will also miss a direct call that retrieves the DBMS version:

id=version();+--+

The same applies to other service functions: “CURRENT_USER()”, “DATABASE()”, “ROW_COUNT()” and others. NAXSI does not normalize URL-encoded data or binary strings before comparing them with the signature database, so attacks like these are also missed:

id=concat_ws%23%0a(0b00111010,database%0b(%0b),database%09(%09)
id=1 anD 0 unio%6e %23def%0a sELEc%74%23zxc%0a
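To show what normalization means for the two requests above, here is a minimal sketch. It assumes MySQL-style `#` line comments and uses a hypothetical helper named `normalize`; a real WAF would need a far more careful parser (for example, a `#` inside a string literal is not a comment).

```python
import re
from urllib.parse import unquote_plus


def normalize(raw: str) -> str:
    """Reduce a raw query string to a canonical form before signature matching."""
    # Decode %XX escapes and turn "+" into spaces.
    decoded = unquote_plus(raw)
    # Drop MySQL line comments: "#" up to the end of the line (naive assumption).
    without_comments = re.sub(r"#[^\n]*", " ", decoded)
    # Collapse every whitespace run (space, tab, LF, vertical tab) to one space.
    collapsed = re.sub(r"\s+", " ", without_comments)
    return collapsed.strip().lower()


print(normalize("id=1 anD 0 unio%6e %23def%0a sELEc%74%23zxc%0a"))
# id=1 and 0 union select
```

After this step the obfuscated request contains the plain "union select" chain, so an ordinary signature can match it.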

Attacks like these are missed as well:

<iframe/onload='this["src"]="javas cript:al"+"ert``"';>
<img/src=q onerror='new Function`al\ert\`1\``'>

It is worth noting that NAXSI signatures do not account for all modern payload obfuscation techniques and need significant refinement. Moreover, given its regular expressions over MySQL command syntax, false positives are quite likely to push a number of “dangerous occurrences” onto the whitelist, for example:

MainRule "rx:select|union|update|delete|insert|table|from|ascii|hex|unhex|drop" "msg:sql keywords" "mz:BODY|URL|ARGS|$HEADERS_VAR:Cookie" "s:$SQL:4" id:1000;
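A rough way to see why a rule like this is noisy is to run the same alternation over ordinary text. The sketch below only emulates the keyword list with Python's `re` module; it is not how NAXSI evaluates the rule, and the function name `keyword_hits` and the sample sentence are hypothetical.

```python
import re

# The keyword alternation copied from the rule above.
SQL_KEYWORDS = re.compile(
    r"select|union|update|delete|insert|table|from|ascii|hex|unhex|drop"
)


def keyword_hits(text: str) -> list:
    """Return every keyword occurrence found in the text, substrings included."""
    return SQL_KEYWORDS.findall(text.lower())


print(keyword_hits("Please select a table from the dropdown list"))
# ['select', 'table', 'from', 'drop']
```

A harmless sentence produces four hits, including "drop" inside "dropdown", which is the kind of match that ends up being whitelisted.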

False Positive

To bring the number of false positives down toward zero, the threat level (score) of each signature must be set accurately. Consider a rule with incorrect scoring (for example, for the “order” and “by” operators) that produces a false positive:

The New World Order is a book written by HG Wells

The false positive in the example above is caused by a high score for any occurrence of a MySQL keyword, regardless of its applicability and the request zone.

Here is an example for a rule with correct scoring, where the whole chain occurs:
index.php?id=1+order+by+10+--+
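The difference between the two scoring approaches can be sketched as follows. The weights, the threshold, and the names `RULES`, `BLOCK_THRESHOLD` and `score` are assumptions made for illustration, not values from NAXSI or Nemesida WAF. The input is assumed to be already decoded, so "+" has become a space.

```python
import re

# Hypothetical threshold: a request scoring this much or more is blocked.
BLOCK_THRESHOLD = 8

# Hypothetical rules: isolated keywords score low, the full chain scores high.
RULES = [
    (re.compile(r"\border\b", re.IGNORECASE), 1),
    (re.compile(r"\bby\b", re.IGNORECASE), 1),
    (re.compile(r"\border\s+by\s+\d+", re.IGNORECASE), 8),
]


def score(text: str) -> int:
    """Sum the weights of all rules that match the decoded text."""
    return sum(weight for pattern, weight in RULES if pattern.search(text))


print(score("The New World Order is a book written by HG Wells"))  # 2
print(score("id=1 order by 10 -- "))                                # 10
```

The book title stays well below the threshold because only the isolated words match, while the injection attempt crosses it because the chain rule fires.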

Signature analysis: conclusions

1. Developing signatures requires a high level of expertise and an understanding of how an attacker works. In our case, this knowledge comes from the staff of the security analysis department.
2. Signatures must be constantly updated.
3. Using signatures without additional processing (behavioral analysis) leads to false positives.

Even with an accurate and complete rule set, the signature method has two main drawbacks: it either produces false positives or misses the attack completely:

False positive: if the union and select operators occurring in the URI are given a high score, a legitimate request is blocked by mistake: /weareunion/sub/select_your_choice.php
False negative: if a signature or a chain of attack signatures has a low score, yet the request still exposes "sensitive" information: some.php?size=version%28%29%20;%20-

To eliminate both shortcomings, a more advanced detection model is needed: behavioral analysis and machine learning.

Behavioral analysis

Let's go straight to practice:

index.php?id=1
index.php?id=3-2
index.php?id=-1
index.php?id=1'
index.php?id='1
index.php?id=1 and sleep(5)

In this example, we see an attempt to find an SQL injection by manipulating the parameter value, adding a quotation mark, and calling the “sleep” function. Taken separately, these scattered signs do not contain an explicit attack vector, but together they clearly indicate that the attacker is “probing” the web application. The mathematical model accumulates the signs of user behavior over a period of time and blocks on that basis, which makes it possible to catch the start of an attack without blocking requests from legitimate visitors.
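The idea of accumulating weak signs over time can be sketched with a per-client sliding window. The article does not describe the actual model used by Nemesida WAF, so everything here is an assumption: the class name `BehaviorTracker`, the 60-second window, the threshold of 5, and the per-request weights.

```python
from collections import defaultdict, deque


class BehaviorTracker:
    """Accumulate suspicion weights per client over a sliding time window."""

    def __init__(self, window: float = 60.0, threshold: int = 5):
        self.window = window          # seconds of history kept (assumed value)
        self.threshold = threshold    # block when the sum reaches this (assumed)
        self._events = defaultdict(deque)  # client -> deque of (time, weight)

    def observe(self, client: str, timestamp: float, weight: int) -> bool:
        """Record one request; return True if the client should be blocked."""
        events = self._events[client]
        events.append((timestamp, weight))
        # Forget signs that are older than the window.
        while events and timestamp - events[0][0] > self.window:
            events.popleft()
        return sum(w for _, w in events) >= self.threshold


tracker = BehaviorTracker()
# Hypothetical weights for the six requests above: a plain id scores 0,
# arithmetic or a negative value 1, a stray quote 2, a sleep() call 3.
weights = [0, 1, 1, 2, 2, 3]
print([tracker.observe("203.0.113.7", float(i), w) for i, w in enumerate(weights)])
# [False, False, False, False, True, True]
```

No single request reaches the threshold on its own, but the client is flagged on the fifth request, once the accumulated signs add up.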

Artificial Intelligence

Machine learning is a broad subfield of artificial intelligence that studies methods for building algorithms capable of learning. There are two types of learning:

Learning from precedents, or inductive learning, which identifies general patterns from particular empirical data.
Deductive learning, which formalizes expert knowledge and transfers it to a computer as a knowledge base. Deductive learning is usually assigned to the field of expert systems, so the terms "machine learning" and "learning from precedents" can be considered synonymous.

Put simply, we use all our accumulated experience (both in web application protection and in penetration testing) to build the foundation of the learning model; that is, we use deductive learning.

An additional source of attack data is our penetration testing laboratories.

In 2013, the first penetration testing laboratory, “Test lab”, was launched: a copy of the real corporate network of a virtual company, containing common vulnerabilities and configuration errors. Over four years of development, the concept of the laboratories has not changed, only their scale. The latest laboratories are distributed networks of a head office and branches, and the number of nodes has grown to 50 (servers, workstations, network equipment, etc.). Unlike a CTF, “Test lab” emphasizes realism, and the attackers' actions are identical to those of an external intruder, which has brought 18,000 participants together around “Test lab”.

For us, the laboratories are not just for fun but also an excellent testing ground for debugging and improving Nemesida WAF. Just imagine: 40-50 Mbit/s of "clean" attack traffic that has to be processed and filtered without delay.

Nemesida WAF

As a company that provides services and solutions in practical information security (penetration testing, security analysis, etc.) and has a high-quality vulnerability knowledge base (8 out of 10 of our pentests end with access to critical data), we built Nemesida WAF: a combined attack detection system based on artificial intelligence that accurately detects and blocks attacks on web applications with almost 0% false positives.

Personal account

Nemesida WAF is available either as a cloud service (traffic to the protected application passes through a protected module located in our infrastructure) or as a standalone solution (the WAF is installed in the client's infrastructure).

Source: https://habr.com/ru/post/334998/
