
In the previous article, we described how the artificial intelligence in Nemesida WAF helps detect attacks on web applications with high accuracy and a minimal number of false positives. This article covers the new operating mechanism of Nemesida AI, which doubles attack detection accuracy compared to the signature method and reduces the false positive rate to 0.01%.
Theory: An Approach to Improving Recognition Accuracy
A WAF operating in signature analysis mode (including “chain of rules” mode) produces a large number of false positives, and the usual remedy, building a table of legitimate requests or blocking-exclusion rules, makes such a WAF easier to bypass. Even with a minimal list of legitimate requests, the likelihood of a bypass is extremely high. Current WAF bypass techniques include:
- (/*!%55NiOn*/ /*!%53eLEct*/);
- (/?id=1/**/union/*&id=*/select/*&id=*/pwd/*&id=*/from/*&id=*/users);
- (-1 UnIoN SeleCT pAssWord fROM USers);
- (union%20%64istinctRO%57%20select).
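To illustrate why such payloads slip past signatures, here is a minimal sketch that runs one hypothetical signature against the four examples above. The regular expression is an assumption made for illustration, not a rule taken from Nemesida WAF or from any real rule set.

```python
import re

# Hypothetical signature: "union", whitespace, "select", case-insensitive.
NAIVE_SIGNATURE = re.compile(r"union\s+select", re.IGNORECASE)

# The bypass examples listed above, as raw request fragments.
PAYLOADS = [
    "/*!%55NiOn*/ /*!%53eLEct*/",
    "/?id=1/**/union/*&id=*/select/*&id=*/pwd/*&id=*/from/*&id=*/users",
    "-1 UnIoN SeleCT pAssWord fROM USers",
    "union%20%64istinctRO%57%20select",
]


def matches_signature(payload: str) -> bool:
    """Return True if the naive signature fires on the raw payload."""
    return NAIVE_SIGNATURE.search(payload) is not None


results = [matches_signature(p) for p in PAYLOADS]
for payload, hit in zip(PAYLOADS, results):
    print(f"{'blocked' if hit else 'missed '}  {payload}")
```

In this sketch only the mixed-case variant is caught: URL encoding and inline comments defeat the pattern, and every extra rule written to cover them can in turn raise the risk of false positives.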
The introduction of artificial intelligence in Nemesida WAF solved both problems. To improve the learning mechanism of Nemesida AI, we now use the following strategy:
1. fix the false positive rate at 0.01%;
2. maximize the detection rate for that false positive rate.
Thus, the classifier parameters are selected so that both conditions are met. The quality of the classifier depends directly on how well the training samples for the two classes (legitimate traffic and attacks) are formed on the basis of the vector space model.
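The two-step strategy can be sketched as choosing a decision threshold on classifier scores: first find the lowest threshold that keeps false positives within the budget, then measure the detection rate at that threshold. The function names and the toy scores below are assumptions made for illustration; the article does not describe how Nemesida AI actually tunes its parameters.

```python
def threshold_for_fpr(legit_scores, max_fpr):
    """Pick the lowest threshold that flags at most max_fpr of legitimate requests.

    A request is flagged as an attack when its score is strictly above
    the returned threshold.
    """
    ordered = sorted(legit_scores, reverse=True)
    # Number of legitimate requests we may flag (epsilon guards float rounding).
    allowed = int(max_fpr * len(ordered) + 1e-9)
    if allowed >= len(ordered):
        return float("-inf")
    return ordered[allowed]


def flagged_share(scores, threshold):
    """Share of requests whose score is strictly above the threshold."""
    return sum(score > threshold for score in scores) / len(scores)


# Toy scores (hypothetical classifier output, higher means "more attack-like").
legit_scores = [0.01, 0.02, 0.05, 0.10, 0.12, 0.20, 0.30, 0.90]
attack_scores = [0.15, 0.25, 0.60, 0.95]

# A 25% budget keeps the toy data readable; the article's target is 0.0001 (0.01%).
threshold = threshold_for_fpr(legit_scores, max_fpr=0.25)
false_positive_rate = flagged_share(legit_scores, threshold)
detection_rate = flagged_share(attack_scores, threshold)
print(threshold, false_positive_rate, detection_rate)
```

With a budget as small as 0.01%, the legitimate sample has to contain at least 10,000 requests before even one false positive fits inside the budget, which is one reason a large legitimate sample matters.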

The training sample of illegitimate traffic is based on the existing database of attack signatures, obtained through manual and semi-automatic testing of web applications. The sample of legitimate traffic is based on requests that reach the protected web application and are recognized as legitimate by the signature analyzer. This approach adapts the Nemesida AI training system to a specific web application, reducing the number of false positives to a minimum. The size of the legitimate traffic sample depends on the amount of free RAM on the server where Nemesida WAF runs. The recommended value is 120,000 requests with 10 GB of free RAM for training the models.
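A rough sketch of how the two samples described above could be assembled. The function `build_training_set`, the `signature_flags` callback, and the toy analyzer are hypothetical names introduced for this example; only the 120,000 cap comes from the text.

```python
from typing import Callable, Iterable, List, Tuple

MAX_LEGIT_REQUESTS = 120_000  # recommended sample size from the text


def build_training_set(
    attack_requests: Iterable[str],
    live_requests: Iterable[str],
    signature_flags: Callable[[str], bool],
    max_legit: int = MAX_LEGIT_REQUESTS,
) -> Tuple[List[str], List[int]]:
    """Return (texts, labels): label 1 for attacks, 0 for legitimate traffic."""
    texts: List[str] = []
    labels: List[int] = []
    # Class 1: everything from the attack database.
    for request in attack_requests:
        texts.append(request)
        labels.append(1)
    # Class 0: live requests the signature analyzer did not flag, up to the cap.
    legit_count = 0
    for request in live_requests:
        if legit_count >= max_legit:
            break
        if signature_flags(request):
            continue
        texts.append(request)
        labels.append(0)
        legit_count += 1
    return texts, labels


def toy_signature(request: str) -> bool:
    """Stand-in for a signature analyzer (assumption, for the demo only)."""
    return "union select" in request.lower()


texts, labels = build_training_set(
    attack_requests=["id=3 and 0 union select @@version,2"],
    live_requests=["id=1&Submit=Submit", "id=-1 UNION SELECT password", "id=2&Submit=Submit"],
    signature_flags=toy_signature,
    max_legit=2,
)
print(list(zip(labels, texts)))
```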

The AdaBoost algorithm (short for adaptive boosting) is a machine learning algorithm proposed by Yoav Freund and Robert Schapire. It is a meta-algorithm: during training it builds a composition of base learning algorithms to improve their effectiveness. AdaBoost is adaptive in the sense that each successive classifier is built on the objects that were poorly classified by the previous classifiers. In our case, the base algorithm is decision trees. AdaBoost calls a weak classifier in a loop. After each call, it updates the distribution of weights, which reflect the importance of each object in the training set for classification. At each iteration, the weight of every misclassified object increases, so the new classifier “focuses its attention” on these objects.
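The weight-update loop described above can be shown with a minimal discrete AdaBoost over decision stumps (decision trees of depth 1). This is a textbook-style sketch on toy one-dimensional data, assuming labels of +1 and -1; it is not the Nemesida AI implementation, and all names are illustrative.

```python
import math


def stump_predict(x, threshold, polarity):
    """Depth-1 decision tree: `polarity` if x > threshold, otherwise `-polarity`."""
    return polarity if x > threshold else -polarity


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train `rounds` stumps; returns a list of (alpha, threshold, polarity)."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # vote of this stump
        ensemble.append((alpha, threshold, polarity))
        # Misclassified objects (y * prediction == -1) gain weight, the rest lose it.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps: +1 or -1."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score > 0 else -1


# Toy 1-D data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost(xs, ys, rounds=1)
boosted = adaboost(xs, ys, rounds=3)
single_errors = sum(predict(single, x) != y for x, y in zip(xs, ys))
boosted_errors = sum(predict(boosted, x) != y for x, y in zip(xs, ys))
print(single_errors, boosted_errors)
```

On this toy data a single stump misclassifies three objects, while three boosted stumps classify all ten correctly, because the second and third stumps are trained with extra weight on the earlier mistakes.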
TF-IDF (TF: term frequency, IDF: inverse document frequency) is a statistical measure used to assess the importance of a word in the context of a document that is part of a document collection or corpus. The weight of a word is proportional to how often it occurs in the document and inversely proportional to how often it occurs in the other documents of the collection. The TF-IDF measure is often used in text analysis and information retrieval, for example as one of the criteria for the relevance of a document to a search query, or when calculating document similarity for clustering.
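As a small worked example, the sketch below computes one common, unsmoothed TF-IDF variant for three tokenized requests. The tokenization and the sample requests are assumptions made for illustration; the article does not say which TF-IDF variant or tokenizer Nemesida AI uses.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = occurrences of the term in the document / document length
    idf = ln(number of documents / number of documents containing the term)
    """
    doc_count = len(documents)
    doc_frequency = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(doc_count / doc_frequency[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenization of three request parameters.
requests = [
    ["id", "1", "submit"],
    ["id", "2", "submit"],
    ["id", "union", "select", "version", "submit"],
]
weights = tf_idf(requests)
print(weights[2])
```

Terms that occur in every request, such as `id` and `submit`, get a weight of zero, while `union` and `select`, which occur only in the third request, get a positive weight there.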
While iteratively replenishing the training sample for the illegitimate traffic class, we observed the following pattern: as the number of attacks in the database grows, detection accuracy also grows, but an excessive increase in the number of attacks can lead to a significant increase in false positives. After a large number of experiments, we have built a database of 189,316 distinct requests with various signs of an attack on web applications. As a result, we have reached the required false positive rate (0.01%) while increasing the accuracy of detecting new attacks (those not recognized by the signature method).
Practice: testing and generation of training data sets
To identify illegitimate requests, the existing attack signature database is used, supplemented by manual and semi-automatic testing of web applications. Popular CMSs, as well as deliberately vulnerable systems (for example, DVWA), serve as the test web applications.

In the first stage, legitimate traffic is generated that contains no malicious requests, signs of attack, or exploitation of vulnerabilities. This makes it possible to build a model of legitimate user behavior.
An example of a legitimate request that contains no SQL injection vectors:
GET /vulnerabilities/sqli/?id=1&Submit=Submit HTTP/1.1
Host: waf.office.pentestit.ru
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:57.0) Gecko/20100101 Firefox/57.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://waf.office.pentestit.ru/vulnerabilities/sqli/
Cookie: PHPSESSID=e91108c6l9jcqv9ob813kore73; security=low
Connection: close
Upgrade-Insecure-Requests: 1

The second stage is penetration testing of the web application, using both specialized tools and “manual” analysis to identify and exploit vulnerabilities.
An example of a request that exploits an SQLi vulnerability:
POST /vulnerabilities/sqli/ HTTP/1.1
Host: waf.office.pentestit.ru
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:57.0) Gecko/20100101 Firefox/57.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://waf.office.pentestit.ru/vulnerabilities/sqli/
Content-Type: application/x-www-form-urlencoded
Content-Length: 49
Cookie: PHPSESSID=e91108c6l9jcqv9ob813kore73; security=medium; _ym_uid=1512232172282659138; _ym_isad=2; _ym_visorc_45042173=w; _ym_visorc_25686548=w; _ym_visorc_36267400=w
Connection: close
Upgrade-Insecure-Requests: 1

id=3 and 0 union select @@version,2&Submit=Submit
As a result of the request, the DBMS version is displayed:
First name: 10.1.26-MariaDB-0+deb9u1
These requests are marked as illegitimate and passed to the Nemesida WAF artificial intelligence processing system. Vulnerable parameters are tested with many types of requests (payloads), both exploitation vectors and probing attacks that make it possible to identify vulnerabilities.
Example payloads containing the 'sleep' or 'benchmark' functions:
SLEEP(5)#
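A minimal sketch of turning the request body from the SQLi example into a labeled record, using the standard library's `urllib.parse.parse_qsl`. The record layout and the `to_sample` helper are hypothetical; the article does not describe the format in which Nemesida WAF stores such requests.

```python
from urllib.parse import parse_qsl


def to_sample(body: str, label: str) -> dict:
    """Decode a form-encoded body and attach a class label to it."""
    return {
        "label": label,
        "params": dict(parse_qsl(body, keep_blank_values=True)),
    }


# Body of the SQLi request shown earlier in the article.
attack_body = "id=3 and 0 union select @@version,2&Submit=Submit"
sample = to_sample(attack_body, label="illegitimate")
print(sample)
```

Decoding the parameters before storing them means that URL-encoded variants of the same payload end up as the same text in the training sample.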
SLEEP(5)--
SLEEP(5)="
SLEEP(5)='
or SLEEP(5)
or SLEEP(5)#
or SLEEP(5)--
or SLEEP(5)="
or SLEEP(5)='
waitfor delay '00:00:05'
waitfor delay '00:00:05'--
waitfor delay '00:00:05'#
benchmark(50000000,MD5(1))
benchmark(50000000,MD5(1))--
benchmark(50000000,MD5(1))#
or benchmark(50000000,MD5(1))
or benchmark(50000000,MD5(1))--
or benchmark(50000000,MD5(1))#
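Lists like the one above are usually produced by combining a few base expressions with prefixes and terminators. The sketch below generates a similar, but not identical, set of time-based payloads; the choice of prefixes and suffixes is an assumption for illustration and does not reproduce the exact list used by the authors.

```python
from itertools import product

# Base expressions taken from the list above.
BASES = ["SLEEP(5)", "waitfor delay '00:00:05'", "benchmark(50000000,MD5(1))"]
# Illustrative prefixes and comment terminators (assumed, not exhaustive).
PREFIXES = ["", "or "]
SUFFIXES = ["", "#", "--"]


def time_based_payloads():
    """Combine every base expression with every prefix and suffix."""
    return [
        prefix + base + suffix
        for base, prefix, suffix in product(BASES, PREFIXES, SUFFIXES)
    ]


payloads = time_based_payloads()
for payload in payloads:
    print(payload)
```

A fuzzer can then substitute each generated string into the vulnerable parameter and compare response times.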
Web application testing technique using OWASP ZAP (fuzzing):

After several iterations of the first and second stages, the existing trained model is tested in order to reduce the number of false positives. The more illegitimate requests are processed, the more accurately anomalies are detected, including anomalies in file contents. The collected anomalies, in the form of an attack model that serves as the training basis for a specific web application, are then automatically uploaded to the server running Nemesida WAF.

Conclusion
The improved learning mechanism of the artificial intelligence in Nemesida WAF detects attacks on a web application twice as accurately while reducing the number of false positives to almost zero. In addition, Nemesida WAF includes a Virtualpatch system and a built-in vulnerability scanner which, when vulnerabilities are detected in the protected web application, promptly apply their own patch to the vulnerable component.
Nemesida WAF is a powerful tool for protecting a site from attacks. To try it in practice, request a ready-made virtual machine with a pre-installed test version of Nemesida WAF, or activate cloud protection free for 2 weeks.