Security scanners: automatic validation of vulnerabilities using fuzzy sets and neural networks

Many information security scanners from various vendors are available today (including MaxPatrol, XSpider, and the Positive Technologies Application Inspector code analyzer). These tools differ in price, scan quality, the types of vulnerabilities they detect, the methods used to find them, and dozens of other parameters.

When building scanners, methods for testing how they work play an important role, and competitive analysis of similar products holds a special place among them.
As a rule, the output of a security scanner is a list of detected vulnerabilities obtained by analyzing a web application. Because scanners rely on heuristic algorithms, this list also contains false positives: vulnerabilities that do not actually exist. This, in turn, means that a security expert has to be assigned to verify the scanner's results.

To confirm that a vulnerability is present, we propose using “reference” lists of vulnerabilities found in similar web applications. An analyst can use such lists to identify the most likely vulnerabilities of the product under test and to filter out obvious false positives.

Problem statement: fuzzy classification of vulnerabilities

In practice, we propose treating the confirmation of vulnerabilities from the scanner's list as a task of comparing them with a set of reference examples (standards). If all objects, both standards and candidate vulnerabilities, can be unambiguously parameterized and represented as vectors, the problem reduces to the classical problem of classifying the elements of a set.

Input data:

The set Vulners of all vulnerabilities of a web application is given; each vulnerability is described by a vector of features v_i. Vulners contains a subset Candidates, the candidate vulnerabilities found by the scanner.
Each candidate vulnerability belongs to one of two classes: I, confirmed vulnerabilities (Ver), or II, unconfirmed vulnerabilities (NVer).
Class I contains a set Eth of reference vulnerabilities.
A set Scales of measuring scales, both clear and fuzzy, is specified for assessing the properties of vulnerabilities.

Required:

Build functions linking clear and fuzzy scales to allow different interpretations of classification results.
Build the Classificator function, which for each vulnerability indicates an assessment of its belonging to the classes of confirmed and unconfirmed vulnerabilities.

The following measuring scales can be used to assess the properties of information systems:

A clear (crisp) scale: the set of real numbers in the segment [0, 1], which can easily be converted to any other kind of clear numerical set (discrete, continuous, unbounded) using suitable conversion functions.
A fuzzy scale: an F-set of ordered fuzzy variables of the form FP = {fp_i}, where the fp_i are linguistic variables that describe the values of an object's properties.
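The conversion of a clear [0, 1] value into other numerical sets can be illustrated with a small sketch. The three functions below (to_range, to_discrete, to_unbounded) are hypothetical names chosen for this example and are not part of FuzzyClassificator; the linear, floor-based, and logit mappings are just one possible choice of conversion functions.

```python
import math


def _check_unit(s):
    """Reject values that are not on the clear scale [0, 1]."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("value must lie in [0, 1]")


def to_range(s, low, high):
    """Map s in [0, 1] linearly onto the continuous segment [low, high]."""
    _check_unit(s)
    return low + s * (high - low)


def to_discrete(s, levels):
    """Map s in [0, 1] onto one of the integer grades 0 .. levels - 1."""
    _check_unit(s)
    # s == 1.0 would give `levels`, so clamp it to the last grade.
    return min(int(s * levels), levels - 1)


def to_unbounded(s):
    """Map s in (0, 1) onto the whole real line with the logit function."""
    if not 0.0 < s < 1.0:
        raise ValueError("value must lie strictly between 0 and 1")
    return math.log(s / (1.0 - s))
```

For instance, to_range(0.25, -1, 1) moves a value to the segment [-1, 1], and to_discrete(0.5, 5) picks the middle of five grades.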

Clear and fuzzy “universal” measuring scales

Input Encoding

For any classification method, vulnerabilities must first be encoded, that is, represented as a vector v = {v_i} from Vulners. This requires a formal coding rule that makes it possible to evaluate individual properties of real vulnerabilities on a clear scale S.

Let us define the vulnerability feature coding matrix M_Vulners. Its rows are individual vulnerability properties (vulner property), its columns are the numerical codes (code) of a property, and its cells hold the possible values of the properties. Only significant properties that uniquely distinguish one automatically found vulnerability from another should be included in the matrix. Each information security scanner may classify vulnerabilities differently. However, most of them record properties such as the vulnerability type, the protocol through which it can be exploited, the exploitation channel within that protocol, the type of the vulnerable object, the path to the object on the server, and the network request carrying the attack vector. All possible values of each property are encoded as non-negative integers, with zero reserved for an undefined value; this covers, among other things, missing, new, or not yet provided property values.

The matrix M_Vulners can be presented in tabular form. Property values may also be fuzzy, in which case they must be defuzzified before being used in further calculations.
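A coding matrix of this kind might be sketched as a dictionary that maps each property (row) to its list of known values, with the position in the list serving as the code (column). The property names and values below are purely illustrative and are not taken from any real scanner; dividing a code by the number of known values to reach the [0, 1] scale is also an assumption made for this example.

```python
# Hypothetical coding matrix: property name -> known values.
# A value's position in the list, counted from 1, is its code.
M_VULNERS = {
    "type": ["SQL injection", "XSS", "Path traversal"],
    "protocol": ["HTTP", "HTTPS"],
    "channel": ["GET parameter", "POST parameter", "Cookie", "Header"],
}

# Code 0 is reserved for undefined, missing, or not yet known values.
UNDEFINED = 0


def encode_property(name, value):
    """Return the integer code of a property value, or 0 if it is unknown."""
    try:
        return M_VULNERS[name].index(value) + 1
    except ValueError:
        return UNDEFINED


def encode_vulnerability(vuln):
    """Encode a dict of property values as a vector of integer codes."""
    return [encode_property(name, vuln.get(name)) for name in M_VULNERS]


def normalize(codes):
    """Scale the codes onto [0, 1] (an assumed rule, not the author's)."""
    return [code / len(M_VULNERS[name]) for name, code in zip(M_VULNERS, codes)]
```

A candidate with an unrecognized channel, such as {"type": "XSS", "protocol": "HTTPS", "channel": "WebSocket"}, is still encoded; the unknown property simply gets the code 0.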

Neural network construction, its training and presentation of results

We will describe the neural network configuration as a triple:

Config = <inputs, {layer^l}, outputs>,

where inputs is the number of input parameters, {layer^l} is a set of non-negative integers giving the number of neurons in hidden layer l, and outputs is the number of output parameters.
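As a rough illustration, the triple can be held in a small named tuple. The comma-separated text form used here mirrors the config=3,3,2,2 parameter that appears in the command-line examples later in the article; reading it as "first number is inputs, last number is outputs, everything in between is hidden layers" is an assumption of this sketch, and the names Config and parse_config are hypothetical.

```python
from typing import NamedTuple, Tuple


class Config(NamedTuple):
    """Network configuration <inputs, {layer^l}, outputs>."""
    inputs: int
    layers: Tuple[int, ...]
    outputs: int


def parse_config(text):
    """Parse a string such as '3,3,2,2' into a Config triple."""
    numbers = [int(part) for part in text.split(",")]
    if len(numbers) < 2:
        raise ValueError("at least the input and output sizes are required")
    if any(n < 0 for n in numbers):
        raise ValueError("layer sizes must be non-negative integers")
    # First number: inputs; last number: outputs; the rest: hidden layers.
    return Config(numbers[0], tuple(numbers[1:-1]), numbers[-1])
```

Under this reading, parse_config("3,3,2,2") describes three inputs, hidden layers of three and two neurons, and two outputs.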

A vector (s_I, s_II) of parameter values on a clear scale S_p can be interpreted in the following ways:

The parameter values give the degree of confidence, from 0 to 1, that the vulnerability's feature vector belongs to each class.
The parameter values, multiplied by 100%, give the probability, from 0 to 100%, that the vulnerability's feature vector belongs to each class.
The parameter values, fuzzified with the special function Fuzzy(x, S_f), give a linguistic assessment of the level of membership of the vulnerability's feature vector in each class on the fuzzy scale S_f = {Min, Low, Med, High, Max}.
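The three interpretations can be sketched as follows. The supports of the five levels are taken from the sample report shown later in this article, but the membership functions are an assumption: simple piecewise-linear shapes stand in for the Hyperbolic, Bell, and Parabolic functions named in that report, whose formulas are not given here. The peak positions, as well as the names membership, fuzzy, and interpret, are likewise hypothetical.

```python
# (name, left edge, assumed peak, right edge) for each level of S_f.
# Edges come from the sample report; peaks are an assumption.
SCALE = [
    ("Min", 0.0, 0.0, 0.23),
    ("Low", 0.17, 0.285, 0.4),
    ("Med", 0.34, 0.5, 0.66),
    ("High", 0.6, 0.715, 0.83),
    ("Max", 0.77, 1.0, 1.0),
]


def membership(x, left, peak, right):
    """Piecewise-linear membership: 1 at the peak, 0 outside the support."""
    if x < left or x > right:
        return 0.0
    if x == peak:
        return 1.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy(x):
    """Return the level of the scale with the highest membership for x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    best = max(SCALE, key=lambda level: membership(x, *level[1:]))
    return best[0]


def interpret(outputs):
    """Give the three readings of a vector such as (s_I, s_II)."""
    return {
        "confidence": list(outputs),
        "percent": [round(100 * s, 1) for s in outputs],
        "fuzzy": [fuzzy(s) for s in outputs],
    }
```

With these assumed shapes, a vector such as (0.05, 0.93) is read as levels Min and Max, that is, a candidate that most likely belongs to the second class.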

Software implementation of the classifier

To make neural networks practical for fuzzy classification problems with varying numbers of classes and network structures, we developed the FuzzyClassificator software modules, released under the GNU GPL v3 license. The current version of FuzzyClassificator can be downloaded from GitHub.

To simplify the use of the modules in automation systems, the program is configured through a command-line interface. The program description on GitHub contains detailed technical information about the interface commands, module operation, and input data. The FuzzyClassificator modules require Pyzo, a free and open-source development environment based on Python 3.3.2 that includes many packages for scientific computing, in particular the PyBrain library for working with neural networks.

The main software modules that implement the approaches and the mathematical apparatus proposed in the article are:

FuzzyClassificator - implements the command-line user interface, receives and processes input data, sets training and classification modes, and provides results.
PyBrainLearning - defines methods for working with fuzzy neural networks, combining the capabilities of the PyBrain library and the author's FuzzyRoutines library.
FuzzyRoutines - contains routines for working with fuzzy sets and fuzzy scales.

Upper A-0-level of the functional IDEF0-model of the FuzzyClassificator program

Level A0 IDEF0-model. The main stages of the FuzzyClassificator

Level A1 IDEF0-model. FuzzyClassificator Stages Processes

The learning mode consists of the following steps:

1. Initialization of program objects with user-defined values.

2. Processing input data and preparing the neural network for training:

processing the file containing the feature vectors of the standards;
preparing the training data in PyBrain format;
initializing the parameters of a new PyBrain neural network or loading an existing one from the specified file.

3. Training the neural network on the given standards:

initializing the PyBrain trainer module;
training the network with the trainer and saving its configuration to a PyBrain file.

The classification mode consists of the following steps:

1. Initialization of program objects with user-defined values.

2. Processing the input data and preparing the neural network for data analysis:

processing the file containing the feature vectors of the candidates;
loading the configuration of the trained PyBrain neural network from the specified file.

3. Analysis of the candidate feature vectors by the neural network:

activating the neural network and calculating the levels of membership of the vectors in the different classes;
interpreting the results on fuzzy scales and generating the report file.

The input data with the feature vectors of the standards and candidates are supplied as plain text files with tab-separated values. For example, to provide training data you can prepare the file ethalons.dat, which contains a header line followed by lines with the values of the reference feature vectors and their class membership.

Values can be set on both clear and fuzzy scales.

File ethalons.dat

input1	input2	input3	1st_class_output	2nd_class_output
0.1	0.2	Min	0	Max
0.2	0.3	Low	0	Max
0.3	0.4	Med	0	Max
0.4	0.5	Med	Max	0
0.5	0.6	High	Max	0
0.6	0.7	Max	0

As data for analysis, a file candidates.dat can be prepared in the same way; it also contains a header line followed by lines with the values of the candidate feature vectors:

File candidates.dat

input1	input2	input3
0.12	0.32	Med
0.32	0.35	Low
0.54	0.57	Med
0.65	0.68	High
0.76	0.79	Min
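Reading such tab-separated files can be sketched with the standard csv module. This is only an illustration of the input format described above, not the parser used inside FuzzyClassificator; the function names parse_value and read_vectors are hypothetical.

```python
import csv
import io

# Labels of the fuzzy scale that may appear instead of numbers.
FUZZY_LEVELS = {"Min", "Low", "Med", "High", "Max"}


def parse_value(text):
    """Keep fuzzy labels as strings and convert clear values to float."""
    if text in FUZZY_LEVELS:
        return text
    return float(text)


def read_vectors(stream):
    """Read a header line and the feature vectors that follow it."""
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)
    # Blank lines produce empty rows, which are skipped.
    rows = [[parse_value(cell) for cell in row] for row in reader if row]
    return header, rows


# Usage with two candidate rows held in memory instead of a file.
sample = "input1\tinput2\tinput3\n0.32\t0.35\tLow\n0.54\t0.57\tMed\n"
header, rows = read_vectors(io.StringIO(sample))
```

For a real file, the same function can be given an open file object, for example open("candidates.dat", newline="").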

The program creates a file with a report containing information about the neural network configuration and the classification results for each feature vector from the set of candidates.

After training the neural network on the above examples with the following command-line parameters:

python FuzzyClassificator.py --learn config=3,3,2,2 epochs=1000 rate=0.1 momentum=0.05

and then running it in classification mode with the following command-line parameters:

python FuzzyClassificator.py --classify config=3,3,2,2

the program produces a report file.

Report file

Neuronet: C:\work\projects\FuzzyClassificator\network.xml

FuzzyScale = {Min, Low, Med, High, Max}
    Min = <Hyperbolic(x, {'a': 8, 'c': 0, 'b': 20}), [0.0, 0.23]>
    Low = <Bell(x, {'a': 0.17, 'c': 0.34, 'b': 0.23}), [0.17, 0.4]>
    Med = <Bell(x, {'a': 0.34, 'c': 0.6, 'b': 0.4}), [0.34, 0.66]>
    High = <Bell(x, {'a': 0.6, 'c': 0.77, 'b': 0.66}), [0.6, 0.83]>
    Max = <Parabolic(x, {'a': 0.77, 'b': 0.95}), [0.77, 1.0]>

Classification results for candidates vectors:

    Input: ['0.12', '0.32', 'Min']    Output: ['Min', 'Max']
    Input: ['0.32', '0.35', 'Low']    Output: ['Low', 'High']
    Input: ['0.54', '0.57', 'Med']    Output: ['Max', 'Min']
    Input: ['0.65', '0.68', 'High']    Output: ['Max', 'Min']
    Input: ['0.76', '0.79', 'Max']    Output: ['Max', 'Min']

Comparing these results with the data in candidates.dat, we can say with a high degree of confidence that a human expert relying only on the data in ethalons.dat would produce similar classification results.

Conclusion

So, we have combined the mathematical apparatus of fuzzy systems theory and neural networks to solve the practical problem of classifying vulnerabilities. Several conclusions can be drawn from this work:

Neural network classification methods are applicable to the classification of vulnerabilities.
To obtain adequate results, the coding matrix must be constructed correctly and the most suitable properties must be selected for modeling vulnerabilities.
For classifying vulnerabilities, we recommend a perceptron network with two hidden layers whose sizes depend on the number of input parameters: the first hidden layer has as many neurons as there are input parameters, and the second has half as many.
An advantage of the proposed approach is its use of universal fuzzy scales of linguistic variables, which serve both for estimating the values of feature vectors and for interpreting the final levels of class membership.
The proposed fuzzy classification method and the FuzzyClassificator software modules that implement it are universal and can easily be adapted and customized for specific objects of classification.
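The recommendation about hidden layer sizes can be turned into a small helper that produces a configuration string in the same comma-separated form as the config=3,3,2,2 parameter used earlier. The function name recommended_config is hypothetical, and rounding the second layer up for odd input counts is an assumption, chosen because it reproduces the 3,3,2,2 example from this article.

```python
import math


def recommended_config(inputs, outputs=2):
    """Build 'inputs,layer1,layer2,outputs' for a two-hidden-layer network."""
    if inputs < 1 or outputs < 1:
        raise ValueError("inputs and outputs must be positive")
    first = inputs                   # as many neurons as input parameters
    second = math.ceil(inputs / 2)   # half as many, rounded up (assumption)
    return ",".join(str(n) for n in (inputs, first, second, outputs))
```

For three input parameters and the two classes considered here, the helper returns "3,3,2,2".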

We will be happy to answer your questions in the comments. For more details, such as a description of how the program is built, see: math-n-algo.blogspot.ru/2014/08/FuzzyClassificator.html.

Author: Timur Gilmullin, Positive Technologies.

Source: https://habr.com/ru/post/246197/
