How we analyze vulnerabilities using neural networks and fuzzy logic

Image: Daniel Friedman, Flickr

On our Habr blog we write a lot about how we apply DevOps practices to the development and testing of the information security systems our company builds. An automation engineer's job is not always limited to installing and maintaining a service; sometimes it involves time-consuming research tasks.

To solve one such task, analyzing vulnerabilities during competitive analysis tests, we developed our own universal classifier. This article describes how the tool works and what results it helps achieve.

Some theory

To start, let's look at what classification means in general. Classifying a given object means assigning it to one of the classes, depending on how “similar” it is to the reference examples accepted in the subject area. In other words, a classification task requires constructing a function (a classifier) that indicates how “similar” our object is to reference examples from the different classes (for more information, click here).

Euler-Venn diagram for the classification of vulnerabilities

Several theories are used together to solve a wide range of classification problems:

fuzzy set theory;
fuzzy scales, a tool for the fuzzy evaluation of object properties;
the theory of neural networks.

Fuzzy set theory

Lotfi Zadeh founded the theory of fuzzy sets and fuzzy logic in the 1960s. The idea of a “fuzzy set” is best illustrated by trying to explain what “many” means. One item is not many, and neither are two, but three, four, or five may already be many. A fuzzy value is described mathematically by a so-called membership function, which assigns to each object in the domain a number that characterizes its degree of membership in the given fuzzy set.
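As a minimal sketch of the “many” example, a membership function can be written as a simple ramp. The breakpoints (2 and 5) and the function name `many` are assumptions chosen for illustration; they are not taken from FuzzyClassificator.

```python
def many(count: int) -> float:
    """Degree (0.0 to 1.0) to which `count` items belong to the fuzzy set "many"."""
    # Assumed breakpoints: up to 2 items is clearly "not many",
    # 5 or more is clearly "many".
    low, high = 2, 5
    if count <= low:
        return 0.0
    if count >= high:
        return 1.0
    # Linear ramp between the two breakpoints.
    return (count - low) / (high - low)


for n in range(1, 7):
    print(n, round(many(n), 2))
```

With these breakpoints, three and four items are “many” only to a partial degree, which is exactly the kind of statement a crisp set cannot express.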

Fuzzy scales

A fuzzy scale is an ordered set of fuzzy sets, each of which must carry some meaning. A familiar example is a scale of levels. Here is a universal fuzzy scale consisting of five levels:

S = {Min, Low, Med, High, Max}

When working with level scales, we can determine the level to which a given value belongs. Such fuzzy scales make it possible to interpret the values of specific properties as numbers (for more details, click here).
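One way to picture such a scale is to place the five levels on the interval [0, 1] and give each a triangular membership function. This is only a sketch: the triangular shape, the even spacing, and the helper names `triangle`, `build_scale`, `memberships`, and `level_of` are assumptions, not the definitions used in FuzzyClassificator.

```python
LEVELS = ["Min", "Low", "Med", "High", "Max"]


def triangle(x, left, peak, right):
    """Triangular membership function with its maximum (1.0) at `peak`."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def build_scale(levels):
    """Spread the level peaks evenly over [0, 1]; neighbouring levels overlap."""
    step = 1.0 / (len(levels) - 1)
    return {
        name: (i * step - step, i * step, i * step + step)
        for i, name in enumerate(levels)
    }


SCALE = build_scale(LEVELS)


def memberships(x):
    """Degree of membership of the number `x` in every level of the scale."""
    return {name: triangle(x, *bounds) for name, bounds in SCALE.items()}


def level_of(x):
    """Name of the level to which `x` belongs most strongly."""
    degrees = memberships(x)
    return max(LEVELS, key=lambda name: degrees[name])


print(level_of(0.0), level_of(0.6), level_of(1.0))
```

A value such as 0.6 belongs partly to Med and partly to High; `level_of` simply reports the level with the larger degree.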

Neural networks

It is known that the cells of biological neurons can accumulate electrical impulses, which are transmitted through synapses that connect neurons to each other. Depending on the cell's sensitivity threshold, the electrical signal is either passed on or not.

Mathematical neural networks are built in the same way. The inputs of a neuron can be any numbers, crisp or fuzzy, and each is multiplied by a weight. Each neuron has a “trigger threshold”: the sum of the products of inputs and weights is passed to an activation function, which produces the neuron's output. Neurons connected one after another in this way form a neural network (for more information, click here).
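The description of a single neuron can be sketched in a few lines. The sigmoid activation and the way the threshold is subtracted from the weighted sum are assumptions made for illustration; fuzzy inputs are assumed to have been converted to numbers beforehand.

```python
import math


def sigmoid(x):
    """Smooth activation function that maps any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, threshold):
    """Weighted sum of the inputs, shifted by the threshold, then activated."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(total - threshold)


# The weighted sum (0.5) equals the threshold, so the neuron is exactly "on the edge".
print(neuron([1.0, 0.0], [0.5, 0.5], threshold=0.5))
```

When the weighted sum exceeds the threshold the output moves toward 1, and when it falls below the threshold the output moves toward 0.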

To improve the quality of vulnerability analysis in our products, we needed to learn how to assign each vulnerability to one of two classes: confirmed or unconfirmed. We ran many experiments, which resulted in a neural network configuration that proved optimal for this problem.

It consists of four layers. Numbers are fed to its input, and at the output we get two crisp or fuzzy numbers that characterize the level of membership in each of the classes, for example the minimum or the maximum level of "similarity" (more at the link).
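To make the shape of such a network concrete, here is a rough sketch of a forward pass through four layers of neurons that ends in two outputs. The layer sizes (3, 4, 4, 2), the sigmoid activation, and the random untrained weights are all assumptions; the actual configuration of the network described above is not given in this article.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_network(layer_sizes, seed=0):
    """Random weights for each pair of adjacent layers (one extra weight per neuron for the bias)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def forward(network, inputs):
    """Pass the input values through every layer in turn."""
    values = list(inputs)
    for layer in network:
        values = [
            # The constant 1.0 is the input paired with the bias weight.
            sigmoid(sum(w * v for w, v in zip(weights, values + [1.0])))
            for weights in layer
        ]
    return values


# Hypothetical sizes: 3 inputs, two hidden layers of 4 neurons, 2 outputs.
network = make_network([3, 4, 4, 2])
confirmed, unconfirmed = forward(network, [0.2, 0.9, 0.5])
print(confirmed, unconfirmed)
```

Because the weights here are random, the two outputs carry no meaning yet; they only become membership levels for the two classes after training.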

Classification automation

To automate the classification of objects, we developed a dedicated tool, FuzzyClassificator. It is a fuzzy neuroclassifier built on a neural network that processes both crisp and fuzzy values. The code is available on GitHub; it requires Pyzo and PyBrain to run (for more information, click here).

We now use FuzzyClassificator for a specific applied problem: classifying vulnerabilities. They are an excellent example of objects that are fuzzy by nature and that even a person cannot always classify unambiguously.

Any system based on a neural network works in just two stages: training and classification. In the first stage, we scan a variety of CMSs with a variety of security scanners. The scanners output a lot of information about vulnerabilities in the CMSs, but at this point it is impossible to say whether the findings are real or false positives. We store the collected data in the TFS database, from which it can be retrieved and encoded in a form the neural network understands.

The neural network is then trained on reference data, after which it can be applied to the data obtained during security scanner tests.
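The two stages can be illustrated with a deliberately tiny stand-in. The sketch below trains a single sigmoid neuron (logistic regression) by gradient descent instead of the real multi-layer network, and it does not use the FuzzyClassificator or PyBrain APIs. The two input features and all reference values are hypothetical toy data invented for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(samples, epochs=2000, rate=0.5):
    """Stage 1: fit weights on reference data, a list of (features, label) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            predicted = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = predicted - label  # gradient of the cross-entropy loss
            weights = [w - rate * error * x for w, x in zip(weights, features)]
            bias -= rate * error
    return weights, bias


def classify(model, features):
    """Stage 2: confidence (0..1) that a new finding is a real vulnerability."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


# Hypothetical encoding: (share of scanners that reported the finding, encoded severity).
# Label 1 means the vulnerability was confirmed, 0 means it was a false positive.
REFERENCE = [
    ((0.9, 0.8), 1), ((0.8, 0.6), 1), ((1.0, 0.4), 1),
    ((0.1, 0.7), 0), ((0.2, 0.3), 0), ((0.0, 0.5), 0),
]

model = train(REFERENCE)
print(classify(model, (0.95, 0.7)))  # finding reported by almost every scanner
print(classify(model, (0.05, 0.6)))  # finding reported by almost no scanner
```

The point of the sketch is the workflow, not the model: reference data goes in once during training, and the resulting model is then reused on every new batch of scanner output.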

What is the result

Previously, we had to analyze vulnerabilities manually: it was the only way to tell whether our products worked correctly, whether a vulnerability had actually been found, and how serious it was. The neural network saves up to 70% of the analysis time. In particular, this has let us increase the number of CMSs scanned and security scanners analyzed for competitive analysis.

We automated this process in TeamCity, the system we use. Testers use a dedicated interface to launch FuzzyClassificator and run the neural network in training or classification mode.

An example of a system report at the training stage is as follows:

It includes data on the quality of the trained neural network, that is, how badly the network can be mistaken in its analysis. In "combat" mode, the vulnerability analysis report looks like this:

All vulnerabilities are summarized in a table that shows how confident the neural network is that a particular vulnerability is real or false, along with recommendations for interpreting this data. For example, for the first vulnerability in the figure above, the neural network is ready to confirm it with the minimum level of confidence and to reject it with the maximum. It therefore recommends rejecting this finding and marking it as Rejected, that is, as a false positive of the scanner. Once the neural network has produced a result, it also sends it to the TFS database.
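The recommendation logic described here can be sketched as a comparison of the two fuzzy levels. The status Rejected comes from the report described above; the names "Confirmed" and "Needs manual review", and the tie-breaking rule, are assumptions added for the example.

```python
LEVELS = ["Min", "Low", "Med", "High", "Max"]


def recommend(confirm_level, reject_level):
    """Compare the confidence in confirming a finding with the confidence in rejecting it."""
    confirm = LEVELS.index(confirm_level)
    reject = LEVELS.index(reject_level)
    if reject > confirm:
        return "Rejected"
    if confirm > reject:
        return "Confirmed"  # assumed status name
    return "Needs manual review"  # assumed fallback when the levels are equal


# The case from the report: minimum confidence to confirm, maximum to reject.
print(recommend("Min", "Max"))
```

A finding with equal levels on both outputs is the case where the network gives no usable hint, so the sketch hands it back to a person.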

Limitations and improvements

Like any tool, FuzzyClassificator has its limitations. Correct classification with it:

strongly depends on the chosen method of encoding the input data;
requires a good knowledge of the subject area for which the classification is performed;
requires considerable effort in preparing “good” input for training.

We have already optimized the algorithms in the tool's code and all of its low-level methods, but we are not going to stop there. Our immediate plans include:

porting the tool to CPython;
running the code on the GPU.

Materials on the topic:

P.S. We presented our experience of building a fuzzy classifier at the DevOps meetup held in Moscow in the fall of 2016.

Video:

Slides

The link contains the slides of the 16 talks given at the event. All slides and video recordings have been added to the table at the end of this announcement post.

Author: Timur Gilmullin

Source: https://habr.com/ru/post/323436/
