
Anomaly detection in network monitoring data using statistical methods

When there are too many observable metrics, tracking all the graphs yourself becomes impossible. In this case, checks for reaching critical values are usually set up for the less significant metrics. But even if those values are well chosen, some problems go unnoticed. What these problems are and how to detect them is covered under the cut.



Disclaimer
The author, although he has a mathematical education, is in no way connected with Data Mining or statistical analysis. This material is the result of a study conducted to determine whether it is feasible to write an anomaly detection module (even a weak one) for the monitoring system being developed.

What we are looking for, in two pictures


Examples of anomalies on the charts

Source: Anomaly.io

Of course, in reality things are not always so simple: only charts b), d), and e) show an obvious anomaly.

Source: cyberleninka.ru


Current state of affairs


Commercial products are almost always offered as a service, using both statistics and machine learning. Here are some of them: AIMS, Anomaly.io (an excellent blog with examples), CoScale (integration capabilities, for example with Zabbix), DataDog, Grok, Metricly.com, and Azure (from Microsoft). Elastic offers the X-Pack module with machine learning.
Open-source products that can be deployed on your own infrastructure:


In my opinion, the detection quality of the open-source options is significantly inferior. To understand how anomaly detection works and whether the situation can be improved, we need to dip into statistics a little. The mathematical details are simplified and hidden under the spoilers.

Model and its components


A time series is analyzed using a model that reflects the expected features (components) of the series. A model usually consists of three components:

  • the trend: the long-term change in the level of the series;
  • seasonality: fluctuations repeating with a fixed period (a day, a week);
  • the random component (noise): what remains after the trend and seasonality are removed.

You can include additional components, such as a cyclical one (as a trend multiplier), an abnormal one (a catastrophic event), or a social one (holidays). If no trend or seasonality is visible in the data, the corresponding components can be excluded from the model.

The type of a model is determined by how its components are combined. If all components are summed to produce the observed series, the model is called additive; if they are multiplied, multiplicative; if some are multiplied and some are added, mixed. Typically, the type of model is chosen by the researcher based on a preliminary analysis of the data.
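For example, denoting the trend by $T_t$, seasonality by $S_t$, and the random component by $E_t$ (my notation, for illustration only):

```latex
% Additive model: the components are summed
y_t = T_t + S_t + E_t
% Multiplicative model: the components are multiplied
y_t = T_t \cdot S_t \cdot E_t
% One possible mixed model: seasonality scales the trend, noise is added
y_t = T_t \cdot S_t + E_t
```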

Decomposition


Having chosen the type of model and the set of components, we can proceed to decomposition of the time series, i.e. splitting it into its components.


Source: Anomaly.io

First, the trend is extracted by smoothing the original data. The method and the degree of smoothing are chosen by the researcher.

Series smoothing methods: moving average, exponential smoothing, and regression
The simplest way to smooth a time series is to replace each original value with half the sum of it and the preceding value:

$s_n = \frac{x_n + x_{n-1}}{2}$


If you use not one but several preceding values, i.e. the arithmetic mean of k adjacent values, such smoothing is called a simple moving average with window width k:

$s_n = \frac{x_n + x_{n-1} + \ldots + x_{n-k+1}}{k}$


If each preceding value is given a coefficient that determines its degree of influence on the current one, we obtain a weighted moving average.
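As an illustration, a minimal sketch of both averages in JavaScript (the language the appendix algorithms are written in); the function names are mine:

```javascript
// Simple moving average with window width k: each smoothed value is the
// arithmetic mean of the current value and the k-1 preceding ones.
function movingAverage(xs, k) {
  const out = [];
  for (let n = 0; n < xs.length; n++) {
    const from = Math.max(0, n - k + 1);            // window shrinks at the start
    const window = xs.slice(from, n + 1);
    out.push(window.reduce((a, b) => a + b, 0) / window.length);
  }
  return out;
}

// Weighted moving average: weights[0] applies to the current value,
// weights[1] to the previous one, and so on.
function weightedMovingAverage(xs, weights) {
  const out = [];
  for (let n = 0; n < xs.length; n++) {
    let sum = 0, wsum = 0;
    for (let j = 0; j < weights.length && n - j >= 0; j++) {
      sum += weights[j] * xs[n - j];
      wsum += weights[j];
    }
    out.push(sum / wsum);                           // renormalize near the start
  }
  return out;
}
```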

A slightly different approach is exponential smoothing. The smoothed series is calculated as follows: its first element coincides with the first element of the original series, and each subsequent element is calculated by the formula

$s_n = \alpha x_n + (1 - \alpha) s_{n-1}$


where α is the smoothing coefficient, between 0 and 1. It is easy to see that the closer α is to 1, the more the resulting series resembles the original.
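A direct transcription of this formula in JavaScript:

```javascript
// Exponential smoothing: s[0] = x[0], s[n] = alpha*x[n] + (1 - alpha)*s[n-1].
// The closer alpha is to 1, the closer the result stays to the original series.
function exponentialSmoothing(xs, alpha) {
  const s = [xs[0]];
  for (let n = 1; n < xs.length; n++) {
    s.push(alpha * xs[n] + (1 - alpha) * s[n - 1]);
  }
  return s;
}
```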

To determine a linear trend, you can fit the linear regression $y_t = a + b x_t + \varepsilon_t$ by the least squares method: $\hat{b} = \frac{\overline{xy} - \bar{x}\bar{y}}{\overline{x^2} - \bar{x}^2}$, $\hat{a} = \bar{y} - \hat{b}\bar{x}$, where $\bar{x}$ and $\bar{y}$ are the arithmetic means of x and y.

Source: Wikipedia
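A sketch of the least-squares fit in JavaScript; taking the sample index as x is my simplification for evenly spaced monitoring data:

```javascript
// Least-squares fit of y = a + b*x, taking the sample index as x.
// Implements b = (mean(xy) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2)
// and a = mean(y) - b*mean(x) from the formulas above.
function linearTrend(ys) {
  const n = ys.length;
  let sx = 0, sy = 0, sxy = 0, sxx = 0;
  for (let i = 0; i < n; i++) {
    sx += i; sy += ys[i]; sxy += i * ys[i]; sxx += i * i;
  }
  const mx = sx / n, my = sy / n;
  const b = (sxy / n - mx * my) / (sxx / n - mx * mx);
  const a = my - b * mx;
  return { a, b };                                  // trend at index t is a + b*t
}
```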


To determine the seasonal component, we subtract the trend from the original series or divide the series by it, depending on the chosen model type, and smooth the result again. Then we split the data into seasons (periods), usually a week long, and compute the average season. If the length of the season is not known, you can try to find it:

Discrete Fourier Transform or Auto-Correlation
Honestly, I did not fully understand how the Fourier transform works. Those interested can look at the articles “Detect Seasonality using Fourier Transform in R” and “Simple words about the Fourier transform”. As I understand it, the original series/function is represented as an infinite sum of terms, and the first few significant coefficients are taken.


To search for the period via auto-correlation, simply shift the series to the right and look for the offset at which the distance/area between the original and the shifted series (highlighted in red) is minimal. Obviously, the algorithm needs a shift step and a maximum offset limit; if the limit is reached, we consider the period search to have failed.
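A sketch of such a period search in JavaScript; using the mean absolute difference as the distance is my choice, and `step` and `maxLag` are the parameters mentioned above:

```javascript
// Period search via auto-correlation: shift the series by `lag` and measure
// the mean absolute difference on the overlapping part; the lag with the
// smallest difference is the candidate season length.
function findPeriod(xs, step, maxLag) {
  let best = -1, bestDist = Infinity;
  for (let lag = step; lag <= maxLag && lag < xs.length; lag += step) {
    let d = 0;
    for (let i = lag; i < xs.length; i++) d += Math.abs(xs[i] - xs[i - lag]);
    d /= xs.length - lag;                           // average over the overlap
    if (d < bestDist) { bestDist = d; best = lag; }
  }
  return best;                                      // -1 if no lag was tried
}
```

The caller should treat the result as a failure if the best distance is not convincingly smaller than the variability of the series itself.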


For more information about models and decomposition, see the articles “Extract Seasonal & Trend: using decomposition in R” and “How To Identify Patterns in Time Series Data”.

After removing the trend and the seasonal component from the original series, we are left with the random component.
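Putting the steps together for the additive case, a sketch (the function and its signature are mine; `trend` is the smoothed series from any of the methods above):

```javascript
// Decomposition for an additive model: subtract the trend, average the
// detrended values over each phase of the period to get the seasonal
// profile, then subtract it too; what is left is the random component.
function decomposeAdditive(xs, period, trend) {      // trend: smoothed series
  const detrended = xs.map((x, i) => x - trend[i]);
  const seasonal = new Array(period).fill(0);
  const counts = new Array(period).fill(0);
  detrended.forEach((v, i) => { seasonal[i % period] += v; counts[i % period]++; });
  for (let p = 0; p < period; p++) seasonal[p] /= counts[p];
  const random = detrended.map((v, i) => v - seasonal[i % period]);
  return { seasonal, random };
}
```

For a multiplicative model, the subtractions become divisions.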

Types of anomalies


If we analyze only the random component, many anomalies can be reduced to one of the following cases:

  • a spike (outlier): a single value or a short group of values far outside the usual range;
  • a shift: an abrupt change in the level of the series;
  • a change in distribution: the level stays roughly the same, but the spread or shape of the noise changes.

Conclusion


Of course, many anomaly detection algorithms have already been implemented as packages for R, a language intended for statistical data processing: tsoutliers, strucchange, Twitter Anomaly Detection, and others. Read more about R in the articles “Do you already apply R in business?” and “My experience of introducing R”. It would seem you could just load the packages and use them. However, there is a problem: tuning the parameters of the statistical tests, which, unlike critical-value thresholds, are far from obvious to most users and have no universal values. A way out could be brute-force parameter selection (resource-intensive), refined rarely and periodically, independently for each metric. On the other hand, most non-seasonal anomalies are easy to identify visually, which suggests applying a neural network to rendered graphs.

Appendix


Below I provide my own algorithms, which produce results comparable to Twitter Breakout and run somewhat faster when implemented in JavaScript.

Piecewise linear approximation algorithm for a time series
Approximation
  1. If the series is very noisy, average it first, e.g. over 5 elements.
  2. The first and last points of the series are included in the result.
  3. Find the point of the series most distant from the current polyline and add it to the vertex set.
  4. Repeat until the average deviation of the polyline from the original series is less than the average deviation within the original series itself, or until the limit on the number of polyline vertices is reached (in which case the approximation has probably failed).
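A sketch of this approximation in JavaScript. Step 1 (pre-averaging a noisy series) is left to the caller, and the stopping threshold of step 4 is read as the mean absolute deviation of the series from its own mean; both are my interpretation:

```javascript
// Piecewise linear approximation following the steps above.
// Returns the indices of the polyline vertices.
function approximate(ys, maxVertices = 20) {
  const vertices = [0, ys.length - 1];              // step 2: the endpoints
  const valueAt = (t) => {                          // polyline value at index t
    let i = 1;
    while (vertices[i] < t) i++;
    const a = vertices[i - 1], b = vertices[i];
    return ys[a] + (ys[b] - ys[a]) * (t - a) / (b - a);
  };
  const meanDev = () => {
    let d = 0;
    for (let t = 0; t < ys.length; t++) d += Math.abs(ys[t] - valueAt(t));
    return d / ys.length;
  };
  // step 4 threshold: mean absolute deviation of the series from its mean
  const mean = ys.reduce((a, b) => a + b, 0) / ys.length;
  const target = ys.reduce((a, b) => a + Math.abs(b - mean), 0) / ys.length;
  while (vertices.length < maxVertices && meanDev() >= target) {
    let worst = -1, worstD = -1;                    // step 3: most distant point
    for (let t = 0; t < ys.length; t++) {
      const d = Math.abs(ys[t] - valueAt(t));
      if (d > worstD && !vertices.includes(t)) { worstD = d; worst = t; }
    }
    if (worst < 0) break;                           // nothing left to add
    vertices.push(worst);
    vertices.sort((a, b) => a - b);
  }
  return vertices;
}
```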

Algorithm for finding a shift in the data
Shift detection
  1. Approximate the original series with a polyline.
  2. For each polyline segment, except the first and last:
    • Find its height h as the difference between the y-coordinates of its start and end. If the height is less than the ignore threshold, skip the segment.
    • Split both adjacent segments L and R in half, approximate each part L1, L2, R1, R2 with its own straight line, and find the average distance from each line to the series: dL1, dL2, dR1, dR2.
    • If |dL1 - dL2| and |dR1 - dR2| are significantly less than h, consider a shift detected.
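A sketch of the per-segment check in JavaScript; it assumes the polyline and the data windows under L and R have already been computed, and `ignoreH` and the 0.2 ratio standing in for “significantly less than h” are hypothetical tuning parameters:

```javascript
// Average absolute distance from a least-squares line to the data:
// the d values used in the comparison above.
function lineFitError(ys) {
  const n = ys.length;
  let sx = 0, sy = 0, sxy = 0, sxx = 0;
  for (let i = 0; i < n; i++) { sx += i; sy += ys[i]; sxy += i * ys[i]; sxx += i * i; }
  const mx = sx / n, my = sy / n;
  const b = (sxy / n - mx * my) / ((sxx / n - mx * mx) || 1);  // guard n = 1
  const a = my - b * mx;
  let d = 0;
  for (let i = 0; i < n; i++) d += Math.abs(ys[i] - (a + b * i));
  return d / n;
}

// Check one candidate segment: leftWin and rightWin are the data under the
// adjacent segments L and R, h is the segment height.
function isShift(leftWin, rightWin, h, ignoreH = 0, ratio = 0.2) {
  if (Math.abs(h) < ignoreH) return false;          // below the ignore threshold
  const halves = (ys) => [ys.slice(0, ys.length >> 1), ys.slice(ys.length >> 1)];
  const [L1, L2] = halves(leftWin), [R1, R2] = halves(rightWin);
  // "significantly less than h" is taken here as less than ratio * |h|
  return Math.abs(lineFitError(L1) - lineFitError(L2)) < ratio * Math.abs(h) &&
         Math.abs(lineFitError(R1) - lineFitError(R2)) < ratio * Math.abs(h);
}
```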

Algorithm for finding changes in distribution
Distribution change detection
  1. The original series is divided into segments whose length depends on the amount of data but is at least three.
  2. In each segment, the minimum and maximum are found. Replacing each segment by its center point, two series are formed: the minima and the maxima. These series are then processed separately.
  3. Each series is piecewise-linearly approximated, and for each polyline vertex, except the first and last, the data of the original series lying between it and the adjacent vertices on the left and on the right are compared using the Kolmogorov-Smirnov test. If a significant difference is found, the point is added to the result.

Kolmogorov-Smirnov test
Let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ be two sets of numbers; we need to assess the significance of the differences between them.

First, the values of both series are divided into several (about ten) categories. Then, for each category, the number $f_x$ of values from series x falling into it is counted and divided by the series length $n_1$; similarly for series y. For each category we find $|f_x - f_y|$ and then the overall maximum $d_{max}$ across all categories. The test statistic is calculated by the formula $\lambda = d_{max}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$.

A significance level $\alpha$ is selected (one of 0.01, 0.05, 0.1) and the corresponding critical value is looked up in a table. If $\lambda$ exceeds the critical value, the groups are considered to differ significantly.
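A sketch in JavaScript; rather than the ~10 categories described above, it computes $d_{max}$ exactly from the two empirical distribution functions (which the binned version approximates), with the standard critical values of the Kolmogorov distribution:

```javascript
// Two-sample Kolmogorov-Smirnov test. dmax is the largest gap between the
// two empirical distribution functions; lambda follows the formula above.
function ksTest(xs, ys, alpha = 0.05) {
  const a = [...xs].sort((p, q) => p - q);
  const b = [...ys].sort((p, q) => p - q);
  let i = 0, j = 0, dmax = 0;
  while (i < a.length && j < b.length) {
    const v = Math.min(a[i], b[j]);
    while (i < a.length && a[i] === v) i++;         // consume ties together
    while (j < b.length && b[j] === v) j++;
    dmax = Math.max(dmax, Math.abs(i / a.length - j / b.length));
  }
  const lambda = dmax * Math.sqrt((a.length * b.length) / (a.length + b.length));
  const critical = { 0.1: 1.22, 0.05: 1.36, 0.01: 1.63 }; // Kolmogorov distribution
  return { lambda, significant: lambda > critical[alpha] };
}
```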

Source: https://habr.com/ru/post/344762/

