Measuring the intensity of the incoming flow of events in the decay model

In the class of flow algorithms there is a subclass that solves the problem of finding heavy elements (heavy hitters). In general, this task is formulated as “identifying the most frequently occurring events in the incoming stream and measuring their intensity”. In this publication, the employee of Qrator Labs, Artem janatem Shvorin, proposes an efficient algorithm for solving this problem.

Introduction

Algorithms for finding heavy elements help to solve problems, such as combating network overload, detecting network anomalies and attacks, and managing dynamic routing. For example, the well-known NGINX web server allows you to limit the intensity of requests to a specific resource, and in order to do this, the intensity must be measured quantitatively.

In this publication we want to show the reader another approach to measuring the intensity of the flow of events in the presence of many different (not identical) flows of events. Let a set of types of events be given. It is necessary to evaluate how often an event of this type occurs, and to pay attention to cases when an event of one type repeats “too often”.

An important characteristic of this tool is its high efficiency, so that it does not become a bottleneck in the entire system for analyzing and filtering traffic. The point in this case is to handle one event for several dozen clock cycles of the central processor of modern computers.
')
In addition to the intensity measurement itself, it is required to select the types of events that occur most frequently. Their intensity must be measured more accurately, while rare types of events do not play a large role, and their intensity need not be estimated with high accuracy.

The paper presents the Decay-based count algorithm developed by us, based on the model of decay of a radioactive substance (see the “Decay Model” section), with high efficiency solving the task of searching for heavy elements.

The task of finding heavy items

Classification of tasks

In the class of problems described, the following subclasses can be distinguished:

Threshold- $t$ . It is required to allocate the streams having the bigger intensity than the set share $t$ intensity of all incoming traffic.
Top- $k$ . It is required to select a specified number $k$ most intense streams.
Selection of flows whose intensity exceeds a certain absolute value.

The proposed algorithm refers to the last of the listed subclasses.

Target architecture

Depending on the platform on which this threading algorithm is supposed to run, there is a set of hardware limitations. Sometimes you need to embed the necessary functionality in the network equipment (switch, switch, etc.). The proposed algorithm is supposed to be used on a universal processor, but for some values of the parameters it can be embedded in network equipment.

Task parameters

The absolute value of the intensity of flow, which is considered "dangerous." The task of the algorithm is to identify flows whose intensity exceeds a predetermined threshold.
The key size is an identifier by which the type of event is determined. In this implementation, as in many other algorithms, it is required to store the key values in the table, so the key size affects the memory costs.
The method of calculating the intensity of one stream by the arrival times of similar events. In essence, this is an algorithmic definition of what the intensity of a stream is. In this case, the exponential moving average is calculated, which has a single parameter - the characteristic time $\ tau$ during which the weight of the event after its arrival is taken into account.

Solution accuracy

The quality evaluation of the algorithm can be a relative or absolute error in estimating the intensity of flows. Also used $(\ varepsilon, \ delta)$ -approximation as an accuracy estimate: if with probability $1- \ delta$ the error is not more than $\ varepsilon$ , then they say that the algorithm has an accuracy characteristic $(\ varepsilon, \ delta)$ .
If the error is qualitative, not quantitative, such as the inclusion or non-inclusion of this stream in the list of the most intensive in the task of top- $k$ , then the probabilities of false positive and false negative are taken for evaluation.

Overhead

As a rule, the main characteristic of the algorithm is the maximum intensity of the incoming stream, which it is capable of processing. That is, only the processing time of one event is important. However, in practice, other hardware limitations also affect, such as the size of the memory used to store the accumulated information about the stream. Especially hard hardware constraints occur when the functionality is embedded in network equipment. But even on ordinary computers, the presence of a large amount of RAM DRAM does not always help, because access to DRAM can take an unacceptably long time. In order to achieve the required performance, it is necessary to compromise with the accuracy of measurement and limit the size of the memory used so that it fits in the processor cache. In this implementation, with meaningful values of the parameters, the table is placed in the L2 cache of the processor. As a result, it was possible to ensure that the processing time for one event was several dozen processor cycles.

Methods for assessing the intensity of flow

To estimate the intensity of repetitions of events of a given type, it is necessary to count the number of events over time. To do this, you need to fix the time of occurrence of the event and somehow save it. Usually, a counter is used for this purpose, which is incremented when the corresponding event occurs. Then the intensity is estimated as the ratio of the counter value to the time interval during which the measurement is performed. More accurate methods of measuring the current intensity values use different variants of the moving average . For example, exponential moving average (exponential moving average, EMA) is a kind of weighted moving average with exponentially decreasing weights.

The paper proposes a new method for calculating the EMA. It uses the model of exponential decay, which describes such a physical phenomenon as radioactive decay. This model has several advantages compared with the traditional approach. First, data presentation is more economical in memory. Secondly, the decay model allows for an efficient implementation of the update counter operation — it requires a small number of integer arithmetic operations and one cached memory access.

Multiple Threading Methods

In the task of searching for heavy elements, the problem is not only the high intensity of the input traffic, but also a large number of different flows (types of events) that have to be monitored. A naive implementation involves setting up a separate counter for each thread, which leads to a significant memory consumption. In this case, there is a danger not only in the exhaustion of all the available memory, but in a significant slowdown in the speed of work due to cache misses. Therefore, as a rule, they refuse from the exact solution of the problem of searching for heavy elements, discarding a part of the accumulated information about flows. Many methods are known for limiting the amount of memory used and reducing the event processing time, some of which are listed below:

Packet sampling
Space saving algorithm
HashParallel
HashPipe
Count-min sketch

Task formalization

Before proceeding to the description of the algorithm, it is necessary to formulate with sufficient mathematical rigor the problem of measuring the flow of events, which we will do in this section.

The sequence of events of the same type is defined as a set of elementary events.

\ {event_k \} _ {k \ in I}

$\ {event_k \} _ {k \ in I}$ where index

$k$ runs over a finite or infinite discrete interval

$I \ subset \ mathbb {Z}$ .

In the simplest case, the event is determined only by the time of its occurrence:

$event_k = t_k$ , with the events ordered in time:

$t_ {k_1} <t_ {k_2}$ for

$k_1 <k_2$ . In most implementations of event recording systems, time is considered to be a discrete value:

$t_k \ in \ mathbb {Z}$ However, for theoretical reasoning it is convenient to generalize and consider time as continuous:

$t_k \ in \ mathbb {R}$ .

The main question that event recording systems must answer is an estimate of the intensity of the flow. Intensity can be strictly formalized for a uniform flow of events. Uniform flow

\ {t_k \}

$\ {t_k \}$ is defined as follows:

$t_k = \ varphi + pk, \ quad k \ in \ mathbb {Z},$

Where

$p> 0$ ,

$\ varphi \ in [0, p)$ - flow parameters - period and phase, respectively. That is, events occur at regular intervals. Then the intensity of such a flow, by definition, is expressed as

$r = 1 / p$ .

For non-uniform flows, formal determination of intensity

r = r (\ {t_k \})

$r = r (\ {t_k \})$ may differ depending on the problem statement. The decay model remains operable and useful in this case, however, the assessment of the quality of the calculation of the true intensity is given below only for the case of uniform flows.

In some models, it is required to additionally take into account some characteristic of the event, for example, its weight

$c_k$ - the value of the contribution of this event to the measured intensity. Then the event is determined not only by the time of its occurrence, but also by weight:

$event_k = (t_k, c_k)$ .

In typical implementations of event recording systems, a counter is started.

$s$ which in some way accumulates information about the flow of events, and at any time it is possible to obtain an estimate of the flow intensity from its current value

$\ hat {r} = \ hat {r} (s)$ such that

$r \ approx \ hat {r}$ . Counter update occurs on arrival of the next event.

$event_k$ :

$s \ leftarrow update (s, event_k),$

Where

$update ()$ - some function that is defined by the implementation. Below are a few examples with options for implementing the function

$update ()$ :

A simple count of the number of events: $update_1 (s, event_k) = s + 1$ ;
Counting the number of events based on weight: $update_c (s, event_k) = s + c_k$ ;
Calculate the exponential moving average (EMA) with the parameter $\ beta$ . Here is the counter $s = (v, t)$ stores two values: the actual value of the EMA and the time of the last update.
$update_ {EMA} ((v, t), event_k) = (v ', t'),$
Where $v '= \ beta + (1- \ beta) ^ {t_k-t} \ cdot v, \ quad t' = t_k$ .

In some tasks, it is required to distinguish event streams. Suppose there are many different types of events, numbered index

$i = 1, \ dots N$ , and it is required to consider separately message flows of each type. Then the event is described as

$event_k = (t_k, i_k)$ (or, given the weight of the event,

$event_k = (t_k, i_k, c_k)$ ) where

$i_k$ - type of event. Many indexes

$k$ related to this type of event

$i$ denote

I_i = \ {k \ mid i_k = i \}

$I_i = \ {k \ mid i_k = i \}$ .

Decay model

The decay model is described as follows:

$v (t) = v_0 e ^ {- \ lambda t},$

Where

$v_0$ - “amount of substance” at time zero,

$v (t)$ - quantity at time

$t$ ,

$\ lambda$ - model parameter (the so-called decay constant). The name of this model reflects the fact that it describes the physical phenomenon of radioactive decay .

Additionally we introduce the following notation:

$\ tau = 1 / \ lambda, \ quad \ alpha = e ^ {\ lambda}.$

In our case, as a model parameter instead of

$\ lambda$ more convenient to use

$\ tau$ , insofar as

$\ tau$ will have the dimension of time and informally can be defined as a characteristic time interval during which the history of events is collected.

We will by definition assume that each type of event corresponds to a value

$v$ , which has the physical meaning of “amount of substance” and depends on time, which increases suddenly by one when an event occurs, and decreases at the rest of the time according to the above exponential law.

In fig. 1 shows how the value changes over time.

$v$ for uniform flows with different intensity and for uneven flow.

Figure 1: Value

$v$ for different threads

Magnitude

$v$ not explicitly stored in memory, but can be computed at any time. Stored same value

$s$ such that at time

$t_ {now}$ magnitude

$v$ is expressed as follows:

$v = \ alpha ^ {s-t_ {now}}.$

This representation corresponds to the decay model.

Calculating a counter in a decay model is equivalent to calculating an exponential
moving average with an accuracy of the normalizing factor (constant). An important feature of this view is that the only value stored in the counter replaces the pair (the time of the last event, the accumulated value) that is stored with the traditional approach.

Counter update

Nontrivial operation is to update the counter value.

$s$ upon the occurrence of the next event. In this case, you need to replace

$s$ on a new meaning

$s'$ which is determined from the following equality:

$\ alpha ^ {s'-t_ {now}} = \ alpha ^ {s-t_ {now}} + 1.$

Here on the left is the value

$v$ immediately after the event, and on the right - the value

$v$ immediately before the event, increased by the contribution of the event (equal to one). Below will be proposed effective ways to calculate the result of the update.

In terms of the function

$update ()$ from the definition, the update operation is expressed as:

$update_1 (s, event_k) = t_k + \ log_ \ alpha (1 + \ alpha ^ {s-t_k}).$

Here is the execution time of the operation.

$t_ {now}$ coincides with time

$t_k$ the arrival of the message.

Intensity determination

Let there be a uniform flow of intensity events.

$r$ that is, events occur with a period

$p = 1 / r$ . Uniform flow is defined according to the definition.

If the measurement is made immediately after the occurrence of the next event, that is

$t_ {now} = t_n$ for some

$n$ , then accumulated over a long time counter value

$s$ will be expressed as follows:

$\ alpha ^ {s-t_ {now}} = \ sum_ {k = - \ infty} ^ n \ alpha ^ {t_k-t_ {now}} = \ sum_ {k = 0} ^ \ infty \ alpha ^ { -kp},$

from where

$\ alpha ^ {- \ Delta s} + \ alpha ^ {- p} = 1,$

Where

$\ Delta s = s-t_ {now}$ - the relative value of the counter.

In a more general case, the moment of measurement may be between events:

$t_ {now} = t_n + \ psi$ ,

$\ psi \ in [0, p)$ . In this case, the value of the counter will differ in the lower side by the value

$\ psi$ :

$\ alpha ^ {- (\ Delta s + \ psi)} + \ alpha ^ {- p} = 1.$

The task of measuring the intensity is to evaluate the intensity by the counter value. Assuming that the flow is uniform, it is possible to obtain estimates of the true intensity of the uniform flow from above and below, substituting extreme values into the previous equation

$\ psi = 0$ and

$\ psi = p$ :

$r ^ - \ le r <r ^ +,$

Where

$\ begin {align} r ^ + & = r ^ + (\ Delta s) = \ frac1 {\ log_ \ alpha (1+ \ alpha ^ {- \ Delta s})} \\ r ^ - & = r ^ - (\ Delta s) = \ left \ {\ begin {array} {ll} \ frac1 {- \ log_ \ alpha (1- \ alpha ^ {- \ Delta s})} & \ mbox {for} \ Delta s > 0 \\ 0 & \ mbox {otherwise} \ end {array} \ right. \ End {align}$

Both scores

$r ^ +$ and

$r ^ -$ monotonously depend on the value of the counter

$s$ (see Fig. 2), therefore, for example, comparing the current value of the counter with the threshold value does not require additional calculations.

Figure 2: Function Graph

$r ^ -$ and

$r ^ +$ for

$\ tau = $ 1$

In the decay model, the intensity of arbitrary (not necessarily uniform) fluxes, by definition, is given by the above estimates.

$r ^ +$ and

$r ^ -$ . Moreover, if the measurement occurs immediately after processing the event, then the intensity is considered to be exactly equal to the lower estimate.

Limits of applicability of the decay model

There are restrictions on the intensity value, which can be correctly measured in the decay model.

First, if the time of occurrence of events is measured as a discrete value, then the flow period cannot be less than one. That is, the flow rate should not exceed

$r_ {max} = 1$ . Hence the restriction on the relative value of the counter.

$\ Delta s = s-t_ {now}$ - it must not exceed the value

$\ sigma_ {max}$ which is determined from the formula:

$r ^ - (\ sigma_ {max}) = r_ {max}.$

Magnitude

$\ sigma_ {max}$ estimated as follows:

$\ sigma_ {max} = \ tau \ ln \ tau + \ omega (\ tau),$

Where

$0 \ le \ omega (\ tau) \ le1 / 2$ .

Secondly, the assessment of the intensity of weak (low-intensity) flows is difficult: for small relative values of the counter

$\ Delta s$ upper and lower estimate accuracy

$r ^ +$ and

$r ^ -$ worsens, and with negative values

$\ Delta s$ the lower intensity estimate degenerates to zero.

There is also a limitation on the operating time of the decay model implementations associated with overflow of counters. Since the value of the counter cannot escape from

$t_ {now}$ more than

$\ sigma_ {max}$ , the system operation time is actually determined by the digit capacity of the counter and the duration of one clock cycle. If a 64-bit register is used to store the counter, it will not overflow for 100 years. But in implementations with low-digit registers, a reset mechanism must be provided for.

Algorithms for multiple flow accounting

Features of the decay model

The peculiarity of using EMA as a counter value is that when the flow of events stops, the accumulated value quickly (exponentially in time) degrades and becomes indistinguishable from zero. In the decay model, this fact is used to automatically reset the counter: although the counter value

$s$ will increase all the time with the arrival of events, some time after the cessation of the flow of events

$v = \ alpha ^ {s-t_ {now}}$ can be considered equal to zero, without obviously changing the value of the counter. Time

$T_ {vanish}$ , for which any previously accumulated value falls to a conditional zero, depends on the parameter

$\ tau$ and the required accuracy. It is expressed as

$T_ {vanish} = T_ {min} + \ sigma_ {max}$ where

$T_ {min}$ - time of decay of the value accumulated as a result of a single event. In the section "Tabular implementation" is given the exact expression

$T_ {min}$ , but here it is enough to bear in mind the following fact:

$T_ {vanish} = O (\ tau \ ln \ tau)$ with fixed accuracy.

This implies an upper estimate for the size of the storage of counters when taking into account a multitude of flows for the case when information will not be lost at all - it is enough

$T_ {vanish} \ cdot r_ {max}$ cells There are many applications of the task of searching for heavy elements, where large values are not required.

$\ tau$ , and all storage is placed in RAM or even in the cache of the L3 or L2 processor. In this case, a small access time to the storage is provided, so that the task becomes feasible at high values of the input stream intensity.

A hash table is suitable for implementing storage, where the key is the type of event. In this case, the cells whose counter value is

$s$ satisfies the condition

$s-t_ {now} \ le T_ {min}$ .

Decay-based count numeric implementation

Value update operation

We introduce the following notation:

$\ rho_ \ tau (x) = \ tau \ ln (1 + e ^ {x / \ tau}) = \ log _ {\ alpha} (1+ \ alpha ^ x).$

Then the update operation is expressed as follows:

$s' = t_ {now} + \ rho_ \ tau (s-t_ {now}).$

Thus, the task is reduced to the effective calculation of the approximation of the function

$\ rho_ \ tau$ . Time is convenient to represent an integer, for example, measure it in processor cycles, so you need to build an integer mapping

${R: \ mathbb {Z} \ to \ mathbb {Z}}$ such that:

$R_ \ tau (T) \ approx \ rho_ \ tau (T) \ quad \ mbox {for} T \ in \ mathbb {Z}.$

For this problem, the accuracy is not relative, but absolute, because mainly time operations are used with time values. Obviously, due to the integer representation of time, an error of less than 0.5 clock cycle is unattainable.
In addition, the operation of measuring the current time, as a rule, gives some error. For example, if there is a way to measure time with an accuracy of 10 clocks, it is enough to require that

$R_ \ tau$ gave an approximation of about the same accuracy:

$\ left | R_ \ tau (T) - \ rho_ \ tau (T) \ right | \ le 10.$

You can suggest several different ways to calculate

$R_ \ tau$ different complexity and degree of effectiveness.

Calculation $R_ \ tau$ : FPU

The easiest way to use floating point arithmetic and directly calculate

$\ rho_ \ tau$ by definition. The execution time of this operation is about 100 processor cycles, which is quite a lot compared to the method proposed below.

Calculation $R_ \ tau$ : tabular implementation

The idea of the table implementation is to save the pre-calculated function values in the table.

First, using identity

$\ rho_ \ tau (-x) = -x + \ rho_ \ tau (x),$

enough to build

$R_ \ tau (t)$ only for

$T \ le0$ .

Secondly, because

$\ rho_ \ tau (x) \ to 0 \ mbox {for} x \ to- \ infty,$

exists

, such that

$T_ {min}> 0$

$R_\tau(T)=0$ at

$T\le-T_{min}$ . Where

can be found from the following considerations:

$T_{min}$

$T_{min}=\lceil t_{min}\rceil,\quad \rho_\tau(t_{min})=1/2,$

from where

Thus, it suffices to determine

$T_{min}=\lceil-\log_\alpha(\alpha^{1/2}-1)\rceil=\lceil-\tau\ln(e^{1/{2\tau}}-1)\rceil.$

for

$R_\tau(T)$

${T_{min}<T\le0}$ .

Function graph

is presented in fig. 3. The peculiarity of this function is that with a change in the parameter

$\rho_\tau(x)$

will be the same scaling on both axes. Figure 3: Function Graph

$\ tau$

$\rho_\tau(x)$

Obvious implementation

is to build a table of

$R_\tau$

cells where the precomputed values will be stored. Since all the values of the function on this interval are between 0 and

$T_{min}$

, the final size of the table is

$\rho_\tau(0)=\tau\ln2$

bits. The algorithm for updating the value is as follows:

$T_{min}\log_2(\tau\ln2)$

$\begin{align} t_{now}& \leftarrow getTime()\\ \delta & \leftarrow \min\{|s-t_{now}|, T_{min}\}\\ \delta' &\leftarrow R(-\delta)\\ s' &\leftarrow \delta' + \max\{s, t_{now}\} \end{align}$

The absolute accuracy of this method is 1/2, since it gives the best integer approximation of a real-valued function.

Performance Measurement Results

The following three implementations of the exponential moving average were compared with each other:
1. naive implementation of EMA;
2. model of decay through FPU (that is, with a call to the functions exp () and log () of the mathematical library);
3. model of decay by tabular method.

The source code of the test for C: pastebin.com/N1Xg9dud .

Execution time of one call to update () function

(in this case, the table for calculating

$\tau=100000$

is placed in the cache L1) in implementations 1, 2 and 3 is100,125,11processor cycles, respectively. Using an exponential moving average is a convenient way to count events and estimate intensity. However, this operation is computationally expensive, which greatly limits the possibilities of its application. Our implementation of the algorithm based on the decay model is, firstly, beautiful, and secondly, more efficient than the naive implementation of the EMA calculation. Efficiency is ensured by two factors: tabular calculation of the transcendental function and more economical representation of the counter in memory.

$R_\tau$

Thanks

This publication was prepared by us in a trial mode as part of a project to highlight the mechanisms of the Qrator filtering network. Thanks to Anton Orlov, Artem Gavrichenkov, Evgeny Naradov and Nikita nikitadanilov Danilov for discussions and ideas.

Source: https://habr.com/ru/post/334354/

All Articles

Measuring the intensity of the incoming flow of events in the decay model

Introduction

The task of finding heavy items

Classification of tasks

Target architecture

Task parameters

Solution accuracy

Overhead

Methods for assessing the intensity of flow

Multiple Threading Methods

Task formalization

Decay model

Counter update

Intensity determination

Limits of applicability of the decay model

Algorithms for multiple flow accounting

Features of the decay model

Decay-based count numeric implementation

Value update operation

Calculation R tau R_ \ tau : FPU

Calculation R tau R_ \ tau : tabular implementation

Performance Measurement Results

Thanks

More articles:

Calculation $R_ \ tau$ : FPU

Calculation $R_ \ tau$ : tabular implementation