📜 ⬆️ ⬇️

Time Series Prediction

Hey.
I want to talk about one task that I was very interested in at one time, namely, about the task of forecasting time series and solving this problem using the ant algorithm.

For a start, briefly about the task and the algorithm itself:


Forecasting time series implies that the value of a certain function in the first n points of the time series is known. Using this information, it is necessary to predict the value at the n + 1 point of the time series. There are many different methods of forecasting, but today the Winters method and the ARIMA model are among the most common. Read more about them here .
')
The fact that such an ant algorithm has already been said quite a lot. For those who are too lazy to climb, for example, I will retell here . In short, the ant algorithm is a simulation of the behavior of the ant colony in their quest to find the shortest path to the food source. Ants, when moving, leave behind a trail of pheromone, which affects the likelihood of an ant choosing a given path. Given that the ants will go a short way more times in the same period of time, there will be more pheromone on it. Thus, over time, more and more ants will choose the shortest path to the food source.
For clarity, insert the picture:

image

Now, we proceed directly to solving the problem of prediction using the ant colony method.
The first problem we face is that it is necessary to present a time series in the form of a graph on which we will run an ant algorithm.
Two possible solutions were found:
1. Present a time series in the form of a multigraph where from each point of the time series you can go to each set of specific increments. (To facilitate the task, we will take the normalized values ​​on the interval from -1 to 1). This was the first approach we tried. He showed a good result on the time series of small dimension, but with the increase in dimension, both the accuracy of the forecast and the performance began to fall sharply, so they refused this option.
2. Present the time series as a set of linked graphs, where each graph is responsible for its own increment value of the time series value. in other words, we have a graph that is responsible for the increase of -1, -0.9 ... and so on to 1. The step, of course, can be reduced or increased, which will affect the accuracy of the forecast and the resource-intensiveness of the task (ultimately this option turned out to be the most successful .)

On this set of linked graphs, an ant algorithm was launched (on each graph its own), which put off the pheromone on the edges corresponding to the known values ​​of the time series. Moreover, when pheromone was postponed on column i, the pheromone was also deposited on graphs i-1 and i + 1, but in a much smaller amount (in our case, 1/10 of the base amount of pheromone), thus, the ants identified the most frequently occurring growth sequences a row, and due to the postponement of pheromone on adjacent graphs, the possible error and initial noise of the time series were leveled.

We tested this algorithm on artificially prepared time series with different levels of periodicity and noise. The result was twofold. On the one hand, at noise levels up to 0.3, the algorithm shows high prediction results, comparable to the results of the ARIMA model. At higher noise levels, there is a large scatter of results: the forecast is very accurate, then completely wrong.

At the moment we are working on the selection of the optimal values ​​of the parameters of the algorithm and some methods for its improvement, which I will write about as soon as they are sufficiently verified.

Thank you all for your attention.

Upd: I will try to answer your questions.
A multigraph is a graph, each vertex of which is connected to each.

Chaotic ranks, as already mentioned below, are not accidental. You can look at the images of the Lorentz series in 3-dimensional space and see the cyclical movement. It is difficult to simply determine this cyclical nature, and at first glance the series looks random.

The values ​​of the time series are normalized on the interval -1 ... 1 and written in a graph. Graph - in this case, the table of transitions from vertex to vertex. Pheromone is deposited on the edges (in the table cells).

In the case of concatenated graphs, several tables are used, each of which is responsible only for its own transition value.

Depending on the amount of pheromone in one or another cell, one or another value of the time series is selected as a result of the forecast.

The algorithm was tested mainly on the Lorenz series.

At the moment it is too early to talk about how much better or worse he is. It seems that the algorithm is subject to finding pseudoperiods and with an increase in the noise level the number of false periods increases.
On the other hand, with a good choice of parameters, the forecast accuracy is quite high (deviation up to 7-10 percent, which is not bad for a chaotic series.)

To test on real data we proceed later. Pictures will try to prepare and add in the near future.

Thanks for attention.

Source: https://habr.com/ru/post/111158/


All Articles