
"Distribution in a query", or "getting rid of the brute-force loop"

The best search is no search at all. Let's look at an example of replacing a full brute-force loop with a single query.

About three years ago, I was asked to optimize a 1C configuration at one company and eliminate its bottlenecks. One of those bottlenecks was a seemingly innocuous mechanism: distributing goods across series when posting a sales document. The problem was that quite a lot of lines had to be distributed, and it was very slow. Not millions at a time, of course, but this distribution could take up to a minute for a single document.

I deliberately give the queries in T-SQL, since I think that will be more familiar to Habr readers.

This made everyone miserable: at the same time the accountant was reposting documents, other operators were also creating shipment documents, and when a "big" client was being dispatched, life ground to a halt for a while.
By the way, the 1C database covering 2-3 years of operation was about 500 GB at that time; a dozen or two orders could come from a single client per day, and some of them contained more than 1000 lines. In short, a "Sales of goods and services" document with 1000 lines was nothing supernatural. Reindexing, statistics updates, shrinks, and other maintenance procedures were performed regularly, but that's not the point here. Back to our optimization target. At that time, the distribution mechanism was trivially simple:

  1. One query fetched the balances by series (Nomenclature - Series - Quantity).
  2. Another query fetched the table of goods to ship (Item - Customer order - Quantity).
  3. Then an ordinary loop over each item ran on the principle of "While QuantityToDistribute > 0 Loop ... EndLoop".
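The original mechanism can be sketched roughly as follows. This is only an illustrative Python sketch under my own assumptions: the real implementation was 1C code, and all names here are hypothetical.

```python
def distribute_naive(balances, shipments):
    """Sketch of the loop-based distribution (hypothetical names).

    balances:  dict {series: available_qty}
    shipments: list of (item_key, qty) lines to distribute
    returns:   list of (series, item_key, qty) allocations
    """
    result = []
    for item_key, to_distribute in shipments:
        # "While QuantityToDistribute > 0 Loop": scan the series one by one.
        for series in list(balances):
            if to_distribute <= 0:
                break
            take = min(balances[series], to_distribute)
            if take > 0:
                result.append((series, item_key, take))
                balances[series] -= take
                to_distribute -= take
    return result
```

With many document lines and many series, this inner scan repeats over and over, which is exactly where the minutes went.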

Since I have always held the position that iterating over large amounts of data is in itself a bottleneck, I did not even consider "improving" the brute-force algorithm. An alternative was needed. By that time I had long since gotten my hand in at optimizing complex queries and had become convinced that there was hardly a task that could not be solved by a query alone; I knew for certain that in 99% of cases a well-built query (or query batch) would be more effective than any post-processing of query results. The only question was finding that solution.

I went out for a smoke break with a fairly trivial statement of the problem (allocate the quantities in one table against the quantities in another) and two theses in mind.

And at the very moment I mentally recited the second thesis, the word "more" nudged me toward a solution. I started drawing it with a stick in the snow, and before I had finished I was already running to try my hypothesis in the query console.
Consider the following simple example:

On one side we have storage cells with their capacities (A, B, C, D); on the other, the goods themselves (X, Y, Z), which must "somehow" be placed into these cells so that no cell receives more goods than it can hold.

A - 10 places
B - 1 place
C - 5 places
D - 8 places

X - 13 pcs
Y - 1 pc
Z - 4 pcs

The result should be the distribution table:

AX-10
BX-1
CX-2
CY-1
CZ-2
DZ-2

To do this, we need to define the distribution order, which turned out to be simple:

```sql
SELECT
    t1.Cell,
    t1.Qty,
    ISNULL(SUM(t2.Qty), 0) + 1      AS OrderBottom,
    ISNULL(SUM(t2.Qty), 0) + t1.Qty AS OrderTop
INTO OrderedCells
FROM Cells AS t1
LEFT JOIN Cells AS t2
    ON t1.Cell > t2.Cell
GROUP BY t1.Cell, t1.Qty
```

By the way, here you can also control the distribution priority, if, for example, certain cells must be filled first. That is solved by changing the condition in the join.
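The cumulative-bounds query can be tried outside 1C as well. Below is a SQLite adaptation run from Python (my own adaptation, not the author's code: IFNULL replaces ISNULL, and CREATE TABLE ... AS replaces SELECT ... INTO); the table and column names come from the example above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Cells (Cell TEXT, Qty INTEGER);
INSERT INTO Cells VALUES ('A', 10), ('B', 1), ('C', 5), ('D', 8);

-- SQLite version of the cumulative-bounds query:
-- IFNULL replaces ISNULL, CREATE TABLE ... AS replaces SELECT ... INTO.
CREATE TABLE OrderedCells AS
SELECT t1.Cell,
       t1.Qty,
       IFNULL(SUM(t2.Qty), 0) + 1      AS OrderBottom,
       IFNULL(SUM(t2.Qty), 0) + t1.Qty AS OrderTop
FROM Cells AS t1
LEFT JOIN Cells AS t2 ON t1.Cell > t2.Cell
GROUP BY t1.Cell, t1.Qty;
""")

rows = con.execute(
    "SELECT Cell, OrderBottom, OrderTop FROM OrderedCells ORDER BY Cell"
).fetchall()
# Each cell occupies a contiguous range on one shared axis:
# A -> 1..10, B -> 11..11, C -> 12..16, D -> 17..24
print(rows)
```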

The same for the goods:

```sql
SELECT
    t1.Goods,
    t1.Qty,
    ISNULL(SUM(t2.Qty), 0) + 1      AS OrderBottom,
    ISNULL(SUM(t2.Qty), 0) + t1.Qty AS OrderTop
INTO OrderedGoods
FROM Goods AS t1
LEFT JOIN Goods AS t2
    ON t1.Goods > t2.Goods
GROUP BY t1.Goods, t1.Qty
```

For ease of understanding, I will lay out all of these positions item by item in a table and stack one set over the other in distribution order:

(image: the cell ranges and goods ranges laid out side by side on one shared axis)
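Since the original picture is lost, the same layout can be reproduced in a few lines: each cell and each product occupies a contiguous interval on one shared axis (an illustrative sketch; the data comes from the example above).

```python
def cumulative_intervals(items):
    """Turn [(name, qty), ...] into {name: (bottom, top)} on one shared axis."""
    intervals, offset = {}, 0
    for name, qty in items:
        intervals[name] = (offset + 1, offset + qty)  # 1-based inclusive bounds
        offset += qty
    return intervals

cells = cumulative_intervals([("A", 10), ("B", 1), ("C", 5), ("D", 8)])
goods = cumulative_intervals([("X", 13), ("Y", 1), ("Z", 4)])
print(cells)  # {'A': (1, 10), 'B': (11, 11), 'C': (12, 16), 'D': (17, 24)}
print(goods)  # {'X': (1, 13), 'Y': (14, 14), 'Z': (15, 18)}
```

Wherever a cell interval and a goods interval overlap, that overlap's length is exactly the quantity to put into that cell.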

We only need to write down the boundary conditions. Now it remains simply to join these two tables and get our result:

```sql
SELECT
    OrderedCells.Cell,
    OrderedGoods.Goods,
    CASE WHEN OrderedGoods.OrderTop < OrderedCells.OrderTop
         THEN OrderedGoods.OrderTop
         ELSE OrderedCells.OrderTop END
  - CASE WHEN OrderedGoods.OrderBottom > OrderedCells.OrderBottom
         THEN OrderedGoods.OrderBottom
         ELSE OrderedCells.OrderBottom END
  + 1 AS Qty
FROM OrderedCells
INNER JOIN OrderedGoods
    ON  OrderedCells.OrderBottom <= OrderedGoods.OrderTop
    AND OrderedCells.OrderTop    >= OrderedGoods.OrderBottom
```

Let me note right away that the query intentionally carries more fields than necessary. One could get by with a single (cumulative) distribution boundary and avoid the "+1", but this form seemed clearer to me. Query optimization is out of scope for this topic, so indexes are not described here either. More complex distribution algorithms (over several dimensions, for example) are solved merely by changing the join conditions and the boundary checks.
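To check the whole thing end to end, here is the complete batch adapted to SQLite and run from Python (again my own adaptation, not the author's code: IFNULL for ISNULL, CREATE TABLE ... AS for SELECT ... INTO, and SQLite's two-argument scalar MIN/MAX in place of the CASE expressions). On the example data it reproduces the distribution table exactly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Cells (Cell TEXT, Qty INTEGER);
CREATE TABLE Goods (Goods TEXT, Qty INTEGER);
INSERT INTO Cells VALUES ('A', 10), ('B', 1), ('C', 5), ('D', 8);
INSERT INTO Goods VALUES ('X', 13), ('Y', 1), ('Z', 4);

CREATE TABLE OrderedCells AS
SELECT t1.Cell, t1.Qty,
       IFNULL(SUM(t2.Qty), 0) + 1      AS OrderBottom,
       IFNULL(SUM(t2.Qty), 0) + t1.Qty AS OrderTop
FROM Cells AS t1
LEFT JOIN Cells AS t2 ON t1.Cell > t2.Cell
GROUP BY t1.Cell, t1.Qty;

CREATE TABLE OrderedGoods AS
SELECT t1.Goods, t1.Qty,
       IFNULL(SUM(t2.Qty), 0) + 1      AS OrderBottom,
       IFNULL(SUM(t2.Qty), 0) + t1.Qty AS OrderTop
FROM Goods AS t1
LEFT JOIN Goods AS t2 ON t1.Goods > t2.Goods
GROUP BY t1.Goods, t1.Qty;
""")

# Intersect the two interval sets; the overlap length
# MIN(tops) - MAX(bottoms) + 1 is the quantity to allocate.
rows = con.execute("""
SELECT c.Cell, g.Goods,
       MIN(g.OrderTop, c.OrderTop) - MAX(g.OrderBottom, c.OrderBottom) + 1 AS Qty
FROM OrderedCells AS c
INNER JOIN OrderedGoods AS g
    ON  c.OrderBottom <= g.OrderTop
    AND c.OrderTop    >= g.OrderBottom
ORDER BY c.Cell, g.Goods
""").fetchall()
print(rows)
```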

That's all. As a result, instead of minutes of waiting on the same volumes of data, this query executed in milliseconds.

Please forgive the abundance of narrative in this article. I wanted to share not a mathematical solution to a narrow problem, but a conceptual approach to solving such problems, namely my train of thought.

Source: https://habr.com/ru/post/318240/

