1. Statement of the problem
A complete weighted graph on 500 vertices is given by its adjacency matrix.
The task is to find a Hamiltonian cycle in this graph with the lowest possible total cost.
2. Solution
1. Greedy algorithm
The idea is simple: start at any vertex, pick the cheapest edge leading to a still-unvisited vertex x, move to x, cross out the vertex we came from, repeat the same at x, and so on.
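The greedy pass described above can be sketched in a few lines of Python (a minimal illustration on a toy 4-vertex matrix; the actual contest matrix has 500 vertices):

```python
import math

def greedy_route(cost, start=0):
    """Nearest-neighbour pass: from the current vertex always take the
    cheapest edge to a not-yet-visited vertex."""
    n = len(cost)
    route = [start]
    visited = {start}
    cur = start
    for _ in range(n - 1):
        # cheapest edge from cur to an unvisited vertex
        nxt = min((v for v in range(n) if v not in visited),
                  key=lambda v: cost[cur][v])
        route.append(nxt)
        visited.add(nxt)
        cur = nxt
    return route  # the edge route[-1] -> route[0] closes the cycle

# toy example (infinity on the diagonal forbids self-loops)
c = [[math.inf, 1, 4, 3],
     [1, math.inf, 2, 5],
     [4, 2, math.inf, 1],
     [3, 5, 1, math.inf]]
print(greedy_route(c))  # -> [0, 1, 2, 3]
```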
Simple and fast.
However, the answer can be very, very far from the optimum.
The greedy algorithm performs well on random graphs, but here we know nothing about the graph.
Therefore, we go further.
2. Exhaustive search
The traveling salesman problem (that is, finding a Hamiltonian cycle of minimum total cost) is NP-hard, so a complete enumeration on a graph of this size would run practically forever. This idea is not suitable.
3. Branch and bound method
The second idea that came to mind is an improved brute force that discards obviously non-optimal solutions at each step of the algorithm.
One of the most efficient algorithms of this kind is the branch and bound method ("search with return", i.e. backtracking), developed by Little, Murty, Sweeney, and Karel in 1963.
Let S(0) be the set of all admissible closed routes of the traveling salesman problem with n cities and cost matrix c.
The Little method is based on splitting a set into two disjoint subsets and computing a lower-bound estimate for each of them. Next, the subset with the minimum estimate (cost) is split into two subsets and their estimates are computed. At each step the subset with the smallest estimate among all those obtained so far is selected and split in two. Eventually we obtain a subset containing a single cycle (a closed route satisfying the imposed constraints) whose cost is minimal.
The algorithm consists of two stages:
First stage
Reducing the cost matrix and computing the lower bound r on the route cost.
1. Find the smallest element in each row (the reduction constant for that row).
2. Pass to a new cost matrix by subtracting from each row its reduction constant.
3. Find the smallest element in each column (the reduction constant for that column).
4. Pass to a new cost matrix by subtracting from each column its reduction constant.
As a result we obtain a cost matrix in which every row and every column contains at least one zero element.
5. Compute r as the sum of the reduction constants over all rows and columns.
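The reduction stage can be sketched as follows (a minimal illustration working in place on a list-of-lists matrix):

```python
import math

def reduce_matrix(c):
    """Subtract the row minima, then the column minima, in place.
    Returns r, the sum of all reduction constants, which is a lower
    bound on the cost of any tour for this matrix."""
    n = len(c)
    r = 0
    for i in range(n):                          # rows
        m = min(c[i])
        if 0 < m < math.inf:
            r += m
            c[i] = [x - m for x in c[i]]
    for j in range(n):                          # columns
        m = min(c[i][j] for i in range(n))
        if 0 < m < math.inf:
            r += m
            for i in range(n):
                c[i][j] -= m
    return r

c = [[math.inf, 4, 7],
     [5, math.inf, 3],
     [6, 2, math.inf]]
print(reduce_matrix(c))  # -> 11 (row constants 4+3+2, column constant 2)
```

After the call, every row and every column of `c` contains at least one zero, exactly as the text above requires.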
The second (main) stage
1. Compute the penalty for non-use for every zero element of the reduced cost matrix.
The penalty for non-use of a zero element with index (h, k) answers the question: by how much, at minimum, does the route cost grow if edge (h, k) is excluded from it? It equals the sum of the minimum elements of row h and column k, each taken without the (h, k) element itself.
a) Find all zero elements in the matrix.
b) For each of them, compute its penalty for non-use.
c) Select the element with the maximum penalty.
Why the maximum? Because it means that excluding this edge from the route would lead to the maximum increase in the cost of the optimal route, which is exactly what we are looking for.
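Steps a) through c) can be sketched like this (a simplified illustration on a small reduced matrix; function names are my own):

```python
import math

def zero_penalties(c):
    """For every zero of the reduced matrix c, the penalty for NOT using
    edge (h, k): the minimum of row h plus the minimum of column k,
    both taken without the (h, k) element itself."""
    n = len(c)
    penalties = {}
    for h in range(n):
        for k in range(n):
            if c[h][k] == 0:
                row_min = min(c[h][j] for j in range(n) if j != k)
                col_min = min(c[i][k] for i in range(n) if i != h)
                penalties[(h, k)] = row_min + col_min
    return penalties

def best_zero(c):
    """The zero with the maximum penalty: the branching edge."""
    penalties = zero_penalties(c)
    return max(penalties, key=penalties.get)

# an already-reduced matrix: every row and column contains a zero
m = [[math.inf, 0, 3],
     [0, math.inf, 0],
     [2, 0, math.inf]]
print(zero_penalties(m))  # e.g. {(0, 1): 3, (1, 0): 2, (1, 2): 3, (2, 1): 2}
```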
2. Now we split our set S(0) into two: Sw (routes containing edge (h, k)) and Sw/o (routes not containing it).
3. Compute cost estimates for the routes in each of these sets.
a) For Sw/o everything is simple: since we do not take edge (h, k), its cost estimate equals the estimate of S(0) plus the penalty for not using edge (h, k).
b) For Sw, note that since edge (h, k) enters the route, edge (k, h) cannot, so in the cost matrix we set c(k, h) = infinity. And since we have already "left" vertex h and "arrived" at vertex k, no other edge leaving h and no other edge entering k can be used, so we delete row h and column k from the cost matrix. After that we reduce the matrix again; the cost estimate for Sw then equals the estimate of S(0) plus r(h, k), where r(h, k) is the sum of the reduction constants of the modified cost matrix.
4. Of the sets Sw and Sw/o, the one with the smallest estimate is chosen for the next step.
We continue in this way until only one uncrossed-out row and one uncrossed-out column remain in the cost matrix.
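The construction of the Sw matrix from step 3b can be sketched as follows (a simplified illustration; a real implementation also carries the original city labels of the surviving rows and columns, which this sketch omits):

```python
import math

def include_edge(c, h, k):
    """Build the cost matrix for the subset Sw containing edge (h, k):
    forbid the reverse edge (k, h), drop row h and column k, then
    reduce. Returns (new_matrix, r), where r is the increment added
    to the lower bound of S(0)."""
    c = [row[:] for row in c]              # work on a copy
    c[k][h] = math.inf                     # (k, h) would close a subtour
    # delete row h and column k
    c = [[c[i][j] for j in range(len(c)) if j != k]
         for i in range(len(c)) if i != h]
    # reduce the shrunken matrix, accumulating the constants in r
    r = 0
    for i in range(len(c)):
        m = min(c[i])
        if 0 < m < math.inf:
            r += m
            c[i] = [x - m for x in c[i]]
    for j in range(len(c)):
        m = min(row[j] for row in c)
        if 0 < m < math.inf:
            r += m
            for row in c:
                row[j] -= m
    return c, r
```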
Now the results.
If at each step you choose only between the two newly created sets (Sw and Sw/o), the algorithm runs in quite reasonable time, but the accuracy is terrible (it loses badly even to the greedy algorithm from item 1).
If at each step you choose the best set among all those obtained so far and not yet used,
then for small graphs (about 40-50 vertices) the accuracy is good, but for 500 vertices the answer never arrived.
Therefore, the following idea comes to mind:
4. Branch and bound method with a heuristic
Indeed, why not introduce a heuristic? In the branch and bound algorithm we actually build a tree: at each node we decide whether to take edge (x, y) or not, and attach two children, Sw(x, y) and Sw/o(x, y). But the node for the next iteration is selected only by its estimate. So let us select the best node not only by its estimate but also by its depth in the tree, because the deeper the selected node, the closer it is to the end of the search. This way we can finally wait for an answer.
The first and simplest variant of the heuristic is score = estimate - k * depth, so that deeper nodes look cheaper. The constant k is chosen based on the average edge weight, so that depth does not play the decisive role. We also clamp the depth from below by b: if our node is at distance q from the root and q < b, then depth = b, otherwise depth = q. For b we choose a number such that the honest branch and bound (without the heuristic) reaches that depth in adequate time. In my case it was b = 30.
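One plausible reading of this selection key is the following sketch (the exact formula and constants are my assumption based on the description above; k and b are tuning parameters):

```python
def priority(estimate, depth, k=10.0, b=30):
    """Node-selection key for the heuristic variant: subtracting
    k * depth from the lower-bound estimate makes deeper nodes look
    cheaper. Depth is clamped from below by b, so all nodes shallower
    than b compete purely by their estimates. The node with the
    smallest key is expanded next."""
    d = max(depth, b)
    return estimate - k * d
```

With this key, two nodes above the depth floor are ranked only by estimate, while below the floor a deeper node beats an equally estimated shallow one.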
Run, wait.
After a night of computation we get an answer that beats the greedy algorithm by about 2%.
Not much at all, considering that the greedy algorithm runs in a couple of seconds and took five minutes to write.
And now the winner is:
Greedy algorithm with a local branch and bound method
The algorithm is as follows:
1. Run the greedy algorithm to get some route.
2. Split the route into several parts.
3. In each part, add a dummy edge from the last vertex of the part to the first one; this edge must not be touched.
4. Run the branch and bound method, without the heuristic, on each part.
5. Combine the parts optimized by the branch and bound method, opening the dummy edges and connecting the last vertex of part n-1 with the first vertex of part n.
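The split-and-merge pipeline can be sketched as follows. Here `optimize_part` is a stand-in for the real branch and bound path optimizer (which this sketch does not implement); the important contract is that it keeps the first and last vertices of the part fixed, since they carry the untouchable dummy edge:

```python
def split_route(route, parts):
    """Cut a route (list of vertices) into `parts` consecutive pieces
    of roughly equal length."""
    n = len(route)
    size = -(-n // parts)                  # ceiling division
    return [route[i:i + size] for i in range(0, n, size)]

def optimize_part(part, cost):
    """Placeholder for the branch and bound path optimizer: it must
    reorder only the interior vertices, leaving the endpoints in place.
    This stub simply returns the part unchanged."""
    return part

def reoptimize(route, cost, parts):
    """Greedy route -> split -> optimize each part -> concatenate back.
    Concatenation joins the last vertex of each part to the first
    vertex of the next, exactly as step 5 above describes."""
    pieces = split_route(route, parts)
    return [v for p in pieces for v in optimize_part(p, cost)]
```

Since the parts are independent, step 4 is trivially parallelizable, which is one of the advantages listed below.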
As is easy to see, this algorithm has several advantages:
- it uses the honest branch and bound method, without heuristics;
- it is easy to parallelize, which saves running time;
- the greedy algorithm only determines where the route is cut: after merging, only NUMBER_OF_PARTS - 1 edges of the final route come from the greedy algorithm rather than from the branch and bound optimization.
Result.
Running time with 25 parts: 3 minutes, winning over the greedy by ~7%.
With 10 parts: 4 hours, winning by ~15%.
To be continued.