Story about participating in the contest Intel Accelerate Your Code

In November 2012, the parallel programming competition from Intel was launched, and even a separate post on Habré was devoted to this. We learned about the competition from our teacher, Evgeny Kalishenko. He reads a course on "high-performance and parallel computing" at St. Petersburg Academic University and became the head of our team.

The aim of the competition was to write and optimize the solution of one algorithmic problem within a few weeks. Before describing her condition, we note several features of the competition.

Features of the competition

First, the solutions were tested on a computer with several dozen cores. This was different from the competition of well-known student ACM format competitions, where it is useless to parallelize the code: the total time spent by the cores is considered to be the runtime. The type of the proposed task was very similar to the tasks proposed at these Olympiads.

Secondly, a reference solution was given for the problem. This is the code that solves the problem guaranteed correctly, but very slowly. One possible strategy was to optimize and parallelize this solution. It was possible to completely rewrite the algorithm, but in this case it was necessary to preserve its semantics. The correctness of the participant's decision was determined by a simple diff between the answers given by the competitive and reference decisions on several tests. Looking ahead, I will say that we have chosen something between these two options: we borrowed the code responsible for input and output of the results from the reference solution, and completely rewritten the algorithm ourselves. With this we have secured ourselves from the stupid errors associated with the inconsistency of the format of the output of floating-point numbers, extra spaces and spelling errors in the output. Now about the proposed task.
')

Condition

In the input files are given: a list of flights with an indication of the companies that organize them; list of groups of airlines forming "alliances". To output to the output file you need the cheapest routes that meet several requirements. The “legend” is this: we fly from our hometown to the conference, and then we return back. At the same time, we may want to relax on the way to one of these cities, looking there before or after attending the conference. There are restrictions on the time of our arrival at the conference, at the time of departure from there and on the desired time of rest. Calculate the need for the best route for each of the cities that we want to visit, as well as for the option that we will not rest. Another condition: to fly in succession with two flights of one airline turns out to be 30% cheaper, and two flights of airlines from the alliance - by 20%. In addition, we do not want to wait for a transplant for too long, and the maximum waiting time is limited.

The reference solution launches an almost exhaustive search of various ways, and the option to optimize it immediately disappeared. Naturally, I wanted to use standard algorithms on graphs, but it was necessary to overcome several difficulties. The main one was that the algorithm took into account that the price of a flight depends on the previous one and the next. Known algorithms for finding the shortest path (Dijkstra, Floyd, Bellman-Ford) require that the edge weights be constant.

Algorithm

To begin with, we note that making cities - tops, and flights - with ribs - is a bad idea. Each flight departs at a certain time, and you need to consider that you can fly out of the airport only after we have arrived at it. Therefore, the peaks will be flights, and the edges will connect those that can fly sequentially.

Next you need to take into account discounts. Divide each vertex into five. Each of these five vertices will have a so-called “type”. If we want to take some flight, and before it we flew the flight of the same airline, then we agree that our path passes through the top of the first type. If the sequence of flights from one airline begins with this one, then we agree to go through the top of the second type. We will do the same with alliances, adding two more types of vertices. The fifth type will correspond to the fact that our discount on this flight is zero: the previous and the next flights are served by a company that is different from the current one and is not in the alliance with it. In the code, the numbering order is different.

Enumeration of vertex types

enum { FROM_ANY = 0, FROM_THIS = 1, FIRST_THIS = 2, FROM_ALLIANCE = 3, FIRST_ALLIANCE = 4 };

Connecting vertices with edges is simple: for each city we consider all arriving and departing flights. For each pair of flights, we determine whether we can fly first and fly second (given that we have a limit on the transfer time). If we can, then we iterate over all types of vertices corresponding to these flights, and we connect with edges the ones that are matched by type. For example, if the airlines that organized these flights are different, then the edges to the top corresponding to the first type (“the previous flight of the same company”) should not go. But at the top of the second type (“the first flight in a sequence from one company”), you just need to empty the ribs, since the next flight can be anything. Carefully sort out 25 pairs of types, we understand which pairs require connections, and which ones do not. We consider the price, taking into account the discounts for this flight and each type, we assign it to the edges entering the vertices, and the graph is almost ready.

Code linking vertices for a pair of flights

 void wire_two_flights(Graph &g, Flight* f1, Flight* f2, int i1, int i2) { #define P_B(t) push_back(make_pair(getCostWithDiscount(f2, t), i2 + t)) if(f1->company == f2->company) { g[i1 + FIRST_THIS].P_B(FROM_THIS); g[i1 + FROM_THIS].P_B(FROM_THIS); g[i1 + FROM_ALLIANCE].P_B(FROM_THIS); } else if(alliances[f1->company][f2->company]) { g[i1 + FIRST_ALLIANCE].P_B(FROM_ALLIANCE); g[i1 + FIRST_ALLIANCE].P_B(FIRST_THIS); g[i1 + FROM_ALLIANCE].P_B(FROM_ALLIANCE); g[i1 + FROM_ALLIANCE].P_B(FIRST_THIS); g[i1 + FROM_THIS].P_B(FROM_ALLIANCE); g[i1 + FROM_THIS].P_B(FIRST_THIS); } else { g[i1 + FROM_ANY].P_B(FROM_ANY); g[i1 + FROM_ANY].P_B(FIRST_THIS); g[i1 + FROM_ANY].P_B(FIRST_ALLIANCE); g[i1 + FROM_ALLIANCE].P_B(FROM_ANY); g[i1 + FROM_ALLIANCE].P_B(FIRST_THIS); g[i1 + FROM_ALLIANCE].P_B(FIRST_ALLIANCE); g[i1 + FROM_THIS].P_B(FROM_ANY); g[i1 + FROM_THIS].P_B(FIRST_THIS); g[i1 + FROM_THIS].P_B(FIRST_ALLIANCE); } }

However, it is still necessary to take into account that our path must go through the conference and place of rest. Let's first solve the task of finding the shortest path in the graph between two vertices A and B and passing through vertex C. This can be done as follows: copy the graph G, getting two copies of it G and G '(let's call them layers). Connect the vertex C with the vertex C '(copy C) of the graph G'. After that, you can look for a path between the vertices A and B '- it is guaranteed to pass through C, since there are simply no other ways to jump from one layer to another.

You can say that it is easier just to find a path from A to C, and then from C to B. But our approach can be generalized to the case when the path must pass through at least one of the many vertices, flights that can take us to the conference . In addition, more layers will be needed to take into account the place of rest. It will also be necessary to take into account two possible procedures for attending a conference and a place of rest. But these are details. The main thing is that the graph has increased as a result of such operations no more than a constant times.

Adding a pair of fictitious vertices to the beginning and end of the path and connecting them with all the candidates for the initial and final flight, we get a ready graph on which we can run the Dijkstra algorithm. To solve each of the subtasks corresponding to different places of rest, there will be a graph, which will increase the cost of its construction. With this and other factors that slow down our algorithm, still had to fight.

Optimization

How can you feel by reading the name of the competition, its condition and the recommendations of Intel, the competition was dedicated to optimization, and not the ability to find an asymptotically quick solution. If you studied at the university the theory of complexity, or at least the basic algorithms, you can remember how, when evaluating the effectiveness of algorithms, constants, and sometimes even logarithms and polynomials, are discarded (if the problem is complex and the solutions that are now exponential). In practice, the situation is sometimes different. A fast algorithm is only half the success.

We mention only some of the optimizations we used. First, the problem with multiple copying of the graph was solved. The count was stored as adjacency lists, and it was possible to prepare a “frame”, after which it was possible to add edges to it separately for each subtask (each city of rest). After solving the next subtask, the edges added during its solution were effectively removed from the adjacency lists, and the graph was ready for new use.

Secondly, it turned out that the built-in function for obtaining absolute time from its calendar representation (mktime) is very slow, and the time in the input data was given just in the calendar format. Three hundred thousand calls of such a function - and the operation time is increased by ten seconds. Fortunately, the time inside one month on the calendar is linear. Therefore, it is possible to cache the absolute value of the first second of each month at the first demand, after which within this month the transfer of time from one format to another requires only a few arithmetic operations.

Fast time conversion

 vector<time_t> tc(100000); time_t get_time(int year, int month) { size_t key = 13 * year + month; if(key >= tc.size()){ tc.resize(key * 2); } if (tc[key] == 0) { tm time; time.tm_year = year - 1900; time.tm_mon = month - 1; time.tm_mday = 0; time.tm_hour = time.tm_min = time.tm_sec = 0; time_t res = mktime(&time); tc[key] = res; } return tc[key]; } time_t convert_to_timestamp(int day, int month, int year, int hour, int minute, int seconde) { time_t res = get_time(year, month); res += (day) * 60 * 60 * 24; res += hour * 60 * 60; res += minute * 60; res += seconde; return res; }

The third significant optimization was related to reading data. This stage took much more time than the stages of building the graph and finding the path in it, and we used memory mapped IO - this quickly - and wrote our memory manager who is able to malloc, but not free. He asks the OS for a huge chunk of memory, and then distributes it in parts to the string reading functions that need memory.

Parallelism

Our solution was not completely parallel. Any attempt to run at least some subtasks in several threads led to either slowing down or not accelerating, even if these subtasks did not use shared data at all and did not require locks. We have convinced ourselves that our solution is computationally easy, but it constantly requires a lot of calls to different memory areas, so the RAM bandwidth was a bottleneck. Whether this is so, we did not have time to reliably find out, since the last days were spent on optimizing more important details. Although there are a couple of OpenMP directives in the code below, we even deleted them at the last moment before sending the final version.

Code

The code of our solution is a completely unreadable confusion with macros in C ++, but we still give a link to it. In such tasks, this is a completely natural state of the code, but we, of course, do not call for writing large projects in the same style.

Findings.

We managed to take third place in this competition in Russia, and the speed of our decision was third throughout Europe. A file with a hundred megabytes of flight data was processed by our program in a few seconds. The final assessment was influenced not only by this speed, but also by the documentation provided, as well as social activity aimed at instilling a love of optimizations in general and popularizing this competition in particular. As you can see from our story, we didn’t use any complex optimizations, we didn’t have to pick up compilation keys or fight for every bit. We hope that our story has convinced you that the excellent speed of the code can be achieved by careful development of the algorithm and optimization of the most critical parts of it, and no fanaticism is required for this.

We wish you all interesting tasks and effective solutions!

PS
The entire list of winners is on the Intel website .

Source: https://habr.com/ru/post/171707/

All Articles