Yandex. Blitz. Why and what algorithmic problems need to be able to solve, working in the search

It is rare that a candidate passes only one technical interview - usually there are several of them. Among the reasons why they can be given to a person is not easy, you can call the one that every time you have to communicate with new people, think about how they perceived your answer, try to interpret their reaction. We decided to try to use the contest format to reduce the number of iterations for all participants in the process.

For Blitz, we chose purely algorithmic tasks. Although the ACM system is used to evaluate the rounds, unlike sports programming, all tasks are as close as possible to those that are constantly being solved in the production of Search. Those who successfully solve at least four tasks out of six can consider that they have completed the first stage of selection in Yandex. Why algorithms? In the process of work, tasks, projects, programming languages, platforms often change - those who own algorithms can always reorganize and quickly learn new things. A typical task at the interview is to create an algorithm, prove its correctness, suggest ways of optimization.

Qualification can be passed from 18 to 24 September inclusive. In this round, you will need to write programs to solve six problems. You can use Java, C ++, C # or Python. For everything about everything, you will have four hours. In the final round, those who will cope with at least four qualifying tasks will compete. The finals will be held simultaneously for all participants - September 30, from 12:00 to 16:00 Moscow time. Results will be announced on October 4th. So that everyone could understand what they will encounter on the Blitz, we decided to sort out a couple of similar tasks on Habré.

1. Robust Execution Manager

The first task is related to one of the data processing systems developed in Yandex, the Robust Execution Manager, REM. It allows you to build complex processes, including many different operations, and to establish dependencies between them. Thus, the solution of many problems is connected with the processing of user logs containing information about specified queries, displayed documents, completed clicks, and so on. Therefore, the calculation of the search factor CTR (click through rate, the ratio of the number of clicks on the document to the number of hits) depends on the tasks of processing user logs, and the task of calculating CTR depends on the task of delivering the values of this factor to the search runtime. Thus, REM allows building data processing chains in which the data provider is not obliged to know who will use this data in the future.

In the process of execution of processing chains may be amended. For example, after the construction of search logs, it may become clear that a significant amount of data was delivered with a significant delay, and the search logs need to be rebuilt. Naturally, after rebuilding the logs, it is also necessary to restart all the processes that depended on them. Another example of the same kind is getting into user logs of queries made by our search robot. These requests are not user requests, and therefore make noise in the calculation of search factors, calculations of experiments, and so on. Therefore, in this case, the logs also need to be updated, and then restarted all processes of their processing. Therefore, REM supports the operation of updating data and then restarting all tasks that depend on this data.

1.1. Formulation of the problem

The task is to implement this particular operation. The input will be an acyclic directed graph with a highlighted vertex. The graph describes some data processing, with the tops representing the stages of this process, and the edges - the dependencies between them. The selected vertex corresponds to one of the subtasks that must be restarted due to changes in the data. After the restart, it will also be necessary to restart all the subtasks that depend on it from the data. You must write a program that displays a list of these subtasks, and in the order in which they will need to be restarted.

1.2. Decision

Consider to begin with the solution of the problem in the case when the vertices from the answer do not need to be ordered. Then the problem is solved quite simply: let's run from the selected vertex any algorithm for traversing the graph (for example, traversing in depth), and write down all the visited vertices in response. Indeed, if the tasks with numbers $j_1$ , $j_2$ , ..., $j_k$ depend on the data from the task $i$ , then after restarting the task $i$ it is necessary to restart all these tasks, as well as those tasks that already depend on them. In addition, each vertex must be displayed exactly once. Both of these conditions are provided by standard graph traversal methods.

So the solution for this problem may look like.

#include <iostream> #include <vector> using Graph = std::vector<std::vector<int>>; void DFS(const Graph& graph, const int vertice, std::vector<bool>& used, std::vector<int>& result) { used[vertice] = true; result.push_back(vertice); for (size_t idx = 0; idx < graph[vertice].size(); ++idx) { const int adjacent_vertice = graph[vertice][idx]; if (!used[adjacent_vertice]) { DFS(graph, adjacent_vertice, used, result); } } } void Solve(const Graph& graph, const int from, std::vector<int>& result) { std::vector<bool> used(graph.size(), false); DFS(graph, from, used, result); } int main() { int vertices, edges; std::cin >> vertices >> edges; Graph graph(vertices + 1); for (int edge = 0; edge < edges; ++edge) { int from, to; std::cin >> from >> to; graph[from].push_back(to); } int reset_vertice; std::cin >> reset_vertice; std::vector<int> result; Solve(graph, reset_vertice, result); for (std::vector<int>::const_iterator it = result.begin(); it != result.end(); ++it) { std::cout << *it << " "; } std::cout << std::endl; return 0; }

Now we know how to form the set of vertices that make up the answer, but we do not know how to organize it. Consider an example: let it be from a subtask $A$ dependent tasks $B$ and $C$ . In this case, we do not care in what order to run the subtasks. $B$ and $C$ . But what if between them there is an explicit or implicit (through other subtasks) dependence?

In the situation depicted in the picture, you first need to perform the task $B$ , and only then $C$ . But standard workarounds do not allow this. We need a different algorithm!

To solve the problem that arises, we use topological sorting of the vertices included in the response, that is, we introduce some order on this set of vertices, for example: $v_1$ , $v_2$ , ..., $v_k$ and it will be true that if $j> i$ then $v_j$ does not depend directly or indirectly on $v_i$ . It is known that such an ordering is possible if the graph does not contain cycles. Topological sorting of vertices is achieved using a depth traversal: when all its descendants are visited for a certain vertex, it is added to the top of the list. The final list will be the desired sorting of the graph vertices.

Thus, the problem is solved by a detour in depth, during which we simultaneously build a set of vertices representing the answer, and carry out topological sorting of these vertices. Topological sorting ensures that the subtasks corresponding to the vertices of the graph are listed in the correct order.

This is how the complete solution to our problem will look

 #include <iostream> #include <vector> using Graph = std::vector<std::vector<int>>; void DFS(const Graph& graph, const int vertice, std::vector<bool>& used, std::vector<int>& result) { used[vertice] = true; for (size_t idx = 0; idx < graph[vertice].size(); ++idx) { const int adjacent_vertice = graph[vertice][idx]; if (!used[adjacent_vertice]) { DFS(graph, adjacent_vertice, used, result); } } result.push_back(vertice); } void Solve(const Graph& graph, const int from, std::vector<int>& result) { std::vector<bool> used(graph.size(), false); DFS(graph, from, used, result); } int main() { int vertices, edges; std::cin >> vertices >> edges; Graph graph(vertices + 1); for (int edge = 0; edge < edges; ++edge) { int from, to; std::cin >> from >> to; graph[from].push_back(to); } int reset_vertice; std::cin >> reset_vertice; std::vector<int> result; Solve(graph, reset_vertice, result); for (std::vector<int>::const_reverse_iterator it = result.rbegin(); it != result.rend(); ++it) { std::cout << *it << " "; } std::cout << std::endl; return 0; }

2. The number of different binary search trees

This task is not directly related to the libraries and programs developed in Yandex, but it is very interesting and therefore often becomes a topic for discussion at coffee points in a spare minute. It is a somewhat simplified version of task 9D with codeforces .

2.1. Formulation of the problem

It is necessary to calculate the number of different binary search trees, at the vertices of which the numbers from 1 to $N$ . For example, if $N = 3$ , there are exactly five different trees shown in the image below:

If a $N = 5$ , the number of trees is 42. That is why the task is very popular with our developers, among whom there are many connoisseurs of Douglas Adams' creativity.

2.2. Dynamic programming solution

This task is quite simple for those who are familiar with the methods of dynamic programming. Denote by $T_k$ the number of different binary search trees formed by numbers $1$ , $2$ , ..., $K$ . Suppose we know $T_0$ , $T_1$ , $T_2$ , ..., $T_N$ and want to calculate $T_ {N + 1}$ .

It is clear that at the root of any tree there will be some number from the set ${1, ..., N + 1.}$ We denote this number by $t$ . Then, due to the properties of the search tree, the left subtree contains the numbers $1, ..., t-1$ , and right - ${t + 1, ..., N + 1}$ . The number of different possible ways to form the left subtree then equals $T_ {t-1}$ , and right - $T_ {N - t + 1}$ . Therefore, the total number of trees composed of numbers $1$ , $2$ , ..., $N + 1$ and having a root meaning $t$ equals product $T_ {t-1} \ cdot T_ {N - t + 1}$ . In order to find the total number of trees, it is necessary to sum this product for all possible values. $t$ :

$T_ {N + 1} = \ sum_ {t = 1} ^ {N + 1} T_ {t-1} \ cdot T_ {N - t + 1} = \ sum_ {t = 0} ^ {N} T_ {t} \ cdot T_ {N - t}$

Let's give an example implementation, for a change in the Python programming language

 N = int(raw_input()) result = [1, 1] for count in xrange(2, N + 1): result.append(0) for root in xrange(1, count + 1): result[-1] += result[root - 1] * result[count - root] print result[N]

2.3. Relationship with Catalana numbers

The problem can be solved using the obtained formula. However, those who are familiar with combinatorics could find out in this expression a formula for calculating Catalan numbers . Therefore, it is possible to obtain a response using the formula using binomial coefficients:

$T_ {N + 1} = C_ {N + 1} = \ frac {C_ {2N + 2} ^ {N + 1}} {N + 2}$

In the process of solving the problem of binary trees, we obtained a recurrent formula that is completely analogous to the recurrent formula that arises when solving the problem of the number of regular bracket sequences of length $2N$ which, as a rule, is used to introduce Catalan numbers in courses on combinatorics.

The Yandex Blitz will be held on the Contest platform, where we usually run the Algorithm. To participate you need to register . We will definitely send you a reminder about the start of the rounds.

Source: https://habr.com/ru/post/337690/

All Articles