Every year on December 31, David Eppstein publishes a review of the past year's preprints on data structures and algorithms posted to arxiv.org. Similar materials for 2010 and 2011 (my translation) are available at the links. The cs.DS section is growing at a good pace: this year there were 935 preprints on algorithms and data structures, compared with 798 in 2011. The section has not yet reached a hundred preprints per month, although in July (98 preprints) it came very close to that threshold.
This is my personal list of a dozen preprints that I find particularly interesting. As usual, it excludes my own work and a few papers I wrote about earlier. It also omits results (for example, a faster algorithm for finding the maximum flow) that did not appear on arxiv.org.
Here they are, in chronological order:
Computing PageRank in sublinear time. Christian Borgs, Michael Brautbar, Jennifer Chayes, and Shang-Hua Teng,
arXiv: 1202.2771 and WAW 2012 (9th Workshop on Algorithms and Models for the Web Graph). The algorithm is not dramatically faster than the linear one (it finds all pages whose PageRank exceeds Delta in O(n / Delta) time), but it seems interesting to me that such bounds can be proved in a natural model of computation for hyperlinked page graphs.
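As background, the classic Monte Carlo approach approximates PageRank by counting visits of short random walks that restart with some probability. Below is a minimal sketch of that well-known baseline, not the paper's algorithm; the toy graph, restart probability, and walk counts are all illustrative.

```python
import random
from collections import Counter

def estimate_pagerank(graph, alpha=0.15, walks_per_node=200):
    """Monte Carlo PageRank: simulate random walks that restart with
    probability alpha; normalized visit counts approximate PageRank."""
    random.seed(0)  # deterministic for the example
    visits = Counter()
    total = 0
    for start in graph:
        for _ in range(walks_per_node):
            v = start
            while True:
                visits[v] += 1
                total += 1
                # Restart (or stop at a dangling node) ends this walk.
                if random.random() < alpha or not graph[v]:
                    break
                v = random.choice(graph[v])
    return {v: visits[v] / total for v in graph}

# Toy graph: every other page links to "b", so "b" should rank highest.
g = {"a": ["b"], "b": ["c"], "c": ["b"], "d": ["b"]}
pr = estimate_pagerank(g)
print(max(pr, key=pr.get))  # expected: b
```

The point of the preprint is precisely to avoid touching all n nodes when only the few pages above the threshold Delta are of interest.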
Finding the "most biased" coin with the fewest flips. Karthekeyan Chandrasekaran and
Richard Karp,
arXiv: 1202.3639. The problem solved in this paper is very fundamental, and the result is an exactly optimal solution, rather than the previously known solutions that were optimal only up to a constant factor.
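To make the problem concrete, here is a naive baseline, emphatically not the paper's optimal procedure: flip every coin equally often and pick the one with the most heads. The biases (0.7 versus 0.5), the coin count, and the flip budget are made up for illustration; the whole point of the paper is to spend far fewer flips than this uniform strategy.

```python
import random

def find_biased_coin(flip, n, flips_per_coin=2000):
    """Naive baseline: flip each of the n coins the same number of
    times and return the index of the coin with the most heads."""
    heads = [0] * n
    for i in range(n):
        for _ in range(flips_per_coin):
            heads[i] += flip(i)  # flip(i) returns True for heads
    return max(range(n), key=lambda i: heads[i])

# Hypothetical setup: coin 3 lands heads with probability 0.7, others 0.5.
random.seed(1)
flip = lambda i: random.random() < (0.7 if i == 3 else 0.5)
print(find_biased_coin(flip, n=5))  # expected: 3
```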
A fast, universal string hashing algorithm. Owen Kaser and Daniel Lemire,
arXiv: 1202.4961. An algorithm for hashing strings that spends only 0.2 CPU cycles per byte of input while still having theoretically guaranteed properties.
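The underlying family is multilinear hashing over a prime field, a classic strongly universal scheme. A minimal sketch of that family follows; the prime, seed, and key length are illustrative, and the paper's actual contribution, making such hashing run at a fraction of a cycle per byte with pairwise multiplications and SIMD, is not reflected here.

```python
import random

# Multilinear hashing: h(m) = (a_0 + sum_i a_{i+1} * m_i) mod P with
# random coefficients a_i. A fixed Mersenne prime keeps reduction cheap.
P = (1 << 61) - 1

def make_hash(max_len, seed=42):
    """Draw a random key (coefficients) for strings up to max_len bytes."""
    rng = random.Random(seed)
    coeffs = [rng.randrange(P) for _ in range(max_len + 1)]
    def h(data: bytes) -> int:
        acc = coeffs[0]
        for i, byte in enumerate(data):
            acc = (acc + coeffs[i + 1] * byte) % P
        return acc
    return h

h = make_hash(max_len=16)
print(h(b"hello") == h(b"hello"))  # deterministic for a fixed key
print(h(b"hello") != h(b"world"))  # distinct inputs collide with prob ~1/P
```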
The known algorithms for EDGE CLIQUE COVER are probably optimal. Marek Cygan,
Marcin Pilipczuk and
Michał Pilipczuk,
arXiv: 1203.1754 and
SODA 2013. There is a simple FPT algorithm for covering the edges of a graph with a minimum number of cliques, but its running time is proportional to 2^(2^(O(k))) · poly(n), where k is the number of cliques and n is the number of vertices in the graph. Naturally, the question of improving this bound has been open for quite some time. Last year it was shown that no better kernelization of the problem is achievable, and this preprint shows that the bound is optimal regardless of whether we use kernelization or not, at least as long as the Exponential Time Hypothesis holds.
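The doubly exponential running time comes from the folklore kernelization: vertices with identical closed neighborhoods can be merged without changing the answer, which leaves at most 2^k vertices, after which one searches exhaustively. A sketch of that reduction step (my illustration, not code from the preprint):

```python
def kernelize(adj):
    """EDGE CLIQUE COVER kernel: keep one representative per distinct
    closed neighborhood. adj maps vertex -> set of neighbors."""
    closed = {v: frozenset(adj[v]) | {v} for v in adj}
    rep = {}  # closed neighborhood -> representative vertex
    for v in sorted(adj):
        rep.setdefault(closed[v], v)
    keep = set(rep.values())
    # Return the subgraph induced on the representatives.
    return {v: {u for u in adj[v] if u in keep} for v in keep}

# Example: in a 4-clique all vertices are twins, so the kernel is a single vertex.
g = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}}
print(sorted(kernelize(g)))  # expected: [1]
```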
The approximability of the vertex cover problem in power-law graphs. Mikael Gast and Mathias Hauptmann,
arXiv: 1204.0982. It seems to me that there should be more work on algorithms that exploit the properties of Web page graphs or of user graphs in social networks. This algorithm is a good example: it improves the achievable approximation ratio for vertex cover on any graph whose degree distribution follows a power law.
Clustering is difficult only when it does not matter. Amit Daniely, Nati Linial and Michael Saks,
arXiv: 1205.4891. Like the previous paper, this one takes a step from worst-case complexity toward the complexity of more typical instances of the problem. The main idea is that the only inputs hard for clustering algorithms are those that are hard to cluster meaningfully in the first place. The definition of "good clustering" given in the paper seems rather sensitive to outliers and noise in the data, but that may well be a topic for further study.
A randomized parallel algorithm for solving an n-by-n system of linear equations in O(n^2). Joerg Fliege,
arXiv: 1209.3995. The real breakthrough here is the algorithm for solving systems of equations over finite fields, created by Prasad Raghavendra and
described last year on Richard Lipton's blog. This paper shows how to apply it to the real numbers.
Improved deterministic bounds for dynamic graph connectivity. Christian Wulff-Nilsen,
arXiv: 1209.5608 and SODA 2013. A data structure is presented that supports updates (edge insertions and deletions) in O(log^2 n / log log n) amortized time and queries of the form "is there a path between vertices u and v?" in O(log n / log log n) worst-case time, where n is the number of vertices in the graph. (Eppstein made a nice pun: the previous best bound for an update was O(log^2 n), so this result is just "log shaving", or rather, log log shaving.)
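Fully dynamic connectivity, where edges can also be deleted, needs machinery of the kind this preprint improves. For the insertions-only case, by contrast, a simple union-find structure already answers connectivity queries in near-constant amortized time; a baseline sketch for comparison:

```python
class UnionFind:
    """Incremental (insertions-only) connectivity via union-find."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def add_edge(self, u, v):
        self.parent[self.find(u)] = self.find(v)

    def connected(self, u, v):
        return self.find(u) == self.find(v)

uf = UnionFind(5)
uf.add_edge(0, 1)
uf.add_edge(1, 2)
print(uf.connected(0, 2))  # True
print(uf.connected(0, 4))  # False
```

Supporting deletions breaks this approach entirely, which is exactly why the dynamic problem needs polylogarithmic-time structures like the one in the preprint.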
An 8/5-approximation for the traveling salesman path problem. András Sebö,
arXiv: 1209.3523 and
IPCO 2013. This paper deals with the variant of the traveling salesman problem in which the start and end points of the journey need not coincide. I have told students about
the Christofides algorithm enough times to know the difference between the path and cycle versions, but the fact that the approximation ratio for the path length is substantially larger than for the cycle length had somehow escaped me.
Approximating k-median via pseudo-approximation. Shi Li and Ola Svensson,
arXiv: 1211.0243. The k-median problem is to find exactly k cluster centers minimizing the sum of distances from each point to the center of the cluster it belongs to; an approximation finds k centers whose objective value is within some factor of optimal, while a pseudo-approximation finds approximately k centers. As the authors write, adding even one extra center can change the objective function significantly, so the fact that pseudo-approximation is applicable to this problem at all is quite surprising.
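To make the objective concrete, here is a brute-force solver for a tiny one-dimensional instance. It is exponential in the number of points and has nothing to do with the paper's technique; the points and k are made up, and the code only shows exactly what quantity is being minimized.

```python
from itertools import combinations

def k_median(points, k):
    """Exhaustive k-median: try every k-subset of points as centers and
    return the one minimizing the sum of distances to nearest centers."""
    def cost(centers):
        return sum(min(abs(p - c) for c in centers) for p in points)
    return min(combinations(points, k), key=cost)

# Two well-separated clusters; the optimum places one center in each.
pts = [0, 1, 2, 50, 51]
print(k_median(pts, 2))  # expected: (1, 50)
```

A (1 + eps)-pseudo-approximation would be allowed to return, say, k + 1 centers here, and the paper shows how to convert such a solution back to exactly k centers while keeping the approximation guarantee.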