Algorithms and search data structures. Lectures and courses from Yandex

Today we are completing the New Year's series of posts devoted to lectures of the School of Data Analysis. The last, but not least important course is “Algorithms and search data structures”.

This course covers basic algorithms and data structures, including hashing, complexity and computational models, search trees, B-trees, geometric search problems, dynamic connectivity in graphs, and more.

We took into account what we were asked in the comments to the previous courses - now, if you wish, you can not only watch / download lectures separately, but also download everything all at once in the form of an open folder on Yandex.Disk . By the way - in the previous posts the same updates also appeared (here are the links for convenience: “ machine learning ”, “ discrete analysis and probability theory ”, “ parallel and distributed computing ”).
')

Lectures are given by Maxim Aleksandrovich Babenko, Deputy Director of Computer Science, Assistant of the Department of Mathematical Logic and Theory of Algorithms of the Mechanics and Mathematics Faculty of Moscow State University. MV Lomonosov, Candidate of Physical and Mathematical Sciences.

Full course in the form of a folder on Yandex.Disk

Lecture 1. Complexity and computational models. Accounting value analysis (start)

Key resources: memory and time. O-symbolism. Examples of calculation models: Turing machine, RAM machine. Difficulty on average and worst cases. Example: sorting task. Sort by selection. Theoretical information lower bound for complexity. Allowing trees. Lower bound for complexity in the model of resolving trees. Variable-sized arrays: additive and multiplicative reallocation schemes. Analysis of the multiplicative scheme for an array of variable size using the banking method.

Lecture 2. Analysis of accounting values (end)

Analysis of accounting costs of operations: potential function, true and accounting costs. Stacks and queues. Implementation based on a variable-size array and based on a linked list. Simulate the queue using two stacks. The task of maintaining a dynamic maximum in the stack and queue. Mutable and immutable data structures. Data structures with history storage (persistent). Immutable-stack and immutable-queue. The problem of a multiple future when analyzing accounting values in persistent structures.

Lecture 3. Quick sort and merge sort functions

The concept of the "divide and conquer" method. Merge-Sort Algorithm. Merge two ordered lists. Estimation of complexity. K-way Merge-Sort to work in external memory. Sort merge without using additional memory. The general scheme of the algorithm Quick-Sort. Two options for implementing Partition. Examples of unsuccessful selection of support elements. Randomized choice of support element. The complexity of Quick-Sort in the worst and average cases. The depth of recursion in the worst and average cases. Elimination of tail recursion. The problem of an optimal merge tree. Huffman codes. Merge two ordered sequences of different lengths. Theoretical information lower bound. Binary search "from the edge" (galloping).

Lecture 4. Ordinal statistics. Heaps (start)

Finding order statistics using a randomized modification of the Quick-Sort algorithm. The linearity of the expectation of work. Approximate medians. The choice of the k-th ordinal statistics for the linear in the worst case. Trees with heap properties. Almost complete binary trees: vertex numbering, navigation. Binary heap. The operation of sifting down and up. The implementation of the operations of insertion, deletion and search for a minimum. Conversion of an arbitrary array of keys into a heap (operation Make-Heap), linearity of the operating time. Heap-Sort sorting algorithm.

Lecture 5. Heaps (beginning). Hashing (start)

K-ary heaps, the dependence of the complexity of operations on the choice of k. Binomial (binomial), left (leftlist) and skew heaps.

Lecture 6. Hashing

Hash functions. Collisions. Collision resolution by the chain method, the method of successive samples and the method of double hashing. Hypothesis of simple uniform hashing, estimation of the average length of a chain. Universal families of hash functions, estimation of the average length of a chain. Building a universal family for integer keys. Perfect hash functions. Build a perfect hash function using the universal family. Interface set with errors. Bloom filter. Estimation of the probability of false positives. The dictionary interface with errors. Modification of the bloom filter.

Lecture 7. Search Trees (start)

Definition of the search tree. Insert and delete items. Inorder-traversal of the tree. Red-black trees: definition and basic properties. Implementing insert operations for red and black wood. Splay-trees. The splay operation: zig, zig-zig and zig-zag steps. Implementing insert, delete, merge, and split operations for splay trees.

Lecture 8. Search trees (end). Cartesian trees

Cartesian trees (duchi). The uniqueness of the Cartesian tree for a given set of different keys and priorities. The logarithmic estimate of the expectation of the height of the duchi. Merger and separation operations for duch. Operations of insertion and removal of elements for duches. Construction of a Cartesian tree in linear time, subject to pre-sorting of keys.

Lecture 9. B-trees. The system of disjoint sets

B + trees: definitions and basic properties. Search, insert and delete operations for B + trees. Systems of disjoint sets. Implementation using the forest. Ranks of vertices, rank heuristics. Logarithmic rank score by number of elements. Randomized Rank Heuristics. Heuristics of compression paths. Estimation of the cost of transactions (without proof).

Lecture 10. RMQ and LCA tasks

Tasks RMQ (range minimum query) and LCA (least common ancestor). Reduction from the RMQ problem to the LCA problem, a Cartesian tree. Tarzhan's algorithm for the offline version of the LCA task. The simplest algorithms for the online version of the LCA problem: a complete and sparse answer table. Farah-Colton-Bender Algorithm for the problem ± 1-RMQ. Reduction of the LCA problem to the ± 1-RMQ problem: Euler tree traversal.

Lecture 11. Tasks of geometric search

Location problem, stabbing problem. Spacing trees. Reduction of a system of intervals to a two-dimensional problem. The task of finding points in the corridor. Priority search tree. The task of finding points in a rectangle. The tree of segments along the X coordinate, ordered by Y lists of points at each vertex. The complexity of O (n log n) to build and O (log ^ 2 n) for the query. Reduce search time to O (log n). The task of simultaneous search in a set of ordered lists. Fractional cascading.

Lecture 12. Dynamic connectivity in graphs

The problem of dynamic connectivity: insert and delete edges, requests for connectivity. A special case of the problem for the case of forests. Euler traversal trees: merge and split. Use of depreciation and set of spanning forests for solving with complexity O (log ^ 2 n).

Source: https://habr.com/ru/post/208716/

All Articles

Algorithms and search data structures. Lectures and courses from Yandex

More articles: