[Bookmark] Algorithms and data structures in the Linux kernel, Chromium and not only

Many students, for the first time confronted with the description of some clever thing, such as the Knut-Morris-Pratt algorithm or red-black trees, immediately ask themselves the questions: “Why such complications? And this, besides the authors of textbooks, does anyone need something? ” The best way to prove the benefit of algorithms is life examples. And, ideally, concrete examples of the application of widely known algorithms in modern, widely used, software products.

Let's see what can be found in the code of the Linux kernel, Chromium browser, and even in some projects.

Data structures and algorithms in the Linux kernel

Virtually any large software product can not do without its own implementations of known algorithms. Here is a small list of algorithms that are used in the Linux kernel. In some cases, we provide excerpts from comments on their implementation.
')
1. A linked list , a doubly linked list , a non-blocking linked list (lock-free linked list).

2. B + tree . Pay attention to the comments. They have valuable ideas suggested by practice.

Relatively simple implementation of B + trees. I wrote it when I understood how such trees work. Then he brought the code to mind so that it could be used.

Here applies one non-standard technique, which is not found in the books. Namely, the values increase from right to left, and not as is customary in the classical implementation - from left to right. All occupied cells in the node are located on the left; all unused ones contain the NUL value. In most actions, a single round of all cells is performed and stopped at the first cell that contains a NUL.

3. List with priority of elements (priority sorted list). A similar data structure is used, for example, in the implementation of mutexes and drivers .

4. Red-black trees are used in planning subsystems, for managing virtual memory, for tracking file descriptors, entries in directories and in other cases.

5. Interval trees.

6. Residual trees (radix trees) are used for memory management, in NSF search mechanisms, and in network subsystems.

A typical example of using residual trees is storing pointers to structures describing memory pages.

7. The priority heap has found application in the mechanism of resource allocation (control groups).

Simple queue with binary heap-based priority. It supports only the insertion of elements, the size is set at creation and does not change during the work. The implementation is based on the description, which can be found in Chapter 7 of the first edition of the book Algorithms: Construction and Analysis, by T. Kormen, C. Leiserson, and R. Rivest.

8. Hash functions , in comments to the implementation of which references are made to the works of Donald Knuth and to the research of Chuck Lever.

Donald Knuth recommends the use of multiplicative hashing of prime numbers, dividing a certain interval in accordance with the rule of the golden section. The upper bound of the interval is the maximum integer represented by the machine word. Chuck Lever confirms the effectiveness of this approach.

9. In some places, for example, in the code of this driver , own hash functions are used.

A hash function that uses a rotating hash algorithm (rotating hash). Chapter 6.4. third volume of "The Art of Programming" D. Knut.

10. Hash tables are used to implement inodes, to verify the integrity of the file system and in other cases.

11. Bit arrays are used to work with flags, interrupts, and so on. Details about them can be found in the fourth volume of "The Art of Programming" D. Knuth.

12. Semaphores , cyclic locks (spin lock).

13. Binary search is used in interrupt handling, cache searches and in other cases.

14. Binary search in B-trees (binary search with B-trees).

15. Depth search (depth first search) and a variant of the algorithm used in the configuration of the directory .

A modified depth search pass is performed on the namespace tree, starting (and ending) in the node specified by the start_handle parameter. The callback function is called whenever it is possible to find a node corresponding to the parameter specifying the type. If callback returns a non-zero value, the search is immediately terminated and this value is returned to where the search was originated from.

16. Bread search (first search) is used to check the correctness of a lock at runtime.

17. Merge sorting on linked lists is used in the garbage collection subsystem, to manage the file system and in other cases.

18. Sorting bubble . Surprisingly, it is also used.

19. Algorithm for searching substrings in the string Knut-Morris-Pratt .

Implemented Knut-Morris-Pratt string comparison algorithm, its execution time depends linearly on the input data. Details can be found in the book "Algorithms: Construction and Analysis" by T. Cormen, C. Lazerson, R. Rivest and C. Stein, in the section "Knut-Morris-Pratt Algorithm".

20. Algorithm for searching for a substring in the Boyer-Moore string with explanations and recommendations regarding possible alternatives.

The theoretical sources for the implementation of the Boyer-Moore algorithm are the following sources.

A Fast String Searching Algorithm, RS Boyer and Moore. Communications of the Association for Computing Machinery, 20 (10), 1977 .
» Handbook of Exact String Matching Algorithms, Thierry Lecroq, 2004 .

Data structures and algorithms in the Chromium web browser

Let's see what is interesting can be found in the code Chromium.

1. Oblique tree (splay tree).

The tree, among other things, is parameterized by the allocator. Memory for lists is allocated either in the accelerated access zone (implemented in the Zone class, pay attention to zone.h), or in the usual free space.

2. Voronoi diagrams are involved in the demo.

3. The Bresenham algorithm (Bresenham) is used in the tabs control subsystem.

And here are some algorithms and data structures that are in the third-party code included in Chromium.

4. Binary trees .

5. Red-black trees .

Julian Walker on red-black trees:

Red-black trees are an interesting topic. It is assumed that they are simpler than AVL-trees (their direct competitor), and at first glance it is. It's all about quick and easy insertion of new elements. However, if you delve into the removal of elements, the red-black trees present many surprises. At the same time, as opposed to additional difficulties, both the insertion and deletion of elements can be implemented so that they are performed in one pass.

6. AVL-trees (AVL tree).

7. The search algorithm for the Rabin-Karp string (Rabin-Karp) is used for data compression.

8. Calculate the number of words allowed by the acyclic finite state machine.

9. Bloom filter implemented by Apple Inc.

10. Algorithm Brezenham .

Algorithms in standard libraries of popular programming languages

We believe that it is useful to pay attention to library implementations of algorithms. The creators of programming languages, obviously, believe that it is worth the time and effort spent. In addition, such standard solutions, thoroughly tested, tested in practice by many developers, are usually preferable to similar self-written ones. Although, of course, it all depends on the needs of the programmer and the characteristics of a particular algorithm or data structure.

1. If we consider, for example, the languages C and Java, then in the first case more independent implementations of basic things can be found, in the second, thanks to the extensive standard API, this is less common. In the C ++ STL library, you can find the implementation of lists, stacks, queues, maps, vectors, and algorithms for sorting, searching, and working with a bunch.

2. Java API includes the implementation of a huge number of data structures and algorithms.

3. The Boost C ++ library contains algorithms like the search for a substring in the Boyer-Moore and Knut-Morris-Pratt line.

Algorithms for resource allocation and planning

These are interesting heuristic algorithms. The implementation of each of them depends on the needs of the system and can be based on different data structures.

1. The algorithm of extrusion of elements that have not been used the longest (Last Recently Used, LRU) can be implemented in various ways. The Linux kernel has a list-based implementation.

2. Among other similar algorithms are the following. This is the “First In - First Out” algorithm (First In First Out, FIFO), the algorithm for displacing the least frequently used elements (Last Frequently Used, LFU), the algorithm for distributing the load of a distributed computing system by iterating over and arranging its elements in a circular cycle (Round Robin, RR).

3. The VAX / VMS system used one of the FIFO variants.

4. The “hourly” algorithm (Clock) of Richard Carr has found application in Linux for organizing the replacement of memory pages.

5. The Intel i860 processor uses a random replacement policy.

6. Adaptive Replacement Cache (ARC) caching algorithm is used in some IBM data warehouse controllers and was used in PostgreSQL before a patent problem occurred.

7. The twins algorithm for memory allocation (buddy memory allocation), described by D. Knuth in the first volume of “The Art of Programming”, is used in the Linux kernel, in a parallel memory allocation system jemalloc, used in FreeBSD and in Facebook .

Basic utilities * nix-systems

1. The grep and awk utilities implement the Thomson-Mac-Noton-Yamada algorithm (Thompson-McNaughton-Yamada) for constructing non-deterministic finite automata based on regular expressions. This approach is even more efficient than implementing the appropriate mechanisms in Perl.

2. In tsort , a topological sort is used.

3. In the fgrep utility code, you can find a substring search algorithm in the Aho-Corasick line.

4. GNU grep utility implements the Boyer-Moore algorithm .

5. In Unix crypt (1) there is a variant of the encryption algorithm used in the Enigma machine.

6. The Unix diff utility created by Doug McIlroy, based on a prototype written in conjunction with James Hunt, has a higher performance than the standard dynamic programming algorithm used to calculate Levenshtein distance.

Compilers

1. Ascending parsing algorithm (Look-Ahead LR parser, LALR). Implemented in yacc and bison.

2. The dominators calculation algorithms (dominators) are used in most optimizing compilers based on SSA.

3. The lex and flex utilities compile regular expressions into NFA.

Image compression and error correction

1. Lempel-Ziv compression algorithms for graphic GIF format are implemented in various image processing applications - from simple * nix- convert utilities to much more complex software systems.

2. The run-length encoding algorithm (RLE) is used to create PCX files (used in the well-known Paintbrush), compressed BMP and TIFF files.

3. Wavelet compression is the basis of the incredibly common JPEG 2000 format. Every digital camera that can save images in JPEG 2000 format is a “carrier” of this algorithm.

4. The Reed-Solomon error correction algorithm is involved in the Linux kernel . In addition, it is used when storing information on CDs, in devices for reading bar codes. What can I say, it, along with the convolution algorithm, was used to transmit images from the spacecraft

The problem of the feasibility of Boolean formulas

Since 2000, the execution time of algorithms that solve the problem of the feasibility of Boolean formulas has steadily decreased. It's all about applying new approaches to solving such problems. In particular, we are talking about an algorithm for conflict-guided learning of clauses (Conflict Driven Clause Learning). It is a combination of the Boolean Constraint propagation algorithm of Boolean constraints from Davis, Logemann, Loveland, and the disjunct teaching method, which originates in constraint programming and artificial intelligence.

The applications of such algorithms (SAT solvers) are very extensive. For example, it is an automatic proof of theorems, testing hardware and software. IBM, Intel and many other companies have their own implementations of SAT-solvers. Many package managers also use a similar algorithm to resolve dependencies.

findings

There are a great many algorithms and examples of their practical application. I want to believe, our story about some of them convinced those who wondered: "Why is this all?", That the algorithms are worth it to spend time getting to know them. And if the algorithms and data structures are your long-time friends, we hope you have found something new and useful for yourself here.

Oh, and come to work with us? :)
wunderfund.io is a young foundation that deals with high-frequency algorithmic trading . High-frequency trading is a continuous competition of the best programmers and mathematicians of the whole world. By joining us, you will become part of this fascinating fight.

We offer interesting and challenging data analysis and low latency tasks for enthusiastic researchers and programmers. Flexible schedule and no bureaucracy, decisions are quickly made and implemented.

Join our team: wunderfund.io

Source: https://habr.com/ru/post/277143/

All Articles