
Disadvantages of pure functional programming

From the translator: the translation of the article "Functional programming is unpopular because it is weird" sparked a heated discussion. Several commenters quite fairly noted that a discussion of the shortcomings of functional programming should rely on modern, mature functional languages (the original article used C++ templates for its examples), and that Haskell, for example, has been widely used in industry for the last five years. In that connection I would like to draw attention to two very substantive posts from another blog (by the author of the book F# for Scientists ): (i) " Disadvantages of purely functional programming " and (ii) " Why is Haskell used so little in industry ". Below is a translation of the first one.

1. Purely functional languages have no efficient unordered dictionary or set



Since the 1990s, the use of dictionaries in software development has skyrocketed. Dictionaries are now a standard collection that every developer expects to find in their standard library.

Purely functional or persistent data structures, such as those found in Chris Okasaki's well-known monograph, are an excellent tool. In particular, the persistence they provide means that you can access old versions of a collection without worrying about how the collection is being modified. In many cases (especially in tasks like logic programming or compiler writing) this can make solutions shorter and simpler, partly because it makes tracking changes trivial. But persistence comes at a steep price in performance: purely functional dictionaries are usually about 10 times slower than well-implemented hash tables, and I have seen cases where the difference reached 40 times. For some applications this is prohibitively slow.
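To make the persistence point concrete, here is a minimal sketch of my own (not from the article) using Data.Map from the standard containers package: inserting into a map produces a new version, while the old version remains intact and can still be queried.

```haskell
import qualified Data.Map.Strict as Map

-- Persistence in action: Map.insert returns a new map,
-- and the previous version stays valid and unchanged.
versions :: (Maybe Int, Maybe Int)
versions =
  let v1 = Map.fromList [("a", 1), ("b", 2)]
      v2 = Map.insert "c" 3 v1
  in (Map.lookup "c" v1, Map.lookup "c" v2)  -- (Nothing, Just 3)
```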
Moreover, most functional languages (OCaml, Haskell, Scala) are unable to express a fast, mutable generic hash table, because they lack the killer combination of reified generics , value types, and a fast garbage-collector write barrier.

WARNING! Beware of people who try to claim that Haskell's purely functional dictionaries are fast by comparing them with Haskell's own mutable hash tables . The correct conclusion from that comparison is that Haskell's mutable hash tables are slow.

2. There are no purely functional weak hash tables.


In an imperative language with garbage collection, the relationships between the nodes and edges of a graph can be expressed using weak hash tables . The garbage collector will then collect unreachable subgraphs for you. Since there are no purely functional weak hash tables, in a pure functional language you have to write your own garbage collection (translator's note: as I understand it, this means deleting unreachable subgraphs by hand).
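For reference, GHC does expose weak pointers, but only as an imperative facility in IO; the sketch below (my own illustration, not from the article) shows the primitive you would have to build on, which is not a purely functional weak map.

```haskell
import System.Mem (performGC)
import System.Mem.Weak (deRefWeak, mkWeakPtr)

-- A weak pointer does not keep its target alive: once the target becomes
-- unreachable and a GC runs, deRefWeak returns Nothing.
main :: IO ()
main = do
  let payload = [1 .. 10 :: Int]
  w <- mkWeakPtr payload Nothing
  performGC
  stillAlive <- deRefWeak w   -- Just payload while it is still reachable
  print (fmap sum stillAlive)
```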

However, this is not the most serious drawback, given that most developers have never used weak hash tables...

3. There are no purely functional thread-safe collections.


By definition, immutable collections cannot support mutation, thread-safe or otherwise. So if you want a shared mutable collection (such as an in-memory database), there is no purely functional solution.
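The usual escape hatch, sketched below with the stm and containers packages (my illustration, not the article's), is a mutable transactional reference holding an immutable map; the sharing and the mutation live in the TVar, so the collection itself is no longer purely functional.

```haskell
import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVarIO)
import qualified Data.Map.Strict as Map

-- A shared, thread-safe "collection": an immutable Map behind a TVar.
type Shared k v = TVar (Map.Map k v)

newShared :: IO (Shared k v)
newShared = newTVarIO Map.empty

insertShared :: Ord k => Shared k v -> k -> v -> IO ()
insertShared ref k v = atomically (modifyTVar' ref (Map.insert k v))

lookupShared :: Ord k => Shared k v -> k -> IO (Maybe v)
lookupShared ref k = Map.lookup k <$> readTVarIO ref
```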

4. Most graph algorithms look worse and run much slower when written in a functional style.


Functional programming is an excellent tool for some kinds of problems, but in my observation graph algorithms are an area where purely functional solutions are often worse both in code simplicity and in performance.

Compare Prim's algorithm in 12 lines of Python with Prim's algorithm in 20 lines of Haskell . And why does the Haskell library use Prim's algorithm in the first place? Probably because Kruskal's algorithm is based on the union-find structure (the " disjoint-set data structure "), for which functional languages have no efficient implementation.
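To illustrate why this matters, here is a hedged sketch of mine (not the article's) of Kruskal's algorithm over a naive persistent union-find built on Data.IntMap; without in-place path compression and union by rank, the find chains can grow, which is exactly the efficiency problem referred to above.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.List (sortOn)

-- Naive persistent union-find: a map from each node to its parent.
type UF = IM.IntMap Int

-- Follow parent links to the representative; no path compression.
find :: UF -> Int -> Int
find uf x = case IM.lookup x uf of
  Just p | p /= x -> find uf p
  _               -> x

-- Point the root of x at the root of y, returning a new version.
union :: UF -> Int -> Int -> UF
union uf x y = IM.insert (find uf x) (find uf y) uf

-- Kruskal: scan edges (weight, u, v) by increasing weight and keep
-- those that join two distinct components.
kruskal :: Int -> [(Int, Int, Int)] -> [(Int, Int, Int)]
kruskal n edges = go initial (sortOn (\(w, _, _) -> w) edges)
  where
    initial = IM.fromList [(i, i) | i <- [0 .. n - 1]]
    go _  []                       = []
    go uf (e@(_, u, v) : es)
      | find uf u == find uf v = go uf es
      | otherwise              = e : go (union uf u v) es
```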

5. The inertia of traditional imperative data structures and algorithms is huge.


Beyond graph algorithms, there are many areas of computer science where 65 years of scientific literature have focused almost entirely on imperative solutions. Consequently, programmers in imperative languages can stand on the shoulders of giants and reuse that work, while programmers in functional languages often have to start from scratch. After all, just a couple of years ago memoization in Haskell was the topic of a thesis.
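As a small illustration of what the standard trick looks like (a sketch of mine, not the technique from that thesis), memoization in Haskell typically leans on laziness, for example a lazy array whose cells refer back to the array itself:

```haskell
import Data.Array (listArray, (!))

-- Memoized Fibonacci: each cell of the lazy array is computed at most
-- once, because later cells reuse earlier ones by index.
fibMemo :: Int -> Integer
fibMemo n = table ! n
  where
    table = listArray (0, n) (map fib [0 .. n])
    fib 0 = 0
    fib 1 = 1
    fib k = table ! (k - 1) + table ! (k - 2)
```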

I once challenged several Haskell programmers (some of whom had written dissertations on the language) to write an efficient parallel quicksort in Haskell, and this is what came of it .

6. As it turns out, all existing implementations of functional languages, both pure and impure, allocate far too much memory.


Around 1960, McCarthy invented Lisp. Its basic data structure was the singly linked list, and each list element was allocated separately on the heap. All modern functional languages evolved from this idea. In the 70s, Scheme used essentially the same data representation strategy as Lisp. In the 80s, SML added a bit of unboxing by representing tuples as single blocks of memory. In the 90s, OCaml added a bit more unboxing with unboxed arrays of floating-point numbers. Haskell gained the ability to unbox certain data types. But to this day no functional language has unboxed tuples by default (translator's note: the author's way of saying "tuples allocated on the stack by default"). Even F#, which runs on .NET and can define arbitrary value types, still uses boxed .NET tuples. Consequently, all modern functional languages carry the burden of frequent heap allocations without any real need for them, and therefore load their garbage collectors far more than necessary. This is a serious problem not only because it makes single-threaded code slow, but also because the garbage collector is a shared resource, so heavy load on it leads to poor scalability of parallel programs.
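As an example of the opt-in unboxing mentioned above (a sketch of mine based on GHC's documented UNPACK pragma, not something from the article), GHC will unbox strict, annotated fields, but only where the programmer asks for it; plain tuples stay boxed and heap-allocated:

```haskell
-- Unboxed, strict fields: with -O, GHC stores both Doubles inline in the
-- Point constructor instead of behind pointers.
data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double

norm :: Point -> Double
norm (Point x y) = sqrt (x * x + y * y)

-- The ordinary tuple version allocates a (,) cell plus two boxed Doubles
-- on the heap for every value.
normBoxed :: (Double, Double) -> Double
normBoxed (x, y) = sqrt (x * x + y * y)
```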

It should be noted that calling this behavior a "flaw" is controversial. Xavier Leroy from the OCaml community considers OCaml's Lisp-like data representation an advantage, because it underpins OCaml's excellent performance in the Coq theorem prover. Xavier states that "OCaml's strategy for symbolic computing is close to optimal." Functional languages are often optimized for high-performance purely functional collections at the cost of poor performance for imperative collections. Given that imperative collections are generally faster, this places an artificially low ceiling on the performance of almost all functional languages.

7. Purely functional programming is good for parallelism in theory, but not so good for performance in practice, and performance in practice is the only real point of parallelism.


Today there are two reasons to write parallel programs. The first is to write programs that are objectively fast. The second is to make a slow program less slow. In most cases, parallel programming in a functional language is a variation of the second. Almost nobody in high-performance computing, that is, on supercomputers, runs functional code directly. When a functional-language developer parallelizes a program, in most cases the goal is not excellent absolute performance but improving whatever speed they already had.

Pure functional languages such as Haskell are designed to abstract memory and time away from the programmer. This gives the programmer a bird's-eye view of the program, but it makes it very difficult to estimate how much memory a Haskell program will use and how long it will run before producing a result. In parallel programming it is always very important to make sure that the gain from parallelization outweighs the administrative overhead of running code in parallel. In Haskell this is very hard. In fact, it is so hard that published research on parallel Haskell presents very selective results for the degree of parallelization that maximizes performance, even though that degree cannot be predicted in advance without running the program many times. In my experience, in languages like C++ parallelization by the most standard methods usually gives predictable speedups, whereas in Haskell the performance is unpredictable.
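As an illustration of the tuning problem (a minimal sketch using the parallel package, assuming GHC with -threaded; it is not from the article), the chunk size below has to be picked by hand: too small and the sparking overhead dominates, too large and cores sit idle.

```haskell
import Control.Parallel.Strategies (parListChunk, rseq, using)

-- A deliberately expensive function so that each spark has real work.
expensive :: Int -> Int
expensive n = sum [1 .. n]

-- Evaluate the results in parallel, 64 list elements per spark.
-- The speedup depends heavily on the chunk size and the input,
-- and is hard to predict in advance.
parallelSums :: [Int] -> [Int]
parallelSums xs = map expensive xs `using` parListChunk 64 rseq
```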

WARNING! Beware of people who discuss program scalability without mentioning absolute performance. You can improve the scalability of almost any parallel program by pointlessly computing Mandelbrot sets after every line of code, because most of the computation will then happen in embarrassingly parallel code. Scalability is a necessary but not a sufficient condition. You must also look at absolute performance.

8. It took 50 years to dilute the concentration of arrogant snobs in the community enough that you can now get a useful answer about functional programming


I have been doing functional programming for more than 20 years. For decades there was a social gulf between functional-language programmers and people solving real problems. Thank goodness this problem began to dissolve as functional languages such as Scala, Clojure and F# started being used for real work; but for many years arrogant snobs dominated the functional programming community, which made it very difficult to get real solutions to real problems. The reason was that many communities (especially Lisp), marginalized and left to themselves for decades, had developed elaborate (but wrong) arguments for why their language (Lisp) was so good. It took me many years to understand what was wrong with those arguments.

9. A great deal of misinformation circulates about functional programming.


If you criticize the performance of hash tables in Haskell (more recent criticism here ), you can get absolutely wild advice from the luminaries of the community: for example, they may literally advise you to simply disable garbage collection.

For a long time, the functional programming community dazzled itself with remarkably short implementations of the Sieve of Eratosthenes and quicksort algorithms. These were even taught to students for years. Only many years later came the realization that these implementations do not correspond to the original algorithms. Melissa O'Neill even published a scientific paper correcting the Sieve of Eratosthenes in Haskell. In particular, her real sieve requires about a hundred times more code than the misleading original version. The same is true for quicksort, where the "elegant" two-line Haskell version is more than 1000 times slower than Sedgewick's C version, because the Haskell version makes deep copies of lists on every quicksort call, ruining the asymptotic complexity of Hoare's original algorithm .
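For the record, here is the famous short "quicksort" the paragraph above refers to (reproduced from folklore, in the spirit of the criticism): elegant, but it allocates fresh lists at every step and performs no in-place partitioning, so it is not Hoare's algorithm.

```haskell
-- Folklore "quicksort": concise, but each call copies the remaining
-- elements into two new lists instead of partitioning in place.
qsort :: Ord a => [a] -> [a]
qsort []       = []
qsort (x : xs) = qsort [y | y <- xs, y < x] ++ [x] ++ qsort [y | y <- xs, y >= x]
```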

See also " Why is Haskell used so little in industry ", which rebuts the " Haskell in industry " page in detail.

From the translator: I also hope to translate the article "Why is Haskell used so little in industry" someday, because it tries to refute the view that Haskell has been widely used in industry for the last five years. It should be noted, though (judging by the comments on the English original), that the matter is not so clear-cut and the author himself seems to exaggerate in places.

Source: https://habr.com/ru/post/303984/
