📜 ⬆️ ⬇️

Great review testing of programming languages

Recently, once again, I worked with the 2nd year students on a 2-semester discipline "Algorithmic Languages". A survey of several dozen programming languages. One of the students, Vadim Shukalyuk, wanted to get to know them better, to get a clearer picture of each of them. He advised him to do a little research. Than and carried away. I offer my report on the work done in a few months with him.

Each programming language has its advantages and disadvantages. One of the most important characteristics of a translator from any language is the speed of program execution. It is very difficult or even impossible to get an accurate estimate of this speed of execution. The resource http://benchmarksgame.alioth.debian.org/ offers a game form to test this speed for various tasks. But the number of languages ​​represented on this resource is rather small. The maximum capacity of the stack, the critical value for recursive calculations is easier to check, but it can vary in different versions of the translator and be dependent on the system settings.

The following translators were tested: C (gcc, clang, icc), assembler (x86, x86-64), Java (OpenJDK), Pascal (fpc), Javascript (Google Chrome, Mozilla Firefox), Lisp (sbcl, clisp), Erlang, Haskel (ghc, hugs), dino [1], auk (gawk, mawk, busybox), lua, ruby, basic (gambas, libre office), python-2, pi-ah-pi, postscript (gs), prologue ( swipl, gprolog), pearl, metapost, T E X, tickle, bash. We investigated the actual execution speed of several small, but time-consuming algorithms, and:
')


As the first task on which all the translators were tested, the calculation of the Fibonacci number by double recursion was chosen according to the definition: the numbers with numbers 1 and 2 are units, and the next ones are the sum of the two previous ones. This algorithm has several attractive features:

  1. If the calculation time of the nth number t, then (n + 1) -th - t * φ, where φ is the golden ratio equal to (√ 5 + 1) / 2;
  2. The number itself calculated by ne is equal to the value φ n / √5 rounded to the nearest integer;
  3. Calculating fib (n + 1) requires the nth call nesting.

The first feature allows for a short time to test translators, the speed of which differ hundreds of thousands of times. The second feature allows you to quickly check the correctness of the calculations. The third feature theoretically allows you to explore the capacity of the stack, but due to the fact that the calculation at n> 50 becomes very slow even on a supercomputer, it is not possible to practically use this feature.

The following table 1 in the second column indicates the name of the language, the name of the compiler and its version and, if used, the option to optimize the generated code. The third column shows the relative computation time on an AMD Phenom II x4 3.2 GHz processor. Tests were conducted on the AMD FX-6100 at the same frequency, but their results differ little from those given. The calculation time in the Bash language is taken as a unit, so the calculation on the Erlang is about 20,000 times faster than Bash. The 4th column shows the relative computation time on an Intel Core i3-2100 3.1 GHz processor. Since the comparison of processors was not the purpose of the study, part of the translators were not tested on the Intel platform. In the fifth, the upper estimate (accuracy 10%) of the maximum number of recursive calls supported by the compiler when calculating ack (1,1, n) on a computer with 8 GB of RAM with a system stack size (ulimit -s) is 8192 KB. Some translators use their own settings, which determine the size of the stack used - the default values ​​for the selected version of the translator are always used. Measurements were carried out in the Linux system, but their results should not change when switching to another OS. The data is sorted by the 3rd column. All sources can be viewed here .

Table 1
NTongueAMDIntelStack
oneC / C ++ (gcc 4.7.2, -O5)354056493533790000
2C / C ++ (clang 3.0-6.2, -O3)307294270000
3C / C ++ (icc 14.0.3, -fast)250563232665530000
fourAssembler x86-64243083271443350000
fiveAssembler x86211514301603700,000
6Java (OpenJDK 1.7.0_25)1864012396598,000
7Pascal (fpc 2.6.0, -O3)170604186401180000
eightC / C ++ (gcc 4.7.2, -O0)159672173261180000
9C / C ++ (clang 3.0-6.2, -O0)146726110000
tenC / C ++ (icc 14.0.3, -O0)136862156602530000
elevenJavascript (Mozilla Firefox 25)1219794200
12Javascript (Google Chrome 31)9285010,000
13Lisp (sbcl 1.0.57)549255195631,000
14Erlang (5.9.1)1984518589there is no limit
15Haskell (ghc 7.4.1, -O)1858922946260000
sixteenAwk (mawk 1.3.3)6621630644000
17Lua (5.2)64207075150000
18Ruby (1.9.3)529769696600
nineteenDino (0.55)50246420190000
20Basic (Gambas 3.1.1)3968437326,000
21Python (2.7.3)367840131000
22PHP (5.4.4)28223720there is no limit
23Awk (gawk 4.0.1)26482547there is no limit
24Postscript (gs 9.05)235532465000
25Prolog (swipl 5.10.4)199624072300000
26Perl (5.14.2)15161670there is no limit
27Prolog (gprolog 1.3.0)11161320120,000
28Lisp (clisp 2.49)99810235500
29Awk (busybox 1.20.2)981111318,000
thirtyT E X (3.1415926)2393333400
31Metapost (1.504)235470<4100
32Tcl (8.5)1101231000
33Haskell (hugs 98.200609.21)8212117,000
34Basic (LibreOffice 3.5.4.2)20356500
35bash (4.2.37)one0.77600


The second task is the Ackermann function in the form, when all arithmetic operations are reduced to it, i.e. ack (1, x, y) = x + y, ack (2, x, y) = x * y, ack ( 3, x, y) = x y , ack (4, x, y) - tetration of x and y, etc.



With the growth of n, this function grows very quickly (the number ack (5,5,5) is so large that the number of digits in the order of this number many times exceeds the number of atoms in the observable part of the Universe), but it is considered very slowly. The latter property is theoretically convenient for testing performance. However, the calculation of this function requires a significant number of recursive calls and most of the tested languages ​​were not able to support them for calculations that have a noticeable duration. It is known that the calculation of this function cannot be reduced to iteration. The calculation for this problem allowed us to investigate the maximum capacity of the stack of the studied languages: calculating ack (1,1, n-1) requires the n-th call nesting and is very fast. The following table 2 presents the results of the calculation of the ack (5,2,3) pentation, for those languages ​​whose stack could withstand it (call nesting 65539) withstand. Per unit of speed, the operating time of gcc with the option -O5 is chosen, that is, php is about 420 times slower.

Table 2
gcc -O5one
asm x862.15
icc -fast2.18
asm x86-642.36
clang -O32.76
fpc -O34.44
gcc -O07.75
icc -O08.36
clang -O09.64
Erlang18.51
ghc -O50.18
lua122.55
php423.64
gawk433.82
swipl766.55
dino915.64


The idea of ​​using these two problems was borrowed from the work of B. V. Kernigan and R. Paik “Unix - Universal Programming Environment”, where it was used to test the hoc language [2].

Of course, with more complex calculations, using mostly standard library tools, the difference in the speed of the translators would be much smaller.

Time was measured by the standard time command, and when it was impossible (javascript, office BASIC), tools built into the language were used.

According to the study, the following conclusions were made, some of which turned out to be somewhat unexpected:

  1. The speed of the programs on the assembler may be more than 50% slower than programs in C / C ++ compiled with maximum optimizations;
  2. The speed of a virtual Java byte-code machine often exceeds the speed of hardware with codes received by translators from high-level languages. Java machine is inferior in speed only to assembler and the best optimizing translators;
  3. The compilation and execution speed of javascript programs in popular browsers is only 2-3 times lower than the best translators and even surpasses some quality compilers, certainly much more (more than 10 times) overtaking most translators of other scripting languages ​​and similar programs;
  4. The speed of the codes generated by the Intel C language compiler turned out to be noticeably lower than the GNU compiler and sometimes LLVM;
  5. The speed of x86-64 assembler codes may be less than that of similar x86 codes, by about 10%;
  6. Code optimization works best on an Intel processor;
  7. The speed of execution on an Intel processor was almost always higher, with the exception of the languages ​​Lisp, Erlang, Auk (gawk, mawk) and Bash. The difference in speed according to Bash is most likely caused by different environment settings on the systems under test, and not the translator itself or the hardware itself. Intel's advantage is especially noticeable on 32-bit codes;
  8. The stack of most tested languages, in particular, Java and javascript, support only a very limited number of recursive calls. Some translators (gcc, icc, ...) allow you to increase the size of the stack by changing the variables of the runtime environment or parameter;
  9. In the considered versions of gawk, php, perl, bash, a dynamic stack is implemented, allowing you to use all the computer's memory. But perl and, especially, bash use the stack so extensively that 8-16 GB is not enough to calculate ack (5,2,3). In version 5.4.20, the php stack was limited to about 200,000 calls.


In conclusion, a few words from a student starting to master the art of programming.

To write programs for the required calculations in any language, it is first necessary to understand how variables are declared in a particular language, how to build an if-else type construct, and how to organize recursion. I began my work with a simple Pascal language, because at that time I knew it better than anyone. After Pascal, I took on C, Java and Dino, since their syntax is about the same. S turned out to be quite interesting, simple, and at the same time with intuitive operators. Java seemed less convenient than C / C ++ - you need to write a lot of irrelevant things, such as could be taken by default. Also strained the moment the need for the same names of the class and file. Haskell left only positive emotions. Convenient, clear and powerful. PHP, a language for developing web applications, is very similar to C: you can simply paste code on C with minimal changes and everything will work as it should. Erlang is similar in syntax to Haskell and a bit to Prolog. Also quite pleasant and understandable language, no difficulties have arisen. JavaScript syntax is similar to Java or C syntax. Visual Basic in both the office and GAMBAS versions has a somewhat awkward and inconvenient syntax, but in general, it was not very difficult with it. Then, after acquiring knowledge of the base syntax C and Java, it came out pretty quickly to write code in Python, since Python is similar to C. No problems arose with Lua and its fairly powerful and flexible designs. Awk also has a similar structure with C, and managed to master it pretty quickly. There were some difficulties with a lisp, like a person who had previously studied C-like languages, for example, with a basic understanding of prefix notation. Which, after a small expenditure on development, seemed very convenient, logical and beautiful. After that, I switched to the Prolog logic programming language, which turned out to be specific, but very interesting and fundamental. Ruby - a language with powerful support for object-oriented programming and with a very beautiful bright red ruby ​​on the icon turned out to be an excellent language: no extra brackets, semicolons, or other unnecessary characters. One of the most memorable. Although the python, not counting the designs of the PLO, no less concise. Perl - although it is called the “pearl”, the symbol of the language is the camel, which apparently is a reference to the fact that the camel is not very beautiful, but very hardy animal, able to perform hard work. After Ruby again, putting dollars, brackets and semicolons was not very pleasant. The syntax is in some places similar to the syntax of the Bash terminal shell language. Then I took the assembler. There were certain difficulties and the need to understand the operation of the processor and its registers. There was no limit to surprise when it turned out that C copes with calculations faster than assembler, machine code! There were no problems with Bash, even though you need to put a lot of dollars there, and when calculating and brackets. The Metapost / Metafont language caused some problems - only numbers that are not larger than 4096 are supported there. Although its syntax is quite traditional. The tickle (TCL) is also quite a traditional syntax, but line-oriented is also similar to bash semantics at first very confusing. PostScript seemed the most difficult. In this language, the syntax is very specific and without preparation, it is impossible to write anything intuitively, so I had to study the relevant literature and begin to train with the simplest programs. PostScript was a real challenge: writing a double recursion with a postfix notation using only the stack, after getting used to all the tools and capabilities of Ruby and C was problematic. Writing and testing Ackermann's postscript is like trying to paint a wall with a toothbrush. But the first place in complexity is definitely occupied by T E X. I have not seen anything more difficult. And without the direct help of a teacher, it would not have been possible to overcome this language.

Curious were data on the size of the stack of languages. The larger the stack of language, the greater the likelihood that he will be able to cope with the function of Akkerman. But if a program in some language could not cope with the calculation of ack (5,2,3), this does not mean that the language is bad and inconvenient. It is likely that this language could be created for other useful purposes, such as Metapost or Postscript.

In general, the work seemed to me very interesting and super-cognitive, for example, writing the same logical turn in 20 different ways. Also, understanding the principle of the processor registers and writing a double recursive function only with the help of a stack and three operations: add, delete and scroll the stack greatly expanded my horizons.

To the teacher, some of the conclusions of his student seemed too categorical, but he decided to keep them as more fresh than their own.



[1] - A programming language developed in Russia. Now its author is an employee of the Red Hat company, working to improve the collection of gcc compilers. The latest version of Dino was released in 2007.
[2] - This language was also tested, showing a speed result between mawk and ghc.

Source: https://habr.com/ru/post/254121/


All Articles