Great review testing of programming languages

Recently, once again, I worked with the 2nd year students on a 2-semester discipline "Algorithmic Languages". A survey of several dozen programming languages. One of the students, Vadim Shukalyuk, wanted to get to know them better, to get a clearer picture of each of them. He advised him to do a little research. Than and carried away. I offer my report on the work done in a few months with him.

Each programming language has its advantages and disadvantages. One of the most important characteristics of a translator from any language is the speed of program execution. It is very difficult or even impossible to get an accurate estimate of this speed of execution. The resource http://benchmarksgame.alioth.debian.org/ offers a game form to test this speed for various tasks. But the number of languages represented on this resource is rather small. The maximum capacity of the stack, the critical value for recursive calculations is easier to check, but it can vary in different versions of the translator and be dependent on the system settings.

The following translators were tested: C (gcc, clang, icc), assembler (x86, x86-64), Java (OpenJDK), Pascal (fpc), Javascript (Google Chrome, Mozilla Firefox), Lisp (sbcl, clisp), Erlang, Haskel (ghc, hugs), dino [1], auk (gawk, mawk, busybox), lua, ruby, basic (gambas, libre office), python-2, pi-ah-pi, postscript (gs), prologue ( swipl, gprolog), pearl, metapost, T _E X, tickle, bash. We investigated the actual execution speed of several small, but time-consuming algorithms, and:
')

quality optimization of some translators;
features when working with Intel and AMD processors;
limit number of recursive calls (stack capacity).

As the first task on which all the translators were tested, the calculation of the Fibonacci number by double recursion was chosen according to the definition: the numbers with numbers 1 and 2 are units, and the next ones are the sum of the two previous ones. This algorithm has several attractive features:

If the calculation time of the nth number t, then (n + 1) -th - t * φ, where φ is the golden ratio equal to (√ 5 + 1) / 2;
The number itself calculated by ne is equal to the value φ ⁿ / √5 rounded to the nearest integer;
Calculating fib (n + 1) requires the nth call nesting.

The first feature allows for a short time to test translators, the speed of which differ hundreds of thousands of times. The second feature allows you to quickly check the correctness of the calculations. The third feature theoretically allows you to explore the capacity of the stack, but due to the fact that the calculation at n> 50 becomes very slow even on a supercomputer, it is not possible to practically use this feature.

The following table 1 in the second column indicates the name of the language, the name of the compiler and its version and, if used, the option to optimize the generated code. The third column shows the relative computation time on an AMD Phenom II x4 3.2 GHz processor. Tests were conducted on the AMD FX-6100 at the same frequency, but their results differ little from those given. The calculation time in the Bash language is taken as a unit, so the calculation on the Erlang is about 20,000 times faster than Bash. The 4th column shows the relative computation time on an Intel Core i3-2100 3.1 GHz processor. Since the comparison of processors was not the purpose of the study, part of the translators were not tested on the Intel platform. In the fifth, the upper estimate (accuracy 10%) of the maximum number of recursive calls supported by the compiler when calculating ack (1,1, n) on a computer with 8 GB of RAM with a system stack size (ulimit -s) is 8192 KB. Some translators use their own settings, which determine the size of the stack used - the default values for the selected version of the translator are always used. Measurements were carried out in the Linux system, but their results should not change when switching to another OS. The data is sorted by the 3rd column. All sources can be viewed here .

Table 1

N	Tongue	AMD	Intel	Stack
one	C / C ++ (gcc 4.7.2, -O5)	354056	493533	790000
2	C / C ++ (clang 3.0-6.2, -O3)	307294		270000
3	C / C ++ (icc 14.0.3, -fast)	250563	232665	530000
four	Assembler x86-64	243083	271443	350000
five	Assembler x86	211514	301603	700,000
6	Java (OpenJDK 1.7.0_25)	186401	239659	8,000
7	Pascal (fpc 2.6.0, -O3)	170604	186401	180000
eight	C / C ++ (gcc 4.7.2, -O0)	159672	173261	180000
9	C / C ++ (clang 3.0-6.2, -O0)	146726		110000
ten	C / C ++ (icc 14.0.3, -O0)	136862	156602	530000
eleven	Javascript (Mozilla Firefox 25)	121979		4200
12	Javascript (Google Chrome 31)	92850		10,000
13	Lisp (sbcl 1.0.57)	54925	51956	31,000
14	Erlang (5.9.1)	19845	18589	there is no limit
15	Haskell (ghc 7.4.1, -O)	18589	22946	260000
sixteen	Awk (mawk 1.3.3)	6621	6306	44000
17	Lua (5.2)	6420	7075	150000
18	Ruby (1.9.3)	5297	6969	6600
nineteen	Dino (0.55)	5024	6420	190000
20	Basic (Gambas 3.1.1)	3968	4373	26,000
21	Python (2.7.3)	3678	4013	1000
22	PHP (5.4.4)	2822	3720	there is no limit
23	Awk (gawk 4.0.1)	2648	2547	there is no limit
24	Postscript (gs 9.05)	2355	3246	5000
25	Prolog (swipl 5.10.4)	1996	2407	2300000
26	Perl (5.14.2)	1516	1670	there is no limit
27	Prolog (gprolog 1.3.0)	1116	1320	120,000
28	Lisp (clisp 2.49)	998	1023	5500
29	Awk (busybox 1.20.2)	981	1113	18,000
thirty	T _E X (3.1415926)	239	333	3400
31	Metapost (1.504)	235	470	<4100
32	Tcl (8.5)	110	123	1000
33	Haskell (hugs 98.200609.21)	82	121	17,000
34	Basic (LibreOffice 3.5.4.2)	20	35	6500
35	bash (4.2.37)	one	0.77	600

The second task is the Ackermann function in the form, when all arithmetic operations are reduced to it, i.e. ack (1, x, y) = x + y, ack (2, x, y) = x * y, ack ( 3, x, y) = x ^y , ack (4, x, y) - tetration of x and y, etc.

With the growth of n, this function grows very quickly (the number ack (5,5,5) is so large that the number of digits in the order of this number many times exceeds the number of atoms in the observable part of the Universe), but it is considered very slowly. The latter property is theoretically convenient for testing performance. However, the calculation of this function requires a significant number of recursive calls and most of the tested languages were not able to support them for calculations that have a noticeable duration. It is known that the calculation of this function cannot be reduced to iteration. The calculation for this problem allowed us to investigate the maximum capacity of the stack of the studied languages: calculating ack (1,1, n-1) requires the n-th call nesting and is very fast. The following table 2 presents the results of the calculation of the ack (5,2,3) pentation, for those languages whose stack could withstand it (call nesting 65539) withstand. Per unit of speed, the operating time of gcc with the option -O5 is chosen, that is, php is about 420 times slower.

Table 2

gcc -O5	one
asm x86	2.15
icc -fast	2.18
asm x86-64	2.36
clang -O3	2.76
fpc -O3	4.44
gcc -O0	7.75
icc -O0	8.36
clang -O0	9.64
Erlang	18.51
ghc -O	50.18
lua	122.55
php	423.64
gawk	433.82
swipl	766.55
dino	915.64

The idea of using these two problems was borrowed from the work of B. V. Kernigan and R. Paik “Unix - Universal Programming Environment”, where it was used to test the hoc language [2].

Of course, with more complex calculations, using mostly standard library tools, the difference in the speed of the translators would be much smaller.

Time was measured by the standard time command, and when it was impossible (javascript, office BASIC), tools built into the language were used.

According to the study, the following conclusions were made, some of which turned out to be somewhat unexpected:

The speed of the programs on the assembler may be more than 50% slower than programs in C / C ++ compiled with maximum optimizations;
The speed of a virtual Java byte-code machine often exceeds the speed of hardware with codes received by translators from high-level languages. Java machine is inferior in speed only to assembler and the best optimizing translators;
The compilation and execution speed of javascript programs in popular browsers is only 2-3 times lower than the best translators and even surpasses some quality compilers, certainly much more (more than 10 times) overtaking most translators of other scripting languages and similar programs;
The speed of the codes generated by the Intel C language compiler turned out to be noticeably lower than the GNU compiler and sometimes LLVM;
The speed of x86-64 assembler codes may be less than that of similar x86 codes, by about 10%;
Code optimization works best on an Intel processor;
The speed of execution on an Intel processor was almost always higher, with the exception of the languages Lisp, Erlang, Auk (gawk, mawk) and Bash. The difference in speed according to Bash is most likely caused by different environment settings on the systems under test, and not the translator itself or the hardware itself. Intel's advantage is especially noticeable on 32-bit codes;
The stack of most tested languages, in particular, Java and javascript, support only a very limited number of recursive calls. Some translators (gcc, icc, ...) allow you to increase the size of the stack by changing the variables of the runtime environment or parameter;
In the considered versions of gawk, php, perl, bash, a dynamic stack is implemented, allowing you to use all the computer's memory. But perl and, especially, bash use the stack so extensively that 8-16 GB is not enough to calculate ack (5,2,3). In version 5.4.20, the php stack was limited to about 200,000 calls.

In conclusion, a few words from a student starting to master the art of programming.

To write programs for the required calculations in any language, it is first necessary to understand how variables are declared in a particular language, how to build an if-else type construct, and how to organize recursion. I began my work with a simple Pascal language, because at that time I knew it better than anyone. After Pascal, I took on C, Java and Dino, since their syntax is about the same. S turned out to be quite interesting, simple, and at the same time with intuitive operators. Java seemed less convenient than C / C ++ - you need to write a lot of irrelevant things, such as could be taken by default. Also strained the moment the need for the same names of the class and file. Haskell left only positive emotions. Convenient, clear and powerful. PHP, a language for developing web applications, is very similar to C: you can simply paste code on C with minimal changes and everything will work as it should. Erlang is similar in syntax to Haskell and a bit to Prolog. Also quite pleasant and understandable language, no difficulties have arisen. JavaScript syntax is similar to Java or C syntax. Visual Basic in both the office and GAMBAS versions has a somewhat awkward and inconvenient syntax, but in general, it was not very difficult with it. Then, after acquiring knowledge of the base syntax C and Java, it came out pretty quickly to write code in Python, since Python is similar to C. No problems arose with Lua and its fairly powerful and flexible designs. Awk also has a similar structure with C, and managed to master it pretty quickly. There were some difficulties with a lisp, like a person who had previously studied C-like languages, for example, with a basic understanding of prefix notation. Which, after a small expenditure on development, seemed very convenient, logical and beautiful. After that, I switched to the Prolog logic programming language, which turned out to be specific, but very interesting and fundamental. Ruby - a language with powerful support for object-oriented programming and with a very beautiful bright red ruby on the icon turned out to be an excellent language: no extra brackets, semicolons, or other unnecessary characters. One of the most memorable. Although the python, not counting the designs of the PLO, no less concise. Perl - although it is called the “pearl”, the symbol of the language is the camel, which apparently is a reference to the fact that the camel is not very beautiful, but very hardy animal, able to perform hard work. After Ruby again, putting dollars, brackets and semicolons was not very pleasant. The syntax is in some places similar to the syntax of the Bash terminal shell language. Then I took the assembler. There were certain difficulties and the need to understand the operation of the processor and its registers. There was no limit to surprise when it turned out that C copes with calculations faster than assembler, machine code! There were no problems with Bash, even though you need to put a lot of dollars there, and when calculating and brackets. The Metapost / Metafont language caused some problems - only numbers that are not larger than 4096 are supported there. Although its syntax is quite traditional. The tickle (TCL) is also quite a traditional syntax, but line-oriented is also similar to bash semantics at first very confusing. PostScript seemed the most difficult. In this language, the syntax is very specific and without preparation, it is impossible to write anything intuitively, so I had to study the relevant literature and begin to train with the simplest programs. PostScript was a real challenge: writing a double recursion with a postfix notation using only the stack, after getting used to all the tools and capabilities of Ruby and C was problematic. Writing and testing Ackermann's postscript is like trying to paint a wall with a toothbrush. But the first place in complexity is definitely occupied by T _E X. I have not seen anything more difficult. And without the direct help of a teacher, it would not have been possible to overcome this language.

Curious were data on the size of the stack of languages. The larger the stack of language, the greater the likelihood that he will be able to cope with the function of Akkerman. But if a program in some language could not cope with the calculation of ack (5,2,3), this does not mean that the language is bad and inconvenient. It is likely that this language could be created for other useful purposes, such as Metapost or Postscript.

In general, the work seemed to me very interesting and super-cognitive, for example, writing the same logical turn in 20 different ways. Also, understanding the principle of the processor registers and writing a double recursive function only with the help of a stack and three operations: add, delete and scroll the stack greatly expanded my horizons.

To the teacher, some of the conclusions of his student seemed too categorical, but he decided to keep them as more fresh than their own.

[1] - A programming language developed in Russia. Now its author is an employee of the Red Hat company, working to improve the collection of gcc compilers. The latest version of Dino was released in 2007.
[2] - This language was also tested, showing a speed result between mawk and ghc.

Source: https://habr.com/ru/post/254121/

All Articles

Great review testing of programming languages

More articles: