
More than Java?

Having finished yet another Java project, I tried to understand where my accumulated irritation came from. Yes, I love Java and all that, but... there are several "buts" that annoy me. You have to write quite a lot of boilerplate code that the compiler could easily generate itself. The IDE helps, of course, but that is a crutch rather than a solution: whenever something changes, the generated code has to be regenerated and cleaned up. And the null checks! They are a toothache: strictly speaking, you should always do them so as not to run into an "unexpected" NullPointerException at the most inopportune moment. In short, I wanted to see what else has appeared in the wild and whether any of it could replace Java. Below I describe the participants in this comparison. I must say at once that I do not claim the analysis is complete; unfortunately, I had too little time to get properly acquainted with each language.

My mandatory requirements for the candidates:

• General-purpose language
• Cross-platform (at least Windows / Linux)
• Stability
• Static typing
• Automatic memory management
• Full-fledged support for the object-oriented paradigm
• Good IDE support (Eclipse, IDEA or, at worst, NetBeans)
• Painless access to existing frameworks / libraries
• Java-level performance

In addition, I would very much like to have:
• Null-pointer safety enforced by the language at compile time
• Simple type conversions without the vestiges of C
• Reasonable operator overloading
• Basic types that have finally been cleaned up. Why on earth should a new language drag along float / double and short / int / long from ancient times? Maybe that made sense in the days of the i386, which might or might not have had a math coprocessor. All these subtleties can live in the operating system kernel, closer to the hardware, and should not stick out at the top. The base entities should be Integer and Float, and that's all; let the compiler handle the details.

• Lambdas that coexist harmoniously with the rest of the language
• A thoughtful and coherent language design overall
• Expressiveness: less ASCII art and fewer mysterious constructs that you want to sprinkle with holy water
• Normal strings, in which characters are represented not by surrogate pairs or a scattering of bytes but by full Unicode code points

I should note that a whole layer of languages that compile to native binaries or to C did not make it into the review. Perhaps they are wonderful, but after trying several options such as D, Nim, Eiffel, Rust, Go, and a few others, I concluded that I would not dig deeper into them. The check was very simple: Hello World. The output binary is usually measured in megabytes. That is, a tiny program that prints one dull line to the console, and that should take 3-4 KB at most, takes megabytes! A trifle, you say. Yes, a trifle, and I understand that the creators of a language cram all the power and wealth of their offspring into the binary, but couldn't it be optimized somehow? If only printing is needed, why pull in everything in the world? Put the minimum necessary part into a static library and make everything else dynamic. If I'm not mistaken, the creators of D and Rust position these languages as suitable for writing an operating system kernel. Are you serious, guys? Do you also propose to drag all this "beauty", megabytes of auxiliary language code with a bunch of Windows / Linux system calls, into the kernel?

In general, this class of languages did not meet my requirements for "cross-platform" and "stability" (and with automatic memory management and object orientation things are not always smooth either: sometimes interfaces are missing, sometimes dubious alternatives are offered instead of classes). They contain a very thick layer between the language and the operating system, and the quality and stability of that code raises many questions. Maybe it is very good, I don't know, but what if it isn't? The JVM and the Java SDK, by contrast, have been tested for years and work fine on different operating systems. Therefore, all the languages considered below rely on the JVM in one way or another. Now on to the list of participants, each with a short description, or rather with my subjective impressions, which you should not take too seriously.

C# (MSVC2015) - made it into the review only because I have written a considerable number of lines in it, and I did not find any advantages over Java. Quite the contrary. Besides, cross-platform support is not too rosy here: Windows is its native platform, and Mono on Linux is not a full replacement (see below). But I was curious to compare its performance with the other participants.

Qt/C++ (5.10) - was also added solely to compare performance.

Ceylon (1.3.3) - this wonderful language was created in 2011 by Gavin King, the author of the famous Java framework Hibernate, and is now backed by the Red Hat community. The language is very beautiful, well thought out, and expressive. Java would probably look like this if it had evolved without the burden of backward compatibility. The language covers my entire wish list above, and then some. I genuinely became a fan of it and spent most of my time studying it; unfortunately, there is not much information about it, and the community is neither large nor very active. To my taste the language is almost perfect, but the shortcomings deserve a mention. There are plugins for Eclipse / IDEA; they are quite well made, but optimization is not their strong point: in short, they lag shamelessly. Launching the tests (there is a dedicated testing framework) can take 7 (!) seconds: the tests themselves run instantly, but the startup is very slow.

Most likely this is related to the second drawback, the module system built into the language. No, I do not mean that modularity itself is a disadvantage; I mean that things built into the language leave you with no alternative, and that is bad. In Ceylon, modules have their own format (.car files), and all of this is based on the JBoss module system. Also, for some reason the creators of the language dropped 'protected', which makes it impossible to implement, for example, a design pattern such as "Template Method".
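For readers who do not have the pattern fresh in mind, here is a minimal Java sketch of how Template Method is usually written with protected hooks. The class names (Report, CsvReport) are invented for this illustration and have nothing to do with the benchmark code or with Ceylon itself.

```java
// Template Method: the base class fixes the skeleton of the algorithm,
// subclasses fill in individual steps. All names here are hypothetical.
abstract class Report {
    // The template method itself: final, so subclasses cannot reorder the steps.
    public final String render() {
        return header() + body() + footer();
    }

    // Hooks visible to subclasses (and, in Java, to the same package),
    // but not part of the public API.
    protected abstract String body();

    protected String header() { return "== report ==\n"; }

    protected String footer() { return "== end ==\n"; }
}

class CsvReport extends Report {
    @Override
    protected String body() { return "a,b,c\n"; }
}

class TemplateMethodDemo {
    public static void main(String[] args) {
        System.out.print(new CsvReport().render());
    }
}
```

Without a protected-like visibility level, the hooks would have to be either public (and so callable by everyone) or hidden from subclasses altogether, which is presumably the inconvenience meant above.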

Scala (2.12) - it seems that the creators of the language were guided mainly by the principle "let's do everything differently from normal people". Instead of the usual * they made it _, why? Apparently, "so that nobody would guess" (c). Everyone knows that an array is usually [], so let's use () and let people rack their brains. And what will [] mean then? Let it be generics? Let's... And so on. What do you think a <- b means? I would have guessed cloning, or stuffing everything from collection b into collection a, but no, it is iteration over the collection...

And why? Because it is Scala, because they are from Tau Ceti, their blood is purple and their brains work differently; in short, they are different... They went and sewed XML into the language... Why stop there? Go ahead and build Microsoft Office right into the language, why not? Hard-wire the xls file format into the language standard... At first all this causes irritation (and then there are the silly names of standard classes: StringOps, how do you like that?). Some will spit and say, to hell with this Scala. And the adepts will shrug contemptuously: keep your clumsy paws off our microcircuits, since you don't understand a thing; we are the elite here. But seriously, it is an ordinary language, without revelations. Unfortunately, there is no protection against null pointers, only Option. There are slippery spots and traps for beginners, and I am not sure it is a good choice for team work.

Kotlin (1.2) - reminded me of a magpie's nest: here a silver spoon, there a candy wrapper... The guys' motto was apparently the song "I molded you from whatever was at hand". Take time-tested solutions from other languages and combine them, why not? The creators of the language call it a pragmatic approach. Only sometimes, it seems to me, they were let down by a lack of taste (or of a sense of beauty, for a moment). Looking at such monstrosities as return@forEach (it looks like somebody's e-mail address), I want to swear politely and take a vow never to use it. But overall I still liked the language: it is concise, and everything you need is there. The only pity is that assignment in expressions was forbidden and that the basic types were not cleaned up as in Ceylon (probably to make interoperation with Java less laborious).

Fantom (1.0.69) - a "thing in itself", a phantom: it seems to have existed since 2005, and yet it seems not to exist at all. I included it solely because of the tale that it tears Java to shreds in performance on Java's own JVM; and in general it was interesting to look at a language that compiles into its own homemade fcode, which can then be translated either to the JVM or to the CLR. What can I say about the language itself? After I read that generics are "not yet supported", and that this "yet" has lasted since time immemorial and will apparently stay that way, my practical interest in the language waned. To be precise, generics are supported, but only for the built-in collections, which in my opinion is not enough for an industrial-grade development tool. On the one hand there seem to be nullable types, "Str? str", but the compiler does not prevent access to str without a null check. It's a shame...

D (dmd 2.078.2) - was added later, purely for sporting interest, as an additional non-JVM participant. Unfortunately, writing in D is not very comfortable because there is no IDE with a "human face". The site has a whole list of "IDEs", some even written in D, but in practice they all turn out to be, sorry, not up to the job. None of them is suitable for writing code comfortably. In reality, the one thing you can more or less hack away with is the plugin for IDEA, but even it has no code completion, not even after a dot. The most suitable for work is the DDT plugin for Eclipse, but its author wrote that he has stopped working on the project, apparently disappointed in D. Regrettable.

I did not have time to investigate / analyze / test every aspect of each language in detail, so I decided to write similar code in all the languages studied, with the same set of classes and the same logic, and to make sure it used:

• Intensive I/O
• String parsing
• Floating-point calculations using mathematical functions: sin, cos, atan2, toRadians, sqrt
• Working with collections: associative arrays and plain arrays
• Lambdas (if available)

If a language had its own libraries, I tried to use them; if not, I used the Java ones.
Huge text files with map data from the previous project were at hand, so the test applications do the following work. The path to a directory with the files is passed on the command line; the participant must get the list of files, filter them by extension, and process each one. Processing consists of reading the file line by line and parsing each line; if the line is a segment (that is, a stretch on the ground with start and end coordinates), the program calculates its length and puts into a map all segments that lie within a certain radius of a specified point. The key in the map is the segment group identifier. The program itself measures the execution time of the whole operation in milliseconds and reports the number of segment groups found.
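To make the task more concrete, here is a rough Java sketch of such a test program. It is not the code from the benchmark: the line format (SEG;group;lat1;lon1;lat2;lon2), the .dcf extension, the reference point, the radius, the haversine distance formula, and the rule "a segment counts if its start point is inside the radius" are all assumptions made for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SegmentBenchSketch {
    static final double EARTH_RADIUS_M = 6_371_000.0;

    // Great-circle distance in meters (haversine formula, an assumed choice).
    static double distance(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    static final class Segment {
        final String group;
        final double lat1, lon1, lat2, lon2, length;

        Segment(String group, double lat1, double lon1, double lat2, double lon2) {
            this.group = group;
            this.lat1 = lat1; this.lon1 = lon1;
            this.lat2 = lat2; this.lon2 = lon2;
            this.length = distance(lat1, lon1, lat2, lon2); // computed at creation
        }
    }

    // Hypothetical format: SEG;<group>;<lat1>;<lon1>;<lat2>;<lon2>.
    // Returns null for any line that is not a segment.
    static Segment parse(String line) {
        String[] f = line.split(";");
        if (f.length != 6 || !f[0].equals("SEG")) return null;
        return new Segment(f[1], Double.parseDouble(f[2]), Double.parseDouble(f[3]),
                Double.parseDouble(f[4]), Double.parseDouble(f[5]));
    }

    // Adds every segment starting within radiusM of (lat, lon) to its group.
    static void collect(Stream<String> lines, double lat, double lon, double radiusM,
                        Map<String, List<Segment>> groups) {
        lines.map(SegmentBenchSketch::parse)
             .filter(s -> s != null && distance(lat, lon, s.lat1, s.lon1) <= radiusM)
             .forEach(s -> groups.computeIfAbsent(s.group, k -> new ArrayList<>()).add(s));
    }

    public static void main(String[] args) throws IOException {
        long start = System.currentTimeMillis();
        Map<String, List<Segment>> groups = new HashMap<>();
        List<Path> files;
        try (Stream<Path> dir = Files.list(Paths.get(args[0]))) {
            files = dir.filter(p -> p.toString().endsWith(".dcf")) // assumed extension
                       .collect(Collectors.toList());
        }
        for (Path file : files) {
            try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
                collect(lines, 45.0, 9.0, 10_000.0, groups); // assumed point and radius
            }
        }
        System.out.println(groups.size() + " groups, "
                + (System.currentTimeMillis() - start) + " ms");
    }
}
```

The sketch touches every item from the list above: file I/O, string parsing, the five math functions, a HashMap plus ArrayList, and lambdas.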

The most concise code came out in Kotlin; the most elegant and the easiest to grasp quickly, in Ceylon.

A few boring numbers: the amount of code in lines / kilobytes:

• Ceylon: 128 / 4.7
• C#: 177 / 6.2
• Fantom: 153 / 3.8
• Java: 203 / 6.1
• Kotlin: 117 / 3.9
• Qt/C++: 413 / 8.3
• Scala: 123 / 3.6
• D: 204 / 5

The size of the resulting binary files in kilobytes:

• Ceylon (*.class): 31.1
• C# (.exe): 8.7
• Fantom (.pod): 7.5
• Java (*.class): 9.9
• Kotlin (*.class): 20.9
• Qt/C++ (.exe, Release): 37.7
• Scala (*.class): 20.7
• D (.exe, Release): 1765

Well, now for the most delicious part, the results of the race... Who came in first, and who was the outsider? Place your bets, gentlemen! Personally, my predictions did not come true, and the results surprised me. Testing was conducted on the same machine but on two platforms: Windows 7 x64 Professional and Linux Debian x64 Stretch Stable. Oracle Java 8.152 was installed on both. On Linux, the compiled C# .exe file was run via Mono from the standard Debian repository. Everything was run under the same conditions on an otherwise idle system. I held two rounds. In the first, the sprint, the input was a single, relatively small 30MB file; here the results varied somewhat from run to run, so I ran each participant 20 times and took the average. In the second round, the marathon, the conditions were sharply tightened: the input was several files with a total volume of about 900MB. Here the spread was significantly smaller, apparently thanks to the JVM "warm-up", but for the purity of the experiment I still ran each participant 10 times. Each entry below gives the average time followed by its ratio to the Java time:

Windows 7 x64 Professional

One file (30MB)

• Java: 621ms, 1.00
• Kotlin: 667ms, 1.07
• Scala: 745ms, 1.20
• D: 797ms, 1.28
• C#: 1143ms, 1.84
• Ceylon (JavaLibs): 1160ms, 1.87
• Fantom: 1362ms, 2.19
• Qt/C++: 1378ms, 2.22
• Ceylon: 1479ms, 2.38

Several files (~900MB)

• Java: 22932ms, 1.00
• Scala: 23013ms, 1.00
• Kotlin: 23300ms, 1.02
• Fantom: 32047ms, 1.40
• C#: 33349ms, 1.45
• Ceylon (JavaLibs): 38466ms, 1.68
• Qt/C++: 40444ms, 1.76
• Ceylon: the test failed; the participant could not reach the finish line and was forcibly withdrawn after 5 minutes of work on a large 738MB file (see the comment below).
• D: rested in peace, with the epitaph: "core.exception.OutOfMemoryError@src\core\exception.d(702): Memory allocation failed"

Debian x64 Stretch Stable:

One file (30MB)

• Java: 612ms, 1.00
• Kotlin: 652ms, 1.07
• Scala: 686ms, 1.12
• D: 785ms, 1.28
• Qt/C++: 1023ms, 1.67
• Ceylon (JavaLibs): 1190ms, 1.94
• Fantom: 1356ms, 2.22
• Ceylon: 1480ms, 2.42
• C#: 2119ms, 3.46

Several files (~900MB)

• Java: 22161ms, 1.00
• Scala: 22625ms, 1.02
• Kotlin: 22865ms, 1.03
• D: 22876ms, 1.03
• Qt/C++: 31349ms, 1.41
• Ceylon (JavaLibs): 34664ms, 1.56
• Fantom: 40903ms, 1.85
• C#: 68038ms, 3.07
• Ceylon: the test failed; the participant gave up the ghost before reaching the finish with the following diagnosis: "Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded"

Unfortunately, Ceylon, which I liked so much, trailed somewhere at the very end; the gap with Java is significant. Scala ran neck and neck with Java. Fantom, naturally, made no revolution. Java is the speed leader, but C# surprised me a little: I thought it would be roughly on a par with Java. Nor did I expect native code to end up at the tail! You can, of course, blame Qt for the slowness, but still... Sure, one could rewrite it with the std libraries or even in pure C with assembly inserts, but that would be completely different code. I had hoped that the young upstart Kotlin would show performance on a par with Java, but there is still a lag: much smaller than Ceylon's, but larger than Scala's. And it is not clear where Kotlin could lose performance, because it has no I/O, math library, or collections of its own; all of that comes from Java.

Ceylon, on the other hand, was let down by its homemade collections, as I found out when I started looking into why it failed on large inputs. They turned out to be too memory-hungry: they ate 8GB of RAM without even noticing, which is why the test "froze". On a machine with 24GB of memory the Ceylon test passes normally, but of course those absolute numbers cannot be compared, since that is different hardware. I did get Ceylon to work by replacing the collections from its standard library with the Java ones (HashMap and ArrayList), and the test passed. That is the Ceylon (JavaLibs) entry in the results, which uses Java libraries for I/O and collections; its numbers are slightly better, but they do not save the situation.

By the way, it is worth noting in passing that on the same hardware Debian delivers slightly better performance (I put the input data on an ext4 partition rather than reading it from NTFS).

Conclusions

As I said, of all the participants I liked Ceylon the most, and I really wanted to use it. But it seems it is not yet ripe for industrial use: its creators need to seriously take up profiling and clean up the bottlenecks. Still, I will follow its fate. Of the remaining options, I would choose between Scala and Kotlin; by the way, they are syntactically very similar, with the same Pascal-like style of declaring functions and variables. Scala is better optimized for speed and more stable, but Kotlin has smart casts and null safety. For now I will probably settle on Kotlin and study it in more detail; perhaps we will write our next internal project in it.

P.S. Yes, yes, I foresaw the cries of "show us the code" :)
Here are the sources: drive.google.com/open?id=1N3sEsw4MZ33GI-PPQOw_vocahGdjg9bq
Here is the mp2dcf tool that converts maps in Polish format to DCF: drive.google.com/open?id=12SlixUmpnrKH5Eh8k69m_T1tmWCVy6Ko
Maps can be downloaded for example here: navitel.osm.rambler.ru
I tested on Italy: navitel.osm.rambler.ru/?country=Italy

I have also put the sources on GitHub: github.com/akornilov/LangBenchmark

UPD
Updated the performance test figures.

Thanks to comments in the discussion, the code has been improved / corrected:
asdf87 - added a BufferedStream to the C# version (no noticeable effect on the result);
stack_trace - fixed the Qt/C++ code that adds to the map (no noticeable effect on the result);
AlexPublic - an important observation: in the Java implementation, unlike in the other languages, the length of a segment was calculated on demand rather than when the object was created.
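The difference AlexPublic pointed out can be sketched roughly as follows. Both classes are hypothetical and use a flat Euclidean distance only to keep the example short; the point is solely when the length gets computed.

```java
// Eager variant: the length is computed once, in the constructor,
// for every segment that is created, whether or not it is used later.
class EagerSegment {
    final double x1, y1, x2, y2;
    final double length;

    EagerSegment(double x1, double y1, double x2, double y2) {
        this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        this.length = Math.hypot(x2 - x1, y2 - y1);
    }
}

// On-demand variant: construction is cheap, the math runs only
// when somebody actually calls length().
class LazySegment {
    final double x1, y1, x2, y2;

    LazySegment(double x1, double y1, double x2, double y2) {
        this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
    }

    double length() {
        return Math.hypot(x2 - x1, y2 - y1);
    }
}

class SegmentLengthDemo {
    public static void main(String[] args) {
        System.out.println(new EagerSegment(0, 0, 3, 4).length);
        System.out.println(new LazySegment(0, 0, 3, 4).length());
    }
}
```

If a program creates many segments and then discards most of them before their length is ever needed, the on-demand variant may skip a large share of the floating-point work, so the two styles should not be mixed in one comparison.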

In addition, IOException handling was added to the Scala, Kotlin, and Ceylon (JavaLibs) versions, as in the Java one.
In Kotlin, lines are now read with the standard library extension function .useLines instead of the Java API.

The way the results are obtained has also changed: I wrote a script that automatically launches all the participants of the race several times, calculates the average running time, and checks that the number of segments found is the same for everyone. For the test on small files (sprint) each participant is run 20 times; on large files (marathon), 10 times. For all JVM languages, -Xmx2g is set.
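The author's script is not reproduced in the article, so the following Java sketch only illustrates the logic described: repeat each participant, average the times, verify that every run reports the same count, and express the result relative to Java. The Run type and the Supplier standing in for "launch the process and parse its output" are hypothetical.

```java
import java.util.function.Supplier;

class RaceHarnessSketch {
    // One run of one participant: elapsed time and the count it reported.
    static final class Run {
        final long millis;
        final int groups;

        Run(long millis, int groups) {
            this.millis = millis;
            this.groups = groups;
        }
    }

    // Runs a participant `times` times and returns the average time in ms.
    // Throws if any run reports a count other than `expectedGroups`.
    static double average(String name, Supplier<Run> participant,
                          int times, int expectedGroups) {
        long total = 0;
        for (int i = 0; i < times; i++) {
            Run r = participant.get();
            if (r.groups != expectedGroups) {
                throw new IllegalStateException(name + ": expected "
                        + expectedGroups + " groups, got " + r.groups);
            }
            total += r.millis;
        }
        return (double) total / times;
    }

    // Ratio to the baseline (Java) time, rounded to two decimals.
    static double ratio(double avgMs, double baselineMs) {
        return Math.round(avgMs / baselineMs * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // Stand-in participants with fixed timings; a real harness would
        // start each program and parse its printed time and group count.
        int expected = 7; // placeholder count, not a value from the article
        double java = average("Java", () -> new Run(621, expected), 20, expected);
        double kotlin = average("Kotlin", () -> new Run(667, expected), 20, expected);
        System.out.println("Kotlin: " + kotlin + "ms, " + ratio(kotlin, java));
    }
}
```

Checking the count on every run is cheap insurance: a participant that is fast because it silently skips input would otherwise look like a winner.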

UPD
1) For sporting interest, D was added to the "race". It showed a good (but not brilliant) result, but unfortunately, in the Windows marathon it died from lack of memory (and that with 8 GB of RAM). On Linux it finished the marathon.
2) Kotlin was updated to 1.2.30-release-78.

Source: https://habr.com/ru/post/345100/

