Let the Holy War begin: Java vs C++

On the eve of Joker 2016 we published a post about Java performance, which caused a storm of emotions among readers. To add fuel to the fire and still try to arrive at some kind of common conclusion, we decided to bring in experts from different “camps”:

Dmitry Nesteruk. Expert on .NET, C++ and development tools, author of courses on technology and mathematics, quant.
Andrey Pangin. Lead programmer at Odnoklassniki, specializing in high-load backends. He knows the JVM like the back of his hand, having previously worked on the HotSpot virtual machine at Sun Microsystems and Oracle for several years. Likes assembly language and low-level system programming.
Vladimir Sitnikov. For ten years he has worked at NetCracker on the performance and scalability of NetCracker OSS, software used by telecom operators to automate the management of networks and network equipment. He is interested in Java and Oracle Database performance issues.
Oleg Krasnov. SEMrush CTO and ANSI C adept.

Andrey Pangin

- Java and C++: which language do you think is more popular now? Both are already grown-up, but which is more mature and polished?
- First of all, I don't think these languages compete with each other. Each has its own niche, and they coexist perfectly well. Traditionally, Java's popularity is somewhat higher. The Java platform is attractive for its powerful tools for debugging and maintaining applications. However, the significance of C++ is hard to overstate. Although it is a language with a long history, it continues to evolve rapidly: developers had barely gotten used to C++11 when C++14 came out with many interesting new features.

- What can these languages offer in the world of high-load servers? Does it make sense to develop individual system modules in different languages, tuning them for specific tasks? If you could (wanted to, or simply had the opportunity), would you use C++ to solve your problems, or would you do everything in a single language?

- Everyone means something different by high-load servers. For some it is thousands of network requests per second; for others, parallel computation over large amounts of data. Different tools suit different tasks. At Odnoklassniki we have modules written in C++, in particular for image and video processing: they need SIMD computation and the most efficient use of the processor. However, for most of our systems Java's performance is sufficient. Moreover, code fragments previously written in C and called via JNI were gradually rewritten in Java, and as a result we even gained performance, because we got rid of unnecessary copying and JNI overhead.

- Is using Unsafe in Java justified or not? Why not use C++ instead?

- I have a whole talk on why we use Unsafe. There are a number of scenarios where you cannot do without Unsafe, in particular working with off-heap memory and interacting with native code.
If we wanted to write an application entirely in C++, we would have to re-implement all our common frameworks and protocols: for collecting statistics, for monitoring, for communication between servers, and so on. As it is, only a small part of our code uses Unsafe for low-level operations, while the rest of the development is done in familiar Java, following the best practices for writing simple and understandable code. It is much more convenient when the entire ecosystem is built on a single platform.

- What are the most frequent performance problems in the development of enterprise systems, and what are the possible solutions?

- We rarely run into the performance limits of the Java platform itself. Usually problems can be solved either by changing the algorithm or by scaling, that is, adding hardware. The most common bottleneck is network bandwidth or disk I/O. But if we talk about the JVM, then the main, or even the only, performance problem is sometimes caused by the garbage collector, because pauses longer than 500 ms are often critical for us. Therefore we try not to make the heap too large: 50 gigabytes at most, but more often much less, from 4 to 8 gigabytes per application. We try to move large volumes of data off-heap: we even built a framework for creating large high-load caches. An additional advantage of such a cache over the heap is persistence, that is, the ability to restart the application without losing data. This is achieved through shared memory: immediately after startup, the application maps the shared memory object into the address space of the process, and the cache with all its data becomes instantly accessible.
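
The framework mentioned here is written in Java and is not shown in the interview. The underlying operating-system mechanism, mapping a shared object into the process address space so that data survives a restart, can be sketched in C++ with POSIX calls. This is only an illustration under assumptions: the function name `bump_persistent_counter`, the single-counter "cache" and the file path passed in are hypothetical, and the code targets Linux or another POSIX system.

```cpp
#include <cstdint>
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <unistd.h>    // ftruncate, close

// Hypothetical helper: maps a tiny "cache" (one 64-bit counter) backed by the
// file at `path`, increments it in place and returns the new value.
// Returns -1 on failure. Every call plays the role of a fresh process start.
std::int64_t bump_persistent_counter(const char* path) {
    const int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;

    // Ensure the backing object is large enough; newly added bytes read as zero,
    // existing contents are kept.
    if (ftruncate(fd, sizeof(std::int64_t)) != 0) {
        close(fd);
        return -1;
    }

    // MAP_SHARED makes writes go to the backing object rather than a private copy.
    void* mem = mmap(nullptr, sizeof(std::int64_t),
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mem == MAP_FAILED) return -1;

    auto* counter = static_cast<std::int64_t*>(mem);
    const std::int64_t value = ++*counter;  // the data is usable right after mapping

    munmap(mem, sizeof(std::int64_t));
    return value;
}
```

Calling the function twice with the same path returns 1 and then 2 when the file did not exist beforehand: the second "start" sees the value left by the first without any explicit load step. A real cache would keep the mapping open for the lifetime of the process and would need its own layout and synchronization, which this sketch does not attempt.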

- What can you say about the upcoming release of JDK 9 and its main feature, modularity?

- There has been a lot of talk about modularity, and the release date was even postponed several times in order to finally finish it. Yet among my acquaintances I don't know a single Java developer who really needs this feature. I think it would have been better to release JDK 9 earlier; developers would only have been grateful. For us, modularity hurts rather than helps: one of the side effects is that Unsafe will now be hidden deep inside and unavailable without special command-line flags. But JDK 9 is expected to bring much more pleasant innovations, for which the new version is at least worth trying: improvements in G1, Compact Strings, VarHandles, and so on.

- Very roughly speaking, the difference between C++ and Java lies in the runtime layer, which, among other things, performs all sorts of optimizations. Which is preferable: exploiting the architectural features of the machine manually (C++), or relying on dynamic JVM optimizations? More specifically, is automatic garbage collection or manual memory management better?

- Adaptive compilation and automatic memory management are precisely Java's strengths. In these areas the virtual machine has succeeded and surpassed static compilers. But that is not even the main thing. We choose the JVM for the safety guarantees it gives us, first and foremost protection against fatal errors caused by incorrect memory handling. Tracking down problems with pointers or out-of-bounds array access in unmanaged code is an order of magnitude harder. And the cost of fixing such errors more than outweighs the small speed advantage that direct memory access gives. As mentioned above, we sometimes use Unsafe, and in those cases we automatically expose ourselves to the same risks as in C++. Yes, we sometimes have to dig through JVM crash dumps, and it is not a pleasant activity. That is why we still prefer pure Java and use unmanaged code only when absolutely necessary.

I will also give a talk at Joker on the topic “Myths and Facts about Java Performance”.

Dmitry Nesteruk

- Java and C++: which language, in your opinion, is more popular now? Both are already grown-up, but which is more mature and polished?

- If we talk about demand, everything is obvious: Java is, of course, more in demand than other languages. C++ occupies its own niche in three main disciplines (game dev, finance and embedded), and it is also the main language for HPC and scientific computing. So if you follow self-interest, Java is certainly the safer skill, unless you are deliberately heading into one of those areas.

As for maturity, things are more complicated, and you first have to break the question down into language features, compiler capabilities and standard libraries.

Let's start with the first: the languages. Both have problems. With Java, the problem is that the language does not develop as rapidly as its closest competitor, so features arrive very slowly and not in the form you would like. It is notable that C# is younger, yet lambdas appeared in it first, as did LINQ (Language Integrated Query, a set of convenient mechanisms for traversing and querying data sets), and the original design decisions in C# (that is, support for properties and delegates) were also made correctly and successfully.

As for C++, the main problem is its near-100% compatibility with the C language, which automatically means a huge baggage of useless language features. On the other hand, the stagnation of C++ in the 2000s did not add to the language's popularity either, since developers need to be constantly fed new features. Now the situation is better: C++ has lambdas (by the way, more expressive than in C#/Java) and type inference for variables and even for function return values; in general, the language is evolving one way or another.
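
As a small illustration of the features named above, here is a minimal sketch assuming a C++14 compiler. The names `make_scaler` and `scale_all` are invented for this example and do not come from the interview.

```cpp
#include <algorithm>
#include <vector>

// C++14: the return type (an unnamed closure type) is deduced by the compiler.
auto make_scaler(int factor) {
    // The lambda captures `factor` by value and can outlive this call.
    return [factor](int x) { return x * factor; };
}

// Multiplies every element by `factor` and returns the modified copy.
std::vector<int> scale_all(std::vector<int> values, int factor) {
    auto scale = make_scaler(factor);  // variable type deduced with `auto`
    std::transform(values.begin(), values.end(), values.begin(), scale);
    return values;
}
```

Here `scale_all({1, 2, 3}, 2)` yields a vector holding 2, 4, 6. Neither the closure type nor the return type of `make_scaler` is ever spelled out in the source.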

That covers the languages. Now for compilers. Here, first of all, the comparison is not entirely fair, because the JVM uses a JIT, that is, the idea that you can take bytecode and turn it into the ideal representation for the current processor, with all applicable optimizations. That sounds good in theory. I don't know how it is in Java, but in the .NET world this approach gives almost nothing compared to C++ compiler optimizations. If you do math, let me put it this way: if you buy a mathematical .NET library online, it will be just a wrapper around C++.

And about the C++ compiler: for computational tasks I use Intel C++, that is, the compiler supplied by the processor manufacturer itself. It has a huge number of drawbacks: fewer language features than MSVC and a bunch of awkward bugs for which you have to contact support. But we keep eating this cactus for one simple reason: optimization. Intel's compiler generates the most efficient code. Of course, it is not the compiler alone: we use the full power of Intel Parallel Studio, which includes Threading Building Blocks for parallelization (by the way, an analogue of Microsoft's Parallel Patterns Library) and the Intel Math Kernel Library, which, even if you don't use it directly, you use indirectly through MATLAB and others. It should be clarified that a library like MKL is already optimized by the people at Intel: vectorization, parallelization, and even cluster parallelization via MPI (for example, for FFT) work out of the box, so you just take it and use it. And of course the profiling tools, which are also part of IPS, are worth mentioning. This is a very powerful toolkit whose goal is essentially to help the developer optimize code for performance, and for correctness too: there is also memory profiling, so leaks and the like are easy to find.

And finally, about libraries: here everything is simple, Java wins and in C++ everything is bad. I won't even go into the fact that the C++ Standard Library interface itself is a bit insane; the problem, it seems to me, is not only that everything is legacy but that there are simply very few features! Things like file system support and some kind of thread support have only just appeared. And then, say I have a string and want to split it into substrings on spaces: there is no ready-made function for this in the standard library, so I have to take a third-party library (well, there is Boost, which has many useful things). But this really slows development down. Many companies, such as Electronic Arts, write their own STL implementations because they are not satisfied with the standard one. And on the sidelines many admit that we essentially need a new library written from scratch, some kind of STL2, although it would of course be more correct to call it Standard Library 2 or something like that.
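
To make the string-splitting complaint concrete: without a third-party library, the usual workaround is a few lines built on string streams. The following is a minimal sketch; the helper name `split_on_whitespace` is invented here, and it splits on any run of whitespace rather than on a single space character.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits `text` into tokens separated by runs of whitespace.
// Leading and trailing whitespace is ignored, so no empty tokens are produced.
std::vector<std::string> split_on_whitespace(const std::string& text) {
    std::istringstream stream(text);
    std::vector<std::string> tokens;
    std::string token;
    while (stream >> token) {  // operator>> skips whitespace before each token
        tokens.push_back(token);
    }
    return tokens;
}
```

For example, `split_on_whitespace("take it and  use it")` returns the five tokens "take", "it", "and", "use", "it". Splitting on an arbitrary delimiter or keeping empty fields needs different code again, which is roughly the point being made.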

There are still plenty of other problems, for example the lack of a dominant package manager; and even if there were one, how would libraries be shared? In Java or .NET you can simply distribute binaries, while in C++ you have to fiddle around. Nobody has really solved this problem yet, and this also slows development down. Sometimes you take someone else's library and then spend half an hour just getting it to work.

- How do these languages fare in the enterprise, for example in the banking sector? In the HFT (high-frequency trading) world, for instance, loads are heavy and reliability requirements are high. The financial industry is also quite conservative. How does this affect the choice of a particular technology?

- Enterprise is one broad umbrella term under which any corporate development currently falls. Globally this is, of course, C# and Java, with other languages somewhere on the periphery. Banking is somewhat more interesting, especially in that C++ shows up in some places; there are shops like Bloomberg that are entirely on C++, but that seems to me an anomaly. In general, if you study for an MFE, that is, a Master of Financial Engineering, C++ is mainly what is used there, although languages such as Python and R are popular now, and MATLAB also remains relevant.

As for HFT, this is also a controversial topic, but yes, it mostly leans toward C++, and even toward C and the use of FPGAs, where SystemC is present or people write in HDL languages. When speed and performance matter, native code is somehow closer, although the argument that “Java is slow” seems irrelevant to me. It's just that people sometimes need manual memory management: everyone is afraid of the big bad GC that will come and stop all threads at the very moment you need to make a trade.

Among quants, C++ has remained mostly for conservative reasons, because the financial industry, unlike ordinary software development, treats programming as a skill akin to knowing English rather than as something systematic. Accordingly, people simply learn C++ and get on with it, although Python and R are now somehow even more popular for analysis. But in investment banks C++ rules.

- Which language, in your opinion, is better suited for developing software for embedded devices? To what extent do these languages allow writing portable code?

- In general, embedded is too broad a topic. For many, embedded means a Raspberry Pi or Arduino; for me it is FPGA; for someone else it is something else. But to generalize, embedded is of course mostly C or C++ if we talk about the application layer. For FPGA development, naturally, I either use VHDL directly or write MATLAB, which spits out VHDL after conversion; the essence remains the same.

Specifically about FPGAs, since this is the only area where I understand at least something: I can say that the languages and the development approach itself are a good illustration of how a whole technology can get stuck, much worse than C++, in old models, languages and so on. It is very hard to work with this technology, and you essentially end up using some generator like MATLAB or writing something of your own. For people who work purely at the system level, shifting bits by hand is normal, but I, as someone who wants, for example, to model a set of business rules in hardware, do not like this approach at all, and the language is not expressive enough to describe at a high level what I need.

And I'm simply not qualified to talk about Java and embedded.

- Very roughly speaking, the difference between C++ and Java lies in the runtime layer, which, among other things, performs all sorts of optimizations. Which is preferable: exploiting the architectural features of the machine manually (C++), or relying on dynamic JVM optimizations?

- Well, I think I have already touched on this question, but it is a matter of taste. Take me: I use practically all levels of parallelization, meaning SIMD, OpenMP and MPI, not to mention specifics like hardware accelerators. There are some SIMD optimizations in Java, and .NET is slowly catching up, but in fact C++ still rules in terms of automatic optimizations, and let's not forget that in C++ you can insert assembly blocks by hand. I understand that hardly anyone knows assembler now, and many have never even seen C++, but the point is that when it comes to pure computation, that is, mathematics, and you want it fast, then why not?

I don't really believe in dynamic optimizations. Here is why: if you have a simple loop, say one that sums an array, then yes, that can be recognized and parallelized. The problem is, what if you have, for example, pulled some outside dependency into the loop? In OpenMP we have the appropriate annotations, but a dynamic optimizer cannot solve such problems, ever. So someone will, for example, look at CUDA and say that this model is absolutely unrealistic: why should I rewrite all my algorithms, and even learn something new? As for me, I think it is inevitable, because optimizers work very well on simple, understandable things and do all sorts of inlining, but everything performance-critical can be written by hand, in native code, without suffering.
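
The OpenMP annotations mentioned here can be sketched on the array-sum example itself. In the loop below, the running total `sum` is a value carried from one iteration to the next; the `reduction` clause tells OpenMP how to split that dependency across threads. This is a minimal sketch: the function name `parallel_sum` is an assumption, and the pragma only takes effect when the compiler is run with OpenMP enabled (for example `-fopenmp` with GCC or Clang); otherwise it is ignored and the loop runs serially.

```cpp
#include <cstddef>
#include <vector>

// Sums the elements of `data`.
// With OpenMP enabled, each thread accumulates a private partial sum and the
// partial sums are combined with `+` when the loop finishes.
double parallel_sum(const std::vector<double>& data) {
    double sum = 0.0;
    // A signed loop index is used because older OpenMP versions require it.
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += data[static_cast<std::size_t>(i)];
    }
    return sum;
}
```

Because floating-point addition is not associative, the parallel result can differ from the serial one in the last bits for general inputs. That change in semantics is one reason the programmer has to opt in explicitly with an annotation.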

- How dynamic are the Java and C++ ecosystems? How often do updates, releases and standards come out? How lively are the languages (how many language features appear)?

- Well, I think we can say that C++ is immortal, unlike Java as a language, around which many new languages with interesting features have appeared: Scala, Kotlin and others. Another matter is that language and platform are different things. Java as a language does not suit many people, hence the new languages. But as a platform everything is fine there; apparently there are even advantages over the closest competitor (for example, in terms of GC). But as a language there are plenty of grounds for complaint.

I suppose I should speak about C++ here. Of course, after the community did essentially nothing for some 13 years, new standards and new library features are good, wonderful even, compared with total apathy. C++11 brought many genuinely useful changes; I now write a completely different C++ than before. C++14 is a bit better still, but C++17 has once again disappointed everyone: the features everyone was waiting for will not be there. The main feature that everyone wanted and still wants is modules. Right now C++ compiles very slowly, or more precisely the initial build does: incremental builds, judging by MSVC, are simply superb, but building from scratch is a below-average pleasure. Modules should solve this problem, but no one knows when.

Again, C++ has the problem that the most basic things are missing from the standard library. A newcomer who needs to, say, convert a string to lower case or split it into tokens will simply be stumped. Of course, there are many third-party libraries, but the usability of libraries is itself a question. In languages that have metadata, you see a function and you know how to use it; even documentation becomes an extra. In C++ you can have a template argument of type Func, that is, a function, and be unable to work out the function's signature even if you dig into the sources. And it is not clear what to do about this.
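
For the lower-casing case, the idiom a newcomer eventually has to discover looks roughly like the sketch below. The helper name `to_lower_ascii` is invented for this example, and the code is only reliable for ASCII text in the default "C" locale; proper Unicode case conversion needs an external library.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a lower-cased copy of `text` (ASCII letters only in the "C" locale).
std::string to_lower_ascii(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) {
                       // std::tolower takes an int that must be representable
                       // as unsigned char, hence the parameter type.
                       return static_cast<char>(std::tolower(c));
                   });
    return text;
}
```

For example, `to_lower_ascii("Hello, World")` returns "hello, world". The `unsigned char` detail is easy to miss: passing a plain `char` with a negative value to `std::tolower` is undefined behaviour.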

To sum up, I would say that both languages are alive, so to speak, and everything depends on what you actually need. In general, you can write productively in either. As for libraries, C++ loses here, that is clear to everyone, and it is surprising, because the language is much older, yet the libraries, where they exist at all, come from the C world and are not particularly usable, or they simply do not exist and have to be found somewhere outside, downloaded, compiled, and only then used.

Oleg Krasnov

- Why C?

- When I came to SEMrush, there was no significant server-side code in other languages. At that time I programmed mostly in C and decided to develop the product in it. I believed in my own abilities. =)
For me, C is a simple and convenient language. With sufficient skill and knowledge of libraries, it works well even for prototyping, at the level of a scripting language.
At SEMrush, the distribution of server-side programming languages is roughly as follows: 1/3 C and C++, 1/3 scripting languages, 1/3 Java.

- Java and C: which language do you think is more popular now? Both are already grown-up, but which is more mature and polished?

- My experience suggests that C is for developing things related to performance-critical tasks. For example, working with sockets, data multiplexing, and high-load multi-threaded applications, where you can and should control machine resources as fully as possible.
SEMrush has no explicit division of programming languages by area of use. When a new product needs to be started, the choice of language depends on the expertise of the person who begins designing the architecture and writing the code, and also on how communicative he is and how well he can convey to colleagues the ideas he wants to implement.
Data collection and processing are quite frequent tasks for us. One of the reasons why we do not, for example, use Java in all products is that we have no wild inheritance hierarchies among our entities. Due to the nature of our work, its depth in the vertical plane is smaller than in the horizontal one. That is, a large amount of data is transferred between independent entities rather than between parents and descendants.

- Does it make sense to develop individual modules of a system in different languages, tuning them for specific tasks?

- I believe it does. In this respect everything is going very well for us. Development is carried out by small teams of 5-6 people, each working on its own product, and they interact through APIs. Both the user interface and the services must interact “well” with each other. This is done, for example, using data formats such as JSON and Binary JSON. So yes, you can use different languages to write the whole system.

We have our own database, written in C, and there have been no major problems with it. When I was designing the architecture of this database, there were no suitable ready-made tools. The database works on ordinary files and, even so, is quite reliable. If we leave aside all the measures that ensure its uninterrupted operation (redundancy, clustering), then even a “dropped” disk (hardware failure or unforeseen technical reasons) will not cause much harm: no more than 8.5% of the data will be lost. And the probability that users will be affected is further reduced at the business-logic level. But in general the system is built so that data is not lost. Everything is very reliable.
We once ran tests to squeeze maximum performance out of hardware with 12 disks. With RAID5, the read/write speed was about 2.5x the speed of a single hard drive. But our system uses the 12 disks separately at the business-logic level, which allows us to achieve an 11-fold speedup because each thread works with its own disk.

- How do these languages fare in the enterprise?

- We have large products. For example, one of them crawls the Internet and builds a database of site pages that link to each other. Its core is written in C, which lets us utilize the hardware at almost 100%. More than 150 servers are involved in this product, but this approach lets us be sure that we are not overpaying for the server fleet, and we, as you understand, care about financial efficiency. I will note separately that thanks to an agile development process we have time both to deliver new features to users and to tune the performance of each product.

- What are the most frequent performance problems in the development of enterprise systems, and what are the possible solutions?

- To be honest, I can't even recall any particular problems. If we suddenly run short of capacity, then assembler, special libraries and the efforts of highly skilled programmers let us solve the problem quickly. But in 99% of cases we have no problems with performance.
I have no prejudice against Java, but it is more demanding of resources. Yes, it is very convenient for solving problems with complex multi-level business logic and for building interconnected systems. However, tasks such as networking, multi-threaded programming or working with large binary data are, in my opinion, better suited to C. This can be done in Java too, but I would choose C.

- What do you think the “ideal” database should be? Is it worth worrying about application performance if all the power of the system is cancelled out by database transactions, and how can this be fixed?

- The ideal database is not universal.

— , , — Java ++?

— — , -. - , - — . . .

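- How dynamic are the Java and C++ ecosystems? How often do updates, releases and standards come out? How lively are the languages (how many language features appear)?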

— 99, , . , . ++ — ++14 ++17 (draft) , , . , . . .
++ ++. It is irrational. . ++, .

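- What can these languages offer in the world of high-load servers? Does it make sense to develop individual system modules in different languages, tuning them for specific tasks? If you could (wanted to, or simply had the opportunity), would you use C++ to solve your problems, or would you do everything in a single language?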

— . — . 10 , , 10 , . , , , , . . — . , java , , .. C++ . , java , .

- Is using Unsafe in Java justified or not? Why not use C++ instead?

— , Unsafe Java () , . unsafe java . , OpenJDK java.util.HashMap Map.Entry , , . , , .. , . , javac JIT- .

C++, - entry . , . ++ .

java ? Unsafe, off-heap ? , , , . java HashMap .

« », (Unsafe, VarHandlers, memory mapped files, ..). , JEP 169: Value Objects — , Java .

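- What are the most frequent performance problems in the development of enterprise systems, and what are the possible solutions?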

— , . , SQL-. Those. . , , . «» . , . API 1000 , .

— JDK 9 — ? , Java?

— Java . , 30 , .

Java- ? ? .class JVM . JDK9, , , , java-. Java- , , .

– java-. JVM , .

— , C++ Java — runtime, , , ?

— Java — . , java- Iterator. , «» (invokeinterface). , Iterator . JIT- , , « hasNext». , - , « ».

C++, , , , .

, — . C++ «» , C++ (clang LLVM, ..).

Source: https://habr.com/ru/post/307180/
