
Java performance: present and future

For two decades now, the myth has persisted that performance problems are inherent to Java applications. Yet truly high-load systems are being built in Java. So who is right? To form an opinion on how Java is doing today, we turned to two interested parties: the creators of Java itself and the companies that use Java in their systems. Aleksey Shipilev (Oracle) and Oleg Anastasyev (Odnoklassniki) kindly agreed to answer our questions.





Java Performance through the eyes of the JDK creators





JUG.RU: Please tell us about yourself and your work.



Aleksey Shipilev: My name is Aleksey Shipilev. I have been working on Java performance for over 10 years. During this time I have worked on different JVMs: first on Apache Harmony at Intel, then at Sun Microsystems, where I worked on OpenJDK. At the moment my job is mostly to find performance problems in the product and identify ways to solve them, or to fix them myself if the problems are simple. This includes optimizing for standard benchmarks, testing virtual machine performance, solving customer problems (optimizing their applications), and improving global things needed by the multi-million-strong Java ecosystem.



JUG.RU: Do you think it is correct today to say that performance is a problem of Java in general, as a technology, and not of individual applications?



Aleksey Shipilev: It is difficult to talk about this, since performance is often determined by the code of the final application rather than by the language used to create it. A programming language has no performance at all; only an implementation of the language does. Moreover, there may be many different implementations. However, in the Java world it so happens that most of the time we are talking about the Sun/Oracle JDK implementation, which occupies more than 95% of the market. Let's keep that in mind.

At the very beginning, in 1995-2000, Java, like any young product, was indeed not implemented very efficiently. But over the past decade so much has been done in Java implementations that the problems previously considered typical are no longer as severe or as painful for developers.



Of course, many of the difficulties of developing high-performance applications remain, so in my opinion it would be imprudent to say that the problem is finally solved. There is still a lot to do.



But it is important to remember that not everyone needs high-performance solutions: for many developers, performance is not on the list of success criteria; "it runs in a reasonable time" is good enough. If developers really need every last drop of performance, they will have to work around the pitfalls present in any sufficiently complex product (and the HotSpot virtual machine, and OpenJDK as a whole, is a very complex product).



JUG.RU: In other words, performance is no longer a global Java problem?



Aleksey Shipilev: I think so.

To be honest, it seems to me that the performance problems of any platform are exaggerated. As I said, most applications do not require high performance. But the programming community has persistent legends and simple recipes that are very convenient to repeat (when you repeat them, it feels as if you are joining the circle of the initiated!). One of these legends is that "Java is slow." Personally, I am sure this has long been irrelevant. There was objective evidence for it about 10-15 years ago, but the situation has changed since then. Of course, even now you can write applications that will run into performance problems in the runtime. But these problems are mostly known, there are workarounds for them, and those for whom the workarounds are not suitable build their own "crutches".



JUG.RU: How actively is the JDK being developed and, accordingly, how actively are known performance problems being eliminated (if we talk about the Oracle JDK)?



Aleksey Shipilev: Quite actively, and for good reason: the Java ecosystem is very large, even within a single company. Oracle has enterprise stacks written in Java. Accordingly, any improvement made in the platform spreads throughout the stack and makes life easier for Oracle developers as well. But that is the story of why Oracle develops OpenJDK. Variations of this story repeat for other vendors and for other open source projects.



JUG.RU: Which recent innovations in Java seem most significant to you in terms of improving performance?



Aleksey Shipilev: First, I like that the Garbage-First (G1) garbage collector story is slowly coming to its logical conclusion. G1 was announced a long time ago, but only in Java 8 and 9 did it begin to behave decently enough to be used at industrial scale, so much so that it is enabled by default in Java 9. This multi-year project is finally paying off and doing what was envisioned from the very beginning.
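To check which collector a given JVM is actually running, one option is to list the garbage collector MXBeans from the standard java.lang.management API. The sketch below assumes only that API; the class name is hypothetical, the reported names are whatever the running JVM chooses (with G1 they typically start with "G1"), and the -XX:+UseG1GC flag is only needed on releases where G1 is not the default.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcInfo {
    // Names of the collectors the running JVM reports,
    // e.g. "G1 Young Generation" when G1 is active.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    static boolean usesG1() {
        return collectorNames().stream().anyMatch(n -> n.startsWith("G1"));
    }

    public static void main(String[] args) {
        // Run with: java -XX:+UseG1GC GcInfo (flag optional where G1 is the default)
        System.out.println("Collectors: " + collectorNames());
        System.out.println("G1 active: " + usesG1());
    }
}
```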



Second, close to my heart is the story of high-performance applications that require Unsafe, parts of which VarHandles are being developed to replace. There are legitimate cases when you want to squeeze out the last drops of performance using low-level hacks. But Unsafe, as you know, is a private API that is not really standardized, i.e. using it is like running with sharp scissors over the embers of a burning building. VarHandles are one of the ways we can provide a public API for those rare but important cases when you need maximum performance or some functionality that is otherwise unavailable.
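As an illustration of the kind of Unsafe usage VarHandles target, here is a minimal sketch of an atomic counter built on a field VarHandle, using the java.lang.invoke API as it shipped in Java 9. With Unsafe the same thing would be done through a raw field offset and compareAndSwapLong; here the handle is type-checked and obtained through a public lookup. The VhCounter class is hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class VhCounter {
    private volatile long value;

    // A VarHandle bound to the 'value' field: a public, supported way to do
    // atomic field updates instead of Unsafe plus a field offset.
    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(VhCounter.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    long incrementAndGet() {
        // getAndAdd returns the previous value; the cast fixes the call-site type.
        return (long) VALUE.getAndAdd(this, 1L) + 1;
    }

    boolean compareAndSet(long expected, long update) {
        return VALUE.compareAndSet(this, expected, update);
    }

    long get() {
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        VhCounter c = new VhCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) c.incrementAndGet();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(c.get()); // 40000: no increments are lost
    }
}
```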



Another interesting innovation is Compact Strings. I personally took part in this project and in other string-related optimizations. Platform changes of this kind, which seriously improve commonly used classes, significantly increase the performance of all applications written in Java and thereby reduce the need for crutches even further.



And in this case we got a very good result not only on synthetic benchmarks but also on real applications. The 10% improvement in memory footprint and performance achieved on large applications is a very good gain for a healthy, mature platform like Java.
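The core idea behind Compact Strings (JEP 254, Java 9) is that a string whose characters all fit into Latin-1 can be stored with one byte per character instead of two. The sketch below is a simplified model of that decision with hypothetical method names, not the JDK's actual String code; it ignores object headers and only counts the character payload.

```java
import java.nio.charset.StandardCharsets;

class CompactStringModel {
    // True if every char fits into a single Latin-1 byte (0x00..0xFF).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) return false;
        }
        return true;
    }

    // Payload bytes: 1 per char for Latin-1 strings, 2 per char (UTF-16) otherwise.
    static int payloadBytes(String s) {
        return fitsLatin1(s) ? s.length() : s.length() * 2;
    }

    // Encodes the characters the way a compact representation would store them.
    static byte[] encode(String s) {
        return s.getBytes(fitsLatin1(s) ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(payloadBytes("hello"));                         // 5 (a char[] needs 10)
        System.out.println(payloadBytes("\u043f\u0440\u0438\u0432\u0435\u0442")); // 12: Cyrillic needs UTF-16
    }
}
```

Since most strings in typical server applications (identifiers, keys, protocol text) are Latin-1, halving their payload is where the memory savings come from.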



JUG.RU: Since we are talking about performance, are there any "canonical" ways to measure it? Or does it all come down to money for the business?



Aleksey Shipilev: Performance does not always translate into money. In practice, it is rather difficult to assess how a performance gain affects the economic side of things. Often the effects are indirect: the time a programmer spends writing code that meets performance goals, the time users spend waiting for a result, etc. But with servers moving into clouds and dense data centers, performance is getting closer to the financial side: the faster your application runs, the fewer resources it consumes, and the less you pay for renting and servicing servers. Moreover, in well-scalable applications this dependence can be simply linear: make your application twice as fast (cut its running time by 50%) and you need half the hardware, so you pay half as much for infrastructure.
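The linear relationship can be made concrete with a back-of-the-envelope capacity estimate. A minimal sketch, assuming perfectly linear scaling and hypothetical load figures: the number of servers is the required throughput divided by per-server throughput, rounded up, so doubling per-server throughput halves the fleet.

```java
class CapacityEstimate {
    // Servers needed to carry 'loadRps' requests per second when one server
    // handles 'perServerRps'. Ceiling division; assumes linear scaling.
    static long serversNeeded(long loadRps, long perServerRps) {
        return (loadRps + perServerRps - 1) / perServerRps;
    }

    public static void main(String[] args) {
        long load = 100_000;                               // hypothetical peak load
        System.out.println(serversNeeded(load, 1_000));    // 100
        System.out.println(serversNeeded(load, 2_000));    // 50: twice as fast, half the servers
    }
}
```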



In addition, the question of money arises when you need to justify the time spent on optimization. Organizations engaged in commercial development are not charities. They pay engineers to help solve their business problems, so they try to figure out whether to fund a particular line of work and how much business value its results will bring. So optimization is not just "we were poking around with a screwdriver because we find it very interesting to tinker here." Individual developers may think that way, sometimes successfully tying those desires to business goals, but the business itself is not interested in optimization for its own sake.



JUG.RU: In your opinion, which upcoming JDK innovations are most anticipated in terms of performance?



Aleksey Shipilev: Value Types is a highly anticipated feature, currently planned for Java 10. It is a very complex project that requires a detailed analysis of how it fits into the rest of the platform. Java's "sacred cow" is backward compatibility. You cannot make a feature that breaks it (or rather, you can break it in some minor respects, but you need a very good rationale for why you are breaking it and what workarounds the user has).



Value Types solve a very simple problem. One of the pillars of Java as a programming language is the unspoken property that (practically) everything is an object. And therein lies an interesting pitfall: Java objects have individual properties. Take identity: if you create an object with the number 42 in one of its fields, and a second object containing the "same" number 42, these two objects are different from the language's point of view, and they differ precisely by identity. From the implementation's point of view, this means you have to store two separate copies of these nearly identical objects, for example to have somewhere to keep metadata about them. And when an application has large object graphs, the per-object overhead eats a substantial part of the useful memory. It would be nice if the language had entities without identity, for which this overhead could be avoided. And such entities exist: primitives! But their list is strictly fixed. A natural extension is to allow declaring entities that are written like classes but behave like primitives: these are value types.
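The identity distinction is easy to observe with ordinary classes. A minimal illustration with a hypothetical Box class: two boxes holding 42 are equal by contents, yet they are two distinct objects, each with its own header, whereas two int values of 42 are simply the same value.

```java
class IdentityDemo {
    // An ordinary class: every instance has its own identity and object header.
    static final class Box {
        final int value;
        Box(int value) { this.value = value; }

        @Override public boolean equals(Object o) {
            return o instanceof Box && ((Box) o).value == value;
        }

        @Override public int hashCode() { return Integer.hashCode(value); }
    }

    public static void main(String[] args) {
        Box a = new Box(42);
        Box b = new Box(42);
        System.out.println(a.equals(b)); // true: same contents
        System.out.println(a == b);      // false: two objects, two identities

        int x = 42, y = 42;
        System.out.println(x == y);      // true: primitives have no identity
    }
}
```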



Value types differ significantly from ordinary reference types. For example, is Object a supertype of all value types? Logically, no, and then subtle questions arise about interaction with generics, specialization, etc. There are libraries that do this kind of specialization by hand (GNU Trove, for example), but everyone wants it implemented in the language itself. So this is a long-awaited feature: we know what benefits it will bring, and we now know what problems will arise. However, in the course of development we will once again see how many real benefits there are and how many problems.
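Hand-made specialization of the kind GNU Trove does means writing a separate collection per primitive type. The following is a minimal sketch of that idea with a hypothetical IntArrayList (not Trove's actual class): values are kept in a plain int[], so unlike List&lt;Integer&gt; no Integer objects, and no per-element headers, are created.

```java
import java.util.Arrays;

class IntArrayList {
    private int[] data = new int[8];
    private int size;

    void add(int v) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // grow geometrically
        }
        data[size++] = v;                         // stored unboxed
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() { return size; }

    long sum() {
        long s = 0;
        for (int i = 0; i < size; i++) s += data[i];
        return s;
    }

    public static void main(String[] args) {
        IntArrayList list = new IntArrayList();
        for (int i = 1; i <= 100; i++) list.add(i);
        System.out.println(list.size() + " " + list.sum()); // 100 5050
    }
}
```

The price of this approach is duplication: a real library needs the same class for long, double, and so on, which is exactly what specialized generics over value types aim to remove.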



JUG.RU: Given that performance problems are specific to individual applications rather than global, can we talk about typical patterns for optimizing applications?



Aleksey Shipilev: There are quite specific methodologies that prescribe where to look first based on certain symptoms. Sergey Kuksenko, I, and others have given talks on this topic.

For example, we have very good garbage collectors, but however good they are, if you produce a lot of garbage, collection will end up taking a significant share of the time. Whatever runtime you write, if a programmer hand-writes a bubble sort or a linear search over an array of 100 million elements, it will not be fast. There is no magic here: one fool can pose a problem that seven wise men cannot solve.
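A classic example of "producing a lot of garbage" is building a string by repeated concatenation in a loop. In the naive version below, each iteration creates a new String containing all the text so far, so total copying grows quadratically; the second version reuses one StringBuilder. A minimal illustration with hypothetical method names; exact allocation counts depend on the compiler and the JIT.

```java
class LessGarbage {
    // Allocation-heavy: every += produces a fresh String with all text so far.
    static String joinNaive(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i + ",";
        }
        return s;
    }

    // One growing buffer, one final String.
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithBuilder(5)); // 0,1,2,3,4,
        System.out.println(joinNaive(1_000).equals(joinWithBuilder(1_000))); // true
    }
}
```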



In my experience, if performance work on a particular application has never been done, or has been done poorly, then it almost certainly (99%) has many silly or obvious inefficiencies that can be found and fixed quickly, improving performance several-fold.



JUG.RU: And besides garbage collection, what are some typical problems that can be easily optimized?



Aleksey Shipilev: My favorites are multithreading problems. It is well known that the easiest way to write a correct multithreaded application is to use synchronization generously. I am not saying this practice is flawed, but it often means that hardware resources are not fully used because of constant lock contention. This is very easy to diagnose and often easy to fix (although the fix often requires changes to the architecture).
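A typical case is a shared counter guarded by synchronized: correct, but every increment from every thread contends for the same lock. The sketch below contrasts it with java.util.concurrent.atomic.LongAdder (available since Java 8), which spreads updates across internal cells and sums them on read. Class and method names are hypothetical, and whether such a change helps in a real application has to be confirmed by profiling.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class Counters {
    // Correct but contended: all threads serialize on one monitor.
    static final class SyncCounter {
        private long value;
        synchronized void increment() { value++; }
        synchronized long get() { return value; }
    }

    // Low-contention alternative: increments land in separate cells, summed on read.
    static final class AdderCounter {
        private final LongAdder value = new LongAdder();
        void increment() { value.increment(); }
        long get() { return value.sum(); }
    }

    // Runs 'threads' workers, each calling 'action' 'perThread' times, and waits for them.
    static void hammer(Runnable action, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) action.run();
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SyncCounter s = new SyncCounter();
        AdderCounter a = new AdderCounter();
        hammer(s::increment, 8, 100_000);
        hammer(a::increment, 8, 100_000);
        System.out.println(s.get() + " " + a.get()); // 800000 800000
    }
}
```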

Algorithmic problems are very common too. Rewriting bad pieces of code into good ones, which either have better algorithmic complexity in principle or exploit specific knowledge about the application's data, yields huge gains that no runtime optimization could ever dream of.
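As an illustration of exploiting knowledge about the data: if values are known to lie in a small range (here, hypothetically, ages from 0 to 150), a counting sort runs in O(n + k) instead of the O(n log n) of a general-purpose comparison sort. This is a sketch under that range assumption only; the code rejects values outside it.

```java
import java.util.Arrays;

class KnownRangeSort {
    static final int MAX_VALUE = 150; // assumption: every value is in [0, MAX_VALUE]

    // Counting sort: one pass to count occurrences, one pass to write back. O(n + k).
    static void sortSmallRange(int[] a) {
        int[] counts = new int[MAX_VALUE + 1];
        for (int v : a) {
            if (v < 0 || v > MAX_VALUE) {
                throw new IllegalArgumentException("value out of assumed range: " + v);
            }
            counts[v]++;
        }
        int pos = 0;
        for (int v = 0; v <= MAX_VALUE; v++) {
            for (int c = counts[v]; c > 0; c--) a[pos++] = v;
        }
    }

    public static void main(String[] args) {
        int[] ages = {34, 7, 150, 0, 34, 61};
        sortSmallRange(ages);
        System.out.println(Arrays.toString(ages)); // [0, 7, 34, 34, 61, 150]
    }
}
```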



JDK/JVM-specific problems do occur, but rarely. This category includes problems with data density in memory (where value types wave at us again), problems with high-level optimizations (escape analysis and auto-vectorization, hello!), and problems with code generation. And here is a slippery question: is the problem that the runtime is bad and does not work "correctly", or that we do not want to change our solution somehow to get better performance (for example, by using an additional library)? Different people and different organizations see it differently.
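For readers unfamiliar with escape analysis: if the JIT can prove that an object never leaves a method, HotSpot may skip the heap allocation entirely and keep the fields in registers (scalar replacement). In the hypothetical sketch below, the temporary Vec objects in distanceSquared never escape. Whether they are actually eliminated depends on inlining and other JIT decisions; one way to check is to compare GC activity with and without HotSpot's -XX:-DoEscapeAnalysis flag.

```java
class EscapeDemo {
    // A small immutable helper object.
    static final class Vec {
        final double x, y;
        Vec(double x, double y) { this.x = x; this.y = y; }
        Vec minus(Vec o) { return new Vec(x - o.x, y - o.y); }
        double dot(Vec o) { return x * o.x + y * o.y; }
    }

    // None of the Vec objects created here escape the method, so after inlining
    // the JIT may scalar-replace them instead of allocating on the heap.
    // This is an optimization the JVM may apply, not a guarantee.
    static double distanceSquared(double x1, double y1, double x2, double y2) {
        Vec d = new Vec(x2, y2).minus(new Vec(x1, y1));
        return d.dot(d);
    }

    public static void main(String[] args) {
        double acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += distanceSquared(0, 0, 3, 4); // 25.0 each time
        }
        System.out.println(acc); // 2.5E7
    }
}
```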



In general, from my point of view, optimizing the performance of Java applications is not fundamentally different from optimizing a native application that runs without a JVM. The JVM is, of course, a separate layer in this hierarchy, but many of the problems found there are inherent to software development in general and not specific to Java.



JUG.RU: Given that certain problems still exist, does it make sense to use Java for high-performance applications?



Aleksey Shipilev: You know, when I was a schoolboy, some of my classmates asked one of my teachers a snarky question: why weren't we writing in the oh-so-fast C? He answered: "I will write my industrial code in Pascal (popular in those distant times), because it puts safety padding around me everywhere, checks everything everywhere, and won't let me shoot myself in the foot. And in the places where speed matters to me, I will trick it into being fast." This story repeats with different actors and different languages: Pascal versus C, Java versus C++, C versus assembler, etc. In fact, within the horizon of reasonable gains, the performance of a large application is usually determined by the performance of a fairly small part of it. So it may be easier not to go overboard writing everything in a language that forces you to write low-level code until you go crazy. Instead, write in a high-level language and trick it where necessary: make it faster in specific places, either by switching to less idiomatic code that works around the quirks of libraries and the runtime, or by dropping the heavy parts down to a lower level. The practice of industrial Java development, and the history of its performance, largely embodies this approach.
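In Java terms, "tricking the language" in a hot spot might look like the hypothetical sketch below: an idiomatic stream version for ordinary code, and a plain loop over a primitive array for a method that profiling has shown to be hot. Whether the loop is actually faster in a given application must be measured (for example with JMH); the point is only that the low-level code stays confined to one small place.

```java
import java.util.List;

class HotPath {
    // Idiomatic, high-level version: fine for most of the application.
    static long sumOfSquaresStream(List<Integer> values) {
        return values.stream().mapToLong(v -> (long) v * v).sum();
    }

    // Less idiomatic version for a measured hot spot: a primitive array,
    // no boxing, no lambdas, just a simple counted loop.
    static long sumOfSquaresLoop(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += (long) v * v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquaresStream(List.of(1, 2, 3)));  // 14
        System.out.println(sumOfSquaresLoop(new int[] {1, 2, 3})); // 14
    }
}
```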



Aleksey's next talk will take place at the Joker 2016 conference in keynote format and will, of course, be devoted to platform performance and ways to improve the performance of your code.



Java Performance through the eyes of a developer







JUG.RU: Please tell us about yourself and your work.



Oleg Anastasyev: I work on the platform team at Odnoklassniki. The platform team develops software that makes Odnoklassniki work fast, i.e. it develops and maintains various data stores, frameworks for server-to-server communication, etc. In addition, if something goes wrong with speed in production, the platform team looks for a way to fix it. Our responsibility is to make Odnoklassniki work fast, because if it does not work fast, it simply will not work: it will quickly collapse under load.



JUG.RU: Do you think it makes sense to use Java for high-load applications? Or is there simply no alternative?



Oleg Anastasyev: Perhaps Java is not the fastest language; there are faster ones. But if we consider Java as a language for developing large high-load projects, there is basically no alternative to it yet. You can write faster code in C or C++, but that code will be more expensive: writing, debugging, and maintaining it will cost much more than similar Java code. In addition, C code exposed to the Internet raises many security issues. As you know, C allows all sorts of unsafe constructs, through which bad people will then hack you. Java has fewer such unsafe constructs, so a Java program requires less effort in terms of security.

As a result, Java has a very good price/performance ratio.
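A small illustration of the security point: in Java every array access is bounds-checked, so an out-of-range write that in C could silently overwrite adjacent memory (a classic buffer overflow) fails with an exception instead. The method below is a hypothetical example of copying untrusted input into a fixed-size buffer without checking its length.

```java
class BoundsCheck {
    // Copies 'input' into a fixed 8-byte buffer without validating its length.
    // In C this is a buffer overflow; in Java the JVM stops the bad write.
    static byte[] copyIntoFixedBuffer(byte[] input) {
        byte[] buffer = new byte[8];
        for (int i = 0; i < input.length; i++) {
            buffer[i] = input[i]; // throws ArrayIndexOutOfBoundsException once i reaches 8
        }
        return buffer;
    }

    public static void main(String[] args) {
        try {
            copyIntoFixedBuffer(new byte[16]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```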



Java has a number of problems; in particular, we had to do separate work on memory management and on handling high traffic, but these can be solved with a small amount of code. We created a separate library, one-nio (https://github.com/odnoklassniki/one-nio), for this. All the rest of the code enjoys Java's positive features: rapid development, security, good tools for diagnosing problems, the JVM's built-in protection against errors, etc.



JUG.RU: You mentioned that you had to solve certain performance problems. Could you tell us more about them?



Oleg Anastasyev: For us, speed is not only the speed at which Java code executes. We look at it from the perspective of efficient resource use: the amount of data processed, bandwidth, and memory consumption. And here Java really lacks a lot: collections of primitives, structs, off-heap memory management, transparent use of native APIs, affinity management, file caches, etc., so we have to develop solutions that let us bypass these bottlenecks.
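As one standard-library illustration of off-heap storage, the sketch below keeps an array of longs in a direct ByteBuffer: the data lives outside the Java heap, so the garbage collector does not scan it, and there is no per-element object header. This is a generic NIO technique, not how one-nio implements it; the OffHeapLongArray name is hypothetical. A single ByteBuffer is limited to about 2 GB (its capacity is an int), which is one reason large off-heap stores end up using Unsafe.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class OffHeapLongArray {
    private final ByteBuffer buf;
    private final int length;

    OffHeapLongArray(int length) {
        this.length = length;
        // allocateDirect reserves native memory outside the Java heap.
        this.buf = ByteBuffer.allocateDirect(length * Long.BYTES).order(ByteOrder.nativeOrder());
    }

    void set(int index, long value) {
        buf.putLong(checkedOffset(index), value); // absolute put: buffer position untouched
    }

    long get(int index) {
        return buf.getLong(checkedOffset(index));
    }

    int length() { return length; }

    private int checkedOffset(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index * Long.BYTES;
    }

    public static void main(String[] args) {
        OffHeapLongArray a = new OffHeapLongArray(1_000);
        for (int i = 0; i < a.length(); i++) a.set(i, i * 10L);
        System.out.println(a.get(999)); // 9990
    }
}
```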



In this sense, Java's performance is not sufficient for us. Moreover, we cannot wait several years for the next Java release: the problems have to be solved right now, so we actively look for solutions ourselves. Let's dwell on this in more detail.



For example, one of Java's pain points is I/O speed, both blocking and non-blocking (network I/O in particular).

This is clearly visible in the video distribution example. Total outgoing video traffic now reaches 500 gigabits per second. To serve such a stream, we must distribute video as efficiently as possible, so that each server handles as much traffic as possible. Our hardware can deliver 40 Gbps per machine, but a Java server written with standard solutions cannot use all 40 Gbps because too much performance is lost inside Java itself. This is one of the problems we solved in our open source library.
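One standard building block for pushing large amounts of file data out of a Java process is FileChannel.transferTo, which lets the operating system move bytes from a file to another channel (a socket in a real server) without routing them through a Java byte[]; on Linux, OpenJDK can use sendfile for socket targets. The sketch below is a generic NIO example, not the one-nio implementation; the transferAll helper is hypothetical. Because transferTo may move fewer bytes than requested, it loops.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ZeroCopySend {
    // Sends the whole file to 'target' (a SocketChannel in a real server).
    // Assumes a blocking target; a non-blocking socket needs a selector loop instead.
    static long transferAll(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // transferTo may send fewer bytes than asked, so keep going until done.
                position += in.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("video", ".bin");
        Path dst = Files.createTempFile("copy", ".bin");
        Files.write(src, new byte[1 << 20]); // 1 MiB stand-in for video content
        try (FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            System.out.println(transferAll(src, out)); // 1048576
        }
        Files.delete(src);
        Files.delete(dst);
    }
}
```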



The 40 Gbps example is something of an extreme. There are also less loaded servers, but they have problems too. For example, another Java pain point is storing a large number of objects in memory. Java has a garbage collector. On the one hand, this is good, because garbage is cleaned up automatically. On the other hand, when you need to cache a lot of data in memory, it hinders rather than helps. Moreover, if a hundred-gigabyte data set is stored in memory, you do not want to lose it when the program restarts, because reloading it takes considerable time. We would like to keep such data sets in shared memory, but Java has no built-in tools for that either. In such places you want manual memory management. Fortunately, Java has Unsafe, which we used to build our own solution.
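Java has no portable shared-memory API, but a standard-library approximation of "data that survives a restart" is a memory-mapped file: the mapping lives outside the heap, and after a restart the process maps the same file again instead of reloading the data. On Linux, files under /dev/shm are typically backed by RAM (tmpfs). This sketch is generic NIO, not Odnoklassniki's Unsafe-based solution, and the file path is a hypothetical placeholder. A single MappedByteBuffer is limited to about 2 GB, so a 100 GB data set would need many mappings.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class MappedStore {
    // Maps the first 'bytes' of the file read-write, creating and growing it if needed.
    static MappedByteBuffer map(Path file, int bytes) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            if (ch.size() < bytes) {
                // Grow the file explicitly; mapping past end-of-file is unspecified.
                ch.write(ByteBuffer.wrap(new byte[1]), bytes - 1);
            }
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "cache.bin"); // e.g. /dev/shm/cache.bin
        MappedByteBuffer buf = map(file, 1024);
        long previous = buf.getLong(0);  // left by an earlier run (0 on the first run)
        buf.putLong(0, previous + 1);    // lives in the file, so it survives a restart
        buf.force();                     // flush changes to the backing file
        System.out.println("run #" + (previous + 1));
    }
}
```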



JUG.RU: Is the JDK evolving toward solving problems specific to your needs? Are there new features you can use?



Oleg Anastasyev: The latest Java version released to the world is 8. It has no solutions to the problems I mentioned. There is only the intention to solve some of them in Java 9, and others in 10 and later. Whether that will happen, and how much better the proposed solutions will be than what we have now, it is too early to say, because Java 9, for example, has not been released yet. Of course, you can already try the beta version, but what will change before the release is unknown. So we'll see when Java is released.



JUG.RU: Are there any expected innovations that could help you? For example, VarHandles?



Oleg Anastasyev: Whether VarHandles will help us or not depends on how they are eventually implemented in the final version, and how quickly they will work.

VarHandles are a rather complicated way, even from an API point of view, to do something that can currently be done simply and clearly through Unsafe: declaring an area of memory as an array of one type and then reading it as an array of another type. To someone familiar with assembly or C, this looks like taking a memory address and reading the cell as a long or a byte, depending on the situation. Roughly speaking, VarHandles let you do the same, but in a more complicated way (it is a technically more complex solution at the JDK level), though with more protection for programmers who occasionally "shoot themselves in the foot."
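For comparison, this is roughly what "reading memory as another type" looks like with the VarHandles API as it shipped in Java 9: MethodHandles.byteArrayViewVarHandle gives a view of a byte[] as longs, with bounds checks, where Unsafe would work with a raw address. A minimal sketch; the class and method names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

class ByteView {
    // Views a byte[] as little-endian longs. The index passed to get/set is a
    // byte offset, and every access is bounds-checked (unlike raw Unsafe reads).
    private static final VarHandle LONG_VIEW =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    static long readLong(byte[] bytes, int byteOffset) {
        return (long) LONG_VIEW.get(bytes, byteOffset);
    }

    static void writeLong(byte[] bytes, int byteOffset, long value) {
        LONG_VIEW.set(bytes, byteOffset, value);
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[16];
        writeLong(bytes, 8, 0x0102030405060708L);
        System.out.println(bytes[8]);                                  // 8: lowest byte first
        System.out.println(readLong(bytes, 8) == 0x0102030405060708L); // true
    }
}
```

Plain get and set also allow misaligned offsets; the atomic access modes on such a view require aligned offsets.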

In addition, VarHandles cover only one of Unsafe's usage scenarios. We also use Unsafe for completely different scenarios (for example, custom serialization or working with shared memory), and no alternative for those is expected even in Java 10.



JUG.RU: On what basis does your company decide whether to work on application performance?



Oleg Anastasyev: We do not pursue speed for its own sake; we always evaluate the economic effect of an optimization.

For us, performance is measured in the amount of money that has to be spent on hardware: first to buy servers, and then every year to maintain them in the data center. The faster the applications, the less hardware is needed for the same task, i.e. the less money goes into maintaining it.

Performance optimization is driven by tasks: can we afford this amount of hardware for this particular task?



For example, spending, relatively speaking, a year of a highly qualified programmer's work to improve performance by 0.5% is not economically efficient unless the problem affects, say, all the servers in the data center. Then it becomes cost-effective, and we will do it. If not, it is easier for us to buy new servers, i.e. to solve the problem with hardware.
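This reasoning fits into a back-of-the-envelope check. A minimal sketch with entirely hypothetical figures, assuming hardware cost scales linearly with speed and comparing one year of savings against one year of engineering: a 0.5% gain on a small fleet does not pay for the work, while the same gain across a large enough fleet does.

```java
class OptimizationPayoff {
    // Annual hardware savings from a relative speedup, assuming linear scaling.
    static double annualSavings(double annualHardwareCost, double gainFraction) {
        return annualHardwareCost * gainFraction;
    }

    // Worth doing if one year of savings covers the engineering cost.
    static boolean worthIt(double annualHardwareCost, double gainFraction, double engineeringCost) {
        return annualSavings(annualHardwareCost, gainFraction) >= engineeringCost;
    }

    public static void main(String[] args) {
        double engineerYear = 150_000;                                  // hypothetical cost
        System.out.println(worthIt(1_000_000, 0.005, engineerYear));     // false: saves about 5,000
        System.out.println(worthIt(100_000_000, 0.005, engineerYear));   // true: saves about 500,000
    }
}
```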

For us, speed is a business metric with a clear business case. We keep working on speed only as long as it is economically efficient.



Despite all the questions about the performance of Java solutions, it is gratifying to see that work on these problems continues steadily, and Java itself, despite all the criticism, has long been an industry standard for large-scale, high-load corporate projects.






You will find more interesting technical, hardcore talks in the Joker 2016 program.






Source: https://habr.com/ru/post/307178/


