Every year at JPoint, experts deliver hardcore talks on Java performance. And it is never boring: the topic has stayed relevant for years. We talked about where the performance myths come from, what the JVM actually does, how to measure performance, what customers' business requirements look like, and how to avoid the common pitfalls, with experts for whom Java performance is not a problem but a job.

Java Performance and the all-powerful JVM

What happens on the Java Virtual Machine (JVM) side, and how does it affect performance? We discussed this, and Java performance in general, with Volker Simonis, JVM Engineer at SAP. Volker answered the questions in English; here we publish the translation.
- Why do people say that Java is slow and that its performance problems are critical?

I think the perception of Java as something slow belongs to the past. Modern Java virtual machines have really good JIT compilers and garbage collection algorithms that deliver very good performance for typical applications. Of course, you can always find examples where Java is slower than native C/C++ applications. But systems such as Hadoop, Neo4j, or H2O show that even large, complex, highly loaded applications can be written in Java. Java performance is very high, and Java as a technology is highly competitive.
In my experience, people today complain more about the performance of "alternative" JVM languages such as Ruby (JRuby), Clojure, or Scala. I think the JVM will eventually optimize them as well as it optimizes pure Java code; it is only a matter of time.
- How do you evaluate and measure Java performance?

Measuring Java performance is a real science. It is no surprise that companies such as Oracle, Google, Twitter, or Odnoklassniki have dedicated performance teams staffed with very experienced and qualified engineers.
The familiar, naive approaches to measuring Java application performance will almost always mislead you.
Currently, the Java Microbenchmark Harness (JMH for short), created by Aleksey Shipilëv (note the "ë"; he insists on it) and shipped as part of OpenJDK, has established itself as the standard tool for measuring and analyzing the performance of small and medium-sized pieces of Java code. The complexity and ingenuity of the JMH implementation give you an idea of what it actually takes to measure the performance of a Java application accurately. His blog at http://shipilev.net/ is an indispensable source of information on Java performance and optimization.
And while JMH mainly covers measuring and optimizing the performance of Java code, the problem of improving GC performance remains. That is another story, requiring its own, different set of tools and expertise.
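Why naive measurements mislead is easy to demonstrate: the first executions of a method run interpreted and include JIT compilation overhead, which is exactly the kind of effect JMH is built to control for. Below is a minimal plain-Java sketch of the pitfall; the workload and iteration counts are made up for illustration, timings will vary by machine, and this is not a recommendation to hand-roll benchmarks instead of using JMH:

```java
public class NaiveBenchmark {
    // Workload under test: a simple sum of squares, enough to trigger JIT compilation.
    static long sumOfSquares(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += (long) i * i;
        return s;
    }

    public static void main(String[] args) {
        // A "cold" measurement includes interpreter time and compilation overhead...
        long t0 = System.nanoTime();
        long result = sumOfSquares(100_000);
        long cold = System.nanoTime() - t0;

        // ...so the code must be warmed up before any measurement is meaningful.
        for (int i = 0; i < 2_000; i++) sumOfSquares(100_000);

        long t1 = System.nanoTime();
        sumOfSquares(100_000);
        long warm = System.nanoTime() - t1;

        // Typically warm is far smaller than cold; JMH automates exactly this
        // kind of warm-up, plus dead-code and constant-folding protection.
        System.out.printf("cold=%dns warm=%dns result=%d%n", cold, warm, result);
    }
}
```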
- Where do the most frequent performance problems occur?

This, of course, largely depends on the type of your Java application. If we are talking about data-bound applications (that is, you run relatively simple code over huge amounts of data), it is probably most important to choose and configure your GC algorithm correctly. Even then, you first have to decide whether you want to optimize the application for maximum throughput, or whether latency matters more to you.
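Choosing and tuning a collector starts with seeing what the current one is doing. As a first step, you can ask the JVM which collectors are active and how much time they have consumed; a minimal sketch using only the standard `java.lang.management` API:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcInfo {
    public static void main(String[] args) {
        // Each MXBean corresponds to one collector generation,
        // e.g. "G1 Young Generation" / "G1 Old Generation" on a G1 JVM.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

A high and growing collection-time total under load is the signal that GC choice or heap sizing deserves attention before any code-level tuning.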
On the other hand, if your application is compute-bound, you can use tools like JMH and/or native profilers to verify that the virtual machine really inlines and optimizes all the hot execution paths.
For highly parallel applications it is important to choose the correct synchronization/locking policy (which may vary depending on the hardware and platform your application runs on), and for I/O-bound applications it is important to choose the right abstraction (i.e., blocking vs. non-blocking) and API.
But the golden rule is this: most of your effort should go into the correctness and quality of your algorithms, because no virtual machine will ever turn badly written code into a high-performance program. Fine-tuning at the JVM level, as explained above, should always be only the last step of the development cycle.
- You are a JVM expert. How often is the JVM itself the cause of poor performance? What are the main problems, and how do you deal with them?

Ideally, after a certain number of iterations, a Java application should reach a state in which JIT-compiled code runs at least 95% of the time, with regular but rather short GC pauses. This is easy to verify with tools such as JConsole, JITWatch, or Flight Recorder. If this is not the case, you need to find out which part of the virtual machine is causing the problem and tune (or optimize) the corresponding VM component.
You may know that the product build of the HotSpot JVM offers about 600 so-called advanced options (the options that start with -XX), and the debug build more than 1200 (use -XX:+PrintFlagsFinal for the complete list). With these options you can get detailed information about what is happening inside the virtual machine and, most importantly, fine-tune almost any of its subsystems. The bad news is that these options are not well documented, so you will probably have to read the sources to understand what is really going on.
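Beyond scanning the -XX:+PrintFlagsFinal dump, individual flag values can also be read from inside a running application through HotSpot's diagnostic MXBean. A minimal sketch, assuming a HotSpot-based JDK (the two flag names queried here are standard HotSpot flags):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class FlagPeek {
    public static void main(String[] args) {
        // HotSpot-specific bean; lives in the jdk.management module.
        HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

        // Query a couple of -XX flags at runtime; toString() shows
        // the value and whether it was set on the command line.
        System.out.println(hs.getVMOption("MaxHeapSize"));
        System.out.println(hs.getVMOption("UseCompressedOops"));
    }
}
```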
Some garbage collection problems can be solved by choosing the right garbage collector (the current version of HotSpot ships plenty of them) and configuring the heap correctly. Performance problems in generated code are often caused by wrong inlining decisions. Tuning can help here to a certain extent, but, alas, this is a very difficult topic, and it is hard to improve one part of the code without hurting the performance of other parts.
Locks, thread contention, and various cache effects are another frequent source of problems, but solving them requires deep knowledge of the virtual machine as well as of the internals of the hardware platform and operating system.
Finally, I want to note that lately we have increasingly seen problems with Java running in virtualized environments. The JVM performs a lot of self-configuration at startup in order to adapt optimally to its environment. But if the reported resources (such as the number of CPUs or the amount of available memory) do not match what is actually available to the JVM, this can lead to very poor performance.
For example, if the operating system reports a large number of available logical CPUs, the virtual machine will create many threads for garbage collection and JIT compilation. But if the host operating system then schedules those threads onto a very small number of physical CPUs, the Java application's user threads can be completely crowded out by the VM's own system threads. A similar problem arises when the guest operating system reports a large amount of memory to the JVM, but that memory is actually shared with other guest OSes.
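The inputs to this self-configuration are easy to inspect from Java itself; in a container or VM, comparing these numbers against the physically available resources is a quick sanity check. A minimal stdlib-only sketch:

```java
public class EnvProbe {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // These are the values the JVM bases its sizing decisions on:
        // GC and JIT compiler thread counts, default heap size, and so on.
        // If they diverge from the physical resources the hypervisor or
        // container actually grants, expect the problems described above.
        System.out.println("CPUs visible to the JVM: " + rt.availableProcessors());
        System.out.println("Max heap (bytes):        " + rt.maxMemory());
    }
}
```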
- How are the memory management algorithms in the JVM implemented, and what is being done so that developers hit fewer performance problems?

In fact, the HotSpot virtual machine uses several different memory management systems and strategies. The biggest, and probably the best known to a regular Java programmer, is of course the heap, together with the garbage collection algorithms that operate on it.
However, the JVM manages many other memory pools as well. There is the metaspace, which stores loaded classes, and the code cache, which stores code generated by the virtual machine (most notably JIT-compiled methods, but also generated interpreter chunks and various code stubs).
Finally, various VM subsystems, such as the GC or the JIT compiler, may temporarily require a substantial amount of native memory to do their work. This memory is usually allocated on the fly and maintained in various arenas, resource areas, or subsystem caches. For example, JIT compilation of a single large method may temporarily require gigabytes of native memory. All of these memory pools can be tuned to the requirements of a particular application.
- Please tell an interesting story from your practice: a Java performance problem and how it was resolved.

Our SAP JVM runs on many different platforms, including HP-UX on both PA-RISC and Itanium CPUs. HP-UX on Itanium provides software emulation for PA-RISC binaries, making it possible to run PA-RISC applications easily and transparently, albeit an order of magnitude slower than native applications.
After we received one too many customer complaints about the very poor performance of our JVM on HP-UX/Itanium, we started to investigate. It turned out that the customers were running the PA-RISC binaries on HP-UX/Itanium. Since we could not explain the cost of emulation to every customer, we simply added a check to the next version of our JVM that refuses to run the PA-RISC build under emulation.
Java Performance for the Enterprise

You will agree, it is an unpleasant moment when an application goes into production and comes back with indignant complaints from the customer about lagging and freezing. Java performance issues are critical above all for enterprise solutions. That is why, for hardcore practical advice, we turned to an expert: Vladimir Sitnikov, Performance Architect at NetCracker (a telecom solutions company).
“You can hear about possible Java performance problems both in training courses and in professional settings. And every joke contains a grain of truth. Java's slowness is partly a myth, especially since the platform's performance changes over time: what drew criticism 10 years ago now works completely differently. Talking about Java performance at the level of “it is slow / it is not slow” is imprudent.
In practice, the main problems in industrial applications come from unnecessary code: actions are performed beyond what is needed to solve the problem. Somewhere the developer got too clever, and somewhere the business requirements are so complicated that you cannot get it right the first time. Sometimes the hardware is fast enough anyway, and sometimes the solution turns out slower than expected. The main recommendation here is simple: do not execute unnecessary code, or at least execute less of it.
Another source of slowdowns lies in the algorithms: when, for example, brute-force search is used instead of looking a record up by a key. In practice, though, data structures cleverer than a HashMap are rarely needed to improve performance.
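The brute-force-vs-lookup point can be sketched in a few lines; the row layout and names below are made up purely for illustration:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupSketch {
    // Brute force: scan every row, O(n) per lookup.
    static String findLinear(List<String[]> rows, String key) {
        for (String[] row : rows) {
            if (row[0].equals(key)) return row[1];
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical (id, name) rows.
        List<String[]> rows = List.of(
                new String[]{"1001", "Alice"},
                new String[]{"1002", "Bob"});

        // Indexed: one O(n) build pass, then O(1) average per lookup.
        Map<String, String> index = new HashMap<>();
        for (String[] row : rows) index.put(row[0], row[1]);

        System.out.println(findLinear(rows, "1002")); // scans the list
        System.out.println(index.get("1002"));        // hashes straight to the entry
    }
}
```

On two rows the difference is invisible; on a million rows with frequent lookups, the linear scan is exactly the kind of "unnecessary code" described above.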
Third-party development tools, such as libraries and frameworks, can also affect performance. In practice, problems can occur at every level, and they do. Developers usually catch them in testing and analyze them with various profiling techniques (at the browser, application server, and database levels). Problems with third-party code are frequent, and the only way to catch them is to take measurements and carefully follow the development of that third-party code. In my practice, hardly a month goes by without such a find.
Sometimes replacing a single line of code speeds up inserting data into the database tenfold.

In general, the main thing in measuring and working with Java performance is not to forget about load testing. The business requirements in enterprise development are large, and correspondingly large are the tests that verify them and help catch unpleasant performance issues. For example, you must take into account that load in a corporate environment is uneven: in the morning everyone arrives and logs in, producing a peak; towards lunchtime the load falls to its minimum; and after lunch users ramp it back up with renewed energy. Of course, such nuances may be missed in a first analysis, yet they strongly affect where to expect massive load and how to write code to avoid performance problems.
You ask about the most common mistakes and pitfalls. Everyone has their own. For example, the choice of a particular processor or memory model rarely changes the picture; of course, if you compare hardware from 2006 and 2016, the model matters. Many questions lie in the plane of scalability. Many are searching for the holy grail of enterprise development: everyone wants to make a single measurement and predict from it how many servers and how much capacity are needed to deploy the application. Here the only way out is to test on a live server and to budget resources for fault tolerance.
In general, there is probably one universal optimization tool: the inevitability of testing. We have specialists in load testing and result analysis, but the main effect is achieved when the developer is personally responsible for the speed of the code. Performance is not something you can simply bolt on after the fact. It is hard to predict where the problem will be, and it is not enough to suspect every line of the source code. A common problem is that a test was run incorrectly, or on the wrong amount of data, or under the wrong load and configuration. For example, a library was not updated during testing, but the customer went and updated it: consider the problem missed.
I remember projects that tried to sidestep the performance question, but they were either small demo projects or projects that came back very quickly with customer complaints about the application being slow.
Often it is not a single global error but a combination of two or three problems, each of which looks harmless on its own, that degrades performance or brings the whole application down. And the worst part is that the developer may never learn about the problems in his code: if technical support engineers can solve the problem themselves, they rarely seek out the author of the code in order to “reward” him.
Speaking of complaints: if the customer is unhappy, there is a simple algorithm for figuring out where to look. Ask the customer's representative how the system should behave. If everything feels slow and you suspect it is a matter of subjective perception, ask exactly how long an operation should take. Such requirements, of course, are best collected in advance (during development, or before it starts). We split the requirements by component: for example, if the user wants the page to open in 5 seconds, we allocate a budget of 1 s to the browser, 3.5 s to the server side, 0.5 s to the network, and so on. Often the breakdown goes right down to the components of the system itself, which log their execution times, so developers can open a profiler and check whether their code fits within its budget.
So, there are problems. Where do you look, and where do they come from?
- Look at the CPU load and the contents of the GC logs. If the system is loaded, and the load is caused by GC activity, then either there is too little memory (you need to increase -Xmx), or someone is simply eating too much of it.
- A sustained 50–100% CPU load by itself already means that response times can be wildly unpredictable.
- Sometimes third-party software is at fault. A harmless update can bring a pack of slowdowns "as a gift." A canonical example is the transparent huge pages mechanism in Red Hat Linux 6+ (CentOS 6+): a "feature" that makes both Java applications and databases "suddenly" start to slow down. THP mode must be disabled. And when such tuning of hardware and software is performed manually, the human factor is never excluded, with slowdowns as the result.
- Ask the classic questions: "did it work before?" and "what changed?". Find out what was done on the client side and whether updates were rolled into system components. If they were, the search vector becomes clearer. If there is no clarity and memory is free, look in the profiler at what activity is consuming the resources. Beyond that, everyone has their own story with their own code.
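The first checklist item, correlating load with GC behavior and heap pressure, can also be approximated from inside the application. A minimal sketch using the standard memory MXBean (the thresholds and their interpretation are up to you; this only surfaces the numbers):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();

        long max = heap.getMax(); // may be -1 if the max is undefined
        if (max > 0) {
            double used = (double) heap.getUsed() / max;
            // A heap that stays near 100% of -Xmx even after full GCs is the
            // "either too little memory or someone eats too much" case above.
            System.out.printf("Heap used: %.0f%% of -Xmx%n", used * 100);
        } else {
            System.out.println("Heap used (bytes): " + heap.getUsed());
        }
    }
}
```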
Now let's look at the performance problem closer to real conditions, closer to the client. An application often behaves completely unexpectedly in production even though everything was perfect in tests, and there is an explanation for that. If the situation reproduces only at the client's site, this may point to a problem with the input data. For example, we know the application has to handle operations for 1,000,000 clients. We generate 1,000,000 random names and surnames, the tests pass, searching for a client by full name flies. We go to production, and the users are indignant. We take a closer look at the problematic search, and it turns out there are a thousand clients named Ivanov Ivan Ivanovich. If the load tests do not account for that many repetitions, it is no surprise that in the tests the couple of matching rows appear instantly, while in production the same search takes forever.
Generating a correct data set for tests is a whole science in itself. You need to create not just, say, 100 GB of meaningful data, but also to take the relationships within the data into account: it should be as close as possible to what the system will see in operation.
- The ideal case is having a dump of the customer's database. It can be used directly or pre-masked. The masking should preserve the existing relationships and links between objects while still ensuring data security. For example, simply replacing all values with their MD5 hashes is not safe enough: even a Google search for an MD5 hash can turn up the "original" value. It is more correct to use an HMAC, so that identical data still hashes to identical values, yet recovering the originals is impossible without the key.
- It also happens that there is no dump. In that case, the parameters of the future system are studied, and scripts are written that generate test data based on those parameters. Even distributions should be avoided, because in reality distributions are rarely even.
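The HMAC-based masking described above can be sketched with the standard javax.crypto API; the key and the field value here are hypothetical, and real masking would of course cover every sensitive column consistently:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

public class Anonymize {
    // Same input + same secret key -> same output, so joins and repeated
    // values (the thousand Ivanovs) survive masking; without the key,
    // the original value cannot be recovered by a dictionary lookup.
    static String mask(String value, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] digest = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = "keep-this-secret".getBytes(StandardCharsets.UTF_8); // hypothetical key
        // Two identical inputs produce the same hash; a different input does not.
        System.out.println(mask("Ivanov Ivan Ivanovich", key));
        System.out.println(mask("Ivanov Ivan Ivanovich", key));
    }
}
```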
What else can be said about Java performance for the enterprise? Collect the customer's non-functional requirements: a breakdown of the usage scenarios, how often users interact with each application module, the number of users, and the expected working hours. Fixing performance problems at later stages can be very expensive, and every extra second a call-center operator spends waiting translates into serious losses.”
If you have questions about the experts' statements, or you want to share your own take on Java performance, welcome to the comments, and to the JPoint 2016 conference on April 22-23. Everyone will be there!