Java code optimization tips: how not to step on a rake

Good evening, colleagues.

Translation of the article, which we will offer you today, is designed to help answer the question: has there been a need for an entire book on optimizing Java code? We hope that the material will not only seem interesting to you, but also useful in practice. Please do not forget to vote.

In this article, I will present a few tips on optimizing Java code. I will specifically consider specific operations in real Java programs. These tips, in essence, are applicable in specific scenarios that require high performance, so there is absolutely no need to write all the code in this manner, since usually the speed gain will be miserable. However, in the hottest areas, the difference can be substantial.

Use the profiler!

Before embarking on any optimization, the developer must make sure that he correctly assesses the performance. Maybe the code snippet that seems slow to us in fact just masks the true source of the slip, so no matter how much we optimize the “obvious” source of delay, the effect will be almost zero. In addition, you need to choose a control point by which it would be possible to compare whether your optimization gives any effect, and if so, which one.
')
To achieve both of these goals, it is best to use the profiler. It provides tools to determine exactly what part of your code runs slowly, how long it takes to execute this code. I can recommend two profilers - VisualVM (free) and JProfiler (paid - but absolutely worth the money).

Armed with such information, you can be sure that you optimize exactly the code you need - and that the effect of the changes you make can be measured.
Let's go back a step and think about how to approach the problem.

Before attempting to move on to point optimization of a particular code execution path, you need to think about which path the code is currently running. Sometimes the chosen approach is fundamentally flawed - for example, you can speed up this code by 25% at the cost of incredible efforts and every imaginable optimization, but if you change the approach (choose a different algorithm), the execution of the code can be accelerated by an order of magnitude and even more. This often happens when the scale of the data to be processed is dramatically changing. It can be easy to write a solution that will work in this particular case, but it may not be suitable for working with real data.

Sometimes the output is trivial - just change the structure in which you store your data. Here's an imaginary example for you: if the program usually accesses your data in random order, and you store it in LinkedList , then it is enough to switch to ArrayList - and the code will be executed much faster. When working with large data sets and solving problems where performance is critical, it is extremely important to choose the right data structure that meets the form of your data and the operations that are performed on it.

It is always advisable to look back and think about whether the code that you are trying to optimize is effective in itself, or it slows down just because it is clumsily written, or because not the best execution path is chosen for it.

Comparison of streaming APIs and the good old for loop

Threads are a great new feature in the Java language, making it easy to re-do peeking code snippets by abandoning for loops in favor of more versatile, reusable code blocks that guarantee reliable execution. However, you have to pay for such amenities: when using threads, performance decreases. Fortunately, this price, apparently, is not too high. In the case of the most popular operations, you can get both a few percent acceleration and a slowdown of 10-30%, however, this point should be kept in mind.

In 99% of cases, performance degradation when using threads is more than compensated due to the fact that the code becomes much clearer. But in that 1% of cases, when the flow from you, perhaps, will be used in a very active cycle, it is worth thinking about a certain compromise in favor of performance. This is especially true for applications with high bandwidth, makes you think that working with streaming APIs is associated with active memory allocation (in this thread on StackOverflow we read that each new filter eats up another 88 bytes of memory), so the pressure on memory may increase. In this case, you have to run the garbage collector more often, which has a very negative impact on performance.

Parallel streams are another story. Despite how easy it is to work with them, they should be used only in rare cases and only after you have seen from the results of the profiling of parallel and sequential operations that the parallel runs faster. When working with small data sets (the size of the data set is determined depending on how costly the streaming operations are when working on it), the costs of distributing tasks, scheduling them among other threads, and then stitching the results after the processing of the stream ends will incomparably overlap gain in speed due to parallel computations.

You also need to pay attention to the exact environment in which your code runs. If we are talking about a highly parallelized environment (for example, a site), then you are unlikely to speed up its work by adding another stream there. In fact, under high loads, this situation may be even more vicious than non-parallel execution. The fact is that if the workload is parallel by its nature, then the program will most certainly use the remaining processor cores as efficiently as possible - that is, you spend resources on the separation of tasks, and you do not add computational power.

I made a series of control measurements. testList is an array of 100,000 items, consisting of numbers from 1 to 100,000, converted to strings and then mixed.

 // ~1 500 / public void testStream(ArrayState state) { List<String> collect = state.testList .stream() .filter(s -> s.length() > 5) .map(s -> "Value: " + s) .sorted(String::compareTo) .collect(Collectors.toList()); } // ~1 500 / public void testFor(ArrayState state) { ArrayList<String> results = new ArrayList<>(); for (int i = 0;i < state.testList.size();i++) { String s = state.testList.get(i); if (s.length() > 5) { results.add("Value: " + s); } } results.sort(String::compareTo); } // ~8 000 / //  :     10 000            testStream public void testStreamParrallel(ArrayState state) { List<String> collect = state.testList .stream() .parallel() .filter(s -> s.length() > 5) .map(s -> "Value: " + s) .sorted(String::compareTo) .collect(Collectors.toList()); }

So: threads help a lot with code support and increase its readability, and in most cases they neglect performance. However, it is necessary to take into account the possible costs in those rare cases when it is really necessary to squeeze out of the loaded cycle all the performance to a drop.

Transfer date and operation with it

Do not underestimate the costs arising, for example, when parsing a string with a date into a date object and formatting a date object into a date string. Imagine a situation where you have a list of a million objects (these are either regular strings or some objects that represent an element as a data field, backed by a string) - and the entire list needs to be corrected for a given date. If this date is presented as a string, you will first need to parse this string to convert it to a Date object, update the Date object, and then re-format it as a string. If the date is already represented as a Unix timestamp (or as a Date object, which is, in fact, just a wrapper around the Unix timestamp), then you will need to do a simple arithmetic operation, addition or subtraction.

My tests show that the program runs up to 500 times faster if you simply operate on a date object, rather than parse it, convert it to a string, and vice versa. Even if we simply exclude the parsing stage, hundredfold acceleration is still achieved. This example may seem far-fetched, but I'm sure you know of cases where date values were stored in the database as strings, and also returned as strings in API responses.

 // ~800 000 /c public void dateParsingWithFormat(DateState state) throws ParseException { Date date = state.formatter.parse("20-09-2017 00:00:00"); date = new Date(date.getTime() + 24 * state.oneHour); state.formatter.format(date); } // ~3 200 000 / public void dateLongWithFormat(DateState state) { long newTime = state.time + 24 * state.oneHour; state.formatter.format(new Date(newTime)); } // ~400 000 000 / public long dateLong(DateState state) { long newTime = state.time + 24 * state.oneHour; return newTime; }

So, always consider the costs associated with parsing and formatting date objects, and if there is no need to keep them as strings, it is much more sensible to represent the date as a Unix timestamp.

Row operations

Manipulating lines is perhaps one of the most common operations in any program. However, if you do it incorrectly, it can be expensive. That is why I pay so much attention to working with strings in this article on Java optimization. Below we look at one of the most frequent pitfalls. However, I want to further emphasize that such problems manifest themselves only when executing the fastest code fragments, or when one has to deal with a significant number of lines. In 99% of cases, none of the following will happen. However, if such a problem arises, it can have a deadly effect on performance.

Using `String.format` when simple concatenation could work

The simplest call to String.forma t is about 100 times slower than when manually concatenating values into a string. As a rule, this is acceptable, since on my machine we are still dealing with millions of operations per second. However, in the case of a loaded cycle that operates with millions of elements, a drop in performance can be noticeable.

However, there is one case where _ _ string formatting, and not concatenation, even in an environment with high performance requirements — I'm talking about debug logging. Consider two challenges occurring in this context:

 logger.debug("the value is: " + x); logger.debug("the value is: %d", x);

The second case (which at first glance may seem illogical) in production, it happens, works faster. Since it is unlikely that logging of debugging information will be enabled on your production servers, in the first case the program selects a new line, which is then not used (since the log is not output). In the second case, you need to load a constant string, after which the formatting step is skipped.

 // ~1 300 000 / public String stringFormat() { String foo = "foo"; String formattedString = String.format("%s = %d", foo, 2); return formattedString; } // ~115 000 000 / public String stringConcat() { String foo = "foo"; String concattedString = foo + " = " + 2; return concattedString; }

Not using row builder inside a loop

If you do not use the string builder inside the loop, then the code performance drops dramatically. In a simplified implementation, we would increase the line inside the loop using the += operator, thus attaching the new part of the line to the existing one. The problem with this approach is that with each iteration of the loop a new line will be allocated, and the old line at each iteration will have to be copied to a new one. Even by itself, this operation is costly, not to mention the extra load associated with the additional garbage collection required to create and discard such a number of rows. Using StringBuilder , we limit the number of memory allocations, which will allow us to greatly improve performance. In my tests, it was thus possible to speed up the program more than 500 times. If, when creating a row builder, you can, at a minimum, confidently assume what size the resulting string will be, you can speed up the code by another 10% by specifying the correct size in advance (in this case, you will not have to recalculate the size of the internal buffer and get rid of selection operations copy).

Also note that (almost) I always use StringBuilder , not StringBuffer . StringBuffer designed to work in multi-threaded environments and that is why it is equipped with internal synchronization. Costs for such synchronization have to be borne even in a single-threaded environment. If you need to grow a string with data coming from many streams (for example, in an implementation with journaling), this is one of the few situations where you should use StringBuffer , and not StringBuilder .

 // ~11    public String stringAppendLoop() { String s = ""; for (int i = 0;i < 10_000;i++) { if (s.length() > 0) s += ", "; s += "bar"; } return s; } // ~7 000    public String stringAppendBuilderLoop() { StringBuilder sb = new StringBuilder(); for (int i = 0;i < 10_000;i++) { if (sb.length() > 0) sb.append(", "); sb.append("bar"); } return sb.toString(); }

Using row builder outside of loop

I came across recommendations on the Internet to use row builder outside the loop — and this even seems appropriate. However, my experiments have shown that in fact the code runs three times slower than with += - even if the StringBuilder is out of the loop. Although += in this context turns into calls to StringBuilder made by javac , the code is much faster than by using StringBuilder directly, which surprised me.

If anyone has a version, why this happens - please share in the comments.

 // ~20 000 000    public String stringAppend() { String s = "foo"; s += ", bar"; s += ", baz"; s += ", qux"; s += ", bar"; s += ", bar"; s += ", bar"; s += ", bar"; s += ", bar"; s += ", bar"; s += ", baz"; s += ", qux"; s += ", baz"; s += ", qux"; s += ", baz"; s += ", qux"; s += ", baz"; s += ", qux"; s += ", baz"; s += ", qux"; s += ", baz"; s += ", qux"; return s; } // ~7 000 000    public String stringAppendBuilder() { StringBuilder sb = new StringBuilder(); sb.append("foo"); sb.append(", bar"); sb.append(", bar"); sb.append(", baz"); sb.append(", qux"); sb.append(", baz"); sb.append(", qux"); sb.append(", baz"); sb.append(", qux"); sb.append(", baz"); sb.append(", qux"); sb.append(", baz"); sb.append(", qux"); sb.append(", baz"); sb.append(", qux"); sb.append(", baz"); sb.append(", qux"); sb.append(", baz"); sb.append(", qux"); sb.append(", baz"); sb.append(", qux"); sb.append(", baz"); sb.append(", qux"); return sb.toString(); }

So, the creation of lines is associated with obvious costs, so in the cycles it is necessary to avoid such practices whenever possible. This is easy to achieve - just use StringBuilder inside the loop.

I hope you find the tips on optimizing Java code outlined here useful. Once again, in most contexts, the techniques described here will not be useful to you. It makes no difference how many times per second you will have time to format a line — a million times or 80 million times, if you need to do just a few such operations.

But in those critical cases where we can really talk about millions of such operations, eighty-fold code acceleration can save you a lot of time.

Having written this article, I collected a zip-archive with all the data mentioned here, and below I give a conclusion after checking all the control points. All results were obtained on a PC with i5-6500. The code started with JDK 1.8.0_144, VM 25.144-b01 on Windows 10

All code can be downloaded here on GitHub .

Source: https://habr.com/ru/post/358898/

All Articles