Looking for Performance: Monitoring JVM performance under Linux with BPF

Sasha Goldstein, a specialist in low-level application optimization, will step away from his usual .NET topic in his JPoint talk and discuss tools that help improve Java application performance on Linux. We interviewed Sasha ahead of the conference to find out what these tools are, who needs them, and why.

JUG.Ru Group: Could you tell us a little about yourself and your work?

Sasha Goldstein: My name is Sasha Goldstein, and for the last 10 years I have been working as CTO of Sela, an Israeli consulting company.
My work focuses on performance optimization, production diagnostics, monitoring, and all sorts of low-level tasks.
My typical work week is filled with a variety of tasks: I teach, fix bugs and performance problems for clients, and work on internal projects. I also serve on the program committees of a couple of conferences: our own SDP (Tel Aviv, Israel) and DotNext (Moscow and St. Petersburg, Russia), which takes a surprising amount of time.

“The performance of most applications is not determined by hardware or the runtime environment” - Sasha Goldshtein about monitoring Java performance under Linux

JUG.Ru Group: You usually talk a lot about .NET performance. What pushed you towards Java?

Sasha Goldstein: Indeed, most of my work involves C# and C++ on Windows. I have spent a lot of time optimizing .NET applications and diagnosing their performance problems. However, low-level optimization and debugging have a lot in common across technologies: the tools may have different names, but the general principles, methodology, and thought process are the same. Over the past couple of years I have become closely acquainted with BPF, a Linux tracing framework, and that gave me the idea of using BPF to analyze JVM performance.

JUG.Ru Group: How does performance work on Java differ from performance work on .NET?

Sasha Goldstein: As I said, many things are the same. The performance of most applications is determined not by the hardware or the runtime (JVM, CLR, Python, or something else) but by the environment: how the database is accessed, disk seek times, and how network requests are handled. For this class of applications, by and large, it does not matter which runtime you use. When it comes to low-level optimization, for example minimizing memory consumption, optimizing individual algorithms, or speeding up CPU-bound code, there are situations where the difference between platforms really matters, especially if you need to tune the runtime for your application. In general, the JVM is more flexible than the CLR, and it seems to me that in recent years more effort has gone into optimizing the various JVM implementations than into Microsoft's CLR.

JUG.Ru Group: When is performance work really necessary, given how “expensive” it is in terms of time? What signs clearly indicate that there is a performance problem?

Sasha Goldstein: Performance is often not treated as a functional requirement that has to be met. But even if you are not building real-time systems or super-fast client applications, there is probably some reasonable lower bound on speed that your users will not accept falling below. For example, a web API that takes 5 seconds to process a login request is likely to infuriate people. There is also the question of cost: performance optimization usually means that you need fewer hardware resources, which translates into direct cost savings, given the cloud-first policy many companies have adopted.
Hopefully, most teams have a process for setting performance targets and at least a simple way to monitor and verify them as development progresses.

JUG.Ru Group: Where should you start when investigating performance problems?

Sasha Goldstein: The crucial point is to have a good description of the system, for example a functional block diagram. Once you understand the “mechanics” of the system, meaning what the main components are and how they are connected, it is much easier to guess where the bottlenecks are and where to start looking for a problem. Tools are secondary. Before you run a bunch of tools, you need to understand what the various resources are, how they can become overloaded, and how to test your hypotheses in order to make progress. For example, you can spend days optimizing the CPU performance of some sorting algorithm and then discover that 99% of the time is spent fetching data from the database, so a more or less efficient sort makes no noticeable difference to the total execution time.
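
To put numbers on the sorting example above, here is a minimal sketch based on Amdahl's law. The 99%/1% split comes from the interview; the speedup factors (10x for sorting, 2x for the database) and the function name `overall_speedup` are assumptions chosen purely for illustration.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: total speedup when `fraction` of the run time
    is made `local_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical workload: 99% of the time in the database, 1% in sorting.
sort_gain = overall_speedup(fraction=0.01, local_speedup=10.0)
db_gain = overall_speedup(fraction=0.99, local_speedup=2.0)

print(f"sorting 10x faster -> {sort_gain:.3f}x overall")
print(f"database 2x faster -> {db_gain:.3f}x overall")
```

Under these assumed numbers, a tenfold faster sort improves the total run time by less than 1%, while merely halving the database time nearly doubles overall throughput. That is the argument for locating the bottleneck before reaching for any tool.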

JUG.Ru Group: Can you describe the main features of this kind of tooling, using BPF as an example?

Sasha Goldstein: BPF is a powerful kernel mechanism, introduced in recent versions of the Linux kernel, that lets you load dynamic tracing programs into the kernel. These programs are checked for safety and cannot crash the system, and they do not require compiling and loading kernel modules. As a result, we have a tracing framework that can work very close to the source of important events, such as network packet processing, disk requests, hardware interrupt handling, and the like. Anticipating your question, I should note that there are also JVM-specific events, which I will cover in my talk at JPoint: garbage collection, object allocation, blocking while waiting for a monitor to be released, and many others.
Moreover, BPF lets you build tools in which aggregation happens at the tracer level. For example, if you are interested in a latency histogram (say, of HTTP request latencies), you do not need to dump a million events and then post-process them to compute the histogram. Instead, your BPF program performs the aggregation in real time and hands over only the final result for analysis.
There is a very powerful toolkit developed by people from Facebook, Netflix, Plumgrid (VMware), and other companies (with my modest participation as well :-)).
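
The tracer-level aggregation described above can be illustrated with a small user-space simulation. This is not BPF code and does not use any BPF API: it is a sketch, with made-up latency data and hypothetical names (`log2_bucket`, `LatencyHistogram`), of keeping one counter per power-of-two bucket instead of storing every event.

```python
import random
from collections import Counter


def log2_bucket(value_us: int) -> int:
    """Index of the power-of-two bucket for value_us.
    Bucket b covers [2**b, 2**(b+1)); zero is folded into bucket 0."""
    return max(value_us, 1).bit_length() - 1


class LatencyHistogram:
    """Keeps one counter per bucket instead of one record per event."""

    def __init__(self) -> None:
        self.buckets = Counter()

    def record(self, latency_us: int) -> None:
        self.buckets[log2_bucket(latency_us)] += 1

    def render(self) -> str:
        if not self.buckets:
            return ""
        widest = max(self.buckets.values())
        lines = []
        for b in sorted(self.buckets):
            count = self.buckets[b]
            low = 0 if b == 0 else 2 ** b
            high = 2 ** (b + 1) - 1
            bar = "*" * max(1, count * 40 // widest)
            lines.append(f"{low:>8} -> {high:<8} : {count:<8} |{bar}")
        return "\n".join(lines)


# Simulated request latencies in microseconds (made-up data, fixed seed).
rng = random.Random(42)
hist = LatencyHistogram()
for _ in range(100_000):
    hist.record(int(rng.lognormvariate(7.0, 1.0)))

print(hist.render())
```

However many events are recorded, the state that has to be handed over for analysis is only a few dozen counters. That is why aggregating at the source scales to event rates where dumping every event for post-processing would not.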

JUG.Ru Group: How hard is it to introduce into a team's workflow and development process?

Sasha Goldstein: BPF is not difficult to use, because there are many tools that you invoke with a single command line to identify performance problems. For example, there is a tool called mysqld_qslower that prints slow MySQL queries.
The only problem is that you need a recent Linux kernel to use the BPF tools. Most of the functionality was included in Linux 4.1 and 4.4 (the latter is what you get in Ubuntu 16.04, for example), but other features require even newer versions, in particular 4.9, which most people do not have in production yet. Of course, you can work around this by upgrading only the kernel; that is how companies such as Facebook, Netflix, and others got all the benefits of BPF.
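
As a rough aid for the kernel requirement mentioned above, here is a minimal sketch that classifies a kernel release string against the versions named in the interview (4.1 and 4.9). The function names and the "none"/"partial"/"full" labels are hypothetical, and real feature availability also depends on distribution backports and kernel configuration, which this sketch does not check.

```python
import re

# Kernel versions taken from the interview; everything else is illustrative.
BASIC_BPF_TRACING = (4, 1)
NEWER_BPF_FEATURES = (4, 9)


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '4.4.0-21-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def bpf_support(release: str) -> str:
    """Coarse, hypothetical label for how much BPF tooling a kernel can run."""
    version = parse_kernel_release(release)
    if version >= NEWER_BPF_FEATURES:
        return "full"
    if version >= BASIC_BPF_TRACING:
        return "partial"
    return "none"


# Example release strings; on a Linux host, platform.release() from the
# standard library returns the running kernel's string in this format.
for release in ("3.13.0-24-generic", "4.4.0-21-generic", "4.9.0"):
    print(f"{release}: {bpf_support(release)}")
```

Comparing `(major, minor)` tuples rather than raw strings avoids the classic mistake of ordering "4.10" before "4.9".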

JUG.Ru Group: Could you give examples of typical performance pitfalls that BPF-based tools help with?

Sasha Goldstein: BPF tools are useful for diagnosing CPU-bound applications, time spent blocked (on locks or I/O), file access problems, slow database queries, slow network requests, garbage collection: a very wide range of problems, in fact. I will cover many of them in my talk.

JUG.Ru Group: Are there problems that only this toolkit lets you tackle?

Sasha Goldstein: Yes. When the tracer has to process a large number of events, BPF is irreplaceable. Even fairly simple scenarios, such as CPU profiling, can be made much more efficient by using BPF's profiling support. In most cases, tasks such as intercepting every incoming request and aggregating latency information are simply not practical with other performance analysis tools.
In my talk we will look at monitoring blocking, DNS resolution, MySQL queries, and a bunch of other problems that are typical of production systems.

JUG.Ru Group: Your talk is fairly practical. Who is it primarily aimed at?

Sasha Goldstein: My talk is intended for developers and operations engineers who build software for Linux. The focus will be on the JVM (this is JPoint, after all!), so all the examples will be in Java. We will look at a bunch of examples that I hope will help you diagnose problems in your own systems. And even if you do not have a recent enough Linux kernel today, you will in the very near future. I think every Linux developer will one day find a use for BPF tools.

~~If you have questions, suggestions, or comments, just ask: Sasha is ready to answer them in the comments.~~

~~P.S. Besides Sasha, Alexey Shipilev (@shipilev), Sergey Kuksenko (Walrus), Vladimir Sitnikov (vladimirsitnikov), and Nikolai Alimenkov (xpinjection) will also speak about performance at JPoint 2017. What exactly will they cover? See the list of talks.~~

~~And if you live in Siberia and can't make it to Moscow, we recommend taking a look at JBreak 2017.~~

UPD. On November 5 in St. Petersburg we are holding a training with Sasha, “Profiling JVM Applications in Production”; registration details and terms of participation are on the site.

Source: https://habr.com/ru/post/320620/
