“We can test Java better than Oracle” - an interview with Andrey Pangin from Odnoklassniki

Today I have prepared a long interview with Andrey Pangin (aka apangin), lead engineer at Odnoklassniki. Andrey worked as a JVM engineer at Sun Microsystems for more than 6 years, including on the HotSpot team, and has spent the last 5 years at Odnoklassniki solving JVM and performance issues. So Andrey is rightfully considered one of the strongest JVM experts in Russia.

Andrey is an expert in system programming who works on storage and data transfer systems. He laid the bricks that form the foundation of the Odnoklassniki portal and keep its services reliable and fast.
Here is what Andrey and I talked about:

the cost of moving from Java 7 to Java 8;
what is happening to sun.misc.Unsafe;
Odnoklassniki's architecture;
engineering trade-offs, sharding and GC;
data storage systems and Cassandra;
where Odnoklassniki is ahead of the rest, and what it should still learn from Google;
how to become a cool system programmer.

(I know the intro drags a bit. We will work on warming up faster and getting to the point.)

For those who, as usual, have no time to watch the video, a transcript of the interview follows below.

About moving from Java 7 to Java 8

- While working deep inside Sun, you saw Java from the inside. After moving to Odnoklassniki, you began to see it from the client side. As an application developer, you have lived through Java 6, 7 and 8. What's your impression: has it gotten better or worse? Do you feel, for example, that JDK 8 is clearly a better product than JDK 7, or maybe the other way around?

- There are bugs in every version, and we're used to that. Clearly the JVM is a very complex system, and sometimes it's impossible to predict in advance how the JIT compiler will behave in a particular scenario. The most important thing is that 99% of our portal runs on it, and there have been no major outages since 2013. So we are satisfied with Java.

- And if it's not a secret, what percentage of Odnoklassniki runs on Java 7, and what percentage on Java 8?

- We still have systems on Java 6, even. Fortunately, there are very few of them. Some services haven't been restarted for years: what's the point of upgrading them? They just keep working. Everything new that we launch now starts on JDK 8 right away. The services that are deployed often have almost all moved to JDK 8 already. Right now, probably about 60 percent runs on Java 8, and every week something else migrates.

- What are the difficulties when migrating from Java 7 to Java 8?

- Surprisingly, we saw fewer problems switching from Java 7 to Java 8 than when we migrated from Java 6 to Java 7. Some things worked differently there: for example, sorting started throwing exceptions where it hadn't before, there were changes in Unicode support, etc.
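The sorting change mentioned here is most likely the Java 7 switch of object sorting (`Arrays.sort`, `Collections.sort`) to TimSort. TimSort can throw `IllegalArgumentException: Comparison method violates its general contract!` when a comparator is inconsistent. The old merge sort can still be requested with `-Djava.util.Arrays.useLegacyMergeSort=true`. Below is a minimal sketch of one classic culprit, a subtraction-based comparator that overflows, next to its fix. The class and field names are illustrative.

```java
import java.util.Arrays;
import java.util.Comparator;

public class ComparatorContract {
    // Broken: a - b overflows when the operands are far apart
    // (Integer.MIN_VALUE - 1 wraps around to Integer.MAX_VALUE), so the sign
    // of the result can be wrong and the ordering is no longer consistent.
    static final Comparator<Integer> BROKEN = (a, b) -> a - b;

    // Correct: Integer.compare never overflows and respects the Comparator contract.
    static final Comparator<Integer> CORRECT = Integer::compare;

    public static void main(String[] args) {
        System.out.println(BROKEN.compare(Integer.MIN_VALUE, 1) > 0); // true: wrong sign
        Integer[] values = {3, Integer.MIN_VALUE, 1, Integer.MAX_VALUE, -7};
        Arrays.sort(values, CORRECT);
        System.out.println(Arrays.toString(values));
    }
}
```

A broken comparator does not always trigger the exception. TimSort only reports a violation when it happens to detect one, which is why such bugs can surface only under production data.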

When switching to Java 8, there were no such problems. But we ran into bugs in the JIT compiler.

In general, it sometimes seems that if a bug exists, we will definitely hit it. Apparently we have so many servers and such loads that we can test Java better than Oracle itself.

For example, we have now disabled in production the tiered compilation that the Oracle folks worked so hard on. The C1 compiler has a critical bug, so we compile directly with C2.
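On HotSpot, tiered compilation is turned off with the `-XX:-TieredCompilation` flag, presumably what is meant here, so that hot methods go straight to C2. As a sketch, you can check the setting actually in effect at run time through the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean` API. `TieredCheck` is a hypothetical name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class TieredCheck {
    /** Returns "true" or "false": whether this HotSpot JVM uses tiered (C1 + C2) compilation. */
    static String tieredCompilation() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the JVM has no such option.
        return hotspot.getVMOption("TieredCompilation").getValue();
    }

    public static void main(String[] args) {
        // Started as "java -XX:-TieredCompilation TieredCheck", this reports false.
        System.out.println("TieredCompilation = " + tieredCompilation());
    }
}
```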

- Has this bug already been fixed?

- Yes, it seems they fixed it in JDK 8u60. To give the Oracle folks their due, they clean up JIT compiler problems fairly quickly. But there were serious vulnerabilities: you could write the simplest Java code that would bring down the whole virtual machine with a crash. By the way, version 8u40 included fixes for two compiler bugs, and I made those fixes myself.

About virtualization, Unsafe and Odnoklassniki

- How big is Odnoklassniki as a system? How many subsystems and layers are there?

- Lots. There are about two hundred modules, and the largest ones contain ten thousand Java classes. You can find comments in the source dating back to 2004. But, as you know, Odnoklassniki started out on C#. Then a team of several people literally rewrote everything in a couple of months of intense days and sleepless nights. So nothing is left of the previous version of Odnoklassniki.

- Odnoklassniki is now a highly loaded Java project, the largest Java project in Russia. Are there many similar projects in the world?

- I know that by traffic we are in the world's Top 100 according to Alexa, and in Russia we're seventh now, as I recall. Among Java projects, we're probably one of the largest.

- VKontakte and Facebook, which had a lot of PHP code, released their own virtual machines for PHP. Where does this trend toward virtualization come from?

- Virtualization is convenient! In a broad sense, it lets you work at the level you're used to. The end developer shouldn't have to care which microinstructions of the architecture the code turns into and how the processor executes them.

- I remember that in 2012 you talked a lot about Unsafe. Now practically everyone in the Java community talks about Unsafe. I want to accuse you of promoting unsafe programming techniques.

- I don't promote them, I use them! Because not everything we need is available in Java.

Java was originally conceived as a hardware-independent platform: write once, run anywhere. But we run Odnoklassniki on 64-bit Linux on Intel processors. We don't need that kind of Java portability. We do want to use as much as possible of the capabilities of the operating system and the hardware we run on. And for that, you have to find loopholes that reach deep into the operating system.

Here's an example off the top of my head. We have a lot of caches: out of the 8,000 servers Odnoklassniki has, almost half are caches of various kinds. When snapshots are written, data from RAM is dumped to disk. If this is done in pure Java, RAM consumption grows wildly, because the OS keeps the written data in its page cache. Only a Linux-specific hint can tell the OS: “Don't cache this data, we won't need it in the near future.”

- Can't this be done just by tuning Linux? I don't quite understand why the application has to get involved. Why is this knob on the application side?

- To Linux it's all the same: files written by Java or by any other process in the system. Only the programmer knows what should be cached and what shouldn't. So the initiative must come from the application.
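The page-cache hint itself needs a native call that the standard Java API does not expose. For a flavor of the loopholes sun.misc.Unsafe opens, here is a minimal sketch that allocates memory outside the Java heap, the kind of storage off-heap caches are typically built on. It is not Odnoklassniki's code. It relies on obtaining the unsupported `theUnsafe` field via reflection, so it is specific to HotSpot/OpenJDK. Recent JDKs have deprecated these memory-access methods and may print warnings. `OffHeapDemo` is a hypothetical name.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapDemo {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Unsafe keeps its singleton in a private static field named "theUnsafe";
    // reflection is the usual (unsupported) way to reach it from application code.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    /** Writes 0..count-1 as longs into a fresh off-heap block, then sums them back. */
    static long sumOffHeap(int count) {
        long bytes = (long) count * Long.BYTES;
        long address = UNSAFE.allocateMemory(bytes); // invisible to the garbage collector
        try {
            for (int i = 0; i < count; i++) {
                UNSAFE.putLong(address + (long) i * Long.BYTES, i);
            }
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += UNSAFE.getLong(address + (long) i * Long.BYTES);
            }
            return sum;
        } finally {
            UNSAFE.freeMemory(address); // off-heap memory must be released manually
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(1000));
    }
}
```

The trade-off is the point Andrey makes: such memory adds no GC pressure, but bounds checks and freeing become the programmer's job.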

About high loads

- Which parts of Odnoklassniki are the most heavily loaded?

- One of the most loaded is the Feed service, the page that opens as a news feed when a user visits the main page. Here, for each user, data is collected from many different sources.

Another example is the instant delivery of push notifications about friend requests, news, messages and gifts. There, a single server handles up to 50,000 requests per second.

Some of the push notifications are actually implemented via long polling: the client sends long-lived HTTP requests that hang for up to 10 minutes or until data arrives. On the backend there is a separate service that knows at any moment which frontend (or frontends) each user is connected to. A person can have several clients open, for example on a mobile device and in the web version of the portal. To give you an idea, there are about a thousand such machines.
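The server side of long polling can be sketched with a blocking queue per user. A request waits until an event arrives or a timeout expires (up to 10 minutes in the setup described above). This is a simplified, hypothetical illustration. It assumes one client per user, and it parks a thread per waiting request. A server handling tens of thousands of requests per second would more likely use asynchronous, non-blocking I/O.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LongPollMailbox {
    // One pending-event queue per user (hypothetical simplification: one client per user).
    private final Map<Long, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

    private BlockingQueue<String> queueFor(long userId) {
        return queues.computeIfAbsent(userId, id -> new LinkedBlockingQueue<>());
    }

    /** Called by the backend when a notification for userId appears. */
    public void publish(long userId, String event) {
        queueFor(userId).offer(event);
    }

    /**
     * Called when a long-poll request arrives: returns as soon as an event is
     * available, or empty once the timeout expires and the client should re-poll.
     */
    public Optional<String> poll(long userId, long timeout, TimeUnit unit) throws InterruptedException {
        return Optional.ofNullable(queueFor(userId).poll(timeout, unit));
    }
}
```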

- So that's a significant share of all eight thousand servers. Why not use WebSockets, for example?

- Unlike WebSockets, long polling is supported almost everywhere. Of course, there are still outdated, ancient browsers without WebSocket support, Internet Explorer 8 for example. But we plan to drop it soon.

About normalization and denormalization

- In the enterprise world there is usually a lot of Java EE, a lot of Hibernate, Spring and so on. What technologies do you use?

- One of our largest modules, responsible for most of the business logic, is traditionally called odnoklassniki-ejb. But in fact, nothing of EJB is left in it today. Last year I set myself the task of cutting EJB out of the project. We had a three-day hackathon where developers could pick any project they usually don't have time for. I decided to get rid of the enterprise stack in our main module completely. Now it's a regular Java application. That's it.

As for Spring, we use a lot of it.

- And what about Hibernate?

- No, no, not Hibernate! We need very precise, complete control over which queries we execute and how. Some of our systems and storage have traditionally remained on Microsoft SQL Server, but we're moving more and more toward NoSQL solutions now.

For SQL databases to cope with the load, you have to give up a number of features. In particular, we don't use joins, triggers or stored procedures.

- So students are taught to "normalize", then they come to you and you tell them to "denormalize"?

- Something like that, yes. Of course, wherever we can shift load from the SQL servers to the business logic servers, we do it. The reason is simple: we pay for the processor power our SQL servers run on.

- Why? And what's wrong with joins?

- Our databases are distributed and partitioned. How would you do a join if each server holds 1/16 of the data?

- Do modern sharding systems work poorly? Surely there's some good enterprise solution...

- Everything that calls itself enterprise actually doesn't hold up under our loads. Yes, it does well in banking and in other places where the load is lower but the reliability requirements are stricter. In many systems, eventual consistency is good enough for us. What difference does it make whether your friend sees a second sooner or later that you changed your avatar?

About engineering compromises, sharding and a good garbage collector

- In general, I really like the topic of trade-offs. Shall we talk about it? What trade-offs do you make at Odnoklassniki? I understand that performance comes first.

- Anyway, even if you're a front-end developer, you have to think about performance. Say two lists come in, and you need to merge them into one.

Sure, it's easier to write the simplest quadratic algorithm. But you may know that your service will become popular and grow, and the lists will come in with 10 thousand or 100 thousand elements each... Then you start thinking. After all, a quadratic algorithm will be slow.
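Here is a minimal sketch of the two approaches to merging two lists without duplicates. It assumes "combine" means a deduplicated union that keeps first-seen order, and uses `Long` IDs as hypothetical elements. The first version calls `List.contains`, a linear scan, for every element. The second uses a hash set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MergeLists {
    // Quadratic: List.contains scans the result list, so merging n + m elements
    // costs O((n + m)^2); fine for 100 elements, painful for 100,000.
    static List<Long> mergeQuadratic(List<Long> a, List<Long> b) {
        List<Long> result = new ArrayList<>();
        for (Long x : a) if (!result.contains(x)) result.add(x);
        for (Long x : b) if (!result.contains(x)) result.add(x);
        return result;
    }

    // Linear on average: LinkedHashSet gives O(1) expected membership checks
    // and preserves first-seen order, so the output matches mergeQuadratic.
    static List<Long> mergeLinear(List<Long> a, List<Long> b) {
        Set<Long> seen = new LinkedHashSet<>(a);
        seen.addAll(b);
        return new ArrayList<>(seen);
    }
}
```

Both methods return the same result. Only the growth rate differs, which is exactly what matters once the lists get large.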

- But other teams probably have priorities other than performance? And about having no joins: if the data is denormalized, doesn't that use a lot of disk space?

- It depends. Again, it's a trade-off, as you say. Either we accept taking up a bit more space, denormalize, and store the data right in the fields of the same table. Or we do the join on the business logic side: we get the IDs and then use them to query other subsystems.

I like to give a simple, clear example: how do you get a list of all your friends, with names, avatars and so on? First, a request goes to a separate subsystem, the relationship graph, which returns your friends' IDs for your ID. The graph holds no information other than IDs. Having received an array of IDs, we make a second request to the user caches, which return user information by those IDs.
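The two-step lookup above is essentially a join performed in application code. Below is a minimal sketch with hypothetical `FriendGraph` and `UserCache` interfaces standing in for the remote subsystems. These are not Odnoklassniki's real APIs. The sketch also assumes the cache accepts a batch of IDs in one call.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class FriendsJoin {
    record UserProfile(long id, String name, String avatarUrl) {}

    // Hypothetical: the relationship graph knows only IDs.
    interface FriendGraph { List<Long> friendIds(long userId); }

    // Hypothetical: the user cache returns profiles for a batch of IDs.
    interface UserCache { Map<Long, UserProfile> getAll(Collection<Long> ids); }

    static List<UserProfile> friendsWithProfiles(long userId, FriendGraph graph, UserCache cache) {
        List<Long> ids = graph.friendIds(userId);            // request 1: IDs only
        Map<Long, UserProfile> profiles = cache.getAll(ids); // request 2: one batched lookup
        List<UserProfile> result = new ArrayList<>();
        for (Long id : ids) {                                // "join" in application code
            UserProfile p = profiles.get(id);
            if (p != null) result.add(p);                    // skip IDs the cache did not return
        }
        return result;
    }
}
```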

This data is sharded, of course. In our remoting module we have implemented a kind of MapReduce. The system itself can distribute requests across shards, execute them in parallel and then collect the results. The keys know which shard they are stored on.
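Below is a minimal scatter-gather sketch of that idea: group keys by shard, query each shard in parallel, then merge the partial results. The routing rule (`floorMod` of the key) and the `queryShard` callback are hypothetical stand-ins for the real key-to-shard mapping and remote call.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

public class ShardedFetch {
    // Hypothetical routing rule: the key alone determines its shard.
    static int shardOf(long key, int shardCount) {
        return (int) Math.floorMod(key, (long) shardCount);
    }

    /**
     * Groups keys by shard ("map"), queries all shards in parallel, then merges
     * the partial results ("reduce"). queryShard(shard, keys) stands in for a remote call.
     */
    static <V> Map<Long, V> fetchAll(Collection<Long> keys, int shardCount,
                                     BiFunction<Integer, List<Long>, Map<Long, V>> queryShard,
                                     ExecutorService pool) {
        Map<Integer, List<Long>> byShard = keys.stream()
                .collect(Collectors.groupingBy(k -> shardOf(k, shardCount)));

        List<CompletableFuture<Map<Long, V>>> pending = new ArrayList<>();
        for (Map.Entry<Integer, List<Long>> entry : byShard.entrySet()) {
            pending.add(CompletableFuture.supplyAsync(
                    () -> queryShard.apply(entry.getKey(), entry.getValue()), pool));
        }

        Map<Long, V> merged = new HashMap<>();
        for (CompletableFuture<Map<Long, V>> future : pending) {
            merged.putAll(future.join()); // waits for the shard; failures surface as CompletionException
        }
        return merged;
    }
}
```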

- In the scenario you described, we made at least two requests, yet everything inside must run very quickly. These are calls over the internal network, and the network means many hops, because one subsystem calls another... There will be many network requests in such a call chain!

- Of course. But in fact, the average response time for an Ajax request is 5 milliseconds. That doesn't include the channel from the user to our frontends; it's the time spent inside the portal. Clearly, 5 milliseconds is an "average temperature across the hospital": some requests take 50 milliseconds, and some are very short.

- So which percentile are we talking about? The 90th?

- The 90th. To give you an idea, one remote request to a server in the same data center takes us 300 microseconds, and to a server in another data center, one millisecond. Of course, garbage collection happens, with pauses of 100 or 200 milliseconds, but those are rare. We fight long GC pauses: a one-second pause is already critical for most of our systems.
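For reference, you can compute a latency percentile such as the 90th from raw samples with the nearest-rank method. This is a generic sketch, not a description of how Odnoklassniki's monitoring works. The test data shows how a rare long pause barely moves the 90th percentile while it dominates the maximum.

```java
import java.util.Arrays;

public class Percentile {
    /**
     * Nearest-rank percentile: the smallest sample such that at least p percent
     * of all samples are less than or equal to it.
     */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }
}
```

For 99 requests of 5 ms plus one 200 ms GC pause, the 90th percentile is still 5 ms, and only the 100th percentile shows the 200 ms pause.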

- Which GC do you use?

- Mostly a well-tuned CMS. Generally it works better than even Garbage-First. But we also use G1, for example in our self-written NewSQL solution that came in to replace SQL Server, to guarantee the required response time.

So on average G1 performs a little worse than a tuned CMS, but it gives more guarantees. With CMS the average pause is, say, 50 milliseconds, but sometimes it shoots up to 300–400 milliseconds. G1 may collect more often, and its average pause is up to 150 milliseconds. But it has been given a 200-millisecond limit, and it really tries to stay within it.
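On HotSpot, a 200 ms limit like this presumably corresponds to the pause-time goal set with `-XX:MaxGCPauseMillis=200` alongside `-XX:+UseG1GC`; 200 ms is also G1's default goal. CMS was selected with `-XX:+UseConcMarkSweepGC`, an option later removed in JDK 14. As a sketch, the standard `java.lang.management` API shows which collectors a running JVM actually uses and how much time they have spent. Collector names such as "G1 Young Generation" vary by JVM and collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcReport {
    /** One line per collector: name, number of collections, accumulated time. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start; -1 means "not available".
            long count = gc.getCollectionCount();
            long millis = gc.getCollectionTime();
            lines.add(gc.getName() + ": " + count + " collections, " + millis + " ms total");
        }
        return lines;
    }

    public static void main(String[] args) {
        // e.g. java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 GcReport
        describeCollectors().forEach(System.out::println);
    }
}
```

Cumulative totals cannot reveal individual pauses. To check how often a 200 ms goal is exceeded, GC logs are the usual tool.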

- Knowing roughly how Java works, I understand that if we say "Please, G1, keep pauses to 200 milliseconds," in reality they won't always be 200 milliseconds. More precisely, they will be, but only with some probability. How often do collections exceed this nominal 200 milliseconds?

- Very rarely, surprisingly. Starting with JDK 8u40, G1 has become a good collector. Previously it couldn't even unload unused classes without a full GC, but from this version on it is quite production quality.

- In JDK 9, Garbage-First will be the default collector. JDK 9 comes out a year from now, in September 2016. Will you switch to Java 9?

- Most likely we will, so that we keep receiving updates once support for Java 8 ends. That's actually what happened with Java 7: we started moving off it because we wanted to keep receiving updates. I myself had to patch our own JDK 7 build to bring in quite critical JIT compiler fixes that had been made only in JDK 8.

- Why not go with someone like Azul, whose business is largely backports and support for older Java versions?

- And let Azul make money off us? :) What's the point? We have our own experts who handle this.

Java performance

- There hasn't been any news about performance breakthroughs in the JDK or JRE for a long time. Has performance ever increased significantly for you after switching to some version?

- Just from switching versions? No, that never happened.

- And why is that? What are the chances that some cool JIT optimization comes out tomorrow and gives +20% performance?

- The probability is close to zero. Interesting optimizations do happen, though: loop vectorization was added recently.

- Integer addition and multiplication? Seems like a simple thing...

- Yes, we're talking about integer operations on arrays in a loop. Even now it doesn't work in all cases, but at least the optimization exists. My point is that these days such optimizations give a few percent, or fractions of a percent. There's no question of sudden jumps.
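Here is a sketch of the loop shape that such auto-vectorization (the SuperWord pass in C2) targets: a simple counted loop over int arrays whose iterations are independent. Whether C2 actually emits SIMD code depends on the JDK version and the CPU, so treat this as illustrative. The second loop, a prefix sum, carries a dependency from one iteration to the next and is not a candidate for this straightforward kind of vectorization.

```java
public class VectorizableLoop {
    /**
     * out[i] = a[i] + k * b[i]: independent iterations over int arrays,
     * the shape C2's auto-vectorizer can compile to SIMD instructions.
     */
    static void addScaled(int[] a, int[] b, int[] out, int k) {
        if (a.length < out.length || b.length < out.length) {
            throw new IllegalArgumentException("inputs shorter than output");
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + k * b[i];
        }
    }

    /**
     * Prefix sum: each iteration needs the previous result, so this loop
     * cannot be vectorized the same straightforward way.
     */
    static void prefixSum(int[] a, int[] out) {
        int running = 0;
        for (int i = 0; i < a.length; i++) {
            running += a[i];
            out[i] = running;
        }
    }
}
```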

- A question about security. Until 2012, Java's motto was Compatibility-First; since 2012 it has been Security-First. Have you, as a portal, felt this somehow?

- For us, the main problem is bugs that crash the JVM. Beyond that, there has never been a case of us being hacked through some Java vulnerability. It's much easier to find a hole in an API.

About storage systems and Cassandra

- Let's move away from Java and talk about storage systems. How does this work for you? Which storage systems do you use?

- Many different ones, both self-written and standard. Today, probably most of our storage is based on Cassandra.

- Why Cassandra?

- Others simply don't work: they have no failover, and replication has to be done almost by hand. And replication is an important business requirement. We are now working to make sure the entire portal keeps working even if one of the data centers goes down completely.

- And how many data centers are there?

- Three. If one goes down, users will still be able to log in, but some services may be unavailable. We started the reliability work with the key services, and now we are tidying up the loose ends.
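At its simplest, failover on reads means trying replicas in order, for example one per data center, until one answers. Here is a minimal sketch. The replica functions are hypothetical stand-ins for remote calls. Storage systems such as Cassandra handle this internally, with tunable consistency levels, rather than in application code like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class FailoverRead {
    /**
     * Tries each replica in turn and returns the first successful answer;
     * throws only if every replica fails, attaching each failure as suppressed.
     */
    static <K, V> V readWithFailover(List<Function<K, V>> replicas, K key) {
        List<RuntimeException> errors = new ArrayList<>();
        for (Function<K, V> replica : replicas) {
            try {
                return replica.apply(key);
            } catch (RuntimeException e) {
                errors.add(e); // this replica (e.g. its data center) is unavailable; try the next
            }
        }
        IllegalStateException failure = new IllegalStateException(
                "all " + replicas.size() + " replicas failed for key " + key);
        errors.forEach(failure::addSuppressed);
        throw failure;
    }
}
```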

- Please tell me about the transition to a distributed system with three data centers. How does it work, and who does it?

- The libraries, storage systems and tools themselves are built by the platform team. For heavy content (photos, videos, music), for example, we use self-written storage. Recently colleagues from EMC, a company known for its data storage solutions, visited our office. They told us about their solutions and shared their experience. But as it turned out, they couldn't offer us anything new compared to what we already have.

- Yes, EMC are interesting guys. By the way, where does Odnoklassniki's technical expertise lie, in your opinion? Is there world-class experience?

- The first thing that comes to mind is Cassandra. We contribute to it, and we are consistently ahead of what's in Cassandra's master branch. Global indexes, which are only now being developed there, are already implemented and in use here.

We also have strong expertise in recommendation systems. Initially we had a recommendation system for music; now it has been bolted onto many other services, such as video and groups. The portal's main task is to show people content that interests them and not show content that doesn't. This keeps users on Odnoklassniki longer.

And we have strong JVM expertise, of course.

- And where is expertise lacking today, in your opinion?

- Maybe image recognition. There's something to learn from Google there.

- As I understand it, there is a platform team that handles everything related to storage, distribution and caches. And you deliver your work to other teams as libraries, APIs and services, right?

- Exactly. We provide ready-made solutions. But a developer from another team, of course, needs to know some specifics: which requests are light and which are heavy, what can be done and what cannot.

- Do you describe this for them in Javadoc, or pass on this knowledge some other way?

- There is a practice where the authors of these solutions give lectures and seminars. Not very often, though. They are mostly for newcomers, but experienced developers can usually learn something new too.

We also have a code review process. If a person took my solution and built something on top of it, he'll most likely come to me later and ask: “Andrey, is everything okay?” If anything is wrong, I'll point out what to fix and where.

- And what do you do to make sure your API can't be used incorrectly?

- Personally, I'm not doing much in this regard: I just try to make the code as concise and simple as possible. And if you use it incorrectly, you have only yourself to blame. Go and read the documentation or the lecture notes to figure out how to do it right.

- But is the documentation generally up to date?

- Yes, I'm very glad I once wrote a detailed FAQ on our remoting and serialization.

We use our own system. We switched to it from JBoss Remoting back then so that we could do online updates without service downtime. It also handles situations where one part of the portal runs new versions of classes and another part runs old ones.

And to this day, when people come to me and ask, “How will this be serialized?” or “Can I do this kind of transformation?”, I just send them to a page on our internal wiki.

- Can't JBoss Remoting do that? Or does it handle it worse?

- It's even worse than standard serialization. Standard serialization supports some changes, but only to a limited extent: you can delete fields and add new ones, but the new ones will be initialized with default values.

But say you change a type from int to long. There was a flags field, 32 bits weren't enough, so it was changed to a 64-bit field. This is a typical case, yet standard serialization doesn't support it.
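Standard Java serialization treats a change in a primitive field's declared type as an incompatible change and rejects the old stream with an `InvalidClassException`. A format that records each field's type can widen old values instead. The sketch below uses a hypothetical one-field wire format: a type tag followed by the value. It is not Odnoklassniki's actual serialization protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StreamCorruptedException;

public class TolerantFlags {
    // Hypothetical wire format for one "flags" field: a type tag, then the value.
    private static final byte TAG_INT = 'I';  // written by the old class version
    private static final byte TAG_LONG = 'J'; // written by the new class version

    static byte[] writeIntFlags(int flags) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(TAG_INT);
            out.writeInt(flags);
        }
        return bytes.toByteArray();
    }

    static byte[] writeLongFlags(long flags) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(TAG_LONG);
            out.writeLong(flags);
        }
        return bytes.toByteArray();
    }

    /** Reads data written by either class version into the new 64-bit field. */
    static long readFlags(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte tag = in.readByte();
            if (tag == TAG_INT) {
                // Zero-extend: flags are a bit set, so bit 31 must not spill into bits 32..63.
                return Integer.toUnsignedLong(in.readInt());
            } else if (tag == TAG_LONG) {
                return in.readLong();
            }
            throw new StreamCorruptedException("unknown field type tag: " + tag);
        }
    }
}
```

The zero-extension is a deliberate choice for a flags field. A plain `(long)` cast would sign-extend, so old data with bit 31 set would suddenly have bits 32 to 63 set as well.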

How to become a cool system programmer?

- What would you advise someone interested in the Java platform itself and its low-level details to read, or where should they look?

- Not much has been written about the JVM and HotSpot internals in third-party sources. Fortunately, it's an open-source product: you can download it and look. Sometimes the HotSpot source has more comments than code.

- That's mostly C++, right?

- Yes. For the general concepts of how Java virtual machines work, I recommend an excellent series of lectures by Oleg Pliss at the St. Petersburg JUG.

- Maybe you could also recommend some books?

- I won't recommend books. It's better to ask questions on StackOverflow. I'm there now too, and you can ask me.

I've subscribed to questions about the JVM and Java performance, and occasionally I answer non-trivial ones. And sometimes I learn something new there myself.

Of course, it's full of inadequate questions and answers, but there are mechanisms to deal with that, reputation among them. And there you can meet the world's leading experts: from time to time Brian Goetz himself drops in and comments on something.

- And Habr? I see you rarely write there now.

- It seems to me that the quality of content on Habr is slowly declining... And I'm not only talking about the Java hub.

First, Habr was split into several subprojects, so each one's audience has shrunk. Second, when some inadequate commenter can tear you down with a single comment, the motivation to keep writing naturally disappears.

- Your latest article is about how a deadlock arises when class loaders work in parallel. Do you think the author of that code at Google should have fixed it, or is it just "a bug is a bug, everyone has bugs"?

- They fixed it, and quite quickly, to their credit. It just belongs to the class of bugs that are completely non-obvious: you simply can't spot it by looking at the code.

- Couldn't a static analyzer catch this? The rule seems simple: a superclass that refers to its subclass, which is quite easy to detect with static analysis. In principle, it's a rule anyone could catch.

- I agree. There just wasn't such a rule until now. Our mutual acquaintance lany has already promised to add a corresponding warning to FindBugs.
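The superclass/subclass rule discussed above targets a well-known pattern: a superclass whose static initializer references its own subclass. The exact code from the article is not reproduced here; this is a generic sketch with hypothetical class names. Suppose thread 1 starts initializing `Base` while thread 2 starts initializing `Derived`. `Derived` first requires `Base`, and `Base`'s initializer requires `Derived`, so each thread waits on the other's class initialization lock.

```java
public class InitDeadlockDemo {
    // Anti-pattern: Base's static initializer creates a Derived instance.
    // Thread 1: initializes Base -> Base.<clinit> needs Derived -> waits for thread 2.
    // Thread 2: initializes Derived (e.g. via new Derived()) -> needs Base first -> waits for thread 1.
    static class Base {
        static final Base DEFAULT = new Derived();
    }

    static class Derived extends Base { }

    // Safer shape: keep the subclass reference out of the superclass initializer,
    // e.g. in a separate holder class that is initialized lazily on first access.
    static class SafeBase { }

    static class SafeDerived extends SafeBase { }

    static final class Defaults {
        static final SafeBase DEFAULT = new SafeDerived();
    }
}
```

Single-threaded code never hangs on this pattern, because a thread may re-enter a class initialization it has already started. That is exactly why the bug is so hard to spot by reading the code or testing it.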

- And do you still have creative plans: talks, articles, books?

- There are many ideas, and I want to write about many things, but there isn't enough time. There's a lot of interesting stuff about JIT compiler internals: how it behaves in completely unpredictable ways, and which strings you can pull to influence it.

I also want to write about working with signals in Linux. Traditionally, signals are intercepted only to handle errors, but the HotSpot JVM uses signals for many interesting internal purposes. It's quite a good technique for system software, such as the Java runtime.


Source: https://habr.com/ru/post/259415/
