In the previous article, many commenters disagreed about whether knowing the size of Java objects matters at all. I strongly disagree with that opinion, so I have prepared several practical techniques that can be useful for optimizing your application. Note right away that not all of these techniques can be applied blindly during development. To add a bit of drama, all calculations and figures are given for the 64-bit HotSpot JVM.
Denormalizing the model
So let's consider the following code:
```java
class Cursor {
    String icon;
    Position pos;

    Cursor(String icon, int x, int y) {
        this.icon = icon;
        this.pos = new Position(x, y);
    }
}

class Position {
    int x;
    int y;

    Position(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
```
And now we will denormalize:
```java
class Cursor2 {
    String icon;
    int x;
    int y;

    Cursor2(String icon, int x, int y) {
        this.icon = icon;
        this.x = x;
        this.y = y;
    }
}
```
It would seem we merely got rid of the composition, nothing more. But no: a Cursor2 object consumes roughly 30% less memory than a Cursor object (really, Cursor + Position together). This is a non-obvious cost of decomposition: the extra object brings its own header, and the outer object pays for a reference to it. This may seem unimportant and even laughable while you have few objects, but once the count runs into the millions, the situation changes drastically. This is not a call to create huge classes with 100 fields. Not at all. It can be useful only when you are pressing against the upper limit of your RAM and hold many similar objects in memory.
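A rough back-of-the-envelope check of this claim (a sketch, assuming a 64-bit HotSpot JVM without compressed oops: 16-byte object header, 8-byte references, 8-byte alignment; the `align` helper and the constants are our own assumptions, not a JVM API):

```java
public class LayoutMath {
    static final int HEADER = 16; // 64-bit HotSpot header, no compressed oops (assumed)
    static final int REF = 8;     // uncompressed reference (assumed)
    static final int INT = 4;

    // Round a raw instance size up to the 8-byte alignment boundary.
    static long align(long raw) {
        return (raw + 7) & ~7L;
    }

    public static void main(String[] args) {
        // Cursor: header + icon reference + pos reference
        long cursor = align(HEADER + REF + REF);        // 32 bytes
        // Position: header + x + y
        long position = align(HEADER + INT + INT);      // 24 bytes
        // Cursor2: header + icon reference + x + y
        long cursor2 = align(HEADER + REF + INT + INT); // 32 bytes

        System.out.println(cursor + position); // 56: the normalized pair
        System.out.println(cursor2);           // 32: the denormalized version
    }
}
```

The exact percentage saved depends on the header size and on whether compressed oops are enabled, but the extra header and reference are always paid for.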
Using alignment to our advantage
Suppose we have 2 classes:
```java
class A {
    int a;
}

class B {
    int a;
    int b;
}
```
Objects of classes A and B consume the same amount of memory, because the instance size is rounded up to the alignment boundary. Three conclusions follow immediately:
- Sometimes you find yourself wondering: "is it worth adding another field to the class, or should I save the memory and compute the value on the fly?" It can be foolish to sacrifice CPU time to save memory when there may be no savings at all.
- Sometimes we can add a field without spending any extra memory and use it to store additional or intermediate data for calculations or caching (for example, the hash field in the String class).
- Sometimes there is no point in using byte instead of int, because alignment can erase the difference anyway.
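The arithmetic behind the A-and-B example above can be sketched like this (the `align` helper is ours; a 16-byte header without compressed oops is assumed):

```java
public class AlignDemo {
    // Round a raw instance size up to the 8-byte alignment boundary.
    static long align(long raw) {
        return (raw + 7) & ~7L;
    }

    public static void main(String[] args) {
        long header = 16;                    // assumed 64-bit header, no compressed oops
        long sizeA = align(header + 4);      // one int:  16 + 4 = 20, padded to 24
        long sizeB = align(header + 4 + 4);  // two ints: 16 + 8 = 24, no padding needed
        System.out.println(sizeA + " vs " + sizeB); // both 24: the second int was free
    }
}
```

The four bytes that padding would have wasted in class A are exactly what the extra field in class B occupies.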
Primitives and wrappers
It bears repeating: if a field in your class should not, or cannot, hold null, use a primitive without hesitation. Yet very often you see something like this:
```java
class A {
    @NotNull
    private Boolean isNew;
    @NotNull
    private Integer year;
}
```
Remember, primitives occupy on average 4 times less memory. Replacing an Integer field with an int saves 16 bytes per object, and replacing a Long with a long saves 20 bytes. It also reduces the load on the garbage collector. All in all, plenty of advantages. The only price is the absence of null values. Even then, in situations where memory is truly scarce, you can designate certain values to play the role of null, although this may entail additional costs in revising the application logic.
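The last trick, a designated value standing in for null, can look like this (a sketch; the `Book` class and the sentinel choice are our own illustration, and the sentinel must be a value that can never occur in real data):

```java
public class Book {
    // Sentinel standing in for "year unknown", so the field can stay a primitive.
    public static final int YEAR_UNSET = Integer.MIN_VALUE;

    private int year = YEAR_UNSET; // int instead of Integer: no boxed object, no extra header

    public boolean hasYear() {
        return year != YEAR_UNSET;
    }

    public void setYear(int year) {
        this.year = year;
    }

    public int getYear() {
        if (!hasYear()) {
            throw new IllegalStateException("year is not set");
        }
        return year;
    }
}
```

The saving is per object: an Integer field costs a reference plus a boxed instance on the heap, while an int is stored inline in the object itself.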
Boolean and boolean
Separately, I would like to highlight these two types, the most mysterious types in Java. Their size is not defined by the specification, so the size of the logical type depends entirely on your JVM. In the Oracle HotSpot JVM a boolean field occupies 4 bytes, that is, as much as an int: for storing 1 bit of information you pay 31 extra bits. In a boolean array, however, the JVM stores one byte per element (and do not forget about BitSet).
And finally: do not use the Boolean type. It is hard for me to think of a situation where it is actually required. It is much cheaper in terms of memory, and simpler in terms of business logic, to use a primitive that accepts 2 possible values rather than 3, as Boolean does.
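For dense flag sets, java.util.BitSet (part of the standard library) really does pack one bit per flag, versus one byte per element in a boolean array:

```java
import java.util.BitSet;

public class FlagsDemo {
    public static void main(String[] args) {
        int n = 1_000_000;

        boolean[] plain = new boolean[n]; // roughly 1 MB: one byte per element
        BitSet packed = new BitSet(n);    // roughly 125 KB: one bit per element

        plain[42] = true;
        packed.set(42);

        System.out.println(plain[42] == packed.get(42)); // same answer, ~8x less memory
    }
}
```

The trade-off is a little extra bit-twiddling on each access, which is usually negligible next to the memory saved on large flag sets.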
Serialization and deserialization
Suppose you have a serialized application model that takes 1 GB on disk, and your task is to restore this model in memory, that is, simply deserialize it. Be prepared for the fact that, depending on the model's structure, it will occupy from 2 GB to 5 GB of memory. Yes, yes, again because of the same headers, alignment, and references. Therefore it may sometimes be useful to keep large volumes of data in resource files. But this very much depends on the situation; it is not always a way out, and sometimes it is simply impossible.
Order matters
Suppose we have two arrays:
```java
Object[2][1000]
Object[1000][2]
```
It would seem there is no difference. But in reality there is, and from the point of view of memory consumption it is enormous. In the first case we have 2 references to arrays of a thousand elements each. In the second case we have a thousand references to arrays of two elements! In the second case we pay for 998 extra references, which at 8 bytes each is about 7.8 KB, not counting the 998 extra array headers. Out of the blue, you can lose a noticeable amount of memory.
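Counting the references makes the difference concrete (a sketch; uncompressed 8-byte references are assumed):

```java
public class ArrayOrderDemo {
    public static void main(String[] args) {
        int REF = 8; // assumed: uncompressed 64-bit reference

        // Object[2][1000]: one outer array holding 2 refs,
        // plus 2 inner arrays holding 1000 refs each.
        long refsA = 2 + 2L * 1000;    // 2002 references

        // Object[1000][2]: one outer array holding 1000 refs,
        // plus 1000 inner arrays holding 2 refs each.
        long refsB = 1000 + 1000L * 2; // 3000 references

        long extraBytes = (refsB - refsA) * REF; // 998 extra refs * 8 bytes
        System.out.println(extraBytes); // 7984 bytes, before counting 998 extra array headers
    }
}
```

With array headers (16 bytes each in this configuration) added in, the second layout costs roughly another 16 KB on top of the extra references.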
Reference compression
It is possible to reduce the memory used by references and headers of Java objects. The fact is that long ago, when migrating from 32-bit to 64-bit architectures, many administrators and developers noticed a drop in the performance of their Java virtual machines. Moreover, the memory consumed by their applications grew by 20-50% during migration, depending on the structure of their business model, which naturally did not please them. The reason for migrating was obvious: applications no longer fit in the address space available on 32-bit architectures. For those who do not know: on 32-bit systems a pointer to a memory location occupies 32 bits, so the maximum memory reachable through 32-bit pointers is 2^32 = 4,294,967,296 bytes, or 4 GB. In practice even 4 GB is unattainable, because part of the address space is reserved for installed peripherals such as video cards.
The Java developers did not lose their heads, and the notion of reference compression appeared. Normally the reference size in Java matches the native pointer size, that is, 64 bits on 64-bit architectures, which in theory lets us address 2^64 objects. But such a huge address space is unnecessary, so the virtual machine developers decided to save on reference size and introduced the -XX:+UseCompressedOops option, which reduces pointers in a 64-bit JVM to 32 bits. What does this give us?
- Every reference field now occupies 4 bytes less.
- The header of each object shrinks from 16 to 12 bytes.
- In some situations, less alignment padding is needed.
- Overall memory consumption drops significantly.
But two small minuses appear:
- The number of addressable objects is capped at 2^32. This is hardly a minus: agree, 4 billion objects is very, very many, and given that the minimum object size is 16 bytes, that already corresponds to tens of gigabytes of heap.
- Additional costs appear for converting compressed references to native pointers and back. It is doubtful that these costs have any real impact on performance, given that decoding is literally two register operations: a shift and an addition. Details can be found here.
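The shift in question can be sketched in plain Java (an illustration of the idea only; the real HotSpot decoding also adds a heap base address and handles null specially):

```java
public class OopDecodeDemo {
    // Objects are 8-byte aligned, so the low 3 bits of every address are always zero.
    static final int SHIFT = 3;

    // Compress: drop the three guaranteed-zero bits so the address fits into 32 bits.
    static int compress(long address) {
        return (int) (address >>> SHIFT);
    }

    // Decompress: shift back (plus a base-address add in the real JVM).
    static long decompress(int oop) {
        return ((long) oop & 0xFFFFFFFFL) << SHIFT;
    }

    public static void main(String[] args) {
        long address = 0x7_4000_1230L; // some 8-byte-aligned "heap address" (hypothetical)
        System.out.println(decompress(compress(address)) == address); // round-trips exactly
    }
}
```

This is also why compressed oops reach beyond 4 GB of heap: the 32-bit value addresses 8-byte units, not bytes.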
I am sure many of you are wondering: if the UseCompressedOops option has so many advantages and almost no disadvantages, why is it not enabled by default? In fact, starting with JDK 6 update 23 it is enabled by default, as it is in JDK 7. The option first appeared in the 6u6p performance release.
Conclusion
I hope I managed to convince you; I have seen some of these techniques used on real projects. And remember, as Donald Knuth said, premature optimization is the root of all evil.